Saturday, October 5, 2013

Set up a DNS Server for the SCAN IP for 11gR2 Grid (11.2)


Set up a DNS server, configure the SCAN IPs, and modify the SCAN details in 11gR2 Grid (11.2)
=============================================================================================

In this post we are going to cover the following:

 *  How to set up the yum installer.
 *  How to configure a DNS server.
 *  How to add the client server details to the DNS server.
 *  How to modify the SCAN IP details in 11gR2 Grid Infrastructure.

OS -  RHEL 5.7

Prepare the Yum Installer
=========================


1. Mount the RHEL ISO DVD in the server's drive.

2. Mount the DVD on /mnt:

[root@standalone2 media]# mount /dev/cdrom /mnt
mount: block device /dev/cdrom is write-protected, mounting read-only
[root@standalone2 media]# cd /mnt

3. Install the FTP Server.

[root@standalone2 Server]# ls -lrt vsf*
-r--r--r-- 75 root root 143483 May 24  2011 vsftpd-2.0.5-21.el5.x86_64.rpm
[root@standalone2 Server]# rpm -ivh vsftpd-2.0.5-21.el5.x86_64.rpm
warning: vsftpd-2.0.5-21.el5.x86_64.rpm: Header V3 DSA signature: NOKEY, key ID 37017186
Preparing...                ########################################### [100%]
   1:vsftpd                 ########################################### [100%]


4. Copy the Server and images directories and the RPM-GPG-KEY files from the DVD to the /var/ftp/pub directory.


[root@standalone2 Server]# cp -av /mnt/Server /var/ftp/pub/

[root@standalone2 Server]# cp -av /mnt/images /var/ftp/pub/

[root@standalone2 Server]# cp -av /mnt/RPM-GPG-KEY* /var/ftp/pub/


5. Install the createrepo package.

[root@standalone2 ~]# cd /var/ftp/pub/Server/
[root@standalone2 Server]# rpm -ivh createrepo-0.4.11-3.el5.noarch.rpm
warning: createrepo-0.4.11-3.el5.noarch.rpm: Header V3 DSA signature: NOKEY, key ID 37017186
Preparing...                ########################################### [100%]
   1:createrepo             ########################################### [100%]


6. Create a Repository for the /var/ftp/pub directory

[root@standalone2 Server]# createrepo -v /var/ftp/pub

[root@standalone2 Server]# createrepo -g /var/ftp/pub/Server/repodata/comps-rhel5-server-core.xml /var/ftp/pub/

[root@standalone2 Server]# yum clean all
Loaded plugins: rhnplugin, security
Cleaning up Everything

7. Create a repository file with the below contents.

[root@standalone2 Server]# vi /etc/yum.repos.d/Server.repo

[ser]
name=standalone2.manzoor.com
baseurl=file:///var/ftp/pub
enabled=1
gpgcheck=0
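Before testing yum, a quick sanity check of the repo file can save a round trip. Below is a minimal sketch; the check_repo helper and the /tmp path are illustrative only, not part of the actual setup:

```shell
# check_repo prints "ok" when the .repo file defines the keys yum needs
# (baseurl, enabled=1, gpgcheck); sketch only, helper name is made up.
check_repo() {
  local f="$1"
  grep -q '^baseurl=' "$f" && grep -q '^enabled=1' "$f" \
    && grep -q '^gpgcheck=' "$f" && echo ok || echo missing-keys
}

# write a copy of the repo file from this post and check it
cat > /tmp/Server.repo <<'EOF'
[ser]
name=standalone2.manzoor.com
baseurl=file:///var/ftp/pub
enabled=1
gpgcheck=0
EOF
check_repo /tmp/Server.repo   # prints "ok" when all keys are present
```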

8. Check the yum installer by uninstalling and reinstalling a package.


[root@standalone2 Server]# yum remove telnet
Loaded plugins: rhnplugin, security
This system is not registered with RHN.
RHN support will be disabled.
Setting up Remove Process
Resolving Dependencies
--> Running transaction check
---> Package telnet.x86_64 1:0.17-39.el5 set to be erased
--> Finished Dependency Resolution

Dependencies Resolved

=============================================================================================================================================================
 Package                            Arch                               Version                                   Repository                             Size
=============================================================================================================================================================
Removing:
 telnet                             x86_64                             1:0.17-39.el5                             installed                             105 k

Transaction Summary
=============================================================================================================================================================
Remove        1 Package(s)
Reinstall     0 Package(s)
Downgrade     0 Package(s)

Is this ok [y/N]: y
Downloading Packages:
Running rpm_check_debug
Running Transaction Test
Finished Transaction Test
Transaction Test Succeeded
Running Transaction
  Erasing        : telnet                                                                                                                                1/1

Removed:
  telnet.x86_64 1:0.17-39.el5

Complete!


[root@standalone2 Server]# yum install telnet
Loaded plugins: rhnplugin, security
This system is not registered with RHN.
RHN support will be disabled.
Server                                                                                                                                | 1.1 kB     00:00
Server/primary                                                                                                                        | 1.1 MB     00:00
Server                                                                                                                                             3261/3261
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package telnet.x86_64 1:0.17-39.el5 set to be updated
--> Finished Dependency Resolution

Dependencies Resolved

=============================================================================================================================================================
 Package                            Arch                               Version                                      Repository                          Size
=============================================================================================================================================================
Installing:
 telnet                             x86_64                             1:0.17-39.el5                                Server                              60 k

Transaction Summary
=============================================================================================================================================================
Install       1 Package(s)
Upgrade       0 Package(s)

Total download size: 60 k
Is this ok [y/N]: y
Downloading Packages:
Running rpm_check_debug
Running Transaction Test
Finished Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing     : telnet                                                                                                                                1/1

Installed:
  telnet.x86_64 1:0.17-39.el5

Complete!


[root@standalone2 Server]# yum update




======= yum configuration completed =========================



DNS Server Configuration
========================


1) Install the necessary RPMs (bind packages) which are required to configure the DNS server.


[root@standalone2 ~]# yum install -y *bind* caching-nameserver


2) Note down the public IP address of the server.

[root@standalone2 ~]# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:0C:29:86:F8:24
          inet addr:192.168.0.30  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe86:f824/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:26338 errors:0 dropped:0 overruns:0 frame:0
          TX packets:40786 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1764870 (1.6 MiB)  TX bytes:8763994 (8.3 MiB)


IP Address - 192.168.0.30

3) Modify the named.conf configuration file.

[root@standalone2 ~]# cd /var/named/chroot/etc/

[root@standalone2 etc]# ls -lrt
total 16
-rw-r----- 1 root named  955 Dec  2  2010 named.rfc1912.zones
-rw-r----- 1 root named 1230 Dec  2  2010 named.caching-nameserver.conf
-rw-r--r-- 1 root root  2819 Oct 13  2012 localtime
-rw-r----- 1 root named  113 Oct  4 21:52 rndc.key


[root@standalone2 etc]# cp named.caching-nameserver.conf named.conf

[root@standalone2 etc]# vi named.conf

# edit the named.conf file...


Modify the below lines... 


Before Modification
===================

        listen-on port 53 { 127.0.0.1; };
        listen-on-v6 port 53 { ::1; };

        allow-query     { localhost; };
        allow-query-cache { localhost; };

        match-clients      { localhost; };
        match-destinations { localhost; };


After Modification
==================



        listen-on port 53 { 192.168.0.30; };
#       listen-on-v6 port 53 { ::1; };


        allow-query     { any; };
        allow-query-cache { any; };

        match-clients      { any; };
        match-destinations { 192.168.0.30; };



[root@standalone2 etc]# ls -lrt
total 20
-rw-r----- 1 root named  955 Dec  2  2010 named.rfc1912.zones
-rw-r----- 1 root named 1230 Dec  2  2010 named.caching-nameserver.conf
-rw-r--r-- 1 root root  2819 Oct 13  2012 localtime
-rw-r----- 1 root named  113 Oct  4 21:52 rndc.key
-rw-r----- 1 root root  1219 Oct  4 22:46 named.conf


4) Edit the zone files.


[root@standalone2 etc]# vi named.rfc1912.zones

# Now edit the zone file


Modify the below lines.


Before Modification.
====================


zone "localdomain" IN {
file "localdomain.zone";



zone "0.0.127.in-addr.arpa" IN {
file "named.local";



After Modification
===================

zone "manzoor.com" IN {
file "forward.zone";

zone "0.168.192.in-addr.arpa" IN {
file "reverse.zone";


[root@standalone2 etc]# chgrp named named.conf

[root@standalone2 etc]# ls -lrt
total 20
-rw-r----- 1 root named 1230 Dec  2  2010 named.caching-nameserver.conf
-rw-r--r-- 1 root root  2819 Oct 13  2012 localtime
-rw-r----- 1 root named  113 Oct  4 21:52 rndc.key
-rw-r----- 1 root named 1219 Oct  4 22:46 named.conf
-rw-r----- 1 root named  954 Oct  4 23:20 named.rfc1912.zones


[root@standalone2 etc]# cd /var/named/chroot/var/named

[root@standalone2 named]# ls -lrt
total 36
drwxrwx--- 2 named named 4096 Jul 27  2004 slaves
drwxrwx--- 2 named named 4096 Aug 25  2004 data
-rw-r----- 1 root  named  427 Dec  2  2010 named.zero
-rw-r----- 1 root  named  426 Dec  2  2010 named.local
-rw-r----- 1 root  named  424 Dec  2  2010 named.ip6.local
-rw-r----- 1 root  named 1892 Dec  2  2010 named.ca
-rw-r----- 1 root  named  427 Dec  2  2010 named.broadcast
-rw-r----- 1 root  named  195 Dec  2  2010 localhost.zone
-rw-r----- 1 root  named  198 Dec  2  2010 localdomain.zone


-- In the zone file above we changed localdomain.zone to forward.zone and named.local to reverse.zone,
   so copy the files to those names and then edit them.

[root@standalone2 named]# cp localdomain.zone forward.zone
[root@standalone2 named]# cp named.local reverse.zone


[root@standalone2 named]# vi forward.zone


# Whole file before modification.
================================

$TTL    86400
@               IN SOA  localhost root (
                                        42              ; serial (d. adams)
                                        3H              ; refresh
                                        15M             ; retry
                                        1W              ; expiry
                                        1D )            ; minimum
                IN NS           localhost
localhost       IN A            127.0.0.1



# whole file after modification.
================================

$TTL    86400
@               IN SOA  standalone2.manzoor.com. root.standalone2.manzoor.com. (
                                        42              ; serial (d. adams)
                                        3H              ; refresh
                                        15M             ; retry
                                        1W              ; expiry
                                        1D )            ; minimum
                IN NS           standalone2.manzoor.com.
standalone2     IN A            192.168.0.30


[root@standalone2 named]# vi reverse.zone

# Whole file before modification.
================================

$TTL    86400
@       IN      SOA     localhost. root.localhost.  (
                                      1997022700 ; Serial
                                      28800      ; Refresh
                                      14400      ; Retry
                                      3600000    ; Expire
                                      86400 )    ; Minimum
        IN      NS      localhost.
1       IN      PTR     localhost.


# whole file after modification.
================================

$TTL    86400
@       IN      SOA     standalone2.manzoor.com. root.standalone2.manzoor.com.  (
                                      1997022700 ; Serial
                                      28800      ; Refresh
                                      14400      ; Retry
                                      3600000    ; Expire
                                      86400 )    ; Minimum
        IN      NS      standalone2.manzoor.com.
30      IN      PTR     standalone2.manzoor.com.



-- In the above, 30 is the last octet of the IP address 192.168.0.30.
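The reverse-zone naming rule used above can be sketched in shell: for a /24 network the zone name is the first three octets reversed plus .in-addr.arpa, and the PTR label is the last octet. The reverse_zone and ptr_label helpers below are illustrative only:

```shell
# reverse_zone: 192.168.0.30 -> 0.168.192.in-addr.arpa (for a /24 network)
reverse_zone() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo "$c.$b.$a.in-addr.arpa"
}

# ptr_label: the PTR record label is the last octet of the IP
ptr_label() {
  echo "${1##*.}"
}

reverse_zone 192.168.0.30   # -> 0.168.192.in-addr.arpa
ptr_label 192.168.0.30      # -> 30
```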


-- Change the group of forward.zone and reverse.zone files to named group.

[root@standalone2 named]# chgrp named forward.zone
[root@standalone2 named]# chgrp named reverse.zone


[root@standalone2 named]# ls -lrt
total 44
drwxrwx--- 2 named named 4096 Jul 27  2004 slaves
drwxrwx--- 2 named named 4096 Aug 25  2004 data
-rw-r----- 1 root  named  427 Dec  2  2010 named.zero
-rw-r----- 1 root  named  426 Dec  2  2010 named.local
-rw-r----- 1 root  named  424 Dec  2  2010 named.ip6.local
-rw-r----- 1 root  named 1892 Dec  2  2010 named.ca
-rw-r----- 1 root  named  427 Dec  2  2010 named.broadcast
-rw-r----- 1 root  named  195 Dec  2  2010 localhost.zone
-rw-r----- 1 root  named  198 Dec  2  2010 localdomain.zone
-rw-r----- 1 root  named  258 Oct  4 23:25 forward.zone
-rw-r----- 1 root  named  482 Oct  4 23:28 reverse.zone


[root@standalone2 named]# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               localhost.localdomain localhost
::1                     localhost6.localdomain6 localhost6
##################################################
#### Public ips #################################
192.168.0.30    standalone2.manzoor.com         standalone2



5) Edit the resolv.conf file: change the search domain to your domain name
and set the nameserver to the public IP of this server.


[root@standalone2 named]# vi /etc/resolv.conf

# Edit file as per below details.


search manzoor.com
nameserver 192.168.0.30
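To double-check which nameserver the resolver will pick up, the nameserver line can be extracted with awk. A small sketch; the ns_of helper and the sample file path are illustrative:

```shell
# ns_of: print the nameserver address from a resolv.conf-style file
ns_of() { awk '$1 == "nameserver" { print $2 }' "$1"; }

# sample file with the exact contents used above
cat > /tmp/resolv.conf.sample <<'EOF'
search manzoor.com
nameserver 192.168.0.30
EOF
ns_of /tmp/resolv.conf.sample   # -> 192.168.0.30
```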


-- The hostname should be updated in the network file as below.

[root@standalone2 named]# cat /etc/sysconfig/network
NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=standalone2.manzoor.com


-- Restart the named service


[root@standalone2 named]# service named restart
Stopping named:                                            [  OK  ]
Starting named:                                            [  OK  ]

-- Test the dns

[root@standalone2 named]# dig standalone2.manzoor.com

; <<>> DiG 9.3.6-P1-RedHat-9.3.6-16.P1.el5 <<>> standalone2.manzoor.com
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 6354
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;standalone2.manzoor.com.       IN      A

;; ANSWER SECTION:
standalone2.manzoor.com. 86400  IN      A       192.168.0.30

;; AUTHORITY SECTION:
manzoor.com.            86400   IN      NS      standalone2.manzoor.com.

;; Query time: 4 msec
;; SERVER: 192.168.0.30#53(192.168.0.30)
;; WHEN: Fri Oct  4 23:32:47 2013
;; MSG SIZE  rcvd: 71


- We got the answer without error.

[root@standalone2 named]# nslookup standalone2.manzoor.com
Server:         192.168.0.30
Address:        192.168.0.30#53

Name:   standalone2.manzoor.com
Address: 192.168.0.30

[root@standalone2 named]# nslookup 192.168.0.30
Server:         192.168.0.30
Address:        192.168.0.30#53

30.0.168.192.in-addr.arpa       name = standalone2.manzoor.com.



== DNS Configuration for the server has been completed =====================



Steps to add a client to the DNS server
=======================================


1) Update the client server details in the forward.zone file.

Here our client hostname is urac1rac2-scan.manzoor.com, and the IP addresses for
this host are 192.168.0.27, 192.168.0.28 and 192.168.0.29.

-- Note: in this example we are using three IP addresses for the same host because we are
going to set up the SCAN IP for the Oracle 11g Grid.


2) Edit the forward.zone file and add the client hostname and IP addresses as below.

[root@standalone2 named]# vi forward.zone

$TTL    86400
@               IN SOA  standalone2.manzoor.com. root.standalone2.manzoor.com. (
                                        42              ; serial (d. adams)
                                        3H              ; refresh
                                        15M             ; retry
                                        1W              ; expiry
                                        1D )            ; minimum
                IN NS           standalone2.manzoor.com.
                IN NS           urac1rac2-scan.manzoor.com.
standalone2     IN A            192.168.0.30
urac1rac2-scan  IN A            192.168.0.27
urac1rac2-scan  IN A            192.168.0.28
urac1rac2-scan  IN A            192.168.0.29


-- Note
NS --  Denotes a name server record.
A  --  Denotes an address record.

We have added the NS and A records for the client.

3) Update the client server details in the reverse.zone file.

[root@standalone2 named]# vi reverse.zone

$TTL    86400
@       IN      SOA     standalone2.manzoor.com. root.standalone2.manzoor.com.  (
                                      1997022700 ; Serial
                                      28800      ; Refresh
                                      14400      ; Retry
                                      3600000    ; Expire
                                      86400 )    ; Minimum
        IN      NS      standalone2.manzoor.com.
        IN      NS      urac1rac2-scan.manzoor.com.
30      IN      PTR     standalone2.manzoor.com.
27      IN      PTR     urac1rac2-scan.manzoor.com.
28      IN      PTR     urac1rac2-scan.manzoor.com.
29      IN      PTR     urac1rac2-scan.manzoor.com.

-- Note

PTR -- A pointer record; its label is the last octet of the client's IP address, and it maps that address back to the hostname.

4) Now test it.

[root@standalone2 named]# nslookup urac1rac2-scan.manzoor.com
Server:         192.168.0.30
Address:        192.168.0.30#53

** server can't find urac1rac2-scan.manzoor.com: NXDOMAIN

-- The lookup fails because named has not yet reloaded the updated zone files; restart the service and run the lookup again.

[root@standalone2 named]# service named restart
Stopping named:                                            [  OK  ]
Starting named:      


-- We have assigned three IPs to urac1rac2-scan.manzoor.com, so lookups should return them in round-robin fashion.


[root@standalone2 named]# nslookup urac1rac2-scan.manzoor.com
Server:         192.168.0.30
Address:        192.168.0.30#53

Name:   urac1rac2-scan.manzoor.com
Address: 192.168.0.27
Name:   urac1rac2-scan.manzoor.com
Address: 192.168.0.28
Name:   urac1rac2-scan.manzoor.com
Address: 192.168.0.29

[root@standalone2 named]# nslookup urac1rac2-scan.manzoor.com
Server:         192.168.0.30
Address:        192.168.0.30#53

Name:   urac1rac2-scan.manzoor.com
Address: 192.168.0.28
Name:   urac1rac2-scan.manzoor.com
Address: 192.168.0.29
Name:   urac1rac2-scan.manzoor.com
Address: 192.168.0.27

[root@standalone2 named]# nslookup urac1rac2-scan.manzoor.com
Server:         192.168.0.30
Address:        192.168.0.30#53

Name:   urac1rac2-scan.manzoor.com
Address: 192.168.0.29
Name:   urac1rac2-scan.manzoor.com
Address: 192.168.0.27
Name:   urac1rac2-scan.manzoor.com
Address: 192.168.0.28
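BIND's default rrset-order is cyclic, which is why each nslookup above starts one address further into the list. Below is a pure-shell illustration of that rotation; the rotated helper is illustrative, and the IPs are the three SCAN addresses from this post:

```shell
# the three A records registered for the SCAN name
ips=(192.168.0.27 192.168.0.28 192.168.0.29)

# rotated N: print the list starting N entries in, wrapping around --
# the same cyclic ordering BIND applies to successive responses
rotated() {
  local n=${#ips[@]} off=$1 i
  for ((i = 0; i < n; i++)); do
    echo "${ips[(off + i) % n]}"
  done
}

rotated 1   # prints .28, .29, .27 -- matching the second nslookup above
```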


Update the /etc/resolv.conf file on the client with the DNS server address.

----- Client Configuration in DNS Server is Completed ------------------------------------


Updating the SCAN IP in 11gR2 Grid
==================================


Currently we have a two-node RAC setup running with one SCAN IP; since we did not
have DNS, we used the /etc/hosts file to resolve the SCAN name.

Now we have set up the DNS server and registered three IPs for the SCAN (urac1rac2-scan.manzoor.com).


Current SCAN details in the Grid:

[oracle@rhel11gr2rac1 bin]$ srvctl status scan
SCAN VIP scan1 is enabled
SCAN VIP scan1 is running on node rhel11gr2rac2

[oracle@rhel11gr2rac1 bin]$ ./srvctl status scan_listener
SCAN Listener LISTENER_SCAN1 is enabled
SCAN listener LISTENER_SCAN1 is running on node rhel11gr2rac2



[oracle@rhel11gr2rac1 bin]$ srvctl config scan
SCAN name: urac1rac2-scan.manzoor.com, Network: 1/192.168.0.0/255.255.255.0/eth0
SCAN VIP name: scan1, IP: /urac1rac2-scan.manzoor.com/192.168.0.28


-- As we can see, it is currently running with one IP, 192.168.0.28.


1) Update the DNS server ip details on both the rac nodes.

[root@rhel11gr2rac1 ~]# vi /etc/resolv.conf

search manzoor.com
nameserver 192.168.0.30

[root@rhel11gr2rac2 ~]# vi /etc/resolv.conf

; generated by /sbin/dhclient-script
search manzoor.com
nameserver 192.168.0.30


2) Check whether nslookup returns the details properly.

[root@rhel11gr2rac2 ~]# nslookup urac1rac2-scan.manzoor.com
Server:         192.168.0.30
Address:        192.168.0.30#53

Name:   urac1rac2-scan.manzoor.com
Address: 192.168.0.28
Name:   urac1rac2-scan.manzoor.com
Address: 192.168.0.29
Name:   urac1rac2-scan.manzoor.com
Address: 192.168.0.27

[root@rhel11gr2rac2 ~]# nslookup urac1rac2-scan.manzoor.com
Server:         192.168.0.30
Address:        192.168.0.30#53

Name:   urac1rac2-scan.manzoor.com
Address: 192.168.0.29
Name:   urac1rac2-scan.manzoor.com
Address: 192.168.0.27
Name:   urac1rac2-scan.manzoor.com
Address: 192.168.0.28

[root@rhel11gr2rac2 ~]# nslookup urac1rac2-scan.manzoor.com
Server:         192.168.0.30
Address:        192.168.0.30#53

Name:   urac1rac2-scan.manzoor.com
Address: 192.168.0.27
Name:   urac1rac2-scan.manzoor.com
Address: 192.168.0.28
Name:   urac1rac2-scan.manzoor.com
Address: 192.168.0.29

3) Remove the SCAN entry from the /etc/hosts file on all the nodes.
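A quick way to confirm no stale SCAN entry was left behind is to grep for the SCAN name on each node. A sketch, using the SCAN name from this post:

```shell
# run on every node after editing /etc/hosts; a leftover entry would
# shadow the DNS round-robin resolution
if grep -q urac1rac2-scan /etc/hosts; then
  echo "stale SCAN entry still present in /etc/hosts"
else
  echo "/etc/hosts is clean"
fi
```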

4) Stop the scan listener and scan.

[oracle@rhel11gr2rac1 bin]$ ./srvctl stop scan_listener
[oracle@rhel11gr2rac1 bin]$ ./srvctl stop scan


5) Modify the SCAN as the root user, then update the SCAN listener as the oracle user (-u refreshes the SCAN listeners to match the new number of SCAN VIPs).

[root@rhel11gr2rac1 bin]# ./srvctl modify scan -n urac1rac2-scan.manzoor.com

[oracle@rhel11gr2rac1 bin]$ ./srvctl modify scan_listener -u

6) Start the Scan listener.

[oracle@rhel11gr2rac1 bin]$ ./srvctl start scan_listener

7) Check the status of the SCAN.

[oracle@rhel11gr2rac1 bin]$ ./srvctl status scan
SCAN VIP scan1 is enabled
SCAN VIP scan1 is running on node rhel11gr2rac1
SCAN VIP scan2 is enabled
SCAN VIP scan2 is running on node rhel11gr2rac2
SCAN VIP scan3 is enabled
SCAN VIP scan3 is running on node rhel11gr2rac1

[oracle@rhel11gr2rac1 bin]$ ./srvctl config scan
SCAN name: urac1rac2-scan.manzoor.com, Network: 1/192.168.0.0/255.255.255.0/eth0
SCAN VIP name: scan1, IP: /urac1rac2-scan.manzoor.com/192.168.0.28
SCAN VIP name: scan2, IP: /urac1rac2-scan.manzoor.com/192.168.0.29
SCAN VIP name: scan3, IP: /urac1rac2-scan.manzoor.com/192.168.0.27

[oracle@rhel11gr2rac1 bin]$ ./srvctl status scan_listener
SCAN Listener LISTENER_SCAN1 is enabled
SCAN listener LISTENER_SCAN1 is running on node rhel11gr2rac1
SCAN Listener LISTENER_SCAN2 is enabled
SCAN listener LISTENER_SCAN2 is running on node rhel11gr2rac2
SCAN Listener LISTENER_SCAN3 is enabled
SCAN listener LISTENER_SCAN3 is running on node rhel11gr2rac1


-- Scan configuration has been completed.


References:

How to Modify SCAN Setting or SCAN Listener Port after Installation (Doc ID 972500.1)
Linux: How to Configure the DNS Server for 11gR2 SCAN (Doc ID 1107295.1)
How To Convert an 11gR2 GNS Configuration To A Standard Configuration Using DNS Only (Doc ID 1489121.1)
http://www.youtube.com/watch?v=XLcryY6Ndlg 







Friday, October 4, 2013

HAIP - Configure Multiple Private Interconnect Interfaces in Linux (11.2)

How to add one more network to Private interconnect (11.2)
==========================================================


1) Below is the current setup

a) Two-node RAC with 11.2.0.3 Oracle Clusterware.

b) Node details.

[oracle@rhel11gr2rac1 bin]$ ./olsnodes -n -i -s
rhel11gr2rac1   1       rhel11gr2rac1-vip       Active
rhel11gr2rac2   2       rhel11gr2rac2-vip       Active

c) Private interconnect ips.

[oracle@rhel11gr2rac1 bin]$ ./olsnodes -l -p
rhel11gr2rac1   10.10.10.20

[oracle@rhel11gr2rac2 bin]$ ./olsnodes -l -p
rhel11gr2rac2   10.10.10.21


2) Below are the IPs for the new interface we are going to add to the private interconnect; update the /etc/hosts file on both nodes with these details.

10.10.10.30     rhel11gr2rac1-priv2.manzoor.com rhel11gr2rac1-priv2
10.10.10.31     rhel11gr2rac2-priv2.manzoor.com rhel11gr2rac2-priv2


3) Configure the network and assign the above IPs to the new interface.


Node 1 -

[root@rhel11gr2rac1 ~]# cd /etc/sysconfig/network-scripts/
[root@rhel11gr2rac1 network-scripts]# ifdown eth2

-- Open the eth2 config file and update the necessary details (you can refer to the eth1 config).

[root@rhel11gr2rac1 network-scripts]# vi ifcfg-eth2
# Intel Corporation 82545EM Gigabit Ethernet Controller (Copper)
DEVICE=eth2
HWADDR=00:0c:29:89:94:4d
ONBOOT=yes
HOTPLUG=no
BOOTPROTO=none
NETMASK=255.255.255.0
IPADDR=10.10.10.30
GATEWAY=10.10.10.0
TYPE=Ethernet
USERCTL=no
IPV6INIT=no
PEERDNS=yes

[root@rhel11gr2rac1 network-scripts]# ifup eth2
[root@rhel11gr2rac1 network-scripts]# ifconfig eth2

eth2      Link encap:Ethernet  HWaddr 00:0C:29:89:94:4D
          inet addr:10.10.10.30  Bcast:10.10.10.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe89:944d/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:375 errors:0 dropped:0 overruns:0 frame:0
          TX packets:215 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:77632 (75.8 KiB)  TX bytes:35114 (34.2 KiB)


Node 2 -

[root@rhel11gr2rac2 ~]# cd /etc/sysconfig/network-scripts/
[root@rhel11gr2rac2 network-scripts]# ifdown eth2
[root@rhel11gr2rac2 network-scripts]# vi ifcfg-eth2

# Intel Corporation 82545EM Gigabit Ethernet Controller (Copper)
DEVICE=eth2
HWADDR=00:0c:29:75:b5:10
ONBOOT=yes
HOTPLUG=no
BOOTPROTO=none
NETMASK=255.255.255.0
IPADDR=10.10.10.31
GATEWAY=10.10.10.0
TYPE=Ethernet
USERCTL=no
IPV6INIT=no
PEERDNS=yes

[root@rhel11gr2rac2 network-scripts]# ifup eth2
[root@rhel11gr2rac2 network-scripts]# ifconfig eth2
eth2      Link encap:Ethernet  HWaddr 00:0C:29:75:B5:10
          inet addr:10.10.10.31  Bcast:10.10.10.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe75:b510/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:489 errors:0 dropped:0 overruns:0 frame:0
          TX packets:186 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:103027 (100.6 KiB)  TX bytes:27884 (27.2 KiB)


4) Follow the steps below to add the new interface to the private network.


a) As of 11.2 Grid Infrastructure, the private network configuration is stored not only in the OCR but also
in the gpnp profile. If the private network is not available or its definition is incorrect, the
CRSD process will not start and any subsequent changes to the OCR will be impossible. Therefore care needs to be taken when modifying the private network configuration, and it is important to perform the changes in the correct order. Please also note that manual modification of the gpnp profile is not supported.

b) Take a backup of the profile.xml file on all the nodes.

Node 1

[oracle@rhel11gr2rac1 ~]$ cd /grid/11.2/gpnp/rhel11gr2rac1/profiles/peer/
[oracle@rhel11gr2rac1 peer]$ cp profile.xml profile.xml_bkp_4thoct
[oracle@rhel11gr2rac1 peer]$ ls -lrt
total 20
-rw-r--r-- 1 oracle oinstall 1873 Mar 23  2013 profile_orig.xml
-rw-r--r-- 1 oracle oinstall 1880 Mar 23  2013 profile.old
-rw-r--r-- 1 oracle oinstall 1886 Mar 23  2013 profile.xml
-rw-r--r-- 1 oracle oinstall 1886 Oct  3 18:35 pending.xml
-rw-r--r-- 1 oracle oinstall 1886 Oct  3 19:48 profile.xml_bkp_4thoct


Node 2

[oracle@rhel11gr2rac2 peer]$ cd /grid/11.2/gpnp/rhel11gr2rac2/profiles/peer
[oracle@rhel11gr2rac2 peer]$ cp profile.xml profile.xml_bkp_4thoct
[oracle@rhel11gr2rac2 peer]$ ls -lrt
total 20
-rw-r--r-- 1 oracle oinstall 1873 Mar 23  2013 profile_orig.xml
-rw-r--r-- 1 oracle oinstall 1880 Mar 23  2013 profile.old
-rw-r--r-- 1 oracle oinstall 1886 Mar 23  2013 profile.xml
-rw-r--r-- 1 oracle oinstall 1886 Oct  3 18:35 pending.xml
-rw-r--r-- 1 oracle oinstall 1886 Oct  3 19:45 profile.xml_bkp_4thoct


c) Ensure Oracle Clusterware is up and running on all the nodes.

Node 1

[oracle@rhel11gr2rac1 bin]$ ./crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online


Node 2


[oracle@rhel11gr2rac2 bin]$ ./crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

d) We need to use the oifcfg tool for configuring the network.

[oracle@rhel11gr2rac1 bin]$ ./oifcfg -h

Name:
        oifcfg - Oracle Interface Configuration Tool.

Usage:  oifcfg iflist [-p [-n]]
        oifcfg setif {-node <nodename> | -global} {<if_name>/<subnet>:<if_type>}...
        oifcfg getif [-node <nodename> | -global] [ -if <if_name>[/<subnet>] [-type <if_type>] ]
        oifcfg delif {{-node <nodename> | -global} [<if_name>[/<subnet>]] [-force] | -force}
        oifcfg [-help]

        <nodename>  - name of the host, as known to a communications network
        <if_name>   - name by which the interface is configured in the system
        <subnet>    - subnet address of the interface
        <if_type>   - type of the interface { cluster_interconnect | public }



e) Get the current configuration details.


[oracle@rhel11gr2rac1 bin]$ ./oifcfg getif
eth0  192.168.0.0  global  public
eth1  10.10.10.0   global  cluster_interconnect


f) Add the new cluster interconnect information.


$ oifcfg setif -global <interface>/<subnet>:cluster_interconnect

interface -- eth2
subnet    -- We are going to add the new interface on the same subnet as the previous interconnect (10.10.10.0).

-- We can use the below command to find the subnet of an interface.

[oracle@rhel11gr2rac1 bin]$ ./oifcfg iflist
eth0  192.168.0.0
eth1  10.10.10.0
eth1  169.254.0.0
eth2  10.10.10.0


-- Our new network interface is eth2, and hence the subnet is 10.10.10.0. (The 169.254.0.0 entry on eth1 is the link-local HAIP address managed by the clusterware.)
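The subnet oifcfg expects is the network address, i.e. the interface's IP ANDed with its netmask. A pure-shell sketch of that computation (the subnet_of helper is illustrative only):

```shell
# subnet_of IP NETMASK: print the network address (bitwise AND per octet)
subnet_of() {
  local a b c d m1 m2 m3 m4
  IFS=. read -r a b c d <<< "$1"
  IFS=. read -r m1 m2 m3 m4 <<< "$2"
  echo "$((a & m1)).$((b & m2)).$((c & m3)).$((d & m4))"
}

subnet_of 10.10.10.30 255.255.255.0   # -> 10.10.10.0, the subnet for eth2
```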

-- Note

i) This can be done with the -global option even if the interface is not available yet, but it cannot be done
with the -node option if the interface is not available; doing so will lead to node eviction.

ii) If you are adding a second private network (not replacing the existing one), ensure the MTU size of both interfaces is the same; otherwise instance startup will report the below error:


ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:if MTU failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpcini2
ORA-27303: additional information: requested interface lan1:801 has a different MTU (1500) than lan3:801 (9000), which is not supported. Check output from ifconfig command


Check the MTU of the private interface.

-- Node 1


[root@rhel11gr2rac1 ~]# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:0C:29:89:94:43
          inet addr:10.10.10.20  Bcast:10.10.10.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe89:9443/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:220044 errors:0 dropped:0 overruns:0 frame:0
          TX packets:186665 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:147597750 (140.7 MiB)  TX bytes:108973996 (103.9 MiB)


[root@rhel11gr2rac1 ~]# ifconfig eth2
eth2      Link encap:Ethernet  HWaddr 00:0C:29:89:94:4D
          inet addr:10.10.10.30  Bcast:10.10.10.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe89:944d/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:410 errors:0 dropped:0 overruns:0 frame:0
          TX packets:215 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:84329 (82.3 KiB)  TX bytes:35114 (34.2 KiB)




[root@rhel11gr2rac2 ~]# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:0C:29:75:B5:06
          inet addr:10.10.10.21  Bcast:10.10.10.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe75:b506/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:187806 errors:0 dropped:0 overruns:0 frame:0
          TX packets:220819 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:109461264 (104.3 MiB)  TX bytes:148275753 (141.4 MiB)



[root@rhel11gr2rac2 ~]# ifconfig eth2
eth2      Link encap:Ethernet  HWaddr 00:0C:29:75:B5:10
          inet addr:10.10.10.31  Bcast:10.10.10.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe75:b510/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:498 errors:0 dropped:0 overruns:0 frame:0
          TX packets:186 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:103963 (101.5 KiB)  TX bytes:27884 (27.2 KiB)

-- All the above interfaces have the same MTU (1500).
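This MTU comparison can also be scripted. A minimal sketch, assuming ifconfig-style output; the sample strings below stand in for real `ifconfig eth1` / `ifconfig eth2` output:

```shell
#!/bin/sh
# Sketch: extract the MTU from ifconfig-style output and compare interfaces.
# In real use, capture `ifconfig eth1` / `ifconfig eth2` instead of samples.
get_mtu() {
    # Print the number that follows "MTU:" in the given text.
    printf '%s\n' "$1" | sed -n 's/.*MTU:\([0-9][0-9]*\).*/\1/p'
}

eth1_out='eth1  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1'
eth2_out='eth2  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1'

mtu1=$(get_mtu "$eth1_out")
mtu2=$(get_mtu "$eth2_out")
if [ "$mtu1" = "$mtu2" ]; then
    echo "MTU match: $mtu1"
else
    echo "MTU mismatch: $mtu1 vs $mtu2"
fi
```

A mismatch here is exactly the lan1/lan3 situation reported by ORA-27303 above.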

Now add the interface as below.


[oracle@rhel11gr2rac1 bin]$ ./oifcfg setif -global eth2/10.10.10.0:cluster_interconnect

Verify the changes made.

[oracle@rhel11gr2rac1 bin]$ ./oifcfg getif
eth0  192.168.0.0  global  public
eth1  10.10.10.0  global  cluster_interconnect
eth2  10.10.10.0  global  cluster_interconnect

[oracle@rhel11gr2rac2 bin]$ ./oifcfg getif
eth0  192.168.0.0  global  public
eth1  10.10.10.0  global  cluster_interconnect
eth2  10.10.10.0  global  cluster_interconnect


g) Shut down the Clusterware on all nodes.


[root@rhel11gr2rac1 bin]# ./crsctl stop crs
[root@rhel11gr2rac2 bin]# ./crsctl stop crs


h) If you configured oifcfg before the network card was available, make the changes at the
OS level now and verify that the network is up before bringing up CRS.


Ping test

Node 1

[root@rhel11gr2rac1 bin]# ping rhel11gr2rac1-priv2
PING rhel11gr2rac1-priv2.manzoor.com (10.10.10.30) 56(84) bytes of data.
64 bytes from rhel11gr2rac1-priv2.manzoor.com (10.10.10.30): icmp_seq=1 ttl=64 time=0.042 ms
64 bytes from rhel11gr2rac1-priv2.manzoor.com (10.10.10.30): icmp_seq=2 ttl=64 time=0.038 ms
64 bytes from rhel11gr2rac1-priv2.manzoor.com (10.10.10.30): icmp_seq=3 ttl=64 time=0.040 ms

--- rhel11gr2rac1-priv2.manzoor.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.038/0.040/0.042/0.001 ms

[root@rhel11gr2rac1 bin]# ping rhel11gr2rac2-priv2
PING rhel11gr2rac2-priv2.manzoor.com (10.10.10.31) 56(84) bytes of data.
64 bytes from rhel11gr2rac2-priv2.manzoor.com (10.10.10.31): icmp_seq=1 ttl=64 time=1.77 ms
64 bytes from rhel11gr2rac2-priv2.manzoor.com (10.10.10.31): icmp_seq=2 ttl=64 time=0.333 ms
64 bytes from rhel11gr2rac2-priv2.manzoor.com (10.10.10.31): icmp_seq=3 ttl=64 time=0.292 ms
64 bytes from rhel11gr2rac2-priv2.manzoor.com (10.10.10.31): icmp_seq=4 ttl=64 time=0.300 ms
64 bytes from rhel11gr2rac2-priv2.manzoor.com (10.10.10.31): icmp_seq=5 ttl=64 time=0.299 ms
64 bytes from rhel11gr2rac2-priv2.manzoor.com (10.10.10.31): icmp_seq=6 ttl=64 time=0.463 ms

--- rhel11gr2rac2-priv2.manzoor.com ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 4999ms
rtt min/avg/max/mdev = 0.292/0.576/1.772/0.538 ms


Node 2


[root@rhel11gr2rac2 bin]# ping rhel11gr2rac2-priv2
PING rhel11gr2rac2-priv2.manzoor.com (10.10.10.31) 56(84) bytes of data.
64 bytes from rhel11gr2rac2-priv2.manzoor.com (10.10.10.31): icmp_seq=1 ttl=64 time=0.048 ms
64 bytes from rhel11gr2rac2-priv2.manzoor.com (10.10.10.31): icmp_seq=2 ttl=64 time=0.050 ms
64 bytes from rhel11gr2rac2-priv2.manzoor.com (10.10.10.31): icmp_seq=3 ttl=64 time=0.045 ms

--- rhel11gr2rac2-priv2.manzoor.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.045/0.047/0.050/0.008 ms
[root@rhel11gr2rac2 bin]# ping rhel11gr2rac1-priv2
PING rhel11gr2rac1-priv2.manzoor.com (10.10.10.30) 56(84) bytes of data.
64 bytes from rhel11gr2rac1-priv2.manzoor.com (10.10.10.30): icmp_seq=1 ttl=64 time=2.20 ms
64 bytes from rhel11gr2rac1-priv2.manzoor.com (10.10.10.30): icmp_seq=2 ttl=64 time=0.401 ms
64 bytes from rhel11gr2rac1-priv2.manzoor.com (10.10.10.30): icmp_seq=3 ttl=64 time=0.321 ms

--- rhel11gr2rac1-priv2.manzoor.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.321/0.976/2.207/0.871 ms


i) Start the CRS.


[root@rhel11gr2rac1 bin]# ./crsctl start crs
CRS-4123: Oracle High Availability Services has been started.

[root@rhel11gr2rac2 bin]# ./crsctl start crs
CRS-4123: Oracle High Availability Services has been started.


j) Now verify the status

Node 1

[oracle@rhel11gr2rac1 bin]$ ./oifcfg getif
eth0  192.168.0.0  global  public
eth1  10.10.10.0  global  cluster_interconnect
eth2  10.10.10.0  global  cluster_interconnect

[oracle@rhel11gr2rac1 bin]$ ./olsnodes -l -p
rhel11gr2rac1   10.10.10.20,10.10.10.30


-- Both private networks are listed.


[oracle@rhel11gr2rac2 bin]$ ./oifcfg getif
eth0  192.168.0.0  global  public
eth1  10.10.10.0  global  cluster_interconnect
eth2  10.10.10.0  global  cluster_interconnect

[oracle@rhel11gr2rac2 bin]$ ./olsnodes -l -p
rhel11gr2rac2   10.10.10.21,10.10.10.31




========================================================================
For 11.2.0.2+: (HAIP address will show in alert log instead of private IP)
eg.

Cluster communication is configured to use the following interface(s) for this instance
  169.254.86.97
=======================================================================================

From alert log
==============

Private Interface 'eth1:1' configured from GPnP for use as a private interconnect.
  [name='eth1:1', type=1, ip=169.254.48.13, mac=00-0c-29-75-b5-06, net=169.254.0.0/17, mask=255.255.128.0, use=haip:cluster_interconnect/62]
Private Interface 'eth2:1' configured from GPnP for use as a private interconnect.
  [name='eth2:1', type=1, ip=169.254.227.73, mac=00-0c-29-75-b5-10, net=169.254.128.0/17, mask=255.255.128.0, use=haip:cluster_interconnect/62]

.....
Cluster communication is configured to use the following interface(s) for this instance
  169.254.48.13
  169.254.227.73



Note: interconnect communication will use both virtual private IPs; in case of a network failure, as long as one private network adapter is functioning, both IPs will remain active.


From Database


SQL> select * from GV$configured_interconnects where is_public = 'NO';

   INST_ID NAME            IP_ADDRESS       IS_ SOURCE
---------- --------------- ---------------- --- -------------------------------
         2 eth1:1          169.254.48.13    NO
         2 eth2:1          169.254.227.73   NO
         1 eth1:1          169.254.62.58    NO
         1 eth2:1          169.254.250.70   NO


Here each private interface has a virtual IP: on node 1, eth1 has the VIP 169.254.62.58 and eth2 has 169.254.250.70; likewise on node 2, eth1 has 169.254.48.13 and eth2 has 169.254.227.73.

These VIPs provide failover: if one network interface goes down, its VIP fails over to the other
available interface.

Eg.

If interface eth1 on node 1 fails, the VIP 169.254.62.58 fails over to eth2. Thus, as long as one private network adapter is functioning, both IPs remain active.
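The failover can be spotted in ifconfig output by checking which interface alias hosts a given HAIP address. A minimal parsing sketch; the exact output format is an assumption, and the sample text mirrors the ifconfig output shown in this post:

```shell
#!/bin/sh
# Sketch: given ifconfig-style text, print which interface alias currently
# hosts a given HAIP address. In real use, feed it `ifconfig` output.
find_haip_if() {
    # $1 = ifconfig text, $2 = HAIP address to look for
    printf '%s\n' "$1" | awk -v a="$2" '
        NF && $0 !~ /^[ \t]/ { ifc = $1 }              # remember current alias
        index($0, "inet addr:" a " ") { print ifc }    # address found under it
    '
}

sample='eth2:1    Link encap:Ethernet  HWaddr 00:0C:29:89:94:4D
          inet addr:169.254.62.58  Bcast:169.254.127.255  Mask:255.255.128.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1'

find_haip_if "$sample" 169.254.62.58
# prints: eth2:1
```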



Testing..

   INST_ID NAME            IP_ADDRESS       IS_ SOURCE
---------- --------------- ---------------- --- -------------------------------
         2 eth1:1          169.254.48.13    NO
         2 eth2:1          169.254.227.73   NO
         1 eth1:1          169.254.62.58    NO
         1 eth2:1          169.254.250.70   NO


Let's bring down interface eth1 on node 1.

[root@rhel11gr2rac1 ~]# ifdown eth1


Snippet from the node 1 DB alert log

Thu Oct 03 23:38:45 2013
SKGXP: ospid 16542: network interface query failed for IP address 169.254.62.58.
SKGXP: [error 11132]


ifconfig

--output

eth1      Link encap:Ethernet  HWaddr 00:0C:29:89:94:43
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:375106 errors:0 dropped:0 overruns:0 frame:0
          TX packets:310254 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:251856284 (240.1 MiB)  TX bytes:186387264 (177.7 MiB)

eth2      Link encap:Ethernet  HWaddr 00:0C:29:89:94:4D
          inet addr:10.10.10.30  Bcast:10.10.10.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe89:944d/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:157698 errors:0 dropped:0 overruns:0 frame:0
          TX packets:139343 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:113436206 (108.1 MiB)  TX bytes:74727012 (71.2 MiB)

eth2:1    Link encap:Ethernet  HWaddr 00:0C:29:89:94:4D
          inet addr:169.254.62.58  Bcast:169.254.127.255  Mask:255.255.128.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth2:2    Link encap:Ethernet  HWaddr 00:0C:29:89:94:4D
          inet addr:169.254.250.70  Bcast:169.254.255.255  Mask:255.255.128.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1



--- Since eth1 is down, the VIP 169.254.62.58 has failed over to the eth2 interface, appearing as eth2:1.




[root@rhel11gr2rac2 ~]# ifdown eth1


ifconfig output

eth1      Link encap:Ethernet  HWaddr 00:0C:29:75:B5:06
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:326893 errors:0 dropped:0 overruns:0 frame:0
          TX packets:377493 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:192462174 (183.5 MiB)  TX bytes:259305297 (247.2 MiB)

eth2      Link encap:Ethernet  HWaddr 00:0C:29:75:B5:10
          inet addr:10.10.10.31  Bcast:10.10.10.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe75:b510/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:132279 errors:0 dropped:0 overruns:0 frame:0
          TX packets:165414 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:72715227 (69.3 MiB)  TX bytes:114056247 (108.7 MiB)

eth2:1    Link encap:Ethernet  HWaddr 00:0C:29:75:B5:10
          inet addr:169.254.48.13  Bcast:169.254.127.255  Mask:255.255.128.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth2:2    Link encap:Ethernet  HWaddr 00:0C:29:75:B5:10
          inet addr:169.254.227.73  Bcast:169.254.255.255  Mask:255.255.128.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1


-- Since eth1 is down, the VIP 169.254.48.13 has failed over to eth2, appearing as eth2:1.


-- Even though one interface is down on each node, both VIPs are still present on each node, served by the remaining network interface.

The oifcfg output as below.

[root@rhel11gr2rac2 bin]# ./oifcfg iflist -n -p
eth0  192.168.0.0  PRIVATE  255.255.255.0
eth2  10.10.10.0  PRIVATE  255.255.255.0
eth2  169.254.0.0  UNKNOWN  255.255.128.0
eth2  169.254.128.0  UNKNOWN  255.255.128.0


-- Now let's bring up eth1 on node 2.




[root@rhel11gr2rac2 bin]# ifup eth1


ifconfig output in node 2


eth1      Link encap:Ethernet  HWaddr 00:0C:29:75:B5:06
          inet addr:10.10.10.21  Bcast:10.10.10.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe75:b506/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:327807 errors:0 dropped:0 overruns:0 frame:0
          TX packets:378590 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:192898043 (183.9 MiB)  TX bytes:260037599 (247.9 MiB)

eth1:1    Link encap:Ethernet  HWaddr 00:0C:29:75:B5:06
          inet addr:169.254.48.13  Bcast:169.254.127.255  Mask:255.255.128.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth1:2    Link encap:Ethernet  HWaddr 00:0C:29:75:B5:06
          inet addr:169.254.227.73  Bcast:169.254.255.255  Mask:255.255.128.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth2      Link encap:Ethernet  HWaddr 00:0C:29:75:B5:10
          inet addr:10.10.10.31  Bcast:10.10.10.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe75:b510/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:138848 errors:0 dropped:0 overruns:0 frame:0
          TX packets:173925 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:75828753 (72.3 MiB)  TX bytes:120251886 (114.6 MiB)


-- Now both VIPs are served by eth1 even though eth2 is up and running; this is because one interface is still down on node 1.


[root@rhel11gr2rac1 ~]# ifup eth1

ifconfig output in node 1

eth1      Link encap:Ethernet  HWaddr 00:0C:29:89:94:43
          inet addr:10.10.10.20  Bcast:10.10.10.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe89:9443/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:375296 errors:0 dropped:0 overruns:0 frame:0
          TX packets:310382 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:251983004 (240.3 MiB)  TX bytes:186445931 (177.8 MiB)

eth1:1    Link encap:Ethernet  HWaddr 00:0C:29:89:94:43
          inet addr:169.254.62.58  Bcast:169.254.127.255  Mask:255.255.128.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth2      Link encap:Ethernet  HWaddr 00:0C:29:89:94:4D
          inet addr:10.10.10.30  Bcast:10.10.10.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe89:944d/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:186819 errors:0 dropped:0 overruns:0 frame:0
          TX packets:161939 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:134733101 (128.4 MiB)  TX bytes:85910612 (81.9 MiB)

eth2:2    Link encap:Ethernet  HWaddr 00:0C:29:89:94:4D
          inet addr:169.254.250.70  Bcast:169.254.255.255  Mask:255.255.128.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1




ifconfig output in node 2 (after the eth1 is up on both the nodes)

eth1      Link encap:Ethernet  HWaddr 00:0C:29:75:B5:06
          inet addr:10.10.10.21  Bcast:10.10.10.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe75:b506/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:333637 errors:0 dropped:0 overruns:0 frame:0
          TX packets:386233 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:196228732 (187.1 MiB)  TX bytes:265869284 (253.5 MiB)

eth1:1    Link encap:Ethernet  HWaddr 00:0C:29:75:B5:06
          inet addr:169.254.48.13  Bcast:169.254.127.255  Mask:255.255.128.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth2      Link encap:Ethernet  HWaddr 00:0C:29:75:B5:10
          inet addr:10.10.10.31  Bcast:10.10.10.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe75:b510/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:140802 errors:0 dropped:0 overruns:0 frame:0
          TX packets:175889 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:76612446 (73.0 MiB)  TX bytes:121438787 (115.8 MiB)

eth2:1    Link encap:Ethernet  HWaddr 00:0C:29:75:B5:10
          inet addr:169.254.227.73  Bcast:169.254.255.255  Mask:255.255.128.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1



-- As long as one interface is healthy there won't be any impact to the ASM/DB instances.

Now we brought down eth2 on node 1 and eth1 on node 2. Below is the oifcfg output.

node 1

[root@rhel11gr2rac1 bin]# ./oifcfg iflist -n -p
eth0  192.168.0.0  PRIVATE  255.255.255.0
eth1  10.10.10.0  PRIVATE  255.255.255.0
eth1  169.254.0.0  UNKNOWN  255.255.128.0
eth1  169.254.128.0  UNKNOWN  255.255.128.0


Node 2

[root@rhel11gr2rac2 bin]# ./oifcfg iflist -n -p
eth0  192.168.0.0  PRIVATE  255.255.255.0
eth2  10.10.10.0  PRIVATE  255.255.255.0
eth2  169.254.128.0  UNKNOWN  255.255.128.0
eth2  169.254.0.0  UNKNOWN  255.255.128.0


Below is the oifcfg output when both interfaces are up on both nodes.

Node 1


[root@rhel11gr2rac1 bin]# ./oifcfg iflist -n -p
eth0  192.168.0.0  PRIVATE  255.255.255.0
eth1  10.10.10.0  PRIVATE  255.255.255.0
eth1  169.254.0.0  UNKNOWN  255.255.128.0
eth2  10.10.10.0  PRIVATE  255.255.255.0
eth2  169.254.128.0  UNKNOWN  255.255.128.0


Node 2


[root@rhel11gr2rac2 bin]# ./oifcfg iflist -n -p
eth0  192.168.0.0  PRIVATE  255.255.255.0
eth1  10.10.10.0  PRIVATE  255.255.255.0
eth1  169.254.0.0  UNKNOWN  255.255.128.0
eth2  10.10.10.0  PRIVATE  255.255.255.0
eth2  169.254.128.0  UNKNOWN  255.255.128.0





Reference:-
11gR2 Grid Infrastructure Redundant Interconnect and ora.cluster_interconnect.haip (Doc ID 1210883.1)
How to Modify Private Network Information in Oracle Clusterware (Doc ID 283684.1)

Wednesday, October 2, 2013

Oracle Golden Gate High Availability using Oracle Clusterware (11.2)

OGG High Availability using Oracle Clusterware
==============================================
1) Oracle GoldenGate cluster high availability prerequisites.

a. Oracle GoldenGate runs on only one server at any time.
b. In the event of a failure on one node, Oracle GoldenGate can be started on another node.
c. In order to resume processing on another node, the recovery-related files
(checkpoint files and trail files) must be stored on a shared location.
d. Oracle ACFS is the recommended cluster file system for Oracle GoldenGate binaries and
trail files in Real Application Clusters configurations, for ease of management and high availability.


Note: ACFS can be used for Oracle Golden Gate trail files with no restrictions. Oracle GoldenGate installation can be done on ACFS and you can also store the recovery-related files in a cluster configuration in ACFS to make them accessible to all nodes. However if your Oracle Grid Infrastructure version is older than 11.2.0.3 then ACFS mounted on multiple servers concurrently does not currently support file locking, thus you would need to mount ACFS on only one server.

If ACFS is mounted on one server at a time then file locking is supported in pre 11.2.0.3 Grid Infrastructure releases. This file locking issue has been resolved in Oracle Grid Infrastructure release 12c and the fix has been back ported up to version 11.2.0.3.


2) Oracle clusterware

a) Oracle clusterware provides the capability to manage the third-party applications.
b) There are commands to register an application and instruct Oracle Clusterware how to manage the application in a clustered environment.
c) This capability will be used to register the Oracle GoldenGate manager process as an application managed through Oracle Clusterware.
d) Oracle Clusterware can be installed standalone without an Oracle RAC database and still manage a cluster of  servers and various applications running on these servers. As such Oracle Clusterware can also be installed on more than just the database servers to form a single cluster.

3) Oracle Golden Gate Installations.

a) You may choose to perform a local installation on every server, or a single installation on a shared file system. You will need shared storage for the recovery-related files. On a Unix/Linux platform you can use a symbolic link to a central location for the shared directories.

4) Virtual IP.

a) Oracle Clusterware uses the concept of a Virtual IP address (VIP) to manage high availability for applications that require incoming network traffic (including the Oracle RAC database).
b) A VIP is an IP address on the public subnet that can be used to access a server. If the server hosting the VIP were to go down, then Oracle Clusterware will migrate the VIP to a surviving server to minimize interruptions for the application  accessing the server (through the VIP).
c) This concept enables faster failovers compared to time-out based failovers on a server's actual IP address in case of a server failure.
d) For Oracle GoldenGate, you should use a VIP to access the manager process to isolate access to the manager process from the physical server that is running Oracle GoldenGate. Remote pumps must use the VIP to contact the Oracle GoldenGate manager. The VIP must be an available IP address on the public subnet and cannot be determined through DHCP.
Ask a system administrator for an available fixed IP address for Oracle GoldenGate managed through Oracle Clusterware.

5. We need to instruct Oracle Clusterware how to start, stop, and check the process.

   i) Start
a) Oracle GoldenGate manager is the process that starts all other Oracle GoldenGate processes. The only process that Oracle Clusterware should start is the manager process. Use the AUTOSTART parameter in the manager parameter file to start extract and replicat processes. You can use wild cards (AUTOSTART ER *) to start all extract and replicat processes.
b) Also note that once manager is started through Oracle Clusterware, it is Oracle Clusterware that manages its availability. If you would stop manager through the command interface ggsci, then Oracle Clusterware will attempt to restart it.  Use the Oracle Clusterware commands to stop Oracle GoldenGate and prevent Oracle Clusterware from attempting to restart it.

   ii) check
a) The validation whether Oracle GoldenGate is running is equivalent to making sure the Oracle GoldenGate manager runs.

   iii) Stop
a) Stop must stop all Oracle GoldenGate processes, including manager. Stop may be called during a planned downtime (e.g. a server is taken out of a cluster for maintenance reasons) and/or if you manually instruct Oracle Clusterware to relocate Oracle GoldenGate to a different server (e.g. to change the load on a server). If a server crashes then all processes will go down with it, in which case they can be started on another server.
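The start step above relies on manager's AUTOSTART behaviour, which is driven by the manager parameter file. A minimal mgr.prm sketch; the port number and retry values are assumptions, not taken from this setup:

```
-- mgr.prm (sketch; PORT value is an assumption)
PORT 7809
-- start all extract/replicat groups when manager starts
AUTOSTART ER *
-- restart abended groups: up to 3 retries, 5 minutes apart (assumed values)
AUTORESTART ER *, RETRIES 3, WAITMINUTES 5
```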


Setup
=====


1. As of now the below setup is running.

a. Source is a two-node RAC where GoldenGate is configured using ACFS.
b. One extract and one pump process are configured on the source.
c. Target is a standalone DB.
d. One replicat process is configured on the target.



2. Now we need to register GoldenGate in Oracle Clusterware. We need to use Oracle Clusterware commands to create, register and set privileges on the VIP and the Oracle GoldenGate application. Once registered, use the Oracle Clusterware commands to start, relocate and stop Oracle GoldenGate.


3. Add an application VIP.

a) The first step is to create an application VIP, which will be used to access Oracle GoldenGate.
   Oracle Clusterware will assign the VIP to a physical server and migrate the VIP if that server goes down or if you instruct Clusterware to do so.


b. Add the below VIP entry to the /etc/hosts file on both nodes (the VIP should be on the same subnet as the public IP).

########## VIP FOR GOLDENGATE ################################

192.168.0.22    goldengate-vip.manzoor.com      goldengate-vip


c. Create an application VIP using the below command as the root user.


[root@rhel11gr2rac1 bin]# cd /grid/11.2/bin

[root@rhel11gr2rac1 bin]# ./appvipcfg -help
Production Copyright 2007, 2008, Oracle.All rights reserved
Unknown option: help

  Usage: appvipcfg create -network= -ip= -vipname=
                          -user=[-group=] [-failback=0 | 1]
                   delete -vipname=


To identify the network number, execute the below command.

[root@rhel11gr2rac1 bin]# ./crsctl stat res -p | grep  -ie.network -ie subnet | grep -ie name -ie subnet
NAME=ora.net1.network
USR_ORA_SUBNET=192.168.0.0

Here ora.net1 in NAME denotes network number 1, and USR_ORA_SUBNET denotes the subnet under which
the VIP will be created.
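The network number can also be pulled out of the resource name programmatically. A minimal sketch; the input string is a NAME line of the form printed by crsctl above:

```shell
#!/bin/sh
# Sketch: derive the appvipcfg -network number from the ora.netN.network
# resource name reported by crsctl.
name='NAME=ora.net1.network'
netnum=$(printf '%s\n' "$name" | sed -n 's/.*ora\.net\([0-9][0-9]*\)\.network.*/\1/p')
echo "network number: $netnum"
# prints: network number: 1
```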

Execute the below command to create the application vip.

./appvipcfg create -network=1 -ip=192.168.0.22 -vipname=goldengate-vip -user=root
Production Copyright 2007, 2008, Oracle.All rights reserved
2013-10-01 23:18:11: Creating Resource Type
2013-10-01 23:18:11: Executing /grid/11.2/bin/crsctl add type app.appvip_net1.type -basetype ora.cluster_vip_net1.type -file /grid/11.2/crs/template/appvip.type
2013-10-01 23:18:11: Executing cmd: /grid/11.2/bin/crsctl add type app.appvip_net1.type -basetype ora.cluster_vip_net1.type -file /grid/11.2/crs/template/appvip.type
2013-10-01 23:18:13: Create the Resource
2013-10-01 23:18:13: Executing /grid/11.2/bin/crsctl add resource goldengate-vip -type app.appvip_net1.type -attr "USR_ORA_VIP=192.168.0.22,START_DEPENDENCIES=hard(ora.net1.network) pullup(ora.net1.network),STOP_DEPENDENCIES=hard(ora.net1.network),ACL='owner:root:rwx,pgrp:root:r-x,other::r--,user:root:r-x',HOSTING_MEMBERS=rhel11gr2rac1.manzoor.com,APPSVIP_FAILBACK="
2013-10-01 23:18:13: Executing cmd: /grid/11.2/bin/crsctl add resource goldengate-vip -type app.appvip_net1.type -attr "USR_ORA_VIP=192.168.0.22,START_DEPENDENCIES=hard(ora.net1.network) pullup(ora.net1.network),STOP_DEPENDENCIES=hard(ora.net1.network),ACL='owner:root:rwx,pgrp:root:r-x,other::r--,user:root:r-x',HOSTING_MEMBERS=rhel11gr2rac1.manzoor.com,APPSVIP_FAILBACK="


d) Now allow the Oracle Clusterware owner (e.g. oracle or grid) to run the script to start the VIP.

Execute the below as root.

./crsctl setperm resource goldengate-vip -u user:oracle:r-x


e) As the oracle user, start the VIP.


[oracle@rhel11gr2rac1 bin]$ ./crsctl start resource goldengate-vip
CRS-2672: Attempting to start 'goldengate-vip' on 'rhel11gr2rac1'
CRS-2676: Start of 'goldengate-vip' on 'rhel11gr2rac1' succeeded

f) Check the status of the vip.

[oracle@rhel11gr2rac1 bin]$ ./crsctl stat res goldengate-vip
NAME=goldengate-vip
TYPE=app.appvip_net1.type
TARGET=ONLINE
STATE=ONLINE on rhel11gr2rac1


g) Now we can ping the VIP from the other nodes. Test it from node 2.

[root@rhel11gr2rac2 ~]# ping 192.168.0.22
PING 192.168.0.22 (192.168.0.22) 56(84) bytes of data.
64 bytes from 192.168.0.22: icmp_seq=1 ttl=64 time=2.29 ms
64 bytes from 192.168.0.22: icmp_seq=2 ttl=64 time=0.389 ms
64 bytes from 192.168.0.22: icmp_seq=3 ttl=64 time=0.372 ms

--- 192.168.0.22 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 0.372/1.018/2.295/0.903 ms



4) Now develop an agent script.

a) Oracle Clusterware runs resource-specific commands through an entity called an agent.
The agent script must be able to accept 5 parameter values: start, stop, check, clean and abort (optional).

b) Now we will create a script and place it in a shared location; here we have placed the script under the GoldenGate home, which is accessible from both nodes. (This is the sample script provided by Oracle; you can also have a customized script as per your requirements.)

Script name = gg_monitor_start.sh


#!/bin/sh
#goldengate_action.scr
. ~oracle/.bash_profile
[ -z "$1" ]&& echo "ERROR!! Usage $0 "&& exit 99
GGS_HOME=/golden_gate
#specify delay after start before checking for successful start
start_delay_secs=5
#Include the Oracle GoldenGate home in the library path to start GGSCI
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${GGS_HOME}
#set the oracle home to the database to ensure Oracle GoldenGate will get
#the right environment settings to be able to connect to the database
export ORACLE_HOME=/u01/app/oracle/product/11.2/db
export CRS_HOME=/grid/11.2
#Set NLS_LANG otherwise it will default to US7ASCII

export NLS_LANG=AMERICAN_AMERICA.AL32UTF8
logfile=/tmp/crs_gg_start.log

###########################
function log
###########################
{
DATETIME=`date +%d/%m/%y-%H:%M:%S`
echo $DATETIME "goldengate_action.scr>>" $1
echo $DATETIME "goldengate_action.scr>>" $1 >> $logfile
}
#check_process validates that a manager process is running at the PID
#that Oracle GoldenGate specifies.
check_process () {
dt=`date +%d/%m/%y-%H:%M:%S`
if ( [ -f "${GGS_HOME}/dirpcs/MGR.pcm" ] )
then
pid=`cut -f8 "${GGS_HOME}/dirpcs/MGR.pcm"`
if [ ${pid} = `ps -e |grep ${pid} |grep mgr |awk '{ print $1 }'` ]
then
#manager process is running on the PID . exit success
echo $dt "manager process is running on the PID . exit success">> /tmp/check.out
exit 0
else
#manager process is not running on the PID
echo $dt "manager process is not running on the PID" >> /tmp/check.out
exit 1
fi
else
#manager is not running because there is no PID file
echo $dt "manager is not running because there is no PID file" >> /tmp/check.out
exit 1
fi
}
#call_ggsci is a generic routine that executes a ggsci command
call_ggsci () {
log "entering call_ggsci"
ggsci_command=$1
#log "about to execute $ggsci_command"
log "id= $USER"
cd ${GGS_HOME}
ggsci_output=`${GGS_HOME}/ggsci << EOF
${ggsci_command}
exit
EOF`
log "got output of : $ggsci_output"
}

case $1 in
'start')
#Updated by Sourav B (02/10/2011)
# During failover, if the "mgr.pcm" file is not deleted at the node crash,
# Oracle Clusterware won't start the manager on the new node, assuming the
# manager process is still running on the failed node. To get around this issue
# we delete the "mgr.pcm" file before starting up the manager on the new
# node. We also delete the other process files with the pc* extension and, to
# avoid any file locking issue, we first back up the checkpoint files, then
# delete them from the dirchk directory. After that we restore the checkpoint
# files from backup to the original location (dirchk directory).
log "removing *.pc* files from dirpcs directory..."
cd $GGS_HOME/dirpcs
rm -f *.pc*
log "creating tmp directory to backup checkpoint file...."
cd $GGS_HOME/dirchk
mkdir tmp
log "backing up checkpoint files..."
cp *.cp* $GGS_HOME/dirchk/tmp
log "Deleting checkpoint files under dirchk......"
rm -f *.cp*
log "Restore checkpoint files from backup to dirchk directory...."
cp $GGS_HOME/dirchk/tmp/*.cp* $GGS_HOME/dirchk
log "Deleting tmp directory...."
rm -rf tmp
log "starting manager"
call_ggsci 'start manager'


#there is a small delay between issuing the start manager command
#and the process being spawned on the OS . wait before checking
log "sleeping for start_delay_secs"
sleep ${start_delay_secs}
#check whether manager is running and exit accordingly
check_process
;;
'stop')
#attempt a clean stop for all non-manager processes
call_ggsci 'stop er *'
#ensure everything is stopped
call_ggsci 'stop er *!'
#stop manager without (y/n) confirmation
call_ggsci 'stop manager!'
#exit success
exit 0
;;
'check')
check_process
exit 0
;;
'clean')
#attempt a clean stop for all non-manager processes
call_ggsci 'stop er *'
#ensure everything is stopped
call_ggsci 'stop er *!'
#in case there are lingering processes
call_ggsci 'kill er *'
#stop manager without (y/n) confirmation
call_ggsci 'stop manager!'
#exit success
exit 0
;;
'abort')
#ensure everything is stopped
call_ggsci 'stop er *!'
#in case there are lingering processes
call_ggsci 'kill er *'
#stop manager without (y/n) confirmation
call_ggsci 'stop manager!'
#exit success
exit 0
;;
esac

c) Now we need to add a Clusterware resource for the GoldenGate application. As the oracle user, execute the
below command.


[oracle@rhel11gr2rac1 bin]$ ./crsctl add resource ggateapp -type cluster_resource -attr "ACTION_SCRIPT=/golden_gate/gg_monitor_start.sh,CHECK_INTERVAL=30,START_DEPENDENCIES='hard(goldengate-vip) pullup(goldengate-vip)', STOP_DEPENDENCIES='hard(goldengate-vip)'"


where ggateapp is the name we have given to the GoldenGate resource.

START_DEPENDENCIES: there is a hard start dependency on goldengate-vip. This indicates that the VIP and the ggateapp application should  always start together.

STOP_DEPENDENCIES: there is a hard stop dependency on goldengate-vip. This indicates that the VIP and the ggateapp application should always stop together.

d) Now set the ownership of the Oracle GoldenGate application if its owner (e.g. ggowner) is different from the Oracle Clusterware owner. If the GoldenGate owner is the same, skip this step.

As root execute the below command.

./crsctl setperm resource ggateapp -o ggowner


e) Now start the resource as the oracle user.

[oracle@rhel11gr2rac1 bin]$ ./crsctl start res ggateapp
CRS-2672: Attempting to start 'ggateapp' on 'rhel11gr2rac1'
CRS-2676: Start of 'ggateapp' on 'rhel11gr2rac1' succeeded

[oracle@rhel11gr2rac1 bin]$ ./crsctl status res ggateapp
NAME=ggateapp
TYPE=cluster_resource
TARGET=ONLINE
STATE=ONLINE on rhel11gr2rac1


[oracle@rhel11gr2rac1 bin]$ ./crsctl stop res ggateapp
CRS-2673: Attempting to stop 'ggateapp' on 'rhel11gr2rac1'
CRS-2677: Stop of 'ggateapp' on 'rhel11gr2rac1' succeeded


-- Now let's check the status in ggsci.


[oracle@rhel11gr2rac1 golden_gate]$ ggsci

Oracle GoldenGate Command Interpreter for Oracle
Version 11.2.1.0.1 OGGCORE_11.2.1.0.1_PLATFORMS_120423.0230_FBO
Linux, x64, 64bit (optimized), Oracle 11g on Apr 23 2012 08:32:14

Copyright (C) 1995, 2012, Oracle and/or its affiliates. All rights reserved.



GGSCI (rhel11gr2rac1.manzoor.com) 1> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     STOPPED
EXTRACT     STOPPED     PTBLS       00:00:00      00:00:29
EXTRACT     STOPPED     XTBLS       00:00:02      00:00:27



--- It is showing STOPPED, since stopping the Clusterware resource also stopped the GoldenGate processes.


[oracle@rhel11gr2rac1 bin]$ ./crsctl start res ggateapp
CRS-2672: Attempting to start 'ggateapp' on 'rhel11gr2rac1'
CRS-2676: Start of 'ggateapp' on 'rhel11gr2rac1' succeeded

[oracle@rhel11gr2rac1 golden_gate]$ ggsci

Oracle GoldenGate Command Interpreter for Oracle
Version 11.2.1.0.1 OGGCORE_11.2.1.0.1_PLATFORMS_120423.0230_FBO
Linux, x64, 64bit (optimized), Oracle 11g on Apr 23 2012 08:32:14

Copyright (C) 1995, 2012, Oracle and/or its affiliates. All rights reserved.



GGSCI (rhel11gr2rac1.manzoor.com) 1> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
EXTRACT     RUNNING     PTBLS       00:00:00      00:00:10
EXTRACT     RUNNING     XTBLS       00:00:01      00:00:09


--- Now let's relocate the ggateapp resource to the other node (e.g. for a scheduled downtime).


[oracle@rhel11gr2rac1 bin]$ ./crsctl relocate resource ggateapp -f
CRS-2673: Attempting to stop 'ggateapp' on 'rhel11gr2rac1'
CRS-2677: Stop of 'ggateapp' on 'rhel11gr2rac1' succeeded
CRS-2673: Attempting to stop 'goldengate-vip' on 'rhel11gr2rac1'
CRS-2677: Stop of 'goldengate-vip' on 'rhel11gr2rac1' succeeded
CRS-2672: Attempting to start 'goldengate-vip' on 'rhel11gr2rac2'
CRS-2676: Start of 'goldengate-vip' on 'rhel11gr2rac2' succeeded
CRS-2672: Attempting to start 'ggateapp' on 'rhel11gr2rac2'
CRS-2676: Start of 'ggateapp' on 'rhel11gr2rac2' succeeded

-- Let's check the gg processes on node 2.

[oracle@rhel11gr2rac2 golden_gate]$ ggsci

Oracle GoldenGate Command Interpreter for Oracle
Version 11.2.1.0.1 OGGCORE_11.2.1.0.1_PLATFORMS_120423.0230_FBO
Linux, x64, 64bit (optimized), Oracle 11g on Apr 23 2012 08:32:14

Copyright (C) 1995, 2012, Oracle and/or its affiliates. All rights reserved.



GGSCI (rhel11gr2rac2.manzoor.com) 1> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
EXTRACT     RUNNING     PTBLS       00:00:00      00:00:04
EXTRACT     RUNNING     XTBLS       00:00:05      00:00:06



[oracle@rhel11gr2rac1 bin]$ ./crsctl status resource ggateapp
NAME=ggateapp
TYPE=cluster_resource
TARGET=ONLINE
STATE=ONLINE on rhel11gr2rac2


-- Now let's check how the failover works.

Let's crash node 2 (power off from the VMware console).

Below is the status after node 2 goes down:

Cluster Resources
--------------------------------------------------------------------------------
ggateapp
      1        ONLINE  OFFLINE
goldengate-vip
      1        ONLINE  OFFLINE                               STARTING



Cluster Resources
--------------------------------------------------------------------------------
ggateapp
      1        ONLINE  ONLINE       rhel11gr2rac1
goldengate-vip
      1        ONLINE  ONLINE       rhel11gr2rac1



-- We can see that the ggateapp resource and goldengate-vip have failed over from
node 2 to node 1.


Below is the output from ggsci.


GGSCI (rhel11gr2rac1.manzoor.com) 3> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
EXTRACT     ABENDED     PTBLS       00:00:00      00:04:10
EXTRACT     RUNNING     XTBLS       00:00:00      00:00:09



-- The manager and the extract process have started, but the pump extract has abended with the below
error.

2013-10-02 22:20:56  ERROR   OGG-01031  There is a problem in network communication, a remote file problem, encryption keys for target and source do not matc
h (if using ENCRYPT) or an unknown error. (Reply received is Unable to open file "./dirdat/XT000016" (error 11, Resource temporarily unavailable)).

2013-10-02 22:20:56  ERROR   OGG-01668  PROCESS ABENDING.



Source --


GGSCI (rhel11gr2rac1.manzoor.com) 17> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
EXTRACT     ABENDED     PTBLS       00:00:00      00:08:18
EXTRACT     RUNNING     XTBLS       00:00:00      00:00:00


GGSCI (rhel11gr2rac1.manzoor.com) 18> info PTBLS

EXTRACT    PTBLS     Last Started 2013-10-02 22:24   Status ABENDED
Checkpoint Lag       00:00:00 (updated 00:08:24 ago)
Log Read Checkpoint  File ./dirdat/XT000019
                     2013-10-02 22:16:24.415084  RBA 1111



Target --


GGSCI (standalone2.manzoor.com) 2> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
REPLICAT    RUNNING     RTBLS       00:00:00      00:00:05


GGSCI (standalone2.manzoor.com) 3> info rtbls

REPLICAT   RTBLS     Last Started 2013-10-02 20:38   Status RUNNING
Checkpoint Lag       00:00:00 (updated 00:00:07 ago)
Log Read Checkpoint  File ./dirdat/XT000016
                     2013-10-02 22:16:24.422522  RBA 1991


GGSCI (standalone2.manzoor.com) 4> send rtbls status

Sending STATUS request to REPLICAT RTBLS ...
  Current status: At EOF
  Sequence #: 16
  RBA: 1991
  0 records in current transaction


--- The replicat process shows it has processed all the data and is currently at end of file (EOF).



-- Now we will do an ET (extract trail) rollover on the source and then reposition the replicat on the target.


Source--

GGSCI (rhel11gr2rac1.manzoor.com) 21> alter extract ptbls etrollover

2013-10-02 22:34:04  INFO    OGG-01520  Rollover performed.  For each affected output trail of Version 10 or higher format, after
starting the source extract, issue ALTER EXTSEQNO for that trail's reader (either pump EXTRACT or REPLICAT) to move the reader's scan to the new trail file;  it will not happen automatically.
EXTRACT altered.


GGSCI (rhel11gr2rac1.manzoor.com) 23> start extract ptbls

Sending START request to MANAGER ...
EXTRACT PTBLS starting


GGSCI (rhel11gr2rac1.manzoor.com) 24> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
EXTRACT     RUNNING     PTBLS       00:00:00      00:00:58
EXTRACT     RUNNING     XTBLS       00:00:00      00:00:08

GGSCI (rhel11gr2rac1.manzoor.com) 31> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
EXTRACT     RUNNING     PTBLS       00:00:00      00:00:05
EXTRACT     RUNNING     XTBLS       00:00:00      00:00:00

GGSCI (rhel11gr2rac1.manzoor.com) 32> info ptbls

EXTRACT    PTBLS     Last Started 2013-10-02 22:35   Status RUNNING
Checkpoint Lag       00:00:00 (updated 00:00:02 ago)
Log Read Checkpoint  File ./dirdat/XT000020
                     2013-10-02 22:20:46.216461  RBA 1111



--- Target

On the target we need to reposition the replicat to start from the next trail sequence, since we
did an ET rollover on the source.

GGSCI (standalone2.manzoor.com) 3> stop replicat rtbls

Sending STOP request to REPLICAT RTBLS ...
Request processed.


GGSCI (standalone2.manzoor.com) 4> alter replicat rtbls extseqno 17 extrba 0
REPLICAT altered.



GGSCI (standalone2.manzoor.com) 5> start replicat rtbls
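The repositioning done above follows simple trail-sequence arithmetic: the replicat was last reading ./dirdat/XT000016, so after the source-side ETROLLOVER it must be pointed at the next sequence with RBA 0. A minimal sketch (the trail prefix XT and group name rtbls are the ones used in this setup):

```shell
# Derive the EXTSEQNO to use on the target after an ETROLLOVER on the
# source. The replicat was last reading trail file XT000016.
last_trail="XT000016"            # last trail file the replicat read
seq=${last_trail#XT}             # strip the two-letter prefix -> 000016
next_seq=$(expr "$seq" + 1)      # expr parses the zero-padded number as decimal -> 17

# The GGSCI command to issue on the target:
echo "alter replicat rtbls extseqno ${next_seq} extrba 0"
```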


-- Now let's insert some rows on the source.

SQL> select count(*) from emp;

  COUNT(*)
----------
      4000

SQL> begin
  2     for i in 4001..5000 loop
  3             insert into emp values (i, dbms_random.string('U',30),30);
  4     END LOOP;
  5     commit;
  6  end;
  7  /

PL/SQL procedure successfully completed.

SQL> select count(*) from emp;

  COUNT(*)
----------
      5000


-- Let's check whether it replicated to the target.




SQL> select count(*) from emp;

  COUNT(*)
----------
      5000




---- Reference: Oracle White Paper, "Oracle GoldenGate High Availability with Oracle Clusterware"

Note - 1527310.1