Monday, December 25, 2017

Protecting Oracle OHS/Apache using Oracle Clusterware 11g/12c

This month's article is about protecting the Apache application service using Oracle RAC Clusterware 12c. To do this, we need an application VIP, an action script and a resource on the Clusterware.
Here we walk through the step-by-step configuration to protect Apache. A similar configuration can be adapted to protect Oracle OHS or any other application service.
The steps below work for both 11g and 12c RAC Clusterware software.

Step1. As the root user, verify that the Apache RPMs (httpd, httpd-manual and httpd-tools in the output below) are installed on both nodes on which Oracle Clusterware is installed and configured.
grid@host01# su -
root@host01# rpm -qa | grep httpd
httpd-2.4.6-40.0.1.el7.x86_64
httpd-manual-2.4.6-40.0.1.el7.noarch
httpd-tools-2.4.6-40.0.1.el7.x86_64

Repeat on the second node:
root@host02# rpm -qa | grep httpd
httpd-2.4.6-40.0.1.el7.x86_64
httpd-manual-2.4.6-40.0.1.el7.noarch
httpd-tools-2.4.6-40.0.1.el7.x86_64
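The package check above can be scripted so that it is easy to repeat on each node. The following is a minimal sketch, not part of the original procedure: the function name missing_pkgs is hypothetical, and it assumes package names contain no regular-expression metacharacters.

```bash
#!/bin/bash
# Hypothetical helper: print every required package that is absent from
# the installed list.
# $1: newline-separated output of `rpm -qa`; remaining args: package names.
missing_pkgs() (
    installed=$1
    shift
    for pkg in "$@"; do
        # Require "name-<digit>" at the start of a line so that "httpd"
        # does not match "httpd-tools" or "httpd-manual".
        printf '%s\n' "$installed" | grep -Eq "^${pkg}-[0-9]" || echo "$pkg"
    done
)
```

On a node it could be called as: missing_pkgs "$(rpm -qa)" httpd httpd-manual httpd-tools. No output means that all listed packages are installed.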


Step2. As the root user, start the Apache application on the first node:
# apachectl start
Now access the Apache home page and verify that it is working:
http://host01.samiora.blogspot.com:7777

OHS is managed by OPMN; the command-line interface to OPMN is opmnctl.
Start OPMN and all managed processes, if not already started:
# ./opmnctl startall
# ./opmnctl status -l
# ./opmnctl stopproc process-type=OHS
[appldev@host01 scripts]$ sh adopmnctl.sh status -l
You are running adopmnctl.sh version 120.0.12020000.2
Checking status of OPMN managed processes...
Processes in Instance: EBS_web_dev_OHS1
---------------------------------+--------------------+---------+----------+------------+----------+-----------+------
ias-component                    | process-type       |     pid | status   |        uid |  memused |    uptime | ports
---------------------------------+--------------------+---------+----------+------------+----------+-----------+------
EBS_web_dev                   | OHS                |    8356 | Alive    |  778125435 |  2125916 |   2:03:55 | https:4447,https:10004,http:7777

adopmnctl.sh: exiting with status 0
adopmnctl.sh: check the logfile /u01/dev/fs1/inst/apps/dev_host01/logs/appl/admin/log/adopmnctl.txt for more information ...

Verify that the OHS and Apache pages load on host01, where the service was started.

Step3. Now create an action script to control the application. This script must be accessible by all nodes on which the application resource can be located.

A) As the root user, create a script on the first node called 'apache.scr' in /usr/local/bin that will start, stop, check status and clean up if the application does not exit cleanly. Make sure that the host specified in the WEBPAGECHECK variable is your first node.
root@host01# vi /usr/local/bin/apache.scr
#!/bin/bash
# Action script called by Oracle Clusterware with one argument:
# start, stop, check or clean.
HTTPDCONFLOCATION=/etc/httpd/conf/httpd.conf
WEBPAGECHECK=http://host01.samiora.blogspot.com:80/icons/apache_pb.gif
case "$1" in
  'start')
    /usr/sbin/apachectl -k start -f "$HTTPDCONFLOCATION"
    RET=$?
    ;;
  'stop')
    /usr/sbin/apachectl -k stop
    RET=$?
    ;;
  'clean')
    /usr/sbin/apachectl -k stop
    RET=$?
    ;;
  'check')
    # wget exits non-zero when the test page cannot be fetched
    /usr/bin/wget -q --delete-after "$WEBPAGECHECK"
    RET=$?
    ;;
  *)
    RET=0
    ;;
esac
# 0: success; 1: error
if [ "$RET" -eq 0 ]; then
  exit 0
else
  exit 1
fi

root@host01# chmod 755 /usr/local/bin/apache.scr
root@host01# apache.scr start
Verify that the web page loads.
root@host01# apache.scr stop
Verify that the web page no longer loads.

B) As root, create a script on the second node called 'apache.scr' in /usr/local/bin that will start, stop, check status and clean up if the application does not exit cleanly. Make sure that the host specified in the WEBPAGECHECK variable is your second node.
root@host02# vi /usr/local/bin/apache.scr
#!/bin/bash
# Action script called by Oracle Clusterware with one argument:
# start, stop, check or clean.
HTTPDCONFLOCATION=/etc/httpd/conf/httpd.conf
WEBPAGECHECK=http://host02.samiora.blogspot.com:80/icons/apache_pb.gif
case "$1" in
  'start')
    /usr/sbin/apachectl -k start -f "$HTTPDCONFLOCATION"
    RET=$?
    ;;
  'stop')
    /usr/sbin/apachectl -k stop
    RET=$?
    ;;
  'clean')
    /usr/sbin/apachectl -k stop
    RET=$?
    ;;
  'check')
    # wget exits non-zero when the test page cannot be fetched
    /usr/bin/wget -q --delete-after "$WEBPAGECHECK"
    RET=$?
    ;;
  *)
    RET=0
    ;;
esac
# 0: success; 1: error
if [ "$RET" -eq 0 ]; then
  exit 0
else
  exit 1
fi

root@host02# chmod 755 /usr/local/bin/apache.scr
root@host02# apache.scr start
Verify that the web page loads.
root@host02# apache.scr stop
Verify that the web page no longer loads.

Step4. Next, you must validate the return code of a check failure using the new script. The Apache server should NOT be running on either node. Run 'apache.scr check' and immediately test the return code by issuing an 'echo $?' command. This must be run immediately after the 'apache.scr check' command because the shell variable $? holds the exit code of the previous command run from the shell. An unsuccessful check should return an exit code of 1. You should do this on both nodes.
root@host01# apache.scr check
root@host01# echo $?
1
root@host02# apache.scr check
root@host02# echo $?
1
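The check returns exactly 1 because the action script collapses every non-zero status into 1; wget itself reports failures with other codes (for example, 4 for a network failure). The sketch below isolates that normalization so it can be tried without Apache. The function name normalize_status is hypothetical and is not part of the action script.

```bash
#!/bin/bash
# Hypothetical helper: run any command, discard its output, and reduce
# its exit status to 0 (success) or 1 (any failure), in the same way the
# final if/else block of apache.scr does.
normalize_status() {
    if "$@" >/dev/null 2>&1; then
        return 0
    else
        return 1
    fi
}
```

For example, running normalize_status sh -c 'exit 4' followed by echo $? prints 1, whatever the original non-zero code was.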

Step5. As the grid user, create a server pool called myApache_sp for the resource. This pool contains the first two hosts of your cluster and is a child of the Generic pool.
grid@host01# id
uid=502(grid) gid=54321(oinstall) groups=504(asmadmin),505(asmdba),506(asmoper),54321(oinstall)
grid@host01# . oraenv
ORACLE_SID = [grid] ? +ASM1
The Oracle base has been set to /u01/app/grid
grid@host01# /u01/app/12.2.0/grid/bin/crsctl add serverpool myApache_sp -attr "PARENT_POOLS=Generic,SERVER_NAMES=host01 host02"

Step6. Check the status of the servers and confirm that both now list the new pool under ACTIVE_POOLS.
grid@host01# /u01/app/12.2.0/grid/bin/crsctl status server -f
NAME=host01
STATE=ONLINE
ACTIVE_POOLS=myApache_sp Generic
STATE_DETAILS=

NAME=host02
STATE=ONLINE
ACTIVE_POOLS=myApache_sp Generic
STATE_DETAILS=....
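When the cluster has more than a couple of servers, reading this output by eye gets tedious. As a rough sketch, the output can be filtered to list only the servers that belong to a given pool. The function name servers_in_pool is hypothetical, and it assumes the NAME= and ACTIVE_POOLS= line format shown above.

```bash
#!/bin/bash
# Hypothetical helper: read `crsctl status server -f` output on stdin and
# print the NAME of every server whose ACTIVE_POOLS contains pool $1.
servers_in_pool() (
    awk -F= -v pool="$1" '
        $1 == "NAME" { name = $2 }
        $1 == "ACTIVE_POOLS" {
            # ACTIVE_POOLS is a space-separated list of pool names
            n = split($2, pools, " ")
            for (i = 1; i <= n; i++)
                if (pools[i] == pool) print name
        }'
)
```

It could be used as: /u01/app/12.2.0/grid/bin/crsctl status server -f | servers_in_pool myApache_sp. With the output above, both host01 and host02 should be listed.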

Step7. Add the Apache resource, called myApache here, to the myApache_sp subpool that has Generic as a parent. This must be performed as root because Apache listens on the default privileged port 80, so the resource requires root authority. Set CHECK_INTERVAL to 30, RESTART_ATTEMPTS to 2 and PLACEMENT to restricted.
grid@host01# su -
root@host01# id
uid=0(root) gid=0(root) groups=0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel)
root@host01# /u01/app/12.2.0/grid/bin/crsctl add resource myApache -type cluster_resource -attr "ACTION_SCRIPT=/usr/local/bin/apache.scr, PLACEMENT='restricted', SERVER_POOLS=myApache_sp, CHECK_INTERVAL='30', RESTART_ATTEMPTS='2'"

Step8. View the full configuration of the myApache resource, including the attributes set in the previous step, with the 'crsctl status resource myApache -f' command.
root@host01# /u01/app/12.2.0/grid/bin/crsctl status resource myApache -f
NAME=myApache
TYPE=cluster_resource
STATE=OFFLINE
TARGET=ONLINE
ACL=owner:root:rwx,pgrp:root:r-x,other::r--
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=/usr/local/bin/apache.scr
ACTIVE_PLACEMENT=0
AGENT_FILENAME=%CRS_HOME%/bin/scriptagent
AUTO_START=restore
CARDINALITY=1
CARDINALITY_ID=0
CHECK_INTERVAL=30
CREATION_SEED=30
DEFAULT_TEMPLATE=
DEGREE=1
DESCRIPTION=
ENABLED=1
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=
ID=myApache
INSTANCE_FAILOVER=0
LOAD=1
LOGGING_LEVEL=1
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
PLACEMENT=restricted
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=2
SCRIPT_TIMEOUT=60
SERVER_POOLS=myApache_sp
START_DEPENDENCIES=
START_TIMEOUT=0
STATE_CHANGE_TEMPLATE=
STOP_DEPENDENCIES=
STOP_TIMEOUT=0
UPTIME_THRESHOLD=1h
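Rather than scanning the whole attribute list, the values set in Step7 can be compared against the dump automatically. This is a minimal sketch under the assumption that the dump uses the KEY=VALUE line format shown above; the function name check_attrs is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: read a KEY=VALUE attribute dump on stdin and report
# every expected KEY=VALUE pair (given as arguments) that is not present
# as an exact line. Returns 1 if any pair is missing, otherwise 0.
check_attrs() (
    dump=$(cat)
    status=0
    for pair in "$@"; do
        # -F: fixed string, -x: whole line must match, -q: no output
        if ! printf '%s\n' "$dump" | grep -Fxq "$pair"; then
            echo "mismatch: expected $pair"
            status=1
        fi
    done
    exit "$status"
)
```

It could be used as: /u01/app/12.2.0/grid/bin/crsctl status resource myApache -f | check_attrs CHECK_INTERVAL=30 RESTART_ATTEMPTS=2 PLACEMENT=restricted. No output and an exit status of 0 mean that all three attributes match.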


Step9. Use the 'crsctl start resource myApache' command to start the new resource. Use the 'crsctl status resource myApache' command to confirm that the resource is online on the first node. If you like, open a browser and verify the Apache home page as shown in Step2 above.
root@host01# /u01/app/12.2.0/grid/bin/crsctl start resource myApache
root@host01# /u01/app/12.2.0/grid/bin/crsctl status resource myApache
resource myApache
NAME=myApache
TYPE=cluster_resource
TARGET=ONLINE
STATE=ONLINE on host01
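For scripted checks during the failover tests later on, the hosting node can be pulled out of this output. The following sketch assumes the 'STATE=ONLINE on <host>' line format shown above; the function name resource_host is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: read `crsctl status resource <name>` output on stdin
# and print the host the resource is ONLINE on. Prints nothing if the
# resource is not online.
resource_host() {
    sed -n 's/^STATE=ONLINE on \(.*\)$/\1/p'
}
```

It could be used as: /u01/app/12.2.0/grid/bin/crsctl status resource myApache | resource_host.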

Step10. Confirm that Apache is NOT running on your second node. The easiest way to do this is to check for running '/usr/sbin/httpd -k start -f /etc/httpd/conf/httpd.conf' processes with the ps command.
root@host02# ps -ef | grep -i "httpd -k"
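Note that a plain grep can list its own process, because the grep command line also contains the text "httpd -k". A common workaround, sketched here as an optional variation rather than the original command, is to put one character of the pattern in brackets so that the pattern no longer matches the grep command line. The function name httpd_procs is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: read `ps -ef` output on stdin and print only real
# httpd processes. The pattern "[h]ttpd -k" matches the text "httpd -k"
# but not the literal text "[h]ttpd -k" on the grep command line itself.
httpd_procs() {
    grep -i "[h]ttpd -k"
}
```

It could be used as: ps -ef | httpd_procs. No output then means that no httpd process is running on the node.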

Step11. Next, simulate a node failure on your first node by rebooting it as root. Before issuing the reboot on the first node, open a VNC session on the second node and, as the root user, run the script below so that you can monitor the failover.
monitor.sh
while true
do
  ps -ef | grep -i "httpd -k"
  sleep 1
done

root@host01# reboot ==> Initiates a reboot, simulating a node failure.
At the same time, run the script below on the second node as the root user:
root@host02# sh monitor.sh ==> After some time, you will see the httpd service start on host02.
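The monitoring loop runs until it is interrupted. If you would rather have the wait end by itself and report how long the failover took, a bounded polling loop is one option. This is a sketch only: the function name wait_for and the 300-second limit in the usage note are assumptions, not values from the procedure above.

```bash
#!/bin/bash
# Hypothetical helper: poll a command once per second until it succeeds
# or the timeout (in seconds, $1) is reached.
# Returns 0 on success and 1 on timeout.
wait_for() (
    timeout=$1
    shift
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if "$@" >/dev/null 2>&1; then
            echo "succeeded after ${elapsed}s"
            exit 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "timed out after ${timeout}s"
    exit 1
)
```

On host02 it could be called as: wait_for 300 pgrep -x httpd. The reported time is approximate because each poll adds its own run time to the one-second sleep.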

Step12. Verify the failover from the host01 to host02 with the 'crsctl status resource myApache -t' command.
root@host02# /u01/app/12.2.0/grid/bin/crsctl status resource myApache -t
NAME           TARGET  STATE        SERVER       STATE_DETAILS
Cluster Resources
myApache
      1        ONLINE  ONLINE       host02
Now access the Apache page on host02. It should display, while the page on host01 should not.
http://host02.samiora.blogspot.com

Step13. Use the 'crsctl relocate resource' command to move the myApache resource back to host01.
root@host01# /u01/app/12.2.0/grid/bin/crsctl relocate resource myApache
CRS-2673: Attempting to stop 'myApache' on 'host02'
CRS-2677: Stop of 'myApache' on 'host02' succeeded
CRS-2672: Attempting to start 'myApache' on 'host01'
CRS-2676: Start of 'myApache' on 'host01' succeeded
Now access the Apache page on host01. It should display, while the page on host02 should not.
http://host01.samiora.blogspot.com

For any queries on Oracle RAC 11g or RAC 12c you can email me on samiappsdba@gmail.com