Using Oracle Clusterware for Non-RAC Purposes

In a recent post on the oracle-l list, a participant asked:

Hi, has anyone used 10.2 Clusterware with OCFS2 on RHEL5 to get single instance failover from one host to another?

My buddy Matt Zito (we’ve had beers before so we’re buddies) of GridApp followed up with:

I have a customer that does that – it apparently works very well […text deleted…]

However, the downside of CRS as single-instance is that both sides of the cluster need to be licensed for Oracle (as I understand the CRS license).

Licensing
Licensing is always a topic for interesting conversation. To get to the bottom of this, I sent an email to the first Oracle person I ever heard pitch the idea of CRS for non-RAC purposes: Marshall Presser. Hmm, I think I can call him my old buddy too since we also had beers. Or then again, if I’m not mistaken, Marshall is an old Pyramid Technology guy and since I am an old Sequent Computer Systems guy, we are sort of long-lost cousins. Anyway, back to the topic. Marshall was nice enough to send me a very current reference for Oracle’s licensing terms for using CRS for non-RAC purposes with a quote from Oracle® Database Licensing Information 11g Release 1 (11.1) Part Number B28287-01:

Oracle Clusterware can be installed and used to protect any Oracle or third-party software provided any of the following conditions are met:

1. The software being protected is from Oracle.

2. The software being protected uses an Oracle Database.

3. The software being protected is running on Oracle Unbreakable Linux.

4. The software being protected is running in a cluster where at least one machine involved in the cluster is licensed using the appropriate metric for either Oracle Database Enterprise Edition or Oracle Database Standard Edition. A cluster is defined to include all the machines that share the same Oracle Cluster Registry (OCR) and Voting Disk

Unclear Clarity
So, as is usually the case with licensing, we have unclear clarity. And, yes, I know this is 11g information and the original query was about 10g, but it stands to reason that with some digging there would be a 10g equivalent. I wonder why criterion 1 above is stated. Since only one criterion needs to be met, I suppose we can interpret as follows:

You can use CRS on Unbreakable Linux for anything you want (rule 3)
You can protect non-RAC Oracle databases on any platform (rule 1)
You can protect any software that connects to an Oracle database on any platform (rule 2)
You can protect anything on any cluster as long as one node in the cluster is licensed for EE or SE (rule 4)

These are pretty liberal rules. I think Oracle is keen on widespread adoption of Oracle Clusterware for general purpose HA, but then I could be misreading the tea leaves.

What Does This Really Mean?
What we’re talking about here is using CRS to monitor (“check” in CRS parlance, “probe” in generic industry terms) an instance of Oracle and take action if the check fails. In general failover HA terms, probes (checks in CRS terms) fail in one of two cases:

1. The server is up but the database is down
2. The server is down

Failover
In case 1 above, the HA engine will restart the database, and in case 2 it will fail the database over to another server. The HA engine (in this case CRS) is smart enough to fail the service over to a system that is actually alive and has functional disk access and network interfaces. That is one of the roles of any HA clusterware (e.g., CRS, Steel Eye, VCS, Service Guard, HACMP, Red Hat Cluster Suite, PolyServe, etc.).

Time Outs
The other way the HA engine will take action is if your probe (check script/program) seizes (times out). In that case, most HA engines will execute a “restart” action, which is generally a stop action followed by a start action and another probe (check). This is not an endless loop though. Most HA engines have a tunable maximum for retries (restart attempts in CRS), after which they fail over to the defined backup server. Be aware, however, that a seized service (such as a non-RAC database instance) could be so locked up that it doesn’t stop when the HA engine tries its restart action. In that case, you have Oracle processes with files open. If you fail over to a server that accesses the database on a shared filesystem such as NFS or OCFS, you have some things to be concerned about. You won’t be able to start the instance until the $ORACLE_HOME/dbs/lk${ORACLE_SID} file is removed, but simply removing it still leaves that other catatonic instance up on the ill server. These solutions can become complex.
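
The restart-then-failover behavior described above can be sketched in a few lines of shell. This is not how CRS implements it internally. It is a toy supervisor under stated assumptions: PROBE_CMD, RESTART_CMD, PROBE_TIMEOUT, RESTART_ATTEMPTS and the supervise function are all hypothetical names, and the GNU coreutils timeout command is assumed to be installed so that a seized probe counts as a failed probe.

```shell
#!/bin/sh
# Toy model of an HA engine reacting to a failed or seized probe.
# Every name below is hypothetical; this is not CRS code.

PROBE_TIMEOUT=${PROBE_TIMEOUT:-5}        # seconds before a probe counts as seized
RESTART_ATTEMPTS=${RESTART_ATTEMPTS:-2}  # restarts to try before giving up

# run_probe: run the probe under a time limit (assumes GNU coreutils timeout).
# A probe that times out is treated exactly like a probe that fails.
run_probe() {
    timeout "$PROBE_TIMEOUT" sh -c "$PROBE_CMD" >/dev/null 2>&1
}

# supervise: print the action taken: "ok", "restarted" or "failover".
supervise() {
    if run_probe; then
        echo ok
        return 0
    fi
    attempt=1
    while [ "$attempt" -le "$RESTART_ATTEMPTS" ]; do
        sh -c "$RESTART_CMD" >/dev/null 2>&1   # stand-in for stop + start
        if run_probe; then
            echo restarted
            return 0
        fi
        attempt=$((attempt + 1))
    done
    echo failover   # restart budget exhausted: hand the service to another node
    return 1
}
```

With PROBE_CMD and RESTART_CMD set to your own commands, supervise reports which of the three outcomes occurred. The sketch deliberately ignores the hard part mentioned above: a seized instance that survives the stop action still has files open on shared storage.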

The topic of what probe (check) actions are appropriate is the subject matter of very long, drawn-out discussions rife with theory and prejudice. I’ve been there and I’ve done that. I bet most folks that use CRS to start/stop and check non-RAC databases will likely use the script interface. Note, as with all HA engines out there, you can write a C probe (or CRS action program) because all the engine is looking for is a return code (success/failure).
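
For readers who have not seen the script interface, here is a minimal skeleton, assuming the usual convention that the engine calls the action script with a single argument (start, stop or check) and only looks at the return code. The ORACLE_HOME path, the SID, and the function names run_sql and action are placeholders, and the pmon process test is only the lightest possible check, not a recommendation.

```shell
#!/bin/sh
# Skeleton action script: one argument (start, stop or check);
# return code 0 means success, anything else means failure.
# ORACLE_HOME and ORACLE_SID values below are placeholders.

ORACLE_HOME=${ORACLE_HOME:-/u01/app/oracle/product/10.2.0/db_1}
ORACLE_SID=${ORACLE_SID:-PROD}
SQLPLUS=${SQLPLUS:-$ORACLE_HOME/bin/sqlplus}
export ORACLE_HOME ORACLE_SID

# run_sql: feed one command to a silent SYSDBA session.
# Note: this returns sqlplus's own exit code, which does not
# by itself prove that the command succeeded.
run_sql() {
    printf '%s\nexit\n' "$1" | "$SQLPLUS" -S "/ as sysdba"
}

action() {
    case "$1" in
        start) run_sql "startup" ;;
        stop)  run_sql "shutdown immediate" ;;
        check) # lightest possible check: is the pmon process for this SID alive?
               pgrep -f "ora_pmon_${ORACLE_SID}\$" >/dev/null ;;
        *)     echo "usage: $0 {start|stop|check}" >&2
               return 2 ;;
    esac
}

# Only dispatch when called with an argument, as the HA engine would do.
if [ "$#" -gt 0 ]; then
    action "$1"
    exit $?
fi
```

Saved as, say, an executable file of your choosing, it would be exercised by hand with the check argument followed by echo $? before handing it to the HA engine.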

I think the most clever probe action I’ve heard to date came from fellow OakTable Network member Tim Gorman. Tim once suggested that a great probe action would be to make a purposeful failed attempt to connect such as:

$ sqlplus foo/bar <<EOF 2>&1 | grep 1017
> REM There is no user called foo...expect ORA 1017
> exit;
> EOF
ORA-01017: invalid username/password; logon denied
$ echo $?
0

If you get anything other than ORA-01017, something is ill. In this case, a success for grep(1) is a success for the probe/check. That is, if grep(1) gets its text, the server returned ORA-01017, thus the instance was well enough to perform the functionality of user authentication. Your check script would get this in grep(1)’s return code ($?).
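
Wrapped up as functions, the same idea might look like the sketch below. The function names are made up for illustration, foo/bar is the same bogus account as in the transcript, and I am assuming a SQL*Plus recent enough to accept -L (log on once, no re-prompt) alongside -S (silent).

```shell
#!/bin/sh
# Bogus-login probe: healthy means the instance answered with ORA-01017.
# Function names are hypothetical; foo/bar is a deliberately invalid account.

SQLPLUS=${SQLPLUS:-sqlplus}

# probe_output: attempt the bad login once and print everything sqlplus says.
# -S suppresses the banner, -L stops sqlplus from re-prompting after the failure.
probe_output() {
    "$SQLPLUS" -S -L foo/bar </dev/null 2>&1
}

# instance_answers: succeed only if the text on stdin contains ORA-01017.
instance_answers() {
    grep 'ORA-01017' >/dev/null
}

# check: the pipeline's status is grep's status, which is what
# the HA engine would see as the result of the check.
check() {
    probe_output | instance_answers
}
```

Calling check from the check branch of an action script hands grep(1)’s return code straight back to the HA engine.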

Trying to connect as a bogus user actually tests quite a bit of server functionality (SQL parsing, user authentication and so forth). I think this may actually create a temporary session as well. It certainly tests the server’s ability to fork(2) sqlplus and exec(2) $ORACLE_HOME/bin/oracle so you are testing the OS VM, process slots, etc. All in all, it is a very clever probe (check action). If you wanted to use CRS to check both the health of SQL*Net and a non-RAC database instance, then you could do this same bogus connect attempt through the listener. If the listener is down, you’ll get the appropriate error text. Then again, if you wanted to make a heavy probe/check, you could connect as an application user and update a dummy row in a table or something like that. The sky is the limit with this sort of HA kit.
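
To make the listener variant concrete, here is a hedged sketch that sends the bogus login through a Net alias and sorts the reply into a few buckets. TNS_ALIAS, classify and probe_via_listener are hypothetical names. The error numbers are the ones I would expect to see (ORA-12541 when no listener answers, ORA-12514, ORA-01034 or ORA-27101 when the listener is up but the instance is not), but your configuration may surface others, so treat the mapping as an assumption.

```shell
#!/bin/sh
# Classify the reply to a bogus login made through the listener.
# TNS_ALIAS and the function names are hypothetical.

SQLPLUS=${SQLPLUS:-sqlplus}
TNS_ALIAS=${TNS_ALIAS:-PROD}

# classify: read sqlplus output on stdin, print a state,
# and return 0 only for the healthy state.
classify() {
    out=$(cat)
    case "$out" in
        *ORA-01017*) echo healthy ;;
        *ORA-12541*) echo listener_down; return 1 ;;
        *ORA-12514*|*ORA-01034*|*ORA-27101*) echo instance_down; return 1 ;;
        *) echo unknown; return 1 ;;
    esac
}

# probe_via_listener: same bogus login as before, but via the Net alias,
# so both SQL*Net and the instance have to be working to get ORA-01017.
probe_via_listener() {
    "$SQLPLUS" -S -L "foo/bar@${TNS_ALIAS}" </dev/null 2>&1 | classify
}
```

The check branch would only need the return code of probe_via_listener, while the printed state is handy for logging which layer was ill.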

Additional Material
Oracle has more information in the form of whitepapers:

13 Responses to “Using Oracle Clusterware for Non-RAC Purposes”


1 Matt Zito August 24, 2007 at 9:05 pm

Well, we ARE friends, Kevin, not just cause we’ve had beers together. It’s interesting how the mind plays tricks – I’d actually read that exact verbiage or equivalent from Oracle licensing (possibly in response to a TAR I opened), but interpreted it at the time as “if the box you’re using is licensed for Oracle then go nuts” – I somehow missed the piece with just having it be one node in the cluster.

As you said, unclear clarity. Not to mention, another situation where Oracle is aggressively trying to take revenue away from a partner company to help justify its own price point. It just very often seems to be poor Veritas/Symantec that’s getting the short end of the stick….

2 kevinclosson August 24, 2007 at 9:31 pm

Matt,

I read this as a complete free-for-all for clusters running Oracle Unbreakable Linux. But then that is confusing because Oracle Unbreakable Linux used to be a support program. Now it is synonymous with Oracle Enterprise Linux:

Click to access ubl-faq.pdf

http://www.oracle.com/technologies/linux/index.html

So I take this to mean that if you have Oracle Enterprise Linux you can use CRS to provide HA for anything you’d like.

As for the effect this sort of stuff has on Veritas, they are not alone. Oracle wants in this space and they appear to be serious. Oracle takes a “co-opetitive” stance towards other players like Veritas, Steel Eye, PolyServe, etc.

3 Örjan August 25, 2007 at 6:53 pm

Hmm, how do I get my FTP server on Solaris to “connect to an Oracle database on any platform (rule 2) ”

Seems that I have to do some hacking…

I wish it had been stated as

“You can protect any software that connects to Oracle software or is connected to by Oracle software on any platform ”

Since it is Oracle BPEL PM that connects to the FTP server… 🙂

4 Örjan August 25, 2007 at 7:02 pm

Unless we run it on the Oracle RAC nodes, that is... rule 4

5 Michael Norbert April 3, 2008 at 2:00 pm

I use this script on all databases I monitor. I got it years ago from lazydba, I think it was from a guy named Kirti. I cron it every 5 minutes and it works like a charm. It was pointed out that it wouldn’t catch an alter system enable restricted session, which I’m not too concerned about. Just last week the script caught a listener hang bug

sqlplus -silent <<EOF > /tmp/$$.1
a/a@${DBNAME}
exit;
EOF
egrep 'ORA-121|ORA-01034' /tmp/$$.1 > /dev/null
if [[ $? = 0 ]]
then
mailx -s " ${DBNAME} is not accessible - db is down" ${MAILLIST} /dev/null
if [[ $? = 0 ]]
then
rm /tmp/$$.1
exit 16
else
mailx -s " ${DBNAME} is not accessible - db is down" ${MAILLIST} < /tmp/$$.1
fi
fi

rm /tmp/$$.1

exit

6 Jeff Wong May 1, 2008 at 11:28 pm

Hi Kevin

That single-instance PDF has apparently been pulled down from Oracle’s site… there seems to be an 11g version up now:

Click to access SI_DB_Failover_11g.pdf

7 Vijayaraghavan July 23, 2009 at 7:15 am

Hi Kevin,

My scenario is this: I need to know whether we need to install Oracle Clusterware.

I have two AIX boxes. Both have the same hardware architecture, except one has 2 CPUs and the second one has 1 CPU.
Both of them have 16G of RAM.
All my Oracle files and AIX users are on my production server; when the cluster is moved to the second one, all the mount points and the users are automatically moved.

My question is: IBM configured an HACMP cluster, and the cluster moves from one node to another. For Oracle to work on the second server, do we need to install Oracle Clusterware?

Regards,
Vijayaraghavan K

- 8 kevinclosson July 23, 2009 at 6:54 pm
  
  Hi Vijay,
  
  What you describe here is a failover-HA solution (non-Real Application Clusters). If you have host clusterware doing the monitor/failover in such a scenario, Oracle clusterware is not needed. This sort of failover HA has been deployed countless times for many years and sometimes satisfies uptime requirements. There are factions that would assert you should do the exact opposite under all circumstances. That is, use Oracle clusterware to do the monitoring and failover. That is the essence of what I’ve blogged about in this post, except, of course, for my obvious lack of near-religious fanaticism on the matter.
  
  The short answer is:
  
  1. Pretty much any clusterware these days can monitor the health of an application and fail it over to a surviving host. Such clusterware must be able to bring online all the resources needed for the application being failed over and that is what offerings like HACMP, VCS, Service Guard, PolyServe, and of course Oracle Clusterware do.
  2. When running non-RAC failover-HA you don’t need Oracle clusterware
  3. When running failover-HA you can use Oracle clusterware in place of offerings like HACMP, VCS, ServiceGuard, etc…
  
  I hope this answers your question.
  
9 Sergio December 9, 2009 at 10:17 pm

Updated in this 11gR2 document:

http://download.oracle.com/docs/cd/E11882_01/license.112/e10594/editions.htm#CJAHFHBJ

Under “Grid Infrastructure”. Note the refinement “The server OS is supported by a valid Oracle Unbreakable Linux support contract”

Sergio

- 10 kevinclosson December 9, 2009 at 10:20 pm
  
  Thanks for the information, Sergio.
  
  - 11 Markus Michalewicz January 3, 2011 at 7:26 pm
    
    The update might be new in the 11g Rel. 2 documentation, but this was already the case with 11g Rel. 1. This sentence in the respective 11g Rel. 1 licensing guide:
    
    “The software being protected is running on Oracle Unbreakable Linux.” was meant to say this, but was unfavorably worded. On this page: http://www.oracle.com/goto/clusterware , however, it says:
    
    Oracle Clusterware for Oracle Unbreakable Linux
    
    * Oracle Unbreakable Linux support customers at the Basic and Premier support levels can download and deploy Oracle Clusterware at no additional license fee or support cost.
    
    Please, scroll down to the end of the page.
    
    Hope that helps. Thanks,
    Markus
    
  - 12 kevinclosson January 3, 2011 at 7:49 pm
    
    Thanks for the info, Markus.
    



Kevin Closson's Blog: Platforms, Databases and Storage