Analysis and Workaround for the Solaris 10.2.0.3 Patchset Problem on VxFS Files

In the blog entry about the Solaris 10.2.0.3 patchset not functioning on VxFS, I reported that Metalink says the patchset does not work on VxFS. That is true. Since the Metalink notes have not been updated, I’ll blog a bit about what I’ve found out. Keep in mind that the Metalink note says not to use the patchset because of this bug, and I am not here to fight Oracle support.

It turns out that the Solaris porting group is now using an ioctl() that is not supported on VxFS files, although the code does not call ioctl(2) directly. The bug results in an error stack a bit like this:

ORA-01501: CREATE DATABASE failed
ORA-00200: control file could not be created
ORA-00202: control file: '/some/path/control01.ctl'
ORA-27037: unable to obtain file status
SVR4 Error: 25: Inappropriate ioctl for device

The text in bug number 5747918 is nice enough to include the output of truss when the problem happens. The ioctl() is _ION. This is the ioctl(2) that is implemented within the directio(3C) library routine. No, don’t believe this developers.sun.com webpage when it refers to directio(3C) as a system call. It isn’t. However, they do provide an example of using the directio(3C) call in this small directio(3C) test program.

The Solaris directio(3C) call is used to push direct I/O onto a file. In the demonstration of the bug (5747918), the 10.2.0.3 patchset is trying to push direct I/O onto the file descriptor held on the control file stored in VxFS. That isn’t how you get direct I/O on VxFS. I wonder if this call to directio(3C) only happens when filesystemio_options is set to directIO or setall. That would make sense.

Workaround
If you use ODM on VxFS, this call to directio(3C) does not occur, so you won’t see the problem. Thanks to a reader comment on my blog and my age-old friend still at Veritas (I mean Symantec) for verification that ODM works around the problem.

A Test Program

If you create a file in a VxFS mount called “foo”, like this:

$ dd if=/dev/zero of=foo bs=4096 count=16

And then compile and run the following small program, you will see the same problem Oracle 10.2.0.3 is exhibiting. The same program on UFS should work fine.

$ cat t.c
#include <sys/types.h>
#include <sys/fcntl.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int ret, handle;

    /* Open the test file created with dd above. */
    handle = open("./foo", O_RDONLY);
    if (handle < 0) {
        perror("open");
        return 1;
    }

    /* directio(3C) returns 0 on success and -1 on failure. */
    if ((ret = directio(handle, DIRECTIO_ON)) < 0) {
        printf("Failure : return code is: %d\n", ret);
    } else {
        printf("The ioctl embedded in the directio(3C) call functions on the file.\n");
    }

    close(handle);
    return 0;
} /* End */

Another Potential Workaround
If you want to test the rest of what 10.2.0.3 has to offer without ODM, even with a VxFS database, I think you should be able to explicitly set filesystemio_options=none and get around the problem. Be aware, however, that I have not tested that. The worst thing that could happen is that setting filesystemio_options in this manner is indeed a workaround that would allow you to test the many other reasons you actually need 10.2.0.3!

If you find otherwise, please comment on the blog.

16 Responses to “Analysis and Workaround for the Solaris 10.2.0.3 Patchset Problem on VxFS Files”


  1. 1 cschultz January 11, 2007 at 6:56 pm

    Great sleuthing! =) Now how long will it take for Oracle to re-release the patchset? Probably after they rigorously test it to determine there are no more bugs, eh? =)

  2. 2 Fairlie Rego January 11, 2007 at 10:51 pm

    I hope this change to make the ioctl call is not made because of the push to try to get OCFS working on Solaris

  3. 3 kevinclosson January 11, 2007 at 11:32 pm

    Fairlie,

    Not a snowball chance in Hades, Sheol, Gehenna or Tophet!

    Thanks for stopping by!

  4. 4 Amir Hameed January 13, 2007 at 12:27 am

    Kevin,
    I ran the test where I:
    a. Copied your code into a text file and compiled it
    b. Created a file via “dd if=/dev/zero of=foo bs=4096 count=1”
    c. Ran it separately on a VxFS-mounted filesystem and a UFS-mounted filesystem

    The following were the results:
    1. On the VxFS filesystem, I got the following output after running your code:
    “Failure : return code is: -1”
    2. On the UFS filesystem, I got the following output after running your code:
    “The ioctl embedded in the directio(3C) call functions on the file.”

  5. 6 Gary Chaika February 10, 2007 at 5:44 am

    We’re running into a problem with the 10.2.0.3 patchset on Solaris 2.9. When we installed it, the ONS service on our RAC servers started crashing. So I searched Oracle and sure enough found another patch to fix their patchset. But lo and behold, even after that patch it’s STILL failing. I don’t know who’s QA’ing Oracle’s patches but they need to be fired.

  6. 7 kevinclosson February 10, 2007 at 4:12 pm

    Gary,

    Do you have a Metalink number for the patch for the patch in case anyone comes by here off a google search?

    QA?

  7. 8 Gary Chaika February 13, 2007 at 6:39 pm

    Yes, it’s bug fix #5749953. Like I said above, we have applied this patch to two different RAC clusters and the ONS is still crashing. We currently have a case open with Oracle to resolve this. We were planning on using the 10.2.0.3 patchset to fix our Daylight Saving Time issues that are coming in March of 2007.

    I’ll keep you updated with any results from Oracle’s S/R.

  8. 9 Gary Chaika February 26, 2007 at 8:08 pm

    Update on the ONS bug. My case sat open with Oracle Support for days so I decided to fix it on my own.

    It turns out that the Opatch#5749953 DID put down a new ONS executable but it only put it under $ORACLE_HOME/opmn/bin. When I checked where my ONS was starting from I noticed it was from $ORA_CRS_HOME/opmn/bin NOT the newly patched one under $ORACLE_HOME/opmn/bin.

    I tried changing my oracle $PATH to make sure $ORACLE_HOME came before $ORA_CRS_HOME but RAC still started it from $ORA_CRS_HOME.

    So I backed up the ons executable under $ORA_CRS_HOME/opmn/bin and copied the one the opatch #5749953 put under $ORACLE_HOME/opmn/bin to $ORA_CRS_HOME/opmn/bin.

    The Opatch also told you to make a change in the script onsctl which I did under both $ORACLE_HOME/bin AND $ORA_CRS_HOME/bin (back them up first).

    I restarted CRS and all the databases and sure enough it seemed to fix the issue.

    I’m waiting to hear back from Oracle Support whether this “band-aid” is acceptable or not.

    I will update here further if Oracle Support comes up with a better fix.

    Gary

  9. 10 Brian Michael March 5, 2007 at 7:03 pm

    See metalink note id: Note:405825.1

    The appropriate patchset for this bug appears to be 5752399

    download patch
    ftp://updates.oracle.com
    put metalink username/password in
    cd 5752399
    bin
    get p*SOLARIS64.zip
    quit

    unzip the patch
    cd 5752399
    $ORACLE_HOME/OPatch/opatch apply OPatch.SKIP_VERIFY=true

    It worked for me. System up and running

    Thanks for the original research. It helped narrow issue down.

    Sincerely,

    Brian P Michael
    Senior Consultant, TUSC
    http://www.tusc.com

  10. 12 Lalji Varsani March 21, 2007 at 1:54 pm

    Thanks Kevin for all the info available here.

    I encountered exactly the same problem while installing 10.2.0.3 on Solaris 2.9.
    Also Thanks to Brian Michael (TUSC) for providing bug info.
    After applying the patch – I managed to successfully create the database.

    Regards,
    Lalji Varsani
    T-Systems.co.uk

  11. 13 Jack van Zanen October 25, 2007 at 11:49 pm

    I had similar problems when I was preparing a machine for an rman clone. It would not create the password file properly even though it did not give an error, and it gave me the OS error when trying to create the spfile from the pfile (which pointed me in the right direction). However, along with the OS error it reported that it could not process the parameters, suggesting that one or more of the parameters in my init file was wrong.
    Applying the above mentioned patch made it all go away.

  12. 14 anonymous September 24, 2008 at 1:38 am

    I ran into the same issue on Solaris Sparc.
    Looks like now Oracle has a patch #5752399 to fix this.

  13. 15 Alexander Murdoch February 12, 2009 at 7:56 pm

    I had a similar issue and the problem was that I had an init.ora file in the dbs directory, and then somebody created an EMPTY spfile.ora as well.

    So … be sure you are using the right init file 🙂

    Bye.
    Alex.


  1. 1 Inappropriate ioctl for device and Oracle 10.2.0.3 « Malcolm's Tech Tips Trackback on January 6, 2010 at 8:21 pm





DISCLAIMER

I work for Amazon Web Services. The opinions I share in this blog are my own. I'm *not* communicating as a spokesperson for Amazon. In other words, I work at Amazon, but this is my own opinion.


Copyright

All content is © Kevin Closson and "Kevin Closson's Blog: Platforms, Databases, and Storage", 2006-2015. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Kevin Closson and Kevin Closson's Blog: Platforms, Databases, and Storage with appropriate and specific direction to the original content.