Using Linux Perf(1) To Analyze Database Processes – Part I

Troubleshooting Runaway Processes
Everyone reads Tanel Poder’s material—for good reason.

I took particular interest in his recent post about investigating where an apparently runaway, cpu-bound, Oracle foreground process is spending its time.  That article can be found here.

I’ve been meaning to do some blogging about analyzing Oracle execution with perf(1). I think Tanel’s post is a good segue for me to do so. You might ask, however, why I would bother attempting to add value in a space Tanel has already blogged. Well, to that I would simply say that modern systems professionals need as many tools as they can get their hands on.

Tanel, please ping me if you object, for any reason, to my direct links to your scripts.

Monitoring Window
Tanel’s approach to monitoring his cpu-bound foreground process is based on the pstack(1) utility. Once he identified the spinning process he fired off a 100 pstack commands in a loop. For each iteration, the output of pstack was piped through a text processing script (os_explain).  From my reading of that script I estimate it probably takes about 10 milliseconds to process the data flowing into it. I’m estimating here so please bear with me. If pstack execution requires zero wall clock time I think these 100 snoops at the process stack likely occur in 1 second of wall clock time. According to Tanel, the SQL takes about 23 minutes to complete. If the foreground process is looping just a small amount of code then I see no problem monitoring a small portion of its overall execution time. Since you can see what Tanel is doing you know it would be simple to grab the process every so often throughout its life and monitor the approximate 1 second with the 100 iterations of pstack. The technique and tools Tanel shares here are extensible. That is good.

Perturbation
The Linux pstack utility stops the execution of a process in order to read its stack and produce output to standard out. Performance engineering techniques should always be weighed against the perturbation levied by the monitoring method.  I recall the olden days of CA Unicenter brutally punishing a system just to report performance metrics. Ouch.

Monitoring a cpu-bound process, even periodically, with pstack will perturb performance. There is such a thing as an acceptable amount of perturbation though. Once must decide for themselves what that acceptable level is.

I would like to offer an example of pstack perturbation. For this exercise I will use the silly core-stalling program I call “fat.” This program suffers a lot of processor stalls because it has very poor cache locality. The program can be found here along with its cousin, “skinny.” Aptly named, skinny fits in cache and does not stall.  The following shows that the program executes in 51.251 seconds on my Nehalem Xeon server when executed in isolation.  However, when I add 100 pokes with pstack the run time suffers 5% degradation.

$
$ cat p.sh
#!/bin/bash

time ./fat &
p=`ps -f | awk ' $NF ~ /.fat/ { print $2 }'`

for i in {1..100}
do
	pstack $p
done  > /dev/null 2>&1
wait
$
$ sh ./p.sh

real	0m53.836s
user	0m50.604s
sys	0m0.024s
$
$ time ./fat

real	0m51.251s
user	0m51.221s
sys	0m0.013s
$

I know 5% is not a lot but this is a cpu-bound process so that is a lot of cycles. Moreover, this is a process that is not sharing any data nor exercising any application concurrency mechanisms (e.g., spinlocks, semaphores).  I understand there is little danger in perturbing the performance of some runaway processes but if such a process is also sharing data there can be ancillary affect when troubleshooting a large-scale problem.  Before I move on I’d like to show how the execution time of “fat” changes when I gather 100 pstacks as soon as the process is invoked and then wait 10 seconds before gathering another 100 stack readings. As the following shows, the perturbation meter climbs up to 13%:

$ cat p2.sh
#!/bin/bash

time ./fat &
p=`ps -f | awk ' $NF ~ /.fat/ { print $2 }'`

for t in 1 2
do
	for i in {1..100}
	do
		pstack $p
	done  > /dev/null 2>&1
sleep 10
done

wait
$ sh ./p2.sh

real	0m57.930s
user	0m51.480s
sys	0m0.050s
$

Performance Profiling With perf(1)
As I said, I’ve been trying to work out the time to blog about how important perf(1) should be to you in your Oracle monitoring—or any performance analysis for that matter. It requires a modern Linux distribution (e.g., RHEL 6, OEL 6) though. I can’t write a tutorial on perf(1) in this post. However, since I’m on the topic of perturbation via monitoring tools, I’ll offer an example of perf-record(1) using my little “fat” program:

$ time perf record ./fat
[ perf record: Woken up 8 times to write data ]
[ perf record: Captured and wrote 1.812 MB perf.data (~79187 samples) ]

real	0m52.299s
user	0m52.234s
sys	0m0.046s
$

$ perf report --stdio | grep fat
# cmdline : /usr/bin/perf record ./fat
    99.81%      fat  fat                [.] main
     0.14%      fat  [kernel.kallsyms]  [k] __do_softirq
     0.02%      fat  libc-2.12.so       [.] __memset_sse2
     0.01%      fat  [kernel.kallsyms]  [k] clear_page_c
     0.01%      fat  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
     0.00%      fat  [kernel.kallsyms]  [k] get_page_from_freelist
     0.00%      fat  [kernel.kallsyms]  [k] free_pages_prepare

So, we see that monitoring with perf(1) does levy a small tax—2%. However, I just monitored the entire execution of the program.

Monitor Oracle Foreground Process With perf-record(1)
Now I’ll run the program Tanel used in his example.  I’ll record 5 minutes of execution by identifying the process ID of the spinning shadow process and then using the –p option to perf-record(1). The way perf(1) works is to monitor the entire execution of the program you attach to—unless you give it a process to keep it busy (or use the -c option). Now don’t be confused. In the following example I’m telling perf-record(1) to monitor the shadow process. The usage of sleep 300 is just my way of telling it to finish in 5 minutes.

$ cat t.sh

sqlplus / as sysdba < /dev/null 2>&1 &
[1] 462
$ ps -f
UID        PID  PPID  C STIME TTY          TIME CMD
oracle     462  2259  0 10:11 pts/0    00:00:00 sh ./t.sh
oracle     463   462  0 10:11 pts/0    00:00:00 sqlplus   as sysdba
oracle     465  2259  0 10:11 pts/0    00:00:00 ps -f
oracle    2259  2258  0 Feb14 pts/0    00:00:00 -bash
$ ps -ef | grep 463 | grep -v grep
oracle     463   462  0 10:11 pts/0    00:00:00 sqlplus   as sysdba
oracle     464   463 89 10:11 ?        00:00:14 oracleSLOB (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
$ sudo perf record -p 464 sleep 300
[ perf record: Woken up 35 times to write data ]
[ perf record: Captured and wrote 8.755 MB perf.data (~382519 samples) ]
$

$ perf report
$ perf report --stdio
# ========
# captured on: Mon Feb 18 10:17:02 2013
# ========
#
# Events: 286K cpu-clock
#
# Overhead  Command      Shared Object                       Symbol
# ........  .......  .................  ...........................
#
    29.84%   oracle  oracle             [.] kglic0
    11.08%   oracle  oracle             [.] kgxExclusive
    11.02%   oracle  oracle             [.] kglGetHandleReference
     6.66%   oracle  oracle             [.] kglGetMutex
     6.47%   oracle  oracle             [.] kgxRelease
     4.91%   oracle  oracle             [.] kglGetSessionUOL
     4.48%   oracle  oracle             [.] kglic_cbk
     3.79%   oracle  oracle             [.] kgligl
     3.44%   oracle  oracle             [.] kglMutexHeld
     3.12%   oracle  oracle             [.] kgligp
     1.96%   oracle  oracle             [.] kqlpgCallback
     1.92%   oracle  oracle             [.] kglReleaseMutex
     1.88%   oracle  oracle             [.] kglGetBucketMutex
     1.80%   oracle  oracle             [.] kglReleaseBucketMutex
     1.47%   oracle  oracle             [.] kglIsMutexHeld
     1.18%   oracle  oracle             [.] kglMutexNotHeld
     1.04%   oracle  oracle             [.] kglReleaseHandleReference
     0.51%   oracle  oracle             [.] kghalf
     0.33%   oracle  oracle             [.] qerfxFetch
     0.32%   oracle  oracle             [.] lnxmin
     0.31%   oracle  oracle             [.] qerfxGCol
     0.26%   oracle  oracle             [.] qeruaRowProcedure
     0.24%   oracle  oracle             [.] kqlfnn

$ perf report --stdio | grep oracle | sed 's/\%//g' | awk '{ t=t+$1 } END { print t }'
99.98
$

If you study Tanel’s post and compare it to mine you’ll see differences in the cost accounting Tanel has associated with certain Oracle kernel routines. That’s no huge problem. I’d like to explain what’s happening.

When I used perf-record(1) to monitor the shadow process for 300 seconds it collected 382,519 samples. When using a pstack approach, on the other hand, just be aware that you are getting a glimpse of whatever happens to be on the stack when the program is stopped. You might be missing a lot of important events. Allow me to offer an illustration of this effect.

Do You see What I See?
Envision a sentry  guarding a wall and taking a photo every 1 second. Intruders jumping over the wall with great agility are less likely to be in view when he snaps his photo. On the other hand, an intruder taking a lot longer to cross the wall (carrying a large bag of “loot” for instance) suffers greater odds of showing up in one of his photos. The sentry might get 10 photos of slow intruders while missing hundreds of intruders who happen to be more nimble–thus getting over the wall more quickly.  For all we know, the slow intruder is carrying bags of coal while the more nimble intruders are packing a pockets full of diamonds. Which intruder matters more? It depends on how cold the sentry is I guess :-) It’s actually a wee-bit like that with software performance analysis. Catch me some time and I’ll explain why a CPI (cycles per instruction)  of 1 is usually bad thing. Ugh, I digress.

Tanel’s approach/kit works on a variety of operating systems and versions of Linux. It also does not require any elevated privilege. I only aim to discuss the differences.

Summary
I’ve lightly covered the topic of performance monitoring perturbation and have given a glimpse into what I envision will end up as a multiple-part series on perf(1).



							

8 Responses to “Using Linux Perf(1) To Analyze Database Processes – Part I”


  1. 1 goryszewskig February 18, 2013 at 12:36 pm

    Perfect timing Kevin, Am I right saying that perf would be useful in mutex issues tracing when things are going really fast ?
    Regards
    GregG

  2. 3 Tanel Poder (@TanelPoder) February 19, 2013 at 4:31 pm

    That’s why I said “poor-man’s stack profiler” in my post. Even though the core topic of my post was about drilling down into CPU usage, the important side-topic there was that you do not always need to install additional tools to troubleshoot something right when it happens. Often a text datasource combined with a couple of OS commands or AWK do the trick and you can adapt to changing demands – like just adding “sleep 1″ between the stack samples in the loop.

    But perf itself is cool, even though oprofile can do the same things, usability matters – a single command which is able to show the top vs a sequence of commands to initiate collection and then parse & report is going to win. But I still don’t get perf’s hierarchical profiler output logic, I think stack profiles are much better readable if done like Windows ProcMon, Mac OS “Sample Process” or my oradebug-based OStackProf are doing it.

  3. 5 Anantha March 24, 2013 at 10:23 am

    @Kevin, thanks for these posts. Now we’re back to great technical articles that we’ve come to expect from your blog.


  1. 1 Peeking into Linux kernel-land using /proc filesystem for quick’n’dirty troubleshooting | Tanel Poder's blog: IT & Mobile for Geeks and Pros Trackback on February 21, 2013 at 7:47 am
  2. 2 Изучаем внутреннюю кухню ядра Linux с помощью /proc для быстрой диагностики и решения проблем « Домик Миа Trackback on February 23, 2014 at 9:01 am

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s




EMC Employee Disclaimer

The opinions and interests expressed on EMC employee blogs are the employees' own and do not necessarily represent EMC's positions, strategies or views. EMC makes no representation or warranties about employee blogs or the accuracy or reliability of such blogs. When you access employee blogs, even though they may contain the EMC logo and content regarding EMC products and services, employee blogs are independent of EMC and EMC does not control their content or operation. In addition, a link to a blog does not mean that EMC endorses that blog or has responsibility for its content or use.

This disclaimer was put into place on March 23, 2011.

Enter your email address to follow this blog and receive notifications of new posts by email.

Join 1,719 other followers

Oracle ACE Program Status

Click It

website metrics

Fond Memories

Copyright

All content is © Kevin Closson and "Kevin Closson's Blog: Platforms, Databases, and Storage", 2006-2013. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Kevin Closson and Kevin Closson's Blog: Platforms, Databases, and Storage with appropriate and specific direction to the original content.

Follow

Get every new post delivered to your Inbox.

Join 1,719 other followers

%d bloggers like this: