Thanks to Steve Shaw, Database Technology Manager at Intel, for pointing me to the magic decoder ring for associating Xeon 5500 (Nehalem) processor threads with Linux OS CPUs. Steve is an old acquaintance whom I would gladly refer to as a friend, but I’m not sure how Steve views the relationship. See, I was the technical reviewer of his book (Pro Oracle Database 10g RAC on Linux), which is a role that can make friends or frenemies, I suppose. I don’t have any bad memories of the project, and Steve is still talking to me, so I think things are hunky dory. OK, joking aside…but first, a bit more about Steve.
Steve writes the following on his website intro page (emphasis added by me):
I’m Steve Shaw and for over 10 years have specialised in working with the Oracle database. I have a background with Oracle on various flavours of UNIX including HP-UX, Sun Solaris and my own personal favourite Dynix/ptx on Sequent.
Sequent? I’ve emerged from my ex-Sequent 12-step program! Indeed, that is a really good personal favorite to have. But, I’m sentimental, and I digress as well.
The Magic Decoder Ring
The web resource Steve provided is this Intel webpage containing information about processor topology. There is an Intel processor topology tool that really helps make sense of the mappings between processor cores and threads on Nehalem processors and Linux OS CPUs.
What’s in the “Package?”
As we can see from that Intel webpage, and the processor topology tool itself, Intel often uses the term “package” when referring to what goes in a socket these days. Considering there are both cores and threads, I suppose there is justification for a more descriptive term. I still use socket/core/thread nomenclature though. It works for me. Nonetheless, let’s see what my Nehalem 2s8c16t system shows when I run the topology tool. First, let’s see the output for “package” number 0 (socket 0). There is a lot of output from the command. I recommend focusing on the OScpu# and Core rows (lines 20 and 21) in the following text box:
Package 0 Cache and Thread details

Box Description:
Cache  is cache level designator
Size   is cache size
OScpu# is cpu # as seen by OS
Core   is core#[_thread# if > 1 thread/core] inside socket
AffMsk is AffinityMask(extended hex) for core and thread
CmbMsk is Combined AffinityMask(extended hex) for hw threads sharing cache
       CmbMsk will differ from AffMsk if > 1 hw_thread/cache
Extended Hex replaces trailing zeroes with 'z#'
       where # is number of zeroes (so '8z5' is '0x800000')
L1D is Level 1 Data cache, size(KBytes)= 32, Cores/cache= 2, Caches/package= 4
L1I is Level 1 Instruction cache, size(KBytes)= 32, Cores/cache= 2, Caches/package= 4
L2 is Level 2 Unified cache, size(KBytes)= 256, Cores/cache= 2, Caches/package= 4
L3 is Level 3 Unified cache, size(KBytes)= 8192, Cores/cache= 8, Caches/package= 1
      +-----------+-----------+-----------+-----------+
Cache |  L1D      |  L1D      |  L1D      |  L1D      |
Size  |  32K      |  32K      |  32K      |  32K      |
OScpu#|    0     8|    1     9|    2    10|    3    11|
Core  |c0_t0 c0_t1|c1_t0 c1_t1|c2_t0 c2_t1|c3_t0 c3_t1|
AffMsk|    1   100|    2   200|    4   400|    8   800|
CmbMsk|  101      |  202      |  404      |  808      |
      +-----------+-----------+-----------+-----------+
Cache |  L1I      |  L1I      |  L1I      |  L1I      |
Size  |  32K      |  32K      |  32K      |  32K      |
      +-----------+-----------+-----------+-----------+
Cache |   L2      |   L2      |   L2      |   L2      |
Size  | 256K      | 256K      | 256K      | 256K      |
      +-----------+-----------+-----------+-----------+
Cache |   L3                                          |
Size  |   8M                                          |
CmbMsk|  f0f                                          |
      +-----------------------------------------------+

From the output we can decipher that Linux OS CPU 0 resides in socket 0, core 0, thread 0. That much is straightforward. On the other hand, the tool adds value by showing us that Linux OS CPU 8 is actually the second processor thread in socket 0, core 0. And, of course, “package” 1 follows suit:
Package 1 Cache and Thread details

Box Description:
Cache  is cache level designator
Size   is cache size
OScpu# is cpu # as seen by OS
Core   is core#[_thread# if > 1 thread/core] inside socket
AffMsk is AffinityMask(extended hex) for core and thread
CmbMsk is Combined AffinityMask(extended hex) for hw threads sharing cache
       CmbMsk will differ from AffMsk if > 1 hw_thread/cache
Extended Hex replaces trailing zeroes with 'z#'
       where # is number of zeroes (so '8z5' is '0x800000')
      +-----------+-----------+-----------+-----------+
Cache |  L1D      |  L1D      |  L1D      |  L1D      |
Size  |  32K      |  32K      |  32K      |  32K      |
OScpu#|    4    12|    5    13|    6    14|    7    15|
Core  |c0_t0 c0_t1|c1_t0 c1_t1|c2_t0 c2_t1|c3_t0 c3_t1|
AffMsk|   10   1z3|   20   2z3|   40   4z3|   80   8z3|
CmbMsk| 1010      | 2020      | 4040      | 8080      |
      +-----------+-----------+-----------+-----------+
Cache |  L1I      |  L1I      |  L1I      |  L1I      |
Size  |  32K      |  32K      |  32K      |  32K      |
      +-----------+-----------+-----------+-----------+
Cache |   L2      |   L2      |   L2      |   L2      |
Size  | 256K      | 256K      | 256K      | 256K      |
      +-----------+-----------+-----------+-----------+
Cache |   L3                                          |
Size  |   8M                                          |
CmbMsk| f0f0                                          |
      +-----------------------------------------------+

So, it goes like this:
Linux OS CPU | Package Locale |
0 | S0_c0_t0 |
1 | S0_c1_t0 |
2 | S0_c2_t0 |
3 | S0_c3_t0 |
4 | S1_c0_t0 |
5 | S1_c1_t0 |
6 | S1_c2_t0 |
7 | S1_c3_t0 |
8 | S0_c0_t1 |
9 | S0_c1_t1 |
10 | S0_c2_t1 |
11 | S0_c3_t1 |
12 | S1_c0_t1 |
13 | S1_c1_t1 |
14 | S1_c2_t1 |
15 | S1_c3_t1 |
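The AffMsk and CmbMsk values in the tool output are bit masks over Linux OS CPU numbers, written in the tool's "extended hex" shorthand. As a rough sketch of how to read them, the two helper functions below (expand_mask and mask_to_cpus are names I made up for illustration, not part of the Intel tool) expand the 'z#' notation and list the OS CPUs whose bits are set. Shell arithmetic limits this to masks that fit in the shell's integer width.

```shell
# Expand "extended hex": 'z#' means # trailing zeroes, so 8z5 becomes 800000.
# Hypothetical helper, not part of the Intel tool.
expand_mask() {
    case "$1" in
        *z*)
            digits=${1%%z*}   # hex digits before the 'z'
            zeroes=${1#*z}    # number of zeroes to append
            while [ "$zeroes" -gt 0 ]; do
                digits="${digits}0"
                zeroes=$((zeroes - 1))
            done
            printf '%s\n' "$digits"
            ;;
        *)
            printf '%s\n' "$1"
            ;;
    esac
}

# List the OS CPU numbers whose bits are set in an affinity mask.
# Hypothetical helper; bit 0 corresponds to OS CPU 0.
mask_to_cpus() {
    hex=$(expand_mask "$1")
    mask=$((0x$hex))
    cpu=0
    list=
    while [ "$mask" -ne 0 ]; do
        if [ $((mask & 1)) -eq 1 ]; then
            list="${list:+$list }$cpu"
        fi
        mask=$((mask >> 1))
        cpu=$((cpu + 1))
    done
    printf '%s\n' "$list"
}

mask_to_cpus f0f    # L3 CmbMsk for package 0: 0 1 2 3 8 9 10 11
mask_to_cpus 1z3    # AffMsk for package 1, c0_t1: 12
```

The same masks can be handed to taskset, which accepts either a hex mask (taskset 0x101 some_command) or a CPU list (taskset -c 0,8 some_command), so pinning a process to both threads of socket 0, core 0 is a matter of reading the CmbMsk column.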
By the way, the CPU topology tool works on other processors in the Xeon family.
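When the Intel tool is not at hand, the kernel's own view of the topology can be read from sysfs. The sketch below assumes a kernel that exposes physical_package_id, core_id and thread_siblings_list under each CPU's topology directory; older kernels may lack the last of these, in which case the function simply skips that CPU. The function name cpu_topology and its optional base-directory argument are my own inventions for illustration.

```shell
# Print "cpu S<socket>_c<core> siblings=<list>" for each CPU found under a
# sysfs-style tree. Hypothetical helper; the base directory defaults to the
# real sysfs location but can be pointed at a saved copy.
cpu_topology() {
    base=${1:-/sys/devices/system/cpu}
    for dir in "$base"/cpu[0-9]*; do
        # Skip entries without the topology files this sketch relies on.
        [ -r "$dir/topology/thread_siblings_list" ] || continue
        cpu=${dir##*/cpu}
        pkg=$(cat "$dir/topology/physical_package_id")
        core=$(cat "$dir/topology/core_id")
        sibs=$(cat "$dir/topology/thread_siblings_list")
        printf '%s S%s_c%s siblings=%s\n' "$cpu" "$pkg" "$core" "$sibs"
    done | sort -n
}

cpu_topology
```

On a system numbered like the one above, I would expect OS CPUs 0 and 8 to name each other in the siblings column, since they are the two threads of socket 0, core 0.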
Nice, 2CPU QuadCore with HT!
The tool by Intel gives very detailed info.
You can also check this Red Hat kbase document (with sample outputs):
http://kbase.redhat.com/faq/docs/DOC-7715
If it’s okay with you, can you give the output of the following commands:
cat /proc/cpuinfo | grep -i "model name" | uniq
grep processor /proc/cpuinfo
grep "physical id" /proc/cpuinfo
grep siblings /proc/cpuinfo
grep "core id" /proc/cpuinfo
grep "cpu cores" /proc/cpuinfo
Thanks!
Hi Karlarao,
I won’t run that, but I’ll run this: 🙂
# cat /tmp/foo
filter() {
sed 's/^.*://g' | xargs echo
}
grep processor /proc/cpuinfo | filter
grep 'physical id' /proc/cpuinfo | filter
grep siblings /proc/cpuinfo | filter
grep 'core id' /proc/cpuinfo | filter
grep 'cpu cores' /proc/cpuinfo | filter
# sh /tmp/foo
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3
4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
if you don’t get this output horizontally oriented it is a mess
Cool! Nice work with the function 🙂
I used to write/draw it on paper.. haha
From the output of /proc/cpuinfo, the table below can also be derived. It’s kind of cryptic at first, but it’s another way to do it without running the tool by Intel.
-------------------------------------------------------
OScpu#|    0     8|    1     9|    2    10|    3    11|
Core  |c0_t0 c0_t1|c1_t0 c1_t1|c2_t0 c2_t1|c3_t0 c3_t1|
-------------------------------------------------------
OScpu#|    4    12|    5    13|    6    14|    7    15|
Core  |c0_t0 c0_t1|c1_t0 c1_t1|c2_t0 c2_t1|c3_t0 c3_t1|
-------------------------------------------------------
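For anyone who would rather not draw it on paper, the same mapping can be derived mechanically from /proc/cpuinfo. This is only a sketch: cpuinfo_map is a made-up name, and the thread number is an assumption, namely that the first OS CPU listed for a given socket and core is thread 0 and the next is thread 1. /proc/cpuinfo does not state the thread number directly, although that assumption does agree with the Intel tool's output for this system.

```shell
# Print "oscpu S<socket>_c<core>_t<thread>" for each processor entry.
# Hypothetical helper; reads /proc/cpuinfo unless another file is given.
cpuinfo_map() {
    awk -F': *' '
        /^processor/   { cpu = $2 }
        /^physical id/ { pkg = $2 }
        /^core id/     {
            # Thread number is assumed: count of earlier CPUs seen on
            # the same socket and core.
            key = pkg SUBSEP $2
            printf "%s S%s_c%s_t%d\n", cpu, pkg, $2, seen[key]++
        }
    ' "${1:-/proc/cpuinfo}"
}

cpuinfo_map
```

Each "core id" line closes out one processor entry because, on x86, "physical id" appears before "core id" within the entry, so both values are known by the time the line is printed.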