<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	>
<channel>
	<title>Comments on: Oracle on Opteron with Linux-The NUMA Angle (Part IV). Some More About the Silly Little Benchmark.</title>
	<atom:link href="http://kevinclosson.wordpress.com/2007/01/31/oracle-on-opteron-with-linux-the-numa-angle-part-iv-some-more-about-the-silly-little-benchmark/feed/" rel="self" type="application/rss+xml" />
	<link>http://kevinclosson.wordpress.com/2007/01/31/oracle-on-opteron-with-linux-the-numa-angle-part-iv-some-more-about-the-silly-little-benchmark/</link>
	<description>Oracle-related Platform, Storage and Clustering Topics (with the occasional rant)</description>
	<pubDate>Mon, 13 Oct 2008 15:26:41 +0000</pubDate>
	<generator>http://wordpress.org/?v=MU</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: kevinclosson</title>
		<link>http://kevinclosson.wordpress.com/2007/01/31/oracle-on-opteron-with-linux-the-numa-angle-part-iv-some-more-about-the-silly-little-benchmark/#comment-2753</link>
		<dc:creator>kevinclosson</dc:creator>
		<pubDate>Tue, 06 Feb 2007 00:46:34 +0000</pubDate>
		<guid isPermaLink="false">http://kevinclosson.wordpress.com/2007/01/31/oracle-on-opteron-with-linux-the-numa-angle-part-iv-some-more-about-the-silly-little-benchmark/#comment-2753</guid>
		<description>Henry,

  The number of CPUs doesn't affect latency as long as bus bandwidth does not get saturated and (MOST IMPORTANTLY) they aren't constantly hammering the same memory line--as is the case with spinlocks. Processors stall for long periods when pounding on heavily contended memory holding spinlocks. This is why so much work goes into breaking out work into multiple locks and why lock alternatives to spinlocks are so attractive (e.g., queued locks, reader-writer locks, RCU, etc.).

  Noons used the term "impedance mismatch", which is fine, but in the case of Opterons, the HyperTransport is clocked at the same rate as the processor. Very elegant stuff. But, it's NUMA and that is why I'm blogging this thread.

  Just for the sake of flashback, I recall running heavily contentious Oracle workloads with bus analyzers attached back in the Sequent days. The Orion chipset that supported the Pentium Pro processor routinely stalled for 19 bus cycles (yes bus cycles at 90MHz) when invalidating heavily contended cache lines (e.g., releasing a lock). Yes, releasing a lock is expensive. At least when there are other "interested parties".

  Trivial pursuit...</description>
		<content:encoded><![CDATA[<p>Henry,</p>
<p>  The number of CPUs doesn&#8217;t affect latency as long as bus bandwidth does not get saturated and (MOST IMPORTANTLY) they aren&#8217;t constantly hammering the same memory line&#8211;as is the case with spinlocks. Processors stall for long periods when pounding on heavily contended memory holding spinlocks. This is why so much work goes into breaking out work into multiple locks and why lock alternatives to spinlocks are so attractive (e.g., queued locks, reader-writer locks, RCU, etc.).</p>
<p>  Noons used the term &#8220;impedance mismatch&#8221;, which is fine, but in the case of Opterons, the HyperTransport is clocked at the same rate as the processor. Very elegant stuff. But, it&#8217;s NUMA and that is why I&#8217;m blogging this thread.</p>
<p>  Just for the sake of flashback, I recall running heavily contentious Oracle workloads with bus analyzers attached back in the Sequent days. The Orion chipset that supported the Pentium Pro processor routinely stalled for 19 bus cycles (yes bus cycles at 90MHz) when invalidating heavily contended cache lines (e.g., releasing a lock). Yes, releasing a lock is expensive. At least when there are other &#8220;interested parties&#8221;.</p>
<p>  Trivial pursuit&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Henry</title>
		<link>http://kevinclosson.wordpress.com/2007/01/31/oracle-on-opteron-with-linux-the-numa-angle-part-iv-some-more-about-the-silly-little-benchmark/#comment-2748</link>
		<dc:creator>Henry</dc:creator>
		<pubDate>Mon, 05 Feb 2007 22:58:16 +0000</pubDate>
		<guid isPermaLink="false">http://kevinclosson.wordpress.com/2007/01/31/oracle-on-opteron-with-linux-the-numa-angle-part-iv-some-more-about-the-silly-little-benchmark/#comment-2748</guid>
		<description>Thanks Noons, I think I've got it now. The memory latency depends on the memory subsystem (though you can't measure it without having a processor in the loop, so no measurement is completely independent of CPU). One would expect this latency to remain the same regardless of how many processors are chugging away. This is not true because of the "impedance mismatch". A good experiment should list both the memory and CPU. Also, seeing similar behavior (i.e. cache stalls) with one type of CPU running on various types of memory will give some insight into CPU behavior. Is that about right? 

Henry</description>
		<content:encoded><![CDATA[<p>Thanks Noons, I think I&#8217;ve got it now. The memory latency depends on the memory subsystem (though you can&#8217;t measure it without having a processor in the loop, so no measurement is completely independent of CPU). One would expect this latency to remain the same regardless of how many processors are chugging away. This is not true because of the &#8220;impedance mismatch&#8221;. A good experiment should list both the memory and CPU. Also, seeing similar behavior (i.e. cache stalls) with one type of CPU running on various types of memory will give some insight into CPU behavior. Is that about right? </p>
<p>Henry</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Noons</title>
		<link>http://kevinclosson.wordpress.com/2007/01/31/oracle-on-opteron-with-linux-the-numa-angle-part-iv-some-more-about-the-silly-little-benchmark/#comment-2697</link>
		<dc:creator>Noons</dc:creator>
		<pubDate>Mon, 05 Feb 2007 04:10:53 +0000</pubDate>
		<guid isPermaLink="false">http://kevinclosson.wordpress.com/2007/01/31/oracle-on-opteron-with-linux-the-numa-angle-part-iv-some-more-about-the-silly-little-benchmark/#comment-2697</guid>
		<description>@Henry:

No, not at all. Allow me to try and fill in for Kevin:

Memory latency is a result of the architecture and technology used to make actual memory modules.  And as a consequence: the physical connectivity of the memory subsystem built around such modules. 

This is fixed for a particular type or technology of memory module.  

This is what Kevin means when he says: "Memory latency is nothing more than the time it takes to load or store a memory location".  

As in: the time it takes for the memory subsystem - be that a chip or a set of chips or whatever - to store or retrieve a given memory location.  

Totally independent of the architecture of the "core" processor(s).  Hopefully.

Processors may have one technology or architecture to access their native L1 and L2 cache memory - "native" as in residing in the same "chip" and allowing their clock rates to go at top speed  - and yet use a totally different architecture and/or technology when embedded in a practical, operational system, to access that system's main memory, at a much slower rate than the native cache.

It's the potential mismatch - also called "impedance mismatch" in electrical engineering parlance - between the two main "classes" of memory access and their relative speeds and synchronisation that becomes interesting.  And causes the "cache stalls" Kevin talks about.  And how well a given technology addresses this mismatch.

Kevin, please help extricate the foot off my mouth if that's the case.</description>
		<content:encoded><![CDATA[<p>@Henry:</p>
<p>No, not at all. Allow me to try and fill in for Kevin:</p>
<p>Memory latency is a result of the architecture and technology used to make actual memory modules.  And as a consequence: the physical connectivity of the memory subsystem built around such modules. </p>
<p>This is fixed for a particular type or technology of memory module.  </p>
<p>This is what Kevin means when he says: &#8220;Memory latency is nothing more than the time it takes to load or store a memory location&#8221;.  </p>
<p>As in: the time it takes for the memory subsystem - be that a chip or a set of chips or whatever - to store or retrieve a given memory location.  </p>
<p>Totally independent of the architecture of the &#8220;core&#8221; processor(s).  Hopefully.</p>
<p>Processors may have one technology or architecture to access their native L1 and L2 cache memory - &#8220;native&#8221; as in residing in the same &#8220;chip&#8221; and allowing their clock rates to go at top speed  - and yet use a totally different architecture and/or technology when embedded in a practical, operational system, to access that system&#8217;s main memory, at a much slower rate than the native cache.</p>
<p>It&#8217;s the potential mismatch - also called &#8220;impedance mismatch&#8221; in electrical engineering parlance - between the two main &#8220;classes&#8221; of memory access and their relative speeds and synchronisation that becomes interesting.  And causes the &#8220;cache stalls&#8221; Kevin talks about.  And how well a given technology addresses this mismatch.</p>
<p>Kevin, please help extricate the foot off my mouth if that&#8217;s the case.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Henry</title>
		<link>http://kevinclosson.wordpress.com/2007/01/31/oracle-on-opteron-with-linux-the-numa-angle-part-iv-some-more-about-the-silly-little-benchmark/#comment-2571</link>
		<dc:creator>Henry</dc:creator>
		<pubDate>Sun, 04 Feb 2007 00:39:48 +0000</pubDate>
		<guid isPermaLink="false">http://kevinclosson.wordpress.com/2007/01/31/oracle-on-opteron-with-linux-the-numa-angle-part-iv-some-more-about-the-silly-little-benchmark/#comment-2571</guid>
		<description>So is all RAM basically the same and the memory latency is a function of how the different processors satisfy their cores?</description>
		<content:encoded><![CDATA[<p>So is all RAM basically the same and the memory latency is a function of how the different processors satisfy their cores?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: kevinclosson</title>
		<link>http://kevinclosson.wordpress.com/2007/01/31/oracle-on-opteron-with-linux-the-numa-angle-part-iv-some-more-about-the-silly-little-benchmark/#comment-2444</link>
		<dc:creator>kevinclosson</dc:creator>
		<pubDate>Fri, 02 Feb 2007 21:33:12 +0000</pubDate>
		<guid isPermaLink="false">http://kevinclosson.wordpress.com/2007/01/31/oracle-on-opteron-with-linux-the-numa-angle-part-iv-some-more-about-the-silly-little-benchmark/#comment-2444</guid>
		<description>Hi Henry,

  Memory latency is nothing more than the time it takes to load or store a memory location. Latency as a concept doesn't differ based on core count. Different processors will satisfy their cores with varying memory latencies as per their architecture.</description>
		<content:encoded><![CDATA[<p>Hi Henry,</p>
<p>  Memory latency is nothing more than the time it takes to load or store a memory location. Latency as a concept doesn&#8217;t differ based on core count. Different processors will satisfy their cores with varying memory latencies as per their architecture.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Henry</title>
		<link>http://kevinclosson.wordpress.com/2007/01/31/oracle-on-opteron-with-linux-the-numa-angle-part-iv-some-more-about-the-silly-little-benchmark/#comment-2439</link>
		<dc:creator>Henry</dc:creator>
		<pubDate>Fri, 02 Feb 2007 21:18:13 +0000</pubDate>
		<guid isPermaLink="false">http://kevinclosson.wordpress.com/2007/01/31/oracle-on-opteron-with-linux-the-numa-angle-part-iv-some-more-about-the-silly-little-benchmark/#comment-2439</guid>
		<description>Kevin,
Thanks for your blog. The stuff you are covering is where I really feel ignorant and this is a great resource (and one of the few I have found). I am confused about one thing with your SLB test. Its purpose is to analyze memory latency. For each test you document the CPU type (e.g., Opteron, Clovertown). What is the connection between processor and memory latency? How does this differ on single, dual, quad core?

Thanks.</description>
		<content:encoded><![CDATA[<p>Kevin,<br />
Thanks for your blog. The stuff you are covering is where I really feel ignorant and this is a great resource (and one of the few I have found). I am confused about one thing with your SLB test. Its purpose is to analyze memory latency. For each test you document the CPU type (e.g., Opteron, Clovertown). What is the connection between processor and memory latency? How does this differ on single, dual, quad core?</p>
<p>Thanks.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: kevinclosson</title>
		<link>http://kevinclosson.wordpress.com/2007/01/31/oracle-on-opteron-with-linux-the-numa-angle-part-iv-some-more-about-the-silly-little-benchmark/#comment-2225</link>
		<dc:creator>kevinclosson</dc:creator>
		<pubDate>Thu, 01 Feb 2007 01:18:20 +0000</pubDate>
		<guid isPermaLink="false">http://kevinclosson.wordpress.com/2007/01/31/oracle-on-opteron-with-linux-the-numa-angle-part-iv-some-more-about-the-silly-little-benchmark/#comment-2225</guid>
		<description>You got it, Noons... of course you can download at anytime ... I'm trying to get my hands on a System x 3950 right now...I've still got contacts at IBM :-)</description>
		<content:encoded><![CDATA[<p>You got it, Noons&#8230; of course you can download at anytime &#8230; I&#8217;m trying to get my hands on a System x 3950 right now&#8230;I&#8217;ve still got contacts at IBM <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Noons</title>
		<link>http://kevinclosson.wordpress.com/2007/01/31/oracle-on-opteron-with-linux-the-numa-angle-part-iv-some-more-about-the-silly-little-benchmark/#comment-2221</link>
		<dc:creator>Noons</dc:creator>
		<pubDate>Thu, 01 Feb 2007 00:42:47 +0000</pubDate>
		<guid isPermaLink="false">http://kevinclosson.wordpress.com/2007/01/31/oracle-on-opteron-with-linux-the-numa-angle-part-iv-some-more-about-the-silly-little-benchmark/#comment-2221</guid>
		<description>Awesome stuff!  Unreal, the change from 3 to 4 memhammer!
I'm dying to get my hands on some AIX gear in my new job: will definitely give this a try.  Please do keep the kit available for a while.</description>
		<content:encoded><![CDATA[<p>Awesome stuff!  Unreal, the change from 3 to 4 memhammer!<br />
I&#8217;m dying to get my hands on some AIX gear in my new job: will definitely give this a try.  Please do keep the kit available for a while.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
