Back in 2003, I published the results of experimenting with various values of _spin_count to improve throughput on latch congested systems. You can find the original paper here.
The study involved varying spin count on an Oracle 8.1.6 database suffering from heavy cache buffer chains latch contention. The database was running on a Windows 2000 server. The results of this study confirmed that _spin_count can be very effective in improving the throughput of latch contention-bound systems. In the test, throughput almost doubled when spin_count was raised from the default of 2000 to 10000.
I recently repeated these tests for Oracle 11g. This time, the database was experiencing shared pool and library cache latch contention and was running on RHEL 5.0. The results will be published soon in my contention series at ToadWorld. To summarize however, increasing spin_count in 11g was equally effective in reducing latch contention. As _spin_count increased, latch waits reduced and throughput increased, up until CPU saturated, at which point no further improvements were achieved.
This chart summarizes typical results:
In this case, the optimal value for spin_count was probably about 8000. I did these tests on a variety of systems and workloads, and almost always the default spin_count of 2000 was lower than optimal
In the 2003 tests, increasing spin count beyond the point at which CPU saturated resulted in latch waits continuing to decrease, but waits for the CPU rendering those reductions useless. In these tests, after CPU saturated latch free waits stopped reducing. I presume that this is because the session was pre-empted by the OS - effectively stopping it's spin - so that when CPU gets stressed excessive _spin_count values are effectively dishonored. In 2003 trials it looked like the sessions kept spinning, creating large run queues. The difference may be due to the scheduling differences between the Windows and Linux kernels, or it might be a difference between 8i and 11g; I'm guessing the former, but I need to do some more research when I get the chance.
I should point out that Spotlight on Oracle, the diagnostic tool i developed at Quest software, has a latch tuning module that will try and discover the most optimal value for spin_count. You might like to try it out.
Do you have some numbers concerning the sleep / spin ratio with different values of _spin_count you tried ?
The question behind this is, what is a good value for this ratio ? Is it a good idea to increase _spin_count if 1% / 0.5% / 0.1% of misses induce a sleep ?
Posted by: Colin Pitrat | March 12, 2009 at 01:26 AM
I think every system will be different. The chance that increasing the spin count will increase the hit rate is going to be highly dependent on the type of latch and the workload. My approach when we implemented a solution in Spotlight on Oracle (www.quest.com/spotlight) was to measure wait times and as we adjusted the spin count until we found the spin count that resulted in the least waits and the highest throughput.
Posted by: Guy Harrison | March 26, 2009 at 10:26 AM
i like this part of the blog:"In this case, the optimal value for spin_count was probably about 8000. I did these tests on a variety of systems and workloads, and almost always the default spin_count of 2000 was lower than optimal" is very good
Posted by: buy viagra | April 27, 2010 at 06:51 AM