Wednesday, August 13, 2008

Effect of Multiple Table Scan

Let's assume that we have the following scenario happening in a database server (also assume that this is the only thing happening in it):

5 table scans started at the same time, hitting different databases but the table structure and number of rows are all identical (roughly 200 million rows).

When I do a mysqladmin extend -r -i 10 | grep Handler_read_rnd_next

It shows that this variable is being updated. The question is since there are 5 threads running, are all of them hitting the Handler* subroutine one after the other? If so, would this slow down the whole table scanning process?

I tried to test this by running 1 table scan and it took about 1 hour to finish. And if I try to run 2 table scans together, it takes about 4 1/2 hours to finish.


Roland Bouman said...


I should think that competition for hardware resources is the most likely source of the relative slowdown, not so much the fact that the same routine is called.

In this particular case there are two things I am thinking of:
- disk. If all data files are on the same disk the threads are fighting to read 'their' part of the disk. Sequentially scanning multiple disks in parallel should be faster.
- cpu. If each thread has its own cpu, it can go about its business uninterupted, not losing any time in switching between threads.

How did you conduct your test? Did threads have to share a cpu? Did the tables share a disk?

Bichonfrise74 said...

Hi Roland,

I also thought it was a hardware issue but I am not seeing enough evidence to support it.

vmstat -> shows that i/o wait is just at 10-20%, cpu usage is at 25-30%, and memory is 20% for each thread.

The threads are running their own cpu and the tables are on separate disks.

It is possible that I'm missing something when I'm trying to monitor the table scans but I just want to eliminate the possibility that the slowdown is caused by the Handler routine being hit a lot.

Just a side note, if data files are on the same disk and threads are fighting to read their part on the disk, should the I/O wait in vmstat show a high value?