CJ London's posts inspired me. Those of you who know me know this could be a good thing or a bad thing. I'll try to keep it straightforward.
To understand and properly analyze whether you are getting good performance out of a database operation, you have to understand how it works. Access (the program) takes your query and analyzes its requirements. This analysis occurs when the QueryDef for your query is created. Dynamically created queries work fine but might not be pre-optimized, so the first helpful thing to know is that if you have to build a query around some variable, a "Parameter Query" approach gives you the best of both worlds: a stored Parameter Query can be optimized yet still have variables fed to it at run time.
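As a hedged illustration of that last point, here is one common DAO pattern for feeding a value to a stored Parameter Query from VBA. The query name qryRaisePrices and the parameter prmRegion are invented for this sketch; it assumes the saved query is an action query that declares a prmRegion parameter.

Public Sub RunStoredParameterQuery()
    ' Sketch only: assumes a saved action query "qryRaisePrices" with a parameter [prmRegion].
    Dim db As DAO.Database
    Dim qdf As DAO.QueryDef

    Set db = CurrentDb()
    Set qdf = db.QueryDefs("qryRaisePrices")   ' stored QueryDef, already analyzed/optimized
    qdf.Parameters("prmRegion") = "West"       ' feed the variable at run time
    qdf.Execute dbFailOnError                  ' run it as an action query

    Set qdf = Nothing
    Set db = Nothing
End Sub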
Factors going into the analysis and optimization of a query include whether indexes exist for the table(s) and whether you have single-table or multi-table operations (JOIN or UNION, for example).
The nature of the query also is a factor. SELECT only has to touch its sources once for reading content to form the final recordset, but a SELECT query doesn't DO anything without interacting with another element of Access that will impose speed issues. For instance, SELECT feeds datasheet views, reports, and forms - all of which have formatting delays that will have far greater effect on performance than the raw speed of the query itself. An isolated SELECT query is therefore usually part of some other transformative method for data inspection, and that other method will be the speed driver.
INSERT INTO with a SELECT clause as the source of the insertion touches data twice - once for the SELECT and once for the INSERT. It doesn't return a recordset so doesn't have GUI/print issues.
UPDATE queries also touch twice - once to read and once to write back. Again, no returned recordset, so no GUI/print issues.
DELETE queries have to touch all of the extant records twice, too - once to find the record, once to mark it as DELETED. (If you doubt this, open a query on a table, then delete some of its records in another window, either manually or with a DELETE query, and watch the rows you are looking at change to #DELETED#.) Again, no GUI/print issues because no returned recordsets.
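For reference, the three action-query shapes described above look roughly like this when run from VBA. The table and field names (tblOrders, tblArchive, CustID, Amount, OrderDate, Region) are invented for illustration only.

Public Sub RunActionQueryExamples()
    Dim db As DAO.Database
    Set db = CurrentDb()

    ' INSERT INTO ... SELECT: reads the source rows once, then writes the new rows once
    db.Execute "INSERT INTO tblArchive (CustID, Amount) " & _
               "SELECT CustID, Amount FROM tblOrders WHERE OrderDate < #1/1/2020#;", dbFailOnError

    ' UPDATE: reads each affected row, then writes it back
    db.Execute "UPDATE tblOrders SET Amount = Amount * 1.05 " & _
               "WHERE Region = 'West';", dbFailOnError

    ' DELETE: finds each affected row, then marks it as deleted
    db.Execute "DELETE FROM tblOrders WHERE OrderDate < #1/1/2015#;", dbFailOnError

    Set db = Nothing
End Sub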
So let's analyze a simple case where the speed of the GUI/print elements is not a factor. Let's see how to compute how fast something SHOULD happen. Let's focus on an UPDATE query.
Suppose that your update is going to touch every record in a table for which the average record size is about 128 bytes and you have 4096 records. (You can use the Database Documenter to estimate record size by adding up the field sizes for a given table.)
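If you would rather not add up the Documenter numbers by hand, a rough DAO sketch like this totals the declared field sizes for one table. The table name tblOrders is a placeholder, and the result is only approximate (Text fields report a character count, and Memo/attachment fields are not counted meaningfully).

Public Sub EstimateRecordSize()
    ' Sketch: rough record-size estimate by summing DAO field sizes for one table.
    Dim db As DAO.Database
    Dim fld As DAO.Field
    Dim lngBytes As Long

    Set db = CurrentDb()
    For Each fld In db.TableDefs("tblOrders").Fields
        lngBytes = lngBytes + fld.Size   ' bytes for numeric types, characters for Text
    Next fld

    Debug.Print "Approximate record size: " & lngBytes & " bytes"
    Set db = Nothing
End Sub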
Assume that you have a modern system with a 15 kRPM disk which means you have about 250 RPS on that disk. We will also assume you have enough physical memory that paging or swapping will not be an issue. Given most off-the-shelf systems these days, that should not be a tough assumption.
We will ALSO assume you compacted and repaired your database and that you recently defragged your disk. (Both are important - discussion later.)
Important fact: Access does things in chunks of disk buffers, 4 kbytes each. With 128-byte records, you can fit 32 records per buffer. You have 4096 records, so you will need 128 disk reads (and since this is an UPDATE, 128 disk writes). For a 250 RPS disk, rotational latency will be 1/2 of the time for one revolution, or 2 ms. To do 256 total disk operations you will need 512 ms, or a little over 1/2 second, for the optimum case.
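Here is that arithmetic written out as a quick VBA sketch so you can plug in your own numbers; the constants are just the assumptions from this example.

Public Sub EstimateMinimumUpdateTime()
    ' Sketch: best-case time for an UPDATE touching every record, per the assumptions above.
    Const BUFFER_BYTES As Long = 4096    ' Jet/ACE disk buffer size
    Const RECORD_BYTES As Long = 128     ' average record size (assumption)
    Const RECORD_COUNT As Long = 4096    ' records touched (assumption)
    Const DISK_RPS As Long = 250         ' 15,000 RPM / 60 seconds

    Dim recsPerBuffer As Long, bufferOps As Long
    Dim latencyMs As Double, totalMs As Double

    recsPerBuffer = BUFFER_BYTES \ RECORD_BYTES      ' 32 records per buffer
    bufferOps = (RECORD_COUNT \ recsPerBuffer) * 2   ' 128 reads + 128 writes = 256 operations
    latencyMs = (1000 / DISK_RPS) / 2                ' half a revolution = 2 ms
    totalMs = bufferOps * latencyMs                  ' 256 x 2 ms = 512 ms

    Debug.Print "Best-case UPDATE time: " & totalMs & " ms"
End Sub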
You might ask why I'm not adding in compute time here. Windows uses a device driver that operates in parallel with other programs, so after the first buffer is fetched, modified, and written back, Access and Jet/Ace can overlap their current operations with the write-back of the buffer just updated. Therefore, the instruction overhead is hidden behind the longer delay of disk readiness for the next read or write operation.
In summary on these calculations, you can never do an UPDATE of 4096 records of 128-byte length faster than about 1/2 second under the given conditions. If you see something significantly slower than that, you know that you have missed something. (If you see faster, the odds are that you have a faster disk, as the disk is the biggest contributor to this computation.)
I said to defrag the disk and then compact & repair the DB. The action of a C&R is to create a new, empty database and then to read each table, one at a time, to transfer records from the old copy to the new. This has the effect of linearizing the tables and making them contiguous, which minimizes disk movement (head-seek latency). It also improves the odds that you need ONLY 128 buffer-loads to manage the entire 4096-record table.
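If you ever want to script that step (for example against a closed back-end file), DAO exposes it directly. A minimal sketch with placeholder paths:

Public Sub CompactBackEnd()
    ' Sketch: compact a closed back-end file into a new copy. Paths are placeholders.
    Const SRC As String = "C:\Data\MyBackEnd.accdb"
    Const DST As String = "C:\Data\MyBackEnd_Compacted.accdb"

    DBEngine.CompactDatabase SRC, DST   ' rewrites each table contiguously into the new file
    ' After confirming DST opens correctly, rename it over SRC yourself.
End Sub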
If your database has a lot of "churn" factor (addition and deletion of records), you increase the likelihood that you need more disk buffers to manage the table because it will become scattered within the DB file. That is, a disk buffer might not be 100% dedicated to your table - other tables might be there; so might some deleted records. So your fragmented database detracts from performance.
If your DISK has a lot of "churn" factor as well, don't forget that databases have this thing called "bloat." That is, they tend to grow in size. On a fragmented disk, the appended disk blocks are not necessarily contiguous with the original DB file. This increases the need for more distant seek-arm movement.
Therefore, when folks hear us say "have you compacted & repaired your DB file" or "have you recently defragged your disk," we are talking about addressing the physical factors that detract from performance for any database.
What I just did was show you how a system analyst looks at performance, breaking it down into contributors to that performance and the expected size of their contributions.
CJ London's notes talk about ways to optimize your database layout and to choose the optimum kinds of operation (query vs. recordset/loop, etc.). This note should be taken as a way to know what the proper expectation for performance is.