Why indexing is important for good performance

@Doc
Access has a limit of 10 indexes per table (and a PK counts as one).

Whilst agreeing with the general idea behind your post, that quote isn't quite correct. These appear to be the correct values:
Number of indexes in a table: 32
Number of fields in an index: 10

See:
https://support.office.com/en-us/article/access-specifications-0cf3c66f-9cf2-4e32-9568-98c1025bb47c
https://www.databasezone.com/techdocs/acclimit.html

EDIT:
The following is also an illustration of when indexing does not help.

I recently did a test on another forum to compare the time taken to update a large number of records (2.6 million) where around 20% of the records were null.
The comparison was between using Len(fieldname)=0 and Nz(fieldname,"")="" to check for nulls.
The prediction by a forum user was that Len would be faster as VBA handles numbers more efficiently than strings.
That was not the case: Nz was about 10% faster IN THIS TEST.

I also decided to compare the times taken when the field was indexed compared to unindexed.
Indexing INCREASED the time to do the update from around 40s to 60s.

So whilst indexing may speed up SEARCHES significantly, UPDATES will in many cases be MUCH slower.
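A minimal sketch of that kind of timing test, for anyone who wants to repeat it (table and field names here are made up; note the Len criterion concatenates an empty string so that Nulls come back as zero-length):

Code:
' Rough timing harness - tblData, TextField and Flag are made-up names.
Sub CompareNullChecks()
    Dim db As DAO.Database
    Dim t As Single

    Set db = CurrentDb

    t = Timer
    ' Len on its own returns Null for a Null field, hence the & ''
    db.Execute "UPDATE tblData SET Flag = True " & _
               "WHERE Len(TextField & '') = 0", dbFailOnError
    Debug.Print "Len version:", Timer - t, "seconds"

    t = Timer
    db.Execute "UPDATE tblData SET Flag = False " & _
               "WHERE Nz(TextField, '') = ''", dbFailOnError
    Debug.Print "Nz version:", Timer - t, "seconds"
End Sub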
 
Used to be 10. If it is 32 now, even better - but still a limited resource and still has overhead when you have a lot of records AND a lot of indexes.
 
Yes, I agree. In fact I'd just finished adding an example to illustrate the same point.
 
If the answer is "50% of the number of records" then the answer is "Don't do it."
To go back to my original thread - I did say this

There is also little point in indexing fields which have few distinct values (like Booleans) or contain a lot of nulls because the index itself will not be that efficient (although with Access you can set the index to ignore nulls). Go back to the library analogy - how useful is it to have pages of 'Yes' followed by pages of 'No'?
One of the other things to improve index performance, when relevant, is to ignore nulls. So if you have a table with a field populated say 25% of the time, and the rest of the time it is null - ignore them. Index update performance is improved because the null index 'record' is not created and the index 'table' is smaller.
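For reference, Jet/ACE DDL can create such an index directly. A minimal sketch, with made-up table/field names:

Code:
' Create an index that excludes Nulls - tblSales and Region are made-up names.
Sub CreateIgnoreNullIndex()
    ' WITH IGNORE NULL tells the engine not to create index entries for Null
    ' values, keeping the index smaller and cheaper to maintain.
    CurrentDb.Execute "CREATE INDEX idxRegion ON tblSales (Region) WITH IGNORE NULL", dbFailOnError
End Sub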

@NG
One of my main tables has 18k records and it takes over 20 min to get the recordcount you mentioned (PK is Autonumber). However, I can open a form that uses a query that returns ALL of those records with additional fields from related tables and a subform - and the recordcount is instantaneous.
I'm in speculation mode so I can't offer an explanation, except that opening a query will potentially make use of an index whilst opening a table will not. There are better people than me who can provide a more considered answer.

Having said that, I have noticed that if a table(def) is saved with sort and/or filter properties populated (together with order on load/filter on load set to yes) it can take longer to open. Not sure if this translates across to SQL Server, but a query accessing an Access table would ignore those properties.
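If you want to check whether a table is carrying those saved settings, a minimal sketch (the properties only exist on the TableDef once Access has created them, so an untouched table simply won't show them):

Code:
' List any saved sort/filter settings on a table.
Sub ShowTableFilterProps(tableName As String)
    Dim tdf As DAO.TableDef
    Dim prp As DAO.Property

    Set tdf = CurrentDb.TableDefs(tableName)
    For Each prp In tdf.Properties
        Select Case prp.Name
            Case "Filter", "OrderBy", "FilterOnLoad", "OrderByOnLoad"
                Debug.Print prp.Name & " = " & prp.Value
        End Select
    Next prp
End Sub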

The standard advice is that forms/reports should use a query as a recordsource, not a table - even if the query returns the entire table. I've not really considered it but it may be this advice is based on the above theory.

Suggest two experiments. Open the query as a query, not as a form recordsource. Open a form with a table as a recordsource. Compare the results.

The other experiment is on your existing form: once opened, click on the 'go to last record' option. Does it go straight there? It may be that the ODBC driver is returning a recordcount at the time the recordset is opened - as happens with ADO. To quote the ADO documentation:
https://docs.microsoft.com/en-us/sq...recordcount-property-ado?view=sql-server-2017
Remarks

Use the RecordCount property to find out how many records are in a Recordset object. The property returns -1 when ADO cannot determine the number of records or if the provider or cursor type does not support RecordCount. Reading the RecordCount property on a closed Recordset causes an error.
Bookmarks or approximate positioning

If the Recordset object does support either bookmarks or approximate positioning, this property returns the exact number of records in the Recordset. This property returns the exact number regardless of whether the Recordset has been fully populated.
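A minimal sketch of that behaviour in VBA/ADO (connection string and table name are placeholders; needs a reference to Microsoft ActiveX Data Objects):

Code:
Sub ShowRecordCount()
    Dim cn As New ADODB.Connection
    Dim rs As New ADODB.Recordset

    cn.Open "Provider=SQLOLEDB;Data Source=MyServer;Initial Catalog=MyDb;Integrated Security=SSPI;"

    ' Forward-only cursor: no bookmarks or approximate positioning, so -1.
    rs.Open "SELECT * FROM dbo.MyTable", cn, adOpenForwardOnly, adLockReadOnly
    Debug.Print "Forward-only:", rs.RecordCount
    rs.Close

    ' Static cursor supports bookmarks, so an exact count is returned.
    rs.Open "SELECT * FROM dbo.MyTable", cn, adOpenStatic, adLockReadOnly
    Debug.Print "Static cursor:", rs.RecordCount
    rs.Close

    cn.Close
End Sub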
 
I wish this were a SQL Server thread, because I know so much more about how it handles this than about how JET/ACE handles it.

In any case, if anyone's curious, I thought I'd throw out a mention to the book SQL Performance Explained by Marcus Winand. It's not for Access (in fact, not for any specific DB, though he gives examples in MSSQL, PostgreSQL, MySQL and Oracle), but it covers pretty much how every RDBMS engine works, and Access is close enough for it to be very helpful.

It's definitely on the list of the most enlightening technical books I've ever read.

While I prefer the hardcopy myself, Marcus does have a website (https://use-the-index-luke.com/), which contains all of the book content online.

Good stuff, thanks for the post CJ_London.
 
still has overhead when you have a lot of records AND a lot of indexes
Obviously it requires more effort, but on occasion, with a very large number of inserts or updates, I've had to write code to remove indexing, do the insert/update, then reapply the indexing.
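In outline, it's nothing more than this (table, field and index names are made up):

Code:
Sub BulkUpdateWithoutIndex()
    Dim db As DAO.Database
    Set db = CurrentDb

    ' Remove the index so it isn't maintained row by row during the update.
    db.Execute "DROP INDEX idxRegion ON tblSales", dbFailOnError

    ' Do the heavy insert/update.
    db.Execute "UPDATE tblSales SET Region = 'UNKNOWN' WHERE Region IS NULL", dbFailOnError

    ' Rebuild the index in a single pass afterwards.
    db.Execute "CREATE INDEX idxRegion ON tblSales (Region) WITH IGNORE NULL", dbFailOnError
End Sub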
 
Thanks Jack - good link as well.

Interested in the TOP N part of the publication. Pipelining is not available in Access, but I have found that in order to find the TOP N, the entire dataset (i.e. after criteria has been applied) needs to be sorted on some basis to determine which records are the top 10 (otherwise all you are returning is the 'first' 10) - which can be slow with even medium-sized datasets.

I did have a thought, though I've not had the opportunity to try it yet, about using the Partition function to split, say, sales into groups (or perhaps use an aggregate query which doesn't aggregate but groups on a similar basis to Partition) and using criteria on that to limit the dataset further before applying TOP N.

Edit: I'm talking about something a bit more sophisticated than a simple >=somevalue criteria.
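A sketch of the sort of thing I mean (untried, names are made up, and the band boundaries would need tuning to the data):

Code:
Sub TopNWithPrefilter()
    Dim db As DAO.Database
    Dim rs As DAO.Recordset
    Set db = CurrentDb

    ' Exploratory step: count rows per 1,000-wide band of SaleAmount.
    Set rs = db.OpenRecordset( _
        "SELECT Partition(SaleAmount, 0, 100000, 1000) AS Band, Count(*) AS Cnt " & _
        "FROM tblSales GROUP BY Partition(SaleAmount, 0, 100000, 1000)", dbOpenSnapshot)
    ' ...inspect the bands to pick a cut-off that leaves comfortably more than N rows...
    rs.Close

    ' TOP N applied to the pre-filtered subset only; the tie-break on SaleID
    ' keeps the result deterministic.
    Set rs = db.OpenRecordset( _
        "SELECT TOP 10 SaleID, SaleAmount FROM tblSales " & _
        "WHERE SaleAmount >= 99000 " & _
        "ORDER BY SaleAmount DESC, SaleID", dbOpenSnapshot)
    Do While Not rs.EOF
        Debug.Print rs!SaleID, rs!SaleAmount
        rs.MoveNext
    Loop
    rs.Close
End Sub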
 
...about using the Partition function to split, say, sales into groups (or perhaps use an aggregate query which doesn't aggregate but groups on a similar basis to Partition) and using criteria on that to limit the dataset further before applying TOP N...

I often find it very difficult to know how the engine will process this, and deciphering execution plans was never a strong point of mine, so that doesn't help either.

I recall years ago in Access saving one query to use as the "base" and querying another on top of that, only to run into trouble with the order. I had assumed that the "base" query would be calculated first, then the "secondary" query applied to the base results... turns out that's not the case and the engine (any of them, AFAIK) will take great liberties in looking at the entire mess and working out what it thinks is the best way. In this case the "base" query was supposed to filter out nulls which would have tripped up the second query, but I kept running into null processing errors in the second query. Anyway, that's when I learned that sometimes dumping something into a temp table really IS the way to go.

These days I work with SQL Server, which has much better tooling for this stuff (though I suspect PARTITION functions and window functions in general are in the category of "use wisely for performance considerations").

I should dig that book out again, haven't been through it in a while and it's one of those that I can probably re-read once a year for the next ten years and still pick up nice tidbits each time.

Cheers,
 
CJ_London said:
The standard advice is that forms/reports should use a query as a recordsource, not a table - even if the query returns the entire table. I've not really considered it but it may be this advice is based on the above theory.

As a matter of fact, I ran into another reason why this was a good idea. I never did track down WHY it was this way, but when I had a split DB with UNC links rather than drive-letter links to the BE, and I had a form that opened a table as its record source, I kept on getting a pop-up box asking for the location of the BE tables. Happened once per form launch. When I built a single-table, returns-all-fields, no-filter query and drove the form from the query, everything was quiet as a mouse, smooth as silk, not even a hiccup. I tried to find out why but never found a clear cause. I eventually gave up and yielded to the inevitable.
 
To give you some idea of how efficient indexes are - it takes a maximum of 8 physical index reads to retrieve any record in a million row table. Doubling the row count only increases the maximum reads by 1. Reading an index is a binary action. The query engine keeps statistics so it can easily find the midpoint. Depending on whether the ID is higher or lower, the engine then finds the midpoint of the top or bottom half. It takes 8 splits for a million row table. 9 for 2 million, 10 for 4 million, etc.

One of my clients had a 70 million row master file and, as a matter of course, they always wrote their programs to read the master file sequentially and match it against the transaction table. I proved to them that as long as the transaction file was less than half a million rows (it was usually less than 75,000), it was more efficient to read the 500,000 rows randomly, based on a sequential reading of the transaction file, than to do the two-file match reading 70 million rows in one table and 500,000 in the other.

Therefore, since most of the time the BEs of my applications are SQL Server or some other RDBMS (whatever the client uses), all of my forms are bound to queries with criteria. When a user wants to go to a new record, he enters the criteria, and the form is requeried to get the requested record. So, the first thing to get rid of when you think your app might need to be upsized at some point is filters. Using filters requires bringing the entire server side table or query recordset down from the server to memory on the local PC where Access manages the filtering. The nice thing about utilizing good client/server techniques is that they work just fine for Jet/ACE databases also.
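A minimal sketch of that pattern (form, control and table names are made up) - an unbound search box whose button swaps in a one-record RecordSource rather than filtering:

Code:
Private Sub cmdFind_Click()
    If IsNull(Me.txtCustomerID) Then Exit Sub

    ' Only the matching row comes back from the server, which can use its
    ' index on CustomerID; assigning the RecordSource requeries the form.
    Me.RecordSource = "SELECT * FROM tblCustomers " & _
                      "WHERE CustomerID = " & Me.txtCustomerID
End Sub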
 
Using filters requires bringing the entire server side table or query recordset down from the server to memory on the local PC where Access manages the filtering.
I think this is where the Access templates lead newbies in the wrong direction. 9 times out of 10, the template forms are based on tables - and of course the subform construct relies on filtering when you change the mainform record.

Shame Access doesn't have a 'mode' property to allow users to make a choice. Simplistically: 'professional' - tables can only be viewed via query, forms/reports can only have a query recordsource, field names do not have spaces or use reserved words, etc. - or 'just messing' - do what you like, but expect plenty of issues along the way.

In professional mode the query builder could be improved to allow a greater variety of joins (although I suppose the argument would then be don't use the query builder).
:D:D:D
 
Shame Access doesn't have a 'mode' property to allow users to make a choice. Simplistically: 'professional' - tables can only be viewed via query, forms/reports can only have a query recordsource, field names do not have spaces or use reserved words, etc. - or 'just messing' - do what you like, but expect plenty of issues along the way.

In professional mode the query builder could be improved to allow a greater variety of joins (although I suppose the argument would then be don't use the query builder).
:D:D:D

What a great idea. One for the Access User Voice?
Mind you, I expect the chances of it being implemented to be close to zero, as that would indicate that the current system is flawed.
 
Using filters requires bringing the entire server side table or query recordset down from the server to memory on the local PC where Access manages the filtering

A point I took to heart when we migrated to SQL Server. However, the clients/users were used to clicking the 'filtered' indicator in the record selector portion of the split form (not my decision) to toggle from open records (about 600) to all records (18k, about 4 years of history) to make research easier. I had built different views on the server side just in case the performance became an issue, but much to my surprise (and their pleasure; they HATE change) the performance hit was almost imperceptible... even at our remotest location, Bahrain. Pretty impressive considering the server is in Naples, Italy.

As much as it bothers me that our application isn’t fully optimized, I am thankful that Access and SQL Server work so well together.
 
What a great idea. One for the Access User Voice?
Mind you, I expect the chances of it being implemented to be close to zero, as that would indicate that the current system is flawed.

We fought for years to have them supply basic templates that followed some sort of best practices (which I don't know if they ever did, actually), let alone actually enforce anything within the program itself.

Nice thought, and I'd give it a vote, but I'd be less surprised by a talking penguin, I think, if they actually did it.
 
I think this is where the Access templates lead newbies in the wrong direction. 9 times out of 10, the template forms are based on tables - and of course the subform construct relies on filtering when you change the mainform record.

Yep, there's a reason that in my current apps, the subforms always resolve to a single record based on the value in the parent form, rather than using the Parent/Child links.

When you're doing this on a table with 1.6 billion records, the normal filtering mechanic just isn't remotely acceptable.
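A minimal sketch of that wiring (names are made up): the parent form's Current event hands the subform a recordsource restricted to the current parent key, instead of letting the Link Master/Child Fields filter a huge recordset:

Code:
Private Sub Form_Current()
    ' Runs on the parent form; subClaimLines is the subform control.
    Me.subClaimLines.Form.RecordSource = _
        "SELECT * FROM tblClaimLines WHERE ClaimID = " & Nz(Me.ClaimID, 0)
End Sub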
 
1.6 billion!?! US billion (10^9) or UK billion (10^12)?
Either way, that's a lot - presumably a SQL database, as I'm sure it would be more than 2GB.

In one of my apps I have a table of 2.6 million postcodes.
Searching is obviously impossible using a combo box, so to make it manageable, I break it down using 5 cascading combos, each of which is fast:

[Attachment: PostcodeBuilder.PNG - screenshot of the cascading postcode builder form]
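The wiring behind each step is roughly this (control, table and field names are illustrative only) - every combo just gets a RowSource filtered by the selection above it:

Code:
Private Sub cboArea_AfterUpdate()
    ' District list is restricted to the chosen area.
    Me.cboDistrict.RowSource = _
        "SELECT District FROM tblDistricts " & _
        "WHERE Area = '" & Me.cboArea & "' ORDER BY District"
    Me.cboDistrict = Null
End Sub

Private Sub cboDistrict_AfterUpdate()
    ' Sector list is restricted to the chosen district.
    Me.cboSector.RowSource = _
        "SELECT Sector FROM tblSectors " & _
        "WHERE District = '" & Me.cboDistrict & "' ORDER BY Sector"
    Me.cboSector = Null
End Sub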


Do you have a different solution?
 

Yeah, 1.6 billion. American billion, thank God. And we add a couple million more every week. That table alone is somewhere on the far side of 1 TB in size.

Keep in mind that I work for one of the largest health insurance organizations in the US. My particular subset only handles Michigan, and we still have 4.5 million members in Michigan and 1.5 million more in other states. For the record, I actually use a view covering all claim lines* received in 2010-2012, 2012-2014, and 2014 to the current day (which is the 1.6 billion record table - the others are smaller).

*A claim line is a single action on an insurance claim. If you go to the doctor here, the claim is your visit, and the lines could include your service fee for walking in the door, 3 lines for different blood tests, a line for an x-ray, and a line for a treatment provided on-site. If you wind up in the hospital for something complicated and stay there for several weeks, the resulting claim could be THOUSANDS of lines. When I was hospitalized for two days for simple Cellulitis, my claim was still over 50 lines.

I assume your screenshot isn't the normal entry, but instead is used to find a postal code if you don't already know it. Given that constraint, I'd have probably done the exact same thing (except that I don't know how to add the Google Maps shot showing the location). Speed-wise, I don't think you'll find a faster solution than you used.

Another option I found on this very site (Paul Brown, maybe? Maybe Mr. Baldy?) was to have a combo box that doesn't actually pull a list until a certain number of characters are entered, in order to cut down on the recordset size. I'm not entirely sure how even that would perform against millions of records, though, and I really doubt it would match the performance of what you showed here.
 
That's certainly a big file....
The postcodes BE file is 1.6GB with almost all of that in one table but it's only updated each quarter.

The normal postcode entry is a textbox.
The builder is indeed when you aren't sure what to enter.
Each combo is based on a separate table filtered for the previous selection which makes it very fast.

Allen Browne had a combo that behaved as you described. I tried the idea for this setup but found it utterly unusable
 
We have a dedicated virtual server with something around 12 or 15 TB of space. That table and its archives are far and away the biggest tables we have, but they're also the core of everything my department does. We update it every Monday night, and it feeds something like 20 different in-house applications.

And it doesn't even include prescriptions! That's another table entirely.

(In a nutshell, my department tracks down over- and under-payments and gets them corrected, as well as dealing with the administrative nightmare that is Medicare and determining whether Medicare pays for something or we do. Last year we actually recovered $87 million that had been over- or incorrectly-paid.)

I had a feeling that your postal code table would break that dynamic combo box. ;)
 
Another option I found on this very site (Paul Brown, maybe? Maybe Mr. Baldy?) was to have a combo box that doesn't actually pull a list until a certain number of characters are entered,
It's a technique I use - the requirement is to enter 3 chars (or wait 1/2 second on the first 2) - and I don't use an initial *; I train the user to use it if required.

You can also enter multiple entries separated by a semicolon, i.e. an entry of smi;bro will return all names beginning with smi or bro.

Simplistically/statistically, even entering one character reduces the returned recordset by 96%.

I have a client with circa 3m records - time to return a recordset is generally less than a second but is based on a simple search of one field or another.
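In outline, the combo code looks something like this (control, table and field names are illustrative; the 'wait 1/2 second on 2 chars' timer handling is left out to keep it short, and it works best with Auto Expand set to No):

Code:
Private Sub cboName_Change()
    Dim parts() As String
    Dim strWhere As String
    Dim i As Long

    If Len(Me.cboName.Text) < 3 Then
        Me.cboName.RowSource = ""          ' nothing to show yet
        Exit Sub
    End If

    ' "smi;bro" becomes: Surname Like 'smi*' OR Surname Like 'bro*'
    parts = Split(Me.cboName.Text, ";")
    For i = LBound(parts) To UBound(parts)
        If Len(Trim(parts(i))) > 0 Then
            If Len(strWhere) > 0 Then strWhere = strWhere & " OR "
            strWhere = strWhere & "Surname Like '" & Trim(parts(i)) & "*'"
        End If
    Next i

    Me.cboName.RowSource = _
        "SELECT SurnameID, Surname FROM tblNames " & _
        "WHERE " & strWhere & " ORDER BY Surname"
    Me.cboName.Dropdown
End Sub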
 
