Indexing is essential for the efficient retrieval of data from a db.
Some may be concerned about the additional disk space required for the index(es) - but disk space is cheap and indexes do not take up that much room. For a small db - a few thousand records per table and simple queries - it is probably not noticeable, but with more records and more complex queries you will start to notice performance issues without indexing.
A simple rule of thumb is to open the table and look at the record count at the bottom. If it is not populated instantly, you need indexing, because the time it takes to populate is about the time it takes to do a sequential search.
Think of the table as a library of books all stored in a random order on shelves (which is how a table stores its records - they are not stored in the order you enter them). You don't have an index, so to find a particular book you go along the shelves until you find the one you want - sometimes you are lucky and it is on the first shelf, sometimes not and it is on the last shelf - and you may still need to go along all the shelves if looking for all books by a particular author.
Now the librarian realises this takes too long, so they create an index - a piece of paper which contains, say, the book title and the aisle and shelf the book is stored on, maintained in alphabetical order of book title. They may create another one based on the author. Now you can scan down the piece of paper, find the book you want and go straight to the shelf. That is what an index is - a separate 'object' containing the value of a field, maintained in order, with a pointer to that record in the table.
To understand why indexing is so efficient, you need to understand how computers read from the data source. Simplistically, in the case of a disk, the system 'reads' a block of data (i.e. a page of the librarian's index) - typically something like 4KB - looks for what is required and, if it doesn't find it, reads the next block. If it is reading blocks of whole records (for a sequential search) it might pick up say 100 records per block, but if indexed it might pick up 1,000 index entries (because index entries are smaller), so it will find your record 10 times quicker. And with indexing algorithms it will have a better idea of which block to read next - in the analogy above, you look at the first page (books starting with A), but since you are looking for a book starting with Z, you know to go to the last page and work backwards.
Indexing does have a time overhead when a record is inserted or deleted, or the indexed field is changed, because the index needs to be updated to maintain its order. But this cost more than pays for itself when you want to find that record again.
However, there is no point in indexing for the sake of it - just index those fields you are going to join on or regularly sort and/or filter on. There is also little point in indexing fields which have few distinct values (like Booleans) or contain a lot of nulls, because the index itself will not be that efficient (although with Access you can set the index to ignore nulls). Going back to the library analogy - how useful is it to have pages of 'Yes' followed by pages of 'No'?
Clearly the size of the indexed field will also have an impact - the larger the datatype, the fewer entries can be read in a single block. Longs (typically used for autonumber primary keys) are 4 bytes, dates (used for timestamps, for example) are 8 bytes, whilst text is 10 bytes plus the number of characters.
Users will still need to search on text (for a name, for example), but performance can be improved by using a numeric link between a relatively short list of author names and the long list of books they have written. The primary key is in the author table and the foreign key is in the book table.
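That author/book split can be sketched as follows - again in SQLite rather than Access, with invented table names; the point is that the user searches the short name list, while the join to the long book table runs on the compact 4-byte numeric key:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Author (
        AuthorID   INTEGER PRIMARY KEY,   -- the 'autonumber' primary key
        AuthorName TEXT
    );
    CREATE TABLE Book (
        BookID   INTEGER PRIMARY KEY,
        Title    TEXT,
        AuthorID INTEGER REFERENCES Author (AuthorID)  -- numeric foreign key
    );
    CREATE INDEX idx_book_author ON Book (AuthorID);
""")
con.execute("INSERT INTO Author (AuthorName) VALUES ('Smith')")
con.execute("INSERT INTO Book (Title, AuthorID) VALUES ('First Book', 1)")
con.execute("INSERT INTO Book (Title, AuthorID) VALUES ('Second Book', 1)")

# Find the author by name in the short table, then join on the numeric key.
rows = con.execute("""
    SELECT b.Title
    FROM Book AS b
    INNER JOIN Author AS a ON a.AuthorID = b.AuthorID
    WHERE a.AuthorName = 'Smith'
    ORDER BY b.BookID
""").fetchall()
print(rows)  # → [('First Book',), ('Second Book',)]
```

The author's name is stored once, however many books they have written, and the index on the foreign key keeps the join fast.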
Using an initial * in a Like comparison prevents the use of an index, because indexing starts from the first character (think of the librarian's index above), so it should be avoided as much as possible - how many users would look for 'Smith' by entering 'ith'? This is also a good reason for storing names in two fields (firstname/lastname) rather than a single field, so each can be indexed separately.
Sometimes it can't be avoided - but it is better either to train users to enter a * when required, or to provide a button/option for 'begins with'/'contains' that applies the initial * when required.