How does indexing affect database size?

sumdumgai

Registered User.
Local time
Today, 16:55
Joined
Jul 19, 2007
Messages
453
Hello all,

I've got a question regarding indexing and database size. Will the final database size differ between these two scenarios: the table design includes indexed fields and the table is then loaded with data, versus the table is designed without indexes, loaded, and the indexes are added to the design afterwards? Hope that is understood.

Thanks.
 

GPGeorge

Grover Park George
Local time
Today, 13:55
Joined
Nov 25, 2004
Messages
1,873
I would phrase the question differently.

How would it be possible for the database size to differ between the two scenarios? What mechanism, in other words, would account for such a difference?

Another approach would be to just try it and see what happens.
 

ebs17

Well-known member
Local time
Today, 22:55
Joined
Feb 7, 2020
Messages
1,946
The point at which the indexes are created has no influence on the resulting database size. The internal cost of indexing depends on the data being managed.

But it does make a difference in terms of speed: if you first fill the non-indexed table with data and then create the indexes, the load is a little faster.
In practice, however, your tables are part of a database schema. Relationships, and therefore the keys in use (including their indexes), cannot and will not simply be removed and added again. Enforced referential integrity and unique indexes perform an important and indispensable function when data is inserted.
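If it helps, here is a minimal sketch of the load-first, index-later approach in VBA/DAO. The table, field and index names (tblSales, tblSalesImport, SaleDate, CustomerID) are only placeholders for your own schema.

Code:
Sub LoadThenIndex()
    Dim db As DAO.Database
    Set db = CurrentDb

    ' 1. Bulk-append the raw data into the un-indexed table
    db.Execute "INSERT INTO tblSales SELECT * FROM tblSalesImport", dbFailOnError

    ' 2. Only now create the indexes with DDL
    db.Execute "CREATE INDEX idxSaleDate ON tblSales (SaleDate)", dbFailOnError
    db.Execute "CREATE INDEX idxCustomerID ON tblSales (CustomerID)", dbFailOnError
End Sub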
 

sumdumgai

Registered User.
Local time
Today, 16:55
Joined
Jul 19, 2007
Messages
453
Thank you for the replies. The reason I'm asking is that I'm loading so much data that it's approaching the 2 GB limit. I was able to load the data without indexing set in the table design and then add the indices afterwards, and stay within 2 GB. But when I loaded the table with the indices already set, the table was not fully loaded before the database shut down. It could be something else that prevented the full load. I'm still testing.
 

ebs17

Well-known member
Local time
Today, 22:55
Joined
Feb 7, 2020
Messages
1,946
The reason I'm asking is that I'm loading so much data that it's approaching the 2GB limit.
Loading data like this sounds like a regular bulk operation.
So you should already have separated the frontend and backend, and in this case you should also think about several backends across which you can sensibly divide the data.
For me, the warning bells would ring once you reach two thirds of the 2 GB, at the latest.

The second necessary consideration: if you are already running up against the size limit, it is better to look for a database management system that can manage significantly larger amounts of data.
 

sumdumgai

Registered User.
Local time
Today, 16:55
Joined
Jul 19, 2007
Messages
453
Well, it looks like loading the data into a non-indexed table works, but loading into an indexed table does not. Without indices, loading 24 months of data creates a database that is 1.725 GB. Loading into an indexed table loads about 20 months and then stops. I haven't figured out why it stops, because there are no error messages. But with 20 months, the database is about 2.0 GB.
 

CJ_London

Super Moderator
Staff member
Local time
Today, 21:55
Joined
Feb 19, 2013
Messages
16,616
Seems to me an Access back end is not appropriate if you are that close to the 2 GB limit.

When importing into an indexed table, additional space is required temporarily and then released. However, that released space is not available again until you compact and repair, which is why you are getting a failure.

Appending large amounts of data to indexed tables also takes a long time because of the repetitive updating of the indexes, so the policy of appending to unindexed tables and adding the indexes afterwards is the more efficient process.

I presume your indexing is appropriate and you haven't overused it by indexing fields that don't need it.
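As an illustration of the compact step, something along these lines could be run from the front end once the load has finished and nobody else has the back end open. The paths are placeholders, and CompactDatabase needs a new destination file.

Code:
Sub CompactBackEnd()
    Const strSrc As String = "C:\Data\Backend.accdb"        ' placeholder path
    Const strDst As String = "C:\Data\Backend_temp.accdb"   ' placeholder path

    ' CompactDatabase writes a compacted copy to a new file
    DBEngine.CompactDatabase strSrc, strDst

    ' Swap the compacted copy into place
    Kill strSrc
    Name strDst As strSrc
End Sub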
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 15:55
Joined
Feb 28, 2001
Messages
27,188
This is a guess, but I believe that declaring the indexes before loading the table means the index has to be "churned" along with the database space for each new record. It is probably a relative thing, but you likely end up rewriting the indexes in a less efficient way when adding one record at a time. "Churning" here refers to having to split index parts in a way that Access probably doesn't clean up well, causing creation of new parts AND erasure of old parts. The more erasure that goes on, the more wasted space you accumulate. Eventually that clogs your DB.

I recall seeing a study about B-tree indexes (on another vendor's DB) that says you do better to add data and then declare an index. But they also suggested that pre-sorting the data actually made it WORSE because it produced an asymmetric index tree that wasn't as efficient as would have been a randomly defined index. I mention that because I recall seeing something about B-tree indexes in a discussion about Access.

You have gotten suggestions relating to using something other than an Access back end. I have to concur with my colleagues on that point. If you get so close to the limit that your success depends on the order in which you declare an index, you are too close to the hard limit.
 

ebs17

Well-known member
Local time
Today, 22:55
Joined
Feb 7, 2020
Messages
1,946
Without indices, loading 24 months of data creates a database that is 1.725 GB.
Anyone who needs an evaluation over 2 years will very quickly need one over 10 or 20 years. With an Access file as the backend you are then no longer just working at the limit, you are working within a limited, provisional arrangement.

It's not much progress just to have the data in a table. With Jet SQL (Access SQL) you are also limited to 2 GB for queries: if the data already takes up 2 GB, there is no capacity left for temporary intermediate results and additional calculated fields.

So if you want to work with a comfortable Access frontend, you will depend on an active DBMS that not only holds the data ready, but also pre-processes it in a targeted way and provides you with intermediate and final results, in significantly smaller amounts of data, via views.
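For example, with SQL Server as the backend you could let the server do the aggregation in a view and pull only the small result set into Access through a pass-through query. This is only a sketch; the connection string, the query name and the view name (vwMonthlySummary) are assumptions.

Code:
Sub CreatePassThrough()
    Dim qdf As DAO.QueryDef
    Set qdf = CurrentDb.CreateQueryDef("qryMonthlySummary")

    ' Placeholder connection string and server-side view
    qdf.Connect = "ODBC;DRIVER={ODBC Driver 17 for SQL Server};" & _
                  "SERVER=MyServer;DATABASE=MyDb;Trusted_Connection=Yes;"
    qdf.SQL = "SELECT * FROM dbo.vwMonthlySummary"
    qdf.ReturnsRecords = True
End Sub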
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 15:55
Joined
Feb 28, 2001
Messages
27,188
I "liked" your post, @ebs17, but I'll also add that your point about large queries is EXTREMELY significant.

People don't always take this into account, but there is an order to the steps of query execution.


First, SQL evaluates the FROM clause or clauses. In practice this means SQL makes a list of tables. This list has to go into the workspace, which is usually associated with the front end. This list isn't that big.

Second, the WHERE clauses. This means SQL has to make a list of records not filtered out by the WHERE clauses. Depending on the exclusivity of the WHERE clause, this could be a short list or a VERY long list. And that ALSO goes into the workspace. When you are dealing with "weak" WHERE clauses and a big table, this list could get very long. Folks sometimes observe "bloat" in a front-end file and wonder how it originates because all of their tables are in the back-end. Well, this is how it happens. For small tables or highly restrictive WHERE clauses, the "record list" isn't so big. For larger data sets such as are described in this case, that list won't be trivial either.

The rest of the steps deal with grouping and ordering, which doesn't change the list size that much. By the third step, the damage is done.
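As a rough illustration (the table and field names are invented), the same table pulled with and without a restrictive WHERE clause materialises very different intermediate record lists in the workspace:

Code:
Sub CompareWorkspaceLoad()
    Dim rs As DAO.Recordset

    ' Wide open: the record list covers almost the whole table
    Set rs = CurrentDb.OpenRecordset("SELECT * FROM tblSales", dbOpenSnapshot)
    rs.Close

    ' Restrictive: a much shorter record list goes into the workspace
    Set rs = CurrentDb.OpenRecordset( _
        "SELECT * FROM tblSales WHERE SaleDate >= #1/1/2023#", dbOpenSnapshot)
    rs.Close
End Sub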
 

gemma-the-husky

Super Moderator
Staff member
Local time
Today, 21:55
Joined
Sep 12, 2006
Messages
15,658
Well, it looks like loading the data into a non-indexed table works, but loading into an indexed table does not. Without indices, loading 24 months of data creates a database that is 1.725 GB. Loading into an indexed table loads about 20 months and then stops. Haven't figured out why it stops because no error messages. But with 20 months, the database is about 2.0 GB.
Well, indexes aren't free. Each index needs a collection of pointers (a B-tree, I imagine) to store the index entry for each record. You save time in accessing records, but the more indexes you have, the more space you need. So I'm not surprised you can't fit the data and the indexes into a 2 GB maximum space if the space requirement is that critical. Try just loading the last year's data.
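If you want to try that, a date-filtered append along these lines would do it (the import table and the date field names are invented):

Code:
Sub LoadLastTwelveMonths()
    CurrentDb.Execute _
        "INSERT INTO tblSales SELECT * FROM tblSalesImport " & _
        "WHERE SaleDate >= DateAdd('m', -12, Date())", dbFailOnError
End Sub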
 

Pat Hartman

Super Moderator
Staff member
Local time
Today, 16:55
Joined
Feb 19, 2002
Messages
43,281
Haven't figured out why it stops because no error messages. But with 20 months, the database is about 2.0 GB.
My guess is that appending data with the indexes and RI already defined causes extra work and extra workspace to be created, and is therefore less efficient than adding the indexes after the fact. That bloat is what causes the process to stop much earlier than the append-first, add-the-indexes-later method.

I'm pretty sure that during the C&R process, the relationships and indexes except for the autonumber PK are not added until after the fact.

Others have recommended that it is time to upsize to SQL Server and I concur. You really do not want to be living on this precipice.
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 15:55
Joined
Feb 28, 2001
Messages
27,188
@sumdumgai - part of our problem that leads to an air of vagueness here is that Microsoft code is not OpenSource - it remains proprietary. Therefore we really don't know how they do things internally. Our advice is based mostly on guesswork and varying amounts of past experience. Maybe we don't always know WHY something works - or doesn't - but working with the "beast" that is an MS application file leads us to have a sense of how to - and how not to - properly use Access.
 

Cotswold

Active member
Local time
Today, 21:55
Joined
Dec 31, 2020
Messages
528
I once came across a client with an MDB of 4 GB that was running just fine. I asked why he'd not run the maintenance utilities and he said, "Oh, I don't understand all the technical stuff." Obviously it wasn't 4 GB when I left his offices!

But if you've accumulated only 2 years of data in 2 GB, I'd wonder about using a different BE. Or archive data in 2-year blocks for later access if required. So as soon as you get to year 2 in the BE, archive year 1 to another BE and delete it. When the archive has 2 years, start another.
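A rough sketch of that archive-and-delete step, with placeholder paths, table and field names, and assuming the archive table already exists in the archive file:

Code:
Sub ArchiveYear1()
    Dim db As DAO.Database
    Set db = CurrentDb

    ' Copy the old rows into a separate archive back end
    db.Execute "INSERT INTO tblSalesArchive IN 'C:\Data\Archive_Year1.accdb' " & _
               "SELECT * FROM tblSales WHERE SaleDate < #1/1/2023#", dbFailOnError

    ' Then remove them from the live back end (and compact it afterwards)
    db.Execute "DELETE FROM tblSales WHERE SaleDate < #1/1/2023#", dbFailOnError
End Sub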
 
