How does indexing affect database size?

sumdumgai

Registered User.
Local time
Today, 16:55
Joined
Jul 19, 2007
Messages
453
Hello all,

I've got a question regarding indexing and database size. Will the final database size differ between these two scenarios: the table design includes indexed fields and the table is then loaded with data, versus the table is designed without indexes, loaded, and the indexes are added to the design afterwards? Hope that is understood.

Thanks.
 

GPGeorge

Grover Park George
Local time
Today, 13:55
Joined
Nov 25, 2004
Messages
1,873
I would phrase the question differently.

How would it be possible for the database size to differ between the two scenarios? What mechanism, in other words, would account for such a difference?

Another approach would be to just try it and see what happens.
 

ebs17

Well-known member
Local time
Today, 22:55
Joined
Feb 7, 2020
Messages
1,946
The point at which the indexes are created has no influence on the resulting database size. The internal cost of indexing depends on the data being managed.

But it does make a difference in terms of speed: if you first fill the non-indexed table with data and then create the indexes, the load is a little faster.
In practice, however, your tables are part of a database schema. Relationships, and therefore the keys in use (including their indexes), cannot and will not simply be removed and added again. Enforced referential integrity and unique indexes perform an important and indispensable function when data is inserted.
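If it helps, here is a minimal sketch of the load-first, index-later approach in VBA/DAO. The table, field and index names (tblSales, tblSalesImport, SaleDate, CustomerID) are only placeholders for your own schema.

Code:
Sub LoadThenIndex()
    Dim db As DAO.Database
    Set db = CurrentDb

    ' 1. Bulk-append the raw data into the un-indexed table
    db.Execute "INSERT INTO tblSales SELECT * FROM tblSalesImport", dbFailOnError

    ' 2. Only now create the indexes with DDL
    db.Execute "CREATE INDEX idxSaleDate ON tblSales (SaleDate)", dbFailOnError
    db.Execute "CREATE INDEX idxCustomerID ON tblSales (CustomerID)", dbFailOnError
End Sub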
 

sumdumgai

Registered User.
Local time
Today, 16:55
Joined
Jul 19, 2007
Messages
453
Thank you for the replies. The reason I'm asking is that I'm loading so much data that it's approaching the 2 GB limit. I was able to load the data without indexing set in the table design and then add the indices afterwards, and stay within 2 GB. But when I loaded the table with the indices already set, the table was not fully loaded before the database shut down. It could be something else that prevented the full load. I'm still testing.
 

ebs17

Well-known member
Local time
Today, 22:55
Joined
Feb 7, 2020
Messages
1,946
The reason I'm asking is that I'm loading so much data that it's approaching the 2GB limit.
Loading data like this sounds like a regular bulk operation.
So you should already have separated the frontend and backend, and in this case you should also think about several backends across which you can sensibly divide the data.
For me, the warning bells would ring once you reach two thirds of the 2 GB, at the latest.

The second necessary consideration: if you are already running up against the size limit, it is better to look for a database management system that can manage significantly larger amounts of data.
 

sumdumgai

Registered User.
Local time
Today, 16:55
Joined
Jul 19, 2007
Messages
453
Well, it looks like loading the data into a non-indexed table works, but loading into an indexed table does not. Without indices, loading 24 months of data creates a database that is 1.725 GB. Loading into an indexed table loads about 20 months and then stops. I haven't figured out why it stops, because there are no error messages. But with 20 months, the database is about 2.0 GB.
 

CJ_London

Super Moderator
Staff member
Local time
Today, 21:55
Joined
Feb 19, 2013
Messages
16,616
Seems to me an Access back end is not appropriate if you are that close to the 2 GB limit.

When importing into an indexed table, additional space is required temporarily and then released. However, that released space is not available again until you compact and repair, which is why you are getting a failure.

Appending large amounts of data to indexed tables also takes a long time because of the repetitive updating of the indexes, so the policy of appending to unindexed tables and adding the indexes afterwards is the more efficient process.

I presume your indexing is appropriate and you haven't overused it by indexing fields that don't need it.
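As an illustration of the compact step, something along these lines could be run from the front end once the load has finished and nobody else has the back end open. The paths are placeholders, and CompactDatabase needs a new destination file.

Code:
Sub CompactBackEnd()
    Const strSrc As String = "C:\Data\Backend.accdb"        ' placeholder path
    Const strDst As String = "C:\Data\Backend_temp.accdb"   ' placeholder path

    ' CompactDatabase writes a compacted copy to a new file
    DBEngine.CompactDatabase strSrc, strDst

    ' Swap the compacted copy into place
    Kill strSrc
    Name strDst As strSrc
End Sub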
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 15:55
Joined
Feb 28, 2001
Messages
27,188
This is a guess, but I believe that declaring the indexes before loading the table means the index has to be "churned" along with the database space for each new record. It is probably a relative thing, but you likely end up rewriting the indexes in a less efficient way when adding one record at a time. "Churning" here refers to having to split index parts in a way that Access probably doesn't clean up well, causing creation of new parts AND erasure of old parts. The more erasure that goes on, the more wasted space you accumulate. Eventually that clogs your DB.

I recall seeing a study about B-tree indexes (on another vendor's DB) that says you do better to add data and then declare an index. But they also suggested that pre-sorting the data actually made it WORSE because it produced an asymmetric index tree that wasn't as efficient as would have been a randomly defined index. I mention that because I recall seeing something about B-tree indexes in a discussion about Access.

You have gotten suggestions relating to using something other than an Access back end. I have to concur with my colleagues on that point. If you get so close to the limit that your success depends on the order in which you declare an index, you are too close to the hard limit.
 

ebs17

Well-known member
Local time
Today, 22:55
Joined
Feb 7, 2020
Messages
1,946
Without indices, loading 24 months of data creates a database that is 1.725 GB.
Anyone who needs an evaluation over 2 years will very quickly need one over 10 or 20 years. With an Access file as the backend you are then no longer just working at the limit, you are working within a limited, provisional arrangement.

It's not much progress just to have the data in a table. With Jet SQL (Access SQL) you are also limited to 2 GB for queries: if the data already takes up 2 GB, there is no capacity left for temporary intermediate results and additional calculated fields.

So if you want to work with a comfortable Access frontend, you will depend on an active DBMS that not only holds the data ready, but also pre-processes it in a targeted way and provides you with intermediate and final results, in significantly smaller amounts of data, via views.
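For example, with SQL Server as the backend you could let the server do the aggregation in a view and pull only the small result set into Access through a pass-through query. This is only a sketch; the connection string, the query name and the view name (vwMonthlySummary) are assumptions.

Code:
Sub CreatePassThrough()
    Dim qdf As DAO.QueryDef
    Set qdf = CurrentDb.CreateQueryDef("qryMonthlySummary")

    ' Placeholder connection string and server-side view
    qdf.Connect = "ODBC;DRIVER={ODBC Driver 17 for SQL Server};" & _
                  "SERVER=MyServer;DATABASE=MyDb;Trusted_Connection=Yes;"
    qdf.SQL = "SELECT * FROM dbo.vwMonthlySummary"
    qdf.ReturnsRecords = True
End Sub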
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 15:55
Joined
Feb 28, 2001
Messages
27,188
I "liked" your post, @ebs17, but I'll also add that your point about large queries is EXTREMELY significant.

People don't always take this into account, but there is an order to the steps of query execution.


First, SQL evaluates the FROM clause or clauses. In practice this means SQL makes a list of tables. This list has to go into the workspace, which is usually associated with the front end. This list isn't that big.

Second, the WHERE clauses. This means SQL has to make a list of records not filtered out by the WHERE clauses. Depending on the exclusivity of the WHERE clause, this could be a short list or a VERY long list. And that ALSO goes into the workspace. When you are dealing with "weak" WHERE clauses and a big table, this list could get very long. Folks sometimes observe "bloat" in a front-end file and wonder how it originates because all of their tables are in the back-end. Well, this is how it happens. For small tables or highly restrictive WHERE clauses, the "record list" isn't so big. For larger data sets such as are described in this case, that list won't be trivial either.

The rest of the steps deal with grouping and ordering, which doesn't change the list size that much. By the third step, the damage is done.
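As a rough illustration (the table and field names are invented), the same table pulled with and without a restrictive WHERE clause materialises very different intermediate record lists in the workspace:

Code:
Sub CompareWorkspaceLoad()
    Dim rs As DAO.Recordset

    ' Wide open: the record list covers almost the whole table
    Set rs = CurrentDb.OpenRecordset("SELECT * FROM tblSales", dbOpenSnapshot)
    rs.Close

    ' Restrictive: a much shorter record list goes into the workspace
    Set rs = CurrentDb.OpenRecordset( _
        "SELECT * FROM tblSales WHERE SaleDate >= #1/1/2023#", dbOpenSnapshot)
    rs.Close
End Sub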
 

gemma-the-husky

Super Moderator
Staff member
Local time
Today, 21:55
Joined
Sep 12, 2006
Messages
15,658
Well, it looks like loading the data into a non-indexed table works, but loading into an indexed table does not. Without indices, loading 24 months of data creates a database that is 1.725 GB. Loading into an indexed table loads about 20 months and then stops. Haven't figured out why it stops because no error messages. But with 20 months, the database is about 2.0 GB.
Well, indexes aren't free. Each index needs a collection of pointers (a B-tree, I imagine) to store the index entry for each record. You save time in accessing records, but the more indexes you have, the more space you need. So I'm not surprised you can't fit the data and the indexes into a 2 GB maximum space if the space requirement is that critical. Try just loading the last year's data.
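If you want to try that, a date-filtered append along these lines would do it (the import table and the date field names are invented):

Code:
Sub LoadLastTwelveMonths()
    CurrentDb.Execute _
        "INSERT INTO tblSales SELECT * FROM tblSalesImport " & _
        "WHERE SaleDate >= DateAdd('m', -12, Date())", dbFailOnError
End Sub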
 

Pat Hartman

Super Moderator
Staff member
Local time
Today, 16:55
Joined
Feb 19, 2002
Messages
43,281
Haven't figured out why it stops because no error messages. But with 20 months, the database is about 2.0 GB.
My guess is that appending data with the indexes and RI already defined causes extra work and extra workspace to be created, and is therefore less efficient than adding the indexes after the fact. That bloat is what causes the process to stop much earlier than the append-first, add-the-indexes-later method.

I'm pretty sure that during the C&R process, the relationships and indexes except for the autonumber PK are not added until after the fact.

Others have recommended that it is time to upsize to SQL Server and I concur. You really do not want to be living on this precipice.
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 15:55
Joined
Feb 28, 2001
Messages
27,188
@sumdumgai - part of our problem that leads to an air of vagueness here is that Microsoft code is not OpenSource - it remains proprietary. Therefore we really don't know how they do things internally. Our advice is based mostly on guesswork and varying amounts of past experience. Maybe we don't always know WHY something works - or doesn't - but working with the "beast" that is an MS application file leads us to have a sense of how to - and how not to - properly use Access.
 

Cotswold

Active member
Local time
Today, 21:55
Joined
Dec 31, 2020
Messages
528
I once came across a client with an MDB of 4 GB that was running just fine. I asked why he'd not run the maintenance utilities and he said, "Oh, I don't understand all the technical stuff." Obviously it wasn't 4 GB when I left his offices!

But if you've accumulated only 2 years of data in 2 GB, I'd wonder about using a different BE. Or archive data in 2-year blocks for later access if required. So as soon as you get to year 2 in the BE, archive year 1 to another BE and delete it. When the archive has 2 years, start another.
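A rough sketch of that archive-and-delete step, with placeholder paths, table and field names, and assuming the archive table already exists in the archive file:

Code:
Sub ArchiveYear1()
    Dim db As DAO.Database
    Set db = CurrentDb

    ' Copy the old rows into a separate archive back end
    db.Execute "INSERT INTO tblSalesArchive IN 'C:\Data\Archive_Year1.accdb' " & _
               "SELECT * FROM tblSales WHERE SaleDate < #1/1/2023#", dbFailOnError

    ' Then remove them from the live back end (and compact it afterwards)
    db.Execute "DELETE FROM tblSales WHERE SaleDate < #1/1/2023#", dbFailOnError
End Sub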
 
