Question on large data table design

thechazm

VBA, VB.net, C#, Java
I am trying to set up a table the best way I possibly can and was just hoping for some opinions.

I have the need to store potentially 313,000 rows of data multiple times a month. I have tables that have around 3 million rows of data in them but they are a bit sluggish.

I could reduce the number of rows necessary from 313,000 to 10,793 if I make the main table hold 83 fields.

I guess the question is: should I pursue more rows when it's that many, or more fields, for performance's sake?

Any advice or questions are welcome, and thanks in advance.
 

Ranman256

Well-known member
if you're not going to query on the extra fields, like:
id, Jan2016, Feb2016, ....
then do it.

but if you do need to query, it's harder to run queries 'across' fields than down rows.
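
For example (just a sketch with made-up names - a tall tblTall(id, Period, Amount) versus a wide tblWide with Jan2016, Feb2016, ... columns), totalling per id is one simple query going down rows, but going across fields you either type out every column or un-pivot with a UNION first:

Code:
Public Sub WideVsTallSketch()
    ' Made-up table/field names, sketch only.
    ' Tall layout: one aggregate query does it.
    Dim sqlTall As String
    sqlTall = "SELECT id, Sum(Amount) AS Total FROM tblTall GROUP BY id;"

    ' Wide layout: add every column by hand...
    Dim sqlWide As String
    sqlWide = "SELECT id, Nz([Jan2016],0) + Nz([Feb2016],0) AS Total FROM tblWide;"   ' ...and so on for all 83 fields

    ' ...or un-pivot with a UNION query before you can aggregate.
    Dim sqlUnion As String
    sqlUnion = "SELECT id, 'Jan2016' AS Period, [Jan2016] AS Amount FROM tblWide " & _
               "UNION ALL SELECT id, 'Feb2016', [Feb2016] FROM tblWide;"   ' ...etc.

    Debug.Print sqlTall; vbCrLf; sqlWide; vbCrLf; sqlUnion
End Sub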
 

thechazm

VBA, VB.net, C#, Java
I would not be filtering on the 83 fields as they are just values. I have also thought about putting each data dump into its own table. Any thoughts on that, keeping the 313,000 rows in each table?
 

plog

Banishment Pending
I think you need to think about/explain what 'performance' means. Also, give us a context for this whole system. Where's the data coming from? Who is consuming the data, and how?

When you use terms like 'data dump' it makes it seem like you take a large pile of data, pare it down to a usable sub-set and then work only with that subset. Is that true?

If so, I would recommend a sort of data warehouse. You create your Main database with properly structured tables, and then you create a set of throw-away staging databases into which you import each data dump. The staging database has a set of queries that extract just the rows and fields you need into the Main database.
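
Roughly like this (just a sketch with made-up names - tblRawDump sitting in the staging file, tblMain in your Main database); the extract can be a single append query that reads straight out of the staging file:

Code:
Public Sub PullFromStaging(ByVal stagingPath As String)
    ' Sketch only, made-up table/field names.
    ' Appends just the rows and fields we keep from the throw-away staging file.
    Dim sql As String
    sql = "INSERT INTO tblMain (BatchDate, ItemKey, Amount) " & _
          "SELECT BatchDate, ItemKey, Amount " & _
          "FROM tblRawDump IN '" & stagingPath & "' " & _
          "WHERE Amount Is Not Null;"
    CurrentDb.Execute sql, dbFailOnError
End Sub

Once the append has run, the staging file just gets thrown away.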
 

thechazm

VBA, VB.net, C#, Java
Thanks plog for the thoughts. Yes, it's like a data warehouse, and yes, you are correct that we need to use the data in smaller subsets later. It's just for reporting and comparison purposes. No user input other than an interface to fetch the reports. The data is coming from a large Excel application that gets run several times a month (1-7 times). Each Excel document produces 313,000 values that need to be tracked and possibly compared to previous ones.
 

Anakardian

Registered User.
Have a look at a star schema type of database.
You would purposely de-normalise the database in an effort to make speed the priority.

This is a pretty specialised database design, so it is completely unsuitable for normal transactional work.

The basic idea is that all data is pre-digested and pre-calculated as much as possible so the final results are available as fast as possible.
Be warned though, you will significantly increase the size of the database as you will have a lot of duplicated data.
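
As a very rough illustration only (made-up names, nothing to do with your actual data), a star layout is one skinny fact table surrounded by descriptive dimension tables, with the values rolled up as far as you can get away with at load time:

Code:
Public Sub CreateStarSketch()
    ' Rough sketch, made-up names: one date dimension plus one fact table keyed to it.
    With CurrentDb
        .Execute "CREATE TABLE dimDate (DateKey LONG CONSTRAINT pkDimDate PRIMARY KEY, " & _
                 "ActualDate DATETIME, MonthName TEXT(20), YearNum INTEGER);", dbFailOnError
        .Execute "CREATE TABLE factValues (DateKey LONG, ItemKey LONG, Amount DOUBLE, " & _
                 "CONSTRAINT fkDate FOREIGN KEY (DateKey) REFERENCES dimDate (DateKey));", _
                 dbFailOnError
    End With
End Sub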
 

The_Doc_Man

Immoderate Moderator
Staff member
I could reduce the number of rows necessary from 313,000 to 10,793 if I make the main table hold 83 fields.

The problem might then become an issue of record size. How big would a single record be if you did that reduction to 83 fields? If your solution is to make the table have more fields and fewer rows, but the result is still the same number of characters, you probably won't be helping yourself.

Simple math says if you have 300k rows x 7 per month, that is 2.1 million rows/month or just over 25 million rows per year. That's a lot of data for Access. Presuming some overhead for search keys and the like, I'd be surprised if you could shoe-horn more than about 500-600 bytes per record into a year's data before it would go POOF on you.

That is, presuming that you want to keep the data purely in Access rather than one of the SQL solutions (SQL Server, MySQL, Oracle, to name a few), you probably would not take long to smash past the size barriers on Access databases.

The question that comes to mind immediately is whether these subsets that were mentioned earlier ever have to directly interact. If they do, you probably would overflow Access capacity all too soon. This is why I mentioned other back ends.

If they do not significantly interact from one spreadsheet run to the next, you can use a Front-End, Back-End, 2nd Back-End situation (perhaps) in which each set gets imported into an Access back end that you could copy from a master template (presuming that the structures are always the same...)

It would work kind of like this (IF AND ONLY IF the datasets are separable).

1. Create a database file with your correct - but empty - table structure.

2. When you are going to start this import process, use a File System Object to copy the template to a file with another name.

3. Have some linked tables dangling (and not referenced) in your main DB. Use VBA to dynamically link to the copy of the template you just made (see the sketch after this list).

4. Do your import through the linked tables.

5. You can now use queries (that you could not touch before this point) to extract what you want.

6. When you are done with your big, honking back end, disconnect the linked tables.
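
A bare-bones sketch of steps 2 through 6 (made-up names throughout - tblDumpLink is the dangling link in the front end, and the template already contains the matching empty table):

Code:
Public Sub ImportOneDump(ByVal templatePath As String, _
                         ByVal newBEPath As String, _
                         ByVal xlsxPath As String)
    ' Sketch only - made-up names, adjust to your own structure.

    ' Step 2: copy the empty template to a new back-end file.
    Dim fso As Object
    Set fso = CreateObject("Scripting.FileSystemObject")
    fso.CopyFile templatePath, newBEPath

    ' Step 3: re-point the dangling linked table at the new copy.
    Dim db As DAO.Database
    Set db = CurrentDb
    db.TableDefs("tblDumpLink").Connect = ";DATABASE=" & newBEPath
    db.TableDefs("tblDumpLink").RefreshLink

    ' Step 4: import the spreadsheet through the link into the new back end.
    DoCmd.TransferSpreadsheet acImport, acSpreadsheetTypeExcel12Xml, _
        "tblDumpLink", xlsxPath, True

    ' Step 5: run your extraction queries against tblDumpLink here.

    ' Step 6: disconnect by pointing the link back at the empty template.
    db.TableDefs("tblDumpLink").Connect = ";DATABASE=" & templatePath
    db.TableDefs("tblDumpLink").RefreshLink
End Sub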

At this point, you have a separate table in a separate file for each iteration of this spreadsheet exercise and will avoid the issues of overflowing the maximum size of an Access database (about 2 GB). Further, because the data can be referenced as a BE file, you can have other databases that you dynamically link to this big pile and do your data reduction that way, yet have a perfectly good way to store the older data in logical units.

Further, you can have up to 16 databases open at once in Access. One has to be the FE file, but you could have one year's worth of data in 12 files holding a month's worth at a time. The table names might be an issue, but you could probably generate some UNION queries to remerge the data from the monthly tables.
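
For instance (again just a sketch, assuming twelve linked tables named tblData_01 through tblData_12, one per monthly back end, all with the same structure), the remerge query can be generated rather than typed:

Code:
Public Sub BuildYearUnion()
    ' Sketch only - assumes linked tables tblData_01..tblData_12 with identical structures.
    Dim sql As String
    Dim m As Integer
    For m = 1 To 12
        If Len(sql) > 0 Then sql = sql & " UNION ALL "
        sql = sql & "SELECT * FROM tblData_" & Format(m, "00")
    Next m

    Dim db As DAO.Database
    Set db = CurrentDb
    On Error Resume Next
    db.QueryDefs.Delete "qryYearData"      ' drop any earlier version of the query
    On Error GoTo 0
    db.CreateQueryDef "qryYearData", sql & ";"
End Sub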

As to performance, if you don't significantly reduce the amount of data you store, whether in tall/thin or short/fat layouts, it really won't make THAT much of a difference. The time saved processing fewer but wider rows when compared to more but narrower rows is a tough call, and we can't forget that VBA is interpreted, not compiled to true machine code.

The biggest table I've got is 600k rows, with probably 30k to 50k rows added per month, but also with archiving and removal of old data. Performance isn't that bad, but remember that if you have a lot of cross-month processing, 1.2 million of anything takes a long time to process.
 
