Need advice on DB design. (1 Viewer)

sumdumgai

Registered User.
Local time
Today, 17:04
Joined
Jul 19, 2007
Messages
453
Does it ever happen that someone sells TWO units in the same month?
If I understand your question, the answer is yes. The number of items sold (i.e., > 0) per sales rep per month ranges 1 to 100 +.

When you say back-end, I was thinking more along the lines of linked tables (or do linked tables always imply a back-end?).

What if I created a separate database for each product (group), each with one product table and loading into that table only the 5 fields that have been suggested for that product. I could append monthly non-zero sales data for that product there. Then I could link that table to a 'master' database that contained sales rep and product information?

Thanks.
 
Last edited:

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 16:04
Joined
Feb 28, 2001
Messages
27,146
do linked tables always imply a back-end

In a word, yes. Tables are either linked or local. Local tables are in your primary app file which is usually your front-end or FE file. Linked tables are in another file that is usually your back-end or BE file.

Here is my take based on personal historical perspective. Adding separate files (distinct BE files) adds to the maintenance overhead of the project because to do maintenance, you typically have to take down all of the FE files and all of the BE files AT THE SAME TIME. You need some guaranteed common point (yet another BE file) so that you can store data regarding your maintenance time. And trust me, for files this big, you WILL need maintenance. They will eat your socks if you aren't careful.

The more "things" you have to touch, the worst this gets. (A) More opportunities to make mistakes. (B) More need for a careful backup system. (C) More downtime needed to perform the maintenance tasks. (D) More opportunities to get confused. (E) More resources required - and you already have an issue with that WITHOUT the multiple BE files being in place.

I need to explain (B) a bit better. When you have a database, it has an implied or explicit "instantiation" date or number or SOMETHING that says that "All files involved with this overall database were synchronized as of " either a date or a sequence number of some sort. ORACLE used instantiation numbers. I wouldn't rule out other systems that might use dates. The idea, though, is that if you have to RECOVER a blown database, you must be able to recover files such that you can return EVERY COMPONENT FILE to the same instantiation number or time. If you cannot do that, you cannot recover the database reliably and you risk data loss.

Using SQL Server and a Terabyte disk, you can hold a lot of big data in a single place. Big file - but easy to maintain, easy to back it up.

Using ORACLE Enterprise Server and one or more large disks, you can hold a lot of big data in a couple of big places but ORACLE manages the files for you if you set aside the required backup time.

Using Access "native" BE files, you are doing your own data management and your own backup. You are asking for a major accident to happen. This is based on nearly 50 years of program engineering experience that includes systems analysis, product management, security management, and network usage management. And in those 50 years, I have NEVER seen a system that did not occasionally require a data restoration. It WILL happen.
 
Last edited:

sumdumgai

Registered User.
Local time
Today, 17:04
Joined
Jul 19, 2007
Messages
453
Don't mean to be argumentative and I really do appreciate all of your comments, but these databases will be read-only and updated once a month with backups. There will not be any 'transaction' queries where instantaneous results are needed. Once the data retrieval queries are developed, there shouldn't be any more database updates. Does that change anyone's opinion of using Access?
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 16:04
Joined
Feb 28, 2001
Messages
27,146
Looking at the structure again, I changed my mind about something. You are making a lot of "roll-up" reports but I don't see it likely that you will want to go back and reference anything by the sales ID that would be a PK/Autonumber field. I also don't see any child table depending on the five-field sales record. If there are no child tables and everything you are doing is based on roll-ups, you don't need the SalesID PK field - or any PK. But if you DID need one, you have a natural compound key in the form of the RepID and date.

The thing about PK fields is that everybody thinks you need them. But that ain't so. Sometimes you need them if something is going to depend on something else. When you have dependency, the independent item needs an ID but the dependent item might not. It's all in how you use that dependent item. In this layout, Rep ID depends on a representative table. Prod ID depends on a product table. But what field in what table depends on an individual sales record? At the moment, I don't see one.

So your table changes from 24 bytes to 20 bytes and shrinks from about 250 Mb to a little over 200 Mb. So storing a year of data means 2.4 Gb total before you add in indexes. Which means Access STILL can't fit one year of that data into a single table.

But if your sales were differently recorded, you might get some space back here. I'm going through the exercise to show you that it won't be enough.

You said you had limited group size (<100) and limited subgroup size (<100) and limited attribute size (<100). This question is INCREDIBLY important: How likely is it that the number of groups, subgroups, or attributes will change? BECAUSE you could store the info like this:

RepID: Long, FK to representative detail table - 4 bytes
TheDate: Date, 8 bytes
Group: BYTE, FK to group-name table, 1 byte
Subgroup: BYTE, FK to subgroup-name table, 1 byte
Attribute: BYTE, FK to attribute-table, 1 byte
TheCount: WORD, 2 bytes (unless you have some go-getter sales persons who can sell more than 30,000 units in a month single-handedly)

This would make your group, sub-group, or attribute roll-up queries trivial since there is no JOIN requirement. That would make everything a lot faster and the only time you need do to the lookup is to print some names on your report.

That means one sales record is 17 bytes. Your 10.5 million records now take about 180 Mb. That means you can now fit 5 months into a single table.

The reason I went through this is to show you how really tight for space you would be. You STILL will be unable to fit six months worth of data into a single table with a native Access BE file. You are STILL looking at a data overload situation.

You appear to be desperately seeking an Access solution but I have to say this does not look like Access can do this without a LOT of really hard, complex, and perilously messy work. This is probably a job for an active SQL engine like SQL, ORACLE, MySQL, or one of the other ODBC compliant engines.

Perhaps we can understand your persistence by asking this: Did the boss tell you it HAD to be done in Access? Because I think we are going to do you the greatest favor by suggesting that you would do better with an Access FE and something else for a BE.

AND I have made it a point to include my detailed thinking on the layout in case one of my colleagues sees something I have overlooked. In fact, I would WELCOME anyone else telling me why I'm totally wrong-headed about this one. But at the moment, I don't see it.
 
Last edited:

Mark_

Longboard on the internet
Local time
Today, 14:04
Joined
Sep 12, 2017
Messages
2,111
Just to make sure, your project is to do reporting on large data sets. Does this also involve data mining or is this for simple analysis or tracking?

More to the point, will you be required to pull something as exotic as "How often is product X sold in colour Y on a Wednesday afternoon"? How often are reports pulled? What is the greatest time period you will be running reports for?

Also can you get away with simply having one "Per month" record instead of needing an actual date/time?

It is possible that your record could very well be smaller if there is data you don't need to store. Depending on need, you may be able to get away with

RepID: Long FK (4 bytes)
ProdID: Long FK (4 bytes)
Month: Byte (1 bytes)
Count: Byte (1 bytes)

If you know you will never see more than 255 of a given item sold by one rep in one month. Much of this will be driven by what outputs you will be required to support though.
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 16:04
Joined
Feb 28, 2001
Messages
27,146
Mark's comments are on-point. In order to know how much to store, you must first determine how much you intend to ask - and then more or less work backwards - to see what it will take to support that intent. The more detail you intent to request, the more detail you will need to store.
 

Mark_

Longboard on the internet
Local time
Today, 14:04
Joined
Sep 12, 2017
Messages
2,111
SPECIAL NOTE ABOUT COUNT.

In my example I'm using a byte with the assumption most salespeople will not sell more than 255 of an item in one month. If one does, you add an extra record (or records) for the same person/month/product. This is not a LIMIT, but instead a space saving method. It does mean that every single report will need to total all transactions for one person and show the total, just in case there is one (or more) high performance salespeople.

Also, as it can be very important, are you tracking ONLY by salesperson or do you also (as is oft the case) reporting primarily by location? If by location, there is an entire extra layer that gets added where you verify who is working where when, and accumulate those to a "Location" accumulator.

In a normal database that is being used regularly accumulators are highly disliked as they take up a fair amount of resources and duplicate data. In your case, with large static recordsets, doing the work once but being able to access it multiple times may be the difference between an hour to run a report and few seconds.

In addition to what you need to output, please also let us know if there are time constraints or other special needs not normally found in most databases.
 

sumdumgai

Registered User.
Local time
Today, 17:04
Joined
Jul 19, 2007
Messages
453
To Doc,


Attributes always 12. Number of groups and subgroups can change. Regarding using Access, we're trying to see if it's a fit before jumping to something like Azure SQL Server.



To Mark,
Will you be required to pull something as exotic as "How often is product X sold in colour Y on a Wednesday afternoon"? How often are reports pulled? What is the greatest time period you will be running reports for?


Nothing exotic. Who's selling the most of group x and in what subgroup and attribute? Which group was sold the most and in what subgroup and attribute? Recent Month? Past 12 months? I believe some general queries will be run at first, creating some result tables which will be used to generate other reports.


I think my only shot with Access is to create a table for each group in its own database. Fields would be something like:
RepID: Long FK (4 bytes)
ProdID: text FK (4 bytes) {g(1-20),s(a-z),a(1-12) e.g., 20z12} mapped to group/code table
MMYYY: Byte (1 bytes)
Quantity Sold: Byte (1 bytes)


Use linked table manager to sum across groups.


Thoughts?


Thanks.
 

Mark_

Longboard on the internet
Local time
Today, 14:04
Joined
Sep 12, 2017
Messages
2,111
For a moment, lets looks at the sales person and Product tables.

What information do you have/need per salesperson?
What information do you have/need per product?

If you can store location information on a per-sales person basis, that will mean you won't have to address it in your main table. I am bringing up location as this often drives these kinds of reports.

From what I can tell your products will have
ProductID (AutoNumber)
ProductName (String)
ProductGroup (Long OR String)
ProductSubGroup (Long OR String)
ProductAttribute (Long OR string)

For the group, subgroup, and attribute, you will either want to store the actual value that goes on reports and such OR a reference value if it is looked up from another table.

For your main table, you would want to reference this as a number.
If you are limited to a couple thousand possible "products" you may want to use a WORD (2 bytes) instead of a LONG (4 bytes) as your ProductID. That could save 20% in space per record. This is based on the assumption that your products do not change, or do not change often.

Of additional importance is what is the longest period you will need to report on for a by/month report? Will you have to show the monthly results from 15 years ago? Will you be doing trend analysis based on same month but different years?
 

sumdumgai

Registered User.
Local time
Today, 17:04
Joined
Jul 19, 2007
Messages
453
What information do you have/need per salesperson?
What information do you have/need per product?
Each salesperson and product will have unique identifiers along with other descriptive information that are contained in other tables.


ProductID (AutoNumber)
This will be text field, several characters.


Products will start at 6 and may go to a dozen or so but not much more.


longest period
Probably a rolling13 months. Archiving in a different database at 14th month and later will be a maintenance challenge but I hope doable.
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 16:04
Joined
Feb 28, 2001
Messages
27,146
As soon as you say "with its own database" you are opening up Pandora's box, which contains a can of worms. Multiple databases mean you CANNOT create a formal relationship between / among the files in question. You are struggling to avoid the obvious. To do any kind of analysis on a data set this size, you're gonna need a bigger boat.

I understand that people want to avoid the cost of going to something more formal and more expensive than Access, but part of wisdom is learning to tell when the problem is just too complex for a development system designed for lower-end applications. And what you have, volume-wise, is not low-end.
 

sumdumgai

Registered User.
Local time
Today, 17:04
Joined
Jul 19, 2007
Messages
453
I'm going to give Access a try. I've managed to reduce the amount of data but still need to split into multiple databases. I will be asking for help under the 'Queries' forum. Thanks again to all who contributed.
 
Last edited:

Users who are viewing this thread

Top Bottom