Scheduling Database

Treason

#@$%#!
Joined
Mar 12, 2002
Messages
340
I have developed a database application for a school. It's a split network application (back end/front end) with up to 30 users at a time. I am trying to improve my database speeds, which seem to get worse as more database objects are added. I now fear I fell into some common pitfalls when I created my tables.
I am hoping to get advice on how to correct these pitfalls, if they affect database speeds.


This is a list of pitfalls I think I may have:

Pitfall 1:
I never used the relationships window. Instead I always create my joins in the query design view.

Pitfall 2:
(I cannot enforce referential integrity because the source data is not housed by me.) The majority of my data is not entered manually. Instead, the user replaces my tables with CSV files.

Pitfall 3:
Not every table has a Primary Key.
(Many times I found it hard to add one.)
Here's an example of my 3 main tables:

[screenshot of the three main table designs - attachment not shown]
My front-end database is about 15 MB, and my back-end databases are about 60 MB in total.
 

KenHigg

Registered User
Joined
Jun 9, 2004
Messages
13,327
There are several potential problems based on what you've posted...

Can you elaborate on exactly which data is stored in your db? How do you use the data, etc?

kh
 

Treason

#@$%#!
Joined
Mar 12, 2002
Messages
340
OK, the heart of the application is schedule changes: switching StudentA from English-01 to Math-02. That's the core.

Most users use it to view report cards, transcripts, schedules, class information, attendance, etc.

It's almost generic, in that it covers anything you might think a high school needs.
 

KenHigg

Registered User
Joined
Jun 9, 2004
Messages
13,327
Hum... I see. Cool db.

The statement 'I never used the relationships window' implies you may not have an adequate understanding of data normalization. It may be helpful if you do a bit of reading up on the topic. In my humble opinion, you may need to do a major overhaul of your db, as I spot missing primary keys, missing one-to-many relationships, etc...

kh
 

The_Doc_Man

Immoderate Moderator
Staff member
Joined
Feb 28, 2001
Messages
27,323
My best advice is to remember the old programmer's rule:

When designing an Access project, if you can't do the job on paper, you won't be able to do it in Access.

When laying out data, you should spend some up-front time, because no less an authority than Niklaus Wirth (father of the Pascal language) stated a simple statistic over 30 years ago: over 80% of all programming problems stem from poor data design.

You state that you can't enforce referential integrity because you don't house the data. BUT if you get CSV files, you DO control some part of the data import process, 'cause I KNOW you don't have a CSV file as part of your live DB - no .DLL exists (that I know of) to support a CSV-based database file. So if there are abominations in the imported data, you can flag them and refuse to handle them in their current form.
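
In practice, that check can be a couple of throwaway queries run right after the import. A minimal sketch, assuming the CSV lands in a staging table called tblStudentsStaging with a StudentID column (all names here are illustrative, not from your DB):

Code:
-- Rows with no usable key at all:
SELECT s.*
FROM tblStudentsStaging AS s
WHERE s.StudentID IS NULL;

-- IDs that appear more than once in the same CSV load:
SELECT s.StudentID, Count(*) AS HowMany
FROM tblStudentsStaging AS s
GROUP BY s.StudentID
HAVING Count(*) > 1;

Anything those queries return gets flagged and kicked back rather than imported.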

The whole problem of referential integrity is the ability to control the data you need to use. If you truly cannot control the content of your DB, it isn't your DB. So the solution to this problem is to make it your DB or make the DB someone else's problem, one or the other. I.e., either you get to make the decisions OR you pass the problems on to the person who won't let you make the decisions. Advise that person of the problems you face and get a resolution so you can do something constructive. If the person tries to delegate the authority, get it in writing. If the person is dismissive (e.g. "Just go make it work and don't bother me"), don't take that for an answer without pointing out that your current situation cannot be improved without a change. I.e., make your manager MANAGE for a change. Otherwise you will spin your wheels on sand.

OK, now to the techie side of this issue: Normalization is going to be paramount in a DB this complex. The idea of defining your relationships in the query screens is NOT necessarily wrong in the situation you describe, but it is less efficient than having a central repository of relationships. (Centralized: Takes up less space, takes less labor for maintenance.)

I give this advice to many people. A simple yet practical (and usually cheap) method to do a good DB design is to define your tables ON PAPER. Go find yourself a dry-erase board or a chalkboard. Buy a box of sticky notes. Not a pad. A BOX.

Put your table names on the board. Create some sticky notes to act as sample records for each table. When you have the normalization rules in front of you, start populating the tables. Whenever you are about to store something in a record but the something is a candidate for another table, you hit one of the normalization rules: Thou shalt not mix apples and oranges unless you were planning to make fruit salad. Apples go in the apples table. Oranges go in the oranges table. Records containing both apples AND oranges usually have to go into a linking table (representing the fruit salad, of course.) The "real" rule is that you don't store any details in a table if the details don't depend entirely and only on the prime key of said table.

OK, how does that work in practice? Well, look at your schedule. You've got students. You've got classes. You need a table to show what students are in what classes. But you ALSO need separate tables that hold the details for each student independent of their classes, and for the classes independent of each student. The class population table is where these two tables overlap. This is also where the prime keys of the "pure" tables are used as foreign keys for the table that uses both types of information.
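
If it helps to see that in Jet DDL, here is a minimal sketch. The table and field names are made up for illustration, and each statement must run as its own DDL query, since Access executes one statement at a time:

Code:
CREATE TABLE tblStudents (
    StudentID  LONG NOT NULL,
    LastName   TEXT(50),
    FirstName  TEXT(50),
    CONSTRAINT pkStudent PRIMARY KEY (StudentID)
);

CREATE TABLE tblClasses (
    ClassID    LONG NOT NULL,
    ClassName  TEXT(50),
    CONSTRAINT pkClass PRIMARY KEY (ClassID)
);

-- The "fruit salad" linking table: one row per student per class.
CREATE TABLE tblClassMembers (
    StudentID  LONG NOT NULL,
    ClassID    LONG NOT NULL,
    CONSTRAINT pkMember PRIMARY KEY (StudentID, ClassID),
    CONSTRAINT fkMemberStudent FOREIGN KEY (StudentID)
        REFERENCES tblStudents (StudentID),
    CONSTRAINT fkMemberClass FOREIGN KEY (ClassID)
        REFERENCES tblClasses (ClassID)
);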

Anyway, go find the rules of normalization. Access lists the three biggies (first through third normal forms), though a web search could turn up more. Trust me, if you get the big three properly implemented, the others are obscure enough that you COULD survive without breaking things up that far. (Yeah, for the purists out there, I'm noted for uttering such blasphemies now and then. Good for the soul to get those out of your system, don't you know?)

You commented that not all of your tables have primary keys. Well, for the linking tables, this is possible. You should still have non-unique indexes on the foreign keys, though. And a primary key MIGHT be possible as the combination of the foreign keys in those cases. Take, for example, the class membership table. A student cannot be a member of the same class twice, so the combination of student ID and class ID should be unique. But in other cases, this might not be so. Depends on your business rules.
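
For a linking table that already exists without keys, the retrofit is cheap - again a sketch with illustrative names:

Code:
-- Composite primary key over the two foreign keys:
ALTER TABLE tblClassMembers
    ADD CONSTRAINT pkClassMember PRIMARY KEY (StudentID, ClassID);

-- Non-unique index so joins and filters on ClassID stay fast:
CREATE INDEX idxMemberClass ON tblClassMembers (ClassID);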
 

Treason

#@$%#!
Joined
Mar 12, 2002
Messages
340
Thanks for the reply, DocMan.

When I say I can't enforce referential integrity, it is because I import a new CSV containing students' bio data daily. When I import this CSV, I delete my table first and then import the CSV, totally replacing the table. If integrity were enforced and I dumped my Student table, I would also dump the classes associated with each student.

Now you are right, I could import the CSV file into a temp table, do some checks against my old table, and then only update/remove/add-new instead of simply replacing the entire dataset. That seems to be a lot more work, which I am willing to do if the benefits are worthwhile.
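
For the record, that update/remove/add-new pass would be three queries along these lines (a sketch only - tblStudentsStaging and the field names are placeholders for whatever I actually import):

Code:
-- 1. Update students that already exist:
UPDATE tblStudents AS t
INNER JOIN tblStudentsStaging AS s ON t.StudentID = s.StudentID
SET t.LastName = s.LastName, t.FirstName = s.FirstName;

-- 2. Append students that are new in today's file:
INSERT INTO tblStudents (StudentID, LastName, FirstName)
SELECT s.StudentID, s.LastName, s.FirstName
FROM tblStudentsStaging AS s
LEFT JOIN tblStudents AS t ON s.StudentID = t.StudentID
WHERE t.StudentID IS NULL;

-- 3. Remove students missing from today's file. With referential
--    integrity enforced (and no cascade delete), this is rejected for
--    any student who still has schedule rows - which is exactly the
--    protection being discussed:
DELETE FROM tblStudents
WHERE StudentID NOT IN (SELECT StudentID FROM tblStudentsStaging);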

When I designed the DB, I did map out my tables on paper, and this is honestly the only way I thought it would work. I have read a lot of threads here concerning similar issues, and I know my database is not designed poorly; I just fear it is not optimized, and that, of course, is my real goal.

I was hoping someone might spot a large problem in my design whose fix would improve processing times. If solving a problem does not really improve processing times, then it doesn't bother me (not for the time being, anyway).

I have a book on migrating my Access back end to SQL Server, which I am considering, but I am not sure it will help either. I really just want what every developer wants: optimal performance from my application.
 

The_Doc_Man

Immoderate Moderator
Staff member
Joined
Feb 28, 2001
Messages
27,323
When you migrate to an "active" back end (e.g. SQL Server), you get better apparent performance because of the relative speeds of the interfaces involved.

On Access, your back end is a file server. You remotely read the .mdb file to find the records you need, but the work is done on your PC. That means the query processor may have to pull in EVERY RECORD of EVERY TABLE to find the records matching your query - all of that at NETWORK speeds. Then you build the query's recordset, which is essentially a list of pointers to the affected records. You build THAT at network speed. Then you read back the result set - again, at network speed.

On SQL Server, the query and the SQL Server equivalent of building the recordset list are done on the server. Then the server hands you the resulting recordset. The first two items happen at local disk speeds, which are always faster than any network operation, and the only network-based item is the readback of the resulting recordset, which is far smaller than the query-processing work. So you should gain a lot of performance by moving the biggest part of the work to the server's local disk interface.

I see your point about R.I. - if you rebuild records every day, it is hard to do R.I. properly. On the other hand, the combination of having R.I. on a "live" table and importing to a temp table might be a good way to control what goes into your DB. At least, it might give you a way to filter out some trash records first. A side effect of this method is that if the CSV files are big, the DB will grow radically during the import operation, after which you will have lots of deleted records. So the "big temp table" approach is good, but it carries the overhead of frequent compaction.

Now, if you could import to a separate DB (like having more than one back end), you could at least control the clutter factor in your primary DB. But again, that is up to you.
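
A sketch of what that looks like from the main back end - the path and names here are hypothetical. The staging table lives in a scratch file, and your queries reach it with Jet's IN clause (linking the table works just as well):

Code:
-- Read staged rows from a scratch back end, so daily import bloat
-- lands in Scratch.mdb instead of the production file:
SELECT *
FROM tblStudentsStaging IN 'C:\Data\Scratch.mdb';

Then you compact (or simply delete and recreate) the scratch file on whatever schedule you like, and the production back end never bloats.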
 

KenHigg

Registered User
Joined
Jun 9, 2004
Messages
13,327
Treason,

Then perhaps this would work:

1. Store the student data in your local db and do the proper relationship stuff.
2. Attach (link) to the student CSV file.
3. Each day, delete the previous day's CSV file and copy in the new one.
4. Build some queries that update the Access student tables from the attached CSV file and run them each day (see the sketch below).
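
Step 4 might look like this - a sketch assuming the linked CSV shows up as tblStudentCSV and you keep a local staging table (all names are placeholders); run the two statements as separate queries:

Code:
-- Refresh the local staging table from the linked CSV:
DELETE FROM tblStudentsStaging;

INSERT INTO tblStudentsStaging (StudentID, LastName, FirstName)
SELECT c.StudentID, c.LastName, c.FirstName
FROM tblStudentCSV AS c;

From there, the update/append/delete queries sketched earlier in the thread finish the job against the real student tables.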

Would this work? I suspect this relationship issue could be causing the performance hit...

kh
 

Treason

#@$%#!
Joined
Mar 12, 2002
Messages
340
Thanks for all your advice, guys. I really value your opinions.

I think enforcing integrity may be too much work without a big enough payoff, so I will migrate to SQL Server in my next major update. I have a lot of reading to do! :)
 
