'I've been creating applications based on relational databases since before most of you were born.'
-----------
'The only place where I have ever used unnormalized tables is in Data Warehouse applications.'
Come on now Pat... Every db you ever put into production has been in 5th normal form...
- I say that because people of my generation know the difference between your and you're; their, there, and they're; its and it's; to, too, and two for starters
Since the need for higher levels of normalization occurs so infrequently, most people consider 3rd normal form to be normalized. I must confess to a moment of weakness a few years ago where I stored a couple of fields in an order header because their calculation was so complicated. I was wrong and changed the application as soon as I was able
'It is possible to go overboard with how you define a repeating group,'
Hobby1 could be their favorite and Hobby2 could be 2nd favorite.
Hobby1 and Hobby2 are not the same as Address1 and Address2. The hobby fields are repeating groups because they represent two instances of a single attribute. If you were interested in the rank of the hobbies, you would keep an additional field to hold rank in the many-side table. The address fields are 1 field that is separated into 2 for the sole purpose of facilitating the printing of address data. Remember city, state, zip, and country are stored separately in a proper structure because they have other uses besides printing. Addressing snail mail is the reason for keeping address information for most applications and the separation is a way to facilitate PRINTING. As I said, applications that need to do other things with an address, store it as components. The problem with keeping two hobby fields is that a person may have more than two hobbies, I do, and the table cannot accommodate that for one thing but the major problem is that having multiple hobby fields increases the complication of any analysis.
monkeytunes,
what’s better IN THIS CASE? One table with up to 90+ companies (90+ records) and 25 columns (25 fields) for their contractual obligations, or, since all components are eventually required, a join table with three fields (company, component, date) and 2250+ records
Rows vs columns is NEVER what determines correct structure. Rows is "correct". This problem is always, IMHO, a display issue. Someone wants the display format to look like a spreadsheet or some other flat structure so you talk yourself into flattening the table because that solves the interface issue. In your case, I'm guessing that not much is ever done with the individual fields. So you may never have a problem until a change is made to the contract structure and hopefully for your sake, you'll be long gone. The problem is that when you make the decision to flatten the structure it is usually too early in the development process to know what if any problems will be caused by that decision. At least you knew that you were making a conscious decision to deviate from recommended design guidelines.
But, what you really need to weigh is ease of data entry/display vs. everything else.
1. If someone wants a report showing contracts with unreported items, you'll need a query with 25 criteria - I'll need only one criterion.
2. If someone wants a list of the sections by the date they were entered, you'll need a Union query with 25 separate select queries within. I'll just need an order by clause.
3. If someone wants an analysis of which items take the longest to acquire, you'll need that union query with the 25 separate select queries including 25 calculations. I'll need 1 calculation and an Order by clause.
4. If your data were slightly different and it were amounts and someone wanted to sum it, you'd need a calculation that added each field individually and you'd need to account for potential null values. I'd need the Sum() function in a totals query.
5. Same scenario as 4 but they want an average. In addition to summing, you'll need to count the non-null fields so you can come up with the correct divisor. I'd need the Avg() function in a totals query.
6. If someone wants to add a new subsection to item 14, you'll need to change at a minimum the data entry form and its query plus any report, query, or sub that needed the new item. I'd need to add 1 record to a table - NO PROGRAMMING of any kind!
IS ANYONE GETTING THE MESSAGE? Before you shake your head and say
I try to follow normalization practices as much as possible
make absolutely certain that you know the extra effort this decision will cause. If you're having trouble trying to convince a user that it is in his best interest (financially and time wise) to learn to love a subform with a list rather than the flat GUI that he "sees" in his mind, then build an unbound form for the data entry. This is really annoying and takes a lot of code. The only good thing about it is that changes are localized to the data entry form and you won't have an albatross of a data structure getting in your way every time you need to write a query.
If I split that table then at different points I will have to rejoin it with a query and I need all of those fields for data that insered (SIC) into a Word.doc via Word Bookmarks. Now in my experience are query that joins a few of these will run slower than the table itself.
I looked at Mike's fields and agree that most of them are independent attributes of a policyholder and so don't form a repeating group but many of them do belong in other tables to avoid duplication of data. Anything related to the policyholder belongs in the person table. If it is common to have policies for multiple members of the same household, it is probably best to have a separate Address table to minimize the change process when a household moves. Answers to questions should be the same whether the person has one policy or many. Joining tables on primary key to foreign key (especially if the fields are numeric) is quite efficient. When you define a relation and enforce RI, you get to specify Cascade Delete and Jet automatically creates a non-unique index that isn't visible on the fk field to facilitate joins. Of course if you don't bother to create proper relationships, Jet won't do this for you and you could experience slowness if the many-side table is large enough. But why would you want to be sloppy/lazy when you can do the correct thing and Jet can read one row in an Index rather than 100,000 rows in a policy table?