Normalization Denormalization of tables. Keep it denormalized when it is possible.

Kevin_S · Aug 4, 2004

monkey -

wasn't trying to abandon ya earlier it just gets frustrating trying to deal with some who post continuously with one opinion without considering the benefits of the other...

Here is an example I threw together based loosely on your topic area (contracts). I wanted to show you from a visual standpoint how (1) - this can be accomplished using a subform that is pretty basic and (2) how you don't append all 25 components to the new contract - you simply (in this example) select the component from the pick list as it is available and add the records in a many to one relationship as they become available...

Note* - I didn't know what the names of these 25 components are so I named them Component 1, Component 2, Component 3, etc... - If you would like to change these you can do so in the table tblComponent...

HTH and post back if you need more assistance...

Kev

KenHigg · Aug 4, 2004

Stay with me - we're almost there. And don't worry about how to arrange things - we can do that...

Let me think for few minutes...

ken

KenHigg · Aug 4, 2004

Hopefully this will do it: I looked at the tables and here's what it looks like - A project has only one contract and a contract applies to only one project.

So.... I assume that by your db that a project is more important than a contract (Do you want to track this by project instead of by contract? In this case, because of the one to one realationship, this is really just wording issue as I saw project id and no reference to a contract id.)

ken

Pat Hartman · Aug 4, 2004

'I've been creating applications based on relational databases since before most of you were born.'
-----------
'The only place where I have ever used unnormalized tables is in Data Warehouse applications.'

Come on now Pat... Every db you ever put into production has been in 5th normal form...

- I say that because people of my generation know the difference between your and you're; their, there, and they're; its and it's; to, too, and two for starters

Since the need for higher levels of normalization occurs so infrequently, most people consider 3rd normal form to be normalized. I must confess to a moment of weakness a few years ago where I stored a couple of fields in an order header because their calculation was so complicated. I was wrong and changed the application as soon as I was able

'It is possible to go overboard with how you define a repeating group,'

Hobby1 could be their favorite and Hobby2 could be 2nd favorite.

Hobby1 and Hobby2 are not the same as Address1 and Address2. The hobby fields are repeating groups because they represent two instances of a single attribute. If you were interested in the rank of the hobbies, you would keep an additional field to hold rank in the many-side table. The address fields are 1 field that is separated into 2 for the sole purpose of facilitating the printing of address data. Remember city, state, zip, and country are stored separately in a proper structure because they have other uses besides printing. Addressing snail mail is the reason for keeping address information for most applications and the separation is a way to facilitate PRINTING. As I said, applications that need to do other things with an address, store it as components. The problem with keeping two hobby fields is that a person may have more than two hobbies, I do, and the table cannot accommodate that for one thing but the major problem is that having multiple hobby fields increases the complication of any analysis.

monkeytunes,

what’s better IN THIS CASE? One table with up to 90+ companies (90+ records) and 25 columns (25 fields) for their contractual obligations, or, since all components are eventually required, a join table with three fields (company, component, date) and 2250+ records

Rows vs columns is NEVER what determines correct structure. Rows is "correct". This problem is always, IMHO, a display issue. Someone wants the display format to look like a spreadsheet or some other flat structure so you talk yourself into flattening the table because that solves the interface issue. In your case, I'm guessing that not much is ever done with the individual fields. So you may never have a problem until a change is made to the contract structure and hopefully for your sake, you'll be long gone. The problem is that when you make the decision to flatten the structure it is usually too early in the development process to know what if any problems will be caused by that decision. At least you knew that you were making a conscious decision to deviate from recommended design guidelines.
But, what you really need to weigh is ease of data entry/display vs. everything else.
1. If someone wants a report showing contracts with unreported items, you'll need a query with 25 criteria - I'll need only one criterion.
2. If someone wants a list of the sections by the date they were entered, you'll need a Union query with 25 separate select queries within. I'll just need an order by clause.
3. If someone wants an analysis of which items take the longest to acquire, you'll need that union query with the 25 separate select queries including 25 calculations. I'll need 1 calculation and an Order by clause.
4. If your data were slightly different and it were amounts and someone wanted to sum it, you'd need a calculation that added each field individually and you'd need to account for potential null values. I'd need the Sum() function in a totals query.
5. Same scenario as 4 but they want an average. In addition to summing, you'll need to count the non-null fields so you can come up with the correct divisor. I'd need the Avg() function in a totals query.
6. If someone wants to add a new subsection to item 14, you'll need to change at a minimum the data entry form and its query plus any report, query, or sub that needed the new item. I'd need to add 1 record to a table - NO PROGRAMMING of any kind!

IS ANYONE GETTING THE MESSAGE? Before you shake your head and say

I try to follow normalization practices as much as possible

make absolutely certain that you know the extra effort this decision will cause. If you're having trouble trying to convince a user that it is in his best interest (financially and time wise) to learn to love a subform with a list rather than the flat GUI that he "sees" in his mind, then build an unbound form for the data entry. This is really annoying and takes a lot of code. The only good thing about it is that changes are localized to the data entry form and you won't have an albatross of a data structure getting in your way every time you need to write a query.

If I split that table then at different points I will have to rejoin it with a query and I need all of those fields for data that insered (SIC) into a Word.doc via Word Bookmarks. Now in my experience are query that joins a few of these will run slower than the table itself.

I looked at Mike's fields and agree that most of them are independent attributes of a policyholder and so don't form a repeating group but many of them do belong in other tables to avoid duplication of data. Anything related to the policyholder belongs in the person table. If it is common to have policies for multiple members of the same household, it is probably best to have a separate Address table to minimize the change process when a household moves. Answers to questions should be the same whether the person has one policy or many. Joining tables on primary key to foreign key (especially if the fields are numeric) is quite efficient. When you define a relation and enforce RI, you get to specify Cascade Delete and Jet automatically creates a non-unique index that isn't visible on the fk field to facilitate joins. Of course if you don't bother to create proper relationships, Jet won't do this for you and you could experience slowness if the many-side table is large enough. But why would you want to be sloppy/lazy when you can do the correct thing and Jet can read one row in an Index rather than 100,000 rows in a policy table?

KenHigg · Aug 4, 2004

Here the spill (asuming my last post was true)

tblCustomers
customer_recno, pk, autonumber
customer_name, text
customer_info1, ???
etc...

tblProjects
project_recno, pk, autonumber
customer_id, fk, number
project_info1, ???
etc...

tblComponents
component_recno, pk autonumber
project_id, fk, number
component_sort_order, number
component_info1, ???
etc...

tblComponentModel
component_model_recno, pk, autonumber
component_model_sort_order, number
component_model_info1, ???
etc...

There is a on to many customer -> project
There is a one to many project -> component

1. Do a form to build the component model. The sort order field is the order that you want the components to be in when seen on a form or a report.

2. Do form base on the project table.

3. A new project generates a new project_recno

4. This form has the customer_id in a combo box that sets that relationship.

5. Have code append a copy of the current model to the component table, inheriting the project_id from the form

6. Put a suform on the main from based on the tblComponents, linked to the project_id on the main form, and sort.

---------

There's a lot of fill in the blanks here. Let me know of the issues you see...

Think this will work?

ken

Mike375 · Aug 5, 2004

From Pat Hartman posting:

Hobby1 and Hobby2 are not the same as Address1 and Address2. The hobby fields are repeating groups because they represent two instances of a single attribute.

The table I presented does not have any version of Hobby1 and Hobby2 and I think Pat has more or less said that about my table.

From Pat Hartman posting:

I looked at Mike's fields and agree that most of them are independent attributes of a policyholder and so don't form a repeating group but many of them do belong in other tables to avoid duplication of data. Anything related to the policyholder belongs in the person table. If it is common to have policies for multiple members of the same household, it is probably best to have a separate Address table to minimize the change process when a household moves.

Not sure what you mean Pat by duplication of data. Certainly lots of people have the same qualifications, same occupation, same solicitor and same accountant but there are also lots of people with the same surname, postcode and suburb.

When both husband and wife are insured they both have an entry in the main table and in each case a few few fields with data for the spouse. A common address held in another table will not function because they often have different business and postal addresses. This occurs when a doctor is married to a doctor etc.

The thing I am seeing with many of the people on this thread is that normalisation is being thought of as the number of fields in a table and there must be no fields with null values. In my opinion the number of fields is irrelevanmt in terms of normalisation, it is the type of data the field holds that determines the degree of normalisation.

Attempting to make lots of tables to avoid null values would mean in my case I would have no field for middle name as many people do not have a middle name. To my way of thinking null values are important because they tell me what a person does not have. Very young doctors or dentists might not have an accountant or solictor. Since we do some cross referrals with accountants and solicitors it is easy for me to pull those people with no accountant or solicitor.

Now some people might say that No Accountant or No Middle Name etc should be entered and that is another way of doing things but then we would be back at one table

Mike

Rich · Aug 5, 2004

Zzzzzzzzzzzzzzzz

Mike375 · Aug 5, 2004

Rich said:
Zzzzzzzzzzzzzzzz

Rich,

You need to stop reading this stuff.

Your compulsion to read what you are not interested in can be addressed by the appropriate medical specialist. The problem is quite common.

I am still waiting for Kevin to show me how I position the fields on the main form from all the sub forms. Lots of slack talk occurs here from what I think are people who have Access as a hobby.

That is why Kevin gave up because he knows his talk was all hobby bullshit.

But do see a medical specialist about your compulsion to read what does not interest you.

Mike

Rich · Aug 5, 2004

Said the egotist to the erudite prince

Mike375 said:
Rich,

Your compulsion to read what you are not interested in can be addressed by the appropriate medical specialist. The problem is quite common.

Mike

Mike375 · Aug 5, 2004

My learned friend, you cant do much with things in which you lack an interest.

As I said, the forum has a lot of hobbyists on board.

Now if you want to make a real contribution then:

1) Comment on what Kevin has said about my table, that is, do you agree or disagree that the table should be split.

2) If you think it should be split then how are a fields placings on the form going to be done. I am limited to sub forms in that situation which have obvious limitations.....but is there another way?

3) If I split the table what are my gains.

If I follow Kevin's suggestion and split that table into 4 tables where do I go from there.

Mike

Rich · Aug 5, 2004

Why would I, or anyone else, bother to answer your questions?..............

I've no intention of changing anything, but I'd love to see how you do it

Mike375 · Aug 5, 2004

Rich said:
Why would I, or anyone else, bother to answer your questions?..............

Can you expand on the quote you have posted. Where did that come from. What context.

I put in place your suggestion to have the column of about 1000 numbers on one page via a Report.

I also put in place the code to round up and down by 500, 100 etc. which I got from ByteMyzer and then from Pat Hartman on how to put it into a query via QBE.

I also put in place izyrider's code from dBForums to separate duplicates from memo fields. Can you deal with duplicates on memo fields?

But I am not going to put in place something that wastes time because of the time it takes to do it in conjunction with something that does not do the job.

The reason you won't comment on the table structure I put up is because you know that it is "normalised" and Kevin S has got it wrong with splitting it into 4 tables. Kevin is a hobbyist....well, I hope is he is hobbyist

Kevin knows he has stuffed up and he knows the problems that are associated with all the sub forms he would create and doubly so when he must realise that the table he would split into 4 tables is a table that does not have repeating data.

Mike

Len Boorman · Aug 5, 2004

Pat Hartman said:
IS ANYONE GETTING THE MESSAGE

Pat

Was with you before the message was sent.

I have great respect for anybody who from their own efforts creates a database that "works". However "works" may be interpreted in many different ways.

Within my company there were many people creating what they described as databases that worked.! However the problems that they experienced were many and frequent. Training had comprised of a 2 day course at best where Normalisation was not even mentioned.
Basically in true terms the applications were c**p.
However people had worked very hard to produce this c**p with the very best of intentions

I think that since my arrival the quality of databases has improved quite dramatically. Based on the demands on my time people can now see and understand what can be achieved in a well designed application.

By well designed I mean Third Normal Form Normalised (Maybe also B.C by default), Referential Integrity Enforced, A "friendly" User Interface. Adoption of Best Practices

These thing the Designer/Developer can achieve through Education, Experience and Listening to others. Is not one of the objectives in building a database to make it "better" than the last one ?.

I do not consider myself an expert, I believe I know enough to know that there is much more to learn.

To Normalise when designing a Relational Database Application is not a discussion point.

It is a fact.

Len B

If anybody feels aggrieved by any of the above then I apologise now. It was not my intention.

The Stoat · Aug 5, 2004

Mike375 said:
When both husband and wife are insured they both have an entry in the main table and in each case a few few fields with data for the spouse. A common address held in another table will not function because they often have different business and postal addresses. This occurs when a doctor is married to a doctor etc.
Mike

If you look at my break down of your data you will see three address tables Business, Home and Postal. In your Client table you will have three fields to retain the PK of each one of the address tables. You can enter the address once in the address table and link it to as many clients as you like. If the postcodes change (they often do in the uk) you simply need to change them once in the address table and all of the related clients will have effectively been updated.

Mike375 said:
The thing I am seeing with many of the people on this thread is that normalisation is being thought of as the number of fields in a table and there must be no fields with null values. In my opinion the number of fields is irrelevanmt in terms of normalisation, it is the type of data the field holds that determines the degree of normalisation.
Mike

Not really so, in fact quite the reverse. The number of fields as i said before is only related to the required number to express all the attributes of the entity - the object that the table represents - What the number of fields may tell an inexpercienced developer is that they have not correctly normalized their data. It is unusual to have large numbers of fields but by no means wrong, it really is irrelevant as long as you have followed the rules correctly.

As i develop new systems rather than upgrade old ones i tend to view a project in a particular way.

I look at the system and find all of the objects - Entities - within that system. Some are real like the car in the example i posted. Others are more intangible like a journey. Once i have found all of the entities i find their attributes, the properties that make them what they are i.e for cars ,doors, engine size etc. I then find the relationships between the objects. i.e cars go on journeys. The only step left after that really is to remove any many to many relationships. This i found was the hardest concept to grasp when i started. In my example i wanted to hold the information for each journey against each passenger that had been on the journey. I don't want to repeat the whole journey record for each passenger that is in efficient as i end up repeating the same journey data over and over again. I create one journey record and one client record. I create a link table that just holds the PK fields for these two entities. This massively reduces the time it takes to add data and the amount of data that i need to store. In my reports i can pull all the data together again and it is still much quicker than having all the data in one table.

Having read previous posts you have submitted on the subject of your system it is clear that it really is an automated spreadsheet that you have built in an access environment. You could have a more efficient system on a database but to discuss the merits of your system in a db forum is really pointless, you haven't got a database to discuss, sorry but that is the truth. I admire the work that you have put into it but i know that it could be better, much better. What strikes me is that you defend this because it works, which is admirable

, but you can't defend it as a database , it isn't.

TS

Mile-O · Aug 5, 2004

Mike375 said:
Mile,

Here is a question for you in terms of would you put this data on one row.

Title, salutation, first name, middle name, surname, business name, level, street, suburb, state, postode (and by 3 for home, business and postal address), business phone, home phone, mobile, fax, email, date of birth, smoker/non smoker, height, weight, gender, employed/self employed, net income, assets, liabilities, sick leave, workers compensation, married/single, occupation, qualifications, Medical speciality, visiting medical officer/staff specialist, graduation date, dangerous past times, medical loading/no loading, bank, source of lead, place of birth, spouses names and dates of birth, smoker/non smoker, spouse work/not work, spouse date of birth.

Then some fields which determine what sort of mail outs he/she gets, policy owner, accountant, solicitor.

Most of it I'd be inclined to put in one row but there are some fields in there which don't seem atomic: qualifications, dangerous pastimes, spouses' names, etc.

These, to me, would be in seperate tables.

However, in this case, I might be inclined to make a small table for people

i.e.
Forename
MiddleName
Surname
DateOfBirth

and then make an extension table meaning that you could expand upon some of the people in the table as customers and therefore add all the extra stuff in this table. Or, for spouse details, you could refer to the people table and all that will come with the foreign key.

The Stoat · Aug 5, 2004

Len Boorman said:
Pat

Was with you before the message was sent.

I have great respect for anybody who from their own efforts creates a database that "works". However "works" may be interpreted in many different ways.

Within my company there were many people creating what they described as databases that worked.! However the problems that they experienced were many and frequent. Training had comprised of a 2 day course at best where Normalisation was not even mentioned.
Basically in true terms the applications were c**p.
However people had worked very hard to produce this c**p with the very best of intentions

I think that since my arrival the quality of databases has improved quite dramatically. Based on the demands on my time people can now see and understand what can be achieved in a well designed application.

By well designed I mean Third Normal Form Normalised (Maybe also B.C by default), Referential Integrity Enforced, A "friendly" User Interface. Adoption of Best Practices

These thing the Designer/Developer can achieve through Education, Experience and Listening to others. Is not one of the objectives in building a database to make it "better" than the last one ?.

I do not consider myself an expert, I believe I know enough to know that there is much more to learn.

To Normalise when designing a Relational Database Application is not a discussion point.

It is a fact.

Len B

If anybody feels aggrieved by any of the above then I apologise now. It was not my intention.

Totally agree with you and Pat on this. You might as well say let's go and build a bridge out of marshmallows as build a database without normalization. I'm not saying it's always easy, but to reject the practice because you don't like it is nonsensical to the point of infantile. And quite frankly i don't care if anyone is offended - eat my db's dust

TS

Mike375 · Aug 5, 2004

TS,

One thing at a time so...

"If you look at my break down of your data you will see three address tables Business, Home and Postal. In your Client table you will have three fields to retain the PK of each one of the address tables. You can enter the address once in the address table and link it to as many clients as you like. If the postcodes change (they often do in the uk) you simply need to change them once in the address table and all of the related clients will have effectively been updated."

I am lost here.

What do I put in the address tables? I am assuming I enter addresses??

I am also assuming that the addresses will be displayed on a subform or subforms.

I am really lost on this...You can enter the address once in the address table and link it to as many clients as you like.

Apart from a few spouses we never have people with the same address.

I am sure I am missing something here and have read it the wrong way.

Mike

Mike375 · Aug 5, 2004

Mile,

"Most of it I'd be inclined to put in one row but there are some fields in there which don't seem atomic: qualifications, dangerous pastimes, spouses' names, etc."

They are because only one is entered. If someone has three different past times only the one that most influences a premium rating or a policy terms and conditions makes it to that field. Same deal with qualifications as I mentioned (I think?) in an earlier posting.

I need the spouses name in that table for a few reasons. Not the least is that it can be displayed without a subform and therefore postioned exactly where we want it on the form along with date of birth and smoker/non smoker. Perhaps one day we expand to selling insurance to the Muslims and the spouse could be in a table on a One to Many basis......but at this stage it is One to One

These, to me, would be in seperate tables.

However, in this case, I might be inclined to make a small table for people

i.e.
Forename
MiddleName
Surname
DateOfBirth

and then make an extension table meaning that you could expand upon some of the people in the table as customers and therefore add all the extra stuff in this table. Or, for spouse details, you could refer to the people table and all that will come with the foreign key.

You can do all of that but we come back to

Gains Vs Losses

Our main forms would have to have subforms and again we hit the problem of field positioning.

Also, why would include DateOfBirth in the little table when all of the other things equally (actually more so) influence premium and policy availability.

DateOfBirth describes the person for insurance purposes so if it goes in another table so must all the other fields.

We are not interested in DateOfBirth for the purposes of sending birthday cards. Within reason, DateOfBirth has far less impact on premium and policy availability than do many of the other entries.

Mike

Mile-O · Aug 5, 2004

In some instances I would use a subform but it wouldn't be in datasheet mode - it would in be in Single Form mode (not continuous) so that it looks as if its part of the parent form.

The Stoat · Aug 5, 2004

Mike375 said:
TS,

One thing at a time so...

"If you look at my break down of your data you will see three address tables Business, Home and Postal. In your Client table you will have three fields to retain the PK of each one of the address tables. You can enter the address once in the address table and link it to as many clients as you like. If the postcodes change (they often do in the uk) you simply need to change them once in the address table and all of the related clients will have effectively been updated."

I am lost here.

What do I put in the address tables? I am assuming I enter addresses??

I am also assuming that the addresses will be displayed on a subform or subforms.

I am really lost on this...You can enter the address once in the address table and link it to as many clients as you like.

Apart from a few spouses we never have people with the same address.

I am sure I am missing something here and have read it the wrong way.

Mike

Lets forget about the how for a minute and look at the why.

Yes you enter addresses in the address table.

You say

Apart from a few spouses we never have people with the same address.

How many is a few. 1000, 100000, 10000000? It doesn't really matter. To be fair this is a lookup table a 1 to 1 relationship - 1 client has 1 home address - what it does is give you control. Enter something like an address twice takes up twice the amount of room, gives you 2 opportunities to make mistakes and takes more than twice as long to change.

Your form can be based on a query that displays all of the address with the rest of the clients details if you want it that way. It doesn't need to have subforms. Or you could have popup forms where you can display the address.

You must be aware that when you phone up for services they go "Can i take your postcode please" and you say "cb2 2qq" and they say and what number is that and you say "1". They have done a search on their address tables and can now link you to the address. This takes 2-3 seconds for all the address in the UK 20 million + ( you can buy all the address on a cd and upload them into your database)- A system like yours would probably be well suited to an sqlserver with an Access front end.

TS

Normalization Denormalization of tables. Keep it denormalized when it is possible.

Registered User.

Attachments

Registered User

Registered User

Super Moderator

Registered User

Mike375

Guest

Rich

Guest

Mike375

Guest

Rich

Guest

Mike375

Guest

Rich

Guest

Mike375

Guest

Back in gainfull employme

The Grim Squeaker

Back once again...

The Grim Squeaker

Mike375

Guest

Mike375

Guest

Back once again...

The Grim Squeaker

Similar threads

Users who are viewing this thread