Normalization Denormalization of tables. Keep it denormalized when it is possible.

aleb

Registered User.
Local time
Today, 18:20
Joined
Jun 25, 2003
Messages
296
Normalization Denormalization of tables. Keep it normalized when it is possible.

I might express apostate opinion but
Recenly browsing the threads I have noticed that some of them are basically about whether it is needed to normalize the tables or not.
IMHO the whole issue of normalization is about 3 key points:
1. Avoid the need to enter the same information
2. Dbase performance
3. Avoid variances of the same information.

I will try to dwell upon those 3 points
1. Lets say we have an invoice table where there is an invoice number, company name, company representative. In normalized tables you will have two tables one for company where you will have company name and company representative the other is invoice table. In invoice table you will have only the refernced to the company table key. Besides if the tables are normalized in case if company representative has changed you will change it only in one place rather than changing all the instances where you have company representative field. But imagine that in some companies company representatives come and go every other day and you need to know who was the representative at the time when invoice came to you.
2. It is obvious that dbase performance is increased when there are used normalized tables. But ... Having a bunch of normalized tables you come back with increased number of queries for denormalization. Besides you will come across the "ambiquious" issue :):):) if you try do everything in one query. "Head and Shoulders" - RIP. More over if you have the dbase which is designed for a small enterprise who on earth will notice the difference in couple of milliseconds. For big enterprises there are better and bigger programms than your penni-ante dbase.
3. All the variances in spelling of de-normalization/ denormalization is treated as a different word even though it is the same. Well if vocalizing the word levels up the variances why would you care about the variances in spelling. :):):)

I think that most of dbases produced with the idea of seeing the report/s that is where you would have to denormalize the tables back again and if your 3 rules are not compromized - keep the tables denormalized.

At the end bunch of links
http://pubs.logicalexpressions.com/Pub0009/LPMArticle.asp?ID=88

Rules of Data Normalization http://www.databasejournal.com/sqletc/article.php/1428511

Database Normalization Basics http://databases.about.com/library/weekly/aa080501a.htm


Normalization http://www.troubleshooters.com/littstip/ltnorm.html

Overview of Normalization http://www.gslis.utexas.edu/~l384k11w/normover.html

Rules of Data Normalization: http://www.datamodel.org/NormalizationRules.html
 
Last edited:
In a properly normalized system, normalized tables are minimized if your keys are well-chosen. The relationships among data elements makes it easier for Access to HELP you write your queries.

A specific comment: Your hypothetical case of rapidly-changing company representatives means that your design of your representative table was in error. You would have TWO tables - a rep table and a company table. The rep table would include the key to the company table - plus a date, since you made a point of it. So I have to say the problem you attacked was attacked incorrectly from the get-go.

If you have to increase the number of queries to get at something due to normalization, you might not be doing it right. Unless there is the need to "relay" a query (i.e. make a query of a query), the same number of questions usually leads to the same number of queries (almost every time).

aleb, you most certainly can express your opinions here, and I would NEVER tell you otherwise, but some of the inexperienced folks might read your words and mistake them for gospel truth. The rules for normalization came about for good reason. They MEAN something in a relational database. Oh, by the way, in the commercial environment, try to build a non-normalized DB. Your boss will scream at you as he escorts you from the area.

Now, if you had a flat-file design, I can understand why you said what you did. But in a true relational design, denormalization should be used SPARINGLY. Notice I didn't say NEVER. We disgree on criteria, I think. A database can be denormalized only in specific situations where your analysis already SHOWS that the lazy man's programming method will be adequate in your specific problem. But in general and in the abstract, normalization (at least to 3rd normal form) is never a mistake in the Access environment.
 
I would suggest that Normalisation is not about the three items suggested. They may be the consequence of normalisation but you need to go back to the Relational Algebra that forms the basis on which Relational Database Applications are built to actually appreciate normalisation.

Having joined a company where Normalisation rules were just about unknown I can trace just about every problem back to a lack of normalisation. I would agree with The_Doc_Man in that there may be circumstances in which an un-normalised design may be required but it is something I still have to discover.

You are entitled to your opinion but my opinion wis totally the opposite. Normalise, Normalise, Nomalise.

We see a great number of posts here from people learning Relational Databases whose problems stem from a lack of normalisation. I do not believe I have ever seen a solution proposed to a problem that suggested De Normalisation.

Len B
 
Auferte malum ex vobis-extort evil=denormalization from you (latin) :)
The idea of this thread was ( and is ) to provoke a more lengthy discussion on denormalization/normalization.
I've asked couple of my friends to try to explain to me why would they want to normalize the tables. They all explained to me we need to keep ... to avoid ... and so and so forth. Notice no one said to me ... "it depends on ..."
That is why I put at the end the links on the normalization topic.
I also would like to hear the opinions and instances when denormalized tables were used. :)

========================
Ad gloriam of normalizaiton :):):)
 
I do not read, speak.write or understand latin :confused:

I too would be interested in circumstances where un-normalised design is preferable, advantageous or indeed desireable.

I believe that the contributions to this thread indicating preferable, advantageous or indeed desireable aspects on un-normalised tables will be very few.

Len B
 
I've asked couple of my friends to try to explain to me why would they want to normalize the tables. They all explained to me we need to keep ... to avoid ... and so and so forth.

Yes. You normalize a DB to avoid having to unneccessarily duplicate data that occurs frequently. For instance, don't repeat the name of a company in a purchase record when you can build a foreign key to point to the common record in the company table.

You normalize a DB to keep data when the last ACTIVE record referring to that data goes away. For instance, keep the supplier company in a supplier table so that you have the information even if at the moment you don't happen to have anything in stock or on order that was supplied by that company.

You normalize a DB to shorten records that would otherwise be interminably long. Why type out long company names when you already KNOW the name? Just point to the company record.

You normalize a DB to keep variable numbers of detail records. A common problem we see here is the "repeating group" issue. How many line items do you put on an invoice? As many as you can sell, of course. But if you had repeating slots in a single record, you end up wasting space on small invoices and having to double-allocate for really big invoices. But if you normalize, you have exactly as many (normalized) detail records as you need.

aleb, your question betrays flat-file thinking. This viewpoint is not really wrong for simple problems but eventually, as your problems grow, you will find that normalization is NOT evil. It is your friend. I guess I will accept that you wished merely to open a discussion of theory, so I won't jump on your stuff as though you were an impudent clown. But you have to realize that normalization didn't come about 'cause we thought it meant better job security. (The mere fact that often it DOES mean that, through, is not a factor to be overlooked.)

Normalization rules were discovered in the crucible of the business world. They came about to solve specific problems. They exist for a purpose. You can say that you don't agree with them, but that merely means you don't understand the purpose.

Now, having said that, I can understand that sometimes the modern tools - Excel for one - make it EASY to build a flat file without having to link to this or that or the other record. But that doesn't make normalization invalid. It merely means that a tool exists for flat-file thinkers because there is a need there, too.
 
Norm vs DeNorm

Say you had a db/table that recorded say member info. Name, Address, etc. Now say would like record their hobbies. A fully normalized method may have fields like hobby1, hobby2, etc or may even have an entirely different table to track hobbies.

On the other hand, a de norm'd db may just provide a memo field were you just type in names of the different hobbies.

There are cases where either method is valid, it just depends on how you need to use the information. The first example is simular to a really slow db we did where we over normalized.

ken
 
KenHigg said:
Norm vs DeNorm

Say you had a db/table that recorded say member info. Name, Address, etc. Now say would like record their hobbies. A fully normalized method may have fields like hobby1, hobby2, etc or may even have an entirely different table to track hobbies.
ken

A fully normalized system would never have fields like hobby1, hobby2, hobby3, as this is denormalized in structure - i.e. you are storing blank fields when a person has only one hobby or not having enough fields when a person has more then three hobbies. A proper normalized design would be like:

tblMember
MemberID PK
FName
MName
LName
Address
Sex
etc....

tblHobby
HobbyID PK
MemberID FK
HobbyType

Relationship
One to Many relationship between tblMember and tblHobby on MemberID

This way a member can have 0,1, or infinite listed hobbies...

I think this shows, in more a real world example, the benefits of normalization vs. denormalized data. WIthout this structure you limit the information able to be obtained by having too few fields or end up storing lots of blank space due to having too many fields. Same holds true for having a memo field with all the hobbies listed in that one field. Either you store a large field type when it has minimal data or you end up having lots of data that is not in a workable format that requires extra coding/work to get it useable vs. the normalized solution that is ready to go - i.e. if you want to know how many people listed "football" as a hobby in the normalized structure above you query for hobbies = "football" one field on search. In the denormalized setup of multiple hobby fields you have to search for "football" in each field and then do some addition functionality because the user could have entered "football" in any number of the hobby fields (hobby1, hobby2, hobby3, etc..) in the memo field denormlaized structure you would have to first parse the field (assuming a common character was used) then, after parsing, manupulate the data to find the result for "football".

Thats another thing out proper normalization - it may require more fore thought and planning initially but the end results is that working with the data on the other end is much, much easier! ;)

HTH,
Kev
 
a. never = big word

b. 'an entirely different table to track hobbies':

tblHobby
HobbyID PK
MemberID FK
HobbyType
 
A fully normalized system would never have fields like hobby1, hobby2, hobby3,..

a. never = big word

Yes, but in the context it appeared, it was entirely appropriate.

In a properly normalized system, the repeating group represented by hobby1, hobby2, hobby3 AS FIELD NAMES will NEVER occur 'cause they got promoted to having their own table.
 
KenHigg said:
a. never = big word

b. 'an entirely different table to track hobbies':

Yes, never is a big word - but it applies in this case. The only time it doesn't apply is in instances when you know that everyone will have 2 entries and that you will be tracking both. FOr example, have 2 address fields: ( HomeAddy, WorkAddy ) is pretty common in alot of DB's I have seen. Here is a case where it is ok to have this structure as they are similar but have different data - i.e. one is work and one is home. I guess it sometimes becomes a grey line here but 99.999999% of the time it applies.

Also - I realize that you said 'an entirely different table' in your original post and this is correct - I was just pointing out that you put the qualifier "or" in there as well and I was stating that in my opinion there is no "or" its simply and only this way....

cheers,

Kev
 
I have a case where, despite stern talking-tos from Pat Hartman and others, I've been using a denormalized table, because it's made it easier to set-up in forms and has kept the records to a minimum.

My example: My company has other companies with whom we work (tblCompanies, with 90+ records). Each company needs to fill out a contract to work with us (tblContracts). We track 25 components of each contract, and right now it's set up in the unnormalized tblContracts with 90+ records, and a field for each of the 25 components (so 26 fields, with the matching ID to tblCompanies).

This gives us 90+ records, and leaves plenty of nulls.

The alternative is to have 2 fields, one for the ID to tblCompanies and then just a long list of which components they've completed, and another table (tblComponentDesc) with 25 records describing each component.

However, real world constraints say: EVERY company has to fill out EVERY component eventually. That means, every time I put in a new company to work with us, I'd have to run an append query putting in 25 new records in tblContracts with null spaces, and wait for each component to get filled in as they mail stuff to us.

Which is better? One table (tblContracts) with 90+ records and 25 fields, or two tables (tblContracts and tblComponentDesC) where one table will now have over 2250 records, will grow by 25 records at a clip, and still have the same number of null fields as before...except now it will be harder to set it up in Form View since each contract component doesn't have it's own designated field?
 
Perhaps you should start out with a list of business rules that define your use of contracts. Such as:

A. All new customers get a standard set of contract components.
B. Components, once applied to a contract, can never be changed.
C. If a change in contract is required, the following must take place:
1. xxx
2. yyy
3. zzz

Make sense?

This can then be used to design a proper data model.

???
ken
 
KenHigg said:
Perhaps you should start out with a list of business rules that define your use of contracts. Such as:

A. All new customers get a standard set of contract components.
B. Components, once applied to a contract, can never be changed.
C. If a change in contract is required, the following must take place:
1. xxx
2. yyy
3. zzz

Make sense?

This can then be used to design a proper data model.

???
ken

Makes sense, Ken, but our lawyers have already decided: "If companies want to work with us, they will fill out this contract - NO CHANGES." If other companies don't like it, we say "Well, it's been a pleasure" and we go our merry way. So - there are never any changes. These are the parameters the legal-sharks have set.

My department just happens to be concerned with 25 particular components that apply directly to us, and we track when the companies get around to completing everything. This is all complex, and very boring.

I'm more concerned about what you pros think about my normalized/denormalized conundrum, and since aleb brought it up, I figured I'd throw my example out there. I see HOW to normalize it, but I don't know if I SHOULD normalize it, because of what I said before: which is better, one table with 90 records and 25 fields (plenty of nulls), or two tables, one of which will have 2250 records but only three fields (but still have plenty of nulls). And, how would you set up the second example in a Form view for data-entry, i.e. trying to get fields to repeat and line-up nicely in a uniform progression for each company for ease of use?
 
monkeytunes -

Your example is one where I believe the first example you posted is the proper design method (at least in my humble opinion) for the very reason that, while you are storing null values at this time, you said it yourself that all parties will be required to fill out all of the information - sooner or later - so your design concept works. Now, that being said, as Ken stated if you ever need to add a field or downsize the information then you will be royally (incert explitive here)... its a question of business function. If the data will not change then I, personally, would go with the two table configuration as having 25 fields is not the problem as each field in the table is unique and related to the table PK.

That being said, if the data required could change in the future (never is a big word ;) ) then the other setup is your best option... however you said before this is not a possibility... Also, ALso - your logic is a little flawed as you will not have 2250 records in the one table initially as you would not add data to that table until the data is made available - this is what seperates this design from the other, "flatter", table. You only add the data when its ready, thus, no nulls in this design and no append query for the 25 fields... make sense? If not I'll go further into this with an example if you think it would be useful to you...

HTH,
Kev
 
monkeytunes said:
Makes sense, Ken, but our lawyers have already decided: "If companies want to work with us, they will fill out this contract - NO CHANGES." If other companies don't like it, we say "Well, it's been a pleasure" and we go our merry way. So - there are never any changes. These are the parameters the legal-sharks have set.

My department just happens to be concerned with 25 particular components that apply directly to us, and we track when the companies get around to completing everything. This is all complex, and very boring.

I'm more concerned about what you pros think about my normalized/denormalized conundrum, and since aleb brought it up, I figured I'd throw my example out there. I see HOW to normalize it, but I don't know if I SHOULD normalize it, because of what I said before: which is better, one table with 90 records and 25 fields (plenty of nulls), or two tables, one of which will have 2250 records but only three fields (but still have plenty of nulls). And, how would you set up the second example in a Form view for data-entry, i.e. trying to get fields to repeat and line-up nicely in a uniform progression for each company for ease of use?


Please don't take this personal - but you have missed the point entirely. You should (in my humble opinon), let the business rules drive the degree to which you normalize your db, not the qwest for 5th normal form!

ken
 
Kevin_S said:
Also, ALso - your logic is a little flawed as you will not have 2250 records in the one table initially as you would not add data to that table until the data is made available - this is what seperates this design from the other, "flatter", table. You only add the data when its ready, thus, no nulls in this design and no append query for the 25 fields... make sense? If not I'll go further into this with an example if you think it would be useful to you...

I figured the data would be added the moment a company says "Okay, we'd like to work with you" - then we add their name to our database, and wait patiently while they send in thier contact components. I have 4 or 5 users who enter the data, and they like to see a nice screen like the one I've attached. I don't know how I'd set it up using a datasheet subform or continuous subform, let alone how it would get set up without all 25 spaces being put in every time we get a new prospective company. (The Project name and company name would be from a related table, a la normalization. But the contract stuff would be 25 fields, one record for each project, because each project has full contract. To date.)

KenHigg said:
Please don't take this personal - but you have missed the point entirely. You should (in my humble opinon), let the business rules drive the degree to which you normalize your db, not the qwest for 5th normal form!

Nothing personal taken; I'm here to learn. I learn more from mistakes pointed out to me than I ever will from people patting me on the back.

It sounds like you agree with Kevin_S, that in this case, the denormalized table produces the results we need, so it can slide? (But just this once...)
 
Last edited:
I would agree with The_Doc_Man in that there may be circumstances in which an un-normalised design may be required but it is something I still have to discover.
- I've been creating applications based on relational databases since before most of you were born. The only place where I have ever used unnormalized tables is in Data Warehouse applications. In a Data Warehouse, the data is static and only refreshed periodically by additions/replacements from the live system. The idea of a warehouse is to give users a datasource that they can use to create queries/reports. But even in a data warehouse repeating groups are not created. They are simply too difficult to work with. The items that are "flattened" are mostly "lookups". So the customer information is added to the order header to eliminate the need for a join but the order detail items would be left as is, if they were added to the warehouse at all. No attempt would ever be made to "flatten" them into a single row. The other type of table in the warehouse contains summary data so the users don't have to keep summarizing details. Sales might be summarized by month if that is a frequent request. So instead of keeping detail for each order, the warehouse would contain a number of summary tables that slice and dice the detail by various things such as product, salesperson, month, etc. Typically datawarehouses are huge. They are much larger because of the de-normalization and therefore slower than the live database.

Someone suggested that address1, address2 were a repeating group. This is not true. There is really only a single address. The address may be broken into 2-4 "lines" to facilitate printing. Street addresses are difficult to parse and it is far better to have a human decide how best to structure the address for printing. In a properly normalized address, the street address will be broken into 1-4 generic pieces to facilitate printing but the city, state, postal code, and country (if necessary) will occupy separate columns. In an application whose core functionality revolves around mailing (such as some systems I created for Reader's Digest), the address field will actually be broken down differently to ease finding duplicates. For example the fields might be houseNumber, StreetPrefix, StreetName, StreetSuffix, AptNumber, RRNumber, etc. Then there would be business rules on how to combine the various separate fields into address lines 1-4 for printing.

As to monkeytunes 25 fields; I looked at the example and couldn't figure out what data they were intended to contain so I can't tell if they are a repeating group or not.

It is possible to go overboard with how you define a repeating group, I've seen some people try to argue that ALL tables should contain only four columns - primary key, foreign key, field type code, and a field value.
 
Pat Hartman said:
- I've been creating applications based on relational databases since before most of you were born.

Is that back when databases were all in black and white?

Great points Pat, and lots of good things to think about. This thread has generated lots of food for thought that's going to take me a long time to take in!


Pat Hartman said:
As to monkeytunes 25 fields; I looked at the example and couldn't figure out what data they were intended to contain so I can't tell if they are a repeating group or not.

Here's another example, with two sets of tables. The is "tblGCContracInfo", which has company names and contact information. Then there is a denormalized table "tbleContracts" - these two tables feed a query which then feeds frmExample. (The data in the "app" fields could be dates, could be dollar amounts, could be checkmarks - I changed everything to dates in my example merely for convenience. The data in the Project Number and Project Name fields are always unique, and decided by entities beyond my department.)

The other two tables normalized - "tblCompanyContracts" and "tblComponents". I tried to seperate them in the Relationships window. Hopefully this will give a better idea of what I currently have in place to accomodate users versus the normalization practices which we all try to follow.
 

Attachments

Last edited:

Users who are viewing this thread

Back
Top Bottom