Do you see normalisation as mandatory?

HHUK

Registered User.
Local time
Today, 22:37
Joined
Mar 4, 2008
Messages
17
Hey guys,

I was just wondering, do you all see normalisation as mandatory?

I spent months on it in college, it was drilled into my head.

Early on in my latest project I was setting up all the tables to make it normalised, but I did a bit of tinkering, got carried away, and forgot about normalising. My database runs extremely well: it's fast, and in my stress tests of 5 users using the database simultaneously I haven't run into anything odd.

Any recurring information is generated in the form dropdowns and there are no complicated "transactions" taking place.

As a backup, I have everything ready for the normalisation process; the other tables are sitting there ready, I just can't be bothered to go through and link them all up properly. The only downside at the moment is that I could take a speed hit.

Am I a sinner?
 

Rabbie

Super Moderator
Local time
Today, 22:37
Joined
Jul 10, 2007
Messages
5,906
If it works - don't fix it.

If your data is truly not normalised then you run the risk of inconsistencies creeping into your data, because you will have the same info stored in more than one place, and if that data changes you need to update it in several places.

Without seeing the schema for your DB I can't say if you are looking at mega problems or not. It may well be that your data is sufficiently normalised to meet your needs.

In many cases you only need to be at 3NF or maybe not even quite there.

Put it this way: I wouldn't want to see a customer's address stored anywhere except in the customer table, which would be linked to the Orders table, for example.
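Purely for illustration (the table and field names here are invented), that structure looks roughly like this in SQL DDL terms:

Code:
CREATE TABLE tblCustomers (
    CustomerID   INTEGER PRIMARY KEY,
    CustomerName VARCHAR(50),
    Address      VARCHAR(100)   -- the address lives here and nowhere else
);

CREATE TABLE tblOrders (
    OrderID    INTEGER PRIMARY KEY,
    CustomerID INTEGER REFERENCES tblCustomers (CustomerID),
    OrderDate  DATE
);

Any order then picks up the address through the link rather than carrying its own copy.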

I suspect your DB is more normalised than you imply.

I am by nature a pragmatist and don't believe in slavishly following rules just for the sake of it but you need to understand the rules to know where you can bend them without getting into trouble.
 

Brianwarnock

Retired
Local time
Today, 22:37
Joined
Jun 2, 2003
Messages
12,701
This has been debated on here before. I have created many small non-normalised DBs in my time, but they have all been for short-lived projects. Many see normalisation as a technical exercise; it is really a business necessity, as it allows the business to change without having to rewrite the DB, e.g. to add more options.

Those who have worked with DBs for any length of time will say normalise, they've been bitten.

Brian
 

Mike375

Guest
If it works - don't fix it.

I am by nature a pragmatist and don't believe in slavishly following rules just for the sake of it but you need to understand the rules to know where you can bend them without getting into trouble.

Hit the nail square on the head.
 

gemma-the-husky

Super Moderator
Staff member
Local time
Today, 22:37
Joined
Sep 12, 2006
Messages
15,662
you may really have normalised your DBs - the main error, I think, is spreadsheet thinking: storing multiple values in one field, or multiple columns of data holding the same thing.

doing this makes sorting/filtering extremely hard, as you have to start using union queries, and so on.

If you can live with this, or have consciously done this, then it's not a problem.

eg, in one of my databases, i store the last 5 effective selling prices and dates in the product line, rather than in a price history table - i know it's not normalised, but it's sufficient for the purpose, and works ok
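for comparison, the normalised version would be something like this (names invented, just a sketch) - a price history table keyed on product and effective date:

Code:
CREATE TABLE tblPriceHistory (
    ProductID     INTEGER,
    EffectiveDate DATE,
    SellingPrice  DECIMAL(10,2),
    PRIMARY KEY (ProductID, EffectiveDate)
);

the repeating columns save a join, at the cost of only ever keeping 5 prices.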
 

Rabbie

Super Moderator
Local time
Today, 22:37
Joined
Jul 10, 2007
Messages
5,906
Having given this a little more thought, I feel that you need to normalise to 2NF and possibly to 3NF, but see the exception in the extract below from the Microsoft support site:

Third Normal Form
• Eliminate fields that do not depend on the key.
Values in a record that are not part of that record's key do not belong in the table. In general, any time the contents of a group of fields may apply to more than a single record in the table, consider placing those fields in a separate table.

For example, in an Employee Recruitment table, a candidate's university name and address may be included. But you need a complete list of universities for group mailings. If university information is stored in the Candidates table, there is no way to list universities with no current candidates. Create a separate Universities table and link it to the Candidates table with a university code key.

EXCEPTION: Adhering to the third normal form, while theoretically desirable, is not always practical. If you have a Customers table and you want to eliminate all possible interfield dependencies, you must create separate tables for cities, ZIP codes, sales representatives, customer classes, and any other factor that may be duplicated in multiple records. In theory, normalization is worth pursuing. However, many small tables may degrade performance or exceed open file and memory capacities.
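To make the university example from the extract concrete (names purely illustrative), the split it describes is roughly:

Code:
CREATE TABLE tblUniversities (
    UniversityCode    VARCHAR(10) PRIMARY KEY,
    UniversityName    VARCHAR(100),
    UniversityAddress VARCHAR(200)
);

CREATE TABLE tblCandidates (
    CandidateID    INTEGER PRIMARY KEY,
    CandidateName  VARCHAR(100),
    UniversityCode VARCHAR(10) REFERENCES tblUniversities (UniversityCode)
);

Universities with no current candidates can then still be listed for the group mailings.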
 

HHUK

Registered User.
Local time
Today, 22:37
Joined
Mar 4, 2008
Messages
17
My database is based on clients.

The only possible repeated information is:
Nationality
Religion
Ethnicity
Support Worker
Support Type
Diagnosis

Almost everything else is pretty unique; I guess I'm blessed with a good set of data to work on.

I have tables with lists of these possible repeated items, with IDs etc. They're just not being used.

Dropdowns are used for storing the data instead; for example, the dropdown for the diagnosis field's Row Source says: "Depressive Type Illness";"Bipolar";"Schizophrenia";"Anxiety Disorder";"Personality Disorder";"Dual Diagnosis";"Other"

All my reports seem to work and count everything correctly.

I guess I'm being lazy yet really organised, with a backup plan.
 

gemma-the-husky

Super Moderator
Staff member
Local time
Today, 22:37
Joined
Sep 12, 2006
Messages
15,662
you ought to use the lookups

if you just rely on text entries and enter, e.g.

Bipolar
Bi-polar
Bi Polar

or even miskey and get
Bioolar

you will have many different examples of the same condition - which will interfere with your stats/sorting etc
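as a sketch (names invented), a lookup table with the combobox bound to the ID, and the rowsource drawn from the table instead of a typed-in value list:

Code:
CREATE TABLE tblDiagnosis (
    DiagnosisID   INTEGER PRIMARY KEY,
    DiagnosisName VARCHAR(50)
);

-- combobox RowSource
SELECT DiagnosisID, DiagnosisName
FROM tblDiagnosis
ORDER BY DiagnosisName;

the user can then only ever pick one of the stored spellings.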
 

kodiak385

is using THE INTERNET!!1
Local time
Today, 14:37
Joined
Jan 25, 2007
Messages
31
Not being a true Access pro and still mastering the fundamentals of database design, I ordinarily wouldn't throw in my two cents, but I've recently started getting into the practice of normalizing data, and I'm finding that while it does take a little more time to set up initially, it is invaluable when trying to sort, filter, and summarize all the data you're capturing in the 'base.

Beyond that, working at my *real* job (lol), I co-manage a group of associates in a call center where we use a fairly elaborate customer relationship management system, and just trying to get them to consistently enter and query for data the same way is a NIGHTMARE. Having that experience, it is obviously at the forefront of my mind when designing my own databases, lol. Gemma nailed it. Any time you have ANY kind of simple repeating data which you may ever want to search or group by, USE A LOOKUP. Whether you've normalized it or not, it is extremely beneficial for that information to be uniform. Any time you give any user the opportunity to enter data in free form, you WILL get anomalies in your data.
 

gemma-the-husky

Super Moderator
Staff member
Local time
Today, 22:37
Joined
Sep 12, 2006
Messages
15,662
HHUK

if you just have a relatively small system with one main table and a few lookups then it may not matter too much

once you get more complex, you will really start to struggle without proper normalization. The key is to try to avoid spreadsheet thinking. A database is quite different - you rarely if ever see all the data - mostly you are seeing a snapshot of a selected part of it. eg, in your case, details for a single patient, or analysis of different outcomes for a given period. In order to get these results the data has to be properly normalised. If a diagnosis is "bipolar disorder" as you say, it needs to be entered consistently every time, so that you can count the frequency of bipolar disorder.

-------
In your case, this sounds like some sort of medical/social support type thing - the types of information you have noted are straightforward lookups, so maybe your system really is a very simple one

but as and when you expand the system, you may need to have tables for consultation events [you don't mention follow-up appointments - is every consultation a single event, or do you have repeat visits for certain clients?], multi-item prescriptions related to consultations, training records for your support workers, different grades of support workers, personnel systems for your workers etc, ordering systems for drug supplies, case conference information, and so on. What if your clients' addresses, contact details, phone numbers or next of kin change? How do you deal with this?

As your users use things and get used to Access, they will ask you to add more and more facilities.

As you add extra facilities, you will most certainly need to deal with complicated system design and these normalisation issues.
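purely as an illustration of the direction (table and field names invented), repeat visits alone would push you towards something like:

Code:
CREATE TABLE tblClient (
    ClientID   INTEGER PRIMARY KEY,
    ClientName VARCHAR(100)
);

CREATE TABLE tblConsultation (
    ConsultationID   INTEGER PRIMARY KEY,
    ClientID         INTEGER REFERENCES tblClient (ClientID),
    ConsultationDate DATE,
    DiagnosisID      INTEGER,   -- lookup into a diagnosis table
    Notes            VARCHAR(255)
);

one client, many consultations - which is exactly the kind of structure a single flat table can't give you.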


Hope this helps
 

tehNellie

Registered User.
Local time
Today, 22:37
Joined
Apr 3, 2007
Messages
751
Exactly what level you should normalise to is very much a "how long is a piece of string" question. You can, and many would say should, take everything to 3NF at the very least.

One thing to really bear in mind is the long-term consequence of accepting now that it's not "properly" normalised and living with it, because it's OK with the records currently in it and the two people who use it follow the rules slavishly enough to keep your data from going too far awry.

I struggle to keep a very badly designed database working that I was never allowed to fix when it only had small volumes of data in it, because it was going to be thrown away in a few months and we needed the new stuff put on top of the wrong basic design NOW NOW NOW! That was 3 years ago and it's likely to be needed for another 2 years or so. Now, instead of looking after 20,000 records, it is trying to manage 2.5 million, with duplicate information and what should be static data manually typed in and scattered all over the place. We very much suffer from people typing "biPolar", "Bi-Polar", "bi polar" in fields where there should, basically, be dropdown lists. Of more concern is that the design deliberately duplicates information in the main tables, with all the inherent problems that implies.

With a bad initial design left to gather data, you'll find it much, much harder to put it right in the future, especially when someone wants to add additional functionality to the database.
 

Pat Hartman

Super Moderator
Staff member
Local time
Today, 17:37
Joined
Feb 19, 2002
Messages
43,374
I have been designing and building database applications for almost 40 years, and the short answer is YES! I do violate the rules, but only when creating databases that only I will use. My contacts database is not third normal form and neither is my billing database, but databases that I build for others will ultimately (if they are good) be taken over by other people once I move on. They will be modified and grow over time. If the structure starts out less than optimal, they will be significantly more difficult to expand.

As to your combo question - I solved the problem once, many years ago and so now I just use the same solution in each new application. I created a table and a couple of forms and reports that manage ALL of my simple lookup needs. I don't have to create separate tables for each new lookup and I don't have to hide the data in the table definition. My solution allows the user to take responsibility for his own code lists so I won't be bothered having to add new elements.
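Roughly speaking (the names here are only illustrative, not my exact design), one generic lookup table of that sort looks like:

Code:
CREATE TABLE tblLookup (
    LookupID  INTEGER PRIMARY KEY,
    ListName  VARCHAR(50),    -- which list the row belongs to, e.g. 'Diagnosis'
    ItemText  VARCHAR(100),   -- the value the user sees
    SortOrder INTEGER,
    IsActive  CHAR(1)
);

-- a combobox rowsource then filters on the list it needs
SELECT LookupID, ItemText
FROM tblLookup
WHERE ListName = 'Diagnosis' AND IsActive = 'Y'
ORDER BY SortOrder;

One maintenance form over that table lets the users manage every simple code list themselves.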
 
Local time
Today, 16:37
Joined
Mar 4, 2008
Messages
3,856
Dang, I can't believe I didn't comment on this when it was originally posted! I guess I was busy doing actual real work (lol).

Yes, normalization is mandatory for an OLTP system. For all the reasons Pat said.

Plus, when the non-normalizing database developer goes and builds their next non-normalized database, abandoning the project just as it starts breaking, the stakeholders are gonna call somebody like me who spends days fixing crap on the old database just so it'll continue working.

Also, there is absolutely no reason to not normalize from the very beginning from a project time/cost perspective. I defy anybody to develop a non-normalized OLTP database that meets all of a company's requirements in less time than I can build a normalized OLTP database that meets those requirements. In fact, unless it is a data warehouse/reporting project, I doubt a non-normalized OLTP system ever met most requirements, except on the surface.

From a speed perspective, though no one has done anything but subjective tests, I sincerely doubt that a large system performs better with non-normalized data structures than a fully normalized system. In fact, if you look at the design of DBMSs closely (Oracle publishes and teaches the internals of their DBMS), you would be led to believe that narrower tables, with the data spread over many tables, are faster (they stay in memory longer, are easier to load into memory, etc.) than tables that are super wide (like most of the crap I have to come in and clean up). I know that SQL Server is similar in the way it uses and optimizes its memory and believe the same is true there. Same for MySQL. Not sure about Access.

FWIW
 

Guus2005

AWF VIP
Local time
Today, 23:37
Joined
Jun 26, 2007
Messages
2,641
Is this a poll?

DON'T normalise when designing a data warehouse; in all other cases, DO! For all the above reasons.

Usually a data warehouse is built from a normalised database...
 

Len Boorman

Back in gainfull employme
Local time
Today, 22:37
Joined
Mar 23, 2000
Messages
1,930
Having just changed employment, I found they wanted somebody who could take them to the next level.

Next level, I hear you say. At the moment they are in a hole and digging furiously.

No ERD, No Relationships, No Referential Integrity, No Normalisation.

At the moment, after just over a week, I have put out more fires than Red Adair.... and some ***** is still lighting them.

Have you ever tried fixing a database without the things mentioned above?

Try doing a crossword without the clues, with a dodgy pen, standing on one leg. That's similar.

Yeah.... normalise and do the other things unless you have 59 good reasons not to.

Len
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 16:37
Joined
Feb 28, 2001
Messages
27,243
As a pragmatist, I have always taken a laissez-faire view of normalization.

For my little home address book database, I don't care if it isn't normalized. For my list of music CDs or movie DVDs, I want that to give me some nice reporting options. For the personnel database out at work, if it ain't at least fully 3rd normal and verging on 4th normal in spots, we can't write the queries we need to run in our lifetimes.

It is also important to realize that my home address book won't be redesigned and constantly tweaked. But a personnel database gets tweaked every time Congress passes a law about personnel records management.

Finally, my address book database just has to print out mailing labels for Christmas cards and let me look up Uncle Bennie's address when I want to send him an anniversary card.

My work database has to track a specific class of military personnel with regard to length of service (and therefore retirement points), rank or rate, skills inventory, geographic location, eligibility for overseas duty, etc. etc. I have to know the person's name, the military unit with which they are affiliated, and the billet they fill within or external to that unit. So three base tables right there, plus the junctions for each. The more complex the task, the easier it will be to normalize it first and build reports second.
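In rough terms (names invented, not our real schema), that shape is:

Code:
CREATE TABLE tblPerson (PersonID INTEGER PRIMARY KEY, PersonName  VARCHAR(100));
CREATE TABLE tblUnit   (UnitID   INTEGER PRIMARY KEY, UnitName    VARCHAR(100));
CREATE TABLE tblBillet (BilletID INTEGER PRIMARY KEY, BilletTitle VARCHAR(100));

-- junctions tying people to units and to billets
CREATE TABLE tblPersonUnit   (PersonID INTEGER, UnitID   INTEGER, PRIMARY KEY (PersonID, UnitID));
CREATE TABLE tblPersonBillet (PersonID INTEGER, BilletID INTEGER, PRIMARY KEY (PersonID, BilletID));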

Therefore, no, it is not required to normalize everything in sight. But if there is anything complex to be done, normalization sometimes simplifies your work.
 

tokoloshi

Registered User.
Local time
Today, 23:37
Joined
Jul 30, 2008
Messages
63
To normalise or not to normalise is not the question

I know that this discussion is a bit old, but I also know that many people will read up on the topic and may want to add to the body of knowledge, not just the discussion.

In my case, I don't think that it is a question to be considered. Every database eventually gets to the stage where you wish you had normalised it from the beginning - and even that is not the issue.

The real issue is getting the rust out of the way of the thought process and thinking in a normalised fashion from the outset.

Constantly ask yourself if the design you are creating is the best you can create for the initial life cycle of the product - which is going to be transaction-processing intensive.

When you then have to write a reporting solution off the same database, constantly ask yourself if you have denormalised the database sufficiently to provide the performance that the user will always expect.

Normalising during the initial (OLTP) stage is a mindset, a discipline, not an option.

T
 

Banana

split with a cherry atop.
Local time
Today, 14:37
Joined
Sep 1, 2005
Messages
6,318
IMO, the real issue is that nobody has really deployed a solid design pattern for any of the higher normal forms. If it were easy to implement such things, there would be less of a need to question whether we've "over-normalized" or not. Because we are ultimately providing a solution to a real business problem, the business couldn't give a hoot whether the solution is in 4NF, 5NF or even only 2NF, as long as it works for them.

Some DBAs will indeed break the rules if they decide it makes the tables easier to maintain. An example:

tblCustomer:
Code:
CustomerID
FirstName
LastName
StreetAddress
City
State/Province
ZIP
PhoneNumber
EmailAddress

For the most part this may be sufficient. Now in some cases we may require a physical address (e.g. no PO Box). Some would then expand the above table to this:

Code:
CustomerID
FirstName
LastName
MailingAddress
MailingCity
MailingState
MailingZIP
PhysicalAddress
PhysicalCity
PhysicalState
PhysicalZIP
PhoneNumber
EmailAddress

This is a definite violation of the rule against repeating fields (and that is just 1NF). Yet I'm fairly sure this is far more common than having a Customers table with a many-side Address table, *simply* because it's easier, on the surface, to manage and query. Consider the one-many structure that would then satisfy 3NF:

tblCustomer:
Code:
CustomerID
FirstName
LastName

tblAddress:
Code:
AddressID
CustomerID
AddressTypeID <- references a lookup table, indicating what kind of address it is
StreetAddress
City
State
ZIP

Contrast the SQL:

Code:
SELECT 
    MailingAddress, 
    MailingCity, 
    MailingState, 
    MailingZIP 
FROM tblCustomer 
WHERE LastName = "Smith" AND FirstName = "Joe"

Code:
SELECT 
    StreetAddress, 
    City, 
    State, 
    ZIP 
FROM tblAddress 
JOIN tblCustomer ON 
    tblAddress.CustomerID= tblCustomer.CustomerID 
WHERE 
    LastName = "Smith" AND 
    FirstName = "Joe" AND 
    AddressTypeID = "Mailing Address"

We now have a join predicate and an extra condition to satisfy, and almost always a SQL statement with a join predicate is much slower than a vanilla SELECT. (I'm not going to get into the JOINs vs. subqueries discussion - too specific to the various RDBMS engines.) So, when we develop this into a full-blown solution, some DBA will just say: why not look the other way and allow repeating fields in tblCustomer; all we really care about is whether they have a physical and/or mailing address, no more, and it will work for the most part. With Company A, where 99% of their customers report only a physical address, they're happy as a clam with such a design even though we've violated 1NF.

OTOH, Company B may have a slightly more stringent requirement - say it specializes in services that are closely overseen by the IRS, and it therefore has to be far more diligent in keeping its records straight. They need far more details about customers' addresses than Company A could ever possibly care about, so they will not be satisfied with the former design and will elect to use the tblCustomer -> tblAddress design.

One interesting thing, though. It was The_Doc_Man who pointed out this possibility. Suppose we only wanted physical and mailing addresses. Would this not also satisfy 3NF *and* be simpler:

tblCustomer:
Code:
CustomerID
FirstName
LastName

tblAddress:
Code:
AddressTypeID <- References a lookup table
CustomerID
StreetAddress
City
State
ZIP

The difference is that we've omitted AddressID and instead made a compound key of (AddressTypeID, CustomerID). This guarantees that any given customer will have at most *one* mailing address and *one* physical address. This gives us a true one-many relationship with the proper constraints in place, without us having to enforce them through code at all. A good thing™, IMO.
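In DDL terms (sketch only), the compound key would be declared something like:

Code:
CREATE TABLE tblAddress (
    AddressTypeID INTEGER,      -- references the address-type lookup
    CustomerID    INTEGER REFERENCES tblCustomer (CustomerID),
    StreetAddress VARCHAR(100),
    City          VARCHAR(50),
    State         VARCHAR(2),
    ZIP           VARCHAR(10),
    PRIMARY KEY (AddressTypeID, CustomerID)
);

The engine itself then rejects a second mailing address for the same customer.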

Furthermore, I suspect that when we move higher in normalization, we may have to reconsider how we normally write queries. I've already mentioned how we moved from a simple SELECT to one with a JOIN and an extra WHERE. Yet this seems to me easier to write:
Code:
SELECT 
   StreetAddress, 
   City, 
   State, 
   ZIP 
FROM tblAddress 
WHERE CustomerID = 
            (SELECT CustomerID 
             FROM tblCustomer 
             WHERE LastName="Smith" AND
                       FirstName = "Joe") AND 
   AddressTypeID = "Mailing Address"

With a subquery, I only need to concern myself with one table at a time, without joining, and it's just two WHERE conditions rather than a JOIN and two WHEREs. Much easier to write and understand, especially when things get more complicated than this, I would think.

Mind you, I'm not even touching the question of whether joins or subqueries are faster. I'm strictly looking through the lens of how much work it entails for *me* to do. I'm a lazy bum, and I suspect many programmers are. They would rather have the menial tasks automated, and will happily work hard to get them automated somehow. Which is why I said earlier that if 4NF/5NF were braindead simple to implement, everyone would be doing it.

And all this time I've been talking about table design and not the actual implementation details! IMO, implementation does influence how we ultimately design things.

For example, it's easier to just have everything in one updateable query to be used as the recordsource on a form. Drag'n'drop or use the wizard to build the fields, tidy up here and there, add code in the form's BeforeUpdate for validation, and poof! we're done. Nothing more to do here. But with a one-many, we now have to involve a subform. Not much more work when it's just one-many.

It's actually the third level of nesting that is the bugbear. Consider this structure for managing classes, attendees and an attendance roster:

tblClass:
Code:
ClassID
ClassName
StartDate
EndDate

tblAttendee:
Code:
AttendeeID
FirstName
LastName

tblRegistration:
Code:
AttendeeID
ClassID

tblSession:
Code:
SessionID
ClassID
SessionDate

tblAttendance:
Code:
SessionID
AttendeeID

Assume that we have separate forms for inputting new classes and attendees and for registering attendees to a certain class; we now need to create a form for tracking attendance. This means that we have to check that we have a given class, then get all sessions for that class, then check the registration roster to get all attendees, and *only* from each list created can we create a new record. We certainly do not want an attendee attending a session for a class they weren't registered for!

The simplest implementation is to create a form with two comboboxes: one with a rowsource of all sessions matching a certain class, the other with a rowsource of all attendees registered for that class. But this can be a bit confusing to end users, and tedious, as they have to select sessions multiple times when they should be concentrating on attendees.

The next implementation would be to have a subform for all sessions for a given class. The subform would in turn have a subform for the attendance roster, in which the Session part is now fixed by the parent subform, so we only need to add attendees. The rowsource of the attendee combobox would refer to the class at the top level to filter attendees down to those actually registered for the class. This implementation uses a total of four queries to get the job done: the recordsource for all classes on the form, the recordsource for all sessions per given class on the subform, then the recordsource for the attendance roster and the rowsource for the attendee combobox.
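The attendee rowsource would be along these lines (frmClass is an invented form name, just to show the idea), filtering registrations on the class selected at the top level:

Code:
SELECT tblAttendee.AttendeeID, tblAttendee.LastName, tblAttendee.FirstName
FROM tblAttendee
INNER JOIN tblRegistration
    ON tblAttendee.AttendeeID = tblRegistration.AttendeeID
WHERE tblRegistration.ClassID = [Forms]![frmClass]![ClassID]
ORDER BY tblAttendee.LastName;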

Contrast this with this design (drop tblSession; all other tables are identical):

tblAttendance:
Code:
ClassID
AttendeeID
SessionDate

Therefore, on the form for attendance, we only need one subform to relate all tblAttendance records to tblClass, and one combobox to select the Attendee, then enter the SessionDate (it doesn't have to be manual; it can be done with a default value or something). We just went from two subforms and a combobox with four queries to one subform and one combobox. Even better, this design can still be just as stringent, because we can specify that all three fields form a compound key. Far simpler to maintain, no?
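With the compound key spelled out (sketch only):

Code:
CREATE TABLE tblAttendance (
    ClassID     INTEGER REFERENCES tblClass (ClassID),
    AttendeeID  INTEGER REFERENCES tblAttendee (AttendeeID),
    SessionDate DATE,
    PRIMARY KEY (ClassID, AttendeeID, SessionDate)
);

So the same attendee cannot be marked twice for the same class on the same date.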

So, whenever anybody talks about normalization, I don't think it's good enough to just think about table design; you also have to think about the implementation details, because an over-normalized design, while it will work, will be hard to implement. The class-attendee-session example may seem to work OK, but I can tell you from experience that in practice it is not so simple, and you will find yourself constantly rewriting queries to satisfy all the conditions, to the point where you've just about, but not quite, over-constrained the design.
 

dkinley

Access Hack by Choice
Local time
Today, 16:37
Joined
Jul 29, 2008
Messages
2,016
Props to Banana for an excellent and insightful post!

I'm a lazy bum, and I suspect many programmers are.

Well said! I think what tips the scales of someone being a good programmer is what degree of laziness they have.

Too lazy to learn or do the research to implement good design to the nth degree? Or just lazy enough to get the job done while forgetting about sustainability and resiliency? And how good is their inline documentation?

I admit, I am guilty of your first example ... physical/mailing addresses. I am lazy and have expanded the table to hold both pieces of data. A sign of laziness? Nah - according to my crystal ball when I designed those pieces, the database lifespan was never going to be long enough for it to have an impact.

Now, if I wasn't lazy, I would have 3NF'd it right on up for the off-chance of an extended lifecycle and for sustainability if the database design was ever expanded upon. No doubt I and many others have cursed previous database programmers for a 1NF implementation or a partial 2NF when it CLEARLY should have been a complete 2NF, and whoever had to expand on one of mine probably cursed me for doing the same by not taking it completely into the 3NF or nNF realm.

My decision-making is based on requirements gathering and whatever my crystal ball tells me about what the stakeholders didn't tell me. It does not include a dash of "Am I gonna be long gone when this sucker breaks?" That's just bad business.

-dK
 
