Do you see normalisation as mandatory?

George
IMO, the real issue is that nobody has really deployed a solid design pattern for any higher normal forms.

Sure they have.

If it were easy to implement such things, then there would be less of a need to question whether we've "over-normalized" or not.

There's no such thing as over-normalized. There are only designs that meet the requirements now and in the future (which requires being fully normalized) and those that don't. On the other hand, there are those who understand the designs and those who write tons of code to compensate for a poor design.

Because we ultimately are providing a solution to a real business problem, they couldn't give a hoot if the solution was in 4NF, 5NF or even only 2NF as long as it works for them.

Interesting thing about normal forms: it is VERY RARE for a database designed to 3NF not to already satisfy 4NF, 5NF, BCNF, etc. In fact, you almost have to work hard to end up with a design that is still in 3NF but violates the higher forms.
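
To make that concrete (a made-up schema, purely to illustrate): to be in 3NF yet violate 4NF, you have to deliberately cram two independent multi-valued facts into one all-key table, like this:

Code:
-- Deliberately 3NF-but-not-4NF: an employee's skills and an employee's
-- projects are independent facts, crammed into one all-key table.
-- Every column is part of the key, so 3NF/BCNF holds, but the
-- multi-valued dependency violates 4NF.
CREATE TABLE EmployeeSkillProject (
    EmployeeID INT NOT NULL,
    Skill      VARCHAR(50) NOT NULL,
    Project    VARCHAR(50) NOT NULL,
    PRIMARY KEY (EmployeeID, Skill, Project)
);

-- The design most people would write in the first place is already 4NF:
CREATE TABLE EmployeeSkill (
    EmployeeID INT NOT NULL,
    Skill      VARCHAR(50) NOT NULL,
    PRIMARY KEY (EmployeeID, Skill)
);

CREATE TABLE EmployeeProject (
    EmployeeID INT NOT NULL,
    Project    VARCHAR(50) NOT NULL,
    PRIMARY KEY (EmployeeID, Project)
);

The split version at the bottom is what most people would design in the first place, which is exactly my point.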


I'm strictly looking at it through the lens of how much work it entails for *ME* to do. I'm a lazy bum, and I suspect a good many programmers are. They would rather have the menial tasks automated, and will happily work hard to get them automated somehow. Which is why I said earlier that if 4NF/5NF were braindead-simple to implement, everyone would be doing it.

I'm also incredibly lazy...most of my database work entails getting the table design right. The rest just kinda falls into place. See my note above about 4NF/5NF...they are easy to implement, usually requiring no more thought than implementing 3NF...it is HARD (not impossible) to NOT implement HNF if you have normalized to 3NF.

I think people get confused over what higher normal forms mean. I have actually had intelligent people "over-design" an address system, fully normalizing the over-designed system. Instead of having an address 1, address 2, City, State, etc., they object-oriented it into geographic components with x/y/z coordinates. The geographic component design was totally normalized to Boyce/Codd and beyond (DKNF), but it didn't fit the business requirements. However, it made the designer feel good. It was stupid.

Because the word "normalization" was used, and because this particular designer was fond of saying our databases were not properly normalized (that was true), all these smart people thought that normalizing to HNF meant having an unusable design...nope, not true. 5NF is as easy to implement as 3NF, being almost identical (except for the hard work you have to go through to un-normalize so it is not 5NF while still maintaining 3NF).

You can fully normalize an awful design and you'll still have an unworkable system.
 

dkinley

Access Hack by Choice
One last thought ... and Mr. George brings up a good point -- fitting business requirements.

In the telecommunications industry, especially cellular, reducing latency is a primary requirement, which places emphasis on the number of database accesses needed to set up and tear down a call. The quantity of database accesses can significantly affect that 'real-time conversation' feel (the time it takes to send/retrieve data, and the accesses use up bandwidth the voice packets could be using). Normalization in this sense has a different definition than in a financial organization. Telecommunications companies tread a fine line between under- and over-normalization.

These companies are forever researching new schemes and algorithms, especially in push-to-talk, where a lot of the call set-up is based on fuzzy logic to minimize latency.

In this sense, normalization is mandatory to a certain degree - not 'laziness', but only because it fits the business model.

-dK
 
George
Hmmmm...having worked as the manager of the database department at a telecomm, I'm not quite sure what you're saying. Switches record exact event data with no (what we think of as) normalization, but that would be normal, since a switch is not a "business system". It would be quite difficult to program every telephone switch in the world with every possible piece of data it would need to operate in what we think of as a fully normalized manner.

However, I submit to you that the data in a switch or other piece of complex hardware is already normalized. If the data is written into an "Event" table, and no other table, and it has the same cell number multiple times, same tower, same hardware identifier, etc., the data is still properly normalized for what the table "says" it stores...i.e. event data. No need to keep track of every cell phone user and every tower and every cell phone number and every piece of hardware...that's not what the system, and its single table, is about. It is about recording event data, which is fully normalized (that is to say, it doesn't suffer from the anomalies that normalization is designed to cure). So what if dK gets a new cell number...the system still records his call with the exact data it needs. So what if Banana gets a new phone, same reason. The switch doesn't have to "know" that to provide meaningful data processing.
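
A simplified, hypothetical sketch of such a table (invented columns, just to illustrate the point):

Code:
-- Hypothetical switch event log. Repeating the same cell number or tower
-- across rows is not a normalization failure: each row fully describes
-- one event, and events are the only thing this system is about.
CREATE TABLE SwitchEvent (
    EventID    BIGINT      NOT NULL PRIMARY KEY,
    EventTime  DATETIME    NOT NULL,
    EventType  VARCHAR(10) NOT NULL,  -- e.g. SETUP, HANDOFF, TEARDOWN
    CellNumber VARCHAR(15) NOT NULL,  -- the number as dialed at event time
    TowerID    VARCHAR(20) NOT NULL,
    HardwareID VARCHAR(20) NOT NULL
);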

That notwithstanding, it is my sincere belief that a properly normalized OLTP system is much faster and more efficient than a non-normalized OLTP system. Of course, it's all about the business need (now and in the future).

Be assured that if the data design of a system can break (i.e. is not normalized), it will break. That's why we get so many questions on this forum about how to write code and so few about how to design tables.
 

Banana

split with a cherry atop.
Sure they have.

Reading my sentence again, I think I should qualify it a bit more. It is easy enough to normalize up to 5NF, but when it comes to actually implementing it, I've yet to see a simple interface where it was trivial to set up. For example, Access's subform control does wonders in making it simple to manage a one-to-many relationship, but when we start to move to many-to-many, things get a bit more complicated. If there are such design patterns for implementation, I would love to see them.

There's no such thing as over-normalized. There are only designs that meet the requirements now and in the future (which requires being fully normalized) and those that don't. On the other hand, there are those who understand the designs and those who write tons of code to compensate for a poor design.

Well, I think we're in basic agreement that over-normalization isn't a bad thing. But this is the same as a surveyor saying you can never have enough surveying data: more data is a good thing, but you are still constrained by the time allocated for surveying the lot. With database design, the constraint may be time to market or time to implement. From talking with folks who maintain databases for websites, they would rather have a simple design so they don't have to worry too much about satisfying normalization constraints and can just insert the data right away, without hassle and without overhead. This was alluded to earlier with data warehousing.

Interesting thing about normal forms: it is VERY RARE for a database designed to 3NF not to already satisfy 4NF, 5NF, BCNF, etc. In fact, you almost have to work hard to end up with a design that is still in 3NF but violates the higher forms.

You are absolutely correct on this point. Many designs out there may already be in 4NF and beyond without the designer even knowing it.

I'm also incredibly lazy...most of my database work entails getting the table design right. The rest just kinda falls into place. See my note above about 4NF/5NF...they are easy to implement, usually requiring no more thought than implementing 3NF...it is HARD (not impossible) to NOT implement HNF if you have normalized to 3NF.

Then I'm afraid I'm doing something wrong, because I know that when I have several many-to-many relationships to satisfy, Access gives me hell when I try to be lazy and use a subform or the like to avoid coding. I tried to give an example in my previous post re: classes, attendees, and sessions, but I'm afraid it wasn't a sufficiently clear example of where a proper design can be too much for Access controls.

I think people get confused over what higher normal forms mean. I have actually had intelligent people "over-design" an address system, fully normalizing the over-designed system. Instead of having an address 1, address 2, City, State, etc., they object-oriented it into geographic components with x/y/z coordinates. The geographic component design was totally normalized to Boyce/Codd and beyond (DKNF), but it didn't fit the business requirements. However, it made the designer feel good. It was stupid.

Wow, coordinates? Who would actually know the coordinates of their house? ;)

Because the word "normalization" was used, and because this particular designer was fond of saying our databases were not properly normalized (that was true), all these smart people thought that normalizing to HNF meant having an unusable design...nope, not true. 5NF is as easy to implement as 3NF, being almost identical (except for the hard work you have to go through to un-normalize so it is not 5NF while still maintaining 3NF).

Well, I agree that 5NF doesn't have to be unusable. The issue as I see it is that any bound form is inherently bound to one table (even if we use a query as the recordsource). Things get ugly if you try to create a recordsource involving multiple tables, even if all you want to do is insert a parent record and a related child record at the same time. Doable, but not braindead simple, IMO.
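
To show what I mean by doable-but-not-braindead, here is roughly the plumbing a bound form is supposed to spare us (a T-SQL sketch with made-up tables; assumes OrderID is an identity column):

Code:
-- Inserting a parent and a related child "at the same time" still means
-- two statements and carrying the new key across by hand.
BEGIN TRANSACTION;

INSERT INTO Orders (CustomerID, OrderDate)
VALUES (42, GETDATE());

DECLARE @NewOrderID INT;
SET @NewOrderID = SCOPE_IDENTITY();  -- key of the parent row just inserted

INSERT INTO OrderDetails (OrderID, ProductID, Quantity)
VALUES (@NewOrderID, 7, 3);

COMMIT TRANSACTION;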

The subform control sure is a boon for managing one-to-many relationships, but it still suffers the same limitation. As you move to many-to-many relationships, let alone ones constrained by other factors, the implementation becomes problematic.

Moving beyond Access, I suspect many developers would just use unbound forms and cook up their own solution, which is probably only good for the case at hand and will live & die with it. Not to say that they can't reuse bits & pieces (a good one would!), but I've not seen an implementation that was very simple and effective in modeling complex relationships.

One exception, though. FMS Inc has a great example of using two unrelated subforms synchronized via an intermediate textbox, which makes it very simple to have two continuous subforms, one showing orders, the other showing the details of the selected order. Access doesn't allow nested continuous subforms, but this solution circumvents that and works very well. It can also be put to use with complex relationships.

You can fully normalize an awful design and you'll still have an unworkable system.

An excellent point. Normalization is just a part of design, not the sum of design.
 

dkinley

Access Hack by Choice
Agreed with the remarks about switches - in simplification, they only know the downstream and the upstream ... Not to get off on a tangent (but watch me go): in telecommunications, databases are in use fetching HLR and VLR (location) information, authentication, billing, etc., for setting up the call (e.g., getting all the switches to swing a certain way), maintaining the call (transferring users from one tower to another), or tearing down the call (releasing the switches/resources for other users), regardless of Layer 2 or 3 technologies.

I just remember that in school we spent a whole semester researching and analyzing these methods and attempting to minimize latency through theoretical approaches.

-dK
 
George
Agreed with the remarks about switches - in simplification, they only know the downstream and the upstream ... Not to get off on a tangent (but watch me go): in telecommunications, databases are in use fetching HLR and VLR (location) information, authentication, billing, etc., for setting up the call (e.g., getting all the switches to swing a certain way), maintaining the call (transferring users from one tower to another), or tearing down the call (releasing the switches/resources for other users), regardless of Layer 2 or 3 technologies.

I just remember that in school we spent a whole semester researching and analyzing these methods and attempting to minimize latency through theoretical approaches.

In essence, though, the Billing system is a fully normalized billing system. The Field Order system is a fully normalized order system. The Equipment system is fully normalized.

The real-time processing systems don't need normalization because people don't use them for transactions...they are already totally normalized to remove any possibility of data anomalies (i.e. no chance of data anomalies, because all data is raw, real world, and immediate).
 
George
The subform control sure is a boon for managing one-to-many relationships, but it still suffers the same limitation. As you move to many-to-many relationships, let alone ones constrained by other factors, the implementation becomes problematic.


M:M has nothing to do with normalization...it is a business requirement.

Here is the solution for M:M: Many-to-many example
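
In a nutshell (hypothetical class/student tables): a junction table turns the M:M into two 1:M relationships - ordinary 3NF design, no exotic higher form required.

Code:
-- Standard junction-table pattern for a many-to-many relationship.
CREATE TABLE Class (
    ClassID   INT NOT NULL PRIMARY KEY,
    ClassName VARCHAR(50) NOT NULL
);

CREATE TABLE Student (
    StudentID   INT NOT NULL PRIMARY KEY,
    StudentName VARCHAR(50) NOT NULL
);

CREATE TABLE ClassStudent (
    ClassID   INT NOT NULL REFERENCES Class (ClassID),
    StudentID INT NOT NULL REFERENCES Student (StudentID),
    PRIMARY KEY (ClassID, StudentID)  -- each pairing recorded exactly once
);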
 

tokoloshi

Registered User.
1nf, 2nf, WTFnf - Look at performance

What I really like about SQL Server is the tools it gives you to answer specific questions like: is my database design optimal?

All of the discussion so far is from the perspective of the human being interacting with the solution. It is either from the developer's, the user's, or the IT admin's perspective.

If you had a sample workload for your solution, or could get a dump of typical data and upsize that to SQL Server, then switched on SQL Server Profiler and created a performance trace log in Performance Monitor, you could extract a graph of your application's performance.

It takes all of the guesswork and religious fervour out of the equation.

It is actually a good habit to get into: upsize the database and then run the Database Tuning Advisor wizard in SQL Server to see what the recommendations for your database are.

T
 

The_Doc_Man

Immoderate Moderator
tokoloshi, I agree with your earlier comment. Normalization comes about because you are in the mind-set to normalize. You are accustomed to looking for the little hidden "gotcha" that leaps out at you because you did or didn't normalize that field.

George, hate to tell you this, but there IS such a thing as overnormalization. It is a truly real-world phenomenon, too. It is getting better, but there were times in the past when I worked on a system that had a limit on the number of express or implied (nested) JOINs it would support. We had to denormalize tables because the product limited us. As we go forward now in modern databases, the number of participants in a JOIN can go much higher. But when I started in my current location, that was decidedly NOT the case.

There are some tricks of normalization, though, that help sometimes. For instance, when you have a couple of dozen options, a drop-down needs to be based on a table, and that will of course lead to a JOIN. However, if you only have a very few options, maybe three or four tops, a HARD-CODED list, or a function that displays a string based on a stored code, might save you one of those JOINs. I can't tell you all of the tricks we had to play, but I know we once had a 15-way JOIN running. When we told the vendor's sales rep, he visibly paled before very quietly asking if that really worked... and he was asking about his own product!!!!
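
Something like this, for instance (hypothetical status codes) - the CASE expression stands in for the JOIN to the lookup table:

Code:
-- For three or four stable codes, a CASE expression in the query can
-- replace a JOIN to a lookup table.
SELECT OrderID,
       CASE StatusCode
            WHEN 1 THEN 'Pending'
            WHEN 2 THEN 'Shipped'
            WHEN 3 THEN 'Cancelled'
            ELSE 'Unknown'
       END AS StatusName
FROM Orders;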

I will say this: Normalization is a good thing. It saves space. It saves time. I can tell you why, but you might not like the answer. It has to do with hypergeometric domain minimization. If you have to look up all the addresses to which you send newsletters, you can search a lean address table, or you can search a subscriber table that was too badly denormalized to let you split out that data. Which table do you think is bigger? Now the simple "DUH" factor - it takes longer to search through bigger things. That statement is SO elementary as to be self-evident. Yet people routinely make bigger, more badly denormalized tables because they don't see the cost of it.

If you normalize, the bulky fields you split out are replaced by a compact FK, which makes the search domain smaller. Smaller domains = faster searches. When it is a multi-indexed search, you are searching through many "dimensions" of that database, and that is where the hyperdimensional search issues kick in.

Therefore, back to the thread's original question... Is it mandatory to normalize a database? Not if you have all the time and money in the world to waste by searching a big, ugly, awkward, poorly-organized semi-disjointed pile of dyspeptic dragon droppings. But for most of us poor mortals who have construction, space, and performance constraints, normalization is a GREAT idea.
 

Mike375

Guest
Could you normalise this... or perhaps it is normalised :)

The MainTable has the details of a prospect for telemarketing. There are about 12 possible outcomes for each call, such as "unavailable", "no answer", "appointment made", etc. The results of each call, with time/date and so on, form a record in a Many table. All OK so far :D

HOWEVER, in the prospect's record in the MainTable there is a field for each outcome, plus 5 date fields (RingDate1 through RingDate5) with date/time. The values in the "outcome" fields change with each call. Thus RingDate1 might correspond to an entry in the "unavailable" field (a 1 is entered), and RingDate2 might mean an entry has gone into the MakeAppt field (a 2), etc. The second call might also have resulted in "unavailable", and so a "2" would go in that field, replacing the "1" from the first call. The third call attempt might be "recorded msg", and a "3" would go in that field.

The data on the prospect's record in the MainTable is for display to the telemarketer (he can see the history of calls), and it is also used to select prospects for the call list. For example, any call that results in "unavailable", "no answer", "phone engaged" or "recorded message" will not come back to the calling list for another 7 days. The difference between the date/time fields is also used for selection.

I tried earlier this year to see if I could get what I wanted without the fields in the MainTable, but I could not do it. Subforms are no good, and display is one of the problems.

It is not normalised from the point of view of null fields, as many prospects don't go past one call - a wrong number, for example, or an appointment made on the first call. However, all the fields are a description of the prospect. As an example, a call could result in "Put Off", that is, phone back next month or whatever. Such a call should ultimately go through to gaining an appointment or a "WasteOfTime". Thus if WasteOfTime and MakeAppt are null and it is "x" time since the PutOff result, then the call has not been followed through.
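
To make the shape of it concrete, the normalised selection query I was attempting against the Many table went something like this (Access SQL, simplified, with invented field names) - and it is this sort of thing I could not get to cover all my selection and display requirements:

Code:
-- Build the call list from the Calls table alone: prospects whose MOST
-- RECENT call had a "try again" outcome more than 7 days ago.
SELECT p.ProspectID, p.LastName, p.Phone
FROM Prospects AS p
INNER JOIN Calls AS c ON c.ProspectID = p.ProspectID
WHERE c.CallDate = (SELECT MAX(c2.CallDate)
                    FROM Calls AS c2
                    WHERE c2.ProspectID = p.ProspectID)
  AND c.Outcome IN ('Unavailable', 'NoAnswer', 'Engaged', 'RecordedMsg')
  AND c.CallDate <= DateAdd("d", -7, Now());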
 
George
George, hate to tell you this, but there IS such a thing as overnormalization. It is a truly real-world phenomenon, too. It is getting better, but there were times in the past when I worked on a system that had a limit on the number of express or implied (nested) JOINs it would support. We had to denormalize tables because the product limited us.

I gotta agree with you on this. It's one of the reasons I never latched on to MySQL, though they have come light years from back then.

My solution to your correction, though, is that I don't use products that don't work right.
 

Banana

split with a cherry atop.
My solution to your correction, though, is that I don't use products that don't work right.

[nit]But nothing *actually* follows the SQL standard! Every major vendor breaks the standard so they can extend some fancy newfangled feature![/nit]

I kid, I kid. ;)

That said, the snot on my nose is too wet for me to say anything about the history of the SQL standard - but would anyone else say that vendors breaking the standard now and then has been beneficial, or harmful, in moving us toward normalization and good database design in general?
 

Mike375

Guest
George,

This is the problem as I see it.

If the call results are only in the Many table, then that would be like having a table with LastName as its only field, while FirstName was a record in a Many table, as were DOB, Height, Weight, postcode, etc....each a different record in the same Many table.

Now you want to pull up the person based on parameters of postcode, height etc.

The call results being entered in the prospect's MainTable record are, to my way of thinking, no different from firstname, middlename, date of birth, etc., in that they are a description of the prospect. Although, unlike name and DOB, they keep changing.
 

Rabbie

Super Moderator
It seems to me that some degree of normalisation is essential. I would not like to work on a database that was not even in 1NF. Even trivial reports would be extremely difficult to program.

I think that the level of normalisation required depends on the complexity of the application. I would regard it as desirable that each piece of information is held only once in the tables - if nothing else, this makes updating the data simpler.
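
A trivial illustration (made-up tables): if a customer's address is repeated on every order row, a change of address means hunting down many rows; held once, it is a single-row update.

Code:
-- Address repeated on every order row: one change touches many rows,
-- and any row the WHERE clause misses is silently left stale.
UPDATE Orders
SET    CustomerAddress = '12 New Street'
WHERE  CustomerName = 'J. Smith';

-- Address held once in a Customers table: one row, done.
UPDATE Customers
SET    Address = '12 New Street'
WHERE  CustomerID = 1234;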
 

The_Doc_Man

Immoderate Moderator
George, on the fine point of saying "I don't use products that don't work."

You're talking to us through Windows and IE, right? 'nuff said. If it's Vista, WAY more than 'nuff said.

Now, as to my correction: all too often you find out that a particular product has this little limit of X JOINs (counting nested JOINs in a query/sub-query environment). To turn around and say "this product doesn't work right" because it imposes a limit on the number of JOINs you can have is a bit harsh. If you published an application that lets you cascade combo boxes to manage a complex search, would YOU like it if your customer said, "Your product doesn't work because it doesn't give me enough options to add to my search via drop-downs"?

Every proper JOIN means you have to track an index. (I'm assuming you aren't trying to do Cartesian JOINs.) That index takes up memory buffer space. While we would like to think that we can grow the application's allocated memory as much as we wish, that isn't realistic.

I remember when a four-way JOIN was enough to bring a machine to its knees. Now, 15-way JOINs are quite possible, if still not common. Besides, if we work for a company known for its ... frugality, shall we say? ... and we don't personally supervise the purchase of the utility programs, then we work with what the boss bought. And I'm not going to tell my boss that he bought the wrong thing. But if I did, he'd say (like Tim Gunn on Project Runway) "Make it work."
 

tokoloshi

Registered User.
Normalising Mandatory?

One of my clients has several databases that grow by about 500GB per day. Yep, I mean 0.5 terabytes.

The size of these databases gives one some interesting insights. The first is the need for normalisation during transaction processing, and the second is the need for denormalising during analysis.

When you are looking at capturing and storing data and want to optimise performance during that stage, normalise as best you can, then run a tuning wizard against the operational environment and see if you can gain additional performance from tweaking the schema any further.

It is actually very eye-opening to see the impact a good design can have on a DB of about 100,000 records. This includes the correct indexing and constraint definitions.

Another client of mine has a DB that grows by a gig or so a month. They go so far as to normalise the address table right down to the street number. They partner with local government agencies to ensure that the department responsible for roads provides them with all of the road names. They talk to the municipalities and make sure that they get the address details right down to the suite, apartment number, floor, building, street, suburb, city, province...

Many accounting systems I know have no need to store your suppliers' invoice contact names as firstname, lastname. They leave you with a single field that you can use for free-form entry.

The number of joins, the complexity of the queries - all of these are par for the course. I encourage the use of visual designers because they help to get the right joins in the right places with very little effort.

The interesting difference between using a join instead of a where clause is that the DBMS knows the join implies the use of specific indexes, and knows that the lookup is going to be restricted to two indexed keys that will probably both be found on a single page on the hard drive. This means that the search is an indexed search.
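
To be concrete (hypothetical tables), here is the same lookup written both ways:

Code:
-- Old style: the join condition is buried in the WHERE clause.
SELECT o.OrderID, c.CustomerName
FROM Orders AS o, Customers AS c
WHERE o.CustomerID = c.CustomerID
  AND o.OrderDate >= '2008-01-01';

-- Explicit JOIN: the equi-join on indexed keys is stated up front.
SELECT o.OrderID, c.CustomerName
FROM Orders AS o
INNER JOIN Customers AS c ON o.CustomerID = c.CustomerID
WHERE o.OrderDate >= '2008-01-01';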

Also, during transactional processing, data entered using forms gets its FK values filled in automatically if the parent/child forms are correctly designed.

So there are major advantages to a well designed DB during the OLTP stage. Note: I am emphasising WELL DESIGNED, not necessarily 5NF or higher. Choosing the best design implies a mindset that looks for optimum performance, but production databases should always be run through some form of tuning advisor, like the Database Tuning Advisor that ships for free with SQL Server.

On the other hand, OLAP applications that need high-level reports drawn from 15 joins on a frequent basis can cause unwanted table scans and make your server take a performance hit. When designing OLAP applications it is better to consider writing the data into a set of query tables, or even into a separate MDB file, to be used purely for report writing.

Here your application can have all of the benefits of performance and ease of use from a lazy developer's perspective. In fact, going down this particular road can be pretty exciting, especially since it is during the OLAP stage of the dev cycle that we get a chance to look at fact tables, dimension tables and cubes. We also get exposed to a completely different thought process, since we are constantly de-normalising for performance.

During the OLAP stage we also look at generating the data for reports during overnight runs, so that the data is up to date in the morning and the reports are ready to roll as if preloaded.
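
For instance (T-SQL, hypothetical tables), the overnight job might resolve the expensive joins once into a flat report table:

Code:
-- Overnight run: flatten the multi-join query once into a denormalised
-- report table, so the morning reports read a single table.
-- (SELECT ... INTO creates the table; drop or rename the old one first.)
SELECT c.Region,
       p.ProductCategory,
       SUM(od.Quantity * od.UnitPrice) AS Revenue
INTO   rpt_DailySales
FROM   Orders AS o
INNER JOIN Customers AS c ON c.CustomerID = o.CustomerID
INNER JOIN OrderDetails AS od ON od.OrderID = o.OrderID
INNER JOIN Products AS p ON p.ProductID = od.ProductID
GROUP BY c.Region, p.ProductCategory;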

De-normalised data is the norm during OLAP stages.

And yet, even here we find that we can gain a lot from running an OLAP database through something like the Database Tuning Advisor.

Many of my clients have SQL Profiler constantly running on a client PC, tracking the performance of every query, every table scan and every join. They baseline their performance and are constantly looking for ways to squeeze that extra nanosecond out of their queries.

The degree of normalisation is never discussed. It is assumed that you have developed the correct mindset before starting the design work, and that you have undertaken to measure and evaluate your performance constantly so that you can continually enhance the schema.

I think that this approach should be followed, even if it is a database of only a couple of hundred records. The question is not about the size of the database, but rather the mindset that needs to be strived for.

Sean
 

Pat Hartman

Super Moderator
If your database is growing by 500GB per day, something is wrong with how it works. It sounds like a combination of bulk insert/deletes and embedded SQL.
 

tokoloshi

Registered User.
If your database is growing by 500GB per day, something is wrong with how it works. It sounds like a combination of bulk insert/deletes and embedded SQL.

Nope, the client is one of the top 3 users of SQL Server internationally. Their data is squeezed for every ounce of performance, and they have 20 DBAs, all of whom have solid experience and formal training in everything from optimising and tuning databases and queries to implementing, managing and designing databases for performance.

They have several applications, some of which grow by about 500GB per day; others grow by only a couple of hundred GB per month.

Puts DB design into perspective for me.

Sean
 

Rabbie

Super Moderator
George, on the fine point of saying "I don't use products that don't work."

You're talking to us through Windows and IE, right? 'nuff said. If it's Vista, WAY more than 'nuff said.

Most products work. Very few, if any, work perfectly. Perhaps I've been lucky, but my experiences with Windows XP Professional and Access 2003 have been largely trouble-free. Any issues with the DB have been due to my mistakes and not problems with the software.
 
