Advice on proper normalization (1 Viewer)

MrHans

Registered User
Local time
Today, 12:14
Joined
Jul 27, 2015
Messages
147
Hi all,

I need to store details from objects.
I will store many different objects, potentially up to 1 mil.
Each object has 30 to 40 properties and each property has a value.

So I created 1 objects table so that each object gets an ID.
Then I created 1 properties table that contains an ID, ObjectID, Property and value.

Is this correct? Or should I create a seperate table for each property as well? The properties are now repeated which is not normalized, but 40 seperate tables also doesn't make sense...

And what would be the most efficient way to retrieve certain properties from a specific object?

Lets say I have Object A and I want to retrieve the values of property 1, 3, 5, 6, 7 and 8
It doesnt make sense to make 6 different dlookup calls based on the object ID right?

Thanks in advance!
 

theDBguy

I’m here to help
Staff member
Local time
Today, 04:14
Joined
Oct 29, 2018
Messages
21,357
Hi,


If you create a table of "objects" and include fields for each property like property1, property2, property3, ... property40, then you would have a "repeating group," which would be against Normalization rules. This is the same even if you actually use the name of the property. For example: Height, Width, Length, Volume, etc.


However, sometimes, depending on your data and needs, this may be acceptable. For example, a table of "clients" might have fields like: homephone, workphone, cellphone, and fax.


But for an object with 30 to 40 properties, I think you are starting out in the right direction. You might use the following structure:


tblObjects
ObjectID, PK


tblProperties
PropertyID, PK
PropertyName


tblObjectProperties
ObjectPropertyID, PK
ObjectID, FK
PropertyID, FK
PropertyValue
 

MajP

You've got your good things, and you've got mine.
Local time
Today, 07:14
Joined
May 21, 2018
Messages
8,463
I agree with TheDBguy. I would add to table properties a field to hold the data type. All values will have to be text. So you will need someway to do conversions in code.
 

MrHans

Registered User
Local time
Today, 12:14
Joined
Jul 27, 2015
Messages
147
Thank you guys, I will modify the structure as proposed above. It makes sense.

How about the retrieval part in this structure?
Lets say I need to retrieve the values of 6 properties from a certain object, would that be 6 dlookup calls or is there a fancier way?

I would need to find the Object ID, then the Property ID and finally the value of that specific property.
 

Cronk

Registered User.
Local time
Today, 22:14
Joined
Jul 4, 2013
Messages
2,770
Unless a particular Property is associated with more than one Object, the PropertyValue field should be stored in the Property table.


If the Properties will be re-valued over time, the value should be stored in a separate table, either linked to the Property table, or to the join table if the Property has a different value when it is associated with different Objects.
 

theDBguy

I’m here to help
Staff member
Local time
Today, 04:14
Joined
Oct 29, 2018
Messages
21,357
Thank you guys, I will modify the structure as proposed above. It makes sense.

How about the retrieval part in this structure?
Lets say I need to retrieve the values of 6 properties from a certain object, would that be 6 dlookup calls or is there a fancier way?

I would need to find the Object ID, then the Property ID and finally the value of that specific property.

Let’s say we want to retrieve the height and weight of an object, we can create a query joining all three tables and simply use a criteria. For example,

Code:
SELECT ObjectName, PropertyName, PropertyValue
FROM ObjectTable
INNER JOIN ObjectPropertiesTable ON...
INNER JOIN PropertyTable ON...
WHERE PropertyName In(“height”,”weight”)
Or if you knew the IDs, then yes, use the IDs in your criteria.

I’m not sure why you think you need to use DLookups.
 
Last edited:

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 06:14
Joined
Feb 28, 2001
Messages
26,996
If you need it in a query, theDBGuy's suggestion is probably as good as any. You could ALSO use VBA and a simple two-table join of object to property with filtration, an order-by clause in the properties table, and a find-first in a recordset operation. But that WOULD be a lot slower. But still a lot faster than six DLookups. The query WOULD be the fastest but could be awkward depending on how much you wanted to retrieve at once.

The only thing I want you to consider is this: You said you might have up to a million objects and 30 to 40 properties each. OK, if you have a LONG / Autonumbered PK on the objects themselves plus a name. How long is that name? You have 4 bytes for the autonumbered PK field and 20 or 25 bytes for the name. That's 29 million bytes for the object table.

You then have 30 to 40 million entries for the properties table. Again, LONG for the FK that is your pointer to the objects table. If you encode things like the units of that property and a code to identify the property in question, you might end up with 4 bytes for the FK, 4 bytes for a "property code", and 20 to 25 bytes for the text representation of the property. (More bytes if necessary.) If you have some long-winded property values then 40 million of them might reach over 1 Gb. (25 x 40 million).

Obviously, because Access uses "sparse" storage, not all properties will occupy the maximum amount of space - but it is worth monitoring. There is no doubt that Access can hold that much - it can manage a 2 GB back-end file. When you are dealing with a potential gigabyte of data, you need to include a regular backup regimen to protect yourself.

Next comment: Thinking about the idea of using a code for properties, that's not a bad idea. After all, you can't expect to have a million different properties. A million different values? Probably. But properties? I'd be surprised if you had more than a couple of hundred, counting some fairly esoteric ones among that mix. One thing I might do a little differently than suggested by theDBGuy is I would add a "units" indicator field to the property names table. I.e. if you HAVE a height, the property table would have three fields: The code for that property, the name for that property, and the units for that property.
 

MrHans

Registered User
Local time
Today, 12:14
Joined
Jul 27, 2015
Messages
147
Goodmorning guys,


Sorry for the silence on this question, I have been working on creating this table structure and the required function to import and normalize the data this way.


So I now have the 3 tables as the dbguy proposed and included the data type value in the properties table as MajP suggested.


tblObjectProperties
ObjectPropertyID, PK
ObjectID, FK
PropertyID, FK
PropertyValue


Now, I'm still struggling to retrieve the data properly.
Lets say each object has a 'Brand' and 'Type'. These are PropertyID's 1 and 2.
How would I get a count of the combination of these 2 properties?


Brand A - Type 1 - 15
Brand A - Type 2 - 12
Brand A - Type 3 - 7
Brand B - Type 1 - 19
Brand B - Type 2 - 24


I was thinking about a Concat function, but this doesn't sound very efficient.
I guess there should be a way to do this with a single SQL statement?


Thank you in advance.
 

June7

AWF VIP
Local time
Today, 03:14
Joined
Mar 9, 2014
Messages
5,423
Another 2 cents worth. I went the other way with my db. I have Object properties (i.e. attributes) as fields (Weight, Length, Height, etc). As far as I am concerned, this is akin to the 'attributes' of a person - Lastname, Firstname, Birthdate, Gender, etc. which are normally their own fields. Another example: Automobile is the object and its attributes are Make, Model, Year. You said every property will have a value so there would be no empty cells then this is a normalized structure, even if there are 40 fields.

What I would consider a 'repeating group' would be something like: Weight1, Weight2, Weight3, etc. (which I have used quite effectively in spite of non-normalized nature).

It is a balancing act between normalization and ease of data entry/output. Normalize until it hurts and denormalize until it works.

BTW, Type is a reserved word. Should not use reserved words as names for anything. http://allenbrowne.com/AppIssueBadWord.html

If you want to provide db for analysis, follow instructions at bottom of my post.
 
Last edited:

MrHans

Registered User
Local time
Today, 12:14
Joined
Jul 27, 2015
Messages
147
Hi June,


Thank you for the response.
Yes, the objects could be considered as Automobiles and it has properties like Brand, Type, Year, Color etc.


Now my Object Properties table has
ObjectID
PropertyID
PropertyValue


1-1-BrandX
1-2-Type1
2-1-BrandY
2-2-Type2
3-1-BrandX
3-2-Type3
4-1-BrandZ
4-2-Type2


Did I pick the wrong normalization structure?
 

June7

AWF VIP
Local time
Today, 03:14
Joined
Mar 9, 2014
Messages
5,423
Maybe.

I built table to try and get the count you described. If I understand your data correctly, in my structure the query would be:

SELECT Count(*) AS CountRecs, Brand, Type FROM tblObjects GROUP BY Brand, Type;

With your structure, the following all produce the same output but the last one looks best:

Query 1:
TRANSFORM First(tblObjectProperties.PropertyValue) AS FirstPropVal
SELECT tblObjectProperties.ObjectID
FROM tblObjectProperties
WHERE (((tblObjectProperties.PropertyID) In (1,2)))
GROUP BY tblObjectProperties.ObjectID
PIVOT Choose([PropertyID],"Brand","Type");

Query 2:
SELECT Count(*) AS CountRecs, Brand, Type FROM Query1 GROUP BY Brand, Type;

I tried to nest those 2 queries but Access doesn't like it. The following 2 are nested.

SELECT Count(*) AS CountRecs, Brand, Type
FROM (SELECT DISTINCT tblObjectProperties.ObjectID, DLookUp("PropertyValue","tblObjectProperties","ObjectID=" & [ObjectID] & " AND PropertyID=1") AS Brand, DLookUp("PropertyValue","tblObjectProperties","ObjectID=" & [ObjectID] & " AND PropertyID=2") AS Type
FROM tblObjectProperties) AS Q1
GROUP BY Brand, Type;

SELECT Count(*) AS CountRecs, Brand, Type FROM
(SELECT ObjectID, Max(IIf([PropertyID]=1,[PropertyValue],Null)) AS Brand, Max(IIf([PropertyID]=2,[PropertyValue],Null)) AS Type
FROM tblObjectProperties
GROUP BY ObjectID) AS Q1
GROUP BY Brand, Type;

There may be other data manipulation that would be easier with your structure but if this first stab at data aggregation is any indicator, I think not.

Your structure might be preferable if properties are frequently changed (added or deleted). It prevents having to change table structure to delete/add fields, reduces need to change queries, forms, reports, code. Also, tables and queries are limited to 255 fields, so if there were more than 255 properties, my design would have to split the properties to multiple tables (showing all those properties in a single row for each object would be difficult in either design, maybe use a VBA function like Allen Browne's ConcatRelated). So, if the set of 40 properties will rarely change after the db is in full deployment, I vote for my design.

This looks like a nice tutorial on normalization https://www.studytonight.com/dbms/database-normalization.php
 
Last edited:

MrHans

Registered User
Local time
Today, 12:14
Joined
Jul 27, 2015
Messages
147
Thank you June, could you briefly explain a bit more 'your design' ?
How would you structure it?


I guess a tblObjectID containing an ID and name for the object.
And a tblObjectProps containing PropertyID, ObjectID, Brand, Type, Color, Share, Weight, etc, etc up to 40 ?


For now I have identified about 50 to 70 different properties per object.
I could reduce this to the ones I would potentially ever need, but it would still be around 30 I guess and once implemented those would indeed rarely change.


For me speed and efficiency is most important.
Like I initially stated, it could potentially be up to 1 million objects with each object having about 30 properties.


Thanks again!
 
Last edited:

jdraw

Super Moderator
Staff member
Local time
Today, 07:14
Joined
Jan 23, 2006
Messages
15,361
MrHans,

Can you show us a couple of examples based on
" For now I have identified about 50 to 70 different properties per object".

I think the Units field suggested by Doc is a good idea for inclusion in your table.

June,
I agree with you re the Normalization link you gave.
 
Last edited:

MrHans

Registered User
Local time
Today, 12:14
Joined
Jul 27, 2015
Messages
147
Hi Jdraw,


Yes

This shows you the first 50 fields or so.
This can be extended with a 'Date Acquired' field and a 'Date Last Modified' field which would obviously change over time...

Additionally for Fuel information and Frame specifications, additional datasets are available, increasing the number of available specifications up 70 and more for one just object.



But like I said, I could also reduce the number of specifications to the ones I would most likely actually need, but on the other hand it's easier to just download whatever there is.
 
Last edited:

June7

AWF VIP
Local time
Today, 03:14
Joined
Mar 9, 2014
Messages
5,423
My design would be one table for objects and their attributes, 1 record per object, exactly as presented on that web page. I do see a lot of empty cells, which is something I can and have tolerated for the sake of ease of data entry/output.

However, there could be 40+ lookup tables depending on the data. So instead of saving the words (Blue, Green, Yellow, etc) for Color attribute field, save a key that refers back to the Colors lookup table. And that leads to another factor to keep in mind: there is a limit on how many tables and joins can be in a query. I seem to remember hitting that limit of 32 tables.

So now I hope you have enough info to base decision for choosing structure.
 
Last edited:

MrHans

Registered User
Local time
Today, 12:14
Joined
Jul 27, 2015
Messages
147
Clear, thank you very much June7.

I agree that it will be much easier to retrieve the data when I store it horizontally, like it shows on the source.

Back to the drawing board it is... :)
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 06:14
Joined
Feb 28, 2001
Messages
26,996
With 70 separate attributes, the JOIN would almost certainly reach a limit if you had 70 different attribute tables. Using an object table as the parent and an enumeration table of each attribute's name (or code) and value, you can have 1000 distinct attributes. The mental image is harder but has fewer limits.
 

Users who are viewing this thread

Top Bottom