Normalization / Approach question – arbitrary characteristics (ETA: Entity Attribute Value EAV)

directormac · Feb 6, 2021

Greetings Access Wizards – been many years since I used Access, started to seem like the right tool again LAST February but then… well, you know, and my work shifted radically. Now things are getting back on track at my desk, so I’m again trying to scrape the rust off my brain. Which leads me to the following. I’ve searched around a bit but haven’t found anything that hits this quite right (though maybe I’m not using the right search terms?)

I have a need to track things that have multiple characteristics, any of which may or may not apply to any particular thing. Further, each characteristic can get measured/tagged for in a different way. Just to make things even more fun, at any time a new characteristic may need to be added to the set and applied retroactively, or not, to things already being tracked.

For ease of discussion, let’s say you started to make a list of animals in the forest. And you wanted to be able to record/report on them by things they can do, or are, or such. So you find a parrot, and parrots can fly, so “Can Fly?” is now a yes/no characteristic you want to track for all the animals. Next you find a cat, and you see it’s speedy, so you decide you want to know how fast any animal is, so you add “Top Speed” as a numeric characteristic, fill it in for the cat and go back and fill it in for the parrot. Next you find a whale… and so on.

Now one way to do that would be to add a new field to tblAnimals everytime you needed to add a characteristic, with the appropriate field type (bool/int/txt/etc.) to measure that characteristic:

But that’s very high on the PITA factor and anyway means re-doing a bunch of stuff (forms, queries, reports) every time you have a new characteristic to add. Plus it means the average user can’t add any characteristics and has to wait for the DB admin to do it, causing delays (as well as being high PITA).

A better option all around is obviously to break out the characteristics, almost certainly with a many-to-many join table so we can mix and match as we like.

BUT.

We have an issue, in that all characteristics are NOT created equal. The fact that they are different data types means we can’t put them all in the same place, at least at first glance. We’d need a table for each data type, right? And then put everything together with queries, something like:

That’s going to hit some limitations pretty quickly. For one thing, it doesn’t give us a way to differentiate between “unknown” values vs. 0/FALSE values. For another, it restricts our types of answers to the built-in data types. I imagine there will be a bunch of characteristics for which we’d like multiple-choice metrics, like “low/medium/high”. But we can get around both those by using text fields with validation rules, I think?

I keep wondering, though, if there’s not a more streamlined structure possible. I haven’t gone too far down this road, but I’m thinking something like:

This would require a set of VBA Get and Set functions to enforce answer type integrity and parse the results where needed. Almost every answer type would boil down to either (1) some free value that could be stored as text and converted to another type (e.g., int) by the function as needed; or (2) one or more choices from a list of distinct values (e.g., “yes/no/unknown” exclusively, or “walks/flies/swims/burrows check all that apply) which could likewise be stored as text and parsed by the function. All of which would mean that the DB admin would have to write a new function anytime a new answer TYPE was required, but once the tool is built those occasions should be relatively rare. Meanwhile, common rubrics like “low/medium/high/unknown” could get applied to any newly required metrics freely.

On some level I’ve been working this post out as a way to brainstorm which path to go down. As appealing as I find the unified structure idea (fewer tables), I’m also aware that it’s a lot more complex in setup, and likely to exhaust my meager VBA skills. As I was typing that last paragraph, it occurred to me to wonder: is a hybrid possible, with one set of free characteristics and one set of multiple choice characteristics, each handled by a single pair of set/get functions?

Thoughts/reactions/advice are very welcome. Obviously, this is a design approach that I need to settle on before I go too far down any implementation. I vaguely recall a lot of discussion here a number of years ago around creating quizzes/tests for students that felt applicable when I was working this out, but which I haven’t been able to find an applicable thread for now.

So… yeah… thoughts? Anyone set up something similar with great success?

CJ_London · Feb 6, 2021

that is the way I would do it

your function to return a value would be a variant. e.g.

Function getValue(valStr as string, vType as number) as Variant
select case vType
case 1 'integer
getValue=cint(valStr)
case 2 'long integer
getvalue=cLng(valStr)
etc

in a query, if you number your types 1,2,3 etc you could use the choose function

choose(characteristicType,cint(characteristicValue),clng(characteristicValue)...etc)

but recommend only do this if all the results are the same type (at least number v date v text). It is not a good idea to mix datatypes in a column

your function to verify values would need to check range as well as type (integers have a range +-32k for example, or perhaps text should have max 20 characters) and you might say these ranges would be more limited (max 8 legs for example). So I would include range in your answertypes table as well

A fair amount of my work has been around migrating data from one corporate system to another for my clients. So I have a 'mapping table' that maps all the requirements of the new system. and used to first check data validity and then convert data from the old system so the upload runs smoothly.

The_Doc_Man · Feb 7, 2021

You might wish to search for the combination of keywords Entity, Attribute, Value - which pretty much matches your description of inconsistent attribute sets to a "T"

While reports can be a little bit tricky, EAV models let you store darned near anything about an entity. This is also sometimes called Entry, Attribute, Value depending on who filed it.

gemma-the-husky · Feb 7, 2021

The best way to do it is to store the attributes vertically.

Have a table of animals
1 parrot
2 monkey
3 dog

Have a table of attributes, together with the desired data type
1 can fly y/n
2 can swim y/n
3 size as appropriate
4 gestation months

now have a table of animal attributes
animal attr answer
1 1 Y
1 2 N
1 3 Whatever
1 4 Whatever

now you can add and define new attributes without redesigning the database. You don't have to create every attribute for every animal.
This link shows the available vartypes, so you can ensure the attribute value entered is correct for the attribute. So a user doesn't enter generic text, for an attribute you have defined as a number.

VarType constants (VBA) | Microsoft Docs

[edit - you can add to the attributes table another linked table providing the permitted answers you will accept to each attribute. It depends how you want to do it really.]

directormac · Feb 7, 2021

CJ_London said:
that is the way I would do it

your function to return a value would be a variant. e.g.

Function getValue(valStr as string, vType as number) as Variant
select case vType
case 1 'integer
getValue=cint(valStr)
case 2 'long integer
getvalue=cLng(valStr)
etc

in a query, if you number your types 1,2,3 etc you could use the choose function

choose(characteristicType,cint(characteristicValue),clng(characteristicValue)...etc)

but recommend only do this if all the results are the same type (at least number v date v text). It is not a good idea to mix datatypes in a column

your function to verify values would need to check range as well as type (integers have a range +-32k for example, or perhaps text should have max 20 characters) and you might say these ranges would be more limited (max 8 legs for example). So I would include range in your answertypes table as well

A fair amount of my work has been around migrating data from one corporate system to another for my clients. So I have a 'mapping table' that maps all the requirements of the new system. and used to first check data validity and then convert data from the old system so the upload runs smoothly.

Thanks - I'm going to check out @The_Doc_Man 's suggested research approach and then circle back. My main concerns about going this way are the need to me to write a new function everytime there's a new "answer type" needed, and the complexity of calling get/set every time anything gets handled. I'm sure I'll be looking for more help!

directormac · Feb 7, 2021

The_Doc_Man said:
You might wish to search for the combination of keywords Entity, Attribute, Value - which pretty much matches your description of inconsistent attribute sets to a "T"

While reports can be a little bit tricky, EAV models let you store darned near anything about an entity. This is also sometimes called Entry, Attribute, Value depending on who filed it.

I will definitely look for this - thank you! I'll circle back here when I've dug in a bit.

directormac · Feb 7, 2021

CJ_London said:
that is the way I would do it

your function to return a value would be a variant. e.g.

Function getValue(valStr as string, vType as number) as Variant
select case vType
case 1 'integer
getValue=cint(valStr)
case 2 'long integer
getvalue=cLng(valStr)
etc

in a query, if you number your types 1,2,3 etc you could use the choose function

choose(characteristicType,cint(characteristicValue),clng(characteristicValue)...etc)

but recommend only do this if all the results are the same type (at least number v date v text). It is not a good idea to mix datatypes in a column

your function to verify values would need to check range as well as type (integers have a range +-32k for example, or perhaps text should have max 20 characters) and you might say these ranges would be more limited (max 8 legs for example). So I would include range in your answertypes table as well

A fair amount of my work has been around migrating data from one corporate system to another for my clients. So I have a 'mapping table' that maps all the requirements of the new system. and used to first check data validity and then convert data from the old system so the upload runs smoothly.

Thanks - I'm going to check out @the_doc

gemma-the-husky said:
The best way to do it is to store the attributes vertically.

Have a table of animals
1 parrot
2 monkey
3 dog

Have a table of attributes, together with the desired data type
1 can fly y/n
2 can swim y/n
3 size as appropriate
4 gestation months

now have a table of animal attributes
animal attr answer
1 1 Y
1 2 N
1 3 Whatever
1 4 Whatever

now you can add and define new attributes without redesigning the database. You don't have to create every attribute for every animal.
This link shows the available vartypes, so you can ensure the attribute value entered is correct for the attribute. So a user doesn't enter generic text, for an attribute you have defined as a number.

VarType constants (VBA) | Microsoft Docs

[edit - you can add to the attributes table another linked table providing the permitted answers you will accept to each attribute. It depends how you want to do it really.]

Sorry to be thick, but I'm not sure I see the difference between what you've outlined here and the third approach I described above - would you mind being more specific? I'm intrigued by the [edit] portion especially, is there a way to avoid the complexity / amount of coding all those set/get calls?

gemma-the-husky · Feb 7, 2021

We have an issue, in that all characteristics are NOT created equal. The fact that they are different data types means we can’t put them all in the same place, at least at first glance. We’d need a table for each data type, right? And then put everything together with queries, something like:

I hadn't read to the bottom of your post, Yes, it's the similar idea to what you mention, but you never need to do any programming. It's all driven by the way you set up the data tables.

What I was thinking is that you store all the attribute values as strings. if you define an attribute as being of type 11 (vbBoolean) then you only permit values (strings) -1 and 0, (or maybe Y and N). For other attribute types, you permit appropriate values. All you need is a function to get an acceptable response string, and a reverse function to convert the stored string into the correct format. So a Zero against an attribute might mean false, might mean a value of Zero, or might mean no date entered, depending on the specific attribute.

If someone adds a new attribute that they want to be a date, all they need to do is specify the data type eg 7 (representing a vbDate)
Attribute 27, Last Gave Birth, Type 7(vbDate)

Then when someone enters a string value for this attribute, all you need do is validate the string as a legitimate date. Your programme could check the data type of the attribute, and prompt them accordingly.

If you want to be completely specific you can limit the responses the user can enter.
So Attribute 28, Diet, Type 8 (vbString)

But instead, Attribute 28, Diet, Type 3 (vbLong)

now instead of allowing the user to enter any numeric value, (or any string) you have a set of permitted lookup values for attribute 28
28, 1, Omnivore
28, 2, Carnivore
28, 3, Herbivore
28,4, Piscivore

You don't have to reprogram anything. It's just all managed by tables that define the permitted responses, where necessary.
It's also not quite as defined as the related tables you showed. In most cases, you don't need a related table with response choices. The response choices are generally dictated by the data type.

this sort of thing. Whether you allow values to be entered as blank/null is up to you.

Code:

Select case attributedatatype
case 3,vbLong : if permittedvaluelistexists then
                               select from combobox
                          else
                               accept any integer number
                          end if
case 6,vbCurrency: accept currency
case 7,vbDate: accept a date
case 11,vbBoolean: accept Y/N
case 8:,vbString: accept a string
end select

CJ_London · Feb 7, 2021

My main concerns about going this way are the need to me to write a new function everytime there's a new "answer type" needed

don't think you understood my suggestion, you just need the one function, the rest is driven by data and sql. There are a limited number of data types so you would use a case statement in vba (per gemma's suggestion) or the access sql equivalent using the switch function

the complexity of calling get/set every time anything gets handled.

not sure why you need these

Pat Hartman · Feb 7, 2021

I created an application using an entity/attribute schema which is where you are heading. I'm going to post two pictures. One of the main data entry form and the second of the form that is used to define the characteristics of a policy. This app is for insurance policies. Each type of policy has different data requirements and so the company had the same problem you are experiencing. How to add new policy types without having to create new tables/forms/queries or having to modify existing ones. Where you are going off the rails though is you need to group your subjects. So think if various classes of animals as different policy types. I picked the word "class" because it is around the middle of the classification schema. I don't know what makes sense to you but you want to pick a level where you can define most of the characteristics you want to track. You may want to to as low as Animal Families or even Genus since I doubt that you actually need to cover the entire animal kingdom. But species may be too specific.

Think of that level as your "policy". Look at the pictures below and post back if you want to discus the schema I used or even have a GTM where I can show you how it works. My customer paid a lot of money for this app so I cannot post it but I can describe how it works.

directormac · Feb 12, 2021

CJ_London said:
don't think you understood my suggestion, you just need the one function, the rest is driven by data and sql. There are a limited number of data types so you would use a case statement in vba (per gemma's suggestion) or the access sql equivalent using the switch function

not sure why you need these

Thanks to you and to @gemma-the-husky - combining responses here. I've been reading up on EAVs where I can--of course once I knew what the silly thing was called I've found a lot of background reading out there, still trying to absorb/understand...

I can see how, as you two outline, I could have two functions to rule them all, a set and a get that use a data types table to apply the right rules for the particular attribute being called. Not totally clear on how to handle enums/list choices, leaning towards an additional tblEnumDefinitions that would store all the list definitions, I think I could work that out.

Where I'm still fuzzy is how to USE those attributes in meaningful ways, once the information is in the tables. Let's say I wanted a report to spit out all the flying creatures. Or a summary value of how many herbivores there are. It feels like (so far), I'm going to have to create some pretty ugly queries.

In re: this:

the complexity of calling get/set every time anything gets handled.

Click to expand...

not sure why you need these

Wouldn't I need to call the set anytime a value gets changed, to enforce type integrity? And wouldn't I need to call the get anytime I want a value that's not text? Or, I guess, I could type the text value at the time of use, is that what you mean? As long as I trusted my set function to be iron-clad, then I could just use, for example, int(attribute value) at the time I needed to do math on those?

Still researching, still making lots of scribbles on paper, just didn't want y'all to think I wasn't appreciative.

directormac · Feb 12, 2021

Pat Hartman said:
I created an application using an entity/attribute schema which is where you are heading. I'm going to post two pictures. One of the main data entry form and the second of the form that is used to define the characteristics of a policy. This app is for insurance policies. Each type of policy has different data requirements and so the company had the same problem you are experiencing. How to add new policy types without having to create new tables/forms/queries or having to modify existing ones. Where you are going off the rails though is you need to group your subjects. So think if various classes of animals as different policy types. I picked the word "class" because it is around the middle of the classification schema. I don't know what makes sense to you but you want to pick a level where you can define most of the characteristics you want to track. You may want to to as low as Animal Families or even Genus since I doubt that you actually need to cover the entire animal kingdom. But species may be too specific.View attachment 89005 View attachment 89006 Think of that level as your "policy". Look at the pictures below and post back if you want to discus the schema I used or even have a GTM where I can show you how it works. My customer paid a lot of money for this app so I cannot post it but I can describe how it works.

Thanks Pat, that's helpful/inspiring (also, good to chat with you again after my years away!). Forms and reporting are something I'm trying to keep in mind even as I'm sketching out structures, because it's a big factor here, which is one of the reasons I'm concerned about EAV - a lot of my reading so far is saying that EAV is great for getting data IN but not so great at doing things with it.

Adding to the fun, I realize now that since backward-looking reporting is a needful component, I'm going to have to come up with some kind of transactional system as well, so I can say, "last month there were X flying animals, a year ago there were Y", or even "animal #z was 4 pounds in May, 5 pounds in June, and 5.5 pounds in July". I'm not actually doing anything with animals, of course. Some of the reading I'm finding touches on this a bit, with creating multiple EAV intersections that have a timestamp as the 4th key.

I am also considering telling my boss we need to put out an RFP for a real developer

Pat Hartman · Feb 12, 2021

The "output" of the insurance app was word documents. All insurance processes require documents and it is critical that the documents all have the same values in the variables so that was the actual purpose of the app. It didn't have other reporting of the attributes as you would get in most applications. The reporting was about work outstanding. Policies that were pending, documents for which there was missing date and like that.

A "real" developer would certainly be more efficient

but we can help get you started. Maybe you can do it yourself.

gemma-the-husky · Feb 12, 2021

Where I'm still fuzzy is how to USE those attributes in meaningful ways, once the information is in the tables. Let's say I wanted a report to spit out all the flying creatures. Or a summary value of how many herbivores there are. It feels like (so far), I'm going to have to create some pretty ugly queries.

Well, you will know which attributes you need to check.

Flying creatures might be all the animals with attribute 4 set to true.
This might include bats and birds.

Herbivores might be all the animals with attribute 6 set to 3

You might be able to let users pick the attribute they want from a drop down, and then pick the value they want from another drop down that's related to the options in the first.

It's hard to be totally flexible. eg - a user might want to pick 2 types of diet, herbivores and omnivores.

When it gets too complex, just dump (the technical word is output) all the data to a spreadsheet, and then users can select what they want interactively using filters. Much easier for the developer.

CJ_London · Feb 14, 2021

this is a very simple example of what I mean. There is no error handling as such and it does not cater for all eventualities you may require. Open frmAnimals, select attributes as required, put some wrong data in and see what happens. See the control beforeupdate event for how this is handled. See construction of the different tables and the relationships, See the query as to how it might be used and the module for the functions you need.

Normalization / Approach question – arbitrary characteristics (ETA: Entity Attribute Value EAV)

directormac

Occasional Presence

CJ_London

Super Moderator

The_Doc_Man

Immoderate Moderator

gemma-the-husky

Super Moderator

directormac

Occasional Presence

directormac

Occasional Presence

directormac

Occasional Presence

gemma-the-husky

Super Moderator

CJ_London

Super Moderator

Pat Hartman

Super Moderator

directormac

Occasional Presence

directormac

Occasional Presence

Pat Hartman

Super Moderator

gemma-the-husky

Super Moderator

CJ_London

Super Moderator

Attachments

Similar threads

Users who are viewing this thread