Multiple Primary Keys or Composite Unique Index (1 Viewer)

gemma-the-husky

Super Moderator
Staff member
Local time
Today, 22:22
Joined
Sep 12, 2006
Messages
15,614
@galaxiom

What I am saying is that I disagree with the notion that the unique index is necessary in addition to a surrogate key.

I don't want to feed the flames, but I just can't agree with this. Surely a real-world unique index is a-priori essential in all tables. (even if part of this unique index is the surrogate key itself - ie the PK of a subtable may consist of the PK of a parent table - FK in this table - but together with some additional real-world information)

** I say all. I have used tables with no way of uniquely identifying items, and although the system was usable, certain aspects were problematic.

It's not necessarily always clear what that should be, especially in a "persons" table, when virtually nothing can be considered permanent and unchanging. However it is essential that we find a way of uniquely identifying a "person" entity, other than by an artificial surrogate identifier. Otherwise we are liable to admit duplicate data.

No more comments on this from me, anyway.
 

informer

Registered User.
Local time
Today, 23:22
Joined
May 25, 2016
Messages
75
The idea of identifying customers by name and date of birth seems to me far too unrealistic to be worth considering. What kind of business would want to ask their customers for their date of birth? True there are cases where the date of birth might be required for customers, healthcare applications being one example, but those healthcare applications are never likely to use name and date of birth as a key.

Natural keys (composite keys included of course) are incredibly important, even essential to successful database design. Unfortunately the internet is littered with poor examples like yours. That's precisely why the issues are so often misunderstood and misrepresented.

If you have well read my last post, unique index for customer entity is not based on adresse.

To submit to a customer loyalty programm, we need to present an identity card on which we collect different information about birth. An other example, more relevant, for obtaining a mortgage..

And please,ButtonMoon, give us a rich example because your comment is particulary poor.
 
Last edited:

Galaxiom

Super Moderator
Staff member
Local time
Tomorrow, 09:22
Joined
Jan 20, 2009
Messages
12,849
However it is essential that we find a way of uniquely identifying a "person" entity, other than by an artificial surrogate identifier. Otherwise we are liable to admit duplicate data.

As I pointed out in my previous post, the vagaries of how a person identifies themselves can often make Person tables unavoidably problematic. Indexes can only avoid identical entries.

Sometimes we have to rely on the user to some extent.:eek: And given what I have seen users do it certainly isn't ideal. Some will go to incredible lengths to avoid every possible catch. My favourite expression in databases is, "There is no such thing as foolproof because fools are so ingenious.".

All we can do as developers is try to shove potential matches in their faces before they enter new records. Leading horses to water etc comes to mind.

It is especially hard in online databases where the user is the person creating their own record and you can't say, are you sure you aren't "Bill Smith" with DOB and address xyz.
 

Galaxiom

Super Moderator
Staff member
Local time
Tomorrow, 09:22
Joined
Jan 20, 2009
Messages
12,849
Have a look at the function post 10 in this thread.

The function there is far more useful than any unique index.

Damerau-Levenshtein distance is probably the single most useful database tool I have ever found. It allows potential near matches to be ordered. The very day I discovered it I identified dozens of errors in a database that I manage (not my development).

Before that I had already identified and corrected dozens of other mistakes with every query I could think of. The Damerau-Levenshtein distance would have found them all. It is a case where helping someone else with their problem at this site helped me more than I could have ever dreamed.
 

informer

Registered User.
Local time
Today, 23:22
Joined
May 25, 2016
Messages
75
Have a look at the function post 10 in this thread.

The function there is far more useful than any unique index.

Damerau-Levenshtein distance is probably the single most useful database tool I have ever found. It allows potential near matches to be ordered. The very day I discovered it I identified dozens of errors in a database that I manage (not my development).

Before that I had already identified and corrected dozens of other mistakes with every query I could think of. The Damerau-Levenshtein distance would have found them all. It is a case where helping someone else with their problem at this site helped me more than I could have ever dreamed.

Tanks for this information but I don't understand how it works. Could you explain me when you call this fonction?
 

Galaxiom

Super Moderator
Staff member
Local time
Tomorrow, 09:22
Joined
Jan 20, 2009
Messages
12,849
Tanks for this information but I don't understand how it works. Could you explain me when you call this fonction?

It takes two strings as arguments and returns a score for how many changes it takes to get from one to the other. The lower the score the more similar the two strings. The weighting constants assign the value for different types of changes. It is configured for finding typographical errors, so reversing two characters is considered only slightly more than leaving out a character rather than as two changed characters.

I posted an example database in Post 5 of this thread using it to test for similarities between existing addresses in a small table.

When looking for potential matches between a proposed additional records, compare the values in existing records with the textbox values on the form. Order by the score to get the best matches to the top.

When comparing multiple fields at the same time, calculate the score for each comparison and multiply the scores together. This skews the score away from big differences. There are may other mathematical options to combine results. Log could be interesting way to skew the combination of results.

In my case I am not interested in perfect matches which would score zero so I filter them out first. To include perfect matchs, give them a value of less than 1 in the query so that perfect matches bring the score down towards zero when multiplied with the DL scores.

Of course, when comparing addresses, people living at different numbers in the same street would score closely since only the number would be different. This can be adjusted by parsing the address and comparing the parts.

In some cases, the function might be made more powerful by passing the weightings as parameters. Also, consider applying another weighting to reflect the importance of the field being compared. Multiple the DL result by the importance. For example difference in names might more important than differences in street if you want to account for people who may have moved.

It could be made really fancy by working with a table of terms that are considered equivalent, (Will, William, Bill); (St, Street) etc. A value could even be assigned in that table to weight the equivalence of the terms. I would consider dropping out any punctuation marks before comparing addresses.

I have always been impressed at this function's speed. To be honest I have never fully analysed how it works. I just know it produces excellent results.

I included a comment in the function header with a link to the web page where I found it. My only contribution was tidying it up and making it work with Access VBA. It also works with Excel where it can use Excel's built in function instead of the little function by the same name at the end. No doubt that part of it could be done away with and embedded directly in the code but I think it is cool how it works already.
 
Last edited:

informer

Registered User.
Local time
Today, 23:22
Joined
May 25, 2016
Messages
75
You wrote in your post

To compare multiple fields for similarities, apply the function to each field then get the product of all the results

So if a natural key is compound of 5 fields, you have to call 5 times this function and then?

With Access , on which event, do you call this function?

How do you treat the multiple return values to decide if it's a duplicated natural key?
 
Last edited:

ButtonMoon

Registered User.
Local time
Today, 22:22
Joined
Jun 4, 2012
Messages
304
You wrote in your post
So if a natural key is compound of 5 fields, you have to call 5 times this function and then?
In fact you have to compute Damerau–Levenshtein for each pair of values you want to compare, not just for each field (i.e. it's an "edit distance" function). Galaxiom's point I believe is that strict equality (which is what key constraints are about) isn't a very reliable way to compare addresses. I would actually suggest something more than just using an edit distance function. Ideally use address cleansing and deduplication software that understands the semantics of addresses and how to compare them.

The action you take when you find a duplicate depends a lot on the application. Not all duplicate customers need to be eliminated in that way because sometimes multiple membership is part of the implied or actual contract with the customer. Many websites (this one included) allow users to sign-up multiple times. Unless the customer has agreed to terms that prohibit multiple sign-ups then as a service provider you could even be in breach of data privacy regulations if you try to deduplicate your customers (where informed consent is required for using personal data for any purpose).

De-duplication based on name and address may be optional but natural keys are far more important than address deduplication. This site, in common with many others, allows multiple sign-ups but it requires a login name to be unique to each sign-up. I expect the site also requires unique email addresses although I haven't verified that. So login name or email address may be keys for customer accounts. Login names, email addresses or account numbers are probably the most common natural keys for customers and are generally going to be far more useful keys than some combination of name, address or birthdate.
 

gemma-the-husky

Super Moderator
Staff member
Local time
Today, 22:22
Joined
Sep 12, 2006
Messages
15,614
I think a person table is actually a poor choice to discuss the use of "natural keys, although it is a good choice for a general discussion about data cleansing issues

However one field in a person table that could be used as a unique identifier is an NI Number, or Social Security number. Americans seem to get hung up on this, but it's not an issue in the UK.

So a NI Number is certainly a legitimate candidate to use to determine uniqueness of a "person". NI Numbers are issued by governments to one person only. There are no duplicates. However the fact that there are more NI Numbers than persons is an indication of the issues. People can get issued with more than one NI number. Also children don't get NI numbers.

Another real-world problem is that often we don't have the NI Number at the time we enter the person record. We could also enter it incorrectly. I expect the number has a built in check digit, but we don't know how that works. So an NI number makes a poor choice for a PK. Therefore there are practical reasons why although a NI number is in theory a useful indicator of uniqueness, in practice it is not such a good choice.

I think what Galaxiom is saying that rather than use an index, you could and should test for duplicates and importantly NEAR-duplicates before storing the record. To my way of thinking this is six of one, and half a dozen of the other. The additional tests can be used in conjunction with the unique index.

I still think that it is not reasonable to rely only on surrogate keys within a system. Yes, of course they guarantee relational integrity of all records in the database, but they don't by themselves guarantee compliance with business rules. That's the hard part really. Developing a system to comply with business rules, especially when only a minuscule number of transactions are non-standard. You still have to design for the non-standard cases.
 

Galaxiom

Super Moderator
Staff member
Local time
Tomorrow, 09:22
Joined
Jan 20, 2009
Messages
12,849
However one field in a person table that could be used as a unique identifier is an NI Number, or Social Security number. Americans seem to get hung up on this, but it's not an issue in the UK.

I think you run into legal problems there. We have a TFN (Tax File Number) in Australia. I work in a financial institution where it is very important. Unless the client provides it we are obliged to charge tax on any interest earned at the maximum marginal tax rate and forward that money to the tax office.

However it can only be stored with irreversible encryption so that it can only be compared with the original number and cannot be retrieved. Consequently it certainly isn't practical as a Primary Key.

Moreover, very few Australians could recite their TFN and are not in the habit of carrying it with them. Consequently it isn't very useful as a natural key in the vast majority of databases.

I think what Galaxiom is saying that rather than use an index, you could and should test for duplicates and importantly NEAR-duplicates before storing the record.
Exactly

To my way of thinking this is six of one, and half a dozen of the other. The additional tests can be used in conjunction with the unique index.

I don't see it as such an even race given that there are numerous potential near duplicates and only one exact match.

A composite index on most of the fields in the table becomes a significant overhead. The developer should always compare the costs and benefits of their choices.
 

ButtonMoon

Registered User.
Local time
Today, 22:22
Joined
Jun 4, 2012
Messages
304
one field in a person table that could be used as a unique identifier is an NI Number, or Social Security number. Americans seem to get hung up on this, but it's not an issue in the UK.

In what circumstances in the UK would you use NI numbers as an identifier? I believe Tax and Employment are the only scenarios where NI number is commonly used. I can't imagine many companies (if any) asking their customers for an NI number and I suspect potential customers might be put off if anyone did try that.

Outside of governments and public sector agencies, in the business world we are typically less interested in identifying individual "people". What businesses are interested in is identifying Customers, Contacts, Suppliers, Subscribers, Account Holders, Users, etc. Those things are perhaps proxies for "people" but a single person may be a customer multiple times or may legally hold several accounts, etc. So NI number is not widely used in my experience and doesn't seem terribly useful. What matters more is what identifies the customer, account holder, etc within the business relationship(s) you have with them. The identifier used for that purpose is seldom a NI number I think.
 

gemma-the-husky

Super Moderator
Staff member
Local time
Today, 22:22
Joined
Sep 12, 2006
Messages
15,614
In what circumstances in the UK would you use NI numbers as an identifier? I believe Tax and Employment are the only scenarios where NI number is commonly used. I can't imagine many companies (if any) asking their customers for an NI number and I suspect potential customers might be put off if anyone did try that.

Outside of governments and public sector agencies, in the business world we are typically less interested in identifying individual "people". What businesses are interested in is identifying Customers, Contacts, Suppliers, Subscribers, Account Holders, Users, etc. Those things are perhaps proxies for "people" but a single person may be a customer multiple times or may legally hold several accounts, etc. So NI number is not widely used in my experience and doesn't seem terribly useful. What matters more is what identifies the customer, account holder, etc within the business relationship(s) you have with them. The identifier used for that purpose is seldom a NI number I think.

I agree. I was pointing out that for an individual, a NI number ostensibly could be a very good unique indicator, but practically would not be useful.

For the same reason, a companies house "company registration number" is a perfect identifier for limited companies, but of no use for non-incorporated entities.
 

ButtonMoon

Registered User.
Local time
Today, 22:22
Joined
Jun 4, 2012
Messages
304
Social Security Numbers and their ilk tend to provoke pointless arguments. I try to avoid discussing them for that reason. Same goes for hypothetical keys based on people's names and addresses.
 

Users who are viewing this thread

Top Bottom