Seven Diversified Questions.

Dear MS Access Expert,

I have a slew of questions here. Please feel free to answer whichever ones you wish to address.

1) According to this article, you can speed up a split MS Access application by having the backend open as a VBA object. I wanted to get your opinion on, or experiences with, this technique.

http://www.fmsinc.com/MicrosoftAccess/Performance/LinkedDatabase.html


2) According to this article, a decompile should reduce the size of your .MDB. I tried it yesterday and it didn’t reduce the size of my .MDBs. I use MS Access 2003. Should I decompile on a regular basis (to supposedly reduce size), or is this technique useful only when I have unexplained gremlins in my VBA code?

http://www.fmsinc.com/MicrosoftAccess/Performance/Decompile.asp

3) Autonumber as Primary Keys. I find this subject confusing, as experts have different opinions on it. Most support autonumber primary keys, but others disagree; John L. Viescas, for example, has an entire section in his book “Building MS Access Applications” titled “Avoid Artificial Primary Keys in Tables.” There are strong arguments on both sides. Please comment on your experiences.

4) DLL files. Which folders / directories does the VBA compiler search for DLL files before giving up? Is it System, System32, and the local folder in which the .MDB resides? What are DLL files? Are they compiled code? Are they written in a specific language? Why do developers choose DLL files instead of VBA modules for certain solutions? For example, I use Stephen Lebans’ mousehook.dll. Why didn’t he simply write this in a VBA module? What advantages did a DLL give him?

5) Error Handling. I have read that it is good practice to put error handling in every sub, procedure and function. Why would I put an error handler on something like Me.mycontrol.SetFocus? Isn’t this overdoing it? Also, I often let child subroutines bubble their errors up to the parent routine. I don’t see a problem with this as long as there is eventually a customized error message somewhere in the call stack. Your thoughts?

6) Refrain from using Exit Function / Sub. I often use Exit Function / Sub in cases where the user selects vbNo in a message box. I don’t consider this bad style, and I find the code easier to read than deeply nested code. What is your opinion?

7) Closing Objects and Setting to Nothing when finished. I don’t explicitly close my Recordsets or set them to Nothing at the end of subroutines. I thought the VBA garbage collector does this? Certain sites recommend closing Recordsets but not necessarily explicitly releasing the object with Set obj = Nothing. Can I continue my practice of not closing and releasing? It saves time and space and makes the code shorter.
 
>>>3) Autonumber as Primary Keys. I find this subject confusing<<<

It is very confusing, I agree. For that reason I haven’t really studied it enough to give you a definitive answer; however, I can tell you what I practice.

I “always” include an ID column in my tables, which gives each record a unique ascending ID. I realise that the nice list of numbers, which is normally consecutive, can be broken when a record is deleted. This is not a problem, because all I ever use the list for is to get an idea of when a particular item was entered.

You can’t argue with this unique ID; it will always be unique to that particular record in that particular table. When that particular item in the table is referred to from any other table or form in your database, you know that you will get the correct result returned.

I said “always”... well, when I’m developing the database I “always” have one of these ID fields. However, once the database is finished, I will go through it and may remove this ID from some tables, in particular the ones which provide data in subforms, in other words tables where each record is already related by a key in another table.

There is no need for a unique key unless you need to know the order in which the items were entered into the table, and in any case I understand you should not use the ID column for this purpose, although I very often do myself!
 
Thanks Uncle Gizmo. I also use the Autonumber ID as a primary key, but reading John L. Viescas’ “Avoid Artificial Primary Keys in Tables” opened up some questions in my mind.
 
1) According to this article, you can speed up a split MS Access application by having the backend open as a VBA object. I wanted to get your opinion on, or experiences with, this technique.

http://www.fmsinc.com/MicrosoftAccess/Performance/LinkedDatabase.html

This has worked well in my experience. The idea is that by keeping a persistent connection, Access doesn't have to re-incur the cost of opening and closing the connection for every operation.
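
If it helps, here's a minimal sketch of the technique: open a handle on the backend at startup, hold it in a module-level variable, and release it at shutdown. The backend path is hypothetical; substitute your own.

Code:
Option Explicit

' Held open for the whole session so Jet doesn't have to re-establish
' the connection (and lock file) on every operation.
Private dbBackend As DAO.Database

Public Sub OpenPersistentConnection()
    If dbBackend Is Nothing Then
        Set dbBackend = DBEngine.OpenDatabase("\\Server\Share\Backend.mdb")
    End If
End Sub

Public Sub ClosePersistentConnection()
    If Not dbBackend Is Nothing Then
        dbBackend.Close
        Set dbBackend = Nothing
    End If
End Sub

Call OpenPersistentConnection from your startup form's Open event and ClosePersistentConnection when the application shuts down.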

2) According to this article, a decompile should reduce the size of your .MDB. I tried it yesterday and it didn’t reduce the size of my .MDBs. I use MS Access 2003. Should I decompile on a regular basis (to supposedly reduce size), or is this technique useful only when I have unexplained gremlins in my VBA code?

http://www.fmsinc.com/MicrosoftAccess/Performance/Decompile.asp

This is what Michael Kaplan has to say about the decompile switch.

3) Autonumber as Primary Keys. I find this subject confusing, as experts have different opinions on it. Most support autonumber primary keys, but others disagree; John L. Viescas, for example, has an entire section in his book “Building MS Access Applications” titled “Avoid Artificial Primary Keys in Tables.” There are strong arguments on both sides. Please comment on your experiences.

It would help if you had noted one or two of the important reasons why John L. Viescas argued against artificial primary keys. It's hard to respond when I don't know the reasons, though I'm quite positive John L. Viescas has very good ones.

It should be further noted that this is not a new debate, nor a debate unique to Access; it has in fact been argued for a very long time. If you do a Google search on 'natural keys vs. surrogate keys' you will find many, many hits on both sides for different products.

My inclination is this: as a matter of probabilities, a surrogate primary key is likely to be smaller than a natural key. There is a performance advantage in comparing a pair of integers as opposed to a pair of strings, so for that reason an integer surrogate key will almost always win out in effectiveness of comparison, at the expense of the extra space consumed by an extra index. (Note that what would have been the natural key still has to be indexed when we need to run queries upon those columns... a client's name, for instance.)

4) DLL files. Which folders / directories does the VBA compiler search for DLL files before giving up? Is it System, System32, and the local folder in which the .MDB resides? What are DLL files? Are they compiled code? Are they written in a specific language? Why do developers choose DLL files instead of VBA modules for certain solutions? For example, I use Stephen Lebans’ mousehook.dll. Why didn’t he simply write this in a VBA module? What advantages did a DLL give him?

It depends, actually. In the case of early binding (e.g. you add a reference to a specific DLL file via the VBA editor's Tools -> References), there is no need to look up a path; the binding compiles in the path, so it's 'hard-coded'. In the case of late binding, where we call, say, CreateObject("Excel.Application"), Access actually makes an API call to the Windows registry to retrieve the actual path to the DLL based on the ProgID (e.g. "Excel.Application"). Yet another way is to use a 'Declare' statement to reference a specific function contained within a DLL. In that case there are default search paths, but even so, you can give the full path in that statement, though most code you find normally won't do this because it usually references the Windows API, which may be in System, System32, and maybe a few more locations.
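
To make the three styles concrete, here's a small sketch. It assumes Excel is installed; Sleep is a real kernel32 function, and everything else is standard VBA.

Code:
Option Explicit

' Declare statement: binds to a named function exported by a DLL.
' Sleep lives in kernel32.dll, found via the default system search
' path; you could also put a full path in the Lib clause.
Private Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)

Public Sub BindingDemo()
    ' Late binding: Access asks the registry for the DLL path
    ' registered under the ProgID "Excel.Application".
    Dim xlLate As Object
    Set xlLate = CreateObject("Excel.Application")
    xlLate.Quit
    Set xlLate = Nothing

    ' Early binding would instead be:
    '     Dim xlEarly As Excel.Application
    '     Set xlEarly = New Excel.Application
    ' after ticking the Microsoft Excel Object Library under
    ' Tools -> References, which records the path at design time.

    Sleep 250    ' the Declare-bound API call
End Sub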

A DLL need not be written in a specific language; it can be written in any language. However, it does have to conform to certain specifications. One example is COM, which is the specification Access uses to automate Excel among others; Access can consume any other COM-aware DLL. Thus, the advantage of a DLL over VBA comes when one needs a wider range of functionality, or performance, that VBA cannot offer. For example, I used two .DLL files written in C# to support cryptography functions in an Access project. Doing it in VBA would have required me to re-write the logic in VBA and suffer a substantial performance penalty, because VBA is not a compiled language; it's an interpreted language, so it's very slow in comparison to any compiled language.

5) Error Handling. I have read that it is good practice to put error handling in every sub, procedure and function. Why would I put an error handler on something like Me.mycontrol.SetFocus? Isn’t this overdoing it? Also, I often let child subroutines bubble their errors up to the parent routine. I don’t see a problem with this as long as there is eventually a customized error message somewhere in the call stack. Your thoughts?

I believe Luke Chung at FMS Inc. did say an error handler should be in every 'top-level procedure', that is, the first procedure that gets called before any others. So if clicking a button invokes a Click event, which calls procedure A and then procedure B, and procedure B calls C and D, you only really need error handling in the Click event.
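
A sketch of that idea, with hypothetical names (cmdGo, ProcA, ProcB): only the event procedure traps errors, and everything below it bubbles up.

Code:
Private Sub cmdGo_Click()
    On Error GoTo Oops
    ProcA
    ProcB        ' anything ProcB calls also bubbles up to here
    Exit Sub
Oops:
    MsgBox "Error " & Err.Number & ": " & Err.Description, vbExclamation
End Sub

Private Sub ProcA()
    ' no local handler: any error rises to cmdGo_Click
End Sub

Private Sub ProcB()
    ' likewise
End Sub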

That said, it can make sense to have local error handling in certain procedures where you know there may be a problem and you don't want it to bubble up to other procedures, especially if the error shouldn't cause the whole thing to fail.
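
For example, here's a hypothetical helper where the error is harmless and shouldn't escape to the caller:

Code:
Public Sub DeleteTempFile(ByVal strPath As String)
    On Error Resume Next   ' the file may legitimately not exist
    Kill strPath
    On Error GoTo 0        ' restore normal error propagation
End Sub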

6) Refrain from using Exit Function / Sub. I often use Exit Function / Sub in cases where the user selects vbNo in a message box. I don’t consider this bad style, and I find the code easier to read than deeply nested code. What is your opinion?

The real problem has to do with having a consistent exit point. For example, if we had cleanup code that closed objects, set some environment settings back to normal, maybe unlocked some controls, turned the mouse pointer from hourglass back to pointer, etc., we need one place for all that code. Adding Exit Sub/Function would bypass that cleanup and cause more problems. This is why, when studying error handling, you'll see the error handling routine return to the 'Exit' part:

Code:
Public Sub Something()

    On Error GoTo Oops

    'Main code here

ExitSub:
    'Cleanup here (runs on both the normal and the error path)
    Exit Sub

Oops:
    'Error handling here
    GoTo ExitSub

End Sub

So at minimum, we may want to substitute Exit Sub/Function with a GoTo ExitSub so we have only one exit point. Even so, I don't really like adding arbitrary jumps to code. Nesting in a proper If/Then or Select Case block has just the same effect without the extra line for Exit Sub/Function or GoTo ExitSub.
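
For instance, your vbNo case can be written with the work nested inside the Yes branch, so no early exit is needed (tblTemp is a made-up table name):

Code:
Public Sub DeleteRecords()
    If MsgBox("Delete all records?", vbYesNo) = vbYes Then
        ' the work happens only inside the Yes branch
        CurrentDb.Execute "DELETE FROM tblTemp", dbFailOnError
    End If
End Sub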

This would be less of a problem if we had a proper Try/Catch block in VBA, but that's my pipe dream and I don't think Microsoft will be obliging me anytime soon....

7) Closing Objects and Setting to Nothing when finished. I don’t explicitly close my Recordsets or set them to Nothing at the end of subroutines. I thought the VBA garbage collector does this? Certain sites recommend closing Recordsets but not necessarily explicitly releasing the object with Set obj = Nothing. Can I continue my practice of not closing and releasing? It saves time and space and makes the code shorter.

Read this. I don't know if it will satisfy you, but that's what we know right now. Do, however, note that ADO memory leaks and orphaned Excel instances are very well documented, and for that reason a consistent habit of closing and setting to Nothing will help avoid those problems, especially if a new bug arises.
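
For what it's worth, the habit being recommended looks like this (the Customers table and CustomerName field are hypothetical):

Code:
Public Sub ListCustomers()
    Dim db As DAO.Database
    Dim rs As DAO.Recordset

    Set db = CurrentDb
    Set rs = db.OpenRecordset("SELECT * FROM Customers")

    Do While Not rs.EOF
        Debug.Print rs!CustomerName
        rs.MoveNext
    Loop

    rs.Close              ' close explicitly...
    Set rs = Nothing      ' ...then release the reference
    Set db = Nothing      ' but never Close the CurrentDb handle itself
End Sub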
 
I see Banana has already given some detailed technical stuff, with more knowledge than I have on Q4 and Q5. However, here are a couple of extra thoughts, plus some comments on the autonumber key.

4. DLL libraries

Dynamic link libraries, so:
a) ready compiled, so faster, and
b) design details are kept private.

I think they can be located anywhere in the PATH environment variable, but I am not 100% sure. I am also not sure exactly what the requirements are to get them registered in Access so they show in References.

5. Include error handlers anywhere an error may arise, which includes every process dealing with tables/queries/recordsets/file I/O etc. Maybe a general backup catch-all error handler is also a good idea, although the problem is that a general error handler may not be able to recover gracefully.



3. An autonumber PK is generally useful as a foreign key, as it is a Long integer, generally the most efficient variable type available.

However, on its own (and I have had this argument on this forum), an autonumber field is almost certainly not sufficient to describe the table: by definition, any record can be inserted into the table, since Access will allocate a PK for it. Therefore, in all but the most exceptional cases there will be another field, or combination of fields, that represents the true key (generally unique, but not necessarily) for the table. For this reason it can be argued that the autonumber key is superfluous in true database terms, but not superfluous in practical performance terms.
 
Thank you for all the feedback from everyone.

1) I want to confirm that this works in a multiuser split MS Access application without problems?

2) Banana, the article by Michael Kaplan warned against the overuse of /decompile because it was not fully tested by Microsoft. However, the article was written in 1999 ... is his warning still legitimate?

3) Thanks for the comments from everyone on this. I will use my judgement and think twice before jumping to an autonumber PK. John L. Viescas' major points for avoiding autonumber PKs, as I recall them off the top of my head, are: 1) searches and sorts on combo boxes bound to the PK (with the first column hidden) don't work as intended for the end user; 2) when you have a relationship among tables that is more than two levels deep, using artificial keys masks the true identity of the parent rows, forcing you to make extra joins in your queries; 3) Gemma's argument.


4) As Gemma mentioned, I didn't know you could add DLLs to the References list. I will try it. I am guessing the benefit of doing this is that early binding is faster than late binding? Do you get IntelliSense too with a DLL?

5) I think this is a good rule: have error handling whenever tables/queries/recordsets/file I/O etc. are used, plus a general catch-all as well.

6) Banana, your answer made sense: when there is clean-up to be done, have one exit point.

7) Banana, from that interesting thread I drew the conclusion that closing and de-referencing your DAO objects is NOT necessary when using the latest DAO patches. However, it is still good practice for many reasons.
 
>>>warned against the overuse of /decompile<<<

I knew decompile was an undocumented feature; however, I didn’t know this was because Microsoft never finished testing it. Interesting.

I seldom use decompile these days; I am much happier copying all of the database objects into a new database. It probably takes a little longer than decompile, however it does give you the opportunity to create a new database with only the latest copies of the necessary components.
 
1) I want to confirm that this works in a multiuser split MS Access application without problems?

Yes. The technique actually assumes a split database, in fact.

2) Banana, the article by Michael Kaplan warned against the overuse of /decompile because it was not fully tested by Microsoft. However, the article was written in 1999 ... is his warning still legitimate?

As far as I know, the decompile switch remains an undocumented feature and nothing has changed, so yes, his warning is still legitimate.

3) Thanks for the comments from everyone on this. I will use my judgement and think twice before jumping to an autonumber PK. John L. Viescas' major points for avoiding autonumber PKs, as I recall them off the top of my head, are: 1) searches and sorts on combo boxes bound to the PK (with the first column hidden) don't work as intended for the end user; 2) when you have a relationship among tables that is more than two levels deep, using artificial keys masks the true identity of the parent rows, forcing you to make extra joins in your queries.

Thanks. I've not had problems with sorting on a bound combo box with surrogate keys; I wonder if he's referring to user-driven sorting rather than sorting done in the RowSource query. FWIW, I don't allow users to sort or filter on a form, especially not against an ODBC backend.

As for identity being masked: yes, that's a problem, but given that on a form I can just grab keys and use a combobox/listbox RowSource to retrieve the text value, it's not that big a problem, and it is actually effective that way, especially when lookup tables are local.


4) As Gemma mentioned, I didn't know you could add DLLs to the References list. I will try it. I am guessing the benefit of doing this is that early binding is faster than late binding? Do you get IntelliSense too with a DLL?

Yes and yes, provided the DLL conforms to the COM specifications and exposes an interface. I have a sample database in the sample db section demonstrating how to consume a .NET library using COM, and it discusses early and late binding.

As for points 5 to 7, it looks like you're on the right track. :)
 
1)

>>Yes. The technique actually assumes a split database, in fact.

I wanted to ensure it doesn't cause problems with concurrent users. I will try it and see if it speeds things up.

3)

>>I wonder if he's referring to user-driven sorting rather than sorting done in the RowSource query.

Yes, he was referring to user-driven sorting. I don't usually enable users to do this either.

4) Why do most programmers not early bind the DLL, then, if it is faster and provides IntelliSense? Is it because late binding makes deployment easier?

4b) What is COM? A Windows standard?

4c) Also, I think I understand the difference between compiled and interpreted code. Compiled code is already converted into machine language before runtime, while interpreters check syntax at runtime and then convert to machine language one line at a time. Please correct me if this summary is inaccurate.

However, I thought that when you Compile your code in the VBA IDE it is compiled? Or does the VBA IDE's Compile just make sure your syntax is correct, after which the interpreter converts the VBA to machine language at runtime, one line at a time?

How about an MDE? Is that compiled code or interpreted?


4d) When I compile C++ code, for example, does the C++ compiler convert the code to the chip codes / language provided by Intel? I know there are three layers or more ... C++, assembly language, machine language (Intel chip codes?).

I have asked many questions today. Thanks to everyone for their responses.

Banana, congratulations again on surpassing the 5K mark.

Peter.
 
1) I wanted to ensure it doesn't cause problems with concurrent users. I will try it and see if it speeds things up.

It's actually the same thing you're doing when you link a backend to enable concurrency; the only difference is that instead of having Access do the job of opening and closing the connection, you're forcing Access to open it and keep it open for the session's lifetime. The process is basically the same as what you usually do when you split and then link the backend.

3) Yes, he was referring to user-driven sorting. I don't usually enable users to do this either.

Thanks. I'm sure there may be cases where user-driven sorting is necessary. That said, even in those cases I'm reasonably sure we can find a workaround that isn't as network-intensive as the built-in sorting provided by Access.

4) Why do most programmers not early bind the DLL, then, if it is faster and provides IntelliSense? Is it because late binding makes deployment easier?

Versioning is a major issue. Someone may need to help me out in case my memory is playing tricks on me, but here goes: suppose you developed an application referencing the Excel 12 library (early binding), then distributed it to a workstation that didn't have the Excel 12 library but rather the Excel 11 library (e.g. no Excel 2007, just Excel 2003). In this case you will get runtime errors because it can't find the Excel 12 library. Late binding circumvents this problem because we use a ProgID, which is the same for all versions and thus always selects the most current library available on the workstation.
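
A sketch of the version-safe late-bound pattern: no reference is set, so whichever Excel happens to be installed gets used.

Code:
Public Sub ExcelVersionSafe()
    Dim xl As Object
    Set xl = CreateObject("Excel.Application")  ' resolved via ProgID at runtime
    Debug.Print xl.Version                      ' "11.0", "12.0", etc.
    xl.Quit
    Set xl = Nothing
End Sub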

4b) What is COM? A Windows standard?

Component Object Model is more like the Windows API. It's basically a specification (much like ODBC, for example, but for libraries rather than databases) detailing how a library should expose itself, how it can communicate with other libraries or applications, and how it makes its methods and objects available for others to use. .NET is a newer framework that should supersede COM eventually, but for practical purposes COM is still very much in use.

4c) Also, I think I understand the difference between compiled and interpreted code. Compiled code is already converted into machine language before runtime, while interpreters check syntax at runtime and then convert to machine language one line at a time. Please correct me if this summary is inaccurate.

However, I thought that when you Compile your code in the VBA IDE it is compiled? Or does the VBA IDE's Compile just make sure your syntax is correct, after which the interpreter converts the VBA to machine language at runtime, one line at a time?

Well, it's a bit more complicated than that. I would venture to guess (but I do not know this for certain) that it's a lot like Java. You know how Java is supposed to be 'Write Once, Run Anywhere', right? Well, Java does this by having a virtual machine that runs compiled Java code; that's why you need to install Java to run any Java application. VBA is much like Java in that it creates p-code, which is a compiled state, but that p-code still has to be run by an interpreter at runtime, just like Java needing its virtual machine to run the code. C/C++ code simply runs right on the hardware, so it's very close to the metal, so to speak; but of course, you can't run a compiled C/C++ program anywhere but the platform it was compiled for. (Having the C/C++ source code, OTOH, allows you to compile it for any other machine, provided it has all the required supporting libraries and includes to make compilation in a different environment possible.)

How about an MDE? Is that compiled code or interpreted?

All an MDE means is that there's no source code, just p-code; it still has to be run by VBA's "virtual machine", which reads the p-code and executes it.

4d) When I compile C++ code, for example, does the C++ compiler convert the code to the chip codes / language provided by Intel? I know there are three layers or more ... C++, assembly language, machine language (Intel chip codes?).

Remember that a computer works in binary only. That's the only thing it actually understands: a sequence of bits in on/off states, no more, no less. However, it's up to the programmer/manufacturer to decide how a certain sequence of bits is to be interpreted. Thus, for any chip, there's an accompanying instruction set, which is just primitive operations such as OR, AND, XOR and the like, directly operating upon the bits/bytes.

Assembly language is just a conventional representation of the instruction set, so we don't have to actually read the binary. A compiler converts source code into machine code understood by the specific chip. This C code will run anywhere a C compiler exists:

Code:
#include <stdio.h>

int main(void) {
    printf("Hello, World!\n");
    return 0;
}

But the executable file generated by the compiler is going to work only on specific chips; the executable compiled for an Intel chip probably won't run on an AMD chip, a PowerPC chip, or an ARM chip, and the same is true for any other chip. This is also partially why a machine can run into problems with endianness or unexpected instructions when it attempts to run programs that were compiled in a different environment.
 
That was a good explanation, Banana, specifically the part where you compare the Java Virtual Machine to the VBA "virtual machine".

I just want to confirm, then: the DLL file is actually binary code that is directly processed by the computer chip? You are getting as close to the metal as possible with a DLL.

>>the executable compiled for an Intel chip probably won't run on an AMD chip,

Why do most DLLs (e.g. Stephen Lebans' mousehook.dll) not specify what chip they are compiled for, then? Is his DLL purely for Intel chips, which is what my machine has? Stephen does provide the source code for the DLL:
http://www.lebans.com/mousewheelonoff.htm
 
Well, I think some of my explanation was inaccurate and oversimplified things.

With a DLL it is very probable that, yes, you're using compiled code and are thus closer to the metal than from VBA, but you need to remember that even with a DLL we're dealing with many, many layers of abstraction. COM is one such layer, .NET another, and these can sit on top of other layers (e.g. the Windows MFC classes).

The C code I showed you above runs anywhere because it uses only the C standard library, and the C standard library is very small. When we want to use Windows functionality or hook into custom hardware, then we need to include those extra libraries, and of course you can't run the DLL without the appropriate includes. Thus a COM-compatible DLL would actually have several includes that may themselves contain even more includes, and that's what binds it to Windows, so it can't run on another OS. To a lesser degree the same is true when trying to port a program from one flavor of UNIX to another (e.g. porting from HP-UX to Solaris may only succeed on the condition that there are no missing includes, or at least that they are added along the way in the process of installation).

As for the chips, I doubt what I'm saying is accurate, as that's pushing the boundary of my knowledge. I know that every chip has its own instruction set, and even chips in the same family may have different instruction sets. However, the point of having a compiler in the first place was to free the developer from the burden of re-writing the program for each possible instruction set. Now that I think about it, I'm actually fuzzy on that point; when you compile a piece of code, it should be able to run on the same operating system on similar hardware. But is that because it depends on the OS to handle the low-level tasks, or because of the includes?

I went and tested something. I compiled a 'hello, world!' in C with gcc in my bash shell, moved it to my Windows VM, and found that I couldn't run it at all. So my earlier assertion was incorrect: even though it had only one include, stdio.h, which is a standard feature of the C library, the compiled program couldn't be moved from my bash shell to Windows, at least not without me re-compiling the same source code inside Windows.

That said, I do think it's safe to say Windows does magic to remove the need to differentiate between Intel and AMD chips (the same goes for Mac OS X with its PowerPC and Intel chips), but if you look carefully, there are a few programs that are chip-specific, usually drivers, or maybe a high-end application such as SQL Server that wants to take advantage of a unique feature of a certain chip family (e.g. Itanium). Knowing that Lebans was quite a brilliant guy, I'm sure he knew that specifying the chips wasn't required and thus didn't do so.

I'll dig this question a bit deeper. Welcome anybody else's corrections as well.
 
When talking about .DLL files and "chip codes", you are already looking at it wrong. A DLL file is compiled to some level or standard of, say, Intel chips, true, but you pick one. Windows' hardware abstraction layer (HAL) then gets involved when you try to do something that the machine doesn't actually do, by providing a TRAP for the instruction that causes an emulation. This is why there are many HALs (one for each machine/chip) but one Windows distribution kit.

When discussing synthetic vs. natural keys, the debate is nearly religious in nature, and therefore it is impossible to speak with authority. On the other hand, you will get many who speak with fervor.

I am a believer that these days, within broad limits, natural keys are OK, but if you have performance issues, synthetic (autonumber) keys are useful because, as was correctly mentioned, they are often smaller. Where you come into problems with synthetic keys is when you try to normalize them. I'm gonna get deep here, so pull on your waders, OK?

[rhetorical]What is a prime key in a normalized table? [/rhetorical]

A PK is a field in a record for which all of these properties apply.

1. No other record in the table has that same value in the key field, i.e. the property of UNIQUENESS. This uniqueness means that a copy of a given record's key can be a foreign key from another table.

2. Every other field in the record depends exactly and only on the PK, i.e. the property of APPROPRIATENESS (or, if you prefer, RELEVANCE; but see the later discussion of this really ugly little property).

3. Unless the nature of the thing represented by the record actually changes, the PK is INVARIANT over time. I.e. records represent things. If the thing doesn't change, its specific PK can't change either.

4. The PK is expressed in a way that is ADEQUATE to identify the single record and no other, no matter how many records exist, because the range of the PK field matches the range of the items being represented.

OK, so how do you decide on a PK? First, look for fields that could plausibly satisfy these criteria. Uniqueness and Adequacy are the primary parts of the choice. Invariance is a preference for things that don't themselves change over time. And you would really like Relevance; this comes from the old philosophical belief that, in some senses, the map is the territory. It is also where arguments start and where synthetic keys rear their ugly heads.

Let's try a couple of candidate key situations.

A. XYZ, Inc. has employees. Each employee has an ID number based essentially on the order in which they were hired. In their employee table in their personnel database, this employee ID number is a candidate key. Let's look at it. UNIQUE? Check, presuming there is a tie-breaker for two employees who start the same day. RELEVANT? Clearly, since it is some property of the employee that is being recorded. INVARIANT over time? Since the order in which you were hired is a function governed by the past, it is as close to invariant as you wish. ADEQUATE? Since we didn't specify the size of the number, we could use a LONG for this and still have room for a LOT of employees. So yes, ADEQUATE. Conclusion? In the employee table, the employee ID is a good choice for PK.

B. XYZ, Inc. sells things based on an inventory stock number. Call it an SKU for lack of something better. The SKU has two parts, a category and an item number within the category. The SKU will be a compound PK, for which the combination's properties must be considered together rather than the properties of the members individually. OK... UNIQUE? As long as you never duplicate item numbers within categories, yes. RELEVANT? OK, because the SKU actually relates to the object. INVARIANT? Short of doing a major overhaul of your representational scheme, probably. ADEQUATE? As long as you allow enough storage for all categories and for the foreseeable number of items, OK. Therefore, SKU is a reasonable choice for PK.

C. Company-wide, XYZ, Inc. can have spot investigations. These investigations can lead to having someone written up for some negative condition. Or a positive one, just to be fair. What can we use as a PK? How about the date? If two investigations can occur on the same day in the same department, the date is not such a good candidate. This is a case where it might make sense to use an autonumber key. OK, let's analyze this choice. UNIQUE? Autonumbers in Access are guaranteed to be unique. INVARIANT? Yes; once assigned, that number won't be re-assigned. ADEQUATE? Autonumbers use LONG fields, so that's over a billion inspections. You tell ME if that is enough. But likely it is. Which leaves us with RELEVANCE. Here we don't have a key that is clearly relevant to the event it represents, and this is where the religious discussion kicks in.

On the one hand, you would like the PK of each table to have some meaning, so that when examining the table it is intuitive what each record represents. It helps you debug and makes it that much quicker to see what something represents.

But on the other hand, if you have to generate a key, it has no more (and no less) meaning than the employee ID. See, the EmpID concept is merely a synthetic key to which relevance was added after the fact. What about the SKU? Same, really. One part has meaning but the other part is merely a glorified autonumber variant. It gains meaning once it is assigned.

On that Inspections table, you might find it possible to build a compound key that has some meaning, but the randomness of the inspection process would merely complicate things. The more components in your compound key, the worse it is to manipulate. At some point you just give in and build a synthetic key, and then never apply meaning afterwards.

Where has this led us? Perhaps to this conclusion (and you are allowed to disagree)... a PK should be "natural" (have meaning) when a property of the represented record makes that PK field meaningful. A PK should be synthetic when there is no relatively simple combination of natural fields that can meet the requirements for key candidacy.

You should always have a PK when you are going to look up something, i.e. when the PK in your table will also be an FK in other tables. Whether that PK will be relevant depends on whether you could identify relevant candidate keys.

So, ... synthetic or natural keys? It depends on whether the data set cooperates.

The argument over PK as natural vs. synthetic will be waged forever (I'll wager). I'm a pragmatist rather than a purist. So I'll use whichever type of key fits. The purists can argue more later.
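
To make case C concrete, here's roughly how that Inspections table might be declared in Access. A sketch only; the table and field names are invented:

Code:
Public Sub CreateInspectionsTable()
    ' Autonumber surrogate PK via Jet DDL (DAO uses AUTOINCREMENT).
    CurrentDb.Execute _
        "CREATE TABLE Inspections (" & _
        "InspectionID AUTOINCREMENT CONSTRAINT pkInspections PRIMARY KEY, " & _
        "InspectionDate DATETIME, " & _
        "Department TEXT(50), " & _
        "Outcome TEXT(255))"
End Sub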
 
A. XYZ, Inc. has employees......

Hi Doc Man,
This is my experience with this sort of key.

Sage gives a customer a unique ID, which is in the form of a string, and most people build it from some of the characters of the customer’s name.

For example, if you were using Sage and you had a customer “Microsoft”, then the unique ID for Microsoft could well be something like MSOFT.

Now suppose that, after several orders have been placed, it is discovered that the person entering the data typed NSOFT by mistake, a common keying error. Records have been generated based on this key, so it cannot easily be changed!

However, if there were a hidden autonumber key, then the visible ID could be changed at any stage. That’s the advantage I get from using an autonumber as a key.
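
In other words, with the autonumber doing the relational work, the fix is a one-line correction to the display key. A sketch with invented table/field names (Customers, CustomerCode); child tables keep pointing at the unchanged autonumber:

Code:
CurrentDb.Execute _
    "UPDATE Customers SET CustomerCode = 'MSOFT' " & _
    "WHERE CustomerCode = 'NSOFT'", dbFailOnError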
 
Doc Man said this, which is sufficient for the discussion of PKs:

A. XYZ, Inc. has employees. Each employee has an ID number based essentially on the order in which they were hired. In their employee table in their personnel database, this employee ID number is a candidate key. Let's look at it. UNIQUE? Check, presuming there is a tie-breaker for two employees who start the same day. RELEVANT? Clearly, since it is some property of the employee that is being recorded. INVARIANT over time? Since the order in which you were hired is a function governed by the past, it is as close to invariant as you wish. ADEQUATE? Since we didn't specify the size of the number, we could use a LONG for this and still have room for a LOT of employees. So yes, ADEQUATE. Conclusion? In the employee table, the employee ID is a good choice for PK.

All of the above, but add another one:
SUFFICIENT? NEVER. See below.

The underlying question to this, though, is how the employeeid (PK) comes about. The company could allocate this sequentially, but note that the number itself may not be continuous, as a lot of payroll systems include a check digit when allocating numbers.

So the company sequence may look like:
123457
123463
123470

where the last digit is a check digit based on a hashing function, and therefore the numbers are unique but not an unbroken sequence.
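
Just to illustrate the idea, here's a toy check-digit function. The weighting is invented for the example; it is not Sage's or any real payroll system's algorithm:

Code:
Public Function AddCheckDigit(ByVal lngBase As Long) As Long
    Dim i As Integer
    Dim lngSum As Long
    Dim strBase As String

    strBase = CStr(lngBase)
    ' Weight each digit by its position and sum the results.
    For i = 1 To Len(strBase)
        lngSum = lngSum + CLng(Mid$(strBase, i, 1)) * i
    Next i
    ' Append (sum mod 10) as the final check digit.
    ' (Keep the base under ~214 million to avoid Long overflow.)
    AddCheckDigit = lngBase * 10 + (lngSum Mod 10)
End Function

For example, ?AddCheckDigit(12345) in the Immediate window returns 123455.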

This is OK, but when you think about it, it is no different from just using an autonumber and letting the computer allocate the number as each record is added.


THE REAL PROBLEM

So the REAL problem is nothing to do with ID numbers per se, but with representing the real-world situation. Why did we decide we had a new employee, such that we needed to set up a new record?

So let's upgrade this to an employee recording system for a multi-national company with 100,000 employees, not a small company with 20 employees.

Now, when you add a name, you need to know that it REALLY DOES need adding, i.e. that you don't already have it, bearing in mind you probably have a payroll dept that doesn't know the employees, and bearing in mind that in the UK there is now a requirement to verify that all employees are legal, etc.

So how do you do this?
Name? 1. You may already have several John Smiths recorded.
Name? 2. You may misspell the name and set up a new record by accident, or it may be spelled wrongly on the setup form.
Name? 3. Then there's the business issue: if you employ a guy, and he leaves, and then comes back, does your system design mean you re-open the old record, or create a new one?
Name/address? It might be the same guy, moved house.
Name/NI no? You may have miskeyed the NI number, or you may have the same NI number recorded against multiple people.

So it's all this sort of issue. You ALSO need a key on the table based on something like surname/name/NI number to ensure uniqueness, and even this doesn't deal with the system that says that if a guy leaves and comes back he needs two separate records, so the real unique key might need to be surname/name/startdate, and even this may not be sufficient.

Hence the numeric PK is fine, and very efficient, for matching with other tables such as detailed timesheets or for identifying which user did what (much easier than repeating the multi-field key), but you also need the real-world underlying key to ensure the integrity of the database. (You also get the benefit that the real-world identifier can be amended if necessary, whereas it is much harder to change the PK.)
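
In table terms, that means the two keys live side by side. A sketch with invented names, shown below:

Code:
Public Sub CreateEmployeesTable()
    ' Autonumber surrogate for joins...
    CurrentDb.Execute _
        "CREATE TABLE Employees (" & _
        "EmployeeID AUTOINCREMENT CONSTRAINT pkEmployees PRIMARY KEY, " & _
        "Surname TEXT(50), Forename TEXT(50), " & _
        "NINumber TEXT(9), StartDate DATETIME)"
    ' ...plus a unique index on the real-world key to ensure integrity.
    CurrentDb.Execute _
        "CREATE UNIQUE INDEX ixNaturalKey " & _
        "ON Employees (Surname, Forename, StartDate)"
End Sub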

But it's this real-world identification problem that makes national databases so difficult to maintain: they have tens of millions of records, which makes it virtually impossible to check things by inspection, and they also need to deal with people dying. So human errors like miskeyed data become really difficult problems to deal with, all of which makes it very hard for a national NI (Social Security) database to confirm an apparently simple thing like how many people are working in a country.

---------
It's just the same with a customer database: how do you manage accounts being set up as follows?

AB Contracting
AB Construction
AB Construction Ltd
AB Construction (2009) Ltd

How does your system confirm that these accounts genuinely represent different customers, and are not just different sales guys being sloppy? A formal system might require that every new account be supported by a customer letterhead, say; then the person setting up the new account can enter the absolute confirmed trading style/name. But in my experience, not all systems are applied with such rigour.
 
