2 Part Question Verifying DB Integrity

sumdumgai

Registered User.
Local time
Today, 09:11
Joined
Jul 19, 2007
Messages
453
Hello all. We have a huge personnel database that needs to be checked for integrity and cleaned up. One table lists all employees, including Emp-ID, Emp-Acct, FName, LName, MName, Addr, City, St, Zip. First question is how can I make sure there are no duplicate Emp-ID's in the table? Second question is how can I make sure no person has more than one Emp-ID? Here are some parameters:
1) I would expect that no one is using an alias so there is only one LName per person.
2) FName, however, can vary for any person (e.g., Thomas, Tom), so a person could have two or more Emp-ID's, each under a different FName.
3) There could be more than one person with same FName and LName in any zip code, but probably not on same street.

Complicated but hope to get some help with this. Thanks.
 
How exactly do you define Emp-Id? Where does it get valued?
What is the primary key of the "Employee" table?

This should show records with same Emp-Id where there are duplicates ---untested---
Code:
Select [Emp-Id], Count([Emp-Id]) from EmployeesTableNameHERE
Group BY [Emp-Id]
HAVING COUNT([Emp-Id] >1

To help review of Thomas, Tom...etc
---untested---
Code:
select fname, mname, lname,[emp-id] from YourTableName a
where [Emp-Id] in
(select [Emp-Id from
  (Select [Emp-id], Count([Emp-id]) from YourTableName
   Group BY [Emp-Id]
   having count(Emp_id) >1))
order by fname, lname
 
Last edited:
Thanks for your help.
Emp-ID is a 10 character text field, assigned to new employees as they come on board. They are not reused. Primary key is system generated 'ID'.

If there are no duplicate Emp-ID's in database, wouldn't your query always return nothing? I'm trying to find all cases where a person might have more than one Emp-ID because they have multiple variations of first name or middle name.

Thanks again.
 
Getting 'Data type mismatch on criteria' with first query.
 
Second question is how can I make sure no person has more than one Emp-ID?

At some point this is going to be a manual process. You can use query to help identify potential dupes, but ultimately its going to require human eyes. Mainly because matching names/addresses requires things you just can't reasonably program for (Robert==Bob, Richard==Dick, a person can have a PO Box and a street address, data gets input incorrectly, etc.)

When I compare names/addresses I first create a query to define matching elements. These will be fields that I will compare between records to see if they are possible matches. Essentially I bring in all the fields from my table and then add a calculated field for every field which truncates it to 4 characters (I exclude the state and only go 3 characters on the zip). That query's data would look like this:


EmpID, FName, LName, Address, City, State, Zip, F4, L4, A4, C4, Z3
15, Steve, Walton, 12 Main, Omaha, NE, 68022, Stev, Walt, 12 M, Omah, 680
44, David, Jenkins, PO Box 444, Denver, CO, 80211, Davi, Jenk, PO B, Denv, 802

Then, I make a new query, bring in 2 instances of the above query, link them via the _4 fields, set the criteria so that the EmpID in one instance doesn't match the other, bring down the fields and run it. that brings up all the potential duplicates.

You should work on a copy of your actual table, so when you find an actual dupe, you delete 1 of the duplicate records (the query will bring back 2 records for every dupe) from the table. Once you work your way through all the records in that query, loosen the criteria. Eliminate the link between Z3 and A4, run it again and see if any records are returned. Continue to loosen/change the links between the two instances until satisfied all the dupes are eliminated.
 
Let's be clear.
A -You may have duplicate Emp-Id.
B -An Employee may have more than 1 Emp-Id. That is different than A.
C -An Employee may have different fname --suggests the question--Why would there be more than 1 record for an Employee. Where would these multiple records enter the system?
D - same as C but for mName.

The code i provided was to find duplicate Emp-Id in answer to your
how can I make sure there are no duplicate Emp-ID's in the table
?

If there are no duplicate Emp-Id values, there will be no records returned.

What exactly is the question you are trying to resolve?
 
Thanks again. To answer your questions:
a) Already verified there are no duplicate Emp-ID's. That's what we want and that's good.
b) We do not want any employee to have more than one Emp-ID.
c) Sloppy data enter or not verifying that employee already had an ID under a different first or middle name.
What query would I use to count records grouped by LName, first character of FName, first character of MName, street and zip? If any count was more than one, wouldn't that give me what I need?
 
Created the query I suggested above and it solves my problem. There are, in fact, employees with multiple ID's, even with identical first name, address, zip, etc. Thanks for your help and suggestions.
 

Users who are viewing this thread

Back
Top Bottom