Find duplicates based in a few characters (1 Viewer)

mkelly

Registered User.
Local time
Today, 00:11
Joined
Apr 10, 2002
Messages
213
I have two tables I deleted the duplicates from each then did a union query based on name.
However the two tables named things different like vermont college vs vermont college inc.
Can I do a find duplicates based on sat the first 5 characters?
If so how.

Thanks in advance
 
Local time
Today, 00:11
Joined
Mar 4, 2008
Messages
3,856
This is a difficult problem. There are a couple of tools that can help but there are few perfect solutions.

I personally use a Jaro-Winkler algorithm for exactly your problem. It doesn't give a black/white answer...more shades of grey. It takes a human to evaluate the answer but it definitely culls the results down to a managable level.

You can also look at Soundex(). Same problem, you won't get an exact answer and will have a lot of mis-matches.

You can also compare using left(), which is in the spirit of your proposed solution. It can still give you an inaccurate reading, however.
 

mkelly

Registered User.
Local time
Today, 00:11
Joined
Apr 10, 2002
Messages
213
Where would I use the left() command??

SELECT First([union duplicates].name) AS [name Field], Count([union duplicates].name) AS NumberOfDups
FROM [union duplicates]
GROUP BY [union duplicates].name
HAVING (((Count([union duplicates].name))>1));

Thanks
 
Local time
Today, 00:11
Joined
Mar 4, 2008
Messages
3,856
You're not going to be able to do it that way. How would SQL know which of the partial matches to use as [union duplicates].name?
 

mkelly

Registered User.
Local time
Today, 00:11
Joined
Apr 10, 2002
Messages
213
If I don't have Jaro or soundex is there any way to do it?
 
Local time
Today, 00:11
Joined
Mar 4, 2008
Messages
3,856
I can provide you with both (just PM me with your personal e-mail address). You'll have to use a different approach, though. They don't work with aggregate functions. Same with left(). This is an insanely difficult problem to overcome using a single standard query.
 

Users who are viewing this thread

Top Bottom