Matching logic

Here is supposedly an updated version, but in some cases better and some not.
http://www.utteraccess.com/forum/Soundex-2-t2019164.html

From what I read Soundex was meant for single family surnames. It was not meant for first and last. Now looking at the code and results I can see why you get some strange results. I think the space is the reason that exact matches were not matched, but not sure why that is because you should get no value for the space.

Soundex is the name given to a system for coding and indexing family names based on the phonetic spelling of the name. The code consists of the first letter of the family name, followed by 3 digits representing the first three phonetic sounds found in the name. Similar sounding family names have similar Soundex codes. Soundex is used to index individuals for US census and other purposes. Anyone wanting to use US census data to find their ancestors will need to become familiar with Soundex.

However, my idea would be to create a new function where it splits each word and gets the soundex for each word. Then concatenate them together. I will give that a try as well.
 
Responding to post #20 as I didn't see #21 until I posted...

I agree with your comments about Soundex.
If you type in two names, Soundex usually gives a very close match.
In my opinion its often then the best of the 3 methods
But only typing in one name won't work because of how Soundex values are constructed. It needs at least 4 consonants (excluding H,W,Y)

It would be good to get the closest matches as is the case for the other methods though I'm not quite sure at the moment how best to achieve that.
Soundex values are like D415 where D is the first letter.
Perhaps match the letter and get the smallest differences in each of the 3 numbers for search string and field searched. Could be very messy to do?

In case it helps, Albert Kallal did a macro based version for web databases: http://www.kallal.ca/searchw/WebSoundex.htm though I've never used it
 
I tried what I was suggesting and seems to fix the problems for strings with spaces

Code:
Public Function MajP_Soundex(varText As Variant) As String
  Dim rtn As String
  Dim aText() As String
  Dim i As Integer
  If Not IsNull(varText) Then
    aText = Split(Trim(varText), " ")
    For i = 0 To UBound(aText)
      rtn = rtn & Soundex(aText(i))
    Next i
  End If
  MajP_Soundex = rtn
End Function
 
Biggest thing I noticed with Soundex is that it relies on the first leter.
So Kathy, Kathie, Kathey are all K300
And Cathy, Cathie, and Cathey are all C300
So Cathy and Kathy will not match.

Seems to me you could do better pretty easily. I am not a linguist, but I would think you could check the first couple letters and pick the phoenetic sound. So check the first two letters and if it was C, vowel or k, vowel then return (pick one, either c or k). This could get real involved, but I assume there is something more robust. I would think for example you could replace every Chr (chris) with Kr.
 
Soundex is already grouping letters that can sound alike
E.g C, K, X all Soundex= 2.
Vowels and a few consonants such as H are ignored because they can sound very different according to the word ...so are ungroupable.
In other words its already using a phonetic pattern match.
In the middle of a word Chr and Kr both give Soundex 26

I mentioned the first letter in my previous reply.
You could try checking for the same first letter or for those to be in the same group which covers the likes of Cathy / Kathy or Chris/Kris

There should be no need to trim out spaces as anything that isn't in one of the six letter groups is ignored.
However title should be removed as e.g. Mrs Any Name will always start with M62 and probably better to use last name then first name

After further thought, my idea of looking for Soundex values that are as similar as possible but not identical, is probably a non starter
Cathy C300; Candy C530; Candice C532 Kathy/Kate/Katy K300
 
There should be no need to trim out spaces as anything that isn't in one of the six letter groups is ignored.

True if you are looking for single name (which is what designed to do), but also explains why it does not work very well using it against first and last. That is why the wrapper I put around the soundex to build "compound soundex" works way better. Soundex only looks at the first three syllables phoentically. So if the first name is Isabella, you are done. It can find no other differences in the last name. But the wrapper can

Test
Code:
Public Sub SST()
  Debug.Print Soundex("Isabella Smith")
  Debug.Print Soundex("Isabella Jones")
  Debug.Print Soundex("Isabella Joans")
  Debug.Print
  Debug.Print MajP_Soundex("Isabella Smith")
  Debug.Print MajP_Soundex("Isabella Jones")
  Debug.Print MajP_Soundex("Isabella Joans")
  Debug.Print MajP_Soundex("Isabella Johnes")
End Sub

Code:
I214 ' All Isabella ... returns the same thing
I214
I214

I214S530 'The compounds
I214J520  
I214J520
I214J520
 
Is it easy to show the % to the right of the "name"?

BdQI7F4.jpg


I'm afraid you lost me.
 
Yes I agree that in some cases it will help but in other cases the opposite is true
In fact Soundex takes the first letter and the next 3 valid characters (rather than syllables)
So searching for Sue or Neal doesn't give enough letters to make a useful result. In such cases using a second name will give better matches.
But as you say where you have a name like Isabella you have all Soundex needs and more … so the second name is ignored. You will get more matches including a wide variety of surnames. Including the second name means better targeting and fewer matches.
Overall I think if the character match is fairly close, using Soundex or your adapted version is going to be best. If your match is rather woolly, then Levenshtein (sp?) works better.

BTW all methods are pretty useless if titles are included e.g try Missy Elliott.

Here's another thought for Soundex (untested)
Use both names as you have done but count the number of matched values up to a maximum of 8. Then list in descending order of matched counts.
This would give a list as for the other two methods but might be very slow?

Finally, I agree with Frothy. When you've finished 'tweaking' Soundex, please post your example to the code repository or sample databases
 
I added a Find As You type listbox, and I think that gives the best utility. If you think you know part of a first name or part of the last name you can type in and narrow your search. If you do not find anything you can go back a letter and try something else. I will post the update later.
 
Here is an update. I will probably upload one more with the search field being a single word like "last name". Soundex was really designed for doing that so the comparison is not quite "fair". I have now added a separate form to do a Find as you type. I think this is probably has the most utility for a user to look for things and try different combinations, but it would be a good tool to have these different options for searching.
 

Attachments

Thanks for the update. Had a quick look. Good idea to include a FAYT option
 
here's a sample of a search I use which uses an exact match, a soundex and levenschtein distance. You can search by first, last, or both names. I have an updated version somewhere but not here in my office.
 

Attachments

here's a sample of a search I use which uses an exact match, a soundex and levenschtein distance. You can search by first, last, or both names. I have an updated version somewhere but not here in my office.

This keeps digging deeper and deeper into the minds of some of the smartest and most sharing access minds ! :-)
 
I have to ask, what is the MSGBOX of strOut that shows a set of numbers, for?
 
Its the ID values of the matched names! If it finds no matches as in my first attempt, the msgbox returns a 0 (which did confuse me!)
Perhaps it would be better if it said something like '3 possible matches found'?
 
I have to ask, what is the MSGBOX of strOut that shows a set of numbers, for?

Should have had that commented out. It's in Public Function fSearchNames down at the bottom of the procedure. It was just a msgbox to show what rec's were returned when I was testing it. Sorry it was a copy I had in my office and I didnt look before posting. Probably some other leftovers in there somewhere too.
 
Moke:
As for MajP's example, it would be good to post this (or better still your updated version) in either the repository or in sample databases
 
How hard would this be.

Change to a BATCH matching process.

The records in the downloaded Table from the Suppliers would contain (among many other fields):

The Customer of ours which was relayed to the clerk and keyed in by the clerk at the time of purchase

A blank field (bestmatch) to be BATCH filled in by just the #1 Levenshtein Match

A blank field (Bestmatch%) to be BATCH filled in by the TOP Levenshtein match %
 

Users who are viewing this thread

Back
Top Bottom