Matching logic

dcavaiani

Registered User.
Local time
Today, 04:01
Joined
May 26, 2014
Messages
385
Just wonder what ur thoughts or ideas might be on this? When a custom carpentry business worker goes to say Home Depot to purchase materials for customer job keyed in the accdb by the name “stanchina”. There is no prior p.o. Cut here! At checkout, the worker tells the clerk that the parts purchased r for “standchin”. When the download from Home Depot is applied to the database, how can we best get the incorrect workers spelling of the customer name to “match up” to the actual customer name ??
 
Why and what exactly does Home Depot have to have an exact spelling of someone's customer name (your Customer is not necessarily HD's customer). Home Depot would need the Sales/Invoice/receipt info re TransactionID,TransactionDateTime, Credit/Debit info(possibly).
It would seem practical for you or your company to record the HD Invoice details with the "part/material" used in your custom carpentry product/service for “stanchina”.
Perhaps I've missed your point, but it seems the HD customer (you) and your customer (“stanchina”) are separate entities.
 
Why and what exactly does Home Depot have to have an exact spelling of someone's customer name (your Customer is not necessarily HD's customer). Home Depot would need the Sales/Invoice/receipt info re TransactionID,TransactionDateTime, Credit/Debit info(possibly).
It would seem practical for you or your company to record the HD Invoice details with the "part/material" used in your custom carpentry product/service for “stanchina”.
Perhaps I've missed your point, but it seems the HD customer (you) and your customer (“stanchina”) are separate entities.
Home Depot supplies the contractor with a weekly download of all the specifics of everything purchased, including the customer reference that the worker relates to the clerk and which is reported back to contractor in the download.
 
First, agree with jdraw. Second, I need to point out that based on your description of what is going on, there is a design issue here that needs to be at least considered.

Your customer is "stanchina" but your supplier is "Home Depot" and they should be in different tables. Your design is ignoring an important aspect of database layout rules. Question: What do you do when Home Depot doesn't have what you need for your second trip to them, but Lowe's does? And then, what do you do when you needed something quick and found what you wanted at Lumber Liquidators on your third trip? And what do you do when you went to HD because you had TWO jobs under way and bought something for each one? So you have your "stanchina" materials and your "joe blow" materials on one ticket.

Let's be clear - you can make Access do pretty much anything you want it to do. Don't let us stop you from being a bull in a china shop. However, Access is designed as a tool that follows certain database standards and guidelines that were meant to make things easier in the long run even if trickier during initial layout. I.e. pay me now or pay me later. What you are describing in words is not consistent with the optimized design with which Access would work best.

Finally, most HD or Lowe's receipts have THEIR transaction number on the printed paper receipt. No reason you couldn't use a compound key like "Vendor Name" + "receipt number" as the prime key for your job invoice table (assuming you had one).

None of us want to rain on your parade, but if you are in the process of designing this database, it is at a critical stage where mistakes multiply to cause ever-growing headaches as you go.

And finally, the DIRECT answer to your question just so you don't feel TOTALLY heaped upon... read this article on the LIKE operator as a means of comparison:

https://support.office.com/en-gb/article/like-operator-b2f7ef03-9085-4ffb-9829-eef18358e931

LIKE is how you manage inexact matches that are inexact through truncation. Note that if HD screws it up so bad that they turn "stanchina" into "stanton" because of a thumb-fingered idiot typist, you might have a bigger problem that the LIKE operator can't so easily address.
 
I will just assume your design is correct and what you are asking is how to find the most similar string in another field "best match"

You need to Google "Fuzzy Find". There is a lot posted and code available to do this.
One method is Soundex which allows you to find similar sounding strings "Mother" and "Other".
http://allenbrowne.com/vba-Soundex.html
The other methods compare all possible substrings and may give the longest substring or a percentage of match. Levenstein Distance
http://www.utteraccess.com/forum/index.php?showtopic=2015762

I have never done this, but if I get time I am going to try this implementation.

Excel even provides an Add-in to do this
https://www.microsoft.com/en-us/download/details.aspx?id=15011

There are DBs that have this as a built in function
 
@MajP

Looks good. Awaiting your tests. Could be a great sample database.

@dcavaiani

I agree with the suggestion by the_Doc_Man. You need an identifier in your environment that can handle interactions with an external environment. That being the Home Depot or Lowes or 84Lumber etc that supply you a transactionID and TransactionDate on an invoice that you must match in your local environment for matching and processing.
You can see that Doc and I are not contractors with HD "weekly download" experience. It appears that you are at the mercy of the typing and spelling skills of individual clerks for the "customer name". Hopefully you have the parts/materials identified in "your customer/project/request" and can "match" or enter into your local system.
 
Last edited:
Just did one quick test on the function
Code:
' Levenshtein3 tweaked for UTLIMATE speed and CORRECT results
' Solution based on Longs
' Intermediate arrays holding Asc()make difference
' even Fixed length Arrays have impact on speed (small indeed)
' Levenshtein version 3 will return correct percentage
'
Function Levenshtein(ByVal string1 As String, ByVal string2 As String) As Long

Dim i As Long, j As Long, string1_length As Long, string2_length As Long
Dim distance(0 To 60, 0 To 50) As Long, smStr1(1 To 60) As Long, smStr2(1 To 50) As Long
Dim min1 As Long, min2 As Long, min3 As Long, minmin As Long, MaxL As Long

string1_length = Len(string1):  string2_length = Len(string2)

distance(0, 0) = 0
For i = 1 To string1_length:    distance(i, 0) = i: smStr1(i) = Asc(LCase(Mid$(string1, i, 1))): Next
For j = 1 To string2_length:    distance(0, j) = j: smStr2(j) = Asc(LCase(Mid$(string2, j, 1))): Next
For i = 1 To string1_length
    For j = 1 To string2_length
        If smStr1(i) = smStr2(j) Then
            distance(i, j) = distance(i - 1, j - 1)
        Else
            min1 = distance(i - 1, j) + 1
            min2 = distance(i, j - 1) + 1
            min3 = distance(i - 1, j - 1) + 1
            If min2 < min1 Then
                If min2 < min3 Then minmin = min2 Else minmin = min3
            Else
                If min1 < min3 Then minmin = min1 Else minmin = min3
            End If
            distance(i, j) = minmin
        End If
    Next
Next

' Levenshtein3 will properly return a percent match (100%=exact) based on similarities and Lengths etc...
MaxL = string1_length: If string2_length > MaxL Then MaxL = string2_length
Levenshtein = 100 - CLng((distance(string1_length, string2_length) * 100) / MaxL)

End Function

I made a table with 1000 last names. Put "stanchina" in the table and searched for "standchin"

Used this query
Code:
SELECT tblFuzzyData.LastName, Levenshtein("standchin",[lastName]) AS Lev_Distance
FROM tblFuzzyData
ORDER BY Levenshtein("standchin",[lastName]) DESC , tblFuzzyData.LastName;

Top results

Code:
LastName	Lev_Distance
stanchina	78
Shanahan	56
Stanton	56
Stanton	56
Stanton	56
Franecki	44
Franecki	44
Streich	44
Streich	44
Strosin	44
Anderson	33
Bauch	33
Bauch	33
Blanda	33
Blanda	33
Blanda	33
Cassin	33
Cassin	33

Seems like a pretty viable start to me.
 
Code:
SELECT tblFuzzyData.LastName, Soundex([lastName]) AS Lev_Distance
FROM tblFuzzyData
WHERE (((Soundex([lastName]))=soundex("standchin")))
ORDER BY Soundex([lastName]) DESC , tblFuzzyData.LastName;

I tried soundex as well. For this example it returned "Stanton". But commons mispellings/differences it works well.

Type in Braun I get
LastName
Braun
Brown
Bruen
 
I will just assume your design is correct and what you are asking is how to find the most similar string in another field "best match"

You need to Google "Fuzzy Find". There is a lot posted and code available to do this.
One method is Soundex which allows you to find similar sounding strings "Mother" and "Other".
http://allenbrowne.com/vba-Soundex.html
The other methods compare all possible substrings and may give the longest substring or a percentage of match. Levenstein Distance
http://www.utteraccess.com/forum/index.php?showtopic=2015762

I have never done this, but if I get time I am going to try this implementation.

Excel even provides an Add-in to do this
https://www.microsoft.com/en-us/download/details.aspx?id=15011

There are DBs that have this as a built in function
The database design is fine. This response is exactly what I was hoping to learn about.
 
OK, dumb question time. Maybe you don't have a PO# but what about a customer number? Surely tradesman could get that right at HD? AFAIK, either HD or Lowes can match job list items against any reference, be that name or number, regardless if on the same receipt because they have itemized project lists to go along with the receipt.
 
OK, dumb question time. Maybe you don't have a PO# but what about a customer number? Surely tradesman could get that right at HD? AFAIK, either HD or Lowes can match job list items against any reference, be that name or number, regardless if on the same receipt because they have itemized project lists to go along with the receipt.

You have the issue correctly identified. Yes, we have a Customer Table keyed by a Customer Last Name. We Have a Supplier table key by Supplier Name: Home Depot, Menards, etc. We will get an Excel download from our Suppliers with the "reference" and/or the P.O. "number" (NOTED FOR THAT CUSTOMER) that our carpenter worker 'relates to' the Check-Out Clerk when Purchases are made ... The customer NOTED by this process, when the download comes back to us, MAY or MAY NOT exactly MATCH to that "name" in our Customer Table ...

Yes, and as mentioned by another, this "first section" batch of purchases could be for customer "Jones" -and then the rest of the batch could be for customer "Smith".
 
Here is an intersting demo employing the three functions: SoundEx, Leveshtein, Simil.

On the left is a list of 2000 full names. Find a name in the list and misspell it. The soundex works will with common names with multiple spellings like
Aiden: Aaden, Adan, Aden, Aidan, Aidyn, Aydan, Ayden, Aydin
Or Katy, Katey, Katie, etc.
Anderson, Andersen.
They all seem to do well, but depending on the mispelling one approach may do better, and get you close enough to find a match. Levenshtein seems to work better than Simil
 

Attachments

This test database u attached is a very cool and sharing way to empower others. Thanks much!
 
Agree that Levenshtein seems best, then Simil, then soundex on some tests.
Thanks MajP for putting these tests side by side.
If you do something with the Excel Add-on, please post any results.
 
Agree that Levenshtein seems best, then Simil, then soundex on some tests.
Thanks MajP for putting these tests side by side.
If you do something with the Excel Add-on, please post any results.

I tried a bit of the Excel. My first impression is that was a bit “rigid”. I will be spending more time on the test we have been given to try.
 
Thanks for providing this example comparing the three approaches.
I'm finding Soundex gives far fewer hits though what it does return is usually well targeted. However, Soundex doesn't return any results in many instances - not even where I type an exact match e.g. Neal, Adele, Daniel.

For me, Levehnstein seems to give closer matches than Simil
But why is soundex so poor?
 
Colin,

I noticed the same thing --Soundex quite often gives no response. I also note that the Levenshtein and Simil responses often have a quite different order. eg High rank in Levenshtein may be 5th ranked in Simil sort of thing. My guess it that it may be dependent on the language involved -that is German names or Asian names... may have totally different ordering. I know I have heard that comment with respect to Soundex.
 
MajP, you absolutely should put that FuzzyFind database from post 13 into either the code repository or the sample database subforum.
 
The problem with Soundex could be with my query, I need to look some more. The other queries take the top 20 by the Levenshtein or Simil so you will always get a return. In the soundex, I set it up where only the two strings have the same soundex. But that also was interesting to me because it would return strings that clearly should not sound the same. So need to actually look at some soundex results for these anomalous returns. Also I pulled the first version from the internet which was Allen Brownes. There may be a more robust version out there.

I would also add to this a Find as you type listbox. Which may be one of the simplest approaches. You could start typing and as you get closer the list would get smaller and smaller. In the original example with "standchin" and "stanchina", by the time you got to "stan" you would have a pretty small list.

For me this is one of those things that can never be fully automated, but if you provide the user a nice interface they can search and find quickly. I would probably allow the user to search all these ways they should quickly be able to identify the correct record.
 

Users who are viewing this thread

Back
Top Bottom