Need suggestions for Data Scrubbing fuzzy logic for mismatch text used in Joins (1 Viewer)

Rx_

Nothing In Moderation
Local time
Today, 12:19
Joined
Oct 22, 2009
Messages
2,803
Any suggestions on how to build a Quality Assurance Query that could identify small changes between three joined fields that don't exactly match?

A large set of database downloads from Oracle across the nation comes in as multiple Excel files. The first one (or two) can be 1 M rows in Excel.
MS Access is used to import and process this data then complete associated process on SQL Server.

The situation requires joins on multiple Text fields, instead of primary number keys.
Until recently, the text was identical. Lately, some of the text is not exactly matching. It might be the word NEW in front of the regular text, a dash, or other small difference.

The process comes down to three text fields joined together that must exactly match. 99.99% of the time they do exactly match.

Trying to envision a Quality Assurance output that could point an end user to extremely similar text that don't quite exactly match.
When they run the accounting, they can identify numbers don't exactly add up in these situations. They need a tool to help then see the difference in the rare instance they are not the same.

Example:
Modem Equipment <---------> New Modem Equipment
Modem Equipment <----------> Modem-Equipment
 

plog

Banishment Pending
Local time
Today, 13:19
Joined
May 11, 2011
Messages
11,638
I think the best you can hope for is to make a function to clean up each field based on every case you can find. Something like this:

Code:
Function make_MatchString(in_String) AS String

ret = in_String
 ' return value, default to what it was passed

if (ret like "* New *") then (remove "New" from ret)
if  (ret like "*-*") then (remove "-" from ret)
' keep adding all the cases you find

make_MatchString = ret

End Function

Then make a query for every table, bring in all the fields and make a calculated one based on the function:

MatchString1: make_MatchString([Field1])
MatchString2: make_MatchString([Field2])
MatchString3: make_MatchString([Field3])

Then use those queries to and JOIN on all your MatchStrings to pull your data together.
 

MarkK

bit cruncher
Local time
Today, 11:19
Joined
Mar 17, 2004
Messages
8,179
You can also join tables on an expression other than equality, so you can do...
Code:
SELECT table1.*, table2.*
FROM table1 INNER JOIN table2 [COLOR="Blue"]ON table1.Field1 LIKE '*' & table2.Field2 & '*'[/COLOR]
WHERE table2.Field2 Is Not Null;
...and in this way implement a kind of fuzzy matching at the SQL level. Probably this would be very slow for big data sets--particularly using strings with the LIKE operator--but it's also a very powerful option to be aware of, because you can fuzzy match numbers, or fit numbers to a range, or using the pythagorean theorem in your match, matching points to min distance from another point, and so on.
 

Frothingslosh

Premier Pale Stale Ale
Local time
Today, 14:19
Joined
Oct 17, 2012
Messages
3,276
That is something that's very good to know....
 

sxschech

Registered User.
Local time
Today, 11:19
Joined
Mar 2, 2010
Messages
792
Are the fields you are matching also the join fields, or are you joining on a different field and doing a comparison on the fields in different tables?

Table1
EquipmentType abcd
EquipmentName Modem

Table2
EquipmentType abcd
EquipmentName New Modem

Join on EquipmentType


Or is it
Table1
EquipmentName Modem

Table2
EquipmentName New Modem

Join on Equipment Name

If the first scenario, you could try simil function
http://www.planet-source-code.com/vb/scripts/ShowCode.asp?txtCodeId=9353&lngWId=1

or with some modification, perhaps can work with a join as in scenario two, but would be more tricky to implement.
 

Users who are viewing this thread

Top Bottom