Join on substring (1 Viewer)

sxschech

Registered User.
Local time
Today, 16:30
Joined
Mar 2, 2010
Messages
791
Trying to join two tables. The "common field" is not an exact match. I used the following and thought it was working, but discovered some that did not get picked up. Generally the pattern appears to be that the fields match from beginning up to a certain point, but there are some that the order of some parts are in different positions or omitted.

Code:
SELECT DISTINCT qrySOURCE_A.DocNo, qrySOURCE_A.DocumentNo, qrySOURCE_B.Author, qrySOURCE_A.DocumentTitle, 
qrySOURCE_B.Title, qrySOURCE_A.DocumentStatus, qrySOURCE_A.DateModified, qrySOURCE_B.letterno
FROM qrySOURCE_A, qrySOURCE_B
WHERE (InStr([qrySOURCE_A].[DocumentTitle],[qrySOURCE_B].[Title])>0 
AND qrySOURCE_B.SentTo="ABC") 
OR (qrySOURCE_A.DocumentTitle Like [qrySOURCE_B].[Title] & "*" 
AND qrySOURCE_B.SentTo="ABC")
ORDER BY qrySOURCE_A.DocNo;

This matches:
SourceA.DocumentTitle:
Terms of Contract Form

SourceB.Title
2018.10.17-019-XYZ-AANN-XYZ-000402-Terms of Contract Form-Rev0

This didn't match
SourceA.DocumentTitle:
2018.10.15-019-XYZ-000100-AANN-ABC-This is the Title

SourceB.Title:
2018.10.15-019-XYZ-000100-This is the Title-Rev1

I can't rely on the fields having dots and/or dashes in order to use split delimiter as I noticed sometimes they simply used a space or not such as 2018 10 15 or maybe 20181015 or the year portion may not be in the title for one of the tables. Seems that the phrase part is consistent "This is the Title", not sure how to match on it since there is stuff before and after in some cases.

I did another query where I hard coded a left(field,25) and that produced a match, obviously I don't want to hard code as there is no guarantee that those positions apply universally.
 

plog

Banishment Pending
Local time
Today, 18:30
Joined
May 11, 2011
Messages
11,613
You didn't really ask a question, which is great because there's no real good answer for you. You're trying to do some fuzzy matching and it is fraught with issues.

The best you can do with a computer is create matching tests and have it apply them hoping the tests catch your matches. My suggestion is to create a matching function, pass it 2 strings then let it go through a bunch of tests to see if its a match. Obviously the first test is if they are exactly the same--if so, quit testing and return true. Next you could see if they are exactly the same after you strip out all unneeded characters (e.g. dashes, dots, etc.)--if true stop testing and return true.

Just keep coming up with tests and adding them to your code. Then, set up a query and see what the results are. IF you can spot a match that wasn't set to true, create a new test that will catch that particular match. Keep doing this until you are satisfied with the function.
 

sxschech

Registered User.
Local time
Today, 16:30
Joined
Mar 2, 2010
Messages
791
Looks like I forgot the
Code:
?
symbol. Thanks for your suggestion, to look into removing dashes and dots etc and see how that goes. I think there must be something off with that particular pair of titles, since other partial matches are coming though fine. I pasted into text editor and didn't see any funny chars, so that theory is out. Odd since the "Terms of Contract" title in the sample in my original post works but the "This is the Title" one doesn't.
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 18:30
Joined
Feb 28, 2001
Messages
27,001
Good luck. You are trying to simulate the matching pattern of the human brain which is hard-wired to ASSOCIATIVELY match things almost as a "gestalt" (i.e. parallel, not linear) match. The problem is that doing this linearly via code will potentially take you longer than it would take to match by eye and hand.

Eventually you might be able to do this. But the code will be horrendously complex and will NEVER give you full satisfaction unless you can pre-condition the data set.

plog's suggestion of a function that takes input from two strings and returns TRUE or FALSE? Probably the best way to do this, since if you have a public function, you can call it from the Access query environment. But the problem is going to be building the function to your satisfaction. You WILL NOT write a simple query to do the matching unless it can call a function. The reason is that even queries have limits on the size of the query in SQL. But by writing that function, you put the complex text operations in a module, which has greater limits and greater flexibility.

My comment about pre-conditioning the inputs is that if there are tests you can make ahead of time to make the desired segments available for matching, you could speed up the process. I.e. if there is a way to recognize the title, split it out into another field ahead of time. If there is a way to recognize the serial number or other ID number ahead of time, split THAT out into another field. I.e. divide and conquer the matching process by identifying the matchable components and THEN do the comparisons.

That level of fuzzy logic is doomed unless you can find a way to simplify it. And Access might not be the language or environment that would be best for this anyway. Offhand, I'm not sure what WOULD work best. If you are going to get a handle on this, you must find a way to break up the parts of the problem in a manageable way because otherwise you will go nowhere and eat up a lot of time and effort in going nowhere.

I am telling you this because you need to not treat this as a "damn the torpedoes, full speed ahead" problem. Step back and ANALYZE your inputs very carefully. This will take a LOT of up-front analysis to try to break it apart into usable fragments. I do not see this problem as being solvable by brute force. So my best advice is divide and conquer. If you can't solve the problem, don't change the computer. Change the problem.

Don't take this as me telling you it can't be done. But DO take it as me telling you that there is extreme difficulty hiding under the fairly simple request you made, and thus you need to treat it as a non-trivial problem.
 

NauticalGent

Ignore List Poster Boy
Local time
Today, 19:30
Joined
Apr 27, 2015
Messages
6,286
I see Regular Expressions in your future. The problem is although I KNOW they will give you what you are looking for, I don’t know HOW to do it in your case.

I have a free book that takes you from novice level to Power Ranger. I consult it only long enough to get what I am looking for and then I put it away.

I tried to load it for you but it is too large for the forums 100kb limit. I can try to pm it to you or email it if you like.

Additionally, there is a tester on the Code Respository Forum if you want to test your code.

Let me know...
 

sxschech

Registered User.
Local time
Today, 16:30
Joined
Mar 2, 2010
Messages
791
Based on the posts, I have decided for now to do as follows:
1. Run the current query append to table
2. Run a query to find gaps in numeric sequence (although there may be legitimate gaps, this should help identify new ones)
3. Review gap list if any
4. Search the source table containing the number based on gap list query results
5. Use a visual search of the second source table to see if can find a match using The_Doc_Man's "human brain" approach:
6. If find a match based on my visual observation/interpretation, manually copy and paste the data into the joined table.


If I get a pattern or too many gaps, will rethink, but so far appears to have only been the one record that didn't match up using my existing logic.

Found the Gap query here:
http://www.utteraccess.com/forum/sql-find-gaps-number-s-t465981.html
 

arnelgp

..forever waiting... waiting for jellybean!
Local time
Tomorrow, 07:30
Joined
May 7, 2009
Messages
19,169
maybe create a function to strip the dot(.), space, and other special char before making comparison. and make the comparison both ways.
Code:
Public Function fncCompare(ByVal p1 As String, p2 As String) As Boolean
    Dim ignoreChar As String
    Dim i As Integer
    Dim v As Variant
    Dim bolRet As Boolean
    ignoreChar = ".'; " & Chr(34) & ""
    p1 = Trim(p1 & "")
    p2 = Trim(p2 & "")
    If p1 = "" Or p2 = "" Then Exit Function
    ' remove special chars
    For i = 1 To Len(ignoreChar)
        p1 = Replace(p1, Mid(ignoreChar, i, 1), "")
    Next
    For i = 1 To Len(ignoreChar)
        p2 = Replace(p2, Mid(ignoreChar, i, 1), "")
    Next
    ' compare both ways
    bolRet = (InStr(1, p1, p2, vbTextCompare) > 0 Or InStr(1, p2, p1, vbTextCompare) > 0)
    If bolRet Then
        fncCompare = bolRet
        Exit Function
    End If
    
    v = Split(p1, "-")
    If UBound(v) > 0 Then
        For i = UBound(v) To 0 Step -1
            If IsNumeric(v(i)) Then Exit For
        Next i
        ReDim Preserve v(i)
        p1 = Join(v, "-")
    End If
    v = Split(p2, "-")
    If UBound(v) > 0 Then
        For i = UBound(v) To 0 Step -1
            If IsNumeric(v(i)) Then Exit For
        Next i
        ReDim Preserve v(i)
        p2 = Join(v, "-")
    End If
    ' compare both ways
    fncCompare = (InStr(1, p1, p2, vbTextCompare) > 0 Or InStr(1, p2, p1, vbTextCompare) > 0)
    
End Function

youre query:

SELECT DISTINCT qrySOURCE_A.DocNo, qrySOURCE_A.DocumentNo, qrySOURCE_B.Author, qrySOURCE_A.DocumentTitle,
qrySOURCE_B.Title, qrySOURCE_A.DocumentStatus, qrySOURCE_A.DateModified, qrySOURCE_B.letterno
FROM qrySOURCE_A, qrySOURCE_B
WHERE (fncCompare([qrySOURCE_A].[DocumentTitle],[qrySOURCE_B].[Title])
AND qrySOURCE_B.SentTo="ABC")
ORDER BY qrySOURCE_A.DocNo;
 
Last edited:

sxschech

Registered User.
Local time
Today, 16:30
Joined
Mar 2, 2010
Messages
791
Hi Arnel,

Been trying to run your code and getting an error subscript out of range in the v2 for next loop

ReDim Preserve v(i)

i has value of -1

Phrase is like:
Sample System As-Built and Organized Changes
 

sxschech

Registered User.
Local time
Today, 16:30
Joined
Mar 2, 2010
Messages
791
I got Arnel's code to work. At first I thought that Preserve was a variable, but then noticed that it wasn't listed at the top of the code. Then did a search and found a post indicating that can only have one ReDim Preserve. I commented out the first and code still errored, next I uncommented out the first and commented the second one and the code ran. So far seems to be working, still having issue when wording is same but with extra text, but don't think the code could really solve that one. I'll play around a bit more and then mark it solved.

https://forums.windowssecrets.com/s...Redim-(out-of-subscript-error)-(Excel-2K-(VBA))

Example:
Title
  • XYZ-000111-AANN-ABC-Switch Setting Panel Follow Up-Rev0
  • Switch Setting Panel

Both got mapped to
DocumentTitle
  • 2018.09.20-011-XYZ-000111-AANN-ABC-Switch Setting Panel Follow Up-Rev0

So now need to look out for multirecord rows
 

Users who are viewing this thread

Top Bottom