Pulling data out of many historical documents. Put together this regex pattern and has been working fine. Unfortunately, the latest batch has a slightly different format. I modified the pattern, however, would like to know if it could be refined to maintain the original output format of previous documents. Otherwise, will I need to create a separate loop or function to remove/replace unwanted characters? Can Regex convert to uppercase or do I need to wrap the result in a function?
Originally was MP space number or CP space number. (C|M)?P\s\d+\.?\d*
MP 29.3
Now is M dot P dot space number and variations such as: (C|M)\.?P\.?\s\d+\.?\d*
M.p 29.3
m.P. 29.3
mp. 29.3
Sample Input. Actual text is read in from document so function has a variable instead of being hard coded.
Origianl Pattern:
New Pattern:
Current Output:
mp 8/C.P. 25.2/Cp. 4.5/c.p. 88.92/m.p 9
Desired output:
MP 8/CP 25.2/CP 4.5/CP 88.92/MP 9
Originally was MP space number or CP space number. (C|M)?P\s\d+\.?\d*
MP 29.3
Now is M dot P dot space number and variations such as: (C|M)\.?P\.?\s\d+\.?\d*
M.p 29.3
m.P. 29.3
mp. 29.3
Sample Input. Actual text is read in from document so function has a variable instead of being hard coded.
Origianl Pattern:
? regexallmatches("This is a sample of MP 8 CP 25.2 and extracting CP 4.5 in order to capture this too MP 88.92 afterwards ending with MP 9 during Fri.","(C|M)?P\s\d+\.?\d*", "/")
New Pattern:
? regexallmatches("This is a sample of mp 8 C.P. 25.2 and extracting Cp. 4.5 in order to capture this too c.p. 88.92 afterwards ending with m.p 9 during Fri.","(C|M)\.?P\.?\s\d+\.?\d*", "/")
Current Output:
mp 8/C.P. 25.2/Cp. 4.5/c.p. 88.92/m.p 9
Desired output:
MP 8/CP 25.2/CP 4.5/CP 88.92/MP 9
Code:
Function RegexAllMatches(txt As String, Optional pttrn As String, Optional sepchar As String = ",") As String
'Match all instances of the pattern
'https://stackoverflow.com/questions/44979363/excel-vba-regex-function-that-returns-multiple-matches-into-a-single-cell
'adjusted to similar syntax of existing regex functions
'20240314
Dim regex As Object
Dim myMatch As Object
Dim s As String
Dim arrayMatches()
Dim i As Long
Set regex = CreateObject("vbscript.regexp")
On Error GoTo notFound
With regex
.IgnoreCase = True
.MultiLine = True
.Global = True
If Len(pttrn) = 0 Then
.Pattern = "(Code)[\s][\d]{2,4}"
Else
.Pattern = pttrn
End If
If .Test(txt) Then
For Each myMatch In .Execute(txt)
ReDim Preserve arrayMatches(i)
arrayMatches(i) = myMatch.Value
i = i + 1
's = s & " " & myMatch
Next
End If
End With
RegexAllMatches = Join(arrayMatches, sepchar) '" ")
If Len(RegexAllMatches) = 0 Then
RegexAllMatches = "Not Found"
End If
Exit Function
notFound:
RegexAllMatches = "Not Found"
End Function