Replacing Multiple Spaces from the Middle of a String (1 Viewer)

ebs17

Well-known member
Local time
Today, 08:10
Joined
Feb 7, 2020
Messages
1,946
Regarding the speed test: I wouldn't call 16 milliseconds more time over 10,000 attempts a speed catastrophe. Who feels that without measuring?

In a database environment, I would be more interested in whether I have to deal with such data permanently and in bulk, or whether such data should be cleaned up and then stored in a structured manner. Considering individual solutions to individual tasks related to speed is only part of a relevant consideration.

I like to use RegEx. But if I can really use them in a database, I did something wrong before. String processing of longer texts and processing of atomic informations are very different things.
 
Last edited:

KitaYama

Well-known member
Local time
Today, 15:10
Joined
Jan 6, 2022
Messages
1,541
So much complexity suggested to do something so simple! No loops required.
I'm not as good as most of you, but to me, several lines of vba, a loop or a regex is much more simple than having 5 or 6 queries and two additional tables (the demo in given link).

Everybody is anxious about several milli seconds of running time, but I wonder how long does it take for an expert to manage the query method, adding temp tables, etc. And you can not even be sure if it works as expected or not. You have to test several types of data, different patterns to see how the result is.
I don't even need to test the code. I can look at the 4 lines of the loop or regex pattern and see if it works or not.

But again, as I said, I'm not that good in Access as you are. Maybe going through those steps to add queries and helping tables is simpler for you. But surely not for an average Access user.
 

isladogs

MVP / VIP
Local time
Today, 07:10
Joined
Jan 14, 2017
Messages
18,221
I added in @arnelgp's new code based on Regex and re-tested each 3 times.
I pasted 3 text strings of differing lengths into a table for testing

Firstly, with the ridiculously short string I used before

Code:
Test1 - Short String (19 chars)
1. MajP - Loop: ReplaceMultiSpace : Time Taken = 0.0859375 s
2. AGP - Loop: SngSpaceOnly : Time Taken = 0.078125 s
3. AGP Regex: OneSpaceOnly : Time Taken = 0.0546875 s
4. EBS17 : RegExReplace : Time Taken = 0.0859375 s


Test1 - Short String (19 chars)
1. MajP - Loop: ReplaceMultiSpace : Time Taken = 0.08203125 s
2. AGP - Loop: SngSpaceOnly : Time Taken = 0.078125 s
3. AGP Regex: OneSpaceOnly : Time Taken = 0.0546875 s
4. EBS17 : RegExReplace : Time Taken = 0.0859375 s


Test1 - Short String (19 chars)
1. MajP - Loop: ReplaceMultiSpace : Time Taken = 0.08203125 s
2. AGP - Loop: SngSpaceOnly : Time Taken = 0.07421875 s
3. AGP Regex: OneSpaceOnly : Time Taken = 0.0546875 s
4. EBS17 : RegExReplace : Time Taken = 0.0859375 s

As before, there is very little difference in the results ... but with such a short string its easy enough to edit it manually!

And again, with the longer string I used before

Code:
Test2 - Longer String (561 chars)
1. MajP - Loop: ReplaceMultiSpace : Time Taken = 0.5 s
2. AGP - Loop: SngSpaceOnly : Time Taken = 0.59765625 s
3. AGP Regex: OneSpaceOnly : Time Taken = 0.06640625 s
4. EBS17 : RegExReplace : Time Taken = 0.15625 s


Test2 - Longer String (561 chars)
1. MajP - Loop: ReplaceMultiSpace : Time Taken = 0.49609375 s
2. AGP - Loop: SngSpaceOnly : Time Taken = 0.59375 s
3. AGP Regex: OneSpaceOnly : Time Taken = 0.06640625 s
4. EBS17 : RegExReplace : Time Taken = 0.15625 s


Test2 - Longer String (561 chars)
1. MajP - Loop: ReplaceMultiSpace : Time Taken = 0.49609375 s
2. AGP - Loop: SngSpaceOnly : Time Taken = 0.59375 s
3. AGP Regex: OneSpaceOnly : Time Taken = 0.0625 s
4. EBS17 : RegExReplace : Time Taken = 0.16015625 s

Here arnel's new code is a clear winner

And once more with a much longer string of almost 12K characters:
Code:
Test3 - Even Longer String (12K chars)
1. MajP - Loop: ReplaceMultiSpace : Time Taken = 16.22265625 s
2. AGP - Loop: SngSpaceOnly : Time Taken = 22.16796875 s
3. AGP Regex: OneSpaceOnly : Time Taken = 0.16015625 s
4. EBS17 : RegExReplace : Time Taken = 1.1171875 s


Test3 - Even Longer String (12K chars)
1. MajP - Loop: ReplaceMultiSpace : Time Taken = 16.05078125 s
2. AGP - Loop: SngSpaceOnly : Time Taken = 23.18359375 s
3. AGP Regex: OneSpaceOnly : Time Taken = 0.171875 s
4. EBS17 : RegExReplace : Time Taken = 1.5625 s


Test3 - Even Longer String (12K chars)
1. MajP - Loop: ReplaceMultiSpace : Time Taken = 17.2265625 s
2. AGP - Loop: SngSpaceOnly : Time Taken = 23.10546875 s
3. AGP Regex: OneSpaceOnly : Time Taken = 0.16796875 s
4. EBS17 : RegExReplace : Time Taken = 1.1328125 s

Even more dramatic differences in this case with @arnelgp crowned as Regex King!

The outcomes are clearly significant in this set of tests
Both Regex versions are significantly faster and in particular arnel's version is particularly impressive in tems of speed for this set of tests.

I'll update my article with an additional example app based on some of these tests - probably just arnel's as he achieved both the fastest and slowest results of the largest sample 👍😏

Nevertheless, my conclusions in the original article still stand.
For a simple task particularly with a small dataset, using Regex offers no speed advantages and may be overkill.
However, for a more complex task, especially with a large dataset to process, it clearly can offer real advantages

Of course, speed isn't the only factor here. For example, the additional time needed to devise a Regex solution to solve a specific task may outweigh the time it may save in terms of execution
 

ebs17

Well-known member
Local time
Today, 08:10
Joined
Feb 7, 2020
Messages
1,946
additional time needed to devise a Regex solution
I see it rather the opposite way. If you have some familiarization with the RegEx topic, especially with the high level of abstraction in the search patterns, then the creation of a solution goes quite quickly. That's why I use a somewhat more general-purpose function with arguments instead of a specialized single function. The development is limited to the use of suitable arguments, which is quite clear.

Combined with the statement above that the job is for more one-time processing, I have a satisfying result in adding development time and execution time. In case anyone notices: The RegEx solution was one of the first on the table, together with the InnerTrim function, which was ignored, but has the same operating principle as the two functions mentioned below and then considered.
 

isladogs

MVP / VIP
Local time
Today, 07:10
Joined
Jan 14, 2017
Messages
18,221
@arnelgp
Sorry I hadn't seen your test results in post #20 until now
One big problem. Your Regex code doesn't give the same output as the other 3 tests.
Please can you recheck your code

Code:
Test1 - Short String (19 chars)
1. MajP - Loop: ReplaceMultiSpace : Time Taken = 0.0625 s             Modified length = 15 chars
2. AGP - Loop: SngSpaceOnly : Time Taken = 0.078125 s   Modified length = 15 chars
3. AGP - Regex: OneSpaceOnly : Time Taken = 0.0625 s      Modified length = 16 chars
4. EBS17 - RegExReplace : Time Taken = 0.078125 s       Modified length = 15 chars

Test2 - Longer String (561 chars)
1. MajP - Loop: ReplaceMultiSpace : Time Taken = 0.484375 s           Modified length = 488 chars
2. AGP - Loop: SngSpaceOnly : Time Taken = 0.578125 s   Modified length = 488 chars
3. AGP - Regex - OneSpaceOnly : Time Taken = 0.046875 s    Modified length = 559 chars
4. EBS17 - RegExReplace : Time Taken = 0.15625 s        Modified length = 488 chars

Test3 - Even Longer String (11948 chars)
1. MajP - Loop: ReplaceMultiSpace : Time Taken = 15.59375 s           Modified length = 10236 chars
2. AGP - Loop: SngSpaceOnly : Time Taken = 21.671875 s  Modified length = 10236 chars
3. AGP - Regex: OneSpaceOnly : Time Taken = 0.140625 s    Modified length = 11926 chars
4. EBS17 - RegExReplace : Time Taken = 1.109375 s       Modified length = 10236 chars

Just checked. Your Regex code seems to be only acting upon the first instance of a multiple space which inevitably means its faster!

I've put updating my web article on hold for now!
 
Last edited:

Gasman

Enthusiastic Amateur
Local time
Today, 07:10
Joined
Sep 21, 2011
Messages
14,299
a little bit faster:
Code:
Public Function OneSpaceOnly(ByVal var As Variant)
    Static oReg As Object
    Dim s As String
    On Error GoTo create_object
    If IsNull(var) Then Exit Function
    s = var
    With oReg
        .Pattern = " {2,}"
        .Global = True
        s = .Replace(s, " ")
    End With
    OneSpaceOnly = s
    Exit Function
create_object:
    Set oReg = CreateObject("vbscript.regexp")
    Resume Next
End Function
forgot the evidence:
View attachment 105387
View attachment 105388

my own conclusion, with the right code, RegExp can be very fast.
OK, I could not understand why you have on error to set oReg?
So I copied the code and tried it with string below.

Nothing changed? Am I missing a reference?

Edit: Found regex library is MSVbscript library. I tried 5.5 and 1 and it still does not work, BUT, I found that if I ran it again, it then worked?, on either library?
1671819111383.png
 
Last edited:

isladogs

MVP / VIP
Local time
Today, 07:10
Joined
Jan 14, 2017
Messages
18,221
@Gasman
Both of the Regex examples are using late binding, so you don't need the reference - at least not in recent versions of Access

@arnelgp's Regex example doesn't work correctly at the moment, but the code by @ebs17 does work
 

Gasman

Enthusiastic Amateur
Local time
Today, 07:10
Joined
Sep 21, 2011
Messages
14,299
I cannot work out why it goes to the error label for each of the regex parameters the first time around, then does not subsequently?

I amended to the code below and the output comes back as required (I think?)
Code:
Public Function fnOneSpaceOnly(ByVal var As Variant)
    Static oReg As Object
    Dim s As String
    On Error GoTo create_object
    If IsNull(var) Then Exit Function
    'Set oReg = CreateObject("vbscript.regexp")
    s = var
    While InStr(1, s, "  ") > 0
        With oReg
            .Pattern = " {2,}"
            .Global = True
            s = .Replace(s, " ")
        End With
    Wend
    fnOneSpaceOnly = s
    Exit Function
create_object:
    Set oReg = CreateObject("vbscript.regexp")
    Resume Next
End Function

Why not just create the object before trying to use it? :unsure:
 

theDBguy

I’m here to help
Staff member
Local time
Yesterday, 23:10
Joined
Oct 29, 2018
Messages
21,473
Why not just create the object before trying to use it? :unsure:
I think it was mentioned before that using a persistent object makes it run a little bit faster.
 

ebs17

Well-known member
Local time
Today, 08:10
Joined
Feb 7, 2020
Messages
1,946
@arnelgp spoke of "with the right code", but ignored the Multiline flag.

Strong spells would be appropriate if he had shown a better = revolutionary pattern, as that is the real focus.

Why not just create the object before trying to use it?
In the code, the object is set via error handling. I myself avoid controlling programs via errors. It is usually better to check first and then act. An error handler should handle unexpected errors, less intentional errors.
 

isladogs

MVP / VIP
Local time
Today, 07:10
Joined
Jan 14, 2017
Messages
18,221
@Gasman's revised version does work but it is indeed SIGNIFICANTLY slower than the original Regex code by @ebs17

Code:
Test1 - Short String (25 chars)

1. MajP - Loop: ReplaceMultiSpace, Time Taken = 0.109375 s             Modified length = 15 chars
2. AGP - Loop: SngSpaceOnly, Time Taken = 0.109375 s     Modified length = 15 chars
3. AGP/Gasman Regex: OneSpaceOnly, Time Taken = 0.09375 s               Modified length = 15 chars
4. EBS17 : RegExReplace, Time Taken = 0.078125 s         Modified length = 15 chars

Test2 - Longer String (562 chars)
1. MajP - Loop: ReplaceMultiSpace, Time Taken = 0.46875 s              Modified length = 488 chars
2. AGP - Loop: SngSpaceOnly, Time Taken = 0.578125 s     Modified length = 488 chars
3. AGP/Gasman Regex: OneSpaceOnly, Time Taken = 0.234375 s              Modified length = 488 chars
4. EBS17 : RegExReplace, Time Taken = 0.140625 s         Modified length = 488 chars

Test3 - Even Longer String (11949 chars)
1. MajP - Loop: ReplaceMultiSpace, Time Taken = 15.515625 s            Modified length = 10236 chars
2. AGP - Loop: SngSpaceOnly, Time Taken = 30.515625 s    Modified length = 10236 chars
3. AGP/Gasman Regex: OneSpaceOnly, Time Taken = 2.9140625 s             Modified length = 10236 chars
4. EBS17 : RegExReplace, Time Taken = 1.109375 s         Modified length = 10236 chars

To reiterate, with the right code & with a suitable task, Regex can indeed be much faster than a loop
 

arnelgp

..forever waiting... waiting for jellybean!
Local time
Today, 14:10
Joined
May 7, 2009
Messages
19,243
Nothing changed? Am I missing a reference?
i don't know what is wrong with your pc but it definitely works for me Without alteration:
here is the complete code of test:
Code:
Option Compare Database
Option Explicit

' in a standard module
Private pRegEx As Object

Public Property Get oRegEx() As Object
   If (pRegEx Is Nothing) Then Set pRegEx = CreateObject("Vbscript.Regexp")
   Set oRegEx = pRegEx
End Property

Public Function RegExReplace(ByVal SourceText As String, _
      ByVal SearchPattern As String, _
      ByVal ReplaceText As String, _
      Optional ByVal bIgnoreCase As Boolean = True, _
      Optional ByVal bGlobal As Boolean = True, _
      Optional ByVal bMultiLine As Boolean = True) As String
   Static RegEx As Object
   If RegEx Is Nothing Then
       Set RegEx = CreateObject("Vbscript.Regexp")
   End If
   With RegEx
      .Pattern = SearchPattern
      .IgnoreCase = bIgnoreCase
      .Global = bGlobal
      .Multiline = bMultiLine
      RegExReplace = .Replace(SourceText, ReplaceText)
   End With
End Function

Public Function InnerTrim(ByVal ThisString As String) As String
    Dim sResult As String
    sResult = ThisString
    Do While InStr(1, sResult, "  ") > 0
        sResult = Replace(sResult, "  ", " ")
    Loop
    InnerTrim = sResult
End Function

Public Function ReplaceMultiSpace(strIN As String) As String
  Dim old As String
  Dim newout As String
  old = Trim(strIN)
  newout = old
  Do
    old = newout
    newout = Replace(old, "  ", " ")
  Loop Until old = newout
  ReplaceMultiSpace = newout
End Function


Public Function SngSpaceOnly(ByVal value As Variant)
    If IsNull(value) Then
        SngSpaceOnly = value
        Exit Function
    End If
    Do While InStr(1, value, "  ") <> 0
        value = Replace$(value, "  ", " ")
    Loop
    SngSpaceOnly = value
End Function

Public Function OneSpaceOnly(ByVal var As Variant)
    Static oReg As Object
    Dim s As String
    On Error GoTo create_object
    If IsNull(var) Then Exit Function
    s = var
    With oReg
        .Pattern = " {2,}"
        .Global = True
        s = .Replace(s, " ")
    End With
    OneSpaceOnly = s
    Exit Function
create_object:
    Set oReg = CreateObject("vbscript.regexp")
    Resume Next
End Function


Function SpeedTest(strIN As String)
    Dim strOUT As String, dblStart As Double, dblEnd As Double
    Dim lngCount As Long
    
    dblStart = Timer
    For lngCount = 1 To 10000
        strOUT = SngSpaceOnly(strIN)
    Next
    dblEnd = Timer
    Debug.Print "1. SngSpaceOnly" & " : Time Taken = " & dblEnd - dblStart & " s"
    
    dblStart = Timer
    For lngCount = 1 To 10000
        strOUT = ReplaceMultiSpace(strIN)
    Next
    dblEnd = Timer
    Debug.Print "2. ReplaceMultiSpace" & " : Time Taken = " & dblEnd - dblStart & " s"
    
    
    dblStart = Timer
    For lngCount = 1 To 10000
        strOUT = RegExReplace(strIN, " {2,}", " ")
    Next
    dblEnd = Timer
    Debug.Print "3. RegExReplace" & " : Time Taken = " & dblEnd - dblStart & " s"
    
    dblStart = Timer
    For lngCount = 1 To 10000
        strOUT = OneSpaceOnly(strIN)
    Next
    dblEnd = Timer
    Debug.Print
    Debug.Print "Input String to OneSpaceOnly:    " & strIN
    Debug.Print "Output String from OneSpaceOnly: " & strOUT
    Debug.Print
    Debug.Print "4. OneSpaceOnly" & " : Time Taken = " & dblEnd - dblStart & " s"
End Function



Sub Test1()
    Dim strIN As String
    strIN = "abc    def ghi  xyz"
    
    Debug.Print "Test1 - Short String" & vbCrLf & "===================="
    SpeedTest (strIN)
    Debug.Print "" & vbCrLf

End Sub

Sub Test2()

    Dim strIN As String

    strIN = "Regular   expressions are a     very powerful  tool for developers  to make use of   ." & _
    "However,   in  this particular set   of tests, using   RegEx   was   clearly  disadvantageous." & _
    "Perhaps   its use  was  overkill  for  the  test   done?" & _
    "    " & _
    "In  other cases,  Regex may  provide the best   or only   method of obtaining results." & _
    "  " & _
    "I   would be   grateful for any  feedback on   this  article" & _
    "I would also welcome   any     suggestions   for other tests  in order to further  assess the  comparative" & _
    "     strength of regular     expressions against  other    methods."
    
    Debug.Print "Test2 - Longer String" & vbCrLf & "======================"
    
    SpeedTest (strIN)
    
    Debug.Print "" & vbCrLf
End Sub

and here is the result Again:
out1.png


out2.png
 

isladogs

MVP / VIP
Local time
Today, 07:10
Joined
Jan 14, 2017
Messages
18,221
@arnelgp
i don't know what is wrong with your pc but it definitely works for me Without alteration:

Except that you did alter the code ....

Code:
'ORIGINAL CODE

Public Function OneSpaceOnly(ByVal var As Variant)
    Static oReg As Object
    Dim s As String
    If oReg Is Nothing Then
        Set oReg = CreateObject("vbscript.regexp")
    End If
    If IsNull(var) Then Exit Function
    s = var
    With oReg
        .Pattern = " {2,}"
        s = .Replace(s, " ")
    End With
    OneSpaceOnly = s
End Function
'

Code:
'NEW CODE

Public Function OneSpaceOnly(ByVal var As Variant)
    Static oReg As Object
    Dim s As String
    On Error GoTo create_object
    If IsNull(var) Then Exit Function
    s = var
    With oReg
        .Pattern = " {2,}"
        .Global = True
        s = .Replace(s, " ")
    End With
    OneSpaceOnly = s
    Exit Function
create_object:
    Set oReg = CreateObject("vbscript.regexp")
    Resume Next
End Function

The new code does give the correct output and remains the fastest . . .
. . . but not as fast as the original version which gave the wrong output
 

Gasman

Enthusiastic Amateur
Local time
Today, 07:10
Joined
Sep 21, 2011
Messages
14,299
i don't know what is wrong with your pc but it definitely works for me Without alteration:
Oh, it will not be my PC, but me :)

I walked through the code line by line as I advise people to do.
On the very first run, it goes to the error label for each time a regex property is referenced, so 3 times, then does not change anything.
Run it again and it does not error and produces the required string reduced in spaces. :unsure:
 
Last edited:

arnelgp

..forever waiting... waiting for jellybean!
Local time
Today, 14:10
Joined
May 7, 2009
Messages
19,243
but not as fast as the original version which gave the wrong output
if you will rename the original, it will also do what the 2nd version does.
the difference is that only on the second, i remove the Delay of everytime checking If oReg Is Nothing...
 

isladogs

MVP / VIP
Local time
Today, 07:10
Joined
Jan 14, 2017
Messages
18,221
Not from my earlier tests
The original version only removed the first instance of multiple spaces and ignored the rest as you originally omitted the .Global=True line.
It ran faster despite checking If oReg Is Nothing... as it wasn't processing the entire string
See post #26
 
Last edited:

isladogs

MVP / VIP
Local time
Today, 07:10
Joined
Jan 14, 2017
Messages
18,221
Just updated the web article

Many thanks to @arnelgp, @ebs17 & @MajP for providing code used in the latest set of tests

Further examples would also be appreciated
 

Users who are viewing this thread

Top Bottom