Extracting varying length string between two characters

To be pedantic strSplitArray should be dimmed as Variant as that's what Split returns, and good coding practice suggests the function should be returned explicitly as variant (which is what it returns by default)
Not to be pedantic, but I am pretty sure you got it backwards. It has to be declared as string. Split returns strings
Returns a zero-based, one-dimensional array containing a specified number of substrings.
In the following test it works if declared a string and fails if declared a variant with a type mistmatch. Which to me is suprising that it does not cast to a variant.


Split.PNG


Am I missing something?
 
I did not use a RegEx for this simple test - just the VBA version of the Mid and Instr functions
I am now testing mid/instr in a VBA function, mid/Instr directly in SQL and Regex in a VBA function with 65000 data records. Each value was read from the column in a loop over the recordset.

Result:
mid/instr + VBA function: ~1.8 sec
mid/instr in SQL: ~3.8 sec .. why?
Regex: ~2.7 sec
 
Perhaps try the split function as well? (That’s my go to solution for this sort of scenario😀)
 
I just tested it on a table with only 20,000 records, and the difference in performance was VERY noticeable - the JET expression was essentially instant, while the VBA function took several seconds.
I would like to see the complete test environment.
just the VBA version of the Mid and Instr functions
If you do exactly the same thing and just add some overhead, you shouldn't be surprised if runtimes increase. This conclusion seems plausible to me even without testing.

but I very much doubt if this statement is true
The Expression Service only works if the query is started from Access. Jet-SQL can also be called from other places, via DAO/ADODB in Excel, VBScript & Co., but there is no direct access of the database engine to external commands. This makes it easy to test what is part of the actual language scope of Jet-SQL and what is not. You can use tests to test your beliefs.

Result:
mid/instr + VBA function: ~1.8 sec
mid/instr in SQL: ~3.8 sec .. why?
Regex: ~2.7 sec
That's pretty much what I expected, although the second line also surprises me a bit. Is it possible that the VBA code is not compiled here and that the VBA commands are therefore constantly being reinterpreted?

Regarding data types: If you break a string down into substrings, the substrings are also strings. You can only get something different if you also perform data type conversions.

I prefer a solution using Split or RegEx because they are extremely variable. You notice this when you want to isolate the third or fourth subexpression instead of the second one. You only change one argument when you call the function.
 
Last edited:
I am now testing mid/instr in a VBA function, mid/Instr directly in SQL and Regex in a VBA function with 65000 data records. Each value was read from the column in a loop over the recordset.

Result:
mid/instr + VBA function: ~1.8 sec
mid/instr in SQL: ~3.8 sec .. why?
Regex: ~2.7 sec
If you're doing it in a loop, that means you're doing it in VBA, which complete negates the point of having JUST AN EXPRESSION in a query. Yes, your results are not at all surprising. You are in VBA, and you are adding the overhead of calling JET to your already slow VBA. The way to do it, which is what the OP was asking about, is putting the expression I have cited here several times directly into the query:
SQL:
SELECT fldBlah1, fldBlah2, etc.,
    [My Expression] as QuickResult,
    fldBlah98, fldBlah99
From SomeTable
If you do it that way, the results are very quick. If you replace [My Expression] with a call to a VBA function, it slows down, a lot.
 
Last edited:
If you're doing it in a loop, that means you're doing it in VBA, ..
I opened a recordset in VBA and evaluated a calculated field of a query.
Of course, I had to scroll through the records in VBA using a loop. How else can you measure the time?
 
scroll through the records in VBA using a loop
Isn't that additional work that overshadows the actual effort of query execution?

Code:
=> TakeStartTime
Set rs = db.Openrecordset(SqlStatement, dbOpenForwardOnly)
rs.MoveLast
=> TakeEndTime
For a clean measurement, I would also expect that each compound expression is unique, i.e. occurs only once in the data set, in order to exclude any cache effects.

A test environment could be found here:
Now all you need to do is add a table with lots of unique test data. I spare myself the effort to prove the obvious.
 
I seem to recall that I defined an array as numeric to split a string of numbers and got type mismatch?
 
Not to be pedantic, but I am pretty sure you got it backwards. It has to be declared as string. Split returns strings

In the following test it works if declared a string and fails if declared a variant with a type mistmatch. Which to me is suprising that it does not cast to a variant.

View attachment 115688

Am I missing something?
According to Microsoft you are. Which says explicitly that it returns a variant array. The substrings are of variant sub-type string.

Where is your function throwing the error?

I have literally hundreds of instances of assigning the results of Split (And Array) to variants, albeit NOT declared not as an array. Eg Dim aSplit as Variant: no brackets.

And as I said in my first reply - if it works use it. I see a lot on internet example do it with string arrays.
 
Isn't that additional work that overshadows the actual effort of query execution?
Opening the query is not enough, otherwise the expressions will not be calculated.
The functions were not called by calling rs.movelast. (Rs.movelast is good for testing the where condition, grouping, etc.)
Of course, it also takes time to run through the data records. However, this is not tragic for the comparison, as it always runs in the same way.
And statements such as twice as fast etc. are not useful anyway. ;)

@CJ_London:
Perhaps try the split function as well?
Split was the fastest in my test.
 

Attachments

Last edited:
I would like to see the complete test environment.
I don't know what you mean by that - I'm not going to mail you my computer. The 'complete test environment' is an Access database, nothing more. You can easily make a blank one yourself, create a table, load it with the sorts of strings the OP described and make your own tests. If you do it with a pure SQL query, your results will be quick. If you put a call to a VBA function into the query, they will be a lot slower, and it will not matter in the least what the VBA function does.

If you do exactly the same thing and just add some overhead, you shouldn't be surprised if runtimes increase. This conclusion seems plausible to me even without testing.
Well, of course - that's my whole point. Calls to a VBA function add overhead, and apparently quite a lot.

The Expression Service only works if the query is started from Access. Jet-SQL can also be called from other places, via DAO/ADODB in Excel, VBScript & Co., but there is no direct access of the database engine to external commands. This makes it easy to test what is part of the actual language scope of Jet-SQL and what is not. You can use tests to test your beliefs.
I did test, and the results are exactly as I described, and exactly what I expected.

That's pretty much what I expected, although the second line also surprises me a bit. Is it possible that the VBA code is not compiled here and that the VBA commands are therefore constantly being reinterpreted?
No. The overhead is in the repeated calls to VBA. If you compile the code first, the results will not be materially different.

Regarding data types: If you break a string down into substrings, the substrings are also strings. You can only get something different if you also perform data type conversions.
Of course - who would expect anything else?

I prefer a solution using Split or RegEx because they are extremely variable. You notice this when you want to isolate the third or fourth subexpression instead of the second one. You only change one argument when you call the function.
Yes, no argument - calling a VBA function gives you lots more options. BUT, at the cost of a monumental increase in run time. If you have complicated requirements and only a few records, it's clearly the way to go. If you have simple requirements and many records, it is clearly a bad choice. The OP had simple requirements, I do not know how many records. But his specific request, copied directly from his original post is, "If this can be done in an expression written into the design view of a query, that would be most ideal as I am not great with VBA at all." THAT was what I was answering, and the fact that it is also MUCH faster is simply a bonus.
 
I opened a recordset in VBA and evaluated a calculated field of a query.
Of course, I had to scroll through the records in VBA using a loop. How else can you measure the time?
If you put the expression I showed you directly into the SQL statement the defines the recordset, you can then open it, go to the last record via the .MoveLast command and measure the time. But the OP wanted an expression that would go directly into a query, without any VBA involvement, so that was what I posted.
 
But the OP wanted an expression that would go directly into a query, without any VBA involvement, so that was what I posted.
I also prefer to work without VBA in SQL.
The speed of execution (of the select part) is not an argument in this case. ;)
 
I also prefer to work without VBA in SQL.
The speed of execution (of the select part) is not an argument in this case. ;)
What do you mean, 'not an argument'? Just that you don't care about the speed?
 
Not to be pedantic, but I am pretty sure you got it backwards. It has to be declared as string. Split returns strings
...... etc

Am I missing something?
As a follow up to my first reply I realised that you are missing something - a Variant array is not the same thing as an array of variants.

I tried your code with a simple variant declaration and it works without problem.

Assigning the result to a String array works because the returned array is converted implicitly to the string array.

All in all the argument discussion is fun but pretty irrelevant - mea culpa but I am pedantic sorry!
 
I did test, and the results are exactly as I described, and exactly what I expected.
If two people test, there will be three different results. Therefore, it is always relevant to show exactly how the test is carried out.
 
If two people test, there will be three different results. Therefore, it is always relevant to show exactly how the test is carried out.
Well, I described exactly what I did, way back in #7. Opened a query, once with my expression, once calling a VBA function to do the exact same thing as the expression. The expression was fast, the VBA function slow. I don't know what more you think I can add to that.
 

Users who are viewing this thread

Back
Top Bottom