Trying to Improve Performance Time/File Size of Modules

mulch17

Hello all,

I have a module that imports/parses text files, and inserts their contents into various tables/fields, using a wide variety of logic.

The code does what it's intended to do, without any errors. The problem is that it takes a long time to run the code, and my file size grows out of control. It's taking about 6-8 minutes to insert ~15,000 records into the table.

Additionally, the .accdb file will grow from ~2 MB to well over 100 MB after running the code. If I do a compact and repair, the file size drops back down to about ~4 MB. There must be some inefficiencies in my code that are hogging memory and slowing things down.

The code is basically performing ~15,000 different insert statements as follows (these are all dummy names - some inserts only have 2-3 fields, others have 20-25, and they all go into different tables using all kinds of different logic):

Code:
Dim db As DAO.Database
Dim qdf As DAO.QueryDef
Dim strSQL As String

Set db = CurrentDb
Set qdf = db.QueryDefs("InsertQuery") 'Exists in the file already

strSQL = "PARAMETERS RecordNamePar Text, RecordDatePar DateTime, RecordProperty1Par Text, RecordProperty2Par Text; " & _
         "INSERT INTO [Table1] ([record_name_tx], [record_dte], [record_property_1_tx], [record_property_2_tx]) " & _
         "VALUES ([RecordNamePar], [RecordDatePar], [RecordProperty1Par], [RecordProperty2Par]);"

qdf.SQL = strSQL 'Overwrite the saved query's SQL for this line
qdf.Parameters("RecordNamePar") = RECORD_NAME 'Variable
qdf.Parameters("RecordDatePar") = RECORD_DATE 'Variable
qdf.Parameters("RecordProperty1Par") = RECORD_PROPERTY_1 'Variable
qdf.Parameters("RecordProperty2Par") = RECORD_PROPERTY_2 'Variable
qdf.Execute

Basically, for each line, it overwrites the SQL statement behind the InsertQuery object in the database, and then executes it. Then it moves on to the next line, and repeats that process over and over.

My hunch is that this is causing the inefficiency issues here. I've learned from this forum that it's important to use parameterized queries, and I like the visual layout of this code, but it may be too slow to actually use.

I've read that it's faster to use CurrentDb.Execute strSQL, dbFailOnError instead, but it seems like it would be pretty tedious/risky/ugly to build a huge concatenated string like that.
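
For illustration, I imagine the concatenated version of the same insert would look something like this (same dummy names as above, untested) - the quote-escaping and date delimiters are what make it feel risky:

Code:
Dim strSQL As String

'Build the whole statement as text; string values need their quotes doubled
'and dates need # delimiters, which is where the risk/ugliness comes in
strSQL = "INSERT INTO [Table1] ([record_name_tx], [record_dte], [record_property_1_tx], [record_property_2_tx]) " & _
         "VALUES ('" & Replace(RECORD_NAME, "'", "''") & "', " & _
         "#" & Format(RECORD_DATE, "mm\/dd\/yyyy") & "#, " & _
         "'" & Replace(RECORD_PROPERTY_1, "'", "''") & "', " & _
         "'" & Replace(RECORD_PROPERTY_2, "'", "''") & "');"

CurrentDb.Execute strSQL, dbFailOnError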

Is there a problem with the way I've approached this code, or is there something else that might be causing my performance/memory issues? I'm happy to answer any other questions that may help shed some light on this.

Thanks for reading!
 
I think you are correct.

This seems to be repeated, probably unnecessarily:

The code is basically performing ~15,000 different insert statements as follows (these are all dummy names - some inserts only have 2-3 fields, others have 20-25, and they all go into different tables using all kinds of different logic):

Not knowing your data, it is difficult to be precise with options.

If you had a working table to contain the records to be distributed, you could have:

a) some saved queries to populate only certain tables (one query per table)
b) some VBA to process the working table with either the saved queries or the db.Execute construct (rough sketch below)

With this you would process the data in a series of loops until all records in the working table were transferred.

As I said, without knowing your data - the original and each of the target tables, it is difficult to offer more.
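
To make (a) and (b) a little more concrete, here is a very rough sketch, assuming a working table called tblWorking with a LineType column (all names made up):

Code:
Dim db As DAO.Database
Set db = CurrentDb

'One saved append query per target table, each pulling only its own
'line type from the working table, e.g.
'  qryAppendType1:  INSERT INTO tblType1 (...) SELECT ... FROM tblWorking WHERE LineType = 1
db.Execute "qryAppendType1", dbFailOnError
db.Execute "qryAppendType2", dbFailOnError
'...one Execute per target table...

'Once all records have been distributed, clear the working table
db.Execute "DELETE FROM tblWorking", dbFailOnError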
 

Thanks for your quick reply, jdraw.

Here's a high-level description of the data process (feel free to skip this, if you want):

1. FileLoop sub goes through a specified directory and finds files with either a .txt or .dat file extension, and calls the ParseFile sub for each file

2. ParseFile goes through each line and identifies which "group" the line is associated with, by looking at specific characters. It checks whether a record for that "group" already exists in the "group" table; if one does, it returns the PK, and if not, it adds a record and then returns the PK. It then calls the ParseGroup sub for the line

3. ParseGroup is just a big long Select Case statement that identifies what "type" each line is, by looking at specific characters. There are a wide variety of line "types", and each line "type" contains varying numbers/names of fields. Once the line "type" is identified, it calls the corresponding ParseLine sub for that type (again, there are many different possible parse line functions)

4. ParseLine goes through each character of the line, splits it into different fields, and then inserts a record into the corresponding "type" table (you guessed it, there are many different "type" tables too) with the parsed fields. The PK from ParseGroup is passed through as a FK here.

Then ParseGroup is called for line 2, steps 2-4 are repeated for that line, ParseGroup is called for line 3, and so on until the end of the file. Then FileLoop goes to the next file, starts steps 2-4 for that file, and so on.

Hopefully that explanation makes at least a little sense.
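
In skeleton form (leaving out all the real parsing, and with GetOrCreateGroup just standing in for the lookup-or-insert logic in step 2), the structure is roughly:

Code:
Sub FileLoop(strFolder As String)
    Dim strFile As String
    strFile = Dir(strFolder & "*.txt")      'a second pass handles *.dat
    Do While Len(strFile) > 0
        ParseFile strFolder & strFile
        strFile = Dir
    Loop
End Sub

Sub ParseFile(strPath As String)
    Dim intFile As Integer, strLine As String, lngGroupID As Long
    intFile = FreeFile
    Open strPath For Input As #intFile
    Do While Not EOF(intFile)
        Line Input #intFile, strLine
        lngGroupID = GetOrCreateGroup(strLine)   'step 2: find or add the "group" record, return its PK
        ParseGroup strLine, lngGroupID           'steps 3-4: Select Case on line "type", then the matching ParseLine
    Loop
    Close #intFile
End Sub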


Anyway, your reply got me thinking, and I realized that I was re-writing the same INSERT functions over and over. There is only one possible insert statement per "type" that I would want to use, so I went with option (a) in your reply, and manually hand-jammed an INSERT query for each of the line "types". Then I referenced those pre-existing queries in the code explicitly, rather than building them every single time.

This cut the processing time in half - the same 15,000-line file now took just under 3 minutes. The bloat was also way down - the file was only ~35 MB after completing, instead of well over 100 MB. So that did help a lot, thanks for that suggestion!!!

However, I'm going to keep thinking and see if I can do better. I still have some variation of this statement running for every single line:

Code:
Set qdfType1 = CurrentDb.QueryDefs("Insert_Type_1_Q")

I feel like that's not necessary - it should be sufficient to declare one QueryDef object per type in VBA, write one Set qdfType line for each type at the very beginning, and then be done with it.

However, I still haven't figured out how to do this - seems to be a scope issue of some kind, because that statement works fine within each ParseLine sub, but the same exact statement throws an error if I define the QueryDef in an outside module (even if it's a public variable, which is what confuses me).
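
For what it's worth, the pattern I'm trying to get to looks something like the sketch below (Insert_Type_2_Q is just a made-up second query name). My guess is that the trick is holding the Database object itself in a module-level variable, rather than calling CurrentDb again each time, so the QueryDefs don't lose their parent - but I haven't confirmed that yet:

Code:
'Module-level declarations at the top of the parsing module
Private m_db As DAO.Database
Private m_qdfType1 As DAO.QueryDef
Private m_qdfType2 As DAO.QueryDef

Sub InitQueryDefs()
    'Called once before FileLoop starts
    Set m_db = CurrentDb                                'keep this reference alive for the whole run
    Set m_qdfType1 = m_db.QueryDefs("Insert_Type_1_Q")
    Set m_qdfType2 = m_db.QueryDefs("Insert_Type_2_Q")  'made-up name for illustration
End Sub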

In any case, even if I don't get any more replies and don't think of any other improvements, your suggestion alone helped cut the run time/size by over 50%, so thanks jdraw!!!
 
It would probably help if you posted some code so readers can see your work in context.
Or post a copy of the database.

It seems you're doing a lot of parsing, and that may just be a fact in your process, and not something
easily changed.

Glad you are making progress.
 
Another thing that can speed up your inserts is to wrap them in a transaction. Inside a transaction, disk updates are cached, so writes to the table happen in a batch; outside a transaction, every update is written to disk individually. Consider this code . . .
Code:
Sub Test82q98741920347(InputFile As Object)
On Error GoTo handler
    Dim var

    DBEngine.BeginTrans 'start a transaction

    With CurrentDb.QueryDefs("YourInsertQuery")
        For Each var In InputFile.DataRows
            'in this loop we repeatedly execute inserts
            .Parameters("p0") = var.Data1
            .Parameters("p1") = var.Data2
            .Execute dbFailOnError  'this insert is cached inside the transaction
        Next
    End With

    DBEngine.CommitTrans    'commit all the inserts as a batch - faster

final:
    Exit Sub

handler:
    DBEngine.Rollback   'rollback all the inserts - none of them succeed
    MsgBox Err.Description, vbExclamation
    Resume final
End Sub
 
Not all the routine code is provided in post #1, but it might be that the query is being re-created on each pass of the loop. That could explain both the slow performance and the bloat. Recording times before and after individual lines can show where the delays are happening.
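
For example, something as simple as this around a suspect block can narrow it down:

Code:
Dim sngStart As Single
sngStart = Timer

'...the block you suspect is slow...

Debug.Print "Block took " & Format(Timer - sngStart, "0.000") & " seconds"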

As indicated in the previous post, the query SQL does not have to be regenerated inside the loop.
 
Just as a point, is the back end an Access DB or SQL Server?
Is the logic and parsing integrated (mixed) into the SQL code?
Calling a VBA function from a SQL statement that runs against linked SQL Server tables, for example, has performance implications.
 
