Performance issue on Append with Data Macro

Roger Reinsch

I'm redesigning a database that had 1 master table and 5 dependent tables, all with one-to-one linkage. The 6-table structure was meant to save space by not creating rows in the dependent tables when that sort of data was absent.

The queries, while not hard, were complicated in that they all involved more than one table and usually needed left-outer-join linkage. I got advice on this forum that it made more sense to have it all in one table, so that's what I've done this time around.

With separate tables, I automatically had a way to timestamp changes made to those sections. If Access wanted to write a row, it had changed. Period. So a simple data macro that did SetField LastChange Now() was enough.

To get that same level of change notification with the single table, there are lots of SetLocalVar Changed steps of the form Old.Field <> Field Or (Old.Field Is Null And Field Is Not Null) Or (Old.Field Is Not Null And Field Is Null) {author note: I hate nulls} to detect changes. Then the appropriate set of these "Changed" local variables is ORed together in an If statement to put a new timestamp in the appropriate field.
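
In VBA terms, each of those checks is basically doing this (just a sketch to show the logic; the function name is my own, not anything built in):

Code:
' Null-safe "has this value changed?" test - the same pattern as the
' SetLocalVar expressions above. Null compared to Null counts as "no change".
Public Function FieldChanged(varOld As Variant, varNew As Variant) As Boolean
    If IsNull(varOld) And IsNull(varNew) Then
        FieldChanged = False
    ElseIf IsNull(varOld) Or IsNull(varNew) Then
        FieldChanged = True
    Else
        FieldChanged = (varOld <> varNew)
    End If
End Function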

The new data macro {I really wish these could be written in VBA} is about 1,500 lines in "pretty" XML. I've tested it with a form that shows all the fields, and when any field changes, the timestamp does get set in the right place.

But now for the performance problem.

For testing, I'm doing an append query into an emptied-out table. The first hundred or so rows go in at about 60 per second. I can live with that; at that rate, populating the table with the production data of 125,000 rows would take 35 minutes or so. During testing I'm appending about 1,700 rows, and the last rows are going in at 2 (two) per second. This is not OK. Here are more numbers.

Row 100 - 60 per second
Row 200 - 20 per second
Row 300 - 12 per second
Row 400 - 8 per second
Row 500 - 6 per second
Row 600 - 5 per second
Row 700 - 4 per second
Row 900 - 3.5 per second (average near row 900)
Row 1100 - 3 per second
Row 1300 - 2.5 per second
Row 1500 - 2.1 per second
Row 1700 - 2.0 per second

I don't know what's going on. At this rate it will take until the end of time for the production data to load. Any thoughts would be appreciated. Thanks. Roger

[Access 2013 (32 bit), Win 7 Pro SP1 (64 bit), 16GB ram, Intel i-7 3930K]
 
Really depends on what your macro is doing - if it is referencing the whole dataset each time, then it will slow down as the dataset gets bigger.

I presume your table is properly indexed - lack of proper indexing can have a huge effect on performance.

I've investigated using data macros and personally came to the conclusion that although they mimic the sort of thing you get in SQL Server, they don't do it particularly well - and from my point of view they are almost impossible to document, so I have stayed with trusty VBA to do what is required.

Without knowing what your macro is actually doing, it is not possible to suggest alternatives.
 
Roger,

It is rare to see a database with 1 main table and 5 related tables all in 1:1 relationships.
Similarly, it is rare to see a database with only 1 table.
Perhaps, if readers knew more about your database and the underlying "business" it is intended to support, other design options might be suggested.

We know nothing of your business, tables, structures or design. More info please.
 
Neuter the data macros and time the append operation only.
 
In reverse order:

spikepl - With the field procedure reduced to one SetField Now(), 1,700 rows were appended in less than a second. The timestamp is the same for all of them.

jdraw - Short answer: we integrate 125,000 flat-file rows we get from the county (registered voters) and augment that data with extra information we have for about 1,700 names. There are several support tables and queries used to build the main table. We provide answers to "is John Doe registered?", produce lists for phone banking (those annoying calls you get all the time during elections), and build queries on demand for ad hoc questions. The "Requirements Document" stresses flexibility above all else. I'm preparing a longer version of that story now.

CJ_London - I agree it looks like the whole record set is being referenced for each row added; that's what's slowing it down. But I don't think data macros can look at more than the one record they are triggered for. Still, it really does look like some side effect depends on how much data is already there. For this load operation, I actually removed all the indexes except the primary key (auto-generated) to speed things up. I much prefer VBA for this, but data macros have to stay as macros - you can't even convert them to VBA.

I don't have any "LookupRecord" actions. I have lots of "SetLocalVar" and "SetField" actions. Nothing else. How are these dependent on the size of the file/record set?

Roger
 
Where is this data used - online / desktop / LAN?
Tell us more about the several support tables.

Give us 5-10 samples of what you are bringing together.
Use names like Porky Pig etc. if you have to maintain anonymity. Towns like Smithville or Centertown are fine. We really need to see some data and how it is to be used.

Good luck.
 

I am resistant to generating a bunch of fake data as that is not going to illustrate the problem I'm having with performance.

Instead, I've attached the table definition and a screen shot of the "expert" form that is used to create new entries by hand. Each of the fields on the form has tool-tip hover text that describes what goes in each. There's no way to capture that on the screen shot. The way it works is you enter parameters in the upper left to produce a reasonable list at the upper right. When you find the one you want, just clicking on the row will populate the sub form below with ALL of the data in the YCRC record/row.

The "description" column that is included in Design View of the YCRC table is NOT included by the documenter tool. Is there a way to query the internal tables that Access uses to maintain its definitions of things? Then a better report could be produced.

I've also included the whole data macro in two parts. The .docx file was 21K. The .doc version was 138K - so had to be broken into 2 chunks.

It's clear at this point (having run with a minimal data macro) that it isn't the macro per se that's causing the problem, but rather something inside it. I'll do more selective pruning and experiment to see if I can find something specific that is causing the problem.

Please let me know if you find something in the macro. That has to be where the problem is. Thanks.

Roger
 

Attachments

  • YCRCtableDefinition.txt (12.4 KB)
  • ScreenShot4Forum.PNG (86.3 KB)
  • BeforeChange data macro for the YCRC main table PART 1.doc (89.5 KB)
  • BeforeChange data macro for the YCRC main table PART 2.doc (74 KB)
I am resistant to generating a bunch of fake data as that is not going to illustrate the problem I'm having with performance.
Well, I'm resistant to reading huge amounts of code with no idea what it is supposed to be doing, let alone what it is actually doing (other than getting slower).

Can you not just explain in simple terms what it is supposed to be doing, so that can then be related to the code?

Also consider taking screenshots of your macro rather than posting the code.

Also, to clarify: which version of Access are you using, and is this a web app or a Windows-based db?

At the moment it feels like you have developed an extremely complex solution to something which could be achieved very simply.
 
CJ_London,
OK. I'll try for concise and clear and include screenshots of the macro edit window.
For the easy questions, I'm using Access 2013 (32 bit - downloaded as part of Office 365), Win 7 Pro SP1 (64 bit), 16GB ram, Intel i-7 3930K. The application is completely desktop based, no lan, no web, no sharepoint, no cloud. Bulk data transfer is via optical media or thumb drive.
I really do want to provide what you need to be able to help. More tomorrow. Thanks for keeping at it.
Roger
 
Still don't know what it is supposed to be doing - inserting data into a table? Transfers from optical media/thumb drive will be slow - as a first step, I suggest copying the data to your C drive.

If the data is not in an Access file then it won't be indexed, so it would be better to add your data into a temporary table and index as required (i.e. those fields used in your criteria).
 
CJ_London,

I have attached a screen shot of a segment of the macro. This segment deals with addresses and with copying the Residence Address into the Mailing Address (to make creation of future mailing lists easier). There are lots of SetLocalVar steps just to see what has changed; if a function like "IsDirty" existed, that part would be simplified tremendously. The macro (a) checks whether any fields have changed, (b) lists which ones have, (c) propagates the mailing address if needed, and (d) timestamps the Address section and the whole row for any row that changed.
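
(If this were ordinary form-level VBA instead of a data macro, the "propagate the mailing address" piece would be roughly the following - a sketch only, with ResStreet and MailStreet standing in for the real YCRC field names.)

Code:
' Sketch only: in a form's BeforeUpdate event, copy the residence street
' into the mailing street when the mailing address is blank.
' ResStreet and MailStreet are placeholders, not the real field names.
Private Sub Form_BeforeUpdate(Cancel As Integer)
    If IsNull(Me!MailStreet) And Not IsNull(Me!ResStreet) Then
        Me!MailStreet = Me!ResStreet
    End If
End Sub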

I have attached a screenshot of the "expert" form filled in for one fake person. This shows the sort of information that goes in each field, including the "what changed" fields and timestamps. The timestamps at the top reflect when major merge/sync operations have occurred.

I have also attached a one page Long Description of the "Business" of the Yavapai County Republican Committee and how the database is used.

I do copy the optical and flash-drive data to my C: drive before processing; it's a 1 TB SSD. Some spreadsheets are processed with Excel before being imported - for example, sheets with merged cells can't be imported. So while Excel is required to be in the process, some of the field-level processing is done in Excel, and then the sheets are imported.

In the new design, these imports become some of the "auxiliary" tables. They are really only used to merge the information they hold into the big main table. The day-to-day and "instant" response work is done on just the main table.

The long-form explanation describes the setup as "Excel on Steroids" to provide comfort to our volunteer users. Under the covers there are indexes that speed up their queries - and they don't even have to know or care about them.

I have tried to streamline (rather than kill) the macro. I took out all the "what things have changed" logic and it didn't help much (20% or so). I think most of the compute time is burned just seeing what has changed, and that cost should be the same for row 100 and row 1,700. The only thing I can think of that would grow as the append progresses is the number of locks held. I can't see anywhere to specify no-locks any lower than the form level, and I can't find anything on the table definition or query definition that would allow me to set it.

I still don't see anything that should cause the slowdown. One possible clue is that while it's working, the little busy wheel stops spinning after a few seconds. Then, a few seconds later, the screen gets noticeably refreshed and the wheel turns again - for a while.

I know some database implementation details from mainframe days.

First, when utilities were doing restore type operations, they told the database engine to use no locks - for two reasons. Reason one was that they weren't going to "commit" the work they were doing until the end so there would be a huge number of locks clogging the lock manager. Reason two is that the utility had exclusive use of the whole table so record locks weren't needed.

Second, they allowed for "hints" about how large the database was going to be. This had to do with allocation of space and the use of primary allocations and overflow space. Access to data in overflow space was a lot slower, and the more increments of overflow space were used, the worse it got. It also fed into recommendations on indexes - indexing small tables was counterproductive. I don't think this applies in the current case, since the exact same number of rows was appended in less than 1 second with the minimal macro.

So I'm still at a loss. I'll start pruning with a chainsaw: if I cut out half of the macro and it gets twice as fast (half as slow), I'll know something. And if there were a built-in function to determine "dirty", that would certainly speed things up. Maybe going back to 6 tables with cascaded delete would be better - Access somehow knows not to write back the rows that have not changed, I can create a query that presents the one big table, and no one will ever access the individual tables. As you might be able to tell, I'm grasping at straws.

Roger
 

Attachments

  • LongExplanation.doc (32 KB)
  • MacroSegment4Forum.PNG (75.9 KB)
  • FilledInScreenShot4Forum.jpg (102.3 KB)
I'm now away on business for a few days so cannot respond in detail.

From the description, as far as I can see this can all be done using SQL and run in a few seconds, so I'm not sure why you are using a data macro, which is triggered for each and every record insert/update. Using SQL would be significantly faster - typically you would have 2 or 3 queries (see the sketch after this list):

one to insert new records

one to update records where there is a change

and perhaps one to delete or "mark as deleted" records that are no longer in the import source
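
Something along these lines, run from VBA - tblImport, tblMain and the field names are only placeholders, and the null handling in the change test is left out for brevity:

Code:
' Placeholder table/field names throughout - adjust to your own import
' and main tables.
Public Sub SyncFromImport()
    Dim db As DAO.Database
    Set db = CurrentDb

    ' 1) insert new records - rows in the import with no match in the main table
    db.Execute "INSERT INTO tblMain (VoterID, LastName, FirstName, LastChange) " & _
               "SELECT i.VoterID, i.LastName, i.FirstName, Now() " & _
               "FROM tblImport AS i LEFT JOIN tblMain AS m ON i.VoterID = m.VoterID " & _
               "WHERE m.VoterID Is Null", dbFailOnError

    ' 2) update records where there is a change (null handling omitted for brevity)
    db.Execute "UPDATE tblMain INNER JOIN tblImport ON tblMain.VoterID = tblImport.VoterID " & _
               "SET tblMain.LastName = tblImport.LastName, " & _
               "tblMain.FirstName = tblImport.FirstName, " & _
               "tblMain.LastChange = Now() " & _
               "WHERE tblMain.LastName <> tblImport.LastName " & _
               "OR tblMain.FirstName <> tblImport.FirstName", dbFailOnError
End Sub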

some comments on your document

Due to this, the change history and timestamping has to be attached directly to the data – as macros – rather than VBA code in forms
Why? Why not just set the timestamp as a default value - and why can you not use VBA in forms? What do you think people did before data macros became available?

From what you have described in your document e.g.
The person’s membership on other republican clubs (5 or 6 clubs), [700]
The things the person has volunteered to work on – Voter registration, phone banking, fund raising, etc., [0 yep zero – the volunteer coordinator did her own thing.]
The public offices the person holds and has held. [150]
and in your first post

To get that same level of change notification with the single table, there are lots of SetLocalVar Changed steps of the form Old.Field <> Field Or (Old.Field Is Null And Field Is Not Null) Or (Old.Field Is Not Null And Field Is Null) {author note: I hate nulls} to detect changes. Then the appropriate set of these "Changed" local variables is ORed together in an If statement to put a new timestamp in the appropriate field.
says to me your data is not normalised - as a minimum I would expect the above three items to be in their own tables

e.g.

for the first table

tblMemberships
MembershipPK
PersonFK
ClubFK

and perhaps a couple of date fields to indicate when they became a member and when they stopped being a member
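
As DDL that would be something like this - just a sketch, with JoinedDate/LeftDate as suggested names for those two dates, and with the relationships to your people/clubs tables still to be set up:

Code:
' Sketch of the membership table - run once from VBA, or just build the
' same thing in the table designer. Field names are suggestions only.
Public Sub CreateMembershipTable()
    CurrentDb.Execute _
        "CREATE TABLE tblMemberships (" & _
        "MembershipPK AUTOINCREMENT PRIMARY KEY, " & _
        "PersonFK LONG, " & _
        "ClubFK LONG, " & _
        "JoinedDate DATETIME, " & _
        "LeftDate DATETIME)", dbFailOnError
End Sub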

For the new implementation, the idea is to blend our 1700 records with the 125,000 records into one table
The users want to see things that look like spreadsheets – for comfort.
You would just do this with a query when required

So in summary, I think you need to rethink your database design, a single table for what you want to do is not the way to do it

 
I appreciate all the time and effort you're putting into my situation.

I have attached the manual input form for the prior implementation.
It was normalized as you suggest and had a single time stamp applied to each section every time that particular record was changed. Access knew which parts had been changed and only wrote those out. The "freshness" of each kind of data was determined by the time stamp.

My statement: "The users want to see things that look like spreadsheets – for comfort."

Your comment: "You would just do this with a query when required."

I absolutely agree. However, the operative word in your comment is "You". I am the only one in the active committee that knows anything at all about database and SQL. I am in the loop for every single request. This results in too little use of the data we have (IT is too slow!) and more time than I have to keep up with it. What happened in the old design was a few queries were produced to create extracts into Excel. From then on, data got updated in one extract and not in the others. Our customers would complain that their updates got lost - It was updated for the committee records but not the donor records. And so on.

I am trying to dumb down the database enough that "less skilled" people can use it directly for truly ad hoc queries. It is seriously de-normalized - on purpose. I'm trying to implement a lingua franca that bridges the gap between Excel spreadsheets (which people often think of as databases) and a proper database that allows powerful queries. In the query builder they can pick the fields they want, understand that ANDs go across and ORs go down, and check their result in datasheet view. It is (hopefully) a small enough add-on to the Excel skills they already have that they can do it on their own and stay in the database. Building forms on queries is another step, reserved for more advanced users.

I really like Visual Basic. I've written applications in Visual Basic, Visual Studio's flavor of Visual Basic, VBA on top of spreadsheets, VBA embedded in Word documents, and VBA for forms in Access. It's nice, very nice. But it isn't supported at the deeper levels of the system.

Thanks again. I'll continue my testing to see what's really going on before I hack it apart again. I'll post what I discover.
 

Attachments

  • OldDatabase4Forum.PNG
    OldDatabase4Forum.PNG
    89.1 KB · Views: 134
Good luck - I really think you are building a rod for your own back by going down the single-table route.

I would keep it as multiple tables, then create one giant query for your users which combines all the data together and exports to a workbook. I wouldn't give them direct access to the tables at all - only through forms.
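
Roughly like this - the names below are only illustrative, and any one-to-many sections (clubs, offices and so on) would need aggregating or a crosstab to stay at one row per person:

Code:
' Sketch of the "one giant query" idea - save it once, then users only
' ever see qryEverything (or an Excel export of it), never the tables.
' All table and field names here are illustrative placeholders.
Public Sub BuildCombinedQuery()
    Dim qdf As DAO.QueryDef
    Set qdf = CurrentDb.CreateQueryDef("qryEverything", _
        "SELECT v.*, m.ClubFK, o.OfficeTitle " & _
        "FROM (tblVoters AS v " & _
        "LEFT JOIN tblMemberships AS m ON m.PersonFK = v.VoterID) " & _
        "LEFT JOIN tblOffices AS o ON o.PersonFK = v.VoterID")
End Sub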
 
Just to say I concur with CJ. Extremely rare to see a one-table database. But you know your requirements better than readers, and you are familiar with the database you are redesigning.
Good luck. I'm sure readers would like to hear how it all turns out.
 
I've done many more trials with macros. Mostly they're a bad idea, and the bigger they are, the worse they are. It doesn't really matter too much what the logic is that makes the macro large; Access seems to have trouble just working its way through what I'll call the "text" - the characters that make up the macro code. This is true during editing as well as execution.

There also seems to be a memory leak... Between two of the runs, the committed memory went from 423 MB to 574 MB, and it kept more after every trial. There's 16 GB available on the machine but it didn't use it well. CPU utilization peaked at about 9%. Not so good.

Based on CJ's suggestions, I have done more experiments using Now() as a default value for a timestamp. That works fine. And would work better if I keep each of the sections separate...

I have also experimented with append queries that have expression columns putting Now() into the right places. This works fine too.

And I have experimented with VBA code to stamp Now() into fields when other fields change. This has worked both for single-record forms and for forms that are lists of records (sort of like spreadsheets). This is good, very good.
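
For the record, the single-record form version boils down to something like this (the control and field names here are made up, not the real YCRC ones):

Code:
' Sketch: stamp the section timestamp (and the whole-row timestamp) when
' any field in that section has changed. Names are made-up examples.
Private Sub Form_BeforeUpdate(Cancel As Integer)
    Dim varCtl As Variant
    For Each varCtl In Array("ResStreet", "ResCity", "ResZip")
        If Nz(Me(varCtl).OldValue, "") <> Nz(Me(varCtl).Value, "") Then
            Me!AddrLastChange = Now()
            Me!RowLastChange = Now()
            Exit For
        End If
    Next varCtl
End Sub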

The old design, with separate tables for each kind of data, had 0 or 1 rows in each extra-information table. The idea was to save disk space by not having a bunch of empty fields. Disk space is effectively free these days for a database of my size, so why not just use it? CPU clock rates have leveled out, so CPU is what I want to conserve.

There are two major types of solution with two different CPU loads:
One: have one giant table with 95 fields and build queries directly on that or
Two: have 6 tables with a total of 105 fields (extra PK and FK references) and have Access assemble that query (to hook things up) under every other query we would write to do our work.

It would seem easier (less CPU) if Access didn't have to do all the extra joins.
I get the impression that Access keeps all the data for one row/record together (except for the large memo fields), and if each cache block has more useful data in it, that would seem to be an advantage. If multi-field indexes would speed up query processing, then I think all the fields involved have to be in the same table - tables get indexed, not queries.
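
(For example, a multi-field index on the big table would be created along these lines - the field names are hypothetical:)

Code:
' Hypothetical multi-field index on the main table for the common
' name-plus-city lookups; the field names are examples only.
Public Sub AddNameCityIndex()
    CurrentDb.Execute _
        "CREATE INDEX idxNameCity ON YCRC (LastName, FirstName, City)", dbFailOnError
End Sub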

I really do appreciate the help you two have provided. And please let me know if I'm off base on the CPU conservation approach.

Roger
 
Access has a 2 GB size limit. When you create a table, query, form, or macro, do some testing, and then change/edit/redesign that object, Access is in charge of the edit/delete/redesign. The old version is "garbage", and the space it used to occupy is no longer available. The more edits/deletes/changes, the more "garbage" and the more space that is no longer available for your use.
The Compact and Repair utility is the mechanism used to "collect and remove the garbage" and give as much space as possible back to you and Access. So during development, when you are creating/redesigning/deleting things, it is expected that your database size will grow, and the developer is expected to do a Compact and Repair to recover this unavailable space.
 

That's good to know. I'll run that utility more often. Access never experienced a hard fault so NO paging was going on. But the cache probably wound up with the garbage and good stuff all mixed together. Cleaner should be faster - at least a little.
 
OK. The votes are in - data macros: 0; regular SQL (as defaults, in queries, and in forms): 3.

I've done the final load (until the data changes) of the 125,000-row table. The append query had 6 expressions putting Now() into columns, 5 expressions recording the source of the data (a string constant), and 80 or so columns of data from a source table. It took 7 seconds - call it 18,000 rows per second. Wow.
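
In outline it looked like this - just a trimmed-down illustration, since the real query has 80-odd data columns and different source/field names:

Code:
' Trimmed-down illustration of the final append query: Now() and a source
' label as expression columns plus the data columns from the source table.
' Table and field names are simplified stand-ins for the real ones.
Public Sub LoadMainTable()
    CurrentDb.Execute _
        "INSERT INTO YCRC (VoterID, LastName, FirstName, DataSource, RowLastChange) " & _
        "SELECT s.VoterID, s.LastName, s.FirstName, 'County extract', Now() " & _
        "FROM tblCountyImport AS s", dbFailOnError
End Sub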

7 seconds, and it looks like Access never came up for breath during the whole append. All the timestamps in all 125,000 rows are the same. My guess is that Now() is evaluated at the time the query is "compiled" and treated as a constant during execution. That's just fine - it saves lots of CPU time. Data macros would not have that query-level context and would have to evaluate it for each row separately. That may also be true for using Now() as the default value. But I now know that putting it in a query works great.

Thanks again JDraw and CJ_London for leading me out of the swamp and into the sunshine. :D
 
