Access keeps growing way too much

Dryseals

New member
Local time
Today, 17:53
Joined
Dec 1, 2017
Messages
6
Need some help. I have just over 14,000 Excel files to process. Each workbook has a sheet "Data1", which is a time- and date-stamped record of Boolean outputs on a system. The first two columns are Time and Date; the remaining columns are the names of each Boolean output. What I am looking for is a change in a Boolean state: capture the time and date and record it in another table.

I wrote a VBA program in MS Access that opens the Excel app and imports the sheet "Data1" into a temporary table, then opens the temporary table as a recordset and, using a few arrays, steps through the records and logs the changes. It worked perfectly... until... Because the column count on each sheet varies and the column names change, I delete the temporary table before importing the next sheet. Each time it does this, the Access file grows, and it hit the 2 GB limit and crashed after processing just under 900 files.
What I need is a way to pull this sheet "Data1" into memory somehow and manipulate it there without creating a new table every time and then deleting it, but I just can't seem to find a clean way to do this.
Any help?
 

pbaldy

Wino Moderator
Staff member
Local time
Today, 15:53
Joined
Aug 30, 2003
Messages
36,129
You can perform a compact/repair to shrink the file periodically. Instead of importing, you may be able to link to the files and open your recordset on that.
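Something along these lines might do it (the file path and link name are only placeholders, and on some Access versions the Range argument may need to be omitted when linking):

Code:
Dim strFile As String
Dim rs As DAO.Recordset

strFile = "C:\EventFiles\Unit01_20171201.xlsx"    ' hypothetical workbook

On Error Resume Next
DoCmd.DeleteObject acTable, "lnkData1"            ' drop any stale link; a link stores no data locally
On Error GoTo 0

' Link (not import) the Data1 sheet, then read it through a recordset
DoCmd.TransferSpreadsheet acLink, acSpreadsheetTypeExcel12Xml, _
    "lnkData1", strFile, True, "Data1!"

Set rs = CurrentDb.OpenRecordset("lnkData1", dbOpenSnapshot)
Do While Not rs.EOF
    ' ... scan the Boolean columns here ...
    rs.MoveNext
Loop
rs.Close

Because the linked table holds no local data, dropping and recreating it for each file should not bloat the front end the way repeated imports do.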
 

arnelgp

..forever waiting... waiting for jellybean!
Local time
Tomorrow, 06:53
Joined
May 7, 2009
Messages
19,246
Create a new database and import your temporary table there.

Now, in your main db, create a linked table from this new database to the temporary table there.

Every time you import, it will be importing into the new db, therefore your main db will not bloat.
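A rough sketch of that idea (every path and name here is just a placeholder):

Code:
Dim strSide As String
strSide = "C:\Temp\SideImport.accdb"              ' the throwaway database

' 1. Create the side database once (DAO)
Dim dbSide As DAO.Database
Set dbSide = DBEngine.CreateDatabase(strSide, dbLangGeneral)
dbSide.Close

' 2. Import the sheet into the side database instead of the main one
CurrentDb.Execute _
    "SELECT * INTO tblTemp IN '" & strSide & "' " & _
    "FROM [Excel 12.0 Xml;HDR=Yes;Database=C:\EventFiles\Unit01.xlsx].[Data1$]", _
    dbFailOnError

' 3. Link to it from the main database and work on the link
DoCmd.TransferDatabase acLink, "Microsoft Access", strSide, _
    acTable, "tblTemp", "lnkTemp"

For each new sheet you can drop tblTemp in the side database, or simply delete the side file and recreate it, and the main database never grows.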
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 17:53
Joined
Feb 28, 2001
Messages
27,263
Your "database bloat" is due to the activity in your temporary table where you create and delete a bunch of records. Access doesn't re-use that space until you do the compact and repair operation. (Compaction is the part of that which helps you here.)

You've got some good suggestions from some good sources. I would like to throw out another idea, but it might require you to do a little reading. On the other hand, you seem to have at least some knowledge about some of it already. But I'll be a bit explicit just to be sure.

Research these terms:

Application Object (and more specifically) Excel Application Object

File System Object including the .FilesFound collection and the .Copy, .Rename, .Delete, and other methods associated with it.

Recordsets, including the .AddNew, .Update, .Close, and the OpenRecordset operation.

Collections (and traversing them or enumerating them and also testing their properties)

What I'm going to suggest will take longer but your "bloat" problem will be so drastically reduced that you'll hardly notice it.

First, using a File System Object, you can identify the 14,000 Excel files by setting up an appropriate directory search that will populate the .FilesFound collection. Having this collection, you can step through the "found files" one at a time.

Open your Excel Application Object, which will at first have no files. Also, this might be the right time to open the Recordset that you will use later to update your table with the change data you were seeking.

For each file in the .FilesFound collection, open that file (as a workbook) through the Excel App Object. That workbook will have a collection of Worksheets.

One of them will be the one you want. You can select a worksheet by name, so one of those worksheets will have the name "Data1" that you mentioned in your description. Make the desired sheet become the ActiveSheet.

The ActiveSheet contains TWO collections but based on your data description, you want the Rows collection, which is the set of all rows in the sheet. The .Rows.Count property tells you how many rows are populated. If there is a header row, you can read it to verify everything, test column headers, or whatever else you need to do. Each column of the spreadsheet is represented in the .Rows collection as a member of the .Cells collection, and each column/row intersection is a .Cell object. You can get the contents of each cell as a text string.

Since you are looking for changes, deal with your header row and then read in the first data row. (Note: If you know there is a gap between header and first data row, you can just skip that blank row.)

Then step through the rows and test each of the cells against your "reference" row for whatever changes you were seeking. You can see the columns (cells) with the date and time data. You can see the columns that changed. For each change, you can add a new record to your destination table using the Recordset. Do this in your loop until you step past the last row as noted by the .Rows.Count for that spreadsheet.

Now you can do a .Close on the workbook (using the option to not save changes). Your Excel App object is still open but idle at this point. More important, the workbook file is closed and thus can be manipulated using the File System Object.

So it is time for any file-end processing. You can use a second FSO to rename the file you just processed, or copy it elsewhere and delete the copy you processed, or just outright delete it. You need a second FSO because you are still working on the collection associated with the first FSO that you used for the directory search. But that's OK, you can have multiple active FSOs.

When file-end processing is done, go back and do the same thing again for another file. And when you have run out of files, close the Excel App Object and the Recordset, and then exit the subroutine that drove the whole mess. You don't need to do anything about FSOs - they aren't "open" except when you actually DO something.

That approach will have minimal bloat and should be fairly reliable as long as your data source is reliable in how it creates the spreadsheets. It isn't the easiest thing to do but it will process any number of sheets one at a time and never create a temporary object that could lead to bloat.
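If it helps, here is a minimal sketch of that loop. It uses the FileSystemObject's standard Folder.Files collection for the directory scan (the .FilesFound name above comes from the old FileSearch object), compares each row against the one before it, and assumes a folder path plus a destination table tblChanges with FileName, EventTime, EventDate, TagName, and NewValue fields; all of those names are made up for illustration.

Code:
Sub ScanWorkbooks()
    Dim fso As Object, fil As Object
    Dim xl As Object, wb As Object, sh As Object
    Dim rs As DAO.Recordset
    Dim lastRow As Long, lastCol As Long, r As Long, c As Long

    Set fso = CreateObject("Scripting.FileSystemObject")
    Set xl = CreateObject("Excel.Application")            ' one Excel Application Object for the whole run
    Set rs = CurrentDb.OpenRecordset("tblChanges", dbOpenDynaset)

    For Each fil In fso.GetFolder("C:\EventFiles").Files  ' the directory search
        If LCase$(fso.GetExtensionName(fil.Path)) Like "xls*" Then
            Set wb = xl.Workbooks.Open(fil.Path, 0, True) ' (FileName, UpdateLinks, ReadOnly)
            Set sh = wb.Worksheets("Data1")
            lastRow = sh.UsedRange.Rows.Count
            lastCol = sh.UsedRange.Columns.Count
            For r = 3 To lastRow                          ' row 1 = headers, row 2 = first data row
                For c = 3 To lastCol                      ' columns 1-2 = time and date
                    If sh.Cells(r, c).Value <> sh.Cells(r - 1, c).Value Then
                        rs.AddNew                         ' a Boolean output changed state
                        rs!FileName = fil.Name
                        rs!EventTime = sh.Cells(r, 1).Value
                        rs!EventDate = sh.Cells(r, 2).Value
                        rs!TagName = sh.Cells(1, c).Value
                        rs!NewValue = sh.Cells(r, c).Value
                        rs.Update
                    End If
                Next c
            Next r
            wb.Close False                                ' workbook closed, Excel stays open
        End If
    Next fil

    rs.Close
    xl.Quit
End Sub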
 

Dryseals

New member
Local time
Today, 17:53
Joined
Dec 1, 2017
Messages
6
Your "database bloat" is due to the activity in your temporary table where you create and delete a bunch of records. Access doesn't re-use that space until you do the compact and repair operation. (Compaction is the part of that which helps you here.)

You've got some good suggestions from some good sources. I would like to throw out another idea but it might require you to do a little reading. On the other hand, you seem have at least some knowledge about some of it already. But I'll be a bit explicit just to be sure.

Research these terms:

Application Object (and more specifically) Excel Application Object

File System Object including the .FilesFound collection and the .Copy, .Rename, .Delete, and other methods associated with it.

Recordsets, including the .AddNew, .Update, .Close, and the OpenRecordset operation.

Collections (and traversing them or enumerating them and also testing their properties)

What I'm going to suggest will take longer but your "bloat" problem will be so drastically reduced that you'll hardly notice it.

First, using a File System Object, you can identify the 14,000 Excel files by setting up an appropriate directory search that will populate the .FilesFound collection. Having this collection, you can step through the "found files" one at a time.

Open your Excel Application Object, which will at first have no files. Also, this might be the right time to open the Recordset that you will use later to update your table with the change data you were seeking.

For each file in the .FilesFound collection, open that file (as a workbook) through the Excel App Object. That workbook will have a collection of Worksheets.

One of them will be the one you want. You can select a worksheet by name, so one of those worksheets will have the name "Data1" that you mentioned in your description. Make the desired sheet become the ActiveSheet.

The ActiveSheet contains TWO collections but based on your data description, you want the Rows collection, which is the set of all rows in the sheet. The .Rows.Count property tells you how many rows are populated. If there is a header row, you can read it to verify everything, test column headers, or whatever else you need to do. Each column of the spreadsheet is represented in the .Rows collection as a member of the .Cells collection, and each column/row intersection is a .Cell object. You can get the contents of each cell as a text string.

Since you are looking for changes, deal with your header row and then read in the first data row. (Note: If you know there is a gap between header and first data row, you can just skip that blank row.)

Then step through the rows and test each of the cells against your "reference" row for whatever changes you were seeking. You can see the columns (cells) with the date and time data. You can see the columns that changed. For each change, you can add a new record to your destination table using the Recordset. Do this in your loop until you step past the last row as noted by the .Rows.Count for that spreadsheet.

Now you can do a .Close on the workbook (using the option to not save changes). Your Excel App object is still open but idle at this point. More important, the workbook file is closed and thus can be manipulated using the File System Object.

So it is time for any file-end processing. You can use a second FSO to rename the file you just processed, or copy it elsewhere and delete the copy you processed, or just outright delete it. You need a second FSO because you are still working on the collection associated with the first FSO that you used for the directory search. But that's OK, you can have multiple active FSOs.

When file-end processing is done, go back and do the same thing again for another file. And when you have run out of files, close the Excel App Object and the Recordset, and then exit the subroutine that drove the whole mess. You don't need to do anything about FSOs - they aren't "open" except when you actually DO something.

That approach will have minimal bloat and should be fairly reliable as long as your data source is reliable in how it creates the spreadsheets. It isn't the easiest thing to do but it will process any number of sheets one at a time and never create a temporary object that could lead to bloat.


Thank you, The_Doc_Man. That is exactly what I was looking for.

I've been using Access since version one, back in Windows on 386 machines. No formal training, just the school of hard knocks and lots of trial and error, and I don't use it every day, but somehow they found out I could do some special things with it and now it's on my plate. I knew there had to be a better way. The other problem is I tend to shy away from Excel VBA, so I really don't know how to manipulate it very well.

This has been a task so far. The particular vendor's software creates these event files: a .CSV, a .DAT, and an .XLS. The .DAT and the .CSV are useless; the CSV only shows tag names for the columns. The real info is in the .DAT, a continuous file with special values for record length and beginning/end of record that I could work out with a hex editor (I uploaded a few files to online hex editors and could see the pattern), but I do not have one, and because of network security they won't allow me to have one, and I really need it to work through the bugs.

The "excel" file is created as a very old version of excel, and when you open it, it's full of errors, but I have gotten around that portion, I open it and save it as a newer version, shrinks it by about seven times the original size. I will try your method Monday, I think I can get it working.

Thanks Again
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 17:53
Joined
Feb 28, 2001
Messages
27,263
Let us know if you can make it work.
 

Lightwave

Ad astra
Local time
Today, 23:53
Joined
Sep 27, 2004
Messages
1,521
Yep, once you start getting to that larger-scale automation you do need to get creative.

Sounds like you are well on your way. Be aware that programmatically implementing compact and repair is a bit tricky, because Access can only perform a compact by closing the database, which effectively stops whatever VBA routine you are running. I would make sure "Compact on Close" is selected in the database, then call a VBScript that waits, opens the database, and closes it again (which performs the compact). The opening of the database has to be done from a Visual Basic script that you launch from within MS Access. Here is a link to something similar: it actually calls an MS Access function inside the database from a Visual Basic script running outwith the database, so it will be a start. It is also slightly tricky because, if you are halfway through an import, the routine may forget where it was after reopening, so you will have to find a way of recording where you are in the list of files and kick the import off again when you get back into MS Access. That just requires extra coding, though, and it sounds like you have a good understanding of working with recordsets.

With 14,000 files I think you will need to close and open Access a number of times, and initially you will have to balance empirically how often you do this.

SQL Server Express 2016 has a 10 GB limit if you start reaching the limits of compacting, and I think you can still run all the ETL scripts through Access, but I expect you've considered this.

Call a compact function from visual basic script
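For reference, the helper script being described might look roughly like this (the path, the delay, and the ResumeImport function name are placeholders, and it assumes "Compact on Close" is ticked in the target database):

Code:
' CompactHelper.vbs -- launched from Access shortly before it quits
Dim app
WScript.Sleep 10000                                   ' give the calling Access session time to close
Set app = CreateObject("Access.Application")
app.OpenCurrentDatabase "C:\Work\EventImport.accdb"   ' reopen the front end
' app.Run "ResumeImport"                              ' optionally restart the import where it left off
app.Quit                                              ' "Compact on Close" runs as Access shuts down
Set app = Nothing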
 

Dryseals

New member
Local time
Today, 17:53
Joined
Dec 1, 2017
Messages
6
Thanks to all. I did manage to get it working using just objects in Excel, without importing the entire sheet, but it's much slower than doing an import and working it as a recordset. So I have two methods. One uses the import, which I limit to 300 sheets: it takes about five seconds to process each sheet, stops at 300, and then I do a compact and repair.

The new method using the cells is a lot slower, around 30 seconds per sheet, but no bloat. That calculates out to about 83 hours of run time to complete this task, whereas with the bloat method it's around 13 hours. Still haven't given up; I'll try method three today. I have to find a way to get the entire sheet into an array in one movement, or otherwise move it to an array and do my comparisons once it's there.
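For what it's worth, pulling the whole sheet over in one movement can be a single assignment (sh here is just a placeholder for an open worksheet object):

Code:
Dim arr As Variant
arr = sh.UsedRange.Value     ' one COM call: the whole sheet lands in a 1-based 2-D array
Debug.Print UBound(arr, 1) & " rows x " & UBound(arr, 2) & " columns"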

Thanks again and as always, any clues are welcome.
 

JHB

Have been here a while
Local time
Tomorrow, 00:53
Joined
Jun 17, 2012
Messages
7,732
Have you considered the method of creating a temporary database suggested by arnelgp and moke123?
You could manage it from your main database: import the Excel sheets in bundles of 300, transfer the data to your main database, delete the data in the temporary database (or delete that database itself and copy a new one from a master database), do a "Compact & Repair", and then retrieve the next 300 Excel sheets, and so on.
 

moke123

AWF VIP
Local time
Today, 18:53
Joined
Jan 11, 2013
Messages
3,933
Bear in mind that you can create and connect to more than one temp database. It is a relatively easy process. Take a look at the modules in the sample I referenced previously.
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 17:53
Joined
Feb 28, 2001
Messages
27,263
Talking about doing something to improve speed, we cannot hope to give you a faster way to do this without knowing just a little more about your data sets.

Let's take that description of the input sheets. You said you have a row with a date and time in it plus some number of Boolean inputs recorded in the sheet. You are looking for something to change in the Boolean columns. So...

1. Do all of the columns change exactly once in the data set or can the inputs toggle back and forth more than once?

2. Do all of the columns always start in a predictable state? (Like always FALSE, e.g.)

3. If the inputs toggle back and forth, do you need to trap each transition or only the first one or only the transitions to TRUE?

Knowing this, we might be able to help with something a bit faster than the method I suggested earlier.
 

JHB

Have been here a while
Local time
Tomorrow, 00:53
Joined
Jun 17, 2012
Messages
7,732
What setup do you have: Windows version, computer, RAM?
 

Dryseals

New member
Local time
Today, 17:53
Joined
Dec 1, 2017
Messages
6
1. Do all of the columns change exactly once in the data set or can the inputs toggle back and forth more than once?

Back and forth at any time.


2. Do all of the columns always start in a predictable state? (Like always FALSE, e.g.)
No.

3. If the inputs toggle back and forth, do you need to trap each transition or only the first one or only the transitions to TRUE?
Yes

This is a safety PLC; I'm in a chemical plant making explosive products. Part of the permits to operate requires us to follow rules established by the IEC (International Electrotechnical Commission) and the ISA (International Society of Automation). These safety functions are all the result of a long drawn-out process to arrive at an RRF (risk reduction factor) number, through which, by selection of equipment, we try to mitigate faults and prevent catastrophic events from happening.

Part of the overall calculation also leans on the number of times the system has a "demand event"... too many demands and we have to redesign and recertify the system; statistics say too many demands mean an eventual failure of the safety system, and oops.

Each column is an input, output, or memory point; it's monitoring the function of a safety PLC. Each row is a time interval of 200 milliseconds with a time and date stamp.

Here's the trick: only a selected group of triggers is used to indicate a "demand", and that demand says our process was out of its limits and we had to provide a safety interlock. But that is not always true; an oddball bit can start the interlock process, and the end result appears to be a demand but may not be.

When a "demand" is seen, the system has a program tracking all the points (columns), five minutes before the event and two minutes after the event at 200 millisecond intervals and generates this excel file. They were having some problems with on of the units and couldn't find the problem. I took a look at the file, figure 184 columns of ones and zero's and 2000 rows of them switching back and forth and truing to determine who was the real fault. So I wrote this little program to capture each time a bit changed state and we found it which turned out to be a loose connection in a limit switch, only toggled for one scan but started a domino effect. Thats not considered a "Demand" due to process conditions, thats an equipment fault, goes into another category.

No good deed goes unpunished: next thing I know they hand me a portable drive with just over 14K files, a few years of history. Process these and let us know the results of each.

Sorry for the long-winded post; I just thought some background might help. So far I'm just using the bloat program and stopping it after 300 files. The Excel version was working with no bloat, but I kept getting errors opening and closing the Excel files and couldn't work my way around it, so after three days of hair pulling (which this old head can't afford) I gave in and went back the other way.

I'll post the code tomorrow and tell you the error; maybe we can figure it out. I'm not a learned programmer, so please don't laugh at my coding style. I'm an ex field grunt who does programming on control systems, which is far different and mainly in proprietary and bastardized control languages.

I do appreciate the help and I have learned how much I really need to learn.
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 17:53
Joined
Feb 28, 2001
Messages
27,263
OK, so you have multiple and variable columns, multiple transitions, starting from indeterminate states, and each transition is required to be noted. The worst of all possible worlds, but here goes nothing.

Using some of the other techniques mentioned to create a database on-the-fly will help.

Open up a File System Object to find the list of Excel files to be analyzed as previously discussed using the .FilesFound collection.

Create a database external to your primary app. You now have 2 GB to play with.

Using an Excel Application Object, analyze the next workbook to determine how many columns you will need. Create a table in the external database with fields for your date and time info and the right number of Boolean columns.

The table name is arbitrary and, in the method I am suggesting, SHOULD NOT match the names used for other tables. If there are no special characters in the name of the workbook file, that could be the table name right there. Now that you know the number of columns to be processed, close the App object and import the spreadsheet into the table you just defined, which is of course empty at the moment.

Create an SQL query with the date, time, and the right number of Boolean fields. Include an ORDER BY on date and time so that when you open the query you would see the records in strict chronological order.

Open a recordset to this query.

For each Boolean field in turn, do a .MoveFirst and note the value of the current Boolean field for the first record. This is your reference value for this iteration.

Now do successive .FindNext operations where the column in question is NOT EQUAL to the reference value. At each successful find you note the date and time and value of that column. Reset (flip) the reference value and continue doing .FindNext until you hit the .EOF marker.

Step to the next field, do a .MoveFirst, and repeat the cycle until you have no more columns to play with. At this point, you are done with that query. Since I am suggesting that you use the file name as a table name, there is no need to delete the individual table here. Just be sure that you close all references to it.

Close the external database for just a moment. Test the size of the external database using a File System Object. If it is still less than 1 GB, re-open it and continue. Otherwise, delete the whole file, create another external but empty database, and then continue.

Now take the next file in sequence. Repeat the above sequence until you run out of files or something blows up on you.

I am thinking that using the .FindNext one column at a time is going to be a lot faster than reading each row of the spreadsheet because unless you have a column that flip-flops every other row, you probably will have long sequences with the same value, and the .FindNext code should be pretty fast for that operation.

I'm NOT SURE whether linking external tables to individual workbooks will be as fast because I recall there being a speed issue with linked tables. Not to mention that you would STILL need to use the Excel App Object to know how many columns were in each file to which you were linking because you would still have to build the table description. From your prior discussion, you don't necessarily have a constant format in terms of the number of columns.
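A minimal sketch of that per-column scan, assuming the side database path, a query named qryEvents, EventDate/EventTime field names (all placeholders), and Boolean columns that import as Yes/No fields:

Code:
Sub ScanOneFile()
    Dim dbSide As DAO.Database, rs As DAO.Recordset
    Dim fld As DAO.Field, refVal As Boolean

    Set dbSide = DBEngine.OpenDatabase("C:\Temp\Side.accdb")
    Set rs = dbSide.OpenRecordset("qryEvents", dbOpenDynaset)       ' ordered by date, time

    For Each fld In rs.Fields
        If fld.Name <> "EventDate" And fld.Name <> "EventTime" Then
            rs.MoveFirst
            refVal = rs.Fields(fld.Name).Value                      ' reference state for this column
            Do
                rs.FindNext "[" & fld.Name & "] <> " & IIf(refVal, "True", "False")
                If rs.NoMatch Then Exit Do                          ' no more transitions in this column
                Debug.Print fld.Name, rs!EventDate, rs!EventTime, rs.Fields(fld.Name).Value
                refVal = Not refVal                                 ' flip the reference and keep scanning
            Loop
        End If
    Next fld

    rs.Close
    dbSide.Close
End Sub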
 

Dryseals

New member
Local time
Today, 17:53
Joined
Dec 1, 2017
Messages
6
Just a follow-up: it's running like clockwork. I was pulled off of it for a few days but got back on it yesterday and found a few things that needed attention.

A simple workbook open and close was all I needed. I was using the Excel.Application object and having troubles.

In a nutshell: do a workbook open, find the sheet range, ReDim the array, and pull the entire sheet into the array. Then, using a couple of For/Next loops, walk column by column, comparing row X-1 to row X. If the bit is different, grab it and document it. Once finished, close the workbook and loop back to the next file. And the big winner: no Access bloat, it just ticked along.
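For anyone finding this later, here is a condensed sketch of that approach. The folder, table, and field names are placeholders, and it assumes columns 1 and 2 hold the time and date stamps with the tag names in row 1:

Code:
Sub ProcessEventFiles()
    Dim xl As Object, wb As Object
    Dim arr As Variant
    Dim rs As DAO.Recordset
    Dim fName As String, r As Long, c As Long

    Set xl = CreateObject("Excel.Application")            ' one hidden Excel instance for all files
    Set rs = CurrentDb.OpenRecordset("tblChanges", dbOpenDynaset)

    fName = Dir("C:\EventFiles\*.xls*")
    Do While Len(fName) > 0
        Set wb = xl.Workbooks.Open("C:\EventFiles\" & fName, 0, True)
        arr = wb.Worksheets("Data1").UsedRange.Value      ' entire sheet in one read
        For c = 3 To UBound(arr, 2)                       ' walk column by column
            For r = 3 To UBound(arr, 1)                   ' compare row X-1 to row X
                If arr(r, c) <> arr(r - 1, c) Then
                    rs.AddNew                             ' document the state change
                    rs!FileName = fName
                    rs!EventTime = arr(r, 1)
                    rs!EventDate = arr(r, 2)
                    rs!TagName = arr(1, c)
                    rs!NewValue = arr(r, c)
                    rs.Update
                End If
            Next r
        Next c
        wb.Close False
        fName = Dir                                       ' next file
    Loop

    rs.Close
    xl.Quit
End Sub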

It took just over 90 minutes to process 10,920 Excel files; it probably could have been faster, but I had it updating the screen with the current Excel file count.

The main thing is it worked so well that we were able to view the sequence of events in every file in a few short minutes and identified tons of errors we now need to address, both in the programming of the system and in the way we capture the data. And all with just a page of code... I love this work :)
Thanks for all the help. I think I'll hang around and learn some more.
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 17:53
Joined
Feb 28, 2001
Messages
27,263
Ah, an array method. Not a bad option, and unlike database records, array memory IS reclaimed for re-use. As long as your worksheets are small enough, this should work fine and should be fast enough. 10,920 files in 90 minutes sounds decent. Congrats on finding a viable solution.
 
