Looking for Faster way to update Table - boolean field

Taking a step back, the core purpose of my database is to catalog different files on my computer. I start out by scanning specific directories and building a dictionary of all found files (I used to use a Collection for these found files, but the dictionary runs much faster).
I do it the other way around. I create a dictionary of all the files in the table, then iterate through the directories with FSO and test - If ExDict.Exists(myFile.Path) = False Then ... - and write the file info to the table if it's missing.

edit: This post uses that method - https://www.access-programmers.co.u...port-of-directory-listing.315741/post-1743323
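A minimal sketch of that pattern, assuming a catalog table called tblFiles with FullPath, FileName and FileSize fields (all of those names are placeholders, and recursion into subfolders is left out):

Public Sub AddMissingFiles(ByVal folderPath As String)
    ' Load the paths already in the table into a dictionary, then add any
    ' file FSO finds that is not already catalogued.
    Dim db As DAO.Database, rs As DAO.Recordset
    Dim exDict As Object, fso As Object, f As Object

    Set exDict = CreateObject("Scripting.Dictionary")
    exDict.CompareMode = vbTextCompare            ' file paths are not case sensitive

    Set db = CurrentDb
    Set rs = db.OpenRecordset("SELECT FullPath FROM tblFiles", dbOpenSnapshot)
    Do Until rs.EOF
        exDict(rs!FullPath.Value) = True
        rs.MoveNext
    Loop
    rs.Close

    Set fso = CreateObject("Scripting.FileSystemObject")
    Set rs = db.OpenRecordset("tblFiles", dbOpenDynaset)
    For Each f In fso.GetFolder(folderPath).Files
        If Not exDict.Exists(f.Path) Then
            rs.AddNew
            rs!FullPath = f.Path
            rs!FileName = f.Name
            rs!FileSize = f.Size
            rs.Update
        End If
    Next f
    rs.Close
End Sub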
 
One question that hasn't been asked is how often you will want to do this. If it is seldom, once only, or just an initial load, then anything that does the job is OK.

Only if it is a regular occurrence for regular updates does seeking the best way make sense. Also, there is no hint at the number of files involved: tens, hundreds, thousands, tens of thousands?

I must admit the whole approach suggested at the start of the thread seems flawed to me. My archive database processing is achieved with the FileSystemObject and update queries. Similar, I expect, to Moke123's, but without any dictionary step.
 
Also, there is no hint at the number of files involved: tens, hundreds, thousands, tens of thousands?
The original number was stated to be about 40,000 files. For me, it was a question of recursion; it's hard to think there would be that many files in a single directory.
 
Correct. I'm using a recursive directory function on about 25TB of miscellaneous drives and directories. I run this function perhaps every couple of weeks. The database is essentially a personal multimedia catalog that I want to keep current relative to deletions, downloads, creations, etc. I have several directories where the source files live, and the database gives me a direct way to search and launch the multitude of various files. It's basically a pet project I started back in 2011, and as my collection grew, my elementary and crude coding cobbled together from the internet has proven highly inefficient.

Hence this thread. The ideas presented are helping me better understand the ins and outs of database design, use, and the associated VBA coding. I am NOT a competent and experienced programmer, but I do enjoy working on and growing my understanding of such subjects. Thanks for all the advice. I'm now slowly rebuilding the VBA so that I build tables and queries rather than the multitude of dictionaries I had been building.
 
The paradigm I use, albeit on only about 8% of that number of files, is a two-part process.

First, iterate through the table (assuming you have stored the full paths), checking that every file listed still exists (using FSO.FileExists()), and if not, delete the record from the table.

Second, iterate through the relevant directories, checking that a record exists (rst.FindFirst) and, if not, create one with rst.AddNew using the details from the FSO File object.

This is all predicated on having the full paths in the table and on using the FileSystemObject.

As a throwaway observation, this seems to run about 20% faster using UNC names rather than mapped drives. The whole process takes about 6 to 10 seconds depending on network traffic.
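In rough code, the two passes might look like this (a sketch only: the table name tblFiles and its fields are assumptions, error handling is omitted, and only one folder is shown):

Public Sub SyncFolder(ByVal folderPath As String)
    Dim db As DAO.Database, rs As DAO.Recordset
    Dim fso As Object, f As Object

    Set fso = CreateObject("Scripting.FileSystemObject")
    Set db = CurrentDb
    Set rs = db.OpenRecordset("tblFiles", dbOpenDynaset)

    ' Pass 1: delete records whose file no longer exists
    Do Until rs.EOF
        If Not fso.FileExists(rs!FullPath) Then rs.Delete
        rs.MoveNext
    Loop

    ' Pass 2: add records for files not yet in the table
    For Each f In fso.GetFolder(folderPath).Files
        rs.FindFirst "FullPath = '" & Replace(f.Path, "'", "''") & "'"
        If rs.NoMatch Then
            rs.AddNew
            rs!FullPath = f.Path
            rs!FileName = f.Name
            rs!DateModified = f.DateLastModified
            rs.Update
        End If
    Next f

    rs.Close
End Sub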
 
Cool. I have a bit of sophistication in my code (If statements to detect specific directories and exclude certain conditions), but perhaps I could simplify things a bit using your method. It may be another option to speed up the process.
 
Iterate twice?

I find it more efficient to write all relevant folders with their files into a table; you need one pass to enter the new files anyway. Once I have a table, I can simply compare it with the other table, and a DB developer can write queries.

A worse choice:
rs.FindFirst. It's really worth transferring the table into a dictionary so that you can then query it quickly and specifically.
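For example, once an FSO pass has filled a scratch table, say tblScan with a FullPath column (names assumed), the new files are a single unmatched-join append query:

INSERT INTO tblFiles (FullPath)
SELECT s.FullPath
FROM tblScan AS s LEFT JOIN tblFiles AS t
     ON s.FullPath = t.FullPath
WHERE t.FullPath Is Null;

The same pattern, with the join turned around, finds the records whose file has disappeared.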
 
It's really worth transferring the table into a dictionary so that you can then query it quickly and specifically.
I've looped a table of 180,000 records into a dictionary and was surprised it took only 2 seconds (using the universally accepted one-Mississippi, two-Mississippi method of timing).
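If anyone wants something slightly more repeatable than counting Mississippis, VBA's Timer function is enough for rough comparisons (table and field names here are assumptions):

Dim t As Single, rs As DAO.Recordset, d As Object
Set d = CreateObject("Scripting.Dictionary")

t = Timer
Set rs = CurrentDb.OpenRecordset("SELECT FullPath FROM tblFiles", dbOpenSnapshot)
Do Until rs.EOF
    d(rs!FullPath.Value) = True
    rs.MoveNext
Loop
rs.Close
Debug.Print d.Count & " paths loaded in " & Format(Timer - t, "0.00") & " seconds"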
 
A worse choice: rs.FindFirst. It's really worth transferring the table into a dictionary so that you can then query it quickly and specifically.
I'll just say that I have tried many methods, including dictionaries, and this method has proven to be the quickest. Just remember we are talking about thousands of files.

I iterate twice because they are separate logical processes: one removes extraneous records, the other records new files.

Both writing to an extra table and using dictionaries are time-consuming and unnecessary steps. Dictionaries are very slow and inefficient for more than a few tens of records, as they reorder themselves as every record is added. (An exponential process!)

The iteration through the actual files is the slowest part of the process, and all methods described have to do this.
 
I also base my statements on my own experience. rs.FindFirst is also noticeably slow with large tables. In that case, it is better to use Seek or the intermediate step via a collection/dictionary.

When writing to a table, I would create the index that will be used later only after the data has been entered.
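For example, if the scan goes into a scratch table, the index can be built afterwards with plain DDL (table and index names assumed):

' Fill tblScan first, then build the index the later lookups/joins will use
CurrentDb.Execute "CREATE UNIQUE INDEX idxFullPath ON tblScan (FullPath)", dbFailOnError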

The iteration through the actual files is the slowest part of the process, and all methods described have to do this
One should keep that in mind for now.
For larger quantities, and where performance really matters, one would then resort directly to the Windows API.
 
I'm using a recursive directory function on about 25TB of miscellaneous drives and directories.

You obviously have been making this work for a while, so I'll just add a little warning here. Recursion can be pretty effective for directory structures, but be very careful about how you implement it. True recursion is possible in VBA. I've used it for a family tree application. But the problem is that if you do true recursion - in the sense of a routine calling itself to do something at a deeper directory level before returning to a previous level by exiting from a subroutine - you risk running into the problem of stack overflows. (No, not the Stack Overflow web site... the actual error message.) If you ever get the "Stack overflows heap" (or equally bad, "Heap overflows stack") you will have a mess on your hands to clean up. When I was working with my recursive function, I got that a few times and it was NO FUN.
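One way to sidestep the stack limit entirely is to keep the traversal iterative and use a Collection as a work queue instead of having the routine call itself. A sketch (what you do per file is left as a Debug.Print placeholder):

Public Sub WalkFolders(ByVal rootPath As String)
    Dim fso As Object, fld As Object, subFld As Object, f As Object
    Dim pending As New Collection

    Set fso = CreateObject("Scripting.FileSystemObject")
    pending.Add rootPath

    Do While pending.Count > 0
        Set fld = fso.GetFolder(pending(1))
        pending.Remove 1

        For Each f In fld.Files
            Debug.Print f.Path            ' placeholder for the real per-file work
        Next f

        For Each subFld In fld.SubFolders
            pending.Add subFld.Path       ' queue it instead of recursing into it
        Next subFld
    Loop
End Sub

The queue only ever holds folder paths, so however deep the tree goes, the call stack stays one level deep.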
 
IMO the fastest way to update a single record in a table with thousands of records is to use the Seek and Update methods on an indexed table-type (TableDef) recordset (only for an MS Access BE).
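A sketch of that approach, with the caveat that Seek only works on a table-type recordset opened directly in the back-end database, not on a linked table; the back-end path, index name, and boolean field name below are all assumptions:

Dim db As DAO.Database, rs As DAO.Recordset
Set db = DBEngine.OpenDatabase("C:\Data\Catalog_BE.accdb")   ' back-end path (placeholder)
Set rs = db.OpenRecordset("tblFiles", dbOpenTable)           ' table-type recordset
rs.Index = "idxFullPath"                                     ' index on the FullPath field

rs.Seek "=", "\\server\share\movies\example.mkv"             ' example value
If Not rs.NoMatch Then
    rs.Edit
    rs!FileMissing = True                                    ' the boolean field from post #1
    rs.Update
End If

rs.Close
db.Close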
 
The data provided by FSO is essentially an array. The data in a table is also an array, but the table has a whole language, SQL, that lets you easily manipulate the data. You cannot easily match the two arrays, so rather than load the FSO data into another incompatible array (dictionary or collection), why not just load it into a table, as @ebs17 suggested? Then you can use SQL to match the two tables, and SQL is pretty efficient.

Since you have no need to keep the current FSO data once it has been matched to your stored data, load the temp data into what people call a side-end. It is just a separate database, so you don't need to deal with bloat or having to compact.
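One way to set that up, with every name and path below a placeholder: create the scratch database on the fly, fill a scan table in it, and point the comparison query at it by qualifying the table with the external database path.

Dim sidePath As String, sideDb As DAO.Database

sidePath = Environ("TEMP") & "\FileScan_side.accdb"
If Len(Dir(sidePath)) > 0 Then Kill sidePath

' Create the throwaway side-end and the scan table inside it
Set sideDb = DBEngine.CreateDatabase(sidePath, dbLangGeneral)
sideDb.Execute "CREATE TABLE tblScan (FullPath TEXT(255))", dbFailOnError
sideDb.Close

' ...fill tblScan from the FSO loop, then compare from the front end:
CurrentDb.Execute _
    "INSERT INTO tblFiles (FullPath) " & _
    "SELECT s.FullPath FROM [" & sidePath & "].tblScan AS s " & _
    "LEFT JOIN tblFiles AS t ON s.FullPath = t.FullPath " & _
    "WHERE t.FullPath Is Null", dbFailOnError

When the run is finished the whole scratch file can simply be deleted, so neither the front end nor the back end ever bloats.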
 
This whole thread has been fascinating to me, so I did a bit of experimenting with my archive database tables: dictionaries, temp files, FindFirst, Seek, etc.

There was actually little to choose between any of the methods. Surprisingly, using Seek instead of FindFirst gave only a marginal improvement, which I suspect is because of the length of the text entries in the index, which are all complete file paths.

It turns out the simplest method, and easily quick enough not to impact the speed of the process, was to traverse the document folders, try to write a new record (.AddNew) for every file, and let the write (.Update) fail on duplicate records.
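Something along these lines, assuming tblFiles has a unique index on FullPath so the duplicate write raises error 3022, which is simply discarded (the folder path is a placeholder):

Dim rs As DAO.Recordset, fso As Object, f As Object
Set fso = CreateObject("Scripting.FileSystemObject")
Set rs = CurrentDb.OpenRecordset("tblFiles", dbOpenDynaset)

On Error Resume Next                       ' let duplicate-key writes fail quietly
For Each f In fso.GetFolder("D:\Archive").Files
    rs.AddNew
    rs!FullPath = f.Path
    rs.Update                              ' fails with error 3022 on an existing path
    If Err.Number <> 0 Then
        rs.CancelUpdate                    ' throw away the failed AddNew
        Err.Clear
    End If
Next f
On Error GoTo 0
rs.Close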
 
The iteration through the actual files is the slowest part of the process
Therefore, it would be beneficial to monitor the file system itself and to immediately record current changes and pass them on to the database. Something like this could help.
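The linked code isn't reproduced here, but one shape such a watcher can take in VBA is a synchronous WMI event query; note that NextEvent blocks until something happens, which matters for the reply below. The folder path is a placeholder, and the quadruple backslashes are required by WQL's escaping rules.

Dim svc As Object, evts As Object, evt As Object

Set svc = GetObject("winmgmts:{impersonationLevel=impersonate}!\\.\root\cimv2")
Set evts = svc.ExecNotificationQuery( _
    "SELECT * FROM __InstanceCreationEvent WITHIN 10 " & _
    "WHERE TargetInstance ISA 'CIM_DirectoryContainsFile' " & _
    "AND TargetInstance.GroupComponent = " & _
    "'Win32_Directory.Name=""c:\\\\Watched""'")

Do
    Set evt = evts.NextEvent                  ' blocks here until a file is created
    Debug.Print "New file: " & evt.TargetInstance.PartComponent
Loop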
 
Unfortunately, when I tried the code in a new db, it halts my db, probably waiting to see if there are any changes in the folder. I then added a file to the monitored folder and the db temporarily woke up. Since I have the code in the Timer event, the db again goes into limbo, waiting for any activity in the folder.
 
Therefore, it would be beneficial to monitor the file system itself and to immediately record current changes and pass them on to the database. Something like this could help.

This would be a useful idea, but not for me. I only run the complete system occasionally as housekeeping, and it seldom identifies any discrepancies. Mostly my database adds records as I add new files to my archive, and deleting a record deletes the associated file.
 
In this topic, however, we are primarily dealing with the task described in post #1: marking the records in the table whose associated file no longer exists.

Whether something is missing can only be determined by comparing the complete list with what is present => a classic task for SQL, and for this you need two usable tables. Or the file system itself immediately reports what changes have occurred.
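For instance, with the scan written to tblScan and a Yes/No field on the catalog table (both names assumed, and tblScan.FullPath uniquely indexed so the joined update stays updatable), the marking is one query:

UPDATE tblFiles AS t LEFT JOIN tblScan AS s
       ON t.FullPath = s.FullPath
SET t.FileMissing = True
WHERE s.FullPath Is Null;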
 
In this topic, however, we are primarily dealing with the task described in post #1: marking the records in the table whose associated file no longer exists.

I think you are oversimplifying it. As I read it, it was also about identifying the opposite, files for which no record exists, and that has been the basis of most of the efficiency discussion. The missing-files task you refer to is the simple part of the operation!
 
If the BE is Jet or ACE, you can use an "upsert" query, which inserts and updates in one pass. Then you can run a second query to find the deleted files. I don't think the "upsert" works with SQL Server, but someone will correct me if I am wrong. So for a SQL Server BE, you need three queries: new, match, deleted. The "upsert" (or the match/add queries) would set DateChecked to today. Then the delete query would either update or delete (depending on your objective) all rows where DateChecked is not today. You don't even need a join. In fact, if you choose to leave the "deleted" records in place and just tag them, you don't even need the "deleted" query, because the fact that DateChecked is older than the most recent DateChecked means the file was deleted.

SQL is soooo much faster than any VBA code loop that there shouldn't even be a discussion about this, especially about an approach that uses an intermediate dictionary or collection. You need the code loop to get the current list of files into a table, but once that is done, the rest of the task is pretty simple SQL.
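For reference, the Jet/ACE "upsert" is just an update query over an outer join in which the SET also writes the join key, so source rows with no match become inserts. A sketch, with field names assumed and FullPath uniquely indexed in tblFiles:

UPDATE tblFiles AS t RIGHT JOIN tblScan AS s
       ON t.FullPath = s.FullPath
SET t.FullPath = s.FullPath,
    t.FileSize = s.FileSize,
    t.DateChecked = Date();

Afterwards, any row whose DateChecked is older than today is a file that has disappeared, which can be tagged or deleted with a second simple query, exactly as described above.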
 
