Recursive file listing help, Please

notarat

Registered User.
Local time
Today, 15:49
Joined
Dec 8, 2008
Messages
15
Greetings!

I just signed up and performed a normal, then advanced search, and combed through the 70+ threads that were returned in the search results. None of them really answered my question, so I am posting in hopes of finding some help. (Either my terms weren't spot-on or the topic hasn't been covered yet)

I am posting my question in hopes of getting some help.

I have several large shares on my home network that contain about 5TB of data total in about 1.4 million files. It's all the home movies and pictures and stuff for my 60+ member extended family for the last 3+ decades. (Kodak loves us)

From a file management standpoint, it's more than a little cumbersome to manage that many files so I want to automate the process of gathering the information.

I currently have a rudimentary MS Access 2003 database in which I programmed a Macro to change drives to the shared drive and call the DOS command:

Dir /S /N /Q /T:W /4 >d:\MyFiles\Input.txt

That command gives me the following information and pipes it to a text file:

File Name
Size
Last accessed Date/Time
Owner
Full pathname to file

I then take that text file and manipulate the crap out of it to pull the information out and populate several temporary tables that hold the information I need, and it then populates the "final" table with the information above, AND the pathname to each file.

It's a very cumbersome method that takes 25+ clicks of the "ok" button as I yank various bits of information out of the text file, then put it into temporary tables, then populate the "real" table from the various tables.

I understand there is a method to pull the information straight from the filesystem(?) to populate the table from within MS Access itself so I do not have to deal with the major manipulation of the DOS text file.

What I want to do is have a button in my MS Access table that calls the filesystem and recursively lists all files and folders on my shared drive to a table, skipping all the DOS crap, and lists the following file & folder information:

File Name
Size
Last Modified Date/Time
Last Accessed Date/Time
Owner
Full pathname to file

I found a great little MS Excel macro that does this for a spreadsheet, which I attached, but I have too many files (well over 1 million files)

I found a script at Allen Browne's great site that grabs some of the information but I can't seem to get it to work to grab the rest of the information I need. The date field in his script seem to default to Today's date (If I follow his instructions) and no date at all if I leave the Default Value field blank in the Field Definition. That leaves me with only the file name and path. I need the Size, Last Modified Date/Time, Last Accessed Date/Time, and Owner information as well.

I really need that extra information to help determine when files can be archived to DVD and sent to my relatives because I am running out of disk space, lol.

Can I get some assistance on making the script grab the rest if the information I need, please? I'm not allowed to post the link because I don't have 5 posts or more, so I have to point you to the post in a round-about manner.

There is no www in the link below.
The script is at allenbrowne dot com/ser-59alt.html
 

Attachments

Let me follow up by saying I wouldn't mind too terribly if the script I included could be modified to create a CSV file, instead of a spreadsheet. I could live with that, but I would be happiest if I could populate the table straight away rather than use an intermediate step.
 
Can't you simply retrieve this information from a Scripting.File object? And I'm not clear: Do you need help to collect more or different information than is available via a scripting.file object, or do you need help writing a recursive procedure that can dig down through a hierarchical directory structure and plug data straight into an Access table? Or Both?
 
Be aware that that recursion consumes a lot of memory. If your recursive procedure runs into out-of-memory problems, you can probably create a stack to simulate recursion, but this takes a little doing.
 
How do you simulate recursion? How could some difficult simulation of recusion consume less memory?
 
How do you simulate recursion? How could some difficult simulation of recusion consume less memory?
I'm speaking of my experience in VB.Net and C#. I'm not entirely sure whether VBA is structured in the same way. That's why I said "probably."

In .Net, memory (RAM) is divided into two categories, the stack and the heap. The stack is a small amount of memory allocated at runtime for simple variables called value types (such as Integer, Byte, Char, and Long). Operations with these types execute very quickly. However, .Net assumes that complex variables (such as a customized Employee class) would exhaust this memory so it moves them to your larger memory store (the "heap").

By default, recursive procedures make use of "the stack", at least to some extent. Each recursive call to the procedure pushes another instance of it onto the stack, potentially exhausting it. The stack isn't used only for speed - but also because, logically, a stack is the natural way to keep track of, and therefore evaluate, the multitude of recursive calls.

This means two things. First, you can simulate recursion by implementing your own stack (an array can be used as a stack). Secondly, if your array is such that the runtime puts it on the heap, you are less likely to run out out of memory. Of course, heap operations tend to run slower than stack operations.
 
This article seems to imply that VBA uses stack and heap (maybe all modern programming languages do so):

http://www.fmsinc.com/MicrosoftAccess/Performance.html

I think the first thing to try, though, if a recursion exhausts memory, is to convert your variables to a class (classes are put on the heap). If that fails, then try to create a stack to simulate recursion.
 
are you happy with recursion - you can write efficient stuff with recursion that real hard to do any other way - eg, scanning multiple level of folders

this might be useful - this sub is a recursive algorithm to scan all files in a tree, (code of the top of my head, but it should be OK) but the vba dir function doesnt return file attributes, so you need to get these in another way, from the file name - if you havent got too many levels of folders this shouldnt cause stack overflow


Code:
sub listfiles(folder as string)
dim fname as string

fname=dir(folder)
while fname <> vbnullstring
 if isfolder(fname) then 'function - is this file a folder
   listfolders(fname) 'call this sub again, one level down the tree
 else
   extractinfo(fname) 'this is a file - extract any info you need
   fname = dir()'get the next file 
 end if
wend

end sub

sub main
'process ther file tree from any given folder
listfiles("rootfolder")
end sub
 
Can't you simply retrieve this information from a Scripting.File object? And I'm not clear: Do you need help to collect more or different information than is available via a scripting.file object, or do you need help writing a recursive procedure that can dig down through a hierarchical directory structure and plug data straight into an Access table? Or Both?

I haven't coded in VBA before, so I'd appreciate any help you can provide helping me write the code.

Even something as simple(?) as modifying the script I included to just output the results to a CSV file (that I could then macro on my own to be imported) would help me greatly!

My current method works, but because of the sheer amount of queries involved (make table, append, update, delete) makes the current process a pain in the rear because I cannot script a way to automatically click the "OK" button. I'm forced to sit there and click it nearly 30 times as this thing runs.
 
Be aware that that recursion consumes a lot of memory. If your recursive procedure runs into out-of-memory problems, you can probably create a stack to simulate recursion, but this takes a little doing.


If I understand you correctly, you're saying that if the structure is very nested and there are a lot of files involved, the script could run out of memory and crash.

I'm currently able to grab the data I want from the DOS script. My only complaint with it is that I have to do an inordinate amount of parsing to get the path name information out of the txt file because of the way the text file is formatted.

If there was a way to get reliable pathname information (along with the other fields I need) from the DOS command, I'd use it. I simply do not know enough about VBA yet to be able to do this without help. :(
 
The example Gemma gave should get you started, maybe I'll expand on it a bit, if I have time.
 
Are all the network drives mapped to local drive letters on the main machine?
 
Nevermind the local drive letters. I think you just need to make a list of the computer names and share names on the network - I'm guessing we can use UNC paths like this:

\\Computername\ShareName

where sharename is the folder representing the Cdrive (do this for all drives where data is stored).
 
Last edited:
Once you have the list of shares, call it like this (for each share):

getAllFilesInThisFolder "\\Computername\ShareName"


Private Sub getAllFilesInThisFolder(ByVal pathToFold As String)
Dim fold As IWshRuntimeLibrary.Folder
Set fold = fso.GetFolder(pathToFold)
Dim fil As IWshRuntimeLibrary.File
For Each fil In fold.Files
Dim fileName As String
fileName = fil.Name
'INSERT this info into the table
' fil.DateCreated
' fil.DateLastAccessed
' fil.DateLastModified
Next fil
Dim subfold As IWshRuntimeLibrary.Folder
For Each subfold In fold.SubFolders 'recursion
getAllFilesInThisFolder (subfold.Path)
Next subfold
End Sub

I haven't written any code to do the actual INSERT into the db. I don't know the most efficient way to do this - I don't know, for instance, if recordsets are very reliable with a million records. I'd probably use a Command object, with parameters, but then I'm not very experienced so could be wrong.
 
Sample code for the INSERT:

'global variable - at the top of your code
Private cmd As New ADODB.Command


'put this code in the calling sub (away from the recursive sub)
'three parameters
cmd.ActiveConnection = CurrentProject.Connection
cmd.CommandText = "INSERT INTO tblFiles (LastModified, DateCreated, DateLastAccessed) VALUES(?, ?, ?)"
cmd.Parameters.Append cmd.CreateParameter("?", adDate, adParamInput, 20)
cmd.Parameters.Append cmd.CreateParameter("?", adDate, adParamInput, 20)
cmd.Parameters.Append cmd.CreateParameter("?", adDate, adParamInput, 20)

'Put this code in the recursive sub, within the loop, to insert a record into the table:
cmd.Parameters(0).Value = fil.DateLastModified
cmd.Parameters(0).Value = fil.DateCreated
cmd.Parameters(0).Value = fil.DateLastAccessed
cmd.Execute 'inserts the three values into the DB
 
Did I forget to mention? Add a reference to Windows Scripting Host.
 
Another line i forgot? At the top of your code:

Private fso As New IWshRuntimeLibrary.FileSystemObject
 
gotta go! Let us know how it works out.
 
I noticed this error in my code. The three zeroes below

cmd.Parameters(0).Value = fil.DateLastModified
cmd.Parameters(0).Value = fil.DateCreated
cmd.Parameters(0).Value = fil.DateLastAccessed

should be 0, 1, 2


cmd.Parameters(0).Value = fil.DateLastModified
cmd.Parameters(1).Value = fil.DateCreated
cmd.Parameters(2).Value = fil.DateLastAccessed

I also forgot to include a column for filename. You'll need to set that up and add a corresponding parameter for it

cmd.Parameters(3).Value = fil.Name
 
Here's a fast and dirty stab at the problem. Has a little treeview to show progress.
Cheers,
 

Attachments

Users who are viewing this thread

Back
Top Bottom