Why Is This Recursive Function with DIR... So Slow?

Isaac

So I'm using some code I cobbled together from the web (I can't remember where from any more). The goal here is to START at one particular directory and traverse all files in all subfolders (down to each one's lowest level), listing each filename on a worksheet.

Now I know from using DIR to populate ONE folder's files (into a listbox, which I did recently) that it is VERY fast... like really astonishingly fast. Even if the folder has 3000 files, DIR can populate an array with them in a moment.

I cannot figure out why this code - which starts at a top-level folder that has maybe 500 subfolders in *total*, some with only a handful of files (5-10) and others with maybe a couple hundred - is still running after half an hour!
Now it could be "correct" - but if you can spot any way to optimize this, let me know. Thanks!!!

I wrote a small sub to create 1) a test folder to start with, and 2) about 80 levels of subfolders, each one containing one file.
Then I ran this code on THAT structure, and it took maybe 10 seconds to list out my 80-ish files.

Just wondering if anyone can see any part of my code that could be optimized - or rewritten differently - if the goal is to choose one top level folder and list out every file contained in any subfolder, going as deep as needed in all subfolders etc.

Excel, Access, I don't care - either is fine.

Code:
Sub LoopAllSubFolders(ByVal folderPath As String)

Dim fileName As String
Dim fullFilePath As String
Dim numFolders As Long
Dim folders() As String
Dim ws As Worksheet
Set ws = ThisWorkbook.Worksheets("FilesListing")
Dim i As Long

If Right(folderPath, 1) <> "\" Then folderPath = folderPath & "\"
' Dir() calls cannot be nested, so collect subfolder names here and
' recurse only after this Dir loop has finished
fileName = Dir(folderPath & "*.*", vbDirectory)

While Len(fileName) <> 0

    If Left(fileName, 1) <> "." Then

        fullFilePath = folderPath & fileName

        If (GetAttr(fullFilePath) And vbDirectory) = vbDirectory Then
            ReDim Preserve folders(0 To numFolders) As String
            folders(numFolders) = fullFilePath
            numFolders = numFolders + 1
        Else
            'Insert the actions to be performed on each file
            'This example will print the full file path to the immediate window
            'Debug.Print folderPath & fileName
            ws.Range("a" & ws.Rows.Count).End(xlUp).Offset(1, 0).Value = folderPath & fileName
            'ws.Range("b" & ws.Rows.Count).End(xlUp).Offset(1, 0).Value = FileLen(folderPath & fileName) / 1000 'megabytes
           
        End If

    End If

    fileName = Dir()

Wend

For i = 0 To numFolders - 1

    LoopAllSubFolders folders(i)

Next i

End Sub
Sub loopAllSubFolderSelectStartDirectory()
Dim wb As Workbook, wsNew As Worksheet, strWorksheetName As String
ThisWorkbook.Worksheets("FilesListing").Range("a2:a60000").ClearContents

Call LoopAllSubFolders(frmMenu.txtFolderPath.Value)

End Sub
 
Aww, darn.......I just remembered there is one particular folder that has 10,000 files.

Still, if you see any way to make it faster please do share. If not it's OK. Thanks all

PS: also, this is on a network over VPN using WiFi, which can be slow... and yet my test where the code had to traverse 80 subfolder layers finished in 10 seconds. And when I manually use Explorer to go to the folder with the 10k files, it only takes 5-7 seconds to populate/show all the files and accurately show the count at the bottom... so I'm not sure why DIR is being such a beast about it?
 
Just for clarification, is "ThisWorkbook" a global item? I don't see where you open it anywhere so it would help to have a little clarity.

You have variables (objects) wb and wsNew, plus strWorksheetName, Dim'd but never used in the loopAllSubFolderSelectStartDirectory routine, which takes no parameters. Since they are local rather than Public, they exist only for the life of that call and do nothing.


By default, variables declared with the Dim statement in the Declarations section are scoped as private.

To make it faster, you first need to toss some timing hacks into the code, for debug purposes. Create a table with a SINGLE field and a SHORT TEXT field. Then write a sub that adds logging entries to this table, using Timer() to fill the SINGLE field and a passed-in parameter for the text. If you keep the logging recordset open, you can pass it along ByRef.

Code:
For i = 0 To numFolders - 1
    WriteLog "Recursing " & folders(i)
    LoopAllSubFolders folders(i)
    WriteLog "Returning " & folders(i)
Next i

Then you open the logging recordset before the first recursion and close it after the recursion finishes back at the top level. Display the times using a format like "#####.##" and display the text as-is. This will give you hints about where the time goes. If your recursion is done correctly, your "maybe 500" folders will give you a table of about 1000 entries - one entering and one leaving per folder. If you have significantly more than that, your recursion isn't implemented correctly.
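For what it's worth, here is a minimal sketch of that logging idea, assuming a table named tblLog with a Single field LogTime and a Short Text field LogText (those names are invented for illustration, not from this thread), and a DAO recordset kept open and passed ByRef:

Code:
' Illustrative sketch only: tblLog / LogTime / LogText are made-up names.
Public Sub WriteLog(ByRef rsLog As DAO.Recordset, ByVal strText As String)
    rsLog.AddNew
    rsLog!LogTime = Timer()                 ' Single: seconds since midnight
    rsLog!LogText = Left$(strText, 255)     ' Short Text caps at 255 chars
    rsLog.Update
End Sub

' Open once before the first recursion, close after it finishes:
'   Set rsLog = CurrentDb.OpenRecordset("tblLog", dbOpenDynaset)
'   ... recurse, calling WriteLog along the way ...
'   rsLog.Close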
 
Sorry for not specifying... this is written in Excel (though as you know, the Excel forum gets far fewer views) - and anyway the code is basically the same in Access or Excel, except for how/where I write the data. In this case it goes to a worksheet, but it could be anything - a text file, an Access table (except that would be a lot of inserts), wherever - I'm open-minded.

Yeah sorry, wsNew and the other ones are the leftover ruins from an earlier code version...sorry.

You make a very good point about timing. I have taken 2-3 troubleshooting approaches so far, including starting at various folders other than the topmost one (which takes a long time). Several of those finished quickly, so I am slowly but surely narrowing the focus to the one folder with the 10k files, though I'm still not 100% sure. I also thought maybe Zips were being seen as directories and slowing things down exponentially, but that wasn't the case: I tested the GetAttr line on a known zip file and it came back as vbNormal (not a directory).

Your point about the timing is well taken and I will seek to implement it, perhaps slightly differently than the table method but still with the point of getting insight into the start and stop time of specific folders (etc).
I think I will do the text-file log method, as I have used it before... while the TextStream object is still in scope the log can't be opened manually, BUT it can be copied and the copy opened, so that will work for me here.

Thanks as always for the great ideas. Sorry about the not clarifying and for including unused var's.
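A rough sketch of that text-file variant, in case it helps (the path and file name here are invented; 8 is the ForAppending constant when late-binding):

Code:
' Illustrative sketch: append one timestamped line per call.
Sub WriteLogLine(ByVal strText As String)
    Dim fso As Object, ts As Object
    Set fso = CreateObject("Scripting.FileSystemObject")
    ' 8 = ForAppending; True = create the file if it doesn't exist yet
    Set ts = fso.OpenTextFile("C:\Temp\FolderScanLog.txt", 8, True)
    ts.WriteLine Format(Timer(), "#####0.00") & vbTab & strText
    ts.Close
End Sub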
 
You can try a different recursive version of this, but I am not promising anything - I don't remember it being super fast either. It needs a reference to the Microsoft Scripting Runtime. Also, when you run your real code, make sure to comment out any Debug.Prints you may have in there for error checking; they will really slow things down.
There is some application specific code still left in that you can get rid of like the folder level and parent id.
Code:
Option Compare Database
Option Explicit
Private fso As Scripting.FileSystemObject
'---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
'*************************************************************  Span and update Folders  *****************************************************************************************
'---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Private Sub spanfolders(SourceFolderFullName As String, DefaultFolderNumber As Integer, Optional ParentID As Long = 0, Optional ByVal FolderLevel As Long = 0)

' lists information about the files in SourceFolder
' example: ListFilesInFolder "C:\FolderName\", True
    'Dim FSO As Object 'Scripting.FileSystemObject
    Dim SourceFolder As Scripting.Folder 'Scripting.Folder
    Dim SubFolder As Scripting.Folder 'Scripting.Folder
    Dim FileItem As Scripting.File 'Scripting.File
   ' Dim ParentID As Long

   
    Set SourceFolder = fso.GetFolder(SourceFolderFullName)
   
    FolderLevel = FolderLevel + 1
    Debug.Print "Folder " & SourceFolder
    ParentID = 1234
    For Each FileItem In SourceFolder.Files
        Debug.Print "File " & FileItem
    Next FileItem
   
    For Each SubFolder In SourceFolder.SubFolders
        ParentID = 456 'GetFolderID(SourceFolder.Path) ' The record has just been added so get PK by name
    '   LogFilesFolders SubFolder.Name, SubFolder.Path, SubFolder.Type, ParentID, fft_Folder, FolderLevel
       If (SubFolder.Attributes And 2) <> 2 And (SubFolder.Attributes And 4) <> 4 Then ' skip Hidden (2) and System (4) folders
            spanfolders SubFolder.Path, DefaultFolderNumber, ParentID, FolderLevel
       End If
    Next SubFolder

    Set FileItem = Nothing
    Set SourceFolder = Nothing

End Sub

Sub spanallfolders()
    Set fso = New FileSystemObject
    spanfolders CurDir, 1

End Sub
 
Don't write the results directly to the workbook, put them into an array.
When you are finished write the whole array in one fell swoop to the worksheet.

If you don't see a massive improvement in performance, I'll eat one of my wife's hats.
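In case it's useful, a minimal sketch of that approach (the module-level buffer names are my own invention): collect the paths into a growable array while recursing, then write the whole block to the sheet with a single Range assignment:

Code:
' Illustrative sketch of buffer-then-dump; arrFiles/lngCount are module level.
Dim arrFiles() As String
Dim lngCount As Long

Sub BufferFile(ByVal strPath As String)
    If lngCount = 0 Then ReDim arrFiles(1 To 1024)
    ' Double the buffer when full; ReDim Preserve on a 1-D array is cheap enough here
    If lngCount = UBound(arrFiles) Then ReDim Preserve arrFiles(1 To UBound(arrFiles) * 2)
    lngCount = lngCount + 1
    arrFiles(lngCount) = strPath
End Sub

Sub DumpBufferToSheet(ByVal ws As Worksheet)
    Dim v() As String, i As Long
    ReDim v(1 To lngCount, 1 To 1)           ' Range.Value wants a 2-D array
    For i = 1 To lngCount
        v(i, 1) = arrFiles(i)
    Next i
    ws.Range("A2").Resize(lngCount, 1).Value = v
End Sub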
 
Quoting Minty:
Don't write the results directly to the workbook, put them into an array.
When you are finished write the whole array in one fell swoop to the worksheet.
That would be a good test.
1. Time your recursion and do nothing with the results. Just loop everything.
2. Now add in the arrays to store the data and time it.
3. Now add the method to write from the arrays.

If 1 is still painfully slow, see if you can speed up the recursive loop; if you can't, nothing else will matter.
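Step 1 can be as simple as wrapping the top-level call in a Timer() stopwatch (the path below is made up for illustration):

Code:
Sub TimeBareRecursion()
    Dim sngStart As Single
    sngStart = Timer()
    LoopAllSubFolders "\\server\share\TopFolder\"   ' illustrative path
    Debug.Print "Elapsed: " & Format(Timer() - sngStart, "0.00") & " seconds"
End Sub

(Timer() resets at midnight, so don't time runs that straddle it.)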
 
@MajP
Thanks! I might use that, especially since once they like my proof of concept, they'll probably see my point about this getting a bit dicey - people will need a better way to actually 'manage' the doing of it (huge network folder locations could take more time to process in one fell swoop than the 12 hours their VPN will even let them stay connected!). Then they'll take my advice about giving me 1-2 days to create an Access app that FIRST gathers the folders - all of them - and then lets the user do one at a time. (The "do" part will end up being more than I've referenced so far here; it has to do with renaming or categorizing the actual files in preparation for changes coming up.)
Once I do that, the idea of just looping through the folders first might be especially relevant, and at that point I might loop through them using the Scripting FileSystemObject as you mentioned - for several reasons, not least that I'm more familiar with it. Even if it's LESS fast than DIR, nobody will care; it's just listing folders and will only take a minute.

@Minty and @MajP
Yes!!! Yesterday I was thinking "hmm, I wonder if that whole writing-to-the-workbook thing is taking an inordinate amount of time"... so I turned off Application.ScreenUpdating (which I should have known from the beginning but forgot), and sure enough, that alone sped it up a noticeable bit. Next I was thinking of writing it all to one string (just with line breaks), putting that string in a text file at the end, and moving it to the worksheet later... but in the afternoon, some corporate antivirus nonsense must have started targeting me, because I suddenly became totally unable to run any Set ts = fso.CreateTextFile or Set ts = fso.OpenTextFile(path, ForAppending). Once it targeted me, it was viciously effective and comprehensive! This has happened to me twice before ("permission denied")... and I use TextStreams so much, I know I'm not making a mistake. I tried 1) my personal network userhome, 2) my desktop, 3) my appdata folder, 4) the same folder I was working in... and got denied on all of them.
I think today it will clear up, if the past is any indication.

However, your comment is well taken, because an Array plays very well with multi-cell Range placement, and I should have done that from the beginning...thanks!!

Even after years, I'll admit the use of DIR still sometimes confuses me. It was good for me to stare at some real, full use of DIR code (other than the If Dir(path to a file) = "" pattern I've used to tell whether a file exists)... and there were also a lot of great suggestions given here with other considerations like yours.

Thanks all!! Minty, I sincerely hope you do not end up eating a hat ... I'll try your array and let you know
(Or should I just say it's slow to make you eat one?) :LOL:
 
Googling edible hats now as we speak...
 
@Isaac
The code I posted came from this extremely long thread, which loops folders of video types and logs file counts of directories. If I remember correctly, I ran it on a very large directory with 30k files and lots of subfolders, logging the files and folders into the db as I went - so that alone should have been pretty slow. I think it took a while, but not more than 5-10 minutes. There may be some code in there of interest.
I was then using the table to create a treeview.
 
ok great, thanks for the link, will check it out
 
The array method tested out almost exactly twice as fast - a high-volume folder location I was testing now takes 10 minutes instead of 22.
Great reminder Minty, thanks !!!! Can't believe I was so lazy about learning the array-to-range technique.....so helpful

Because the sub is recursive and constantly calling itself, I'll admit I began to get a bit confused about variable scope.
What I did quickly and clearly realize is that my "normal" usage of local variables wasn't going to work as it usually does, so I made several global variables (the array holding the list of files, and some counters) and simply kept manipulating them inside the sub.
Works great, although I probably deserve a good scolding for doing it that way - I'm sure there is probably a better approach.
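One global-free alternative, sketched here (the signature is a suggestion, not tested against the original): pass the buffer and counter down the recursion as parameters - VBA passes arrays ByRef, so every level updates the same buffer.

Code:
' Sketch only: same Dir logic as the original, minus the sheet writes.
Sub LoopAllSubFolders2(ByVal folderPath As String, _
                       ByRef arrFiles() As String, ByRef lngCount As Long)
    ' ...same Dir loop as before, but where the original wrote to the sheet:
    '     lngCount = lngCount + 1
    '     arrFiles(lngCount) = folderPath & fileName
    ' ...and recurse with:
    '     LoopAllSubFolders2 folders(i), arrFiles, lngCount
End Sub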

Even if I do migrate this tool to Access, I'll probably keep involving Excel, because

this:
vba > array > excel range > TransferSpreadsheet into Access

is clearly going to be a LOT faster than anything I can think of access-only, like
vba > array > Inserts
or
vba > Inserts
or
vba > Recordset
..etc
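For the Excel-to-Access hop, a hedged sketch (run from Access; the table and workbook names are invented for illustration):

Code:
Sub ImportListingFromExcel()
    ' acSpreadsheetTypeExcel12Xml = .xlsx; True = first row holds field names
    DoCmd.TransferSpreadsheet acImport, acSpreadsheetTypeExcel12Xml, _
        "tblFileListing", "C:\Temp\FilesListing.xlsx", True
End Sub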

yay for AWF thanks folks !! 🍺🍺
 
Glad to have helped.
I was processing a ton of web scrape data in an excel spreadsheet (20,000 lines) and each run was taking 3-5 minutes, and a colleague said chuck the whole lot into an array - the source and the output. Worst case run time went down to 20 seconds.

Reading and writing to the cells is very slow by comparison with using an array for both.
 
