A topic most men tend to avoid...wait, wrong topic! Anyways, not a bad read. If you have about 20min to spare to read both articles, grab a cup of coffee and have a read...
Let's revisit the argument about whether to declare variables at the top of a procedure or closest to first use. We'll ask ChatGPT to make the best case for each position.
I worked with another developer once who was a "closest to first use" proponent. I hated working on his code because it was always a chore to determine how, where and if variables had been declared appropriately. That actually pushed me to the practice of declaring variables at the top of the procedure. I got to the point where I had to group them by datatype to be comfortable. Objects, then dates, then strings, then numbers, for example.
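For anyone who hasn't seen the "everything at the top, grouped by datatype" layout, here is a minimal sketch of it. The function name, its parameters and the variable names are all made up for illustration; only the ordering (objects, then dates, then strings, then numbers) comes from the post above.

```vba
Option Explicit

' Hypothetical routine: every name here is illustrative only.
Public Function DescribeDueDate(ByVal InvoiceDate As Date, _
                                ByVal TermsDays As Long, _
                                ByVal AsOf As Date) As String
    ' Objects
    Dim Parts As Collection
    ' Dates
    Dim DueDate As Date
    ' Strings
    Dim Label As String
    ' Numbers
    Dim DaysLeft As Long

    Set Parts = New Collection
    DueDate = DateAdd("d", TermsDays, InvoiceDate)
    DaysLeft = DateDiff("d", AsOf, DueDate)

    If DaysLeft < 0 Then
        Label = "overdue"
    Else
        Label = "open"
    End If

    Parts.Add Label
    Parts.Add CStr(DaysLeft)
    DescribeDueDate = Parts(1) & ":" & Parts(2)
End Function
```

Calling DescribeDueDate(#1/1/2024#, 30, #1/11/2024#) gives "open:20". In a routine this short the grouping is arguably overkill, which is rather the point several posters make further down.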
But I get the counter-arguments as well. It's hard to make an objective decision, so it comes down to a subjective preference.
There is a consideration regarding older vs newer languages. Mechanically, languages create variable blocks for variable allocation purposes, mostly to keep code and data in separate memory ranges. For those O/S cases that allow "code protection" and "data non-execution protection", discrete areas are set aside within the memory allocation scheme of the eventual program's memory so that different memory management hardware settings can be applied to code vs. data. In such cases, code cannot be modified while the program is running and data cannot be executed as though it were code. This defeats a lot of threats that use code-injection or buffer overrun tricks to compromise a program. The first time I personally came across that was in the late 1970s/early 1980s for the VAX/VMS environment and something called p-sections.
But for VBA, we NEVER write "main" programs where data block layout is an issue. Everything we write is a sub or function for which any locally declared variables go on the program stack in something called the Stack Frame. (Doesn't apply to STATIC variables.)
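The visible side of that Dim vs. Static distinction can be sketched with two tiny functions. Both function names are hypothetical, and the sketch only shows lifetime behavior (what survives between calls), not where the interpreter actually stores anything.

```vba
Option Explicit

' Hypothetical demo: a Static local keeps its value between calls.
Public Function CountWithStatic() As Long
    Static Calls As Long    ' retained until the project is reset
    Calls = Calls + 1
    CountWithStatic = Calls
End Function

' Hypothetical demo: a Dim local starts from zero on every call.
Public Function CountWithDim() As Long
    Dim Calls As Long       ' exists only for the duration of this call
    Calls = Calls + 1
    CountWithDim = Calls
End Function
```

Calling CountWithStatic twice in a fresh session returns 1 and then 2, while CountWithDim returns 1 every time.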
If I try to think about this dispassionately, I have to say that it should not matter which way you go because either way, a DIM / PRIVATE / PUBLIC statement is merely going to allocate a new slot in the stack frame as an offset from the stack pointer (hardware register). In the VBA environment, variables declared at the top of a sub/function don't even exist until you call the function, at which point the stack frame exists with all of those pre-allocated slots. Variables declared in a module's declaration area don't exist either, at least until the module is activated for the first time and thus loaded to memory.
Further, since VBA is pseudo-compiled, VBA code is technically data anyway: data for the p-code interpreter. Therefore, the code/data protection hardware schemes don't apply. In the absence of hardware protection for our VBA code, I don't see a big difference.
I personally try to declare everything at the top of the module or routine entry point, but I can say with some certainty that I do that because in the period when I learned programming, it was a language requirement, as an early way to protect code and data segments from each other. Therefore, I recognize my preference as habitual behavior, ingrained by rote learning at a time when there was no viable alternative. (Yes, I am that old...)
I will keep on pre-declaring rather than in-line declaring because I have developed habits that make documentation easier to write. (Don't have to go looking for declarations, they are all in a bunch.) That is the only advantage I see that specifically applies to VBA code. Note that I think more advantages apply in true-compiled languages - but for VBA, shouldn't be a big deal.
I love MZ-Tools because it sorts all my variables alphabetically. It makes it much easier to find what I need to check/change. It’s a relief to see them in the right order.
I came to Access from a strongly typed language (COBOL) so VBA has always been too loosey-goosey for my tastes. I always define my variables at the top of the module, partially due to habit but also out of respect to their scope. I understand the concept of "at first use" but random is just plain sloppy.
I prefer top of the procedure because I practice coupling and cohesion and so my procedures are focused and therefore do not contain multiple bits of unrelated code that would use unrelated variables.
Defining the variables in a clump also eliminates the problem of forgetting that something was already defined if you have more than a few variables and then defining it again with a slightly different name and then using one of them in the wrong place and so the results are inconsistent. This is a problem I have encountered multiple times in apps created by novices and it is a really hard bug to find.
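A minimal sketch of that kind of bug, assuming a made-up function and borrowing the two near-identical names (TotAsts and TotAssts) that come up later in this thread. Because both names are declared, Option Explicit has nothing to complain about and the code compiles cleanly.

```vba
Option Explicit

' Hypothetical example: both names are declared, so this compiles,
' but the second value is added to the wrong variable.
Public Function TotalAssetsBuggy(ByVal Cash As Currency, _
                                 ByVal Stock As Currency) As Currency
    Dim TotAsts As Currency

    TotAsts = Cash

    ' ... imagine a screenful of other code here ...

    Dim TotAssts As Currency        ' accidental near-duplicate, declared later
    TotAssts = TotAssts + Stock     ' Stock never reaches TotAsts

    TotalAssetsBuggy = TotAsts
End Function
```

TotalAssetsBuggy(100, 50) returns 100 rather than the intended 150, and nothing in the compile step hints at why.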
It shouldn't really matter which variant you use. I simply claim that the code is too long as soon as it makes a difference.
In C#, I only use the "closest to first use" variant (with var and direct assignment). var parameters = new List<IParameter>();
However, this is not possible in VBA.
I have already seen this variant: Dim Parameters As Collection: Set Parameters = New Collection
I don't use it myself, but I could get used to it.
The advantage of the top declaration is that you can see which variables are (could be) in use.
The advantage of the "closest to first use" declaration is that an incorrect double use of a variable should be noticed.
With short procedures (designed according to the Single-responsibility principle), this should not generally be a problem.
In terms of readability, I don't see any difference for me with any variant. For me, the variable name has to be meaningful, then I don't need the type.
When troubleshooting, I prefer the "closest to first use" declaration - especially if the procedure is too long.
It depends on how short the code is. If you think you know the variable name and you type it and it compiles, how do you know that you have used TotAsts rather than TotAssts? Are you always going to scroll back to refresh your remembrance of every variable name? I don't think so.
Defining the variables at the place of use makes sense in a block scope situation. For anything wider, defining at the top of the procedure leads to fewer problems. And if a developer defines 30 variables for every procedure, he doesn't understand coupling and cohesion and is doing too many different things in each procedure.
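Worth keeping in mind for the block scope point: VBA has no block scope, so a Dim placed inside a loop or an If still belongs to the whole procedure. A minimal sketch, with a hypothetical function name:

```vba
Option Explicit

' Hypothetical demo: a Dim inside a loop is still procedure-scoped in VBA.
Public Function SumInsideLoop() As Long
    Dim i As Long
    For i = 1 To 3
        Dim Running As Long     ' declared "at first use", scoped to the procedure
        Running = Running + i   ' not reset per iteration: 1, then 3, then 6
    Next i
    SumInsideLoop = Running     ' still visible (and still holding 6) after the loop
End Function
```

SumInsideLoop returns 6. In a block-scoped language such as C#, a variable declared inside the loop body would be a fresh one each time round and would not be visible after the loop.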
Maybe it would have helped if you had also read the 2nd sentence after this one.
If you think you know the variable name and you type it and it compiles, how do you know that you have used TotAsts rather than TotAssts? Are you always going to scroll back to refresh your remembrance of every variable name?
If I declare the variable directly before using it, I know it quite precisely. However, if the compiler then complains, I've done something wrong. It's not the variable name that's wrong, but the procedure is too complex because I needed the same variable name twice for different things. (I'm thinking about procedures with several For or While loops or temporary variables for buffering results)
To be on the safe side: this problem does not occur at all with clear procedures (SRP).
Therefore, I meant that it shouldn't really matter whether you define the variables as a block at the beginning or just before use.
Only if the declaration line is visible while you are writing the line of code that references it. You did not read my entire thought. OF COURSE you know what the variable name is if it is right in front of your eyes. If it isn't visible, are you going to scroll up to review the variable names? No, you are not.
If I have confusable variables [TotAsts and TotAssts] in the code, where I even have to scroll, then (in my opinion) my main problem is not where I declared the variables, but the code design itself.
If I understand you correctly, you are arguing that exactly these confusable variables are created when you declare them right before the 1st use.
I can only agree with that.
Does it make the code better to declare them at the beginning? They can still be confused.
But maybe then you realize earlier that you should redesign something.
The problem I have with this idea is that it is based on a code construct with "long" procedures (more than one responsibility), which I consider to be error-prone and difficult to maintain. Is the location of the variable declaration really the relevant improvement that should be made?
Shouldn't the comparison be made with clear/clean code?
Example GetProductStats() in https://nolongerset.com/premature-declaration/
In my opinion, the variable declaration is the minor problem in terms of readability and maintainability of the procedure.
Code structure:
Code:
Do
    ...
    If
        do
            if
                ...
            end if
        loop
        ...
        do
            if
                ...
            end if
        loop
    end if
loop
...
(I know this is just an example to show DAO technology, so I don't question the content or implementation.)
How would the comparison be between declaration in the block at the beginning and declaration before 1st use, with a refactored variant of this procedure, which then calls other procedures instead of “doing everything itself”?
NO, I'm the one who writes SHORT procedures. That is what I said right up front. Apparently you have never worked on applications created by others. And, I'm not just talking about outright novices. Way too many experienced developers don't understand coupling and cohesion. Why do you think that practically every release of Windows or Office breaks something totally unrelated? It is because of sloppy coding and procedures that do more than one thing and so if you change one part of that procedure, it can have adverse effects on other "off label" uses of that procedure.
I just ran into one of these stupid functionality breaks last week. All of a sudden, my pinned documents disappeared from Excel, Access, and Word. After the second day, I also noticed that there were huge gaps in the most recently used files lists. The apps were all working so who knew that something flaky happened to my account. Finally, I rebooted. It was still broken. Then someone suggested that I log into my Office account again. That fixed it.
I used to write them all at the top because that's what most people did. Then I went to .NET and I enjoyed declaring and initializing at the same time, so I started doing it in VBA. However, I follow some rules as much as I can:
- It depends on the length of the initialization, but if it fits in one line, I'll do it. If I divide my screen in half between the VBA window and the app, I must be able to read the entire line, that's about 80~120 characters long, so, if the initialization fits in the same line with the declaration, it stays there, otherwise, I initialize in the next line.
- With loops where the variable will be reassigned, I declare my variables in order of appearance within the loop, leaving all the iterables right before the loop. I also tend to do this: Dim i As Long: For i = 0 to X
Because i is such a universal iterator that I don't like declaring it solo.
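Put together, those rules might look something like the sketch below. The function and its names are invented for illustration; only the "declare and initialize on one line with a colon" style and the Dim i As Long: For i = ... habit come from the post above.

```vba
Option Explicit

' Hypothetical helper showing declare-and-initialize on one line.
Public Function JoinUpper(ByVal Items As Variant) As String
    Dim Separator As String: Separator = ", "
    Dim Result As String

    ' The loop counter is declared on the same line as the loop it serves.
    Dim i As Long: For i = LBound(Items) To UBound(Items)
        If Len(Result) > 0 Then Result = Result & Separator
        Result = Result & UCase$(CStr(Items(i)))
    Next i

    JoinUpper = Result
End Function
```

JoinUpper(Array("a", "b")) returns "A, B". The colon is just VBA's statement separator, so each of those lines is still a separate Dim followed by a separate assignment or For.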
And that's what I can remember for now, after all, small procedures are best procedures.
I have long been a fan of the "divide and conquer" methodology of programming. If you divide small enough, a lot of your routines will be short enough that you can see the whole routine on the same screen. (OK, we aren't ALWAYS that lucky....) When you can see everything, putting all local declarations at the top is still close enough to where something is actually used that it doesn't matter that much. Which is, I'm sure, what Pat also strives to accomplish based on her comments above.