Access's Yes/No Field=1 bit, VBA's boolean type=2 bytes?

Banana

split with a cherry atop.
Local time
Today, 07:25
Joined
Sep 1, 2005
Messages
6,318
I just noticed that Yes/No field (e.g. in Table Design view) is defined as one bit long, while VBA boolean data type is two bytes. That's pretty a big difference for a simple variable that is supposed to only have two states. ;)

I am not at all concerned about the typecasting or anything like that as it obviously was not a problem for numbers of developers, myself included.

However, it does poses two questions:

1) How did they define Yes/No to be one bit? If I'm not mistaken, normally computers works in sets of 8 bits (e.g. bytes), so having any data types that isn't a factor of 8 bits would variably be expanded/truncated to be a factor of 8. What ramifications does it hold?

For example, does that mean we could have up to 8 Yes/No fields without any increase in the table's size per row? (Speaking of which, is there a way to calculate how much space per row in a given table takes?)

Or did they really to manage to be able to handle one bit, meaning that whenever a table has 8 new rows, we've had added a byte to the total table size?

2) Why have two bytes for VBA's boolean, when one byte is actually just enough for the job? Heck, why not one bit as Yes/No field seems to be capable of? Does it then follow that if we used Byte data type (which takes just one byte) and treated it as a boolean, we would be saving a byte? AFAICT, we can use any numeric data type and treat it as boolean without any problem, so substituting a Byte data type for a Boolean data type would make more sense, especially if we were going to use it in several places or repeatedly.

The only problem, of course, is that Byte is unsigned while we normally use -1 for True/Yes/whatever.

I had expected that this would already been discussed to death by MVPs long ago- if anyone happens to know if there's such thread, I'll appreciate the link. :)
 
Where did you see that Yes/No was a bit? A byte is the smallest unit that can be stored and a byte is 8 bits.

There is another thread going on regarding the behavior of calculated Yes/No values versus those that are stored and how the results differ when being viewed via .net.
 
At first, I noticed this information in Access2MySQL, but I was sure that information had to be wrong, because as you already said, we work with bytes, not bits (at least not without any padding or truncation). Alison Baltier's book for developing Access 2002/2003 actually said same thing for Yes/No being one bit (on page 45, if anyone has that book), while stating that VBA boolean is 2 bytes (pg 260 something), which was quite a shocker to me.

I must have missed that thread; will dig it up. Thanks, Pat.
 
Last edited:
I have always seen Yes/No as 1 byte, for which your problem is that VBA and VB and some other languages are a bit inconsistent in how they treat computed results.

It comes about because type Yes/No is a cast (a.k.a. typecast) of BYTE.

As far back as Ac97, I remember having to test a computed Boolean for zero/non-zero and set it accordingly. (With zero being FALSE and all else being TRUE). FWIW, this occurs sometimes in the older versions of OpenVMS FORTRAN and BASIC, too.

In those cases, there was the issue that if an expression included the AND, OR, and NOT operators as well as numeric operators, an automagic typecast occurred that blurred the lines between raw Boolean results and integer logic operators on non-Boolean fields.

I.e.

A = TRUE
B = A and 1

What is B when Option Explicit isn't set? Answer: Integer because the integer in the expression forced the expression type. So you just had a forced type change.
 
I finally found a MSDN article that does seems to agree that Yes/No is just one bit:

Yes/No
Use for data that can be only one of two possible values, such as Yes/No, True/False, On/Off. Null values are not allowed.
Stores 1 bit.
Source

The_Doc_Man, I do suspect you are correct about automagic typecasting going on background- as I pointed out in OP, this obviously is not the problem because if it was, we'd be seeing a zillions of threads everywhere from new people asking "Why can't I use my boolean to read my Yes/No field?!?" Confusion galore. But this didn't happen, so they did indeed do something in background.

Still, this doesn't really explain how they managed that 1 bit trick because AFAIK, computers work with bytes, not bits, which would have some ramifications (e.g. would we actually save storage if we used 8 Yes/No fields instead of a 8-row lookup table?).

Not to mention the extra byte for VBA's boolean. I just don't get it. Why not just use an Integer data type since we're already eating 2 bytes?


BTW, it's great to see you again, The_Doc_Man! :)
 
Last edited:
This may have been an attempt at making more efficient (faster) code. It's a trick game programmers use...always make your data storage units a multiple of your bus size. That's the only reason for an 8 bit byte, really, 'cause cpus used to use 8 bit busses. Who knows?
 
This may have been an attempt at making more efficient (faster) code. It's a trick game programmers use...always make your data storage units a multiple of your bus size. That's the only reason for an 8 bit byte, really, 'cause cpus used to use 8 bit busses. Who knows?

For which?

If you meant VBA's boolean, which takes 2 bytes, I'm not sure how we would get faster performance since 1 byte is still a multiple of, well, everything. This might be true for using 4 bytes instead of 3 bytes, but I don't think it'd work for 2 bytes over 1 byte.

If you were referring to Yes/No field, this seems to go against the multiple because we now have an arbitrary amount of bits which may or may not be a multiple of bus size.

Or maybe I'm just confused and need another beer.
 
Hello Bannana ...

>> Why have two bytes for VBA's boolean, when one byte is actually just enough for the job? <<

Well, in the "olden days" Access (A95 and v2.0 I beleive) there was NOT a Boolean datatype, Integer was used as the "Boolean" datatype, this is how (IMO) the constant "True" was given a value of -1 (All BITS ON in two bytes of space!!), and Integer is a SIGNED type, so ... 1111111111111111 (bin) = -1 (dec) ... {Side note: if you could see the bit pattern of the Yes/No in the JET table, it is my best guess that its "All BITS ON", but since a Byte (Yes/No's are have a width of Byte, not Bit) is NOT signed, the decimal equivalent is 255 ... you can actually see this if you create a field as Yes/No, then set the field value to False (-1), then change the field to a Number/Byte .. the field value will now be 255} ....
So ... with the introduction of A97, along came the Boolean datatype ... but the datatype is pretty much analogous to the Date datatype, meaning, it is a FORMATED value. Dates are actually Doubles that are presented in a nicely formated way, well the same is true for Booleans, they are Integers that are formatted to SHOW us "True" or "False", PLUS there are vb constants named "True" (-1) and "False" (0). But it is important to realize that 99.999% of the programming platforms the term "False" is DEFINED as equal to 0, and "True" is NOT equal to False ... that is why we can do things like ...

If Len(MyString) Then x = 100 ... because if Len is NOT 0 then x will be assigned the value of 100.

Another interesting "thing" is that VBA does some serious "VooDoo Magic" since you can declare a variable as Boolean ... and set it to ANY value in the Integer range, but VBA coerces all values (except 0) to a -1 BEFORE the value is stored in memory....

Code:
Dim x As Boolean
x = 32000
Debug.Print x
Debug.Print CInt(x)

will yeild:
True
-1

For some leisurely reading check out the following:
http://www.utteraccess.com/forums/showflat.php?Number=1436622&fpart=all#Post1436622


Also ... here is some fun for the immediate window ... just to make sure we can confuse folks ....

? 32000 = True
False

? 32000 <> False {This one shows that comparison to <> False is different than comparing to True!}
True

? True = CByte(255)
True

{hmm.... I thought the Byte datatype was from 0 to 255, and True was -1 ... and -1 <> 255 does it?}

Ok so lets try ..

? -1 = CByte(255)
False

More fun ...

? CByte(True)
255

? CInt(True)
-1

? CByte(-1)
--> Overflow error ...

{I thought true was -1 ??}

... I love boolean (not the data type) math ... Cool eh?
 
Well, in the "olden days" Access (A95 and v2.0 I beleive) there was NOT a Boolean datatype, Integer was used as the "Boolean" datatype, this is how (IMO) the constant "True" was given a value of -1 (All BITS ON in two bytes of space!!), and Integer is a SIGNED type, so ... 1111111111111111 (bin) = -1 (dec) ...

That's cool tidbit of history, and totally explains it. Given how big Microsoft was on backward compatiblity, they probably decided it easier to just use 2 bytes and set all bits on and call it boolean rather than trying to buck the existing systems or cause more problems with implicitly typecasting older "virtua-boolean" datatypes.

{Side note: if you could see the bit pattern of the Yes/No in the JET table, it is my best guess that its "All BITS ON", but since a Byte (Yes/No's are have a width of Byte, not Bit) is NOT signed, the decimal equivalent is 255 ... you can actually see this if you create a field as Yes/No, then set the field value to False (-1), then change the field to a Number/Byte .. the field value will now be 255} ....

I'm a bit concerned that we've only shown the effects of implicit typecasting in background, rather than proving that Yes/No actually take a byte of space, rather than just a bit (thought I will admit that it is certainly more logical because we don't really work with bits, but with bytes)

So ... with the introduction of A97, along came the Boolean datatype ... but the datatype is pretty much analogous to the Date datatype, meaning, it is a FORMATED value. Dates are actually Doubles that are presented in a nicely formated way,

And one more interesting thing I learned from Allen Browne is that if we do this:
Code:
Now + 1/24

This may give us a hour from now, but won't be as accurate due to rounding errors inherent in floating number. Using the functions for dates, Microsoft does some magic so that the floating representation of the date is always accurate (not sure to what degree, though, I'd expect up to seconds.)

well the same is true for Booleans, they are Integers that are formatted to SHOW us "True" or "False", PLUS there are vb constants named "True" (-1) and "False" (0). But it is important to realize that 99.999% of the programming platforms the term "False" is DEFINED as equal to 0, and "True" is NOT equal to False ... that is why we can do things like ...

If Len(MyString) Then x = 100 ... because if Len is NOT 0 then x will be assigned the value of 100.

It's good to know that one shouldn't expect consistent results from this:

Code:
If Len(MyString)=True Then

Goes to show how humans' "common sense" can collide with computers which are definitely deterministic.

Another interesting "thing" is that VBA does some serious "VooDoo Magic" since you can declare a variable as Boolean ... and set it to ANY value in the Integer range, but VBA coerces all values (except 0) to a -1 BEFORE the value is stored in memory....

I quite suspected that it had to be the case, given the lack of hell being raised by irate developers for not seeing "right" results from such comparison. But that's what scares the pessimist in me- if there's voodoo magic going on, would it make it even more likely to bork everything up?!? Will there be a day when I do something a bit different that pass through the crack and thus throws wrench in the entire machinery?

Thanks for that experimentation and the link! :)
 
Hi -

I'm just a newbie here, but I've got to ask:

'How does this all matter in the greater scheme of things?"

From a newbie point of view, seems like you're beating a dead horse.


Bob
 
'How does this all matter in the greater scheme of things?"

A fair question!

There is a good chance that this is much ado about nothing. At until one day you're getting unexpected result, this information becomes kind of useful.

Why, I didn't know that comparing things to TRUE can actually cause problem than if you just compared to FALSE, which is always correct.

From a newbie point of view, seems like you're beating a dead horse.

Um, this is very important thing to do. You know that horses have a tendency to raise from dead and eat people's brains, don't you? So, um, yeah, beating a horse is a necessary thing. We gotta make sure that horse is really dead. Uh uh.

:p ;)
 
Hello Bannana ...

Now on to providing a good bit of evidence that Access documentation is in error when it claims the Yes/No claims a storage size of 1 bit. Do take note, however, that documentation that is found through the JET/ACE hierarchy things claims a storage size of 1 byte. I put more credibility in the JET documentation simply because JET is the database engine that the Access application uses, so documentation explaining JET through Access is "one degree removed" so-to-speak. So ... now for the experiment.

Steps taken ...
- Create and Empty MDB file, set it up to Compact on Close to ensure extra junk is removed.
- Create ONE table with 16 fields with a datatype of Currency (NOT a Number formated as 'Currency', but the DataType of 'Currency')
- Add 128 rows to that table with all the values set to 0. Take note that the same space is consumed no matter what number you are storing. I chose Zero simply because that value is a valid value through out this exercise. This step creates 2048 elements in the table.
- Quit Access, then record the "Size" of the file, NOT the "Size on Disk" as that is the size of the blocks claimed by the file.

Now, comes the fun part, we know we have 2048 elements in the table, so now change the datatype of each of those 16 fields to the next lower fixed point numeric type (a Long), exit Access, then record the size, then change the datatype to Integer, exit Access, record the size, then to Byte, exit Access, record the size, then to Yes/No, exit Access, then record the size. What this process will give you is the RELATIVE size of each datatype because we can calculate the elemental difference in storage space ...

<Difference in File Size between datatypes> / <number of elements> = <bytes freed up per element>, which translates to the byte size difference between datatypes. Now there may be padding an such simply because the record may not be optimized with respect to space consumed by the record once the C&R takes place.

Here is a table of the results:
Code:
DataType        AccessHelp  FileSize  Diff  BPE Result/Conclusion
------------------------------------------------------------------
Currency        8 bytes     131072     --   --  Assume the size of 8 bytes is accurate
Number/Long     4 bytes     122880    8192  4   Long is half the size of Currency
Number/Integer  2 bytes     118784    4096  2   Integer is half the size of Long
Number/Byte     1 byte      114688    4096  2   Unexpected value, but smaller than integer
Yes/No          1 bit       114688    0     0   Yes/No is the SAME size as Byte.
Where DataType is the DataType from the Table design UI; AccessHelp is the size listed in the help when you press the F1 key with your cursor in the DataType drop down of a field in the table design UI; FileSize is size of the MDB file in Bytes; Diff is the byte size difference acheived after the datatype was changed; BPE is the number of bytes per element, which was calculated by dividing the file size difference by 2048, as explained above; Result/Conclusion -- hopefully you can figure that out!

Based on the results I personally think is a fair bet to say that the Yes/No consumes 1 byte of space, otherwise I would think the file size would at least be moderately lower. To support the byte size claim, the article found here (http://office.microsoft.com/en-us/access/HP010322481033.aspx) indicates a storage sized of 1 byte for the BIT typed column. I do remember seeing more articles that support the Yes/No = 1 Byte of storage, I just can't find them right now!!

Now ... with respect to memory consumption in VBA... Again I take the approach of RELATIVE size and make conclusions based on that ... the information to follow is a virtual cut/paste from a post on UtterAcces, but the post is in a private forum, so I can't link you to it, but others who have access to it that post on both AWF and UA may have a sense of De-Ja-Vu while reading this ... To give a background, me and a highly respected Excel MVP, Nate Oliver (posts with the name NateO) were discussing the conflicting in formation regards how much memory is consumed by VBA datatypes, other topics, like speed and strings as Byte arrays were being discussed to ... below is the text, with OUT any edits, so conversation that does not relate to OUR discussion here ... you'll just have to extend some grace!! .. :) ...

--- Begin Copy ----
Okay ... so ... This thing was driving me crazy so I thought ... there has to be a way to test this, so as I thought to myself ... "How can it determine the memory footprint of a variable?" ... And low and behold it came to me ... I will just fill up the stack with a recursive function, then force a stack overflow and count the number of times the function is called ...
mod_StackOverflow
----------------------
Code:
Option Compare Database
Option Explicit
 
Private lngCalls As Long
 
Private Function foobar() As Boolean
    Dim x As Boolean
    lngCalls = lngCalls + 1
    Debug.Print lngCalls
    x = foobar()
End Function
So with the code, the RELATIVE size can be determined by the number of times the function is called, since each function call dumps some info to the stack, but since the info is never retreived from the stack, the stack will overflow. To "test" the memory footprint of each type, I merely varied the datatype of the function and the variable "x". The function and "x" always were tagged with the same datatype (does that make sense?). My thoughts are/were that a small memory footprint will yield a higher function call count ... here are the results:
Code:
DataType    CallCount   VBA Reported Size  Presumed Footprint
Byte        6627        1-byte             2-bytes
Integer     6627        2                  2
Boolean     6627        2                  2
Long        6452        4                  4
Currency    6130        8                  8
Single      6452        4                  4
Date        6130        8                  8
Double      6130        8                  8
Object      6452        4                  4
Variant     5004        16/22              16
String      6452        10                 4
WOW ... I was surprized! ... apparently a Byte and Integer have the same footprint, presumably 2-bytes!!! ... WHAT?!!? ... That is not what is implied by one of those articles!! ... Also, look and the String datatype, it seems to confirm our thoughts that it has the BSTR structure since the stack overflowed at the same count as a Long.
With all this, I DO understand that values pushed to the stack may not reflect the same storage space consumed by the actual location of a variable, but I would tend to think they would be the same ..
I thought this was cool!! .... so, it seems, "that bad boy" IS an Integer!!!
---- End Copy ----

So ... how does that information grab you?
 
Excellent!

You(/NateO?) did a thorough job of testing this!

Makes me wonder about the documentation, but I will definitely keep in mind to consult Jet's documentation before relying on Access's documentation.

If I may, the conclusions seems to be this:

It seems pointless to use VBA Byte (and to lesser extent Boolean) if they're all same sizes as a Integer. Indeed, it may be better to just use Integer even to hold values where you do not expect the range to exceed 255, just because it's signed, while Byte is unsigned, which gives us a bit more flexibility. (Don't know about other folks here, but sometime I use a negative number as a kind of flag for fields where I don't allow nulls but need to allow for exceptional cases where data isn't available for whatever reasons- in much similar manner to what Windows does with default system variables)

It may be OK to use Byte in field type as this is indeed one byte less, and there shouldn't be a problem using VBA Integer to hold the Byte if necessary, provided that we take responsibility for checking for overflow and negative values (especially if using TRUE or -1!).

We've already agreed that for most parts (possibly all parts, but one never can be absolutely sure), there is a implicit typecasting to/from booleans, so just using an Integer as a boolean would suit the job (except that it may obfuscate the readability of the code), and allow a third state (e.g. 1 to represent Null, if desired).

Not to say that we shouldn't ever use those variant, but I think it helps to know that there's nothing to gain from specifying a VBA Byte, other than making the code a bit more readable. (I also quite doubt we would be able to use Integers to hold an array of Bytes from a binary file, even if they would consume same amount of memory anyway).
 
Last edited:
>> You(/NateO?) did a thorough job of testing this! <<

... That would be me ... :rolleyes::cool::o:D

....

I kinda have a reputation for "pegging the geek meter" ...
 
I still use the Byte datatype in VBA simply because one day MS may decide to actually store it in a single Byte! Also, I will often choose the datatype based on the range of values I expect to store/work with.

... Gotta Run for now!! ...
 
This inspired me to try out a proof of concept:

I present you: a signed byte type!

You can use this like this:

Code:
Private Sub foo()

Dim x As New SByte

x = 1

Debug.Print x

Set x = Nothing   'Actually not necessary- will terminate on the next line anyway

End Sub

It's actually an integer, but has a check to accept only values between -128 to 127, like a real signed byte would, and raise Error #6 (which I *think* is the code for value out of script- could be wrong, but you get the idea) if we try to insert a bad value.

The reasoning being that if everything smaller than an Integer is really a Integer, it may be actually faster to use Integer directly and avoid any implicit typecasting, but we'll have to prove this through testing, which I will do when I have free time later in day, then using a class module to constraint the variable so it acts like a real signed byte with the errors included.


Fun!
 

Attachments

Last edited:
Test code:

Code:
Private Sub foo()

Dim i As Long
Dim x As New SByte
Dim y As Byte
Dim z As Integer
Dim a As Integer

Dim starttime As Double
Dim endtime As Double

a = 1
x = 1
y = 1
z = 1

starttime = Now()

For i = 1 To 20000000
    a = x
Next i

endtime = Now()

Debug.Print Second(starttime - endtime)
i = 0

starttime = Now()

For i = 1 To 20000000
    a = y
Next i

endtime = CDate(Now())

Debug.Print Second(starttime - endtime)

i = 0

starttime = Now()

For i = 1 To 20000000
    a = z
Next i

endtime = CDate(Now())
Debug.Print Second(starttime - endtime)
End Sub

Writing to variables:

When a is a Integer:
Code:
8 seconds for SByte
1 seconds for Byte
1 seconds for Integer (No surprise, as there's no typecasting)

When a is a byte:
Code:
9seconds for SByte
1 seconds for Byte (No surprise, too)
0 seconds for Integer (!! Integer faster than Byte!)

Reading from variable:

When a is a Byte:
Code:
9 for Sbyte (yuck!)
1 for Byte (No typecasting, again)
1 for Integer (hmmm....)

When a is a Integer:
Code:
8 for Sbyte
1 for Byte 
0 for Integer (no typecasting)

Now when I think about it, Sbyte sucked because it was essentially an object, even if all it had was a property Let/Get and a integer. Too bad.

But I think the original hypothesis of Integer being faster than Byte even with typecasting seems to be confirmed, but only for writing, which makes sense since integer is obviously bigger than a byte and thus needs no checking. Reading seems to be unaffected, so it's safe to say integer is just as good if not marginally faster than a byte.
 
Hello banana ...

You said:
And one more interesting thing I learned from Allen Browne is that if we do this:

Code:
Now + 1/24
This may give us a hour from now, but won't be as accurate due to rounding errors inherent in floating number. Using the functions for dates, Microsoft does some magic so that the floating representation of the date is always accurate (not sure to what degree, though, I'd expect up to seconds.)

But .... if you run this code ...

Code:
Public Sub foobar()
    
    Dim x As Integer
    Dim dtBase As Date
    
    dtBase = Now()
    
    For x = 1 To 24
        Debug.Print CDbl(dtBase + x / 24) & "  " & CDbl(DateAdd("h", x, dtBase))
    Next x
    
End Sub

You will see that the date function and the mathematical expression produce the same results all the way down to the deepest decimal point. Floating point numbers are and approximation of an actual value, but they are very consistent.

... I will play some with elapsed times too later on ... looks like some fun! ...
 
Interesting.

I wonder if the paranoid has to do with long term rather than 24 seconds? I would like to assume that they're telling us to use Date functions for a good reasons not because they're on a power trip. ;)
 
Post #8 in this thread may lack some correctness and clarity.

http://www.utteraccess.com/forums/showflat.php?Cat=&Number=1523482&page=&view=&sb=5&o=&fpart=1&vc=1

As an aside, if Boolean storage capacity changes in the future it’s not likely to get smaller, rather, it may shift to 32 or 64 bits.
Both truth and falsehood are based on the in register result of the last machine operation as expressed in the Zero bit (not the N bit) of the condition codes.
A 64 bit machine should have a 64 bit data bus, the most commonly attributed definition of size, of a processor.

There is a fundamental difference between that which is not Zero and that which is Negative.

A Negative value (as signalled by the setting of the N bit in the condition codes) is regarded as true.
However, the clearing of the N does not mean it is false… 5 for example will be regarded a true, but the N bit has been cleared.

False equates to the condition code Z bit being set: -
The Z bit will be set if the result of the last operation is zero, cleared otherwise.

This equation of the Z bit to false enforces the logic of both false and true across programming platforms.
The Z bit is equal to false and the logical NOT of false is true.

Regards,
Chris.
 

Users who are viewing this thread

Back
Top Bottom