Access's Yes/No Field=1 bit, VBA's boolean type=2 bytes?

ChrisO, thanks so much for chiming in. :)

As an aside, if Boolean storage capacity changes in the future it's not likely to get smaller; rather, it may shift to 32 or 64 bits.
Both truth and falsehood are based on the in-register result of the last machine operation, as expressed in the Zero bit (not the N bit) of the condition codes.

This part I don't get. My reasoning is that it shouldn't matter how big a processor/data bus/whatever is: the information for True/False can easily be held in a single bit, so all we really need is one byte (as I understand it, any data type has to have some prefix to identify it as such-and-such a data type, but since we'd have 7 spare bits, we could use those to mark it as a boolean).
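(As an aside, you can see the 2-byte claim for yourself: Len on a non-string variable reports the number of bytes used to store it. A tiny sketch:)

Code:
Dim bln As Boolean
Debug.Print Len(bln)    ' prints 2 - VBA stores a Boolean in 2 bytes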

There is a fundamental difference between that which is not Zero and that which is Negative.

...

This equation of the Z bit to false enforces the logic of both false and true across programming platforms.
The Z bit is equal to false and the logical NOT of false is true.

This is quite interesting, but I'm afraid I still don't understand two things:

1) How is the N bit significant in dictating whether a value is False or True? If I read you right, the Z bit is what determines falsehood?

2) I'm not sure exactly where the Z bit is, but assuming a 32-bit x86 (little endian), would it be the 2nd bit, next to the negative bit?

Code:
00000000 00000000 00000000 00000000   (the 2nd bit from the left being the Z bit?)

But I can set the value 1, represented as:
Code:
01000000 00000000 00000000 00000000
which would evaluate to 1, and True. But the value 2, which still evaluates to True, doesn't touch either the Z bit or the N bit at all?
Code:
00100000 00000000 00000000 00000000

Or am I not understanding something here?


BTW, I do agree with the logic of making True simply NOT False, rather than fixing it to a constant, which could cause problems with portability.
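(You can see that relationship directly in the Immediate window - just a sketch:)

Code:
Debug.Print Not False      ' True
Debug.Print CInt(True)     ' -1
Debug.Print CInt(False)    '  0
Debug.Print Not 0          ' -1 - bitwise NOT of an all-zero Integer is all ones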
 
Just back from my hols

Perhaps the most efficient way of handling a boolean value (i.e. one with ONLY a true/false value) depends on the machine code instruction set (I'm no expert, but isn't it better to have even numbers of bytes so the data aligns in memory in a certain way?) - hence it may be possible (might it?) that a two-byte variable can actually be more efficient than a single byte (and definitely more efficient than a bit type, I would have thought, as bit-wise operations must take more clock cycles than byte operations)

Equally, I (think) I always try to use True and False in code rather than explicitly testing for 0 and -1, as I understood that these usages were guaranteed (by MS) to work, irrespective of how they implemented things like true/false in future.
 
>> Perhaps the most efficient way of handling a boolean value (i.e. one with ONLY a true/false value) depends on the machine code instruction set (I'm no expert, but isn't it better to have even numbers of bytes so the data aligns in memory in a certain way?) - hence it may be possible (might it?) that a two-byte variable can actually be more efficient than a single byte (and definitely more efficient than a bit type, I would have thought, as bit-wise operations must take more clock cycles than byte operations) <<

If that is true, this would also explain why a Byte data type may actually take up two bytes instead of just one, which is just neat. However, how would we prove this?

I mean, it does sound intuitive that 2 bytes would be easier to handle than 1 byte on any 32-bit (e.g. 4 byte) or 64-bit (8 byte) machine, but is everything a multiple of 2? Not all programming languages do that (e.g. I know that Date/Time types in certain languages are stored as 3 bytes).
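(Same Len trick as above - a quick sketch of the sizes VBA reports for its own types:)

Code:
Dim byt As Byte, i As Integer, lng As Long, dt As Date
Debug.Print Len(byt), Len(i), Len(lng), Len(dt)    ' 1  2  4  8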

>> Equally, I (think) I always try to use True and False in code rather than explicitly testing for 0 and -1, as I understood that these usages were guaranteed (by MS) to work, irrespective of how they implemented things like true/false in future. <<

This is also true for any constants/enums - they always say to use stuff like dbOpenDynaset, adAsyncFetch, vbCrLf and the like, even though passing the numeric equivalent will succeed, just without a guarantee that it will work tomorrow.

The weird thing about True, though, is that if you try to use it like this:

Code:
If FileLen(MyFile) = True Then
it will not behave consistently, as Brent showed earlier (or linked to UA, not sure which).
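(The inconsistency comes from what FileLen actually returns - the file's size in bytes, as a Long - so comparing that to True (-1) practically never succeeds. A small sketch, with a made-up path:)

Code:
Dim strFile As String
strFile = "C:\Temp\SomeFile.txt"        ' hypothetical file

If FileLen(strFile) = True Then         ' size = -1? essentially never true
    Debug.Print "this branch is never reached"
End If

If FileLen(strFile) Then                ' any non-zero size counts as True
    Debug.Print "file has contents"
End If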
 
Hello Chris!! ...

>> Post #8 in this thread may lack some correctness and clarity <<

... hmmm ... what specifically in my post is incorrect? ... what specifically was not clear? ... I definitely want accurate and clear information presented, so I welcome corrections and requests for clarity!!!

I read your post in the linked thread, but I did not see where it contradicted or corrected the information I provided, though it does indeed fit well in our conversation here (as do other threads we've participated in together regarding the Boolean datatype! ... good times! eh? :) ). In the linked post you talk about True being -1 and why it is -1. It seems you said the bit pattern of -1 is 0000 0000 0000 0001 with the N bit set to 1 ... I was, and still am, under the impression that the bit pattern in memory for -1 (in the 2-byte VBA world here) is 1111 1111 1111 1111, and that the two's complement process is done by the software interpreting the value ... so the software (like at the machine code level) will only look at the N bit if the value being represented is a signed type. If the datatype is signed and the N bit is ON, then the software performs the two's complement (flip the bits and add 1, probably claiming some stack space in the meantime) to get the magnitude, then tags the minus sign in front of it (because of the N bit) to present the human-readable number, but the bit pattern of the value (at its primary memory location) does not change to the two's complement form. If it did, then the "negativeness" of that value would be lost after the first usage of that number, since only one N bit is present on the processor.

I am definitely interested in further information, as I do not have a hex editor installed to "play" with (that I know of, at least).
 
Hey Banana ...

Quick comment ... you said:
>> as I understand it, any data type has to have some prefix to identify it as such-and-such a data type <<

It is my understanding that the datatype of a variable is NOT stored in conjunction with the value ... in other words, if a variable is tagged (in the software) as a 2-byte signed integer, only 2 bytes of space are claimed by it. The datatype of that variable is managed by the software, and quite possibly that information is stored next to the variable's value spot, but I believe the type info is addressed separately from the value....

{please note, I am speaking with a tone of "I'm almost sure ... but not entirely confident" ... so I hope others will bolster this info, or not}
 
Reading Brent's post makes me wonder-

1) Is the N bit actually a part of the variable's space, or is it something in the processor, a flag describing the value just processed?

2) I had understood that any variable had to be either prefixed or pointed to so we would know how to read the string of bits - whether as a string encoded in ASCII (or Unicode, or a zillion other ways to encode a string), a number occupying X many bits, or raw binary data for boolean math - that sort of stuff. I may have been incorrect in saying that all variables had to be prefixed; as Brent pointed out, this is down to the software's design, e.g. whether to use a pointer and index to map to the variable. I'm not so sure about "random access" where variables are created/used/destroyed at runtime, though.

BTW, Brent, you can get a free hex editor from Cygnus. A good one, if I say so myself.
 
Hey Gemma ---

>> it may be possible (might it?) that a two-byte variable can actually be more efficient than a single byte (and definitely more efficient than a bit type <<

"Efficient" to me is dependent on the point of view ... if an application (i.e. VBA, VB.NET, Access, etc.) claims to have a BIT (a true bit) datatype, then the possibility exists that it optimizes the claim for memory, meaning that the first variable of BIT type claims a BYTE (or some multiple of bytes) ... but subsequent variables that are tagged to the BIT type just piggyback on the first BIT's claim of a byte ... For example ...

BIT1 00000000 1 bit used but 7 bits "available" for more BIT typed variables
BIT2 00000000 2 bits used (BIT1 and BIT2) but 6 bits "available"
BIT3 00000000 3 bits used (BIT1, 2, 3) but 5 bits "available"

Which is an efficient use of storage, but maybe not so good for speed, with the presumption that the byte holding all the bits will be masked for the bit needed and the result dumped into its own byte for use in expressions and such ...
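(In VBA terms that first scheme is roughly what you would do by hand: pack several flags into one Byte with Or, then mask one back out with And when you need it - just a sketch, nothing VBA does for you automatically:)

Code:
Dim bytFlags As Byte
bytFlags = bytFlags Or 1        ' BIT1 goes into bit 0
bytFlags = bytFlags Or 4        ' BIT3 goes into bit 2
Debug.Print (bytFlags And 4)    ' masking BIT3 back out prints 4, i.e. it is set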

However, if a system claims a BIT type, but coerces the value to a BYTE for storage, then we have ...

BIT1 00000000 1 bit used, but 1 full byte is marked as unavailable
BIT2 00000000 00000000 2 bits used, and 2 bytes marked as unavailable

So this would be inefficient for storage, but probably efficient for speed of operations ...

{again, please note I am speaking with a bit (no pun intended) of speculation, hoping for confirmation or not}

... Gotta run ...
 
Brent, MySQL has a SET datatype which indeed does use bits and consumes a byte, just as you describe. A short article detailing the SET..

The article doesn't detail whether the act of retrieving is fundamentally slower, though; because we no longer need to join fully normalized tables to get the full SET, this can help query performance (even without the benefit of an index, according to the author).

I'm pretty sure MySQL isn't the only one to implement bits in this fashion, nor does this have to be confined to RDBMSs.
 
G’day Banana.

The reason I suggest that if the storage size of a Boolean changes it will increase in size is that it has already happened. In an 8 bit processor it was 8 bits, in a 16 bit processor it’s 16 bits. It actually seems strange to me that in the current 32 bit processors that it isn’t 32 bits.

But why? Well the way I see it is that the thing that determines true or false (the condition not the value of the constants) is the equivalent of the Z bit in the condition codes. But the Z bit is set or cleared on the result of the last operation on an internal register. As processors go from 8->16->32->64 the register size also increases.

Now when true or false is stored in memory it needs to be addressed at least with byte (8 bit) addressing. (I’ve never seen a computer that has single bit addressing in memory. Even those computers that have Test Bit N memory instructions don’t actually do that.) What happens is that 8/16/32/64 bits are fetched and stored in a register. At that point the Z bit is set or cleared based on the contents of the entire register. In order to go down to the single bit level a second instruction needs to be performed. That instruction is an AND instruction using a mask of the bit under test. When the AND instruction is complete, again the Z bit is set or cleared based on the result. So because of the lack of direct bit addressing in memory, testing to the bit level takes longer to test than to the register level…it requires the extra AND instruction.
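(At the VBA level that extra step shows up as the explicit And with a mask - a rough sketch of the two kinds of test:)

Code:
Dim lngFlags As Long
lngFlags = 8                                            ' bit 3 happens to be set

If lngFlags Then Debug.Print "whole value is non-zero"  ' one compare against zero
If lngFlags And 8 Then Debug.Print "bit 3 is set"       ' mask first, then compare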

Now that memory is cheap it makes little sense to pack bits in memory only to suffer the speed penalty when testing them.

>>1) How is the N bit significant in dictating whether a value is False or True? If I read you right, the Z bit is what determines falsehood?<<
Sorry if I gave that impression; the N bit plays no part in determining the logical condition of true or false, that's the job of the Z bit. The N bit plays a part in returning the value of the VBA constants True and False (-1 and 0) because the value is interpreted in two's complement, and we need the N bit for that.

>>2) I'm not sure exactly where the Z bit is, but assuming a 32-bit x86 (little endian), would it be the 2nd bit, next to the negative bit?<<
No, it's a separate hardware register. I think the Z bit is the logical NOR of the bits remaining after the last instruction. (If any bit is set in the register, the Z bit is cleared.)

Hope that all makes some sense.

Regards,
Chris.
 
ChrisO,

Thanks so much for clearing the waters. So the Z bit isn't a bit that represents the zero or one value of a given variable; it's a flag in the processor, set from the register contents.

That begs the next question: why do we have to use the Z bit, instead of actually looking up the value within that variable? Surely this would help us avoid the Boolean variable getting bigger as we move to 64-bit or even 128-bit?
 
“Why do we have to use the Z bit, instead of actually looking up the value within that variable?”

Well that’s pretty much what the Z bit is. It’s the hardwired result of effectively looking up the value in the register. We do not look at values in memory but rather transfer the contents of memory to a register and allow the hardwired condition codes to look at the contents of the register.
 
G’day Brent.

I don’t believe I said that the bit pattern of -1 is 0000 0000 0000 0001 with the N bit set to 1

Do you mean this section: -
----------
Now the two’s complement of a value is derived by flipping all bits and adding one.

So: -
1111 1111 1111 1111
becomes: -
0000 0000 0000 0000
add one: -
0000 0000 0000 0001 = 1

format the result as N = minus and we get –1 (the value of the VBA constant True)
----------

If so, I was talking about how we derive the two's complement value of the binary, not the way it's stored in memory.
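(A quick way to watch the all-ones pattern reading back as -1 is a 16-bit hex literal in the Immediate window - a sketch:)

Code:
Debug.Print &HFFFF     ' prints -1: sixteen 1-bits read as a signed Integer
Debug.Print &HFFFF&    ' prints 65535: the same bits widened to a Long first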


The possible lack of correctness and/or clarity I referred to in post #8 is simply that I believe that a discussion about true/false can’t be clear without also looking at the hardware level. It’s at the hardware level that true/false is actually derived.

Regards,
Chris.
 
Hello Chris,

>ChrisO> The reason I suggest that if the storage size of a Boolean changes it will increase in size is that it has already happened. In an 8 bit processor it was 8 bits, in a 16 bit processor it’s 16 bits. It actually seems strange to me that in the current 32 bit processors that it isn’t 32 bits. <ChrisO<

Well .. in VBA a boolean is 32 bits, so your progression is right on the money Chris! ... But from my testing, and info from the web, a boolean is stored on the hard drive as a byte... This is probably a good spot to toss in the reminder that the storage mechanism may affect our conversation (i.e. booleans are stored as a byte on the hard drive, but are stored in memory as 2 bytes, or 32 bits). <doh! ... brain lapse ... 2 bytes is 16 bits, not 32 ... sorry! thanks ChrisO!!>

>ChrisO> What happens is that 8/16/32/64 bits are fetched and stored in a register. At that point the Z bit is set or cleared based on the contents of the entire register. In order to go down to the single bit level a second instruction needs to be performed. That instruction is an AND instruction using a mask of the bit under test. When the AND instruction is complete, again the Z bit is set or cleared based on the result. So because of the lack of direct bit addressing in memory, testing to the bit level takes longer to test than to the register level…it requires the extra AND instruction <ChrisO<

This seems to concur with what I formulated in my previous post with my first scenario of BIT storage (efficient for space, but not for speed).

>ChrisO> ... because the value is interpreted in two’s complement ... <ChrisO<

With this comment, then, we are in agreement that the memory location of the boolean value holds "all ones" (1111 1111 1111 1111) ... yes? ... then I must have mis-read/interpreted some of the information that was presented in the UA thread ...

EDITS ADDED

... Chris ... you posted whilst I was composing ... so ... we are definitely on the same page with this!
 
Actually 2 bytes are only 16 bits and it hasn’t kept up with the size of the processor.

The reason may very well be that the compiler is not fully optimised for 32 bits, and that may be to maintain compatibility with 16-bit machines.

Regards,
Chris.
 
>> Actually 2 bytes are only 16 bits <<

Oh Man!!! ... I KNOW that ... I mean like REALLY know that!! ... I have NO idea what I was thinking! ... I will just blame it on being tired on a Friday!!! ... I will edit with credit. :D
 
I think I should chime in with some history and some linguistics, computer style.

Formal languages sometimes include the keyword PACKED or its equivalent, in which case a Boolean would occupy 1 bit. You would define the Boolean variable as part of a packed record in which you defined not only the data type but the size and location. For example, see the Ada language. I use this feature now and then on my Alpha processor.

If you don't say PACKED then the implied meaning is "naturally aligned" for which your next available "natural" alignment is one byte, the smallest thing that can be allocated individually in an Intel based machine. So look at it as either "packed" (one bit) or "naturally aligned" (larger storage). Treat the two terms as opposites.

But suppose you weren't on a byte-oriented machine? On the old PDP-10 (Digital Equipment Corp.) or the IBM 7xxx series, if you didn't say "PACKED" in the language you were using, you might well have gotten a 36-bit Boolean because that was the odd natural word size.

The ORIGINAL way to implement Booleans often included oddball instructions such as BODD or BEVEN - meaning Branch (GOTO) if Odd number or Branch if Even number. I.e. look at the least significant bit (LSB) of the storage entity - whatever it was - and branch or don't branch based on the LSB setting. With ODD meaning TRUE and EVEN meaning FALSE. Not only did the computers have branch tests for plus/minus and zero/not zero, they also had some tests for odd/even and - on a couple of systems - parity set/clear.

The experiments regarding a stack vs. a record in Access are flawed in a sense because the stack doesn't naturally push or pop bytes on an Intel box. Stack ops really want to push 16 bits or 32 bits. Data stored in VBA that has nothing to do with a disk record is also not a good test for the size of a Yes/No field that will be written to a record. They are not the same animal. Both VBA and Jet perform casts to create a Boolean, but they cast to very different natural sizes.

In the final analysis, because a non-packed Boolean is ALWAYS a cast of some sort, you have to worry about the casting rules, because the variable that was typecast must be interpreted, not used directly. And therein lies the trick - knowing the correct interpretation.

I have seen the following cases:

1. 0 = FALSE, all else = TRUE. Obviously, the machine in question has one or more instructions that are the equivalent of BZER - Branch if ZERO value.

2. All even FALSE, all odd TRUE. Here, the machine needs BODD/BEVEN instructions.

3. All positive FALSE, all negative TRUE. Here the machine needs BGTR (Branch greater than zero) and BLSS (Branch less than zero).

So, in essence, the answer to storage size and value for a Boolean is going to depend almost entirely on the instruction set of the machine "hosting" the program being compiled. Find out what instructions are used to test Yes/No and you have your answer.

Now, that stuff about time and rounding.... adding (1/24) to Now() - let's look at that. A similar principle might be involved.

Time is another cast - of type DOUBLE. The flow of time in this cast is that the current date is exactly the number of days since the reference point, with day 1 equal to 1 Jan 1900, and the fraction is the time of day since midnight. If you bother to do the math, you find that your DOUBLE format probably allows you 55 bits. We have yet to reach day 65536, so we are using only 16 bits for the number of days. That leaves 39 bits for the passage of time. You can do seconds as a number from 0 to 86400, less than 128K = 17 bits. By my count, that has consumed 33 bits, leaving 22 bits for fractions of a second.

I'm confused by the comments about "rounding errors" because, as far as I know, MS Access won't show you time fractions smaller than one second, and those 22 bits are more than enough to block excessive rounding errors. But if your time fraction was not forced immediately to a DOUBLE then I might agree that rounding would occur. I wonder if the fraction 1/24 was evaluated as

CDBL(1/24) or CDBL(1)/CDBL(24)

Those alternatives will give you two different answers past about 23 bits because of the significance of the various digits. And therefore, if you added them to Now(), it would not surprise me to see a difference between the two results. But if I were doing the operation, trust me.... I would always use the more accurate method of adding.
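(For what it's worth, a way to sidestep the fraction arithmetic in VBA altogether is DateAdd - a small sketch, not something from the post above:)

Code:
Dim dtNext As Date
dtNext = Now() + (1 / 24)            ' one hour later via a Double fraction
dtNext = DateAdd("h", 1, Now())      ' same result with no fraction math at all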

Banana, I've been busy with work-related stuff. We are getting ready to activate our standby site as a test of our hurricane preparedness. The rule at work is that I cannot connect to a web site if there is anything pressing. Well, for a couple of months now, I've been more pressed than a cheap suit with trips to the USA "Left Coast" several times and having to assemble the server from individual parts. Then prep it remotely so we can run our big disaster recovery exercise.
 
So generally, it looks like it comes down to these points:

1) How variables are actually represented may be ultimately dictated by the specific machine's instruction set.

2) The variable may be typecast several times between input and output.

3) The variable's consumption may not always be the same on every medium (e.g. 2 bytes in memory and one byte on the hard drive).

I hope that sums up everything?

Now, my original reason for starting this thread was that I was alarmed there was more typecasting going on under the hood than I originally bargained for, which made me wonder: how likely is it that an implicit typecast will result in an error?

It is clear that they were pretty thorough with ensuring compatibility between Access's Yes/No fields (actually, this is Jet's, not Access's) and VBA's Boolean, because there's an absence of the holy hell that would have been raised had it been poorly implemented. But let us suppose we found a hole where a typecast returns undesired results. What are our options, since this is a flaw outside of our control? This is probably what scared me back there. A case in point: API calls. They are very touchy and can go haywire because there are serious differences between VB's datatypes and C's datatypes. Some APIs may insist on strings formatted in a certain way or you get a lonely GPF. One API I worked with long ago (I can't remember which) had a particularly strange way of taking a struct from VBA. We couldn't use a user-defined type, even though we could do so with other APIs, and thus had to use hex values to mimic the struct that the parameter was supposed to have.

Generally speaking, I'd rather the compiler slap me upside the back of the head with a jumbo-sized fish and berate me for not being explicit with my data types than cheerfully accept it and give me chocolate chip cookies while it "guesses" what the data type ought to be and fudges the result in cases where it's not an exact match. Thus I am just a bit apprehensive of any implicit typecasting.
 
G’day Banana.

Bugs due to possible typecasting… that should be an interesting subject.

Well, first up, I can only think of two straight off the top of my head, and some may say they are not typecasting errors, but I think they are.

From the Help File.

"If you don't explicitly declare the constant type using As type,
the constant has the data type that is most appropriate for expression."
Code:
Sub Test()

  '  Const Thousand = 33000              ' inferred data type: Long
  '  Const Thousand = 33000@             ' data type: Currency (@ suffix)
  '  Const Thousand As Currency = 33000  ' data type: Currency (explicit)
  '  Const Thousand = 33000              ' no declared data type (inferred: Long)
  '  Const Thousand = 32000              ' no declared data type (inferred: Integer)

    MsgBox Thousand * 2

End Sub
Try uncommenting and running each of the five lines in turn. (The last one is the trap: 32000 fits in an Integer, so the constant is typed as Integer and Thousand * 2 overflows, while 33000 is promoted to Long and works.)


And this one: -
Code:
Sub Test()

    Call UpdateFileExists("abc", "xyz", 0)

End Sub

Public Function UpdateFileExists(ByVal strFilePath As String, _
                                 ByVal strFileName As String, _
                                 ByVal blnState As Boolean) As Boolean
    
    ' Update all matching records with blnState.
    CurrentDb.Execute " UPDATE tblRecentFiles" & _
                      " SET FileExists = " & blnState & _
                      " WHERE FileName = " & Chr$(34) & strFileName & Chr$(34) & _
                      " AND FilePath = " & Chr$(34) & strFilePath & Chr$(34)
             
End Function

In the above, the passed 0 (blnState) gets converted to the string "False" and inserted into the SQL string.
Now that’s fine in the English version of Access.

In the Dutch version of Access it gets converted to "Onwaar" and inserted.
But SQL doesn’t speak Dutch so it fails. :(

So the question I would like to ask is if the compiler knows that we are passing a Boolean why does it convert it to a string? :confused:
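(For what it's worth, one workaround is to put the number into the string yourself rather than letting VBA coerce the Boolean to a localised word - a sketch along the lines of the routine above:)

Code:
    ' CLng(blnState) concatenates as "0" or "-1", which Jet SQL understands
    ' regardless of the UI language:
    CurrentDb.Execute " UPDATE tblRecentFiles" & _
                      " SET FileExists = " & CLng(blnState) & _
                      " WHERE FileName = " & Chr$(34) & strFileName & Chr$(34) & _
                      " AND FilePath = " & Chr$(34) & strFilePath & Chr$(34)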

Regards,
Chris.
 
ChrisO ...

Those are some good samples/possibilities!

The one with "Onwaar" and "False" is a good one to highlight ... It's really not that different from concatenating date literals, in that the literal will be representative of your system settings, and JET really likes US-formatted (or ISO-formatted) date literals.
 
The Doc Man ...

I will try to be brief (HAH!!) as I am short on time ... so if I "sound" curt or brash, I am not intending to :) ...

Dates
In VBA/JET, day 1 = 12/31/1899 12:00 AM; in SQL Server, day 1 = 1/2/1900 12:00 AM; in Excel, it's as you state, day 1 = 1/1/1900 12:00 AM. The base date is often referred to as day 0, which is one day earlier than stated, except in Excel where 12/31/1899 is not a valid date (I think).

<all the branch if instructions>
All those are ultimately compared to 0 to determine which way to branch, so to me truth still rests with being not equal to 0. Even in higher-level languages, all tests are implicitly compared to 0, even if we think we are being explicit! ...

If x = 1 Then ...

Gets implicitly compared to 0 in the compiled code ... something like this ...

If (x = 1) <> 0 Then ...

That is where the Z bit comes into play ...
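(You can watch that intermediate Boolean directly - a quick sketch:)

Code:
Dim x As Long
x = 1
Debug.Print (x = 1)        ' True
Debug.Print CLng(x = 1)    ' -1 ... the value the implicit <> 0 test sees
Debug.Print CLng(x = 2)    '  0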

<Tests Flawed>
My tests did not mix hard drive storage and program memory. The "stack" test only tested the stack space consumed by VBA datatypes. The storage test only dealt with data stored in a table, using hard drive statistics, NO VBA whatsoever. I agree with you, and have stated as much, when you said something like "hard drive space and memory are two different animals". I believe my tests adhere to that fact.

My tests revealed RELATIVE sizes to support a concept ... some assumptions were made, so there is room for error, especially in the stack overflow one, simply because the stack may handle memory differently than program memory is handled. The hard drive space test I feel fairly confident in, but then again, there may be a loophole ...

<Packed and Naturally Aligned>
Excellent bit of information! ... Thank you for sharing that!

... Okay ... I've got to paint in my house today ... so I've GOT to go ... or I may not be married come tomorrow!! ....
 
