Solved Data Type Inaccuracies

dalski

My problem is similar to a previous thread (unable to link due to the spam warning) regarding a high degree of accuracy when adjusting fields (a small fractional number used as a multiplier of another field, which is a Currency data type). The consensus there seems to be: store the multiplier as a Double & the money value as a Currency data type to 4dp. That sounds like it will work well for my needs (I won't bloat the thread with my requirements here; I'm just trying to learn what's going on). I understand that there are storage sizes & storage accuracy, though the MSN article is lacking yet again... It seems it's all related to how numbers are stored in binary. I would have thought these essential specs would be in the MSN data type article. Decimal would be perfectly accurate, but I'm concerned that it would greatly slow the db.

After much reading:
Screenshot_2.jpg


Testing
Testing by entering the digit group 123,456,789 in row 1 & repeating that group twice in row 2:
Screenshot_1.jpg


I'm confused that MSN states a Single can store up to 39 significant figures, yet it fails at storing a 9-digit, non-floating number straight away. Converting the displayed notation 1.234568*10^8 back yields 123,456,800, as opposed to the real value typed in, 123,456,789; so we are out by 11 on a basic number, despite changing the auto decimal point to 9 digits.

MSN says:
a Single with a floating point:
  • -3.402823E38 to -1.401298E-45 for negative values
  • 1.401298E-45 to 3.402823E38 for positive values
So in my feeble mind a Single should be able to store up to 340,282,300,000,000,000,000,000,000,000,000,000,000 as a positive number using the Single data type. No doubt I have shamed myself by failing at the conversion? Yet in testing, the inaccuracy is already rife in a Single: typing 123,456,789 into row 1, it then shows the number in scientific notation as 1.234568E+08 (converting that back, we end up with 123,456,800, already out by 11).
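For what it's worth, the same behaviour can be reproduced in VBA; a minimal sketch (Sub and variable names invented) of my row 1 test:

Sub SinglePrecisionDemo()
    Dim sngTest As Single
    Dim dblTest As Double
    sngTest = 123456789    ' more digits than a Single's ~7 significant figures can hold
    dblTest = 123456789    ' comfortably within a Double's ~15 significant figures
    Debug.Print sngTest    ' shows 1.234568E+08 - the nearest value a Single can hold is 123456792
    Debug.Print dblTest    ' shows 123456789 exactly
End Sub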

Currency is as expected from the MSN article: 15 digits to the left of the decimal point & 4dp. Cool, thanks MSN, as tested in the Currency4dp row 2.
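And a quick VBA check of that fixed 4dp behaviour (again just a sketch, variable name invented):

Dim curTest As Currency
curTest = 1.23456789
Debug.Print curTest        ' 1.2346 - the value is rounded to Currency's fixed 4 decimal places
Debug.Print curTest * 3    ' 3.7038 - arithmetic on Currency also stays at 4 decimal places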
 

A single has a large range, but not large accuracy.
It can support 7 significant figures with 6 decimal places.
As such I understand why it exists, but in this day and age of cheap storage I personally don't see much use for it.
A double can support 15 significant figures with 28 decimal places.

Edited - see my post below for a more accurate description.
 
Thanks Minty, I am in disbelief at the lack of info on accuracy on MSN. I managed to find 'Set the field size' on MSN under Access > Tables > Creating Tables > Set the field size. Hopefully it helps someone out in the future (can't link as I lack the privileges). I must be wrong here, because in the field size article it says 15 sf for a Double; so there must be something else to the FIELD SIZE which sets the accuracy...
 
Actually, having written the above, I have found a more trustworthy source from Microsoft - you just have to hunt around for it:


Scroll down to the number section where you will find:

Field Size:
  • Byte — Use for integers that range from 0 to 255. Storage requirement is 1 byte.
  • Integer — Use for integers that range from -32,768 to 32,767. Storage requirement is 2 bytes.
  • Long Integer — Use for integers that range from -2,147,483,648 to 2,147,483,647. Storage requirement is 4 bytes.
Tip: Use Long Integer when you create a foreign key to relate to another table's AutoNumber primary key field.

  • Single — Use for numeric floating-point values that range from -3.4 x 10^38 to 3.4 x 10^38 and up to seven significant digits. Storage requirement is 4 bytes.
  • Double — Use for numeric floating-point values that range from -1.797 x 10^308 to 1.797 x 10^308 and up to fifteen significant digits. Storage requirement is 8 bytes.
  • Replication ID — Use for storing a globally unique identifier required for replication. Storage requirement is 16 bytes. Note that replication is not supported using the .accdb file format.
  • Decimal — Use for numeric values that range from -9.999... x 10^27 to 9.999... x 10^27. Storage requirement is 12 bytes.
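To relate those field sizes back to the original multiplier-times-currency question, here is a rough, untested sketch of the equivalent DDL run from VBA (the table and field names are invented):

Sub CreateRateTable()
    ' BaseAmount: Currency = fixed point, 4 decimal places.
    ' Multiplier: Double   = floating point, ~15 significant digits.
    Dim strSql As String
    strSql = "CREATE TABLE tblRates (" & _
             "RateID AUTOINCREMENT CONSTRAINT pkRates PRIMARY KEY, " & _
             "BaseAmount CURRENCY, " & _
             "Multiplier DOUBLE)"
    CurrentDb.Execute strSql, dbFailOnError
End Sub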
 
I'm confused that MSN states a Single can store up to 39 significant figures.

Can you return to that article and examine it more carefully? A SINGLE can store a number up to 10^38, but with precision to only 7 full digits; an 8th digit is correct only rarely. Any "significant digits" discussion becomes trickier because the position of the decimal point becomes an issue. How many significant digits are in the number 1.0000000000000? That depends on how you measured it. Scientists in labs with no computer presence whatsoever were arguing over "numeric significance" 100 years ago.

To answer your question, you need to understand the format and storage concepts. To start with, you have a bucket of bits for each data type. For a SINGLE variable, the bucket holds 32 bits. The DOUBLE bucket holds 64 bits. There is also a 128-bit (quadruple precision) format, but Access doesn't support that one so we can skip it for this forum. Since the hardware data bus width is your hard limit, you can only transport those buckets of bits in fixed-size gulps. So a SINGLE is ALWAYS 32 bits and a DOUBLE is ALWAYS 64 bits.

You have to allocate bits in three groups. These groups are the sign, the exponent, and the mantissa; the standard itself calls the last part the "significand" (older texts say "fraction"), but the traditional term "mantissa" will do here. The organization IEEE (you can look that up online) established "IEEE 754" as the standard for SINGLE and DOUBLE. The 128-bit format mentioned above was added to the standard much later. Here is a link from AMD that explains the IEEE 754 standards for you.


In brief, both SINGLE and DOUBLE allocate 1 bit to the sign. The convention is that if the sign bit is SET/TRUE/ON, the number is negative. That leaves 31 bits for SINGLE and 63 bits for DOUBLE.

The standard is for scientific numbers, so we are talking about a format that, in decimal, has some number of digits and a decimal point, plus a separate exponent representing a power of 10. So (chemists, you knew this) Avogadro's number is 6.022x10^23, the number of molecules in one "mole" of an elemental substance. If you have the atomic weight (in grams) of a substance, you have one mole.

However, computers aren't working in decimal, they are working in binary. All exponents are powers of 2, not of 10. A SINGLE allocates 8 bits to the exponent. A DOUBLE allocates 11 bits. IEEE 754 also states that the exponents are stored in "excess" (biased) notation. For SINGLE, that means the exponent appears as an unsigned integer but, to use it, the hardware subtracts 127 - and thus you hear about "excess-127" notation. For example, a stored exponent field of 134 means an actual exponent of 134 - 127 = 7, i.e. a factor of 2^7. For DOUBLE, it is "excess-1023" notation: the same idea, just with more bits.

What is left is the mantissa. A difference between "traditional decimal" and "computer binary" mantissas is where the fractional point sits. In decimal scientific notation, a number is d.dddd...ddd x 10^eeee - which is to say scaling by the exponent leaves one digit to the left of the decimal point and the remaining fractional digits to the right, with a couple of exceptions. In IEEE 754 binary, normalization leaves exactly one bit to the left of the binary point, and for any non-zero number that bit is a 1. That is, the number is 1.bbbbbbb...bbbbb x 2^(eeee - "excess") - more or less.

That process of adjusting the fractional point so that one digit is left of the point is called "normalization." Since the bit to the left of the binary point is always 1 unless the number is "true zero" (all bits 0, whether 32-bit or 64-bit format), IEEE 754 includes a feature called "hidden-bit" normalization: you know that leading bit is always 1, so you don't need to store it. Which means that your mantissa for SINGLE = 32 bits - 1 sign - 8 exponent + 1 hidden bit = 24 bits of mantissa. For DOUBLE, 64 bits - 1 sign - 11 exponent + 1 hidden bit = 53 bits of mantissa.

Here's a rule of thumb to figure out how many decimal digits you can expect. Remember that 10^3 is close to 2^10 (1000 ~ 1024). So every 10 bits of mantissa is about 3 decimal digits. Single = 24 bits, so that's 10 bits + 10 bits + 4 bits, or 1000x1000x16 - and you can express (roughly) 16,000,000 with that mantissa - and that is 7 decimal digits. Double = 53 bits = 8,000,000,000,000,000 so you can express 15 decimal digits and squeak out a 16th in rare cases (when the first digit is 4 or less.)
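If you want to sanity-check that rule of thumb, a couple of Immediate-window lines will do it (the exact display format may vary):

Debug.Print 2 ^ 24    ' 16777216 - a Single's 24 mantissa bits, about 7 decimal digits
Debug.Print 2 ^ 53    ' about 9.007E+15 - a Double's 53 mantissa bits, 15 digits with an occasional 16th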

A side effect of signed binary integers is a slightly greater range for negatives than for positives, because the representation has to include 0, which is not a negative number. So a short (8-bit) signed binary integer ranges from +127 through 0 down to -128 - and that asymmetry permeates number representation on computers to this day. For the IEEE exponents the bookkeeping is a little different (the all-zero and all-one exponent fields are reserved for zeros/denormals and infinities/NaNs), but the practical upshot is that a SINGLE ranges roughly from 10^38 down to 10^-38 (and to about 10^-45 for the denormalized values MSN quotes). Note also that when your problem is numbers with larger exponents, DOUBLE has 3 extra exponent bits, so you might in some cases want to use DOUBLE not for the extra precision but for the extra exponent range.

@dalski, this should tell you everything you needed to know (and maybe just a little bit more) about a computer's precision of numbers. Remember that rule of 2^10~10^3 because it applies to EVERYTHING in binary computing.
 
Fascinating topic, which I always wished I understood but never did until now, thanks to you both.

Thanks @Minty,

Scroll down to the number section where you will find: ...

I did not appreciate that the accuracy would be limited to such a surprisingly small number of significant figures; quite interesting. I would have thought there would be linked articles pointing this out, especially as it's an introductory article; maybe it's just me justifying my density.

Remember that 10^3 is close to 2^10 (1000 ~ 1024). So every 10 bits of mantissa is about 3 decimal digits. Single = 24 bits, so that's 10 bits + 10 bits + 4 bits, or 1000x1000x16 - and you can express (roughly) 16,000,000 with that mantissa - and that is 7 decimal digits. Double = 53 bits = 8,000,000,000,000,000 so you can express 15 decimal digits and squeak out a 16th in rare cases (when the first digit is 4 or less.)

Thanks @The_Doc_Man, this is very useful.

a SINGLE is ALWAYS 32 bits and a DOUBLE is ALWAYS 64 bits.
Screenshot_4.jpg

Cool, so that makes sense to me with you saying it: 8 bits x 4 bytes (the Single storage requirement stated on MSN) = 32 bits. That makes sense, but I'm only aware of this because you have kindly informed me; I wouldn't have got it reading MSN with no help. Having watched 'Bits & Bytes' videos on YouTube... I always wondered what size a byte was nowadays & whether it was universal. Seems it's 8 bits to a byte these days.

Thank you both again.
 
Sorry - what about the data type for a calculated field in a query? There don't seem to be Field Properties available for it. One could think that an implicit typecast occurs dependent on the data types of the info used in the query, but I would have thought this is quite dangerous & that the user should be able to specify what data type/field properties he/she wants the result to be in?

Screenshot_5.jpg
 
You would use a CCur(), CLng(), CDbl() type of expression to force it to a specific data type.

I think Access is quite loose with data type conversions.
You can easily do "text" & 1 and it will work. "text" + 1 won't however.
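For example (field names invented), a calculated query column could read AdjAmount: CCur([BaseAmount] * [Multiplier]) to force the result back to Currency. The & versus + difference is easy to see in the Immediate window:

Debug.Print "text" & 1    ' text1 - & always concatenates, so the 1 is coerced to a string
Debug.Print "text" + 1    ' fails with run-time error 13, "Type mismatch" - + tries to add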

 
Sorry - what about the data type for a calculated field in a query? There don't seem to be Field Properties available for it. One could think that an implicit typecast occurs dependent on the data types of the info used in the query, but I would have thought this is quite dangerous & that the user should be able to specify what data type/field properties he/she wants the result to be in?

View attachment 118509
Along the lines of what Minty said, the result of a calculation in a query is dependent on the datatypes of the values involved in the calculation. It's up to you to manage them at that level, i.e. with the component values.

You're right in another sense, although it has a much broader applicability, I think.

With Access we have a tool that can sometimes play loosey-goosey with inputs, or coerce sloppy input into what it thinks is appropriate for a given context. In some contexts we can get away with using delimiters other than the designated date delimiter, for example, because the context makes it possible to assume a string of digits represents a date.

"01/02/2025" vs #01/02/2025#

IMO, we can become complacent about datatypes as well as other references because Access tries so hard to take away the heavy thinking required.

Again, IMO, that's a bad habit to get into. It means we give up fine control over what our database applications are really doing and opens the door for unanticipated and undesirable results.

Returning to the case at hand, while it's not necessary to do so in all cases, it's also a reasonably good idea to coerce values to the Datatype you want when there's even a slight chance it might go wrong.
 
One could think that an implicit typecast occurs dependent on the data types of the info used in the query

One would be almost right except that typecast has an explicit meaning different from the way you are using it. A true "typecast" in the context of VBA means the data type stays the same size but its interpretation changes. This happens most often for DATE vs. DOUBLE and for BOOLEAN vs. INTEGER data types. It also happens when you use any of the bitwise logic operators (such as AND, OR, and XOR), because the signed integer that typically holds the variable is treated as an UNSIGNED integer. (Yeah, I know... not much of a stretch.)
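A couple of Immediate-window lines illustrate that "same value, different interpretation" idea (the date here is arbitrary):

Debug.Print CDbl(#1/2/2025#)    ' 45659 - a Date is held as a Double counting days from 30 Dec 1899
Debug.Print CInt(True)          ' -1 - Boolean True is an Integer with every bit set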

There is a concept called "coercion" in which the implied datatype of something that isn't already explicitly declared will be "coerced" into the data type corresponding to the largest data type involved in the computation. This coercion occurs during query or statement analysis, BEFORE the actual computational work is done. Where there is any ambiguity at all, it occurs when you use numeric constants as part of the expression. That ambiguity is due to ANOTHER rule, which says that in an expression, a literal (constant) value occupies the smallest data type required to hold it. The two rules - one about constants, the other about expression variables - interact during expression analysis to optimize the data storage size required to hold an expression's result.
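The classic place to watch those two rules collide is integer literals in the Immediate window (purely illustrative):

Debug.Print 32767 + 1     ' "Overflow" error - both literals fit in an Integer, so the sum is attempted as an Integer
Debug.Print 32767 + 1&    ' 32768 - the & suffix makes the 1 a Long, so the whole expression is coerced up to Long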

I frequently reference the VBA Language Specification, but this time I'm not going to try to quote it on the subject of "coercion." That would require me to copy several multi-page tables and quite a few paragraphs. Suffice it to say it is a complex topic with LOTS of transition rules where a coercion to a larger data type becomes really complex. Including rules where a text-string of digits (i.e. a quoted numeric constant) can be coerced to a numeric format in order to participate in the computation.
 
One would be almost right except that typecast has an explicit meaning different from the way you are using it. A true "typecast" in the context of VBA means the data type stays the same size but its interpretation changes...
Astutely observed; I was referring to data-type conversion only so kudos for recognizing this on such limited info. Sounds like VBA is quite different to other languages in this area.

"coercion" in which the implied datatype of something that isn't already explicitly declared will be "coerced" into the data type corresponding to the largest data type involved in the computation. This coercion occurs during query or statement analysis, BEFORE the actual computational work is done. Where there is any ambiguity at all, it occurs when you use numeric constants as part of the expression.
Interesting, I would never have figured that out. If I'm interpreting this correctly (I'm not going to try this as I'm just a beginner), hypothetically this could keep the saved db small (assuming it saves to the data type spec'd), and at expression-analysis time it then inherits the largest data type involved in the relevant expressions.

That ambiguity is due to ANOTHER rule, which says that in an expression, a literal (constant) value occupies the smallest data type required to hold it. The two rules - one about constants, the other about expression variables - interact during expression analysis to optimize the data storage size required to hold an expression's result.

Thanks, this is exactly what I am doing & it's very relevant. If you had not posted this I would've been lost. I had been leaving the Double Field Properties decimal point position as auto in the exact hope that the opposite was true; storing as accurate a value as possible.
 
Remember that the second rule relates to constants, where the decimal point position is usually explicit - but implicit for integers. In fact, if you have an integer but then drop in a trailing decimal point anyway, that syntax makes the literal a DOUBLE (or a SINGLE if you add the ! type suffix), even though the number itself would have been representable as an integer.

Besides that, there is a hidden "gotcha" regarding math. If you do a division of integers, you can still get a floating-point result - if you use the "/" operator - but not if you use the "\" operator or the MOD operator. The latter two operators return integers.
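A few Immediate-window lines show both points (purely illustrative, nothing Access-specific):

Debug.Print 7 / 2           ' 3.5 - "/" always produces a floating-point result
Debug.Print 7 \ 2           ' 3 - "\" is integer division
Debug.Print 7 Mod 2         ' 1 - Mod returns the integer remainder
Debug.Print VarType(2)      ' 2 (vbInteger) - a plain integer literal
Debug.Print VarType(2.0)    ' 5 (vbDouble) - the trailing decimal point makes the literal a Double
                            ' (in a module the editor will even rewrite 2.0 as 2#, the Double literal)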

I had been leaving the Double Field Properties decimal point position as auto in the exact hope that the opposite was true; storing as accurate a value as possible.

You are still confused about one issue. "Decimal point position = Auto" has nothing to do with storage precision - only with display precision.

The expression will always be the optimum precision until a very specific moment - which is the moment you have to store a value. This is true for SQL where you are storing or updating a field or in VBA when you have an assignment statement such as X = SQR(PI) (just to be a little obscure.) If X has been explicitly declared in VBA, or if there is a field waiting for a value during INSERT/UPDATE operations, you have one last coercion.

NO MATTER WHAT the expression's internal data type has become, that moment of storage is the final coercion to make the value match the destination storage. In that VBA Language Reference I often quote, there is a right-coercion (that applies to the overall expression) and a left-coercion that changes the expression value's representation if necessary for storage.
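As a final illustration of that storage-time coercion, the same expression stored two ways (variable names invented; Sqr is VBA's square-root function):

Dim sngX As Single, dblX As Double
sngX = Sqr(2)         ' the Double result of Sqr is coerced down to Single at the moment of assignment
dblX = Sqr(2)         ' stored at full Double precision
Debug.Print sngX      ' 1.414214 - about 7 significant digits survive
Debug.Print dblX      ' 1.4142135623731 - about 15 significant digits survive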
 
