FAQ article on nulls needs correction (1 Viewer)

ButtonMoon · May 29, 2016

I have commented on this FAQ article by The_Doc_Man before. Unfortunately I just came across another reference to it and so I'm making a plea to reason here: please, please correct this article (or preferably delete it!)

Here is an excerpt:

... very specific - meaning of NULL in set theory. In the latter theory, NULL means the same thing as "empty set" - and it takes the same exact meaning whether you are doing simple sets, joined sets, unioned sets, or any other kind of set you can imagine. NULL = no members in the set. The population of the set is 0. More specifically for JOINS, it means that the INTERSECTION of the joining sets is empty.

This is so completely wrong. In relational and SQL terms the empty set is a very different thing to null.

Brief digression: It's true that in the past the phrase "null set" was occasionally used by mathematicians as an alternative name for the empty set. That usage predates relational databases. "Null set" is a term very rarely used today and it has absolutely nothing to do with the database meaning of "null" which is quite different.

In Codd's relational model the empty set is not the same as null at all. x = x is TRUE if x is the empty set but if course x = x would not be true if x were null. In relational algebra the result of σ(1=0)R is an empty set (an empty relation), but the result would never be null. The empty set is a value (a set) which therefore must be distinct from null according to Codd's "Rule 3". Relational theory (and set theory for that matter) would be reduced to nonsense if the results of unions, intersections and joins returned nulls instead of the empty set.

The same is true in SQL and Access of course, where the empty set (the empty result of a query) is treated very differently from null. Null, by definition must be part of a non-empty result of a query.

The site's FAQ section is presented as a curated set of explanations of common questions, so it seems right to ask that these faults be cleared up.

The_Doc_Man · May 31, 2016

ButtonMoon, I respect you and understand that this 9-year-old article is one with which you disagree. I believe, however, that I am still correct in the statement that there are a multiplicity of nulls to be found in the context of Access and SQL. If you dislike my choice of nomenclature, OK. Fine. But the concept I was discussing is not wrong.

One of those "multiplicity of nulls" is the character ASCII NULL. Since in communications, there is a real character of this name, we simply have to say that this name is a poor choice for modern usage, but it is what it is. It is still used for serial devices that have to establish what is called "framing" or "frame synchronization" (depending on the articles you read) for a packet transmission. You can also send it as a "KeepAlive" marker for those connections that require it. This is also called a "heartbeat" in some protocols.

Another of these nulls is the value that results when you do an OUTER JOIN and don't take special action for those cases where the dependent side of the JOIN doesn't bring back any records. The independent side will still be represented once in the result set, with null values for the fields that were defined by the incomplete intersection of those two merging sets as formulated by the JOIN.

If somehow you manage to store the entire result set of this query, some recipient fields would be null-valued (assuming you took appropriate measures to assure that storage occurred despite the presence of that null-result error.) That is why there is an IsNull function and an NZ function. They don't exist in a vacuum!

I believe that when you quoted the segment that you did, you became guilty of cherry-picking (and nit-picking) at the same time. The overall tone of that article was that people were grousing about some issues with nulls and I pointed out that the problem was a DESIGN problem. And again, it might be a matter of nomenclature, but a null set is a set that has no members, i.e. is empty. Whether it is empty or null, it means that whatever you did gave back nothing to you. Sometimes a distinction without a difference IS no difference.

With the exception of the ASCII NULL (which IS a character), in all other cases of nulls you have the situation where someone's design failed to take into account the absence of data, and they were using the null to represent a state other than the states normally represented by the variables they used.

For example, there was a situation in the original context for which someone used Yes/No fields to be tri-state, not bi-state, because a null value is neither TRUE nor FALSE. My comments were designed as a response to this case and its incorrect behavior, because what SHOULD have been done was to have a separate flag saying "damaged set" or "incomplete set" or something specific to the situation.

Finally, let me also point out one more thing... I didn't put the article in the FAQ section. One of the site's administrators did that NINE YEARS AGO. The other administrators have had the chance to review it many times and remove it or reply to it. But they did not.

I understand if you dislike the article and think it is misleading, but I do not think that it is so bad when taken in its complete context. If you honestly think the article needs revision, I respectfully request that you bring it up with the administrators. But I also honestly believe that you are straining over nomenclature and missing the intent of the original article.

Button, PLEASE understand that I am not in any way trying to disrespect your viewpoint. I merely disagree with it. And remember, when dealing with issues like this one, you can take all the experts in the world on a subject and lay them end-to-end, but they still won't reach a conclusion.

ButtonMoon · Jun 1, 2016

I think you are looking for something complicated in something that is very simple:

CREATE TABLE t1 (x INT);
INSERT INTO t1 VALUES (NULL);

Now try the following two queries:

SELECT * FROM Table1;
SELECT * FROM Table1 WHERE 1=0;

See the difference? The first SELECT query returns a null. The second query returns an empty set. The empty set is not null and the null is not an empty set because those are completely different things. They also mean very different things in relational database terms. You can call the empty set a "null set" but that doesn't make it the same as a null.

The_Doc_Man · Jun 1, 2016

This whole discussion started nine years ago when someone wanted to know what NULL meant. The answer was (and still is, for that matter) "nobody knows for sure."

If I do an OUTER JOIN as the source for an INSERT INTO statement and it happens that at least some of the inserted records have no values from the dependent member of the JOIN, you will still get NULL values in those fields.

I don't have to explicitly put NULL into a field to get a NULL. I get a NULL because something else didn't have a value to put there, and let's be clear. I don't know from looking at it whether some explicitly assigned a NULL value or someone implicitly assigned a NULL because of an OUTER JOIN with missing values from a dependent JOIN member. I also know that I can't do certain functions when the input includes NULLs.

It is perhaps only my opinion and not widely shared, but in my not-always-humble opinion, any time you attempt to use NULL as the (n+1)th value of a field that nominally can contain only n values, you have made a mistake in your design. Don't use a null to assign a 3rd value to a Y/N field or a 257th value to a byte integer, etc. That is purely laziness on your part if your design depends on that feature. If you test for NULL fields and use that SOLELY as a way to decide if you got an incomplete result, that might be OK. But otherwise, it has no easily discernible meaning.

This might seem a bit far afield for you, but my dissertation included explicit attention to the idea that using mathematical analysis has an implication - i.e. that the analysis you were about to perform was in fact appropriate. My favorite anecdote from that time was the common use of the Linear Regression formulas to derive reaction rates from what I liked to call "shotgun plots." The point distribution looked like someone tacked graph paper on a target and fired a shotgun at it from a distance. (Small shot, otherwise you would of course have no paper left...). Then they took coordinates of each pellet hole and turned that into X-Y pairs. Then they ran a regression analysis.

You know what they got? A straight line. But here's the REAL question: What did it mean? Answer: Not a damned thing, because linear regressions with more than a few points will ALWAYS give you a straight line. But if you don't ask yourself the right question up front, you still get a trash result. The right question? Was this treatment appropriate? For the shotgun plot, the answer was a resounding NO (unless the guy with the shotgun was still there and kind of twitchy...).

This relates to the question that triggered that original discussion: What does NULL mean as a value? In any situation except for the narrow case of the ASCII NULL character, the answer is that NULL has no meaning at all.

It is like division by zero. What is the answer to x/0 where x is not 0? The answer: UNDEFINED - unknown, unknowable. The operation itself has no meaning and should not be attempted. So here, running across NULL results - and then USING them to mean something - is always to be carefully controlled.

If you dislike my nomenclature, just remember that I am a pragmatist. Past a certain point, THERE IS NO DIFFERENCE between a null set and an empty set when you are trying to make a decision based on some query that didn't give you back an answer. You can tout the precise definitions all you like. You still can't work with either case except by anticipating their possible occurrence and testing for those cases first.

ButtonMoon · Jun 1, 2016

Past a certain point, THERE IS NO DIFFERENCE between a null set and an empty set when you are trying to make a decision based on some query that didn't give you back an answer.

Are you really saying that you reject the Closed World Assumption - keystone of the relational model and SQL?

Under the CWA, given the following query:

SELECT age FROM employees WHERE empname = 'zsmith';

If the empty set is returned then that means there is no employee called 'zsmith'.

If NULL is returned then that means there IS an employee called 'zsmith' but his age is not recorded.

That is a major difference to most people and it's a difference that in my opinion keeps a whole database industry alive and producing useful results every minute of the day.

The_Doc_Man · Jun 2, 2016

And again, YOU are missing MY point. If I have any problem with the FAQ posting, it was that part of the original context was removed.

For your sample query: What do you do to test for either case? If your design includes that query, how do you test for those results? My original quibble was not that you would get back 0 records - or might get back a record with a NULL field value. It was what you did about it afterwards.

Further, if your Employees table is actually an Employees OUTER JOIN query that involves a primary table and a supplemental data table, and if 'zsmith' has a primary entry but no secondary entry, you will get back a NULL value for age but it was because the OUTER JOIN in question got back an empty set for that person. I.e. BOTH a null value AND an empty set situation. OK, it is hypothetical - but so is this entire discussion.

You are focusing on nomenclature and missing the point of the original article:

MURPHY'S LAW:

Whatever can go wrong WILL go wrong - so make plans for non-optimal results.

Most folks don't remember Ed Murphy's second clause, they only remember the first one. But Ed was a pragmatist of the highest order. His law wasn't cynicism, it was a guideline on experimental design. Back during the early days of rocketry experiments, he was working on rocket sled experiments. Some poor volunteer underwent a 9-G pullout for nothing because someone else plugged in the test equipment upside down. You know how Murphy fixed that? The shaped plugs that can only be inserted into their sockets in one way, no other. Every computer plug on every computer you buy these days either is shaped in a way to prevent incorrect insertion (RJ23 or RJ45, e.g.), or they are designed for there to be no difference in up/down/right/left (e.g. coax cable).

FWIW, I know the difference between a NULL field and an empty set. May I offer you an article? OK, this is not the best possible reference in the world, but it is reviewed by many people and, being subject to peer review/correction, is likely to not be terribly wrong. It's also easily accessible, which helps make it easer to find.

https://en.wikipedia.org/wiki/Null_set

Under Properties:

The empty set is always a null set. More generally, any countable union of null sets is null. Any measurable subset of a null set is itself a null set.

Maybe you think that I am somehow disrespecting set theory? Take it up with the authors of the Wikepedia article. A null set includes the empty set as a matter of pragmatism, though a null set can also be something that is structured (i.e. a union of null sets has structure but is still null.) I KNOW THE DIFFERENCE in theory. I'm trying to put it into practice.

I'm reminded of the infamous line from the Woody Allen movie What's Up, Tiger Lily[/] - I'd accuse you of being a sadistic, sodomistic necrophile, but that would be beating a dead horse.

The_Doc_Man · Jun 2, 2016

Follow-up:

http://whatis.techtarget.com/definition/null-set

In mathematical sets, the null set, also called the empty set, is the set that does not contain anything.

http://www.onlinemathlearning.com/empty-set.html

There are some sets that do not contain any element at all. For example, the set of months with 32 days. We call a set with no elements the null or empty set.

http://www.icoachmath.com/math_dictionary/empty_set.html

The solution set for an equation that has no solution is also called an empty set. It is also called a null set.

Now, if you want to get into quantum mechanics, you can perhaps find a meaningful physical-world distinction between null sets and empty sets. For example,

http://www.phy.duke.edu/~rgb/Philosophy/axioms/axioms/node18.html

If you want to define sets by "what's in the box" models, Access cannot have a physics-level Null Set because if you can define the criteria for membership in the set, you can define the "box" even if it is empty. And that means that for anything with definable membership criteria, the set isn't NULL, it is EMPTY. Access cannot return a TRUE null set in the quantum physics sense of the term because if you can't define it for access, you can't even begin to work with it.

Button, are you still going to insist on tail-chasing? If so, please clarify your objection. Otherwise, I think I have shown enough references to suggest that your fixation on null sets vs. empty sets IN THE CONTEXT OF ACCESS is simply not warranted. Unless, of course, you want to do some quantum physics computations using VBA? {shudder}

ButtonMoon · Jun 2, 2016

The_Doc_Man said:
Button, are you still going to insist on tail-chasing? If so, please clarify your objection. Otherwise, I think I have shown enough references to suggest that your fixation on null sets vs. empty sets IN THE CONTEXT OF ACCESS is simply not warranted. Unless, of course, you want to do some quantum physics computations using VBA? {shudder}

As I already stated more than once, my "fixation" is not on null sets vs. empty sets at all but on null vs. empty sets. In Access or in any SQL-based database they are not the same thing and don't mean the same thing. Since you keep returning to the different topic of null sets I don't see anyway to continue the discussion, which seems a shame.

The_Doc_Man · Jun 2, 2016

I also know the difference between NULL (as a field's property) and the NULL SET and an EMPTY SET. NULL is a PROPERTY of a field, not a value of it.

Doubt me?

You can do "if [MyField] = value" but cannot do "if [MyField] = Null" because NULL isn't a "real" value. You CAN do "Nz([MyField],0)" or "IsNull([MyField]) = True" - both of which avoid treating Null as a value. The KEYWORD "NULL" might even be usable in your example:

Code:

INSERT INTO t1 VALUES (NULL);

but the ability to use it in a VALUES list does not make it a value. Remember that if you want to test for NULL in SQL, you do not do a "WHERE ABC = NULL" because that won't work. You have to use NZ or ISNULL for the test in SQL. Doesn't work much better in VBA, either. You can do "if RecordsetVariable IS NULL" but not "if RecordsetVariable = Null" because they don't mean the same thing, not even the same CLASS of thing.

For what it is worth, there are a few cases in hardware situations where you have to let numbers co-exist with non-numbers. For instance, IEEE Floating-Point format has a configuration of bits in the exponent field (with an empty mantissa) that says "NaN" - "Not a Number" - for any uninitialized variables. But you can't get a "NaN" result for IEEE floating values. It only occurs by direct initialization.

Regarding the original comments: In this case I will NOT back down: It doesn't matter in the slightest that you can encounter a NULL in a field or WHY you can encounter it in the field. It matters as to what you do when you encounter it. If you don't plan ahead of time to define what you will do, then it is sloppy programming and an exemplar of poor design skills. Can you not see that obsessing over the difference between a Null field and an empty set return is asking the wrong question?

If you are up on Zen Buddhism, you might recognize this exact philosophy as it relates to just about anything "Here is X. What do I do about it?" You don't ask "Why is there X?" first. It's the wrong question. You don't ask "What is X?" first. That's also wrong. You always ask how to manage/cope with X. "What action must I take for X?" is a valid question and is part of a good planning strategy.

plog · Jun 2, 2016

Not that it matters, but I fall on the side of Doc. And by that I mean, you guys aren't taking opposite positions of one argument: Blue is having one conversations and Doc is having another.

Blue seems to mistake Doc's conversation as an argument against his because Doc doesn't say that Blue is right (which Blue is). Doc keeps saying the validity of Blue's point in Blue's conversation isn't relavant to the conversation Doc is having (and it isn't).

So, in conclusion, you are both right within the conversations you each keep trying to have, but you are having 2 different conversations and only Doc seems to see that.

The_Doc_Man · Jun 3, 2016

Button... I repeat this and have no problem with saying it. If you have a suggestion of a specific and narrowly tailored edit that needs to be made on the FAQ article, I would agree to it as long as it does not blunt the point of the article regarding how one should plan for all contingencies when designing a database, both at high and low levels.

HOWEVER, having said that, I would question whether I can reach into the article now that it has been moved to the FAQ section. An administrator would have to tell me about that part of it.

I don't know how else to proceed here.

Thanks, plog, for the moral support.

And... Button, please note that your comment:

Brief digression: It's true that in the past the phrase "null set" was occasionally used by mathematicians as an alternative name for the empty set. That usage predates relational databases. "Null set" is a term very rarely used today and it has absolutely nothing to do with the database meaning of "null" which is quite different.

is incorrect in a general sense. The references I offered showing some equivalence between the terms "null set" and "empty set" were made more recently than the advent of relational databases. Your definition seems applicable only to a narrow situation. One might even be led to question whether Codd's nomenclature was unjustifiably hair-splitting since most other disciplines do not coincide with this usage. Not trying to throw stones at anyone, but just pointing out this wouldn't be the first time someone had to "coin a phrase" to describe something in a way that wasn't completely supported by the language. It is a side effect of the fact that (in Goedel's terms) English is a complete but not correct language.

Would you be happier if, in the portion of the FAQ article you highlighted, that we swapped "NULL SET" for my use of "NULL" ?? Or do you have wider objections? I don't want you to see me as explicitly being difficult for no purpose.

Please also note that in my government job, there is a different culture and I don't consider myself successful unless I give at least as many headaches as I get on a given day.

But I have no intent to give YOU a headache.

pbaldy · Jun 3, 2016

The_Doc_Man said:
HOWEVER, having said that, I would question whether I can reach into the article now that it has been moved to the FAQ section. An administrator would have to tell me about that part of it.

One way or another we can get it done. We can either open the thread and let you edit it or one of us (admin or mod) can apply your desired edits.

The_Doc_Man · Jun 3, 2016

If we can get Button to agree on an edit that doesn't gut the intent of the article, I have no objections. Can't be more cooperative than that.

Button, the specifics have to come from you.

FAQ article on nulls needs correction (1 Viewer)

ButtonMoon

Registered User.

The_Doc_Man

Immoderate Moderator

ButtonMoon

Registered User.

The_Doc_Man

Immoderate Moderator

ButtonMoon

Registered User.

The_Doc_Man

Immoderate Moderator

The_Doc_Man

Immoderate Moderator

ButtonMoon

Registered User.

The_Doc_Man

Immoderate Moderator

plog

Banishment Pending

The_Doc_Man

Immoderate Moderator

pbaldy

Wino Moderator

The_Doc_Man

Immoderate Moderator

Similar threads

Users who are viewing this thread