Results 1 to 30 of 39
Thread: minor programming rant
-
2019-09-13, 01:25 PM (ISO 8601)
- Join Date
- Jan 2009
minor programming rant
In the spirit of camaraderie and sharing small bugs that cause big problems, sharing something that just happened.
I'm updating a program from about 6 months ago, as the reporting format changed and they added a couple reports. Nothing too bad, but annoying in that I have to figure out what I did before and at least one change is rather big so I have to undo some data cleaning and then recode it a different way.
I just made one "minor" change to the code to try to remove pre-kindergarten students, and about half of my output (the part not related to Grade level) vanished.
When I started this post, I was kicking myself for not saving a backup copy of my program before fiddling with it, since I had no clue how I lost half my output from some minor changes.
But then it hit me and turned out an easy fix. Grade is stored numerically, where 0 = Kindergarten and negatives are pre-k. I added a line
If Grade < 0 then delete
when I needed to write
If Grade < 0 and Grade NE . then delete.
The half of my output that was deleted were statistics not related to Grade, so the Grade column was null. And null is less than 0 according to the programming language's logic.
I feel quite relieved, and hope this will teach me to save versions of my program better.
-
2019-09-13, 01:38 PM (ISO 8601)
- Join Date
- Feb 2007
- Location
- Manchester, UK
- Gender
Re: minor programming rant
Weird logic in whatever language you're using, that--in SQL server, at any rate, NULL is explicitly not equivalent to *any* value; this is why you can't find all the NULL values in a list using something like SELECT * FROM list WHERE column = NULL, because the "column = NULL" will never be true even if the value in the column is actually NULL. (You would have to use IS NULL instead of the direct equals statement).
-
2019-09-13, 01:51 PM (ISO 8601)
- Join Date
- Mar 2007
- Location
- Oregon, USA
Re: minor programming rant
Right. NULL doesn't represent no value, it represents an unknown value. "NULL = NULL" is itself NULL, because it's unknown whether two unknown values are equal to each other.
"NULL < 0" would be NULL for the same reason (it's unknown whether an unknown value is less than 0)...so it certainly seems weird for an "if" to act on an unknown value like that.FeytouchedBanana eldritch disciple avatar by...me!
The Index of the Giant's Comments VI―Making Dogma from Zapped Bananas
-
2019-09-13, 02:02 PM (ISO 8601)
- Join Date
- Jan 2009
Re: minor programming rant
The language is SAS, and I think that's just how SAS orders numerics for both sorting and comparison purposes. Null/missing is less than any actual number, hence less than 0 for logical operators.
I started to think of some way it could be a strange logic resolution, like the "Grade < 0" being unknown when Grade is null, and "if unknown" then somehow passing as true, but I'm pretty sure null defaults to False.
But I just tested something like
if . then foo=1
and foo was not set to 1.
So, thinking it's just the ordering logic in that language.
-
2019-09-13, 06:50 PM (ISO 8601)
- Join Date
- Jan 2007
Re: minor programming rant
This nicely explains, why some languages have three possible outcomes of IF statements respectively for true, false and no clue.
In a war it doesn't matter who's right, only who's left.
-
2019-09-15, 09:04 AM (ISO 8601)
- Join Date
- Jul 2008
- Location
- Sweden
- Gender
Re: minor programming rant
When I first read this I was like "WHAT??? Since when is null not no value", I set out to find out if you're right and turns out the answer is also null.
http://www.dbta.com/Columns/DBA-Corn...ll-102619.aspx I got this site and it says like you do "A null represents missing or unknown information at the column level". Always thought null meant "no value", not "unknown/missing"Black text is for sarcasm, also sincerity. You'll just have to read between the lines and infer from context like an animal
-
2019-09-15, 10:33 AM (ISO 8601)
- Join Date
- Feb 2007
- Location
- Manchester, UK
- Gender
Re: minor programming rant
There are already perfectly good ways to represent no value--e.g. 0 for a number or an empty string for a text field. There wouldn't be much point in NULL if it did the same thing.
-
2019-09-15, 01:09 PM (ISO 8601)
- Join Date
- Aug 2005
- Location
- Mountain View, CA
- Gender
Re: minor programming rant
"Unknown/missing" is a concept that clearly has many use cases where it needs to be represented in some way, and what other way is there than null? Same with "uninitialized", and of course "no value". This is part of the big problem with null as a concept in programming - there are at least three different concepts it can represent, maybe more I haven't thought of, and few languages have any way to indicate which one of them any particular null is supposed to mean.
Those are terrible ways to represent "no value", because they are in fact actual values too.Last edited by Douglas; 2019-09-15 at 01:10 PM.
Like 4X (aka Civilization-like) gaming? Know programming? Interested in game development? Take a look.
Avatar by Ceika.
Archives:
SpoilerSaberhagen's Twelve Swords, some homebrew artifacts for 3.5 (please comment)
Isstinen Tonche for ECL 74 playtesting.
Team Solars: Powergaming beyond your wildest imagining, without infinite loops or epic. Yes, the DM asked for it.
Arcane Swordsage: Making it actually work (homebrew)
-
2019-09-15, 02:51 PM (ISO 8601)
- Join Date
- Feb 2007
- Location
- Manchester, UK
- Gender
-
2019-09-15, 04:16 PM (ISO 8601)
- Join Date
- Jan 2007
Re: minor programming rant
I beg to differ: 0 is a valid data value in many cases and is vastly different then having no value at all. Being able do distinguish knowledge of there being nothing from lack of knowledge about something is important. If you put 0 as an unknown value, then you can never have 0 as a legitimate value, since you could not distinguish between two of its uses otherwise. The same goes for an empty string.
Languages geared for numerical calculations even go further catching infinities and "not a number" results with each having its own designation.In a war it doesn't matter who's right, only who's left.
-
2019-09-15, 05:35 PM (ISO 8601)
- Join Date
- Mar 2007
- Location
- Oregon, USA
Re: minor programming rant
Yeah, I thought the same thing for years, largely because all the applications I worked with treated it as "no value"...but it turned out that in SQL itself, NULL means "unknown value". Which explained a lot of otherwise bizarre things like why "NULL = NULL" wasn't true, and why IS NULL constructs existed.
The closest thing to that I remember is integer fields where only 0 and positive values made sense, so negative values were used to signify special cases (like weapon ranges in Red Alert 2, "-2" meant "skip the distance check").FeytouchedBanana eldritch disciple avatar by...me!
The Index of the Giant's Comments VI―Making Dogma from Zapped Bananas
-
2019-09-16, 01:38 AM (ISO 8601)
- Join Date
- Feb 2007
- Location
- Manchester, UK
- Gender
-
2019-09-16, 04:45 AM (ISO 8601)
- Join Date
- May 2009
Re: minor programming rant
Certain virtually-prehistoric languages (read: BASIC, back in the day) would initialise a new variable as zero (or '', for a string) as soon as it was named. (So for instance, "IF A = 0" would return true, if A had never previously been mentioned.)
SQL has very definite ideas about NULL. I'm surprised to learn of a language where NULL < 0, but then it takes all sorts to make a world. (Even some who haven't grasped the idea of version control, apparently...)"None of us likes to be hated, none of us likes to be shunned. A natural result of these conditions is, that we consciously or unconsciously pay more attention to tuning our opinions to our neighbor’s pitch and preserving his approval than we do to examining the opinions searchingly and seeing to it that they are right and sound." - Mark Twain
-
2019-09-16, 05:11 AM (ISO 8601)
- Join Date
- Aug 2013
-
2019-09-16, 01:20 PM (ISO 8601)
- Join Date
- Jul 2008
- Location
- Sweden
- Gender
Re: minor programming rant
Black text is for sarcasm, also sincerity. You'll just have to read between the lines and infer from context like an animal
-
2019-09-16, 01:55 PM (ISO 8601)
- Join Date
- Dec 2006
Re: minor programming rant
The closest analogue that I'm familiar with would be NaN (not a number) in C, in which any comparison to it is false. So (x <= 0.0) and (x >= 0.0) would both be false for NaN. I could envision a language where for some reason they thought it made more sense to have the condition always evaluate to true instead of always evaluate to false.
-
2019-09-16, 09:02 PM (ISO 8601)
- Join Date
- May 2009
Re: minor programming rant
Come to think of it, there's an inconsistency in the SQL treatment. If you select MAX from a column that contains nulls, the nulls are ignored and you get the highest non-null value.
But according to above logic, you should get NULL - because you can't know if one of the unknown values may be higher than the current maximum."None of us likes to be hated, none of us likes to be shunned. A natural result of these conditions is, that we consciously or unconsciously pay more attention to tuning our opinions to our neighbor’s pitch and preserving his approval than we do to examining the opinions searchingly and seeing to it that they are right and sound." - Mark Twain
-
2019-09-16, 09:40 PM (ISO 8601)
- Join Date
- Mar 2007
- Location
- Oregon, USA
Re: minor programming rant
FeytouchedBanana eldritch disciple avatar by...me!
The Index of the Giant's Comments VI―Making Dogma from Zapped Bananas
-
2019-09-16, 11:04 PM (ISO 8601)
- Join Date
- May 2009
Re: minor programming rant
"None of us likes to be hated, none of us likes to be shunned. A natural result of these conditions is, that we consciously or unconsciously pay more attention to tuning our opinions to our neighbor’s pitch and preserving his approval than we do to examining the opinions searchingly and seeing to it that they are right and sound." - Mark Twain
-
2019-09-16, 11:25 PM (ISO 8601)
- Join Date
- Dec 2010
Re: minor programming rant
An example of 'no value': you have a dataset of molecules, and one derived quantity is the fraction of double bonds to single bonds on nitrogen atoms in that molecule. If there are no nitrogen atoms in that molecule, the nitrogen double bond fraction isn't 'unknown', it's 'not applicable'.
"No value" would imply to me that the code should raise an error if that entry is ever directly used in a computation or comparison, whereas "Unknown value" could plausibly just convert things its used with into "Unknown value" and still make sense. Though, honestly, this is really annoying behavior in practice and while it makes logical sense, it ends up creating really hard-to-debug results in simulations since you can't easily find where the NaN or NULL originated from.
The sort of unknown value propagation behavior makes more sense to me if you're operating in a framework that has some sort of method to impute or approximate those unknown values automatically, though there we're getting to a point where we might as well just say that everything is a variable with some statistical distribution and 'unknown value' just means e.g. 'no additional information other than that it's part of the same dataset'.
-
2019-09-17, 12:37 AM (ISO 8601)
- Join Date
- Mar 2007
- Location
- Oregon, USA
Re: minor programming rant
I suppose that's true....At the same, any aggregate function that works with values would need to return NULL if there's any NULL in the set it's working on, undercutting the general usefulness of aggregate functions without some sort of prefiltering scheme...and it turns out SQL's prefiltering scheme is "filter out NULL values before passing to aggregate functions that aren't COUNT(*) ." Which, incidentally, means you could compare a COUNT with a COUNT(*) to check for the presence (and number) of NULLs, when that's important.
Is it kind of a hacky way to say "MAX returns the highest known value" and not need special steps for when a value might be (or might ever become) NULL? Yes...but I think I'm okay with erring on the side of maintainability, in this case.FeytouchedBanana eldritch disciple avatar by...me!
The Index of the Giant's Comments VI―Making Dogma from Zapped Bananas
-
2019-09-17, 02:08 AM (ISO 8601)
- Join Date
- Aug 2013
Re: minor programming rant
I was gonna say. SQL (like Excel) probably works on the assuption that from a database you want whatever result can be got, even if there are entries missing. If it didn't, basically, it would be entirely useless as you'd almost always end up working with "bad" data of some kind.
Last edited by snowblizz; 2019-09-17 at 02:10 AM.
-
2019-09-17, 06:07 AM (ISO 8601)
- Join Date
- Feb 2007
- Location
- Manchester, UK
- Gender
Re: minor programming rant
To be fair, you could filter the results yourself in the event aggregate functions didn't work that way, e.g. "SELECT MAX(value) FROM list WHERE value IS NOT NULL". Arguably that might be better, because it would make aggregate functions consistent with other uses of NULL.
-
2019-09-17, 08:32 AM (ISO 8601)
- Join Date
- Jan 2007
Re: minor programming rant
The question is, when would you ever use aggregate function without filtering NULL values? How would that even work (aside from propagating NULL)? If noone would ever need such a functionality and everyone would have to always specify that they want ot exclude NULL values, then this is not the way to go.
Also, any unknown value is by default assumed to come from the same probabilistic distribution as the ones we have. This means, it would not affect the calculation of distribution parameters in a significant way unless there is a bias in which data points we failed to obtain. But in this case it means you need to go back to data gathering procedures. Bottom line is, excluding unknown values from aggregate functions is justified.In a war it doesn't matter who's right, only who's left.
-
2019-09-17, 08:38 AM (ISO 8601)
- Join Date
- Dec 2010
Re: minor programming rant
For what it's worth, I encountered an interesting problem where the NULL pattern was correlated with observables. The dataset was constructed by fusing two measurement techniques - one which was more complete, but narrow in what objects it could capture; the other of which captured more objects, but not every field.
In this case, you would get weird results assuming you could draw NULLs from the data distribution.
-
2019-09-17, 09:25 AM (ISO 8601)
- Join Date
- Jan 2007
Re: minor programming rant
In a war it doesn't matter who's right, only who's left.
-
2019-09-17, 11:28 AM (ISO 8601)
- Join Date
- Dec 2010
Re: minor programming rant
It's the dataset of observed exoplanets. Some are detected with radial velocity measurements (gets period, but only determines a lower bound on the mass and no inclination information), some via transits (gets radius, period, maybe some other things like eccentricity and inclination), or both (allowing you to know the mass field because you have independent measurement of the inclination).
For transits, it has to be big enough to pass between the Earth's field of view and the disk of the star, so you have a bias towards short periods and/or large radii. For radial velocity measurements, it can be further away, but the size of the effect is based on mass (and star type might factor in). And so on.
In the end, we trained a generative model to reproduce the distribution and found that when we trained with missing data using e.g. a masking scheme to hide those values, the network actually used the fact that some values were masked to improve its estimates of other values we were testing it on. So for instance, there was more information about the orbital period in the pattern of NULLs than there was in the radius, star type, etc.
We didn't actually figure out how to solve this, so we just proceeded using the 600 or so exoplanets that had the complete set of fields we ended up retaining. But this means we had to drop stuff like eccentricity, which would have been cool.
-
2019-09-17, 11:52 AM (ISO 8601)
- Join Date
- Mar 2007
- Location
- Oregon, USA
Re: minor programming rant
If you're only interested in one aggregate value, that's pretty easy. If you're trying to work with interrelated data, though, you probably don't want to drop known values simply because another column associated with them is NULL. And this gets "exciting" if you're working with joined tables and can't be sure in advance there'll be matches between the tables (ie, any time you'd want to use a left/right/outer join rather than an inner join). And/Or if you're using grouping or window functions, where applying filtering on the spot is awkward at best. True, a hypothetical clause like "MAX(value) WITHOUT NULL" could have been used to explicitly declare what SQL does implicitly...but then we're at what Radar already said; is quasi-mandatory boilerplate just to make explicit what nearly everyone does really worth it?
Semi-tangential: aggregate functions work on expressions, not necessarily columns; so functions like IFNULL/NVL can be used to replace a NULL value with an approximation, if desired...and if the approximation is NULL, it's still excluded like any other NULL value is.FeytouchedBanana eldritch disciple avatar by...me!
The Index of the Giant's Comments VI―Making Dogma from Zapped Bananas
-
2019-09-17, 02:08 PM (ISO 8601)
- Join Date
- Feb 2007
- Location
- Manchester, UK
- Gender
Re: minor programming rant
Maybe not in this particular instance, but personally I like to write code that makes sense without the inside knowledge of what the language considers an exception or not. That's why I never use BETWEEN in SQL but instead use >= and <=, because I can never remember if BETWEEN includes the endpoints or not and I suspect I'm not alone in that. OK, that means I sometimes add redundant code elements, but I do so for the sake of readability.
-
2019-09-17, 08:15 PM (ISO 8601)
- Join Date
- Jan 2007
Re: minor programming rant
So that means the NULLs were not random. From the measurement techniques alone there are some bounds on what is detectable or not, which means that NULLs do carry some useful information. As they say, lack of news is good news.
Understanding those patterns however without going through the data personally might be not possible.In a war it doesn't matter who's right, only who's left.