In the real world, we often don’t have all the data we want. We might be missing the time at which an experiment was performed or the postal code of a patient being given a new kind of treatment. Rather than leave what we do know out of the database, we may choose to insert it and use the value NULL to represent the missing values. For example, if there is a region whose population we don’t know, we could insert this into our database:
| >>> cur.execute('INSERT INTO PopByRegion VALUES ("Mars", NULL)') |
On the other hand, we probably don’t ever want a record in the database that has a NULL region name. We can prevent this from ever happening, stating that the column is NOT NULL when the table is created:
| >>> cur.execute('CREATE TABLE Test (Region TEXT NOT NULL, ' |
| ... 'Population INTEGER)') |
Now when we try to insert a NULL region into our new Test table, we get an error message:
| >>> cur.execute('INSERT INTO Test VALUES (NULL, 456789)') |
| Traceback (most recent call last): |
| File "<pyshell#45>", line 1, in <module> |
| cur.execute('INSERT INTO Test VALUES (NULL, 456789)') |
| sqlite3.IntegrityError: Test.Region may not be NULL |
Stating that the value must not be NULL is not always necessary, and imposing such a constraint may not be reasonable in some cases. Rather than using NULL, it may sometimes be more appropriate to use the value zero, an empty string, or false. You should do so in cases where you know something about the data and use NULL only in cases where you know nothing at all about it.
In fact, some experts recommend not using NULL at all because its behavior is counterintuitive (at least until you’ve retrained your intuition). The general rule is that operations involving NULL produce NULL as a result; the reasoning is that if the computer doesn’t know what one of the operation’s inputs is, it can’t know what the output is either. Adding a number to NULL therefore produces NULL no matter what the number was, and multiplying by NULL also produces NULL.
Things are more complicated with logical operations. The expression NULL OR 1 produces 1, rather than NULL, because of the following:
If the first argument was false (or 0, or the empty string, or some equivalent value), the result would be 1.
If the first argument was true (or nonzero, or a nonempty string), the result would also be 1.
The technical term for this is three-valued logic. In SQL’s view of the world, things aren’t just true or false—they can be true, false, or unknown, and NULL represents the last. Unfortunately, different databases interpret ambiguities in the SQL standard in different ways, so their handling of NULL is not consistent. NULL should therefore be used with caution and only when other approaches won’t work.