Thursday, August 13, 2009

Outer Joins, Nulls, and related evil

If you've ever written a query in a relational database language, you've probably encountered two things:

1. The concept of a left outer join
2. Nulls

The first isn't so hard to conceive, but the second causes wailing and gnashing of teeth in the minds of newbie database students and Oracle developers.

We won't go into the deep details of these monsters here, for we'll assume if you've read this far, you probably have a basic understanding of what they are. It is my duty here, however, to point out a subtle but dangerous hideousness that lay within these two when mixed in the cauldron of a query.

The basics: If you "left outer join" Table "A" with Table "B", you get all records from Table "A", and matching records from Table "B". If no records in "B" satisfy the join condition, the fields in the resultant query from table "B" are simply "null."

Suppose, for the sake of illustrating our Subtle Point of Evil, that your Table "B" has a nullable field that you need to ensure has no value, eg is null. You might be tempted to write:


Select *
  from A
  left outer join B
    on a.field=b.field
   and b.BigData is null


That seems innocent enough, doesn't it?

But it's wrong.

Let's explore this a bit while you cogitate on the problem.

Consider the situation in which Table "B" has a record that matches on field, but BigData has a value. Because the null test is part of the join predicate, the presence of that value in that "B" record causes the predicate to fail, resulting in no matching records. When this happens, consider the result: the fields from the record in table "A" (our "left" table) will be returned, but nulls for the fields in B indicating the failure of the join predicate (no records). Therein, however. is the evil: the result set appears to show a null value for B.BigData from the joined record, when in reality BigData for this value of field is not null! This means the null in the join condition causes an improper (and misleading) result to be returned - the indication data does not exist where it clearly does.

How do we fix this problem? The solution lies in recognizing the distinction between join predicates and where clauses. Joins hook together, where's filter. In this case, putting the null test in our join predicate is really not the best solution, because it amounts to a filter, not a join.

Fortunately, the solution is relatively simple, but subtle. Applying the idea that we want to think of joins as "glue," and wheres as "filters" that take place after the join is applied (or after the glue has dried), we realize moving the test to the where is the solution. Here's our winner, as follows:


Select *
  from A
  left outer join B
    on a.field=b.field
 where b.BigData is null


From our discussion above, we know that B has a record that matches across the field field, and as a result we know that it will be included in our left outer join result. This time, however, the where clause will inspect b.BigData for each record in our result set, and because our join condition returns the proper match, the theoretical record in "B" that contains a value in BigData for the matching field value will be eliminated, and the corret resultset returned.

Nulls and joins give more developers headaches than a lack of morning coffee. I showed this example to a DBA friend, and he was somewhat on the fence about whether the average developer would catch this subtlety. An experienced query writer shouldn't make this mistake, but a developer for whom query writing might not be a primary task could. In any event, its a critical example of where a technical subtlety could result in potentially serious consequences, and remind us all how important the "little things" are in every aspect of our jobs.

Blessings, all...

David