Thursday, August 13, 2009

Outer Joins, Nulls, and related evil

If you've ever written a query in a relational database language, you've probably encountered two things:

1. The concept of a left outer join
2. Nulls

The first isn't so hard to conceive, but the second causes wailing and gnashing of teeth among newbie database students and Oracle developers alike.

We won't go into the deep details of these monsters here; if you've read this far, we'll assume you have a basic understanding of what they are. It is my duty, however, to point out a subtle but dangerous hideousness that lies within these two when mixed in the cauldron of a query.

The basics: If you "left outer join" Table "A" with Table "B", you get all records from Table "A", and matching records from Table "B". If no records in "B" satisfy the join condition, the fields in the resultant query from table "B" are simply "null."
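
That padding behavior is easy to demonstrate. Here's a minimal sketch using Python's built-in sqlite3 module, with hypothetical tables and data standing in for our "A" and "B":

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE A (field INTEGER);
    CREATE TABLE B (field INTEGER, BigData TEXT);
    INSERT INTO A VALUES (1), (2);
    INSERT INTO B VALUES (1, 'matched');
""")

# All rows of A come back; the unmatched A row gets null-padded B columns.
rows = con.execute("""
    SELECT a.field, b.BigData
      FROM A a
      LEFT OUTER JOIN B b
        ON a.field = b.field
     ORDER BY a.field
""").fetchall()
print(rows)  # [(1, 'matched'), (2, None)]
```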

Suppose, for the sake of illustrating our Subtle Point of Evil, that your Table "B" has a nullable field that you need to ensure has no value, i.e., is null. You might be tempted to write:

Select *
  from A
  left outer join B
    on a.field=b.field
   and b.BigData is null

That seems innocent enough, doesn't it?

But it's wrong.

Let's explore this a bit while you cogitate on the problem.

Consider the situation in which Table "B" has a record that matches on field, but whose BigData has a value. Because the null test is part of the join predicate, the value in that "B" record causes the predicate to fail, so no records match. Now consider the result: the fields from the record in Table "A" (our "left" table) are returned, with nulls for the fields in "B" indicating the failed join. Therein, however, lies the evil: the result set appears to show a null value for B.BigData in the joined record, when in reality BigData for this value of field is not null! The null test in the join condition causes an improper (and misleading) result to be returned: an indication that data does not exist where it clearly does.
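
To see the misleading result concretely, here's a small sketch using Python's built-in sqlite3 module, with hypothetical tables and data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE A (field INTEGER);
    CREATE TABLE B (field INTEGER, BigData TEXT);
    INSERT INTO A VALUES (1);
    INSERT INTO B VALUES (1, 'not null at all');
""")

# Null test in the join predicate: B's matching row is rejected,
# so the outer join pads B's columns with NULL anyway.
rows = con.execute("""
    SELECT a.field, b.BigData
      FROM A a
      LEFT OUTER JOIN B b
        ON a.field = b.field
       AND b.BigData IS NULL
""").fetchall()
print(rows)  # [(1, None)] -- looks like BigData is null, but it isn't!
```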

How do we fix this problem? The solution lies in recognizing the distinction between join predicates and WHERE clauses. Joins hook tables together; WHERE clauses filter. Putting the null test in our join predicate is really not the best solution, because it amounts to a filter, not a join.

Fortunately, the solution is relatively simple, but subtle. Applying the idea that joins are "glue" and WHERE clauses are "filters" applied after the join (after the glue has dried), we realize that moving the test to the WHERE clause is the answer. Here's our winner:

Select *
  from A
  left outer join B
    on a.field=b.field
 where b.BigData is null

From our discussion above, we know that "B" has a record that matches on field, so it will be included in our left outer join result. This time, however, the WHERE clause inspects b.BigData for each record in the result set. Because the join condition now returns the proper match, the hypothetical record in "B" that contains a value in BigData for the matching field value is eliminated, and the correct result set returned.
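
A quick sketch with Python's built-in sqlite3 module (hypothetical tables and data) shows the corrected query behaving properly:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE A (field INTEGER);
    CREATE TABLE B (field INTEGER, BigData TEXT);
    INSERT INTO A VALUES (1), (2);
    INSERT INTO B VALUES (1, 'has a value');
""")

# Null test in the WHERE clause: the join matches first, then only rows
# whose BigData is genuinely null (here, the unmatched A row) survive.
rows = con.execute("""
    SELECT a.field, b.BigData
      FROM A a
      LEFT OUTER JOIN B b
        ON a.field = b.field
     WHERE b.BigData IS NULL
""").fetchall()
print(rows)  # [(2, None)] -- the row with a real BigData value is gone
```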

Nulls and joins give more developers headaches than a lack of morning coffee. I showed this example to a DBA friend, and he was somewhat on the fence about whether the average developer would catch the subtlety. An experienced query writer shouldn't make this mistake, but a developer for whom query writing isn't a primary task could. In any event, it's a critical example of how a technical subtlety can have potentially serious consequences, and a reminder of how important the "little things" are in every aspect of our jobs.

Blessings, all...


Thursday, June 18, 2009

The Technical Vacuum of American Industry

Having been in the software development industry for my entire career, I've seen how computing has revolutionized the way some aspects of business and economy function.

And I've seen that some businesses insist on remaining stuck in a 1960s-era mentality of technology, using it as a resource only when necessary - like toner for a copier, or paper cups at the water cooler.

It says here that such a mentality is a deadly practice for any business wanting to leverage technology over the next two decades.

The new economy brought on by the advent of handheld personal computers and devices, tied together via a world-connected Internet, is, amazingly, still hidden from the eyes of many managers, with the value of technology-savvy individuals realized only in a mop-up mode. The problem is that technology is being managed by, well, managers; managers without a clue what they're managing, who as a result are piloting their companies into a technical vacuum.

The old-guard mentality around technology and software holds that you bring in an IT guy - the kind everyone likes to make fun of - after some process or project is well on its way, to see if maybe an Excel spreadsheet or a webpage might simplify, or at least publicize, some aspect of the operation. The worse model is the one where some half-baked notion of software is implemented without thought to design or scale, necessitating that expensive after-the-fact support be brought in to fix a disastrous implementation. The latter notion is frequently seen in companies that live under the idea that IT resources are largely interchangeable and disposable, but certainly never part of a company's "core competencies."

Such is precisely the model that must change in order for companies to adapt to 21st-century technological realities. Technology experts must be present at every level of most enterprises to bring their expertise to bear on every aspect of an enterprise's operation. They must be brought in as partnering peers at the beginning of every new project, every new concept, every new plan undertaken.

This is not a temporary change. Technology is now an integrated part of every day life, and forever will it remain so. A reinvention of the business management model that recognizes the mandatory inclusion of technology expertise up front is in order. How well businesses respond to the new world order of technology is unclear. Those who embrace the integration of technology will create for themselves a vital competitive advantage over the next twenty years. Those who don't will wonder why their grasp of technology seems perpetually inadequate, as they continue to bring in staff to close the technology gap only after-the-fact, increasing expenses while blaming the very technology they are unwilling to embrace.

Companies that think they are forward-thinking in this regard probably believe in their approach to IT because they've added a CIO or CTO position in their senior hierarchy, and leave such people to design networks, control desktop deployments, manage their printers, and staff helpdesks. They manage bulletin boards and field arguments of Microsoft versus Apple, Windows versus Linux. They're also the ones to heed the call to cut IT costs when lean times arrive.

Hear this well: That is not enterprise technology integration.

A technology-integrated company mandates the assessment of the risk and benefit technology can bring to every enterprise project, no matter how distant the technical angle might seem. It leverages the knowledge that, for example, electronically connected design teams can cross continental boundaries and timezones to achieve a continuously operating work force. It identifies how corporate intellectual property can be protected and leveraged to the profit of an entire company, and can set trends for an entire industry. It gains insight into the operations of business elements across diverse domains and learns how technology can build efficiencies into the enterprise.

Companies with the kind of forethought necessary to bring technology to bear from the top down, rather than backfilled, will be the Microsofts and Googles of the next generation. Those without it will, in all likelihood, go the way of GM, hoping a misguided government will bail them out of their own shortsightedness.

Sunday, June 7, 2009

New designs, bad habits?

Here's a poser for the Object-Oriented development crowd.

OO tells us to encapsulate; to keep our object methods small (or, in the jargon, atomic). Build objects that have simple methods, and that helps make the objects reusable. By the same token, objects also have state, reposed in one or more member variables that may or may not be exposed through public properties. State, however, is typically expensive, because persistence implies the memory and similar resources necessary to implement it.

Atomicity and state indirectly tend to work against each other. If my methods are too small, I will necessarily push elements out to the class level. But if I push too much to the class level, I risk creating classes that need increasingly complex persistence mechanisms, which in turn suggests a class that is too broadly scoped. Yet if I decompose (or factor) an object too much, the fragmented design becomes a nightmare to maintain. It's not quite a vicious circle, but it's a cautionary cliff to avoid.

Here's a shadowy example.

Suppose you have a class:

public class Something {
    public void InterestingMethod1() {
        int ImportantVariable; // ...something interesting...
    }
}

And, without typing them here, suppose you have several similar methods in this class, each with a similar "ImportantVariable" declaration. Now, the casual observer would probably suggest that the repetition of that variable indicates it should be declared at the class level, like so:

public class Something {
    int ImportantVariable;

    public void InterestingMethod1() {
        // ...
    }
}
If, however, we start referencing "ImportantVariable" in our atomic methods, don't we reintroduce an old villain into our nice, object-oriented code? It seems to me that in a class of any appreciable size that does any appreciable work, factoring common variables out to the class level starts to look a lot like our old nemesis - global variables. We all know they're bad, don't we? That is, a substantive module in which we declare a variable once, after which it has scope across all methods, allowing a single change to wreak all manner of unintended consequences. But isn't that precisely what member variables allow us to do in even moderately complex classes?
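
To make the worry concrete, here's a minimal sketch (in Python, with hypothetical names) of how a promoted member variable starts behaving like a global:

```python
class Something:
    """Hypothetical class where a once-local variable was promoted."""

    def __init__(self):
        self.important_variable = 0  # shared by every method below

    def interesting_method_1(self):
        self.important_variable = 10  # mutates shared state...

    def interesting_method_2(self):
        # ...which silently changes this method's behavior, just as a
        # global would: the result depends on who ran before us.
        return self.important_variable * 2

obj = Something()
print(obj.interesting_method_2())  # 0 -- before method 1 runs
obj.interesting_method_1()
print(obj.interesting_method_2())  # 20 -- after: action at a distance
```

The second call's result depends entirely on hidden state set elsewhere, which is exactly the action-at-a-distance we fear from globals.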

I won't pretend that I have the answer here, nor that this micro-example is anything but a strawman illustration of the point. So I'll throw out the question: what is the "right" answer? When should our atomic methods have elements like local variables factored up to class scope, risking global behavior; when should our classes have members pushed down into methods for the sake of atomicity?

When, indeed? The floor is open for debate...

Friday, May 29, 2009

Hello...and an introductory musing


Here begins a blog devoted to things computing and development oriented, ranging from the whimsical to the technical and everything in between. As topics evolve, feel free to contribute and comment for the benefit of everyone.

I've been a developer in the Windows environment for over 20 years, going back to the halcyon days of Visual Basic, then into the object-oriented world of Java, and more recently C# in ASP.NET. Most of my focus in recent years has been on database development in Microsoft SQL Server. I run a small Samba-based network at home, and have contributed to various technical publications over the years.

This brings us to today's inaugural post: a question of simplicity.

I'm a pretty simple person. I like plain mashed potatoes with a little butter and salt. I like hamburgers without a lot of gourmet trappings. And I prefer my programming code to be simple, too.

Having developed ASP.NET applications in C# for some time, it should come as no surprise that I've encountered a situation in which a site has multiple pages serving as content in conjunction with a master page. Each content page gets a reference to a master page, and if programmatic access to elements of the master page is needed, you use the page's Master property to access them. Now, this works, and has obviously been working for some time, but it seems an unnecessary complication.

The situation above, in my own head, screams inheritance. In my ideal world, I would declare a base master page and a base content page, and derive all such pages in my projects:

// ideal ASP.NET subclass model
public class MyMasterPage : MasterPage { }

public class MyBaseContentPage : Page { }

public class AppContentPage : MyBaseContentPage { }

In this world, the source files for AppContentPage amount to a single ASPX file that includes the ASP:Content control markers for inclusion into the inherited master page, and the corresponding .cs code-behind file. That's it.

Now, you can "kinda" do this in ASP.NET today, but not quite. Note that these declarations omit the "partial" keyword - that's because they're hard classes. You can declare "hard" classes that reside in a web application folder called "App_Code", but as they're compiled to a separate assembly, they have no knowledge of master pages or other components. You can create page files with partial class declarations that inherit hard classes from App_Code, but you don't inherit the ASPX file that goes with it.

The inheritance model I dream of here eliminates the need for the "Master" property to access elements of the inherited master page, which I've always thought of as a bit of an ugly hack. For the notion of "installable" master pages, inheritance also suggests that master page developers could be led more naturally to implement standard interfaces bridging the gap between presentation and content, not to mention to use events to decouple the master page from its consumer. That "Master" property leads to ungainly and gnarled code that, further, tends to gloss over design partitioning issues which, in turn, slow productivity.

As I noted, I understand you can "kinda" do the things I've discussed in ASP.NET, but not in a way (so far as I know) that truly embraces the notion of object-based inheritance.

Am I wrong? Have I missed some huge boat here?

Let me know.