Google “nullable fields”, “nullable columns”, or “null debate” and you can read about people arguing about whether nullable columns in a database table are evil and are slower to process (among other topics that hold no real interest for me).  I want practical rules-of-thumb when I am in a situation where I think I might need a nullable column.  What issues can they cause for me as the developer of the common business application?

 

For me, a nullable column is a design scent, not quite a design smell, but I am starting to get a whiff of something that might not be pleasant. Often the nullable column is justifiable, other times I would argue it is not. Like any design decision there are many, many factors that can justify or condemn a nullable column.

 

I present here my most basic rules-of-thumb when deciding what to do with a possible nullable column. If the business significance of the entity in question is great, then these basic rules-of-thumb might be too basic. That said, often these heuristics are all I need.

  • If nothing governs the nullable field of the entity then it’s OK
  • If one attribute of the entity affects the nullability of another then I might give this nullable attribute a little more thought.
  • If one attribute of the entity affects the nullability of multiple other fields then it’s probably not OK

If nothing governs the nullable field of the entity then it’s OK

The middle name attribute on a person name entity is a good example of this. There is nothing about the entity that dictates whether or not the middle name attribute is null or not. We just have the data, or we don’t.

 

If one attribute of the entity affects the nullability of another then I might give this nullable attribute a little more thought.

We might have an address entity that has a Boolean attribute IsUSA and when this attribute is true then postal code is required. This is OK, but it would start discussions on whether or not we have two entities (Domestic and Foreign Address).

 

If one attribute of the entity affects the nullability of multiple other fields then its probably not OK

We are most certainly missing an entity when we find ourselves in this situation and are representing two or more entities in one table. Either we have different base entities or the multiple attributes are describing different entities that are related to the base entity. Our model loses clarity. We have to acknowledge that we are representing more than one entity and understand the benefits and drawbacks.

Example of two different base entities in single table: Security Entity (ID, Name, Security Type, Comment, BData1, BData2)

When Security Type is Bond then the BData1 and 2 attributes have to be filled in. We should acknowledge that we have two entities, a Security and a Bond.  This model does not make it clear that there even is such thing as a Bond and that only it uses the BData 1 and 2 attributes.  It appears that a Bond is a Security so they share some base data (ID, Name, Security Type, and Comment).

Example of one entity with relationship to multiple entities in single table

Contract Entity (ID, Name, Code, DataA1, DataA2, DataB1, etc)

When Code=A then DataA1 and DataA2 need to be filled in. When Code=B then DataB1 needs to be filled in, etc, etc. In this example we don’t have two base entities, we still have only one kind of Contract. However, the Contracts are associated with Code Specifications that are not modeled in our design. ContractEntity(ID, Name, Code) CodeASpec(Code, Data1, Data2) CodeBSpec(Code, Data1), etc.

 

 

The practical benefits and drawbacks to representing multiple entities in one table

Benefits

  • Combining the entities into one may be exactly how the system consumes the information in the table. For example, we may find that the table is never queried alone and the other tables are always joined in which complicates the data consumption.
  • Entering data requires going to one table which is a benefit if data entry is being done by hand and the data entry person benefits from seeing all the attributes possible.

Drawbacks

  • When multiple entities are represented in one table we suffer the standard drawbacks of not separating our concerns which should be a basic credo for every developer.
  • If another table requires a foreign key into the table in question but only for the records of one of the entities then referential integrity gets pretty complicated
  • Lack of clarity in our model as to what entities exist.
  • Lack of clarity in our model as to why a null exists. Is it because we don’t have the data or because the data is not required to be there?
  • If the multiple entities are ever consumed as separate entities then we complicate things by separating the entities in stored procedures and or code.
  • General logic and joins with nullable fields are more error prone
  • Referential integrity is made more complex by the need for triggers and or constraints to be written and maintained for the one table managing multiple entities.
  • Possible sign the model has frozen and the system is no longer evolving but rather being hacked just to get work done. Often a table evolves into storing data for more than one entity and when we don't address it we often start to suffer the broken window syndrome.
  • A system with multiple entities in single tables often cannot be extended without change. Instead of adding a new table for more functionality we end up editing existing tables and their constraints, possibly breaking existing functionality. (Granted this is tied to separating our concerns.)

 

Interesting Note

It can appear that putting multiple entities in one table is breaking data normalization, specifically third normal form.  The third form of normalization states that all data is dependent on the primary key. In the case of nullable columns that are dependent on a value outside of the primary key we might figure we have broken normalization.  However, the rule is about attributes on an entity that don’t tie back to the primary key but rather another attribute in the table. The attributes we have been talking about are attributable back to the primary key.