SFMC SQL and the Three-Valued Logic involving NULL


Cross posted to HowToSFMC from GORTONINGTON.

Many people do not realize that the SQL used in SFMC Query Activities follows Three-Valued Logic, which essentially means there are three possible outcomes for any Boolean value:

  • True
  • False
  • Unknown(NULL)

A value that is NULL therefore cannot be true or false.

How is this relevant? Well, this causes issues with conditional statements in SQL because any WHERE clause or conditional statement in your Query will need to account for the NULL values or they will always be filtered out of your results.

Conditional statements (including ON, WHERE, CASE, HAVING, etc) need to return a value of TRUE (also can be shown as 1) in order for the value to meet the conditions. For instance, WHERE 1 = 1 will return TRUE and that row will be included in results, but 2 = 1 will return FALSE and that row will not be included in results.

Now, when we introduce null, things get different. NULL = 1 will result in NULL and not be included. This is OK as you are only looking for a specific value, but if you do a not equals, NULL <> 1, then you run into an issue. You would expect null to be returned as it is not equal to 1, but it is not. The issue is that Null compared to anything equals NULL so NULL <> 1 will return NULL, not TRUE, and therefore will be excluded from your returned data.

For instance, lets say TableA has values in ColumnA of 1|2|NULL and you use the following SQL:

SELECT *
FROM TableA
WHERE ColumnA <> 1

This would only return the 2nd row with a value of 2.

So how the heck do you make sure that NULL values are not lost in your returned data?

The answer is that you would need to have them specified explicitly in your query. There are a few ways to handle this and which one you chose will depend on your use case and comfort level as each has their own benefits and pitfalls.

Using X IS NULL or X IS NOT NULL:

By explicitly comparing your column to NULL, you will then look for NULL = NULL, which will then return true and be included in your returned data. This is usually used with an OR operator to allow you to include both statements and if either one returns TRUE, then the values are returned.

SELECT *
FROM TableA
WHERE ColumnA <> 1
OR ColumnA IS NULL

Some of the major benefits to utilizing X IS NULL is that it is SARGable and allows indexing to still be used – which can speed up processing for indexed tables/data extensions. Albeit, many of SFMC data extensions do not have much indexed in them as all indexing needs to be accomplished via Support and they are usually reticent on doing this without justified reasoning. To note though, all Primary Keys on a data extension are actually, by default, indexed. Which can sway this to be more relevant.

The main pitfall for this is that it adds another conditional logic statement to your Query, which means that you need to account for this in your logical operators. Meaning if you have a complex and verbose ‘filter’ for your Query, this can then become a game of Jenga, where you hope the logic does not fall down with the added statement. This also then adds complexity when debugging your conditions as well.

Now, that being said, this pitfall is likely to be more of an excuse for using the other options than an actual pitfall (I know I am guilty of this), but it is still something to keep in mind.

Using the ISNULL function:

Another option you can use to reduce the number of conditional statements inside of your query is to use the ISNULL() SQL function:

SELECT *
FROM TableA
WHERE ISNULL(ColumnA, 2) <> 1

This will then take any null value inside of ColumnA and assign it a value of 2 before doing the comparison logic. This will then make all null values return true from that condition.

The major benefit of this function is that it can inline change the value of a null to another defined value. This will essentially take WHERE ColumnA <> 1 OR ColumnA IS NULL and turn it into WHERE ISNULL(ColumnA,2) <> 1 which simplifies things quite a bit.

The downfall of this is that it makes the query non-SARGable and removes the benefits of indexing, which could slow down your queries.

I do also want to note that this can be done via COALESCE() as well:

SELECT *
FROM TableA
WHERE COALESCE(ColumnA, 2) <> 1

So, now what is the difference? Why use ISNULL instead of COALESCE or vice versa?

Comparing ISNULL() to COALESCE():

Both ISNULL and COALESCE have a similar purpose but can behave differently. COALESCE, by definition, returns the first nonnull expression among its arguments, where ISNULL replaces NULL with the specified replacement value.

One advantage COALESCE has over ISNULL is that you are able to utilize more than 2 inputs – allowing for fallbacks if the preceding entries are all null. COALESCE also utilizes the data type of the input argument where ISNULL uses the data type of the first input. Data type is one of those things that can bite you in the arse if you don’t realize it is being changed or set outside your explicit direction.

ISNULL has a bit of an advantage when it comes to subqueries as it internally does not evaluate an input expression more than once – where COALESCE will run each time (similar to a CASE statement – which essentially is what it is shorthand for)

There are a few other differences between these, but I felt these were the most pertinent and relevant to share associated with SFMC.

My recommendation would be to:

  1. Always use X IS NULL for Conditional statements (WHERE,HAVING,etc.) to allow for indexing use.
  2. Utilize ISNULL for Select statements when assigning fields inside of subqueries for potentially null value replacement
  3. Utilize COALESCE in top level queries when you would need multiple fallbacks for multiple potential null values

Anything outside of those three listed above are dependent on your best judgement. Hope this helps!

All Rights Reserved
Made with by your fellow SFMC users.
All Rights Reserved
Made with by your fellow SFMC users.