Sunday, April 12, 2020

Join on nullable column

There is a interesting behavior I noticed in SQL Server. Let me show it to you.

First let's prepare tables and data:

CREATE TABLE T1 (A UNIQUEIDENTIFIER);
CREATE TABLE T2 (A UNIQUEIDENTIFIER);

;WITH CTE AS ( SELECT A FROM (VALUES (1), (2), (3), (4), (5), (6), (7), (8)) X(A) )
INSERT INTO T1
SELECT NEWID() FROM CTE AS CTE1
CROSS JOIN CTE AS CTE2
CROSS JOIN CTE AS CTE3
CROSS JOIN CTE AS CTE4
CROSS JOIN CTE AS CTE5
CROSS JOIN CTE AS CTE6
CROSS JOIN CTE AS CTE7
CROSS JOIN CTE AS CTE8;

INSERT INTO T2
SELECT TOP 5 PERCENT A FROM T1
UNION ALL
SELECT TOP 95 PERCENT NULL FROM T1;

We have now two tables with similar number of rows. The difference is that the second one has mostly NULL values. Now let's try joining those two tables together:

SELECT COUNT(1)
FROM T1
INNER JOIN T2 ON T1.A = T2.A;

It takes 6922 ms on my machine. If you look at the execution plan, you will notice that SQL Server reads the whole T2 table, does aggregate on it and then performs a hash join.


The aggregate is useful, because it removes all the duplicate NULL values. But we're doing a inner join. Which means we cannot join on NULL value. We could simply filter these out. Let's try this:

SELECT COUNT(1)
FROM T1
INNER JOIN T2 ON T1.A = T2.A
WHERE T2.A IS NOT NULL;

This takes 4625 ms and gives the same result. In the execution plan you can observe that the aggregate is gone and we have residual predicate on T2, which removes all the unnecessary NULL values.

In conclusion we see that SQL Server could not filter out NULL values early unless explicitly told to do so. I think the reason for this is that adding such filter is rarely beneficial. So no one coded this optimization into the query optimizer. I personally encounter queries like the one above very rarely. Like once a year maybe. You can even see that the set up for this to surface is very specific. For this reason it makes no sense to go and add to every query with inner join a NOT NULL filter. It's just something to keep in mind.

No comments:

Post a Comment