First let us prepare tables and data:
CREATE TABLE T1 ( A INT, B INT) CREATE TABLE T2 ( A INT, B INT) GO INSERT INTO T1 VALUES (1, 1) GO 1000 UPDATE TOP (100) T1 SET A = 2 INSERT INTO T2 VALUES (1, 1), (2, 2) UPDATE STATISTICS T1 UPDATE STATISTICS T2
Now try the following query and examine the execution plan:
SELECT COUNT (1) FROM T1 INNER JOIN T2 ON T1.A = T2.A WHERE T2.B = 1
The initial execution plan |
How to fix this? As we can see the problem is with wrong estimates. We need to help SQL Server here and create filtered statistics:
CREATE STATISTICS S1 ON T2(A) WHERE B = 1
This statistics is on the join column and the filter is exactly the same. The execution plan estimates look much better now:
The better execution plan |
In this simplistic example it doesn't matter much. Yet I encountered cases where this caused hash spills. In addition, if it's executed often, you can get significant performance problems.
Now this is in no way universal fix. It will work only if filter on B doesn't change. If we change the where clause to B = 2, the statistics will be wrong again. We would need to create new statistics for the new where clause. This is not scalable. It may be doable for 5 values, but not for 10 000. Yet this trick may come in helpful from time to time.
No comments:
Post a Comment