Thursday, July 15, 2021

Query producing wrong estimates (part 1)

I thought I'd start documenting instances of wrong estimates produced by SQL Server. I already have one post on this topic, but recently I found another such case. Not sure it will help anyone, yet I'll document my findings.

This case is reproducible on compatibility levels 130 and 150, with QUERY_OPTIMIZER_HOTFIXES both enabled and disabled.

As usual, let's first prepare a table and fill it with data:

CREATE TABLE T
(
PK UNIQUEIDENTIFIER NOT NULL DEFAULT NEWSEQUENTIALID(),
ID UNIQUEIDENTIFIER NULL,
B1 BIT NOT NULL
)

CREATE INDEX IX ON T
(ID, B1)

;WITH cte AS (
SELECT a FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) x(a))
INSERT INTO T(ID, B1)
SELECT NULL, 0
FROM cte
CROSS JOIN cte cte2
CROSS JOIN cte cte3
CROSS JOIN cte cte4
CROSS JOIN cte cte5
CROSS JOIN cte cte6

Now create the following procedure and run it:

CREATE OR ALTER PROC #p (@p0 UNIQUEIDENTIFIER, @p1 BIT)
AS
BEGIN

    SELECT * FROM T
    WHERE ID = @p0
    AND B1 = @p1 

END

GO

EXEC dbo.#p @p0 = '00000000-0000-0000-0000-000000000000', 
            @p1 = 0

The ID column contains only NULL values and we're seeking a non-NULL value, so we would expect an index seek with a key lookup. What we get instead is a table scan and totally wrong estimates.


What are the ways to fix it? We could:

  • remove the B1 = @p1 part. But this changes the query logic.
  • run the query with literals. Surprisingly, adding OPTION(RECOMPILE) doesn't fix the query.
  • add OPTION(OPTIMIZE FOR UNKNOWN). RECOMPILE doesn't work, but this one does. Go figure.
  • remove the B1 column from the index. I find it fascinating that to fix query performance you sometimes need to substitute a good index with a worse one.
I'll let you, dear reader, try out these fixes on my demo. I must say this looks like a bug to me. Since in the real case I went for the index fix, I reviewed the histograms for the index with and without the B1 column. They seemed basically identical, yet yielded totally different execution plans. I opened a ticket on UserVoice, but obviously I'm not holding my breath that someone will even look in its remote direction.
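For reference, here is the OPTIMIZE FOR UNKNOWN variant of the demo procedure (the other fixes are analogous one-line changes):

CREATE OR ALTER PROC #p (@p0 UNIQUEIDENTIFIER, @p1 BIT)
AS
BEGIN

    SELECT * FROM T
    WHERE ID = @p0
    AND B1 = @p1
    /*optimize using average density instead of the sniffed parameter values*/
    OPTION (OPTIMIZE FOR UNKNOWN)

END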

Wednesday, March 31, 2021

Beating deferred name resolution (and finding a few bugs on the way)

Deferred name resolution is not my favorite SQL Server feature, and I suspect I'm not the only one. Fortunately, I found a workaround: a DDL trigger that uses sys.sql_expression_dependencies. With this system view I can check for any unresolved dependencies. There are quite a lot of caveats, which make the code nontrivial. For example, I don't think anything meaningful can be done with rows having is_ambiguous set to 1, so I skip them.
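To get a feel for what the view reports, a minimal standalone query for unresolved references could look like this (a sketch, not part of the trigger):

/*rows the engine could not bind to an existing object*/
SELECT OBJECT_SCHEMA_NAME(referencing_id) ReferencingSchema,
       OBJECT_NAME(referencing_id) ReferencingObject,
       referenced_schema_name,
       referenced_entity_name
FROM sys.sql_expression_dependencies
WHERE referenced_id IS NULL
      AND is_ambiguous = 0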

One big caveat is that for this trigger to be useful you need to qualify all objects with schema names. On its own this wouldn't be a big problem. If you do this in your stored procedure:

SELECT * FROM NotExistingTable

it shows up as invalid in sys.sql_expression_dependencies. But if you do this:

UPDATE TableAlias SET I = 0
FROM #t TableAlias

even though it can be a totally valid statement in the context of your code, TableAlias will appear in sys.sql_expression_dependencies. This is totally wrong in my opinion (and I even filed a bug): this system view shouldn't show aliases, yet there it is. You cannot realistically differentiate between the two cases. The code I work on uses schema names, so I designed the trigger to ignore objects without them. But I'm sure there are databases that ignore schemas completely.

Anyway, here is the code:

CREATE OR ALTER TRIGGER ValidateDDL ON DATABASE
WITH EXECUTE AS 'dbo'
FOR DDL_DATABASE_LEVEL_EVENTS
AS
BEGIN

	SET NOCOUNT ON

	DECLARE @Data XML = EVENTDATA();
	DECLARE @InvalidObjects INT = 0;

	IF @Data.value('(/EVENT_INSTANCE/EventType)[1]', 'nvarchar(100)') IN 
		('CREATE_PROCEDURE', 'ALTER_PROCEDURE', 'CREATE_TRIGGER', 'ALTER_TRIGGER', 'CREATE_FUNCTION', 'ALTER_FUNCTION')
		OR @Data.exist('(/EVENT_INSTANCE/AlterTableActionList/Enable/Triggers/Name)') = 1
	BEGIN
		
		/*when enabling trigger we need to take its name from elsewhere*/
		DECLARE @Object NVARCHAR(128) = IIF(@Data.exist('(/EVENT_INSTANCE/AlterTableActionList/Enable/Triggers/Name)') = 1,
											@Data.value('(/EVENT_INSTANCE/AlterTableActionList/Enable/Triggers/Name)[1]', 'nvarchar(128)'),
											@Data.value('(/EVENT_INSTANCE/ObjectName)[1]', 'nvarchar(128)'))
		DECLARE @Schema NVARCHAR(128) = @Data.value('(/EVENT_INSTANCE/SchemaName)[1]', 'nvarchar(128)')

		SELECT @InvalidObjects = COUNT(1)
		FROM sys.sql_expression_dependencies
		WHERE OBJECT_NAME(referencing_id) = @Object
			AND OBJECT_SCHEMA_NAME(referencing_id) = @Schema
			AND referenced_id IS NULL
			/*ignore special trigger tables and aliases*/
			AND referenced_schema_name IS NOT NULL
			AND referenced_server_name IS NULL
			AND is_ambiguous = 0
			AND (referenced_database_name IS NULL
				/*check invalid database names*/
				OR referenced_database_name NOT IN (SELECT name COLLATE DATABASE_DEFAULT FROM sys.databases))

		/*check inside valid databases*/
		SELECT	referenced_schema_name SchemaName,
				referenced_entity_name ObjectName,
				referenced_database_name DatabaseName
		INTO #CrossChecks
		FROM sys.sql_expression_dependencies
		WHERE OBJECT_NAME(referencing_id) = @Object
			AND OBJECT_SCHEMA_NAME(referencing_id) = @Schema
			AND referenced_id IS NULL 
			AND referenced_database_name IN (SELECT name COLLATE DATABASE_DEFAULT FROM sys.databases)
			AND referenced_server_name IS NULL

		DECLARE @CheckSchema NVARCHAR(128),
				@CheckObject NVARCHAR(128),
				@CheckDatabase NVARCHAR(128)
		DECLARE cur_ CURSOR
		FOR
		    SELECT  *
		    FROM #CrossChecks
		
		OPEN cur_
		FETCH NEXT FROM cur_
		INTO @CheckSchema, @CheckObject, @CheckDatabase
		
		WHILE @@FETCH_STATUS = 0
		    BEGIN
				DECLARE @Result INT = 0
				DECLARE @SQL NVARCHAR(MAX) =	N'SELECT @ResultOUT = IIF(count(1) = 0, 1, 0) FROM sys.objects 
												WHERE name = ''' + @CheckObject + ''' AND OBJECT_SCHEMA_NAME(object_id) = ''' +@CheckSchema + ''''
				DECLARE @ParameterDefinition NVARCHAR(255) = N'@ResultOUT int OUTPUT'
				DECLARE @sp_executesql NVARCHAR(255) = @CheckDatabase + '.sys.sp_executesql'

				EXEC @sp_executesql @SQL, @ParameterDefinition, @ResultOUT = @Result OUTPUT

				SET @InvalidObjects = @InvalidObjects + @Result
		
				FETCH NEXT FROM cur_
				INTO @CheckSchema, @CheckObject, @CheckDatabase
		    END 
		CLOSE cur_;
		DEALLOCATE cur_;
		
		IF @InvalidObjects > 0
		BEGIN
			RAISERROR( 'The object references invalid entities', 16, 1)
			ROLLBACK
		END
		
	END

END

This trigger prevents the creation of objects referencing invalid tables or databases in stored procedures, functions and triggers. It will generate an error like this:

Msg 50000, Level 16, State 1, Procedure ValidateDDL, Line 80 [Batch Start Line 98]

The object references invalid entities

Msg 3609, Level 16, State 2, Procedure TestInvalidObjects, Line 1 [Batch Start Line 98]

The transaction ended in the trigger. The batch has been aborted.

Overall I'm pretty proud of this code, even though it doesn't cover all possible cases. Feel free to try it out. 
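A quick way to test it: with the trigger in place, creating a procedure like this (assuming dbo.NotExistingTable really doesn't exist) should fail with the error shown above:

CREATE PROCEDURE dbo.TestInvalidObjects
AS
	/*schema-qualified reference to a missing table, so the trigger rolls the CREATE back*/
	SELECT * FROM dbo.NotExistingTable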

Note: while testing the trigger I found a bug in addition to the one described at the beginning. If you have two DDL triggers in the database, disable one of them and then try to alter it, the session gets terminated. For this reason, if you run into problems with this trigger, it's safer to just drop it rather than disable it.

If you're interested in the bug reproducer, you can check it here.


Thursday, February 11, 2021

LIKE predicate hidden conversion

There is one gotcha with the LIKE predicate which I discovered recently. It caused me quite a lot of confusion, so I'll describe it here to hopefully save others the headache.

First let's set up a table and fill it with data:

CREATE TABLE T
(
	[TextColumn] [varchar](100) NOT NULL
)

CREATE NONCLUSTERED INDEX [IX_TextColumn] ON T
(
	[TextColumn] ASC
)


;WITH cte AS (
SELECT a FROM (VALUES('A'),('B'),('C'),('D'),('E'),('F')) x(a))
INSERT INTO T
SELECT cte.a FROM cte
CROSS JOIN cte cte2
CROSS JOIN cte cte3
CROSS JOIN cte cte4
CROSS JOIN cte cte5
CROSS JOIN cte cte6
CROSS JOIN cte cte7
CROSS JOIN cte cte8
CROSS JOIN cte cte9

Now let's run the following query and check its execution plan. Note: I'm using a variable here, but the same issue happens with parameters.

DECLARE @v NVARCHAR(4000) = N'C'

SELECT COUNT(1) FROM dbo.T
WHERE  TextColumn LIKE @v




As we can see, we get a table scan even though there is a perfectly valid index for this query. So why isn't it used? Let's examine the table scan more closely:


As we can see, there is an implicit conversion: the VARCHAR column is converted to NVARCHAR to match the variable, which means the predicate is not SARGable. Now that we know the issue, there are a few possible fixes:
  • change the variable to VARCHAR
  • add OPTION(RECOMPILE) to the query
  • explicitly cast the variable to VARCHAR
In all of the above cases you will end up with a nice, quick seek:


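For example, the explicit cast fix is a one-line change (assuming the demo table from above):

DECLARE @v NVARCHAR(4000) = N'C'

/*casting the variable to the column's type keeps the predicate SARGable*/
SELECT COUNT(1) FROM dbo.T
WHERE TextColumn LIKE CAST(@v AS VARCHAR(100))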
Now you may wonder why I found this confusing. There are three reasons:
  1. Normally, when an implicit conversion happens, the SELECT node gets a yellow warning mark and the issue is easily noticeable. Not in this case.
  2. It doesn't happen when you swap the data types (NVARCHAR column, VARCHAR variable).
  3. The same problem doesn't happen when an equality comparison is used:



This issue can become quite problematic when working with NHibernate and probably other frameworks. NHibernate doesn't emit VARCHAR parameters, only NVARCHAR. So if you have VARCHAR columns in your database and want to filter on them using the LIKE keyword, you can be unpleasantly surprised performance-wise.
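To illustrate, what such a framework sends is roughly this (a mock-up, not an actual NHibernate capture):

/*an NVARCHAR parameter filtering a VARCHAR column with LIKE - the problematic pattern*/
EXEC sp_executesql N'SELECT COUNT(1) FROM dbo.T WHERE TextColumn LIKE @p0',
                   N'@p0 NVARCHAR(4000)',
                   @p0 = N'C'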

Sunday, April 12, 2020

Join on nullable column

There is an interesting behavior I noticed in SQL Server. Let me show it to you.

First let's prepare tables and data:

CREATE TABLE T1 (A UNIQUEIDENTIFIER);
CREATE TABLE T2 (A UNIQUEIDENTIFIER);

;WITH CTE AS ( SELECT A FROM (VALUES (1), (2), (3), (4), (5), (6), (7), (8)) X(A) )
INSERT INTO T1
SELECT NEWID() FROM CTE AS CTE1
CROSS JOIN CTE AS CTE2
CROSS JOIN CTE AS CTE3
CROSS JOIN CTE AS CTE4
CROSS JOIN CTE AS CTE5
CROSS JOIN CTE AS CTE6
CROSS JOIN CTE AS CTE7
CROSS JOIN CTE AS CTE8;

INSERT INTO T2
SELECT TOP 5 PERCENT A FROM T1
UNION ALL
SELECT TOP 95 PERCENT NULL FROM T1;

We now have two tables with a similar number of rows. The difference is that the second one contains mostly NULL values. Now let's try joining the two tables together:

SELECT COUNT(1)
FROM T1
INNER JOIN T2 ON T1.A = T2.A;

It takes 6922 ms on my machine. If you look at the execution plan, you will notice that SQL Server reads the whole T2 table, performs an aggregate on it and then does a hash join.


The aggregate is useful because it removes all the duplicate NULL values. But we're doing an inner join, which means we cannot join on NULL values, so we could simply filter them out up front. Let's try this:

SELECT COUNT(1)
FROM T1
INNER JOIN T2 ON T1.A = T2.A
WHERE T2.A IS NOT NULL;

This takes 4625 ms and gives the same result. In the execution plan you can observe that the aggregate is gone and there is a residual predicate on T2, which removes all the unnecessary NULL values.

In conclusion, SQL Server did not filter out the NULL values early unless explicitly told to do so. I think the reason is that adding such a filter is rarely beneficial, so no one coded this optimization into the query optimizer. I personally encounter queries like the one above very rarely, maybe once a year. You can see that the setup for this to surface is very specific. For this reason it makes no sense to add a NOT NULL filter to every query with an inner join; it's just something to keep in mind.

Sunday, April 5, 2020

Filtering on one column and joining on another - estimate problem

Let me show you a case where SQL Server has problems with cardinality estimation and how to fix it.

First let us prepare tables and data:

CREATE TABLE T1 ( A INT, B INT)
CREATE TABLE T2 ( A INT, B INT)
GO

INSERT INTO T1 VALUES (1, 1)
GO 1000

UPDATE TOP (100) T1 SET A = 2

INSERT INTO T2 VALUES (1, 1), (2, 2)

UPDATE STATISTICS T1
UPDATE STATISTICS T2


Now try the following query and examine the execution plan:

SELECT COUNT (1) FROM T1
INNER JOIN T2 ON T1.A = T2.A
WHERE T2.B = 1

The initial execution plan

As we can see, the estimates on the join are wrong. The reason is that we filter on column B in table T2 but use column A to join with the other table. SQL Server has no idea what the values of column A will be after the filter, so it guesses an average. In this case the guess is far off, since the data in column A of table T1 is skewed.

How do we fix this? The problem is the wrong estimates, so we need to help SQL Server and create filtered statistics:

CREATE STATISTICS S1 ON T2(A) WHERE B = 1

These statistics are on the join column, and their filter exactly matches the query's. The execution plan estimates look much better now:
The better execution plan

In this simplistic example it doesn't matter much, yet I have encountered cases where this caused hash spills. And if such a query is executed often, you can get significant performance problems.

Now, this is by no means a universal fix. It works only if the filter on B doesn't change. If we change the WHERE clause to B = 2, the statistics will be wrong again and we would need to create new statistics for the new WHERE clause. This doesn't scale: it may be doable for 5 values, but not for 10,000. Still, this trick may come in handy from time to time.
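For instance, covering the B = 2 case would require a second filtered statistics object:

/*every distinct filter value needs its own filtered statistics*/
CREATE STATISTICS S2 ON T2(A) WHERE B = 2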

Monday, November 4, 2019

Tracking query timeouts with extended events

Timeouts are kind of a blind spot for a performance tuner. They don't appear in any DMVs like other queries do. You get to know about them either when someone points them out to you or from application logs, if you have such logging in place. For this reason I wanted an independent way of tracking timeouts. After quite exhaustive research I figured out how to do it with Extended Events. It turns out that when an application cancels a request, the completed event's result is 2. Theoretically you could use the attention event, but I found it reported timeouts on statements that didn't time out from the application's point of view. Here's the event session definition:

CREATE EVENT SESSION [QueryTimeouts] ON SERVER 
ADD EVENT sqlserver.rpc_completed(SET collect_statement=(1)
    ACTION(sqlserver.database_name,sqlserver.username)
    WHERE ([result]=(2))),
ADD EVENT sqlserver.sql_batch_completed(
    ACTION(sqlserver.database_name,sqlserver.username)
    WHERE ([result]=(2)))
ADD TARGET package0.asynchronous_file_target
    (SET filename = 'G:\QueryTimeouts.xel', max_file_size = 5, max_rollover_files = 2)
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS)
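The session is created in a stopped state, so it needs to be started before it collects anything:

/*the session doesn't collect anything until it's started*/
ALTER EVENT SESSION [QueryTimeouts] ON SERVER STATE = START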

The data can be extracted with the following query:

DECLARE @filename NVARCHAR(max)
SELECT @filename = REPLACE(CAST(sesf.value AS NVARCHAR(MAX)), '.xel', '') 
FROM sys.server_event_session_fields sesf
JOIN sys.server_event_sessions ses ON ses.event_session_id = sesf.event_session_id
WHERE sesf.name = 'filename'
AND ses.name = 'QueryTimeouts'

;WITH cte
AS (SELECT CONVERT(XML, event_data) ed
    FROM sys.fn_xe_file_target_read_file(@filename + '*.xel', NULL, NULL, NULL) )
SELECT
	DATEADD(mi, DATEDIFF(mi, GETUTCDATE(), GETDATE()), ed.value('(event/@timestamp)[1]', 'datetime2(3)')) AS [timestamp],
	ed.value('(event/action[@name="database_name"]/value)[1]', 'nvarchar(128)') AS [database_name],
	ed.value('(event/action[@name="username"]/value)[1]', 'nvarchar(128)') AS username,
	ed.value('(event/data[@name="duration"]/value)[1]', 'bigint')/1000 AS duration,
	ed.value('(event/data[@name="cpu_time"]/value)[1]', 'bigint')/1000 AS cpu_time,
	ed.value('(event/data[@name="logical_reads"]/value)[1]', 'bigint') AS logical_reads,
	ISNULL(ed.value('(event/data[@name="spills"]/value)[1]', 'bigint'),0) AS spills,
	ISNULL(ed.value('(event/data[@name="batch_text"]/value)[1]', 'nvarchar(max)'),ed.value('(event/data[@name="statement"]/value)[1]', 'nvarchar(max)')) AS text
FROM cte
ORDER BY timestamp;

There are a few gotchas when using these events. First, you want to filter out non-application users like yourself; you'd be surprised how often you cancel your own queries. Second, some applications are not written very well: they run a SELECT without a TOP clause but cancel the query after receiving a certain number of rows. It's better to filter those out, either in the event session itself or when extracting the data.
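For the session-side filtering, you have to drop and re-add the event with an extended predicate; a sketch (the login name is a placeholder):

/*exclude your own cancelled ad hoc queries; the login name is a placeholder*/
ALTER EVENT SESSION [QueryTimeouts] ON SERVER
DROP EVENT sqlserver.sql_batch_completed

ALTER EVENT SESSION [QueryTimeouts] ON SERVER
ADD EVENT sqlserver.sql_batch_completed(
    ACTION(sqlserver.database_name,sqlserver.username)
    WHERE ([result]=(2) AND [sqlserver].[username]<>N'YourDomain\YourLogin'))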

When analyzing the results I compare duration and CPU time. If they are similar, the query is probably just too slow. If the numbers are like 30 seconds of duration but 1 second of CPU, the query was probably blocked by another one, and you need other methods to identify the offender. I also check spills: they can be slow, so fixing them may speed the query up. Note that, for some reason, spills are only reported for batches.
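If you want to automate that triage, here's a rough sketch; #Timeouts is a hypothetical table holding the extraction query's output:

/*rough triage following the heuristic above; #Timeouts is a hypothetical
  dump of the extraction query's output*/
SELECT *,
       CASE WHEN cpu_time * 2 >= duration THEN 'probably just slow'
            ELSE 'likely blocked by another query'
       END AS verdict
FROM #Timeouts
ORDER BY duration DESC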

Thursday, October 10, 2019

Search plan cache for index

Quite often I want to see where a specific index is used. I started with a script from Jonathan Kehayias. Sadly, his script took around 10 minutes on a plan cache of 15 GB, and it produced many duplicate entries. So I tweaked it to my needs. My version groups results by plan hash, which helps if you have unparametrized queries. I also switched to exist() when searching the XML. Finally, I use both dm_exec_query_plan and dm_exec_text_query_plan. For some reason, during my testing, this turned out to be quicker than using only one of them. dm_exec_text_query_plan lets you limit the execution plan to the specific statement using the index of interest, in case a batch or stored procedure has more than one statement. That way you don't need to analyze the whole thing to find the relevant part.

Here's the code:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
DECLARE @IndexName AS NVARCHAR(128) = N'YourIndexName';

--Handle the case where the left or right was quoted manually but not the opposite side 
IF LEFT(@IndexName, 1) <> '[' SET @IndexName = '[' + @IndexName;
IF RIGHT(@IndexName, 1) <> ']' SET @IndexName = @IndexName + ']';

;WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
, cte AS
	(SELECT plan_handle, query_plan_hash, statement_start_offset, statement_end_offset,
		MIN(creation_time) OVER(PARTITION BY qs.query_plan_hash ORDER BY qs.query_plan_hash) creation_time,
		MAX(last_execution_time) OVER(PARTITION BY qs.query_plan_hash ORDER BY qs.query_plan_hash) last_execution_time,
        SUM(total_worker_time) OVER(PARTITION BY qs.query_plan_hash ORDER BY qs.query_plan_hash) total_worker_time,
		SUM(total_logical_reads) OVER(PARTITION BY qs.query_plan_hash ORDER BY qs.query_plan_hash) total_logical_reads,
		SUM(execution_count) OVER(PARTITION BY qs.query_plan_hash ORDER BY qs.query_plan_hash) execution_count,
		ROW_NUMBER() OVER(PARTITION BY qs.query_plan_hash ORDER BY qs.query_plan_hash) row_num
    FROM sys.dm_exec_query_stats AS qs
   )
, cte2 AS
	(SELECT * FROM cte
	OUTER APPLY sys.dm_exec_query_plan(cte.plan_handle) AS qp
	WHERE (query_plan.exist('//IndexScan[not(@Lookup)]/Object[@Index=sql:variable("@IndexName")]') = 1 OR query_plan IS NULL) AND  cte.row_num = 1
	)
, cte3 AS
	(SELECT ISNULL(
			SUBSTRING(qt.text, (cte2.statement_start_offset / 2) + 1, ((CASE cte2.statement_end_offset WHEN -1 THEN DATALENGTH(qt.text) ELSE cte2.statement_end_offset
						END - cte2.statement_start_offset) / 2) + 1), qt.text) AS TSQL,
		TRY_CAST(qp.query_plan AS XML) query_plan, cte2.creation_time, cte2.last_execution_time,
		cte2.total_worker_time, cte2.total_logical_reads, cte2.execution_count
    FROM cte2
    CROSS APPLY sys.dm_exec_text_query_plan(cte2.plan_handle, cte2.statement_start_offset, cte2.statement_end_offset) AS qp
    CROSS APPLY sys.dm_exec_sql_text(cte2.plan_handle) AS qt)
SELECT *
FROM cte3
WHERE query_plan.exist('//IndexScan[not(@Lookup)]/Object[@Index=sql:variable("@IndexName")]') = 1
ORDER BY cte3.execution_count DESC
OPTION (RECOMPILE);

I admit it's quite messy, but it gets me relatively clean data and runs in 2 minutes. Quite a nice improvement over the original.

Before using this script, three warnings:
  1. Remember that the plan cache doesn't necessarily hold all execution plans, especially after an instance restart.
  2. This script won't catch index usage if a plan is not parseable as XML.
  3. Considering the above, be careful when performing index maintenance based on the script's results.