Thursday, July 15, 2021

Query producing wrong estimates (part 1)

 I thought I'll start documenting instances of wrong estimations done by SQL Server. I already have one post on this topic, but recently I found another such case. Not sure it will help anyone, yet I'll document my findings.

This case is reproducible on compatibility levels 130 and 150, with QUERY_OPTIMIZER_HOTFIXES enabled and disabled

As usual let's first prepare table and fill it up with data:

CREATE TABLE T
(
PK UNIQUEIDENTIFIER NOT NULL DEFAULT NEWSEQUENTIALID(),
ID UNIQUEIDENTIFIER NULL,
B1 BIT NOT NULL
)

CREATE INDEX IX ON T
(ID, B1)

;WITH cte AS (
SELECT a FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) x(a))
INSERT INTO T(ID, B1)
SELECT NULL, 0
FROM cte
CROSS JOIN cte cte2
CROSS JOIN cte cte3
CROSS JOIN cte cte4
CROSS JOIN cte cte5
CROSS JOIN cte cte6

Now create following procedure and run it:

CREATE OR ALTER PROC #p (@p0 UNIQUEIDENTIFIER, @p1 BIT)
AS
BEGIN

    SELECT * FROM T
    WHERE ID = @p0
    AND B1 = @p1 

END

GO

EXEC dbo.#p @p0 = '00000000-0000-0000-0000-000000000000', 
            @p1 = 0

ID column has only null values and we're seeking for non null value, so we would expect an index seek with a lookup. What we get is table scan and totaly wrong estimates.


What are the ways to fix it? We could:

  • remove the B1 = @p1 part. But this changes the query logic
  • run query with literals. Suprisingly adding OPTION(RECOMPILE) doesn't fix the query
  • add OPTION(OPTIMIZE FOR UNKNOWN). RECOMPILE doesn't work. But this one does. Go figure
  • Remove the B1 column from the index. I find it fascinating that to fix a query performance you sometimes need to substitute good index with a worse one.
I'll let you dear reader to try out these fixes for my demo. I must say this looks like a bug to me. Since in  the real case I went for the index fix, I reviewed histogram for index with and without B1 column. They seemed basically identical, yet yielded totally different execution plans. I opened a ticket in UserVoice, but obviously don't hold my breath someone will even look in its remote direction.

Wednesday, March 31, 2021

Beating deferred name resolution (and finding a bugs on the way)

Deferred name resolution is not my favorite SQL Server feature. And I suspect I'm not the only one. Fortunately I found a workaround. I designed a DDL trigger, which uses sys.sql_expression_dependencies. With this system view I can check for any unresolved dependencies. There are quite a lot of caveats, which make the code nontrivial. For example: I don't think anything meaningful can be done with rows having is_ambigous set to 1. That's why I skip them. 

One big caveat is that for this trigger to be useful you need to quantify all objects with schema names. This wouldn't be a big problem. If you do this in your stored procedure:

SELECT * FROM NotExistingTable

It shows as invalid in sys.sql_expression_dependencies. But, if you do this:

UPDATE TableAlias SET I = 0
FROM #t TableAlias

Even though it could be totally valid statement in the context of your code, the TableAlias will appear in sys.sql_expression_dependencies. This is totally wrong in my opinion (and I even filed a bug). This system view shouldn't show aliases. Yet there it is. You cannot realistically differentiate between those two. The code I'm working on uses schema names. For this reason I designed the trigger to ignore objects without them. But I'm sure there are databases, which completely ignore schemas.

Anyway, here is the code:

CREATE OR ALTER TRIGGER ValidateDDL ON DATABASE
WITH EXECUTE AS 'dbo'
FOR DDL_DATABASE_LEVEL_EVENTS
AS
BEGIN

	SET NOCOUNT ON

	DECLARE @Data XML = EVENTDATA();
	DECLARE @InvalidObjects INT = 0;

	IF @Data.value('(/EVENT_INSTANCE/EventType)[1]', 'nvarchar(100)') IN 
		('CREATE_PROCEDURE', 'ALTER_PROCEDURE', 'CREATE_TRIGGER', 'ALTER_TRIGGER', 'CREATE_FUNCTION', 'ALTER_FUNCTION')
		OR @Data.exist('(/EVENT_INSTANCE/AlterTableActionList/Enable/Triggers/Name)') = 1
	BEGIN
		
		/*when enabling trigger we need to take its name from elsewhere*/
		DECLARE @Object NVARCHAR(128) = IIF(@Data.exist('(/EVENT_INSTANCE/AlterTableActionList/Enable/Triggers/Name)') = 1,
											@Data.value('(/EVENT_INSTANCE/AlterTableActionList/Enable/Triggers/Name)[1]', 'nvarchar(128)'),
											@Data.value('(/EVENT_INSTANCE/ObjectName)[1]', 'nvarchar(128)'))
		DECLARE @Schema NVARCHAR(128) = @Data.value('(/EVENT_INSTANCE/SchemaName)[1]', 'nvarchar(128)')

		SELECT @InvalidObjects = COUNT(1)
		FROM sys.sql_expression_dependencies
		WHERE OBJECT_NAME(referencing_id) = @Object
			AND OBJECT_SCHEMA_NAME(referencing_id) = @Schema
			AND referenced_id IS NULL
			/*ignore special trigger tables and aliases*/
			AND referenced_schema_name IS NOT NULL
			AND referenced_server_name IS NULL
			AND is_ambiguous = 0
			AND (referenced_database_name IS NULL
				/*check invalid database names*/
				OR referenced_database_name NOT IN (SELECT name COLLATE DATABASE_DEFAULT FROM sys.databases))

		/*check inside valid databases*/
		SELECT	referenced_schema_name SchemaName,
				referenced_entity_name ObjectName,
				referenced_database_name DatabaseName
		INTO #CrossChecks
		FROM sys.sql_expression_dependencies
		WHERE OBJECT_NAME(referencing_id) = @Object
			AND OBJECT_SCHEMA_NAME(referencing_id) = @Schema
			AND referenced_id IS NULL 
			AND referenced_database_name IN (SELECT name COLLATE DATABASE_DEFAULT FROM sys.databases)
			AND referenced_server_name IS NULL

		DECLARE @CheckSchema NVARCHAR(128),
				@CheckObject NVARCHAR(128),
				@CheckDatabase NVARCHAR(128)
		DECLARE cur_ CURSOR
		FOR
		    SELECT  *
		    FROM #CrossChecks
		
		OPEN cur_
		FETCH NEXT FROM cur_
		INTO @CheckSchema, @CheckObject, @CheckDatabase
		
		WHILE @@FETCH_STATUS = 0
		    BEGIN
				DECLARE @Result INT = 0
				DECLARE @SQL NVARCHAR(MAX) =	N'SELECT @ResultOUT = IIF(count(1) = 0, 1, 0) FROM sys.objects 
												WHERE name = ''' + @CheckObject + ''' AND OBJECT_SCHEMA_NAME(object_id) = ''' +@CheckSchema + ''''
				DECLARE @ParameterDefinition NVARCHAR(255) = N'@ResultOUT int OUTPUT'
				DECLARE @sp_executesql NVARCHAR(255) = @CheckDatabase + '.sys.sp_executesql'

				EXEC @sp_executesql @SQL, @ParameterDefinition, @ResultOUT = @Result OUTPUT

				SET @InvalidObjects = @InvalidObjects + @Result
		
				FETCH NEXT FROM cur_
				INTO @CheckSchema, @CheckObject, @CheckDatabase
		    END 
		CLOSE cur_;
		DEALLOCATE cur_;
		
		IF @InvalidObjects > 0
		BEGIN
			RAISERROR( 'The object references invalid entities', 16, 1)
			ROLLBACK
		END
		
	END

END

This trigger prevents creation of objects referencing invalid tables or databases in stored procedures, functions and triggers. It will generate error like this:

Msg 50000, Level 16, State 1, Procedure ValidateDDL, Line 80 [Batch Start Line 98]

The object references invalid entities

Msg 3609, Level 16, State 2, Procedure TestInvalidObjects, Line 1 [Batch Start Line 98]

The transaction ended in the trigger. The batch has been aborted.

Overall I'm pretty proud of this code, even though it doesn't cover all possible cases. Feel free to try it out. 

Note: While testing the trigger I found an additional bug to the one described at beginning. If you have two DDL triggers in the database, disable one of them and then try to alter it, the session will be terminated. For this reason, if you run into problems with this trigger, it's safer to just drop it, not disable.

If you're interested in the bug reproducer, you can check it here


Thursday, February 11, 2021

LIKE predicate hidden conversion

 There is one gotcha with LIKE predicate, which I recently discovered. It caused me quite a lot of confusion, so I'll describe it here to hopefully save others this headache.

First let's set up a table and fill it with data:

CREATE TABLE T
(
	[TextColumn] [varchar](100) NOT NULL
)

CREATE NONCLUSTERED INDEX [IX_TextColumn] ON T
(
	[TextColumn] ASC
)


;WITH cte AS (
SELECT a FROM (VALUES('A'),('B'),('C'),('D'),('E'),('F')) x(a))
INSERT INTO T
SELECT cte.a FROM cte
CROSS JOIN cte cte2
CROSS JOIN cte cte3
CROSS JOIN cte cte4
CROSS JOIN cte cte5
CROSS JOIN cte cte6
CROSS JOIN cte cte7
CROSS JOIN cte cte8
CROSS JOIN cte cte9

Now let's run following query and check its execution plan. Note: I'm using variable here, but the same issue happens with parameters

DECLARE @v NVARCHAR(4000) = N'C'

SELECT COUNT(1) FROM dbo.T
WHERE  TextColumn LIKE @v




As we see, we've got table scan even though there is perfectly valid index for this query. So why isn't it used? Let's examine the table scan closer:


As we can see there is implicit conversion from NVARCHAR variable to match VARCHAR column. Meaning the predicate is not SARGable. Now that we know the issue there are few possible fixes:
  • change the variable to VARCHAR
  • add OPTION(RECOMPILE) to the query
  • explicitly cast the variable to VARCHAR
In all of the above cases you will end up with nice, quick seek:


Now you may wonder why I found this confusing. There are three reasons:
  1. Normally when there is implicit conversion happening, the SELECT node has yellow warning mark. The issue is easily noticeable. Not in this case. 
  2. It doesn't happen when you swap the datatypes (column nvarchar, variable varchar)
  3. The same problem doesn't happen with equality comparison is used:



This issue can become quite problematic when working with NHibernate and probably other frameworks. NHibernate doesn't emit VARCHAR parameters, only NVARCHAR. So if you have VARCHAR columns in your database and you want to filter on them using LIKE keyword you can be unpleasantly surprised performance wise.