Tuesday, July 9, 2019

Finding failing queries in SQL Server

Sometimes you know there is a query which fails with error but you don't know how does it look like. For example the application doesn't log it or the query is dynamic one. To track these you can use such Extended Event:

CREATE EVENT SESSION FailingQueries
ON SERVER
    ADD EVENT sqlserver.error_reported
    (ACTION (sqlserver.sql_text,
             sqlserver.database_id,
             sqlserver.username)
     WHERE (   [severity] > 10
               AND error_number <> 942 /*Database '%.*ls' cannot be opened because it is offline.*/
               AND error_number <> 916 /*The server principal "%.*ls" is not able to access the database "%.*ls" under the current security context.*/
               AND error_number <> 18456 /*Login failed for user '%.*ls'.%.*ls%.*ls*/
               AND error_number <> 17830 /*Network error code 0x2746 occurred while establishing a connection; the connection has been closed.*/
               AND error_number <> 4060 /*Cannot open database  requested by the login. The login failed.*/
     ))
    ADD TARGET package0.asynchronous_file_target
    (SET filename = 'G:\FailingQueries.xel', max_file_size = 5, max_rollover_files = 2)
WITH (MAX_DISPATCH_LATENCY = 5 SECONDS);
GO

(Don't forget to set your own path)

This event checks for error severity higher than 10 (anything below is informational) and also filters out these I found useless and quite chatty in practice. For example, if some developer connects with SSMS to instance on which he doesn't have access to all databases, said SSMS will try to open these databases anyway.

You can check data captured with following SQL:

DECLARE @filename NVARCHAR(max)
SELECT @filename = REPLACE(CAST(sesf.value AS NVARCHAR(MAX)), '.xel', '') 
FROM sys.server_event_session_fields sesf
JOIN sys.server_event_sessions ses ON ses.event_session_id = sesf.event_session_id
WHERE sesf.name = 'filename'
AND ses.name = 'FailingQueries'

;WITH cte
AS (SELECT CONVERT(XML, event_data) ed
    FROM sys.fn_xe_file_target_read_file(@filename + '*.xel', NULL, NULL, NULL) ),
shredded_data AS (
SELECT DATEADD(
           mi, DATEDIFF(mi, GETUTCDATE(), GETDATE()), ed.value('(event/@timestamp)[1]', 'datetime2(3)')) AS [timestamp],
       DB_NAME(ed.value('(event/action[@name="database_id"]/value)[1]', 'nvarchar(128)')) AS database_name,
       ed.value('(event/data[@name="severity"]/value)[1]', 'tinyint') AS [err_severity],
       ed.value('(event/data[@name="error_number"]/value)[1]', 'int') AS [err_number],
       ed.value('(event/data[@name="message"]/value)[1]', 'nvarchar(512)') AS [err_message],
       ed.value('(event/action[@name="sql_text"]/value)[1]', 'nvarchar(max)') AS [sql_text],
       ed.value('(event/action[@name="username"]/value)[1]', 'nvarchar(128)') AS username
FROM cte)
SELECT * FROM shredded_data sd
--WHERE sd.database_name = 'YourDatabase'
ORDER BY timestamp;

I like to have this Extended Event enabled whole time on development servers. That way if developers come to me saying they have some errors but don't know what SQL is executed, I can just check the event data. I've also used similar one in production.