Sunday, August 25, 2019

Who is doing all those batch requests anyway?

Let's say you have a instance which has normally 1000 batch requests/s. Then suddenly it's 10 times more batch requests. It doesn't even need to affect performance in visible way. Yet it's is a sign that something is going wrong somewhere. In case you have many applications, each connecting through different user, you can try to pinpoint which one is it. To do this you can set up such extended event:

CREATE EVENT SESSION NumberOfStatementsByUser ON SERVER 
ADD EVENT sqlserver.sql_batch_completed(
    ACTION(sqlserver.username))
, ADD EVENT sqlserver.rpc_completed(
    ACTION(sqlserver.username))
ADD TARGET package0.histogram(SET source=N'sqlserver.username')
WITH (MAX_MEMORY=4096 KB, EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS, 
        MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,
        MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF, STARTUP_STATE=OFF)

After starting it, you can check which user starts most requests with following query:

;WITH
	sql_batch_completed_by_database AS ( 
        SELECT CAST(xet.target_data AS xml) AS event_data
        FROM sys.dm_xe_session_targets AS xet  
        JOIN sys.dm_xe_sessions AS xe  
            ON (xe.address = xet.event_session_address)  
        WHERE xe.name = 'NumberOfStatementsByUser'
    )
    , requests_by_user AS (
        SELECT
              slot.value('(./value)[1]', 'sysname') AS Username
            , slot.value('@count', 'bigint') AS RequestCount
        FROM (SELECT event_data FROM sql_batch_completed_by_database) AS xe(event_data)
        CROSS APPLY event_data.nodes('/HistogramTarget/Slot') AS histogram_target(slot)
    )
SELECT Username, SUM(RequestCount) AS RequestCount
FROM requests_by_user
GROUP BY Username;

Personally, I start this extended event only when I see a problem and only for a minute or so. This is enough to identify the troublesome user. After I identify the user, I usually set up tracing only for him to see what queries he runs. I stop the trace stop as soon as possible. The profiler is quirky and usually runs for few seconds before it can be stopped. This is enough to capture thousands of event. In my experience such things happen when the code being executed goes into infinite loop or user is missing permissions (yes this happened). Having the queries which are being executed should be sufficient information to give to people responsible for the application, so they can start troubleshooting their code.