I am not pro-cursor. I am not, however, extremely anti-cursor.
I think that if there is a cursor used for a certain problem it just means we don’t know enough about the problem, the data or the code to think of a better solution.
But I’ve had an experience with cursors and, for my cases, found a better way.
That is what this blog post is going to be about, a case where we were using a cursor until we realised a better solution for it.
We have a weird table.
It’s our fault though as we’ve made it weird, we haven’t properly normalised it and it’s basically just a massive catch-all for records and for some strange reason, only 1 in 10 records have a ticket number.
So let’s create it!
-- Create the test table. CREATE TABLE [dbo].[ProductActions] ( [product_action_id] INT IDENTITY(0, 1) PRIMARY KEY, [product_id] INT, [allowed_action_id] INT, [ticket_number] VARCHAR(20), [date_actioned] DATE ); -- Populate it. WITH L0 AS(SELECT 1 AS c UNION ALL SELECT 1), L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B), L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B), L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B), L4 AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B), L5 AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B) INSERT dbo.ProductActions (product_id, allowed_action_id, date_actioned) SELECT TOP (10000000) product_id = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) % 5, allowed_action_id = (ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) + 5) / 5, date_actioned = CAST(DATEADD(DAY, (SELECT ABS(CHECKSUM(NEWID())) % 50000), 0) AS DATE) FROM L5; UPDATE dbo.ProductActions SET ticket_number = (CASE WHEN product_action_id % 10 = 0 THEN 'TCKT' + REPLACE(QUOTENAME(product_action_id, ''''),'''','') ELSE NULL END);
…that took around 1 minute for 10,000,000 records, including the create and update statements…
Now say something diabolical happens, a bad delete statement, a disgruntled employee who just won the lottery, a chimpanzee riding on a segway, whatever but for some reason all the records holding our ticket numbers get deleted.
However, thankfully (not sure if that’s the right word) we have Excel sheets of the records that were dropped.
… 9 seconds! 9 seconds for that wealthy employee/chimpanzee to cause havoc…
However, we don’t have everything; we have the product_id and the allowed_action_id.
So the business tells us that the dates don’t matter, it just needs those records back and those ticket numbers!
What we did:
Cursors. (I know, I’m sorry.)
However, we had problems.
Our table had a lot of traffic so we weren’t guaranteed an unbroken sequential range of identifiers that we could update.
And we couldn’t remove these foreign inserts from our range as we had no way of identifying which was our records and which were foreign records.
So we created a cursor. Not a performance tuned cursor since we were relatively new to SQL at the time but a run-of-the-mill, Google what it is and grab the template off the internet cursor.
- Import the records into a holding table
- Give the table a sequential identifier for ordering
- Get the first record
- Insert it into the original table
- Grab the identity of the insert we just inserted
- Update it to have the correct ticket number
- Grab the next record and repeat steps 4 – 6
All together 8 steps (although steps 3 – 7 are done around 1 million times) but how fast is it?
ALTER TABLE dbo.DeletedRecordsPaper ADD [ID] INT IDENTITY(0, 1); GO
…around 34 seconds, surprisingly long…sign of things to come?…
Steps 3 – 7:
DECLARE @prodID INT, @allowed INT, @entered INT; -- Start of Step 3 DECLARE FillMissingRecords CURSOR FOR SELECT product_id, allowed_action_id FROM dbo.DeletedRecordsPaper ORDER BY ID ASC; OPEN FillMissingRecords FETCH NEXT FROM FillMissingRecords INTO @prodID, @allowed WHILE @@FETCH_STATUS = 0 BEGIN -- Step 4 INSERT dbo.ProductActions (product_id, allowed_action_id, date_actioned) SELECT @prodID, @allowed, GETDATE(); -- Step 5 SELECT @entered = SCOPE_IDENTITY(); -- Step 6 UPDATE dbo.ProductActions SET ticket_number = 'TCKT' + CAST(@entered AS varchar(10)) WHERE product_action_id = @entered; -- Step 7 FETCH NEXT FROM FillMissingRecords INTO @prodID, @allowed END CLOSE FillMissingRecords DEALLOCATE FillMissingRecords;
How’d it do? Well it worked; 1 million records took 9 minutes and 35 seconds though.
…9 minutes and 35 seconds…9 minutes and 35 seconds?!?…I could have recreated the table 9 and a half times in that time! I knew it was going to be slow, I even went out and got a coffee while I was waiting but C’MON!!!!…
So altogether, with the adding of the sequential identifier, it took 10 minutes to run.
Now I can’t remember why exactly but this happened a few times so we had to run this a few times.
What we learned:
1). Eventually monkies improve with using keyboards, so too do humans and 2). it’s SQL Server. Use Set Theory!
Looking at this issue the main problem that we could see was that we needed to know what records we had inserted to update them with a ticket number.
We thought that we couldn’t figure out what we had inserted without using SCOPE_IDENTITY, we didn’t know about the inserted table, we didn’t know about variable tables, essentially we didn’t know about OUTPUT.
- Import the records into a holding table
- Declare a holding table for the identities
- Insert all the records into the original table and output the values to our holding table
- Update all the records inserted to have the correct ticket number
All together 5 steps, already an improvement, but that’s just steps, not performance. Is it an actual performance improvement?
…to be honest, if it was anyways faster than 9 minutes, I’d be happy…
-- Step 2 DECLARE @entered TABLE ( entered_id INT ); -- Step 3 INSERT dbo.ProductActions (product_id, allowed_action_id, date_actioned) OUTPUT inserted.product_action_id INTO @entered SELECT product_id, allowed_action_id, GETDATE() FROM dbo.DeletedRecordsPaper; -- Step 4 UPDATE pa SET ticket_number = 'TCKT' + CAST(entered_id AS varchar(10)) FROM dbo.ProductActions AS [pa] JOIN @entered AS e ON pa.product_action_id = e.entered_id;
29 SECONDS!!!! 1 million rows in 29 seconds!
So we’ve gone from 1 million rows in 9 minutes 35 seconds (~1,730 rows a second) to 1 million in 29 seconds (~34,482 rows a second).
…those results flustered me so much that I forgot maths so if anybody wants to work out that improvement percentage, please let me know. I want to write that one down!…
The most dangerous phrase in the English language is probably “what does this button do?”
The second most dangerous is “we’ve always done it that way”.
There’s always room for improvement testing (on the right system i.e. NOT PRODUCTION) and OUTPUT over CURSORS any day.