T-SQL Tuesday #77: Recursive Common Table Expression (CTE)

tsql2sday150x1501

Favourite SQL Server Feature:

Is there is any event that is a great starting off point for new bloggers than T-SQL Tuesday? It gives you everything you need, a topic, a semblance of choice, a deadline

This month’s T-SQL Tuesday is hosted by Jens Vestergaard (b/t) and the topic is out favourite SQL Server Feature.

Now I didn’t realise that I had a favourite SQL Server feature until I had to sit down and think about it and I realised that the moment that I learned how to use CTE’s, I’ve been using them non stop. So much so that in a previous job, a developer once said he could recognize a stored procedure I had written just because it contained a CTE!

it didn’t help that in that case he was right 😦 

According to Books Online, a CTE is a “temporary named result set” that is defined within the scope of a single statement. In case you are worries about the ‘single statement’ aspect of that, don’t be. With temp tables, Variables table, etc, SQL Server got you covered ;).

As this is the first blog post, I’ll keep this short and sweet. The main capacity of CTE’s that I admire is the RECURSIVE element to them.

Recursive CTE’s require only 4 aspects

  1. An anchor statement
  2. A joining statement e.g. UNION ALL
  3. A recursive statement, and
  4. A terminator clause

Actions Speak Louder Than Words:

With the release of the new STRING_SPLIT function in SQL Server 2016 that everyone is looking forward to, it’s probably fitting that the example of a recursive CTE that I’ll be using is to split a string. Since this post is about Recursive CTE’s I’ll be focusing on the architecture of the CTE more than what is in the script though!

So for our example, say we are given a variable sting with a list of elements in it…


DECLARE @string VARCHAR(100) = 'SELECT,INSERT,UPDATE,DELETE,EXECUTE'

…and we are tasked with splitting this string out into it’s different parts. Now there are multiple different ways that this could be accomplished (nearly all of them faster and more efficien) but we’re going the recursive CTE route!


DECLARE @string VARCHAR(100) = 'SELECT,INSERT,UPDATE,DELETE,EXECUTE'
-- This value is used to split out the string :
, @delimiter CHAR(1) = ',';

-- Add on a final delimter to get the last element
SELECT @string = @string + @delimiter;

WITH delimiting_cte
( ID, original_text, remaining_text, delimited_text )
AS
(
-- Anchor stmt :
SELECT
CAST( 1 as SMALLINT ),
@string,
RIGHT( @string, ( LEN( @string ) - CHARINDEX( @delimiter, @string ) ) ), --remaining_text
SUBSTRING( @string, 1, CHARINDEX( @delimiter, @string ) - 1 ) -- delimited_text
-- Joining Statement :
UNION ALL
-- Recursive stmt : remove each delimited value to put in own row...
SELECT
CAST( c.ID + 1 as SMALLINT ),
c.original_text,
RIGHT( c.remaining_text, ( LEN( remaining_text ) - CHARINDEX( @delimiter, c.remaining_text ) ) ), -- remaining_text
SUBSTRING( c.remaining_text, 0, CHARINDEX( @delimiter, c.remaining_text )) -- delimited_text
FROM delimiting_cte as [c]
WHERE
-- Terminator clause: Until no delimiter left in the [remaining_text] column...
remaining_text like '%['+@delimiter+']%'
)
SELECT
ID,
original_text,
remaining_text,
delimited_text
FROM delimiting_cte as [c];

Anchor Statement:

The anchor statement is static, it doesn’t change. You can take that query out, run it all day long and you’d get the same results. No changes here, this is what the recursive derives itself from!

-- Anchor stmt :
SELECT
CAST( 1 as SMALLINT ),
@string,
RIGHT( @string, ( LEN( @string ) - CHARINDEX( @delimiter, @string ) ) ),
SUBSTRING( @string, 1, CHARINDEX( @delimiter, @string ) - 1 )

However, it has the basic limitation of a CTE in that it requires distinct column names

So we have two options, define them at the start:


WITH delimiting_cte
( ID, original_text, remaining_text, delimited_text )
AS
(
 SELECT ...

 

or define them inside the anchor statement itself:

WITH delimiting_cte
AS
(
SELECT [ID] = CAST(1 AS SMALLINT),
[original_text] = @string...

Whatever we choose the only caveat is that each column must have a distinct column name.


Joining Statement:

Nice and simple, we need something to join the anchor and the recursive statement together:

-- Joining Statement :
UNION ALL

(Ever wonder what happens if you change this to UNION? INTERSECT? EXCEPT? Go on, give it a go and find out!)

 

Recursive Statement:

Now this is metaphorically where the magic happens. There are a couple of things here that are worth pointing out.

This is the complete opposite of the Static Statement, this will not run on it’s own! It needs the Anchor Statement to actually execute. This is because you are SELECTing from the CTE while still defining the CTE!!
I can’t think of another aspect in SQL that has this behaviour but if anyone knows, let me know!
On the same level, we call the column names themselves as well here but we don’t have to give these guys distinct column names. SQL Server is smart enough to get their position and match them up with the column name in the Anchor Statement, much like a regular UNION ALL expression.
However, like a regular UNION ALL expression, the columns in the Recursive Statement need to be the same data types as the Anchor Statement otherwise it throws a slight hissy fit errors out!

-- Recursive stmt : remove each delimited value to put in own row...
SELECT
CAST( c.ID + 1 as SMALLINT ),
c.original_text,
RIGHT( c.remaining_text, ( LEN( remaining_text ) - CHARINDEX( @delimiter, c.remaining_text ) ) ), -- remaining_text
SUBSTRING( c.remaining_text, 0, CHARINDEX( @delimiter, c.remaining_text )) -- delimited_text
FROM delimiting_cte as [c]

Terminator Clause:

SQL server has a MAXRECURSION setting in it. Pretty smart when you think about it unless you’re a fan of Infinite Loops. Unless specified otherwise, Recursive CTE’s will stop after 100 recursions. (and no, I’m not going to tell you how to increase this limit, it’s right there in the Books Online).
However, if we want the CTE to not error out, it may be a good idea to stop it before it hits that limit so that’s why we have Terminator clauses.

WHERE
-- Terminator clause: Until no delimiter left in the [remaining_text] column...
remaining_text like '%['+@delimiter+']%'

Now we can specify this inside the CTE or in the statement after it but like already stated, best have this somewhere (unless you like error messages…)

Conclusion:

And that’s the surface of recursive CTE’s, if not scratched then definitely slightly scraped.
They are so many more uses for these guys and some genius ones have already been blogged about.
Jeff Moden uses CTE’s for his Tally Table and even gives you a glimpse into his string splitter which I definitely recommend checking out.
And this is definitely my favourite SQL Server feature…so far 🙂

Advertisements

Author: Shane O'Neill

DBA, T-SQL and PowerShell admirer, Food, Coffee, Whiskey (not necessarily in that order)...

2 thoughts on “T-SQL Tuesday #77: Recursive Common Table Expression (CTE)”

What's your opinion?

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s