Table Column Differences with T-SQL and PowerShell – Part 2

If this was a horror movie, it would be called “The Differencing”…duh duh duh!

The original post for this topic garnered the attention of a commenter who pointed out that the same result could be gathered using a couple of UNION ALLs and those lovely set-based EXCEPT and INTERSECT keywords.

I personally think that both options work and whatever you feel comfortable with, use that.

It did play on my mind though of what the performance differences would be…what would the difference in STATISTICS IO, TIME be? What would the difference in Execution Plans be? Would there even be any difference between the two or are they the same thing? How come it’s always the things I tell myself not to forget that I end up forgetting?

I have no idea about the last one but at least the other things we can check. I did mention to the commentor that I would find this an interesting blog topic if they wanted to give it a go and get back to me. All I can say is – Sorry, your mail must have got lost in transit. I’m sure it is a better blog post that mine anyway.

If you’re going to do it…

For this test, we’re not going to stop at a measely 4 columns per table. Oh no! For this one we’re going to go as wide as we can.

With a recent post by Kenneth Fisher ( blog | twitter ) out about T-SQL FizzBuzz, I’m going to create two tables, both of which will have incrementing column names i.e. col00001, col00002, …, col1024. Table1 will have all columns divisible by 3 removed while Table2 will have all columns divisible by 5 removed.

See, FizzBuzz can be useful!

So our table creation scripts…

SELECT TOP (1024)
    CASE WHEN v.number = 0
      -- Change this to 02 the second run through
THEN N'CREATE TABLE dbo.TableColumnDifference01 ('
    ELSE N' col' + RIGHT(REPLICATE('0', 8) + CAST(v.number AS nvarchar(5)), 4) + N' int,'
    END
FROM master.dbo.spt_values AS v
WHERE v.type = N'P'
AND (
-- Change this to '% 5' the second run through
v.number % 3 != 0
OR v.number = 0)
FOR XML PATH('')
TableCreationScript
See Note

NOTE: When you copy and paste the results of this query into a new window to open it, it is going to fail. Why? Well the end of the script is going to be along the lines of colN int, and it needs to be colN int). Why is it like this? Well it was taking to damn long to script that out. Feel free to change this to work for you. Hey if you do, let me know!

Now, how I’m going to do test this, is run each method 3 times (PIVOT, UNION, and PowerShell), then measure the third run of each method. This is mainly as I want to get rid of any “cold cache” issues with SQL Server where the plan has to be compiled or the data brought into memory.

…do it Pivot

So first up is the Pivot method from the last blog post. In case you’re playing along at home (and go on, do! Why should kids get all the fun) here is the code that I’m running.

And here is our results:

PivotMethodGridResults
Yup, those be columns

What we are really after though is the stats, execution plan and time to complete for our 3rd execution. Now as much as I love reading the messages tab for the stats information, I feel with blog posts that aesthetics is king, so I’m going to be using the free tool by Richie Rump ( twitter ) “Statistics Parser

Stats:

PivotMethodGridStats
Elapsed time: 00:00:00.136

 

Execution Plan:

PivotMethodGridPlan
Probably the first plan I’ve seen where the SORT isn’t the most expensive!

..do it UNION

Secondly we have what I dubbed “the UNION method” (no points for figuring out why) and the only change I’ve made to this script is to add in PARSENAME() and that’s only so that the script would..you know…work.

Results be like:

UnionMethodGridResults
Yep, Yep, Yep, Yep, Nope, Yep…

Stats:

UnionMethodGridStats
Elapsed time: 00:00:00.624

hmm…less Scan Counts but 5 times the reads…also 5 times slower than the PIVOT method. Maybe the execution plan will be prettier?

Execution Plan:

UnionMethodGridPlan.png
ehh…WHAT!

Yeah…so…that’s…that’s different from the first plan! I was right in my comment though, there is a concatenation operator (there’s actually 2, you may need to zoom in to find them though)

…do it PowerShell

Finally we have the PowerShell method. No messing about here, let’s get straight to it! I’m going to lump all the code together in one gist and I’ll be wrapping it in Measure-Command to get the speed of the command.

Get-Results

PoSHMethodGridResults
Yeah I’m liking VS Code more and more…

Get-Stats:

PoSHMethodGridStats.png
Elapsed time: 00:00:00.249

help *execution*; help *plan*

Would you believe that I couldn’t figure out how to get an execution plan for PowerShell 🙂

If anybody knows, hit me up!

Finishing off

You know at the start of this, I was fully expecting the PowerShell to win out, followed by the UNION method, because it’s use of UNION, EXCEPT, and INTERSECT which are basically made for this kind of problem, and the PIVOT method bringing up a distant last since PIVOTs have this complexity stigma attached to them and what is complex is normally slow.

From a sheer speed point of view, the actual results are:

  1. Pivot
  2. PowerShell
  3. Union

Who knew!?

I don’t think this is the end of my use of PowerShell or Union operators though. I’m not going to replace all the stuff that I can with Pivots. For one I just think that PowerShell and the Union operators are just too cool!

I actually like this result for two reasons.

  1. There are multiple way to do something in SQL, there are good ways and better ways. The main point is whatever option you choose, make sure you know what it entails and can justify it.
    Whatever works for you, works for you!
  2. You don’t know something, test it and find out! What you think the outcome may be, may not be true.

Now if you’ll excuse me, I want to figure out if there’s a way to return execution plans with PowerShell.

 

Chaos Theory, Compound Effects, and Consequences.

Straight away I want to apologise for the Nicolas Cage memes!

User Groups are great, aren’t they?

I just got back from the Reading User Group and I’m still in that post “User Group Glow”, also known as “Long Day Lethargy”, or “Twelve Hour Tiredness”.

They are great though! A chance to talk to other people in the SQL Server community, – a slight reminder that even if you work alone, people are still experiencing some of the same problems that you are (apparently everyone has to deal with multiple nested views, who knew!) – a chance to hear presentations on different topics, and pizza if you’re lucky (we were).

WishYouWereHere.PNG
They’re really great!

I realised during the session that the two presentations given during the User Group had a connection with a small issue with a table change I had been given with a developer.

Here’s what did not happen to me so you can watch out for it.

The Chaos Theory

NicCageChaos.PNG
Nic Chaos

 

Raul Gonzalez ( blog | twitter ) was first up with this presentation “Database Design Matters, Seriously”, showing us the chaos that can occur from not giving some serious thought into how you design your database.

His session is not yet up on his blog as I’m writing this but it will be soon so keep an eye out for that!

Now he had a lot of good points but, for brevity’s sake, the main chaos theory points here are what happens if you don’t take advantage of CHECK CONSTRAINTS, FOREIGN KEY CONSTRAINTS, and not specifying a columns NULLABILITY (yes, that’s a word!). SQL Server is a powerful program with many performance optimizations provided for you, but it’s not omniscient; it can only use the information that you give it!

His points on NULLABILITY (I mean, I think it’s a word) tied in nicely with the next presentation…

Compound Effects

NicCageChaos.PNG
Compound Effects

David Morrison ( blog | twitter ) followed up with his presentation on “Query Plan Deep Dives” (I had seen this at SQL Bits, but it’s a great session so I had no problems watching it again) and, as an aside, through his presentation he showed us the compound effects that can happen from not specifying a columns NULLABILITY (it’s got letters so it’s word-like…)

Now his slides and scripts are up on his blog and they do a great job of walking you through them so check them out and you’ll see the compound effects they create!

Here’s a little teaser…


-- now I want all people who's email isn't in the email table
SELECT /*C.FirstName ,
    C.LastName ,*/
    C.EmailAddress
FROM dbo.Contact AS C
WHERE C.EmailAddress NOT IN (SELECT E.EmailAddress
                             FROM dbo.Emails AS E)

GO
NULLABILITY.png
This should be A LOT simpler!!!

Consequences

Which brings us back around to consequences or as I like to put it “How I Pissed Off A Dev By Refusing A Simple Request”.

To be quite honest, it was a simple request. A requirement came in to expand a column datatype up to varchar(100), so one of devs wrote up a simple script and passed it onto the DBAs to check as part of the change control procedure.

ALTER TABLE tablename
ALTER COLUMN columnname varchar(100)

And I said no.

WHY???!!!“, you may shout at me (he certainly did), but I’m going to say to you what I said to him. “Give me a chance to explain before you take my head off, alright?”

ArgumentInvalid.PNG
Argue with a DBA, go on!

While there is nothing wrong with the above code syntactically (is that a word?) but I couldn’t approve it since that column was originally NOT NULL and the above script would have stripped the column of that attribute! Business requirements dictated that it should not allow NULLS, and hey, who are we to argue with that 😐

Double checking to see if the column is NULL or NOT NULL allowed me to see a problem with that code, one that many people would consider simple enough to just allow it through at a quick glance. Which could have opened up problems further down the line if it had run…

Thanks to the User Group, I now know that it could have a knock on effect with our query plans as well!

ALTER TABLE tablename
ALTER COLUMN columnname varchar(100) NOT NULL

There, that’s better!

DBAs deal with databases and consequences

YDS.PNG

 

DBAs get a lot of stick sometime, the “Default Blame Acceptors” or the “Don’t Bother Asking” but a lot of the time, it’s not that we want to say no, it’s just that we have to take into consideration a thousand little things that could snowball into 1 giant problem.

With the rise of DevOps, check out the latest T-SQL Tuesday, DBAs have gone from going

“No”

to somewhere along the lines of

“Not this second, let me check it out and see what we can do”

If pressed further, we may rely on the good, old “it depends” though. Hey, clichés are there for a reason; they work!

It just goes to show that, like the IT profession, DBAs are constantly evolving.
Continuosly learning, checking out new helping technologies, and going to User Groups are going to help us to deal with it.

Just remember, in the end,

LeFin.PNG

P.S. I should probably mention that the Nicolas Cage memes are because of this blog post by Nate Johnson ( blog ) that I enjoyed so much that I had to do something in response. I’m not normally this crazy, I swear!

Problems Creating XML Schema Collection

Ever created an XML Schema collection before? Our developers work with a lot of XML so I wasn’t surprised when eventually a request came in about permissions with XML SCHEMA COLLECTION.

Surprised that they had a permissions issue, yes, but not surprised that they were working with XML.

Why is it “an XML” and not “a XML”?

For information purposes, I’d normally provide a brief description of what an XML SCHEMA COLLECTION is but, being completely honest, I’m still not sure I can vocalize it in an understandable way. It’s kind of like explaining the colour purple without using other colours (and yes, that’s colour with a ‘u’).

I know what it is, I just can’t explain it properly…yet

So what I’m going to do is point you to the link for Microsoft docs for XML Schema Collection (done) and just gloss right over it (nothing to see here).

Permissions Shane, you mentioned permissions.

Right, sorry.

Investigation first. This was on the Development server and they had emailed me the creation code along with the error message they had received, which was this guy:

CREATE XML SCHEMA COLLECTION name AS
N'RANDOM XML ALERT...'

Msg 2797, Level 16, State 2, Line 20
The default schema does not exist.

However, when I ran the code, I got a different error message, mainly this guy:

CREATE XML SCHEMA COLLECTION name AS
N'RANDOM XML ALERT...'

Msg 9459, Level 16, State 1, Line 3
XML parsing: line 2, character 34, undeclared prefix

Which meant I had to go back and tell them to fix their darn XML.

Now I’m pretty sure we have a problems though:

That error message did not fill me with confidence. Yeah, sure they had bad XML but I was now fairly sure that there was also a permissions problem. Mainly because if there’s one thing that I’ve learned so far, it’s this:

No good can come from two different people getting two different errors from the same code!

Proper XML:

Proper XML was provided and ran by the developers but the same error message came back…

CREATE XML SCHEMA COLLECTION name AS
N'random xml alert...'

Msg 2797, Level 16, State 2, Line 20
The default schema does not exist.

The difference this time was, when I ran the code, I received the following message

Command(s) completed successfully.

…That’s not good.

Developers happy. DBAs not.

At this stage, I’m nearly convinced that it’s a permissions issue.

Checking the permissions required to create an XML Schema Collection doesn’t help, since the Devs were part of the db_dlladmin database role, so that should have been covered.

In my head I’m thinking of all the things that I can do to try and troubleshoot this problem.

  1. Extended Events my session,
  2. Ask my Senior DBA,
  3. Cry

Then I realize that I’m jumping the gun again and I slow down, and check the first error message again. This time without the developers shouting in my ear, about permissions.

The DEFAULT schema

That says “schema”, not “permission”. Maybe the difference between the DBAs and the Devs was to do with default schema and not permissions this time. Let’s check it out!

SELECT
    IIF(principal_id = 1, 'DBA', 'Dev') AS DBPrincipal,
    default_schema_name
FROM sys.database_principals
WHERE principal_id IN (1, 14);
DiffSchema
Devs don’t even have a default schema!

Wait, so it was a SCHEMA issue?

Have you checked the Examples section of Microsoft Docs? Normally, they are a great source of material for examples but if you check out the examples for XML Schema Collection , not one of them shows the schema name in the examples.

So, I walk over to the original developer and his machine, change his code to…

 
CREATE XML SCHEMA COLLECTION dbo.name_test AS 
N'RANDOM XML ALERT...' 

And it works!

Apparently what had happened was the Senior Dev had gotten sick of developers not specifying the schema when creating objects and had asked the Senior DBA to remove the default schema for Developers. That seems to work (by that I mean, everything error-ed out correctly), they were happy that developers now had to specify the schema, and life moved on.

Yet, later on, when the developer read the docs for XML Schema Collection, and saw that there was no schema in the examples, it didn’t cross their mind that a schema was required. So they didn’t specify it and that, in combination with no default schema, caused this whole mess.

The (fast food) takeaways:

  1. Slow down! Don’t jump the gun,
  2. Developers don’t know everything,
  3. It’s not always permissions,
  4. Schemas are important(!),
  5. Having checklists for investigations are highly useful, and
  6. Documentation, especially on past decisions, are even more useful!

Apologies for the blurb of a blog post but I have to go.
Apparently, there’s a permissions issue with a Stored Procedure now…

 

 

Keeping New Lines in SQL Server.

Where I compare scripts to BBQ because of course I would 😐

I have this personal opinion that one sign of a good DBA is their ability to automate things and, before the DBA world found PowerShell, the way to do this was with T-SQL.

For example, a T-SQL script to get permissions assigned to a database principal could also include a column to REVOKE those permissions. This could be “automated” with some dynamic SQL.

SELECT dprin.name AS DatabasePrincipalName,
       OBJECT_NAME(dperm.major_id) AS ObjectName,
       dperm.permission_name AS PermissionName,
       N'REVOKE '
         + dperm.permission_name
         + N' ON OBJECT::'
         + OBJECT_NAME(dperm.major_id)
         + N' FROM '
         + dprin.name COLLATE Latin1_General_CI_AS AS RevokeMe
FROM sys.database_permissions AS dperm
INNER JOIN sys.database_principals AS dprin
  ON dperm.grantee_principal_id = dprin.principal_id
WHERE dprin.name = 'public';
RevokePermissions
This can be improved A WHOLE LOT…

What about if we want to improve this?

This is nice but what about if we are paranoid forward-thinking enough to realize that this could cause us problems?

“How?” You ask. Well what happens if there existed another database, say [NeedsAllPermissions], with the same table name and the same login has permissions on it.

Are you going to revoke permissions from that database? It needs ALL of them! It says so in the name!

So in an effort to not shoot ourselves in the foot, we add in the database name to our revoke script.

SELECT dprin.name AS DatabasePrincipalName,
       OBJECT_NAME(dperm.major_id) AS ObjectName,
       dperm.permission_name AS PermissionName,
       N'USE '
         + DB_NAME()
         + 'GO'
         + N'REVOKE '
         + dperm.permission_name
         + N' ON OBJECT::'
         + OBJECT_NAME(dperm.major_id)
         + N' FROM '
         + dprin.name COLLATE Latin1_General_CI_AS AS RevokeMe
FROM sys.database_permissions AS dperm
INNER JOIN sys.database_principals AS dprin
  ON dperm.grantee_principal_id = dprin.principal_id
WHERE dprin.name = 'public';

WithDBNoChar10

Yes, we’re only using our database now!

So all is well with the world…

Until the day comes when you actually want to revoke permissions to that user. So you run the above code, copy the RevokeMe column and paste it into the management window. and you get…

NoNewLine
No GO my friend…

GO is a special little guy. It’s not exactly T-SQL. It’s a way of telling the SQL Server Management Studio (SSMS) to send everything before it, from the beginning of the script or the preceding GO, to the SQL Server instance.

If you read the documents, the main point to take away is…

A Transact-SQL statement cannot occupy the same line as a GO command. However, the line can contain comments.

GO is a special little snowflake and needs to be on his own line then. Simple enough if you know that SQL Server converts CHAR(10) into a New Line.

If you didn’t know that, well you know that now….P.S. CHAR(13) is a carriage return 😉

So let’s update our script with some CHAR(10) and see what happens then.

SQL & BBQ, both work well with CHAR

SELECT dprin.name AS DatabasePrincipalName,
       OBJECT_NAME(dperm.major_id) AS ObjectName,
       dperm.permission_name AS PermissionName,
       N'USE '
         + DB_NAME()
         + CHAR(10)
         + 'GO'
         + CHAR(10)
         + N'REVOKE '
         + dperm.permission_name
         + N' ON OBJECT::'
         + OBJECT_NAME(dperm.major_id)
         + N' FROM '
         + dprin.name COLLATE Latin1_General_CI_AS AS RevokeMe
FROM sys.database_permissions AS dperm
INNER JOIN sys.database_principals AS dprin
  ON dperm.grantee_principal_id = dprin.principal_id
WHERE dprin.name = 'public';

 

WithDBAndChar10
That smokey, wood-fire CHAR

Now, when we paste the RevokeMe column to a new window, we get…

StillNoNewLine
Oh look, it’s a wild, rare nothing…I love them!

…absolutely no difference. 🙂

Why am I smiling?

Here, around 500 words in, we get to the meat of our post. How do we keep new lines when copying in SQL Server?

Tools | Options | Query Results | Results to Grid | Retain CR/LF on copy or save

Two things need to be done here.

  1. This checkbox needs to be enabled.

    KeepNewLines
    CHECK!
  2. A new window needs to be opened and used.

New window open, we run our script again, and this time, when we copy and paste the results, we get…

FinallyWorks
Winner, Winner, BBQ Chicken Dinner

Dessert:

So if you are using T-SQL to create scripts, and you’re having this problem with GO or just new lines in general, make sure that the “retain CR/LF on copy and save” checkbox is ticked.

Now, improve that script more, throw it in a stored procedure, and you never know, it may be semi-useful. 🙂

Why You May Need More Than FOR XML PATH(”)

XML – both easy, easy, lemon easy and difficult, difficult, lemon difficult…

Working on a blog post and I came up against a problem that I had heard of before but did not spend much brain-CPU power against.

know I’m going to run into this again so let’s document this for future me. Oh he’s going to appreciate this so much!

Commas are all the rage nowadays:

There are a fair number of questions nowadays about returning data from a database in a comma separated string. Sure the application should probably do that but hey, database servers are expensive, why not get some bang for your bucks!

Done!-ish…

SQL Server 2017 has this lovely function called STRING_AGG()

Pop in your column and your separator and it takes care of it for you!

…wait not everyone has SQL Server 2017 yet?

…wait don’t have SQL Server 2017 yet? Oh, I should really fix that…

Pre-SQL Server 2017:

So what can we do if we are not on SQL Server 2017? Take the advice that I was given for most of my life and STUFF it!

The STUFFing:

Our playground:

USE tempdb;

-- Test table
SELECT dt.comments
FROM ( VALUES ( 'XML is the bomb!'),
              ( '& JSON is cool too...')
     ) AS dt (comments);
TestTable
testing stuff…

 

Plain Old STUFFing:

I’m not the biggest fan of stuffing if I’m honest…tastes like dirt to me but hey, it works in 99% of situations…

SELECT STUFF((SELECT ', ' + dt.comments
              FROM ( VALUES ( 'XML is the bomb!'),
                            ( '& JSON is cool too...')
                   ) AS dt (comments)
              FOR XML PATH('')
             ), 1, 1, '') AS CommentsEnXML;

 

StandardWay
…that’s…that’s not what I said!

Bacon Sausage STUFFing however:

So…SQL Server is trying to be too helpful. What do we do? I normally turn to the SQL community and people like Rob Farley ( blog | twitter ), who has a lovely post about this.

So let’s try it out.

SELECT STUFF((SELECT ', ' + dt.comments
              FROM ( VALUES ( 'XML is the bomb!'),
                            ( '& JSON is cool too...')
                   ) AS dt (comments)
              FOR XML PATH(''),
              TYPE).value('.', 'varchar(max)'
             ), 1, 1, '') AS CommentsEnXML;

 

BetterWay
That exclamation mark is annoying me 😡

I just eat the turkey around the STUFFing:

Little hiccup in preparing for my next post. Thankfully I learn from my mistakes and failures (there’d be no help for me otherwise!).

I’ll leave this post with a quote from the blog of the main man himself:

It’s a habit I need to use more often.

Yeah, me too Rob, me too…

SQL Prompt: For Your Group By Problems

I’m going to point people to this that have “My Group By isn’t working” questions…

The Joys of SQL:

Did you know that the SQL language allows you to do amazing analysis of data such as aggregate functions?

SELECT t.session_id,
       t.request_id,
       SUM(t.user_objects_alloc_page_count) AS UserObjectAllocated,
       SUM(t.user_objects_dealloc_page_count) AS UserObjectDeallocated
FROM sys.dm_db_task_space_usage AS t
GROUP BY t.session_id,
         t.request_id;
Works Written
0’s! Amazing!

The Pains of SQL:

But…if you forget to put in the GROUP BY clause, as a ski instructor once said, you’re going to have a bad time!

Need Group By
Pizza…French Fries…Pizza

The Repetitiveness of Questioners:

So some eager yet lost scholar ventures into this land of aggregate functions, reads the error message and adds in a GROUP BY clause.

SELECT t.session_id,
       t.request_id,
       SUM(t.user_objects_alloc_page_count) AS UserObjectAllocated,
       SUM(t.user_objects_dealloc_page_count) AS UserObjectDeallocated
FROM sys.dm_db_task_space_usage AS t
GROUP BY t.session_id;
Needs second column
French Fries….

Now don’t scoff, this happens. I mean the error message is still red, looks nearly identical to the original one encountered, and can cause a rage-inducing damnation of SQL Server error messages.

The Enlightenment of Questioners:

Trawling the bulletin boards, question sites, and forums – okay maybe a quick question online, it’s called poetic exaggeration people! – they eventually learn the folly of their ways and correct their mistake.

SELECT t.session_id,
       t.request_id,
       SUM(t.user_objects_alloc_page_count) AS UserObjectAllocated,
       SUM(t.user_objects_dealloc_page_count) AS UserObjectDeallocated
FROM sys.dm_db_task_space_usage AS t
GROUP BY t.session_id,
         t.request_id;
Works Written
PIZZA!

The Euphoria of SQL Prompt:

Now I consider myself lucky that work has invested in the RedGate tools, and right now, especially SQL Prompt.

I’m not going to talk about “Save and Recover Lost Tabs” – saved my ass many times.
I’m not going to talk about “Code Formatting” – saved my sanity many times.
I’m going to talk about “Autocomplete”.

A well-known secret with SQL Prompt’s autocomplete is the snippets feature. With this, you can increase your productivity by 75% from typing out G R O U P [space] B Y and instead use gb and hit tab.

gb shortcut
Wait? I can order Pizza?

The Ecstasy of SQL Prompt:

Do not get me wrong, a 75% increase in productivity? I’ll take that!

That is a well-known secret though, and it’s slightly hard to get excited about a well-known secret.

However, what if I told you that SQL Prompt had another lesser-known secret that can increase your productivity and ensure that you do not forgot to add the necessary columns to your GROUP BY clause?

Interested? Ah c’mon!
You sure you’re not interested?…. That’s better!

So first of all, let us increase the number of non-aggregated columns in our SELECT to include database_id, is_remote_work, and exec_context_id. Including our session_id and request_id these are all columns that we are going to need to add to our GROUP BY clause, because…well…business logic.

Only problem is ain’t nobody got time for that.
SQL Prompt knows this and adds the following little snippet after a GROUP BY autocomplete.

Shortcut shortcut
Whoa! Whoa! You can deliver Pizza to me?

Hitting tab on that includes everything in the SELECT that is not part of an aggregate function, leaving us to concern ourselves with loftier things…

Final Works
Like whatever happened to Pizza in 30 mins or free?

Pizza:

Now I don’t work for pizza RedGate, I’m not affiliated with them, and I don’t get any money off of them. In fact, I’d say that they’d happily pay me not to write about them but when I found this autocomplete feature, I got too happy not to share it!

So save yourself the trouble of typing everything out and spare yourself the pain of error messages.

Use this lesser-known secret and have more time for pizza.

Table Column Differences with T-SQL and PowerShell

Where I start off with one idea, than veer widely towards another…much like me normally

As part of my post for the latest T-SQL Tuesday, I talked about community scripts. One of these scripts was by Jana Sattainathan ( blog | twitter ) and it was to do with finding tables with similar table structure. This is a great script for initially finding the tables and something that I am not sure that I would have thought of!

However, do me a favour and in your favourite search engine, do a search for “stackoverflow sql server wide tables” (Google search included for your benefit).

For some reason, people have a slight fascination with wwwwwiiiiiiiiddddeeeee tables!

So thanks to the excellent work done by Jana, you have now identified the tables with similar structure, but what about if we want to know which column names match exactly?

I recently had this problem with consolidating copies of tables of medium to wide length in a database and needed to know. So I created a little snippet of code to help me out and I present it to you, in all of it’s unashamed vulnerability, in the hope that it can help you out.

Be warned: Thar be PIVOTs ahead!

Ahoy, matey!

So we’ve got two tables that are slightly different, not completely different (hey, they could be) but different enough to be similar without matching… (differently similar?)

Our task is to find out which columns are in both tables, which columns are only in 1 table, and which columns are only in the other.

Now this isn’t so bad manually, especially if you only need to do this once, maybe twice. What happens though if you need to do it with multiple pairs of tables? Or multiple pairs of wide tables like our search engines showed us above?

So let us do what all DBAs should do when they have a repetitive, manual task stopping them from doing more fun important things: Automate it away!

Avast Ye!

Our two tables are as follows:

CREATE TABLE dbo.DifferenceTable01
(
    col1 int,
    col2 int,
    col4 int,
    col6 int
);
GO

CREATE TABLE dbo.DifferenceTable02
(
    col2 int,
    col3 int,
    col4 int,
    col5 int
);
GO

Now we can use the sys.columns table to check out the different columns in the table but the results are, well, meh

SELECT 
    OBJECT_NAME([object_id]) AS TableName,
    [name] AS ColumnName
FROM sys.columns 
WHERE [object_id] IN
(
    OBJECT_ID(N'dbo.DifferenceTable01', N'U'),
    OBJECT_ID(N'dbo.DifferenceTable02', N'U')
);
GO
sys_columns_results.PNG
Even if I ordered it, it would still be “meh”…

That is not really going to work out for us…
So I’m not liking the look of this, and going through the results, it seems to me that these results are just not useful. This isn’t the computers fault – it’s done exactly what I’ve told it to do – but a more useful result would be a list of columns and then either a simple ‘Yes’, or a ‘No’.

There’s syntax for this…PIVOT

Thar She Blows!

As anyone who has seen me dance can attest to, I can neither shake, rattle, nor roll. And I definitely do not normally PIVOT. However, as I’m trying to know my tools, I do know that this is the syntax that I need.

PIVOT rotates a table-valued expression by turning the unique values from one column in the expression into multiple columns in the output, and performs aggregations where they are required on any remaining column values that are wanted in the final output.

So after looking up the syntax for this once (ok, 5 times!) I managed to come out with a script that I’m reasonably happy with.

And the results are a lot easier to read 🙂

human_readable_format
Can also be extended to more than 2 tables!

So much better! This way, no matter how long the tables, I can easily figure out what columns are in what table(s) based on their names.

Shiver Me Timbers!

Isn’t it always the way after you’ve done something, you realise a much easier way to do it?

The Old Seadog!

I’ve talked about automation in this post and I have yet to mention PowerShell. I bow my head in shame.

Especially when it has a native command like Compare-Object.

help Compare-Object -ShowWindow

Synopsis
Compares two sets of objects.

Description
The Compare-Object cmdlet compares two sets of objects. One set of objects is the “reference set,” and the other set is the “difference set.”

The result of the comparison indicates whether a property value appeared only in the object from the reference set (indicated by the <= symbol), only in the object from the difference set (indicated by the => symbol) or, if the IncludeEqual parameter is specified, in both objects (indicated by the == symbol).

If the reference set or the difference set is null ($null), this cmdlet generates a terminating error.

So the question we have to ask ourselves now is “Can we do what we did with the PIVOTs easier?”

The Old Salt.

I’ve recently found out about splatting so, of course, I’ve started to use it EVERYWHERE!

Let’s “splat” our two parameters

$Table01Param = @{
ServerInstance = 'localhost\SQLDEV2K14'
Database = 'master'
Query = "SELECT OBJECT_NAME([object_id]) AS TableName, [name] AS ColumnName FROM sys.columns WHERE [object_id] = OBJECT_ID(N'dbo.DifferenceTable01', N'U');"
}

$Table02Param = @{
ServerInstance = 'localhost\SQLDEV2K14'
Database = 'master'
Query = "SELECT OBJECT_NAME([object_id]) AS TableName, [name] AS ColumnName FROM sys.columns WHERE [object_id] = OBJECT_ID(N'dbo.DifferenceTable02', N'U');"
}

And we now save ourselves the trouble of writing the parameters to the functions.

Invoke-Sqlcmd @Table01Param
Invoke-Sqlcmd @Table02Param
splat-attack
SPLAT-ATTACK!!

Since everything is now set up, we can just pass those results into 2 different variable holders and use our Compare-Object.

$Table01 = Invoke-Sqlcmd @Table01Param
$Table02 = Invoke-Sqlcmd @Table02Param

Compare-Object -ReferenceObject $Table01 -DifferenceObject $Table02 -Property ColumnName -IncludeEqual
Non-splat-attack
It’s annoyingly wide without splatting…

And for anyone saying “Yeah, but who knows what ‘SideIndicator’ means!”,  I advise you to read the help with PowerShell. It helps a-lot!

The result of the comparison indicates whether a property value appeared only in the object from the reference set (indicated by the <= symbol), only in the object from the difference set (indicated by the => symbol) or, if the IncludeEqual parameter is specified, in both objects (indicated by the == symbol).

If you are still complaining about it – “You can’t have the HumanReadableFormat column like you did in T-SQL” – then please stop. There are experts out there who can make you go “wow” with what they do. I’m still learning but even I have an answer to that.

Compare-Object -ReferenceObject $Table01 -DifferenceObject $Table02 -Property ColumnName -IncludeEqual |
Select-Object -Property *,@{N='HRF';e={switch ($_.SideIndicator)
{'==' {'Both'}
'=>' {'Table 2 only'}
'<=' {'Table 1 only'}
}
}}
hrf
Go on! Compare this with T-SQL

Land-ho!

I’m not trying to argue with who would win between T-SQL and PowerShell, that was definitely not my intention with this post. T-SQL is my first language love, PowerShell is very quickly becoming my second (and that’s not just because it’s the only other langauge I know).

They both accompolish the same thing!

It’s just a matter of preference, that’s all, not a matter of competition. You want to work in SSMS, PIVOT it up! You like PowerShell, hammer that Compare-Object nail!
Whatever works for you.

My first idea for this was T-SQL but it turns out for me that PowerShell is the easier option. Don’t rule out one, just because the other was the first thing to pop into your head.

Now I’m going to go before I make a joke about Pirates, SQL and the R language…