Table Column Differences with T-SQL and PowerShell

Where I start off with one idea, than veer widely towards another…much like me normally

Advertisements

As part of my post for the latest T-SQL Tuesday, I talked about community scripts. One of these scripts was by Jana Sattainathan ( blog | twitter ) and it was to do with finding tables with similar table structure. This is a great script for initially finding the tables and something that I am not sure that I would have thought of!

However, do me a favour and in your favourite search engine, do a search for “stackoverflow sql server wide tables” (Google search included for your benefit).

For some reason, people have a slight fascination with wwwwwiiiiiiiiddddeeeee tables!

So thanks to the excellent work done by Jana, you have now identified the tables with similar structure, but what about if we want to know which column names match exactly?

I recently had this problem with consolidating copies of tables of medium to wide length in a database and needed to know. So I created a little snippet of code to help me out and I present it to you, in all of it’s unashamed vulnerability, in the hope that it can help you out.

Be warned: Thar be PIVOTs ahead!

Ahoy, matey!

So we’ve got two tables that are slightly different, not completely different (hey, they could be) but different enough to be similar without matching… (differently similar?)

Our task is to find out which columns are in both tables, which columns are only in 1 table, and which columns are only in the other.

Now this isn’t so bad manually, especially if you only need to do this once, maybe twice. What happens though if you need to do it with multiple pairs of tables? Or multiple pairs of wide tables like our search engines showed us above?

So let us do what all DBAs should do when they have a repetitive, manual task stopping them from doing more fun important things: Automate it away!

Avast Ye!

Our two tables are as follows:

CREATE TABLE dbo.DifferenceTable01
(
    col1 int,
    col2 int,
    col4 int,
    col6 int
);
GO

CREATE TABLE dbo.DifferenceTable02
(
    col2 int,
    col3 int,
    col4 int,
    col5 int
);
GO

Now we can use the sys.columns table to check out the different columns in the table but the results are, well, meh

SELECT 
    OBJECT_NAME([object_id]) AS TableName,
    [name] AS ColumnName
FROM sys.columns 
WHERE [object_id] IN
(
    OBJECT_ID(N'dbo.DifferenceTable01', N'U'),
    OBJECT_ID(N'dbo.DifferenceTable02', N'U')
);
GO
sys_columns_results.PNG
Even if I ordered it, it would still be “meh”…

That is not really going to work out for us…
So I’m not liking the look of this, and going through the results, it seems to me that these results are just not useful. This isn’t the computers fault – it’s done exactly what I’ve told it to do – but a more useful result would be a list of columns and then either a simple ‘Yes’, or a ‘No’.

There’s syntax for this…PIVOT

Thar She Blows!

As anyone who has seen me dance can attest to, I can neither shake, rattle, nor roll. And I definitely do not normally PIVOT. However, as I’m trying to know my tools, I do know that this is the syntax that I need.

PIVOT rotates a table-valued expression by turning the unique values from one column in the expression into multiple columns in the output, and performs aggregations where they are required on any remaining column values that are wanted in the final output.

So after looking up the syntax for this once (ok, 5 times!) I managed to come out with a script that I’m reasonably happy with.

And the results are a lot easier to read 🙂

human_readable_format
Can also be extended to more than 2 tables!

So much better! This way, no matter how long the tables, I can easily figure out what columns are in what table(s) based on their names.

Shiver Me Timbers!

Isn’t it always the way after you’ve done something, you realise a much easier way to do it?

The Old Seadog!

I’ve talked about automation in this post and I have yet to mention PowerShell. I bow my head in shame.

Especially when it has a native command like Compare-Object.

help Compare-Object -ShowWindow

Synopsis
Compares two sets of objects.

Description
The Compare-Object cmdlet compares two sets of objects. One set of objects is the “reference set,” and the other set is the “difference set.”

The result of the comparison indicates whether a property value appeared only in the object from the reference set (indicated by the <= symbol), only in the object from the difference set (indicated by the => symbol) or, if the IncludeEqual parameter is specified, in both objects (indicated by the == symbol).

If the reference set or the difference set is null ($null), this cmdlet generates a terminating error.

So the question we have to ask ourselves now is “Can we do what we did with the PIVOTs easier?”

The Old Salt.

I’ve recently found out about splatting so, of course, I’ve started to use it EVERYWHERE!

Let’s “splat” our two parameters

$Table01Param = @{
ServerInstance = 'localhost\SQLDEV2K14'
Database = 'master'
Query = "SELECT OBJECT_NAME([object_id]) AS TableName, [name] AS ColumnName FROM sys.columns WHERE [object_id] = OBJECT_ID(N'dbo.DifferenceTable01', N'U');"
}

$Table02Param = @{
ServerInstance = 'localhost\SQLDEV2K14'
Database = 'master'
Query = "SELECT OBJECT_NAME([object_id]) AS TableName, [name] AS ColumnName FROM sys.columns WHERE [object_id] = OBJECT_ID(N'dbo.DifferenceTable02', N'U');"
}

And we now save ourselves the trouble of writing the parameters to the functions.

Invoke-Sqlcmd @Table01Param
Invoke-Sqlcmd @Table02Param
splat-attack
SPLAT-ATTACK!!

Since everything is now set up, we can just pass those results into 2 different variable holders and use our Compare-Object.

$Table01 = Invoke-Sqlcmd @Table01Param
$Table02 = Invoke-Sqlcmd @Table02Param

Compare-Object -ReferenceObject $Table01 -DifferenceObject $Table02 -Property ColumnName -IncludeEqual
Non-splat-attack
It’s annoyingly wide without splatting…

And for anyone saying “Yeah, but who knows what ‘SideIndicator’ means!”,  I advise you to read the help with PowerShell. It helps a-lot!

The result of the comparison indicates whether a property value appeared only in the object from the reference set (indicated by the <= symbol), only in the object from the difference set (indicated by the => symbol) or, if the IncludeEqual parameter is specified, in both objects (indicated by the == symbol).

If you are still complaining about it – “You can’t have the HumanReadableFormat column like you did in T-SQL” – then please stop. There are experts out there who can make you go “wow” with what they do. I’m still learning but even I have an answer to that.

Compare-Object -ReferenceObject $Table01 -DifferenceObject $Table02 -Property ColumnName -IncludeEqual |
Select-Object -Property *,@{N='HRF';e={switch ($_.SideIndicator)
{'==' {'Both'}
'=>' {'Table 2 only'}
'<=' {'Table 1 only'}
}
}}
hrf
Go on! Compare this with T-SQL

Land-ho!

I’m not trying to argue with who would win between T-SQL and PowerShell, that was definitely not my intention with this post. T-SQL is my first language love, PowerShell is very quickly becoming my second (and that’s not just because it’s the only other langauge I know).

They both accompolish the same thing!

It’s just a matter of preference, that’s all, not a matter of competition. You want to work in SSMS, PIVOT it up! You like PowerShell, hammer that Compare-Object nail!
Whatever works for you.

My first idea for this was T-SQL but it turns out for me that PowerShell is the easier option. Don’t rule out one, just because the other was the first thing to pop into your head.

Now I’m going to go before I make a joke about Pirates, SQL and the R language…