A couple of weeks ago I talked about Best Practices and how it was important to understand why that something was best practice.
Well another aspect to take from that post was the importance of knowing; if you do not know something, then it is important for you to learn it.
That being said something that I did not know, but recently learned, was that there is nothing stopping a Primary Key from also being a Foreign Key.
there is nothing stopping a Primary Key from also being a Foreign Key
When you think about it, this lack of knowledge came from incorrect assumptions. You read Primary KEY and Foreign KEY and you think to yourself, well they are both keys aren’t they? Same thing.
That is the trap that I fell down and the trap is not knowing and making invalid assumptions. So let’s hopefully help you with knowing what the differences between them are.
First let’s create our tables:
-- Create our Foo and Bar table. IF OBJECT_ID(N'dbo.Bar', N'U') IS NOT NULL DROP TABLE dbo.Bar; GO IF OBJECT_ID(N'dbo.Foo', N'U') IS NOT NULL DROP TABLE dbo.Foo; GO CREATE TABLE dbo.Foo ( FooID int IDENTITY(1, 1) NOT NULL CONSTRAINT [PK_dbo_Foo] PRIMARY KEY CLUSTERED, FooValue char(8) ); GO CREATE TABLE dbo.Bar ( BarID int CONSTRAINT [PK_dbo_Bar] PRIMARY KEY CLUSTERED, BarValue char(8), CONSTRAINT [FK_dbo_Bar_dbo_Foo] FOREIGN KEY (BarID) REFERENCES dbo.Foo (FooID) ); GO -- Declare our holding table. DECLARE @FooIDs TABLE (FooID int); -- Insert into Foo. INSERT INTO dbo.Foo (FooValue) -- Gather the new ID's from foo. OUTPUT inserted.FooID INTO @FooIDs (FooID) SELECT LEFT(NEWID(), 8) FROM sys.all_objects; -- Insert Foo's ID into the Bar table. INSERT INTO dbo.Bar (BarID, BarValue) SELECT FooID, RIGHT(NEWID(), 8) FROM @FooIDs; -- Select our tables. SELECT * FROM dbo.Foo; SELECT * FROM dbo.Bar; GO
Primary Keys (PK)
a column or combination of columns that contain values that uniquely identify each row in the table
A Primary key is a column or combination of columns that contain values that uniquely identify each row in the table.
That’s it; it just has to uniquely identify the row.
…btw you are going to hear the word “unique” a lot with regard to Primary keys…
Now there are other types of keys that can do the same (Surrogate Keys, Composite Keys, Unique Keys, Alternate Keys, etc) but these are outside the scope of this post.
So if we attempt to insert another record into our Primary Key column/column combo that violates this unique, identifying property, we’re going to have a bad time.
We have to use IDENTITY_INSERT syntax because I’ve created the tables using IDENTITY and, if we were to insert a record into the identity column without turning IDENITY_INSERT on first, then another error pops up before the PK violation error that we want.
However, if we were to create our table without specifying the Primary Key constraint then the above insert would work and you would have duplicate entries populating your table, silently and deadly.
Foreign Keys (FK)
a column or combination of columns that is used to establish and enforce a link between the data in two tables to control the data that can be stored in the foreign key table
A Foreign key is a column or combination of columns that is used to establish and enforce a link between the data in two tables to control the data that can be stored in the foreign key table.
That’s it; it just has to establish and enforce a link between data.
If we try to violate this link, SQL Server will throw a different error and not let us.
Yet if we were to create our table without specifying our Foreign key, then there would be no real link between our tables. So if our business depends on a record not being in Bar without being in Foo and we don’t have a constraint specified to that extent…
Unfortunately, I’m hard pressed to think of a way you can ensure this.
In fact, I don’t even like the above definition for Foreign keys as it states that two tables are necessary for a Foreign key constraint when only one is needed:
IF OBJECT_ID(N'dbo.HR', N'U') IS NOT NULL DROP TABLE dbo.HR; GO CREATE TABLE dbo.HR ( EmployeeID int CONSTRAINT [PK_dbo_HR] PRIMARY KEY CLUSTERED, FirstName varchar(20) NOT NULL, SurName varchar(20) NOT NULL, ManagerID int NULL CONSTRAINT [FK_EmployeeID_Manager_ID] FOREIGN KEY REFERENCES dbo.HR (EmployeeID) ); GO -- Check for foreign key SELECT * FROM sys.foreign_keys WHERE [parent_object_id] = OBJECT_ID('dbo.HR'); GO -- Check for primary key SELECT * FROM sys.key_constraints WHERE [parent_object_id] = OBJECT_ID('dbo.HR'); GO -- Check for everything. EXEC sp_helpconstraint'dbo.HR'; GO
If you check the two definitions for Primary key and Foreign key you’ll see that, even though they are both called keys, they serve two different purposes; namely identifying rows and enforcing links.
And those two purposes are not mutually exclusive!
A column/column combo that identifies a row can also be used to enforce a link back to another table (or itself, as shown above with Foreign keys).
The assumption, that if you were one then you couldn’t be the other, was incorrect. If your business rules call for it, don’t let a column being one type of key stop it from being the other.
Let’s not go to C to get to B
You may be thinking that this is a lot of hassle and that’s not an unfair thought.
Why not just not declare any key constraints and let the data fall as they may?
I will admit that is a fair bit of effort to constantly define and declare the different key constraints when creating tables, especially as Developers are focused on efficiency, but it is worth it!
Now, while the following appears to hold true for any foreign key constraint (I haven’t finished testing yet), I found these while testing the above so I’m going to include them here.
SQL Server loves primary key and foreign key constraints.
A primary key gets a unique index created on it to enforce that it is unique and, since it has an index placed upon it, it can be used to speed up query selection.
A foreign key is special though as it forces a constraint and the query optimiser can use these constraints to take certain shortcuts 🙂
Query Optimizer (QO) Examples
-- Join our tables SELECT F.* FROM dbo.Foo AS [F] JOIN dbo.Bar AS [B] ON F.FooID = B.BarID; SELECT B.* FROM dbo.Foo AS [F] JOIN dbo.Bar AS [B] ON F.FooID = B.BarID;
Thanks to our constraint the QO knows that if something were to exist in Bar, it must be in Foo and, since we are not selecting or ordering anything from Foo, it straight up ignores it.
Less reads, less IO; in general all around better performance.
Does this work with other joins though?
Like above, with for something to exist in Bar it must exist in Foo, see if you can figure out why the QO figures it is safe to ignore some joins.
-- L.Join SELECT F.* FROM dbo.Foo AS [F] LEFT JOIN dbo.Bar AS [B] ON F.FooID = B.BarID; SELECT B.* FROM dbo.Foo AS [F] LEFT JOIN dbo.Bar AS [B] ON F.FooID = B.BarID;
-- R.Join. SELECT F.* FROM dbo.Foo AS [F] RIGHT JOIN dbo.Bar AS [B] ON F.FooID = B.BarID; SELECT B.* FROM dbo.Foo AS [F] RIGHT JOIN dbo.Bar AS [B] ON F.FooID = B.BarID;
-- F.Join SELECT F.* FROM dbo.Foo AS [F] FULL OUTER JOIN dbo.Bar AS [B] ON F.FooID = B.BarID; SELECT B.* FROM dbo.Foo AS [F] FULL OUTER JOIN dbo.Bar AS [B] ON F.FooID = B.BarID;
The “SET operators” (UNION, UNION ALL, INTERSECT, EXCEPT) act a bit differently.
I’ll let you take a look at them yourself though.
There is a lot that I have yet to learn about SQL Server, in fact that is the main reason that I created this blog; so I could read back on these posts sometime in the future and smile at my ignorance.
Hopefully the main aspect that I take from this post though is that it is okay not to know something as long as you have the desire and the initiative to learn.
Oh, and a Primary Key can be a Foreigh Key too. 🙂