MERGE Statement - UPSERT into SQL Server
Normally, when you want to write data to a table from the application you must first do a SELECT to check if the row exists, if it does exist you execute an UPDATE and if it does not exist you execute an INSERT, which is the standard SELECT-UPDATE-INSERT pattern. Check the below query:
This looks like a pretty
logical flow that reflects how we think about this in real life:
Does a row already exist for this key?
- YES: OK, update that row.
- NO: OK, then add it.
But this is a wrong practice.
Locating the row to confirm it exists, only to have to locate it
again to update it, is doing twice the work for
nothing. Even if the key is indexed (which we hope is always the case). If we
put this logic into a flow chart and associate, at each step, the type of
operation that would have to happen within the database, we would have this:
From
a developer point of view, we really enjoy the freedom, if we have, of making a single call to the database and be able to Insert/Update the data without
having to check if it exists something like in Mongo DB.
Fortunately,
SQL Server 2008 introduced a new function MERGE for handling
Insert/Update in one single query. Let’s explore it.
MERGE in SQL Server
There are five sections of the Merge statement:
- MERGE – specifies the tables in which we will insert or update record
- USING – defines the condition we will be using to check if the row exists or not
- WHEN MATCHED THEN – Update statement when the row exists
- WHEN NOT MATCHED BY TARGET THEN – Insert statement then the row does not exists
- WHEN NOT MATCHED BY SOURCE THEN – Delete statement (conditional)
If we write it up, then the syntax for the Merge statement would
be like below:
First, you
specify the target table and the source table in the MERGE clause.
Second, the
merge_condition determines how the rows from the source table are matched to
the rows from the target table. Typically, you use the key columns either
primary key or unique key for matching.
Third, the
merge_condition results in three states: MATCHED, NOT MATCHED, and NOT MATCHED
BY SOURCE.
MATCHED:
these are the rows that match the merge condition. For the matching rows, you
need to update the rows columns in the target table with values from the source
table.
NOT MATCHED:
these are the rows from the source table that does not have any matching rows
in the target table. In this case, you need to add the rows from the source
table to the target table. Note that NOT MATCHED is also known as NOT MATCHED
BY TARGET.
NOT MATCHED
BY SOURCE: these are the rows in the target table that does not match any rows
in the source table. If you want to synchronize the target table with the data
from the source table, then you will need to use this match condition to delete
rows from the target table.
Ok, enough of the theory so let’s get into the binary language:
For example,
we have two tables
- CyclesCategory
- CyclesCategory_Staging
CyclesCategory
is the Main table (Target table) and we want to merge the data from Staging
table (Source table). See below snapshots:
As it is
clear from the above two snapshots that if we Merge the above tables then
- 2
rows (CategoryID – 5, 6) from Staging table should be inserted as they are not
present in the CyclesCategory table.
- 3
rows (CategoryID – 1, 3, 4) from Staging table should be updated in the CyclesCategory
table because same ID’s exists in the CyclesCategory table.
- 1
row (CategoryID – 2) should be deleted from CyclesCategory table as it is not
present in the Staging table.
Let’s write
the Merge statement:
After
execution of the Merge statement the CyclesCategory would be like below:
As a result
of Merge statement, the data in CyclesCategory table is fully synchronized with
the CyclesCategory_Staging table.
Performance Tip
Use an UPSERT when needing to
follow a SELECT-UPDATE-INSERT pattern. Stick to basic INSERTS, UPDATES, or
DELETES when you know only one operation will be applied.
Comments
Post a Comment