MERGE Statement - UPSERT into SQL Server

July 13, 2019

Normally, when you want to write data to a table from the application you must first do a SELECT to check if the row exists, if it does exist you execute an UPDATE and if it does not exist you execute an INSERT, which is the standard SELECT-UPDATE-INSERT pattern. Check the below query:

This looks like a pretty logical flow that reflects how we think about this in real life:

Does a row already exist for this key?

YES: OK, update that row.
NO: OK, then add it.

But this is a wrong practice.

Locating the row to confirm it exists, only to have to locate it again to update it, is doing twice the work for nothing. Even if the key is indexed (which we hope is always the case). If we put this logic into a flow chart and associate, at each step, the type of operation that would have to happen within the database, we would have this:

From a developer point of view, we really enjoy the freedom, if we have, of making a single call to the database and be able to Insert/Update the data without having to check if it exists something like in Mongo DB.

Fortunately, SQL Server 2008 introduced a new function MERGE for handling Insert/Update in one single query. Let’s explore it.

MERGE in SQL Server

There are five sections of the Merge statement:

MERGE – specifies the tables in which we will insert or update record
USING – defines the condition we will be using to check if the row exists or not
WHEN MATCHED THEN – Update statement when the row exists
WHEN NOT MATCHED BY TARGET THEN – Insert statement then the row does not exists
WHEN NOT MATCHED BY SOURCE THEN – Delete statement (conditional)

If we write it up, then the syntax for the Merge statement would be like below:

First, you specify the target table and the source table in the MERGE clause.

Second, the merge_condition determines how the rows from the source table are matched to the rows from the target table. Typically, you use the key columns either primary key or unique key for matching.

Third, the merge_condition results in three states: MATCHED, NOT MATCHED, and NOT MATCHED BY SOURCE.

MATCHED: these are the rows that match the merge condition. For the matching rows, you need to update the rows columns in the target table with values from the source table.

NOT MATCHED: these are the rows from the source table that does not have any matching rows in the target table. In this case, you need to add the rows from the source table to the target table. Note that NOT MATCHED is also known as NOT MATCHED BY TARGET.

NOT MATCHED BY SOURCE: these are the rows in the target table that does not match any rows in the source table. If you want to synchronize the target table with the data from the source table, then you will need to use this match condition to delete rows from the target table.

Ok, enough of the theory so let’s get into the binary language:

For example, we have two tables

CyclesCategory
CyclesCategory_Staging

CyclesCategory is the Main table (Target table) and we want to merge the data from Staging table (Source table). See below snapshots:

As it is clear from the above two snapshots that if we Merge the above tables then

2 rows (CategoryID – 5, 6) from Staging table should be inserted as they are not present in the CyclesCategory table.
3 rows (CategoryID – 1, 3, 4) from Staging table should be updated in the CyclesCategory table because same ID’s exists in the CyclesCategory table.
1 row (CategoryID – 2) should be deleted from CyclesCategory table as it is not present in the Staging table.

Let’s write the Merge statement:

After execution of the Merge statement the CyclesCategory would be like below:

As a result of Merge statement, the data in CyclesCategory table is fully synchronized with the CyclesCategory_Staging table.

Performance Tip

Use an UPSERT when needing to follow a SELECT-UPDATE-INSERT pattern. Stick to basic INSERTS, UPDATES, or DELETES when you know only one operation will be applied.

Search This Blog

Microsoft Technologies Solutions