1 Introduction
After the topic of joins, I now come to another very useful operator: Union
With the UNION command you have the possibility to connect several select statements with each other.
2 Preparation
For the following examples, I set up a new database.
SET LANGUAGE ENGLISH
CREATE DATABASE UNION_DB;
USE UNION_DB;
Below I will create 2 example tables that we will continue to work with.
Table 1:
CREATE TABLE Office_1
(ID INT NOT NULL,
Customer_Name VARCHAR(100) NOT NULL,
Visit_Date DATE NOT NULL,
Gender VARCHAR(50) NOT NULL)
;
INSERT INTO Office_1 VALUES (1, 'Mike', '2021-03-01', 'male')
INSERT INTO Office_1 VALUES (2, 'Sven', '2021-02-25', 'male')
INSERT INTO Office_1 VALUES (3, 'Tom', '2021-05-16', 'male')
INSERT INTO Office_1 VALUES (4, 'Sarah', '2021-04-03', 'female')
INSERT INTO Office_1 VALUES (5, 'Alexander', '2021-01-18', 'male')
Table 2:
CREATE TABLE Office_2
(ID INT NOT NULL,
Customer_Name VARCHAR(100) NOT NULL,
Visit_Date DATE NOT NULL,
Gender VARCHAR(50) NOT NULL)
;
INSERT INTO Office_2 VALUES (1, 'Jonny', '2021-05-05', 'male')
INSERT INTO Office_2 VALUES (2, 'Miriam', '2021-03-17', 'female')
INSERT INTO Office_2 VALUES (3, 'Mike', '2021-03-02', 'male')
INSERT INTO Office_2 VALUES (4, 'Alex', '2021-01-22', 'female')
INSERT INTO Office_2 VALUES (5, 'Max', '2021-02-27', 'male')
Let’s have a look at the created tables:
SELECT * FROM Office_1;
SELECT * FROM Office_2;
Here we have two sample data sets about customer visits from two different stores.
3 The SQL UNION Operator
3.1 UNION
As mentioned at the beginning, with UNION we have the opportunity to combine two Select Statements. Here is a simple example:
SELECT *
FROM Office_1
UNION
SELECT *
FROM Office_2
;
The condition for this is that both SELECTS have the same number of columns.
To get the output ordered (for example, by visitor date), we need to put the two SELECTS with the UNION into a subquery.
SELECT *
FROM (SELECT *
FROM Office_1
UNION
SELECT *
FROM Office_2) AS sub
ORDER BY Visit_Date
;
3.2 UNION ALL
So far we have used UNION only with the SELECT ALL (SELECT *). Of course we also have the possibility to select specific columns.
Here again briefly shown the two original tables:
SELECT * FROM Office_1;
SELECT * FROM Office_2;
Now we use UNION in combination with a specific SELECT (here Customer_Name).
SELECT Customer_Name
FROM Office_1
UNION
SELECT Customer_Name
FROM Office_2
;
OK what happened?
The two original tables contained 5 observations each. However, the output with UNION shows only 9 …
The answer is that UNION by default excludes duplicate values. Since Mike occurs in both tables, it was taken over only once.
If you do not want this you need UNION ALL.
SELECT Customer_Name
FROM Office_1
UNION ALL
SELECT Customer_Name
FROM Office_2
;
Now we have all our observations again.
Let’s try this time with the selection of two columns:
SELECT Customer_Name, Visit_Date
FROM Office_1
UNION
SELECT Customer_Name, Visit_Date
FROM Office_2
;
Again we get 10 observations although we only used the UNION (not UNION ALL). The reason is that the Visit_Date is different for Max, which occurs twice.
And now again the output nicely sorted by date via a subquery:
SELECT Customer_Name, Visit_Date
FROM (SELECT *
FROM Office_1
UNION
SELECT *
FROM Office_2) AS sub
ORDER BY Visit_Date
;
3.3 UNION with WHERE
Here I filter all visitors by visitor date greater than or equal to 2021/03/01.
SELECT Customer_Name, Visit_Date
FROM Office_1
WHERE Visit_Date >= '2021-03-01'
UNION
SELECT Customer_Name, Visit_Date
FROM Office_2
WHERE Visit_Date >= '2021-03-01'
;
Or a little more efficient that way:
SELECT Customer_Name, Visit_Date
FROM (SELECT Customer_Name, Visit_Date
FROM Office_1
UNION
SELECT Customer_Name, Visit_Date
FROM Office_2) AS sub
WHERE MONTH(Visit_Date) >= 3
ORDER BY Visit_Date
;
3.4 UNION with GROUP BY
SELECT Customer_Name, Count(*) AS visit_count
FROM (SELECT Customer_Name, Visit_Date
FROM Office_1
UNION
SELECT Customer_Name, Visit_Date
FROM Office_2) AS sub
GROUP BY Customer_Name
ORDER BY visit_count DESC
;
4 Difference between JOIN and UNION
In a nutshell, JOIN is used to bring data from multiple sources into a wide horizontal format, and UNION is used to bring data into a long deep format.
SELECT * FROM Office_1;
SELECT * FROM Office_2;
SELECT t1.ID,
t1.Customer_Name,
t2.Customer_Name
FROM Office_1 AS t1
LEFT JOIN Office_2 AS t2
ON t1.ID = t2.ID
;
SELECT Customer_Name
FROM Office_1
UNION ALL
SELECT Customer_Name
FROM Office_2
;
5 Conclusion
In this post I showed how to use the UNION operator to bring data from multiple sources into one long format. Furthermore I went into the specifications of the UNION command.