4 min read

Union

1 Introduction

After the topic of joins, I now come to another very useful operator: Union

With the UNION command you have the possibility to connect several select statements with each other.

2 Preparation

For the following examples, I set up a new database.

SET LANGUAGE ENGLISH


CREATE DATABASE UNION_DB;

USE UNION_DB;

Below I will create 2 example tables that we will continue to work with.

Table 1:

CREATE TABLE Office_1
    (ID INT NOT NULL,
     Customer_Name VARCHAR(100) NOT NULL,
     Visit_Date DATE NOT NULL,
     Gender VARCHAR(50) NOT NULL)
    ;

INSERT INTO Office_1 VALUES (1, 'Mike', '2021-03-01', 'male')
INSERT INTO Office_1 VALUES (2, 'Sven', '2021-02-25', 'male')
INSERT INTO Office_1 VALUES (3, 'Tom', '2021-05-16', 'male')
INSERT INTO Office_1 VALUES (4, 'Sarah', '2021-04-03', 'female')
INSERT INTO Office_1 VALUES (5, 'Alexander', '2021-01-18', 'male')

Table 2:

CREATE TABLE Office_2
    (ID INT NOT NULL,
     Customer_Name VARCHAR(100) NOT NULL,
     Visit_Date DATE NOT NULL,
     Gender VARCHAR(50) NOT NULL)
    ;

INSERT INTO Office_2 VALUES (1, 'Jonny', '2021-05-05', 'male')
INSERT INTO Office_2 VALUES (2, 'Miriam', '2021-03-17', 'female')
INSERT INTO Office_2 VALUES (3, 'Mike', '2021-03-02', 'male')
INSERT INTO Office_2 VALUES (4, 'Alex', '2021-01-22', 'female')
INSERT INTO Office_2 VALUES (5, 'Max', '2021-02-27', 'male')

Let’s have a look at the created tables:

SELECT * FROM Office_1;
SELECT * FROM Office_2;

Here we have two sample data sets about customer visits from two different stores.

3 The SQL UNION Operator

3.1 UNION

As mentioned at the beginning, with UNION we have the opportunity to combine two Select Statements. Here is a simple example:

SELECT *
    FROM Office_1
UNION
SELECT * 
    FROM Office_2
    ;

The condition for this is that both SELECTS have the same number of columns.

To get the output ordered (for example, by visitor date), we need to put the two SELECTS with the UNION into a subquery.

SELECT *
    FROM (SELECT *
            FROM Office_1
          UNION
          SELECT * 
            FROM Office_2) AS sub
    ORDER BY Visit_Date
    ;

3.2 UNION ALL

So far we have used UNION only with the SELECT ALL (SELECT *). Of course we also have the possibility to select specific columns.

Here again briefly shown the two original tables:

SELECT * FROM Office_1;
SELECT * FROM Office_2;

Now we use UNION in combination with a specific SELECT (here Customer_Name).

SELECT Customer_Name
    FROM Office_1
UNION
SELECT Customer_Name
    FROM Office_2
    ;

OK what happened?

The two original tables contained 5 observations each. However, the output with UNION shows only 9 …

The answer is that UNION by default excludes duplicate values. Since Mike occurs in both tables, it was taken over only once.

If you do not want this you need UNION ALL.

SELECT Customer_Name
    FROM Office_1
UNION ALL
SELECT Customer_Name
    FROM Office_2
    ;

Now we have all our observations again.

Let’s try this time with the selection of two columns:

SELECT Customer_Name, Visit_Date
    FROM Office_1
UNION
SELECT Customer_Name, Visit_Date
    FROM Office_2
    ;

Again we get 10 observations although we only used the UNION (not UNION ALL). The reason is that the Visit_Date is different for Max, which occurs twice.

And now again the output nicely sorted by date via a subquery:

SELECT Customer_Name, Visit_Date
    FROM (SELECT *
            FROM Office_1
          UNION
          SELECT * 
            FROM Office_2) AS sub
    ORDER BY Visit_Date
    ;

3.3 UNION with WHERE

Here I filter all visitors by visitor date greater than or equal to 2021/03/01.

SELECT Customer_Name, Visit_Date
    FROM Office_1
    WHERE Visit_Date >= '2021-03-01'
UNION
SELECT Customer_Name, Visit_Date
    FROM Office_2
    WHERE Visit_Date >= '2021-03-01'
    ;

Or a little more efficient that way:

SELECT Customer_Name, Visit_Date
    FROM (SELECT Customer_Name, Visit_Date
            FROM Office_1
          UNION
          SELECT Customer_Name, Visit_Date 
            FROM Office_2) AS sub
    WHERE MONTH(Visit_Date) >= 3
    ORDER BY Visit_Date
    ;

3.4 UNION with GROUP BY

 SELECT Customer_Name, Count(*) AS visit_count
    FROM (SELECT Customer_Name, Visit_Date
            FROM Office_1
          UNION
          SELECT Customer_Name, Visit_Date 
            FROM Office_2) AS sub
    GROUP BY Customer_Name
    ORDER BY visit_count DESC
    ;

4 Difference between JOIN and UNION

In a nutshell, JOIN is used to bring data from multiple sources into a wide horizontal format, and UNION is used to bring data into a long deep format.

SELECT * FROM Office_1;
SELECT * FROM Office_2;

SELECT t1.ID,
       t1.Customer_Name,
       t2.Customer_Name
    FROM Office_1 AS t1
    LEFT JOIN Office_2 AS t2
        ON t1.ID = t2.ID
    ;

SELECT Customer_Name
    FROM Office_1
UNION ALL
SELECT Customer_Name
    FROM Office_2
    ;

5 Conclusion

In this post I showed how to use the UNION operator to bring data from multiple sources into one long format. Furthermore I went into the specifications of the UNION command.