Worldscope

PostgreSQL Union

Palavras-chave:

Publicado em: 05/08/2025

PostgreSQL UNION: Combining Result Sets

The `UNION` operator in PostgreSQL allows you to combine the result sets of two or more `SELECT` statements into a single result set. This article will guide you through the fundamentals of using `UNION`, provide practical examples, and discuss its performance implications and alternatives.

Fundamental Concepts / Prerequisites

Before diving into the `UNION` operator, you should have a basic understanding of SQL queries, specifically the `SELECT` statement. Familiarity with database tables, columns, and data types is also essential. You should also have PostgreSQL installed and be able to connect to a database.

Core Implementation/Solution

The `UNION` operator combines the results of two or more `SELECT` statements, removing duplicate rows. The `UNION ALL` operator combines the results without removing duplicates. Here's an example:


-- Create two sample tables
CREATE TABLE employees_us (
    id SERIAL PRIMARY KEY,
    name VARCHAR(50),
    department VARCHAR(50)
);

CREATE TABLE employees_eu (
    id SERIAL PRIMARY KEY,
    name VARCHAR(50),
    department VARCHAR(50)
);

-- Insert some sample data
INSERT INTO employees_us (name, department) VALUES
('Alice', 'Sales'),
('Bob', 'Marketing'),
('Charlie', 'Engineering');

INSERT INTO employees_eu (name, department) VALUES
('David', 'Sales'),
('Eve', 'Engineering'),
('Alice', 'Sales');

-- Using UNION to combine the results
SELECT name, department FROM employees_us
UNION
SELECT name, department FROM employees_eu;

-- Using UNION ALL to combine the results (including duplicates)
SELECT name, department FROM employees_us
UNION ALL
SELECT name, department FROM employees_eu;

Code Explanation

The code first creates two tables, `employees_us` and `employees_eu`, each with an `id`, `name`, and `department` column. We then insert sample data into both tables. Notice that 'Alice' in the 'Sales' department appears in both tables.

The first `SELECT` statement uses `UNION` to combine the results. The `UNION` operator automatically removes duplicate rows, so 'Alice' in 'Sales' will only appear once in the combined result.

The second `SELECT` statement uses `UNION ALL` to combine the results. `UNION ALL` does *not* remove duplicates, so 'Alice' in 'Sales' will appear twice in the combined result.

**Important Considerations:** The `SELECT` statements in a `UNION` or `UNION ALL` must have the same number of columns, and the corresponding columns must have compatible data types. The column names in the resulting combined result set will be derived from the column names in the first `SELECT` statement.

Complexity Analysis

The time complexity of `UNION` and `UNION ALL` operations depends heavily on the size of the input result sets and the presence of indexes.

  • UNION ALL: The time complexity of `UNION ALL` is typically O(m + n), where m and n are the number of rows in the two input result sets. This is because it simply appends the result sets together without checking for duplicates.
  • UNION: The time complexity of `UNION` is generally O((m + n) log(m + n)) due to the need to sort the combined result set to identify and remove duplicates. In the worst-case scenario, where all rows are unique, the sorting dominates the runtime. The complexity can be reduced with optimized database implementations and indexes.

The space complexity for both `UNION` and `UNION ALL` is O(m + n), as they need to store the combined result set in memory (or on disk if the result set is too large to fit in memory).

Alternative Approaches

One alternative approach to achieving similar results to `UNION` is using the `OR` operator in a single `SELECT` statement with `WHERE` clauses. However, this approach can become cumbersome and less readable when dealing with more than two tables or complex conditions. Furthermore, the `OR` operator doesn't inherently remove duplicates, requiring an additional `DISTINCT` clause.


-- Example using OR operator (less efficient and harder to read for multiple tables)
SELECT name, department FROM employees_us
WHERE department = 'Sales'
OR EXISTS (SELECT 1 FROM employees_eu WHERE employees_eu.name = employees_us.name AND employees_eu.department = employees_us.department);

The `OR` approach also will not automatically remove duplicates, so a `DISTINCT` keyword might be required.

Conclusion

The `UNION` and `UNION ALL` operators are powerful tools in PostgreSQL for combining result sets from multiple queries. `UNION` removes duplicate rows, while `UNION ALL` preserves them. Understanding the performance implications and considering alternative approaches is crucial for writing efficient and maintainable SQL code. Choose the appropriate operator based on your specific needs and data characteristics.