Section 7.1. Introducing Set Operations

7.1. Introducing Set Operations

A set is a collection of objects. In relational databases, a table can be regarded as a set of rows. Elements in a set do not have to be ordered. In relational databases, rows do not have to be ordered as they are entered or stored. Set operations are used in SQL to retrieve data from multiple sets, and include a binary union, binary intersection and binary set difference . A result set is obtained in SQL from the result of a SELECT.

A binary union is a set operation on two sets, the result of which contains all the elements of both sets. A binary intersection generates values in common between two sets. And, a binary set difference generates values in one set less those contained in another set.

Three explicit set operations are used in SQL: UNION, INTERSECT, and MINUS (for set difference). SQL Server allows the explicit use of the UNION and INTERSECT operations. Because the MINUS set operation cannot be explicitly used in SQL Server , we will illustrate the MINUS operation by using the very common IN predicate and its negation, NOT..IN, which enable us to accomplish the same result as using INTERSECT and MINUS.

The format of a set statement is as follows:

   set OPERATOR set

where OPERATOR is a UNION, INTERSECT or MINUS, and where "set" is defined by a SELECT.

First we will discuss the UNION operator; the INTERSECT operator will be discussed later in the chapter.

The following is the syntax for a general form of an UNION:

    SELECT *
    FROM   TableA
    UNION
    SELECT *
    FROM TableB

Set statements allow us to combine two distinct sets of data (two result sets) only if we insure union compatibility, as explained in the next section.

7.1.1. Union Compatibility

Union compatibility, the commonly used SQL terminology for set compatibility, means that when using set operations, the two sets (in this case, the results of two SELECTs) being unioned have to have the same number of similar columns and the columns have to have compatible data types. Next we will explain what compatible data types means, and we will return to the issue of "similar" columns in a later section.

So what does "compatible" data types mean? The data types of the columns of the two sets being unioned do not necessarily have to be exactly the same, meaning that they may differ in length and even type, but they have to be "well-matched." For union compatibility, the three basic data types are numeric, string, and dates. All numeric columns are compatible with one another, all string columns are compatible with one another, and all date columns are compatible with one another. For numbers, SQL will convert integers, floating-point numbers, and decimals into a numeric data type, to make them compatible with one another. So any numeric column (for example, integers) can be unioned with any other numeric column (for example, decimals). Likewise, any fixed-length character column and any variable-length character column will be converted to a character data type, and take on the larger size of the character columns being unioned. Similarly, date columns will be combined to a date data type.

For union compatibility, the three basic data types are numeric, string, and dates.

Union compatibility can happen in several ways:

By unioning two tables or views that have identical columns (which implies the same domains as well).
By taking two subsets from a table and combining them.
By using two views from two tables respectively with the columns chosen so that they are compatible.

For the data type precedence rules, refer to the "Data Type Precedence" section in Chapter 6.