Previous Page
Next Page

4.1. The JOIN

In SQL Server , the join is accomplished using the ANSI JOIN SQL syntax (based on ANSI Standard SQL-92), which uses the JOIN keyword and an ON clause. The ANSI JOIN syntax requires the use of an ON clause for specifying how the tables are related. One ON clause is used for each pair of tables being joined. The general form of the ANSI JOIN SQL syntax is:

SELECT columns
FROM table1 JOIN table2
ON table1.column1=table2.column1

The basic idea of a join is as follows: Suppose we have the following two tables, Table 4-1 and Table 4-2.

Table 4-1. The XYZ Table

columnA

columnB

columnC

X1

Y1

Z1

X2

Y2

Z2

X3

Y3

Z3


Table 4-2. The XDE Table

columnA

columnD

columnE

X1

D1

E1

X2

D2

E2

X3

D3

E3


The common column between the two tables (Table 4-1 and Table 4-2) is columnA. So the join would be performed on columnA. A SQL JOIN would give a table where columnA of Table1 = columnA of Table2. This would produce the new table, Table 4-3, the result of the join, as shown below:

Table 4-3. Joining XYZ with XDE

columnA

columnB

columnC

columnA

columnD

columnE

X1

Y1

Z1

X1

D1

E1

X2

Y2

Z2

X2

D2

E2

X3

Y3

Z3

X3

D3

E3


There are several types of joins in SQL. To be precise, the previous model refers to an inner join, where the two tables being joined must share at least one common column. The columns of the two tables being joined by the JOIN command are matched using an ON clause. SQL Server will actually translate the example JOIN statement to an unambiguous INNER JOIN form, as you shall see. When inner-joining two tables, the JOIN returns rows from both tables only if there is a corresponding value in both tables as described by the ON clause column. In other words, the JOIN disregards any rows in which the specific join condition, specified in the ON clause, is not met.

To illustrate the JOIN using our database (Student_course database), we present the following two examples.

4.1.1. Example 1

To find the student names and dependent names of all the students who have dependents, we need to join the Student table with the Dependent table, because the data that we want to display is spread across these two tables. Before we can formulate the JOIN query, we have to examine both tables and find out what relationship exists between the two tables. Usually this relationship is where one table has a column as a primary key and the other table has a column as a foreign key. A primary key is a unique identifier for a row in a table. A foreign key is so called because the key it references is "foreign" to the table where it exists.

Let us first look at the table descriptions of the Student and Dependent tables, shown in Figures 4-1 and 4-2, respectively.

Figure 4-1. Description of Student table


Figure 4-2. Description of Dependent table


In examining these two tables, we note that student number (stno in the Student table) is the primary key of the Student table. stno is the unique identifier for each student. The Dependent table, which was not created with a primary key of its own, contains a reference to the Student table in that for each dependent, a parent number (pno) is recorded. pno in the Dependent table is a foreign keyit represents a primary key from the table it is referencing, Student. pno in the Dependent table is not unique, because a student can have more than one dependent; that is, one stno can be linked to more than one pno.

From the table descriptions, we can see that the Student table (which has columns stno, sname, major, class, and bdate) can be joined with the Dependent table (which has columns pno, dname, relationship, sex, and age) by columns stno from the Student table and pno from the Dependent table. Following the ANSI JOIN syntax, we can join the two tables as follows:

SELECT stno, sname, relationship, age
FROM Student s JOIN Dependent d
ON s.stno=d.pno

In this construction, Student refers to the Student table and s is the table alias of the Student table. Likewise, Dependent refers to the Dependent table and d is the table alias of the Dependent table. The table alias simplifies writing queries or expressions using single-letter table aliases. We very strongly recommend using table aliases in all multi-table queries. This query requests the student number (stno) and student name (sname) from the Student table, and the relationship and age from the Dependent table when the student number in the Student table (stno) matches a parent number (pno) in the Dependent table.

Table aliases were discussed in Chapter 2.


When the previous query is typed and executed, you will get the following output showing the dependents of the students:

stno   sname                relationship age
------ -------------------- ------------ ------
2      Lineas               Son          8
2      Lineas               Daughter     9
2      Lineas               Spouse       31
10     Richard              Son          3
10     Richard              Daughter     5
14     Lujack               Son          1
14     Lujack                            3
17     Elainie              Daughter     4
17     Elainie              Son          1
20     Donald               Son          NULL
20     Donald               Son          6
34     Lynette              Daughter     5
34     Lynette              Daughter     1
62     Monica               Husband      45
62     Monica               Son          14
62     Monica               Daughter     16
62     Monica               Daughter     12
123    Holly                Son          5
123    Holly                Son          2
126    Jessica              Son          6
126    Jessica              Son          1
128    Brad                 Son          1
128    Brad                 Daughter     NULL
128    Brad                 Daughter     2
128    Brad                 Wife         26
132    George               Daughter     6
142    Jerry                Daughter     2
143    Cramer               Daughter     7
144    Fraiser              Wife         22
145    Harrison             Wife         22
146    Francis              Wife         22
147    Smithly              Wife         23
147    Smithly              Son          4
147    Smithly              Son          2
147    Smithly              Son          NULL
153    Genevieve            Daughter     5
153    Genevieve            Daughter     4
153    Genevieve            Son          2
158    Thornton             wife         22
 
 (39 row(s) affected)

4.1.2. Example 2

To find the course names and the prerequisites of all the courses that have prerequisites, we need to join the Prereq table with the Course table. Course names are in the Course table and the Prereq (prerequisites) table contains the relationship of each course to its prerequisite course. The descriptions of the Prereq table and Course tables are shown in Figures 4-3 and 4-4, respectively.

Figure 4-3. Description of Prereq table


Figure 4-4. Description of Course table


From these descriptions, we first note that the Course table has course_number as its primary keythe unique identifier for each course. The Prereq table also contains a course number, but the course number in the Prereq table is not uniquethere are often several prerequisites for any given course. The course number in the Prereq table is a foreign key referencing the primary key of the Course table. The Prereq table (which has columns course_number and prereq) can be joined with the Course table (which has columns course_name, course_number, credit_hours, and offering_dept) by the relationship column in both tables, course_number, as follows:

SELECT *
FROM Course c JOIN Prereq p
ON c.course_number=p.course_number

The same query could be written without the table alias (using a table qualifier) as follows:

SELECT *
FROM Course JOIN Prereq
ON Course.course_number=Prereq.course_number

However, the use of the table alias is so common that the table-alias form should be used. Also, aliases let you select columns that have the same names from the tables. This query will display those rows (12 rows) that have course_number in the Course table equal to course_number in the Prereq table, as follows:

COURSE_NAME          COURSE_NUMBER CREDIT_HOURS OFFERING_DEPT COURSE_NUMBER PREREQ
-------------------- ------------- ------------ ------------- ------------- --------
MANAGERIAL FINANCE   ACCT3333      3            ACCT          ACCT3333     ACCT2220
ORGANIC CHEMISTRY    CHEM3001      3            CHEM          CHEM3001     CHEM2001
DATA STRUCTURES      COSC3320      4            COSC          COSC3320     COSC1310
DATABASE             COSC3380      3            COSC          COSC3380     COSC3320
DATABASE             COSC3380      3            COSC          COSC3380     MATH2410
ADA - INTRODUCTION   COSC5234      4            COSC          COSC5234     COSC3320
ENGLISH COMP II      ENGL1011      3            ENGL          ENGL1011     ENGL1010
FUND. TECH. WRITING  ENGL3401      3            ENGL          ENGL3401     ENGL1011
WRITING FOR NON MAJO ENGL3520      2            ENGL          ENGL3520     ENGL1011
MATH ANALYSIS        MATH5501      3            MATH          MATH5501     MATH2333
AMERICAN GOVERNMENT  POLY2103      2            POLY          POLY2103     POLY1201
POLITICS OF CUBA     POLY5501      4            POLY          POLY5501     POLY4103
 
 (12 row(s) affected)

Rows from the Course table without a matching row in the Prereq table are not included from the JOIN result set. Courses that do not have prerequisites are not in the result set.

A primary key is a column or a minimal set of columns whose values uniquely identify a row in a table. A primary key cannot have a null value. Creation of primary keys is discussed in Chapter 11.


The inner join uses equality in the ON clause (the join condition). When an equal sign is used as a join condition, the join is called an equi-join. The use of equi-joins is so common that many people use the phrase "join" synonymously with "equi-join"; when the term "join" is used without qualification, "equi-join" is inferred.

When dealing with table combinations, specifically joins, it is a good idea to estimate the number of rows one might expect in the result set. To find out how many rows will actually occur in the result set, the COUNT function is used. For example:

SELECT COUNT(*)
FROM Course c JOIN Prereq p
ON c.course_number=p.course_number

will tell us that there are 12 rows in the result set.

In any equi join, let us suppose that the two tables to be joined have X number of rows and Y number of rows respectively. How many rows does one expect in the join? A good guideline is in the order of MAX(X,Y). In our case, we have 12 rows in the Prereq table and 32 rows in the Course table. MAX(12,32) = 32, but we actually got 12 rows. MAX(X,Y) is just a guideline. The actual and expected number of rows need not match exactly. It is possible that some Course-Prereq combinations might be repeated.

4.1.3. The INNER JOIN

In SQL Server, the keyword combination INNER JOIN behaves just like the JOIN discussed in the previous section. The general syntax for the INNER JOIN is:

SELECT columns
FROM table1 INNER JOIN table2
ON table1.column1=table2.column1

Using the INNER JOIN, the JOIN query presented in the previous section also could be written as:

SELECT *
FROM Course INNER JOIN Prereq
ON Course.course_number=Prereq.course_number

And, this query too, would produce the same results as given in the previous section.

As with the JOIN, the INNER JOIN cannot be used without the ON clause.


4.1.4. Using a WHERE Clause Instead of a JOIN

Another way of joining tables in SQL Server is to use a WHERE clause instead of using the JOIN or INNER JOIN command. According to the SQL-92 standard, the inner join can be specified either with the JOIN/INNER JOIN construction or with a WHERE clause. To perform a join with a WHERE clause, the tables to be joined are listed in the FROM clause of a SELECT statement, and the "join condition" between the tables to be joined is specified in the WHERE clause.

The JOIN from the preceding section could be written with a WHERE clause as follows:

SELECT *
FROM Course c, Prereq p
WHERE c.course_number= p.course_number

This command will display the same 12 rows as was previously shown (when the JOIN was used). You will soon see one of the reasons it is better not to use WHERE.

4.1.5. Associative Property of the JOIN

When two tables are being joined, it does not matter whether TableA is joined with TableB, or TableB is joined with TableA. For example, the following two queries would essentially give the same result set (output):

SELECT *
FROM Course c JOIN Prereq p
ON c.course_number=p.course_number

and:

SELECT *
FROM Prereq p JOIN Course c
ON p.course_number=c.course_number

The only difference in the two result sets would be the order of the columns . But the result set column order can be controlled by listing out the columns in the order that you want them after the SELECT instead of using the SELECT * syntax.

4.1.6. Column Types in Joins

Joins have to be performed on "compatible" columns; that is, a character column may be joined to another character column, a numeric column may be joined to another numeric column, and so forth. So, for example, a CHAR column can be joined to a VARCHAR column (both being character columns), or an INT column can be joined to a REAL column (both being numeric columns). Having made the point that compatible columns are required, and keeping in mind that SQL is not logical, it is up to the programmer to match semantics. In reality, why would you join two tables unless a relationship existed? If you ask SQL to join a job_title column with a last_name column, it will try to do so even though it makes no sense!

Some columns types--for example, IMAGE--cannot be joined, as these columns will generally not contain "like" columns. Joins cannot be operated on binary data types.

4.1.7. Performance Hint for Efficient Joins

Join on the narrowest columns possible. The narrower the column, the less storage space is used by SQL Server, and SQL Server can read and write the data faster.


Previous Page
Next Page