7.5. The IN and NOT..IN PredicatesAlthough SQL Server does not have the MINUS (difference) operator, it does have an IN predicate and its negation, the NOT..IN, which enables us to create differences. Let us look at this predicate from a set point of view. If we find the objects from set A that are not in set B, we have found the difference of set A and B (A - B). 7.5.1. Using INThe following is a simple example of an IN predicate with constants in a SELECT statement:
SELECT sname, class
FROM Student
WHERE class IN (3,4)
In this example, IN (3,4) is called a subquery-set, where (3, 4) is the set in which we are testing membership. This query says: "Find all student names from the Student table where the class is in the set (3, 4)." It produces the following 17 rows of output:
sname class
-------------------- ------
Mary 4
Kelly 4
Donald 4
Chris 4
Jake 4
Susan 3
Monica 3
Phoebe 3
Holly 4
Rachel 3
Jerry 4
Cramer 3
Harrison 4
Francis 4
Losmith 3
Gus 3
Benny 4
(17 row(s) affected)
The preceding query produces the same output as the following query:
SELECT sname, class
FROM Student
WHERE class = 3 OR class = 4
In other words, the IN(3,4) means belonging to either set (3) OR set (4), as shown by the WHERE class = 3 OR class = 4. 7.5.1.1. Using IN as a subqueryWe can expand the IN predicate's subquery-set part to be an actual query. For example, consider the following query:
SELECT Student.sname
FROM Student
WHERE Student.stno IN
(SELECT g.student_number
FROM Grade_report g
WHERE g.grade = 'A')
Note the following about this query:
Note also that you must retrieve the information from the same domains for purposes of union compatibility. The preceding query produces the following 14 rows of output:
sname
--------------------
Lineas
Mary
Brenda
Richard
Lujack
Donald
Lynette
Susan
Holly
Sadie
Jessica
Steve
Cedric
Jerry
(14 row(s) affected)
You could view the preceding query as a result derived from the intersection of the sets A and B, where set A is the set of student numbers in the student set (from the Student table) and set B is the set of student numbers in the grade set (from the Grade_report table) that have As. To make this command behave like a set operator (as if it were an INTERSECT operator), you can add the qualifier DISTINCT to the result set as follows:
SELECT DISTINCT (Student.sname)
FROM Student
WHERE Student.stno IN
(SELECT DISTINCT (g.student_number)
FROM Grade_report g
WHERE g.grade = 'A')
This query produces the following 14 rows of output:
sname
--------------------
Brenda
Cedric
Donald
Holly
Jerry
Jessica
Lineas
Lujack
Lynette
Mary
Richard
Sadie
Steve
Susan
(14 row(s) affected)
Here, SQL Server sorts the results for you and does not return duplicates. 7.5.2. The INTERSECT OperatorFrom a set point of view, an INTERSECT means if we find objects from set A that are also in set B (and vice versa), we have found the intersection of sets A and B. SQL Server has an INTERSECT operator. The following query is the previous query written using an INTERSECT (but we displayed student numbers instead of student names):
SELECT s.stno
FROM Student s
INTERSECT
SELECT g.student_number
FROM Grade_report g
WHERE g.grade = 'A'
This query gives the following 14 rows of output:
stno
-------------
2
3
8
10
14
20
34
49
123
125
126
127
129
142
(14 row(s) affected)
In this query, we had to display student numbers (stno) instead of the student names (sname) because of the set compatibility issue discussed earlier. INTERSECT is a set operator, so the two sets being intersected have to have the same number of columns and the columns have to have compatible data types. Another example of the use of the INTERSECT operator would be, for example, if we wanted to find all the students who had dependents, in which case we could type:
SELECT s.stno
FROM Student s
INTERSECT
SELECT d.pno
FROM Dependent d
This query would give the following 19 rows of output:
stno
------------------
2
10
14
17
20
34
62
123
126
128
132
142
143
144
145
146
147
153
158
(19 row(s) affected)
Though the INTERSECT operator gives us the right answer, in some ways the IN as a subquery (discussed earlier) is better to use, because when SQL Server performs the INTERSECT, it selects sets based on what is mentioned in the SELECT statements. So, for example, if we wanted the student names in addition to the student numbers, and we typed:
SELECT s.stno, s.sname
FROM Student s
INTERSECT
SELECT d.pno, relationship
FROM Dependent d
The query would not work. Here we would have to use an IN with a subquery as discussed earlier:
SELECT s.stno, s.sname
FROM Student AS s
WHERE (s.stno IN
(SELECT pno
FROM Dependent AS d))
giving us the following 19 rows of output:
stno sname
---- ------------------
2 Lineas
10 Richard
14 Lujack
17 Elainie
20 Donald
34 Lynette
62 Monica
123 Holly
126 Jessica
128 Brad
132 George
142 Jerry
143 Cramer
144 Fraiser
145 Harrison
146 Francis
147 Smithly
153 Genevieve
158 Thornton
(19 row(s) affected)
7.5.3. Using NOT..INThe NOT..IN is really a negated IN predicate. If you use the NOT..IN in your query, your query may perform poorly. The reason is that when NOT..IN is used, no indexing can be used, because the NOT..IN part of the query has to test the set with all values to find out what is not in the set. For smaller tables, no difference in performance will likely be detected. Nonetheless, we discuss how to use NOT..IN in this section, to demonstrate the logical negative of the IN predicate, which will help to complete your overall understanding of the SQL language. Instead of using NOT..IN, it is often preferable to use NOT EXISTS or outer join techniques, both of which are discussed later on.
Sometimes the NOT..IN may seem to more easily describe the desired outcome or may be used for a set difference. For a simple example, consider the following query:
SELECT sname, class
FROM Student
WHERE class IN (1,3,4)
This query produces the following 28 rows of output:
sname class
-------------------- ------
Lineas 1
Mary 4
Richard 1
Kelly 4
Lujack 1
Elainie 1
Donald 4
Chris 4
Jake 4
Lynette 1
Susan 3
Monica 3
Hillary 1
Phoebe 3
Holly 4
Steve 1
Brad 1
Rachel 3
George 1
Jerry 4
Cramer 3
Fraiser 1
Harrison 4
Francis 4
Losmith 3
Lindsay 1
Gus 3
Benny 4
(28 row(s) affected)
Contrast the preceding query to the following query:
SELECT sname, class
FROM Student
WHERE class NOT IN (2)
The output in this case is the same as the preceding output because the Student table only has classes 1, 2, 3, and 4. If counts (results) did not "add up," this would show that some value of class was not 1, 2, 3, or 4. As another example, suppose that you want the names of students who are not computer science (COSC) or math (MATH) majors. The query would be:
SELECT sname, major
FROM Student
WHERE major NOT IN ('COSC','MATH')
which produces the following output (28 rows):
sname major
-------------------- -----
Lineas ENGL
Ken POLY
Romona ENGL
Richard ENGL
Harley POLY
Donald ACCT
Chris ACCT
Lynette POLY
Susan ENGL
Bill POLY
Phoebe ENGL
Holly POLY
Jessica POLY
Steve ENGL
Cedric ENGL
Rachel ENGL
George POLY
Cramer ENGL
Fraiser POLY
Harrison ACCT
Francis ACCT
Smithly ENGL
Sebastian ACCT
Losmith CHEM
Genevieve UNKN
Lindsay UNKN
Gus ART
Benny CHEM
(28 row(s) affected)
The example output gave all majors other than COSC and MATH. But you must be very careful with the NOT..IN predicate, because if nulls are present in the data, you may get odd answers with NOT..IN. As an example, consider the following table called Stumajor:
name major
-------------------- --------------------
Mary Biology
Sam Chemistry
Alice Art
Tom NULL
(4 row(s) affected)
If you perform the following query:
SELECT *
FROM Stumajor
WHERE major IN ('Chemistry','Biology')
It produces the following output:
name major
-------------------- --------------------
Mary Biology
Sam Chemistry
(2 row(s) affected)
If you perform the following query:
SELECT *
FROM Stumajor
WHERE major NOT IN ('Chemistry','Biology')
It produces the following output:
name major
-------------------- --------------------
Alice Art
(1 row(s) affected)
The value, null, is not equal to anything. You might expect that NOT..IN would give you <Tom,null>, but it does not. Why? Because nulls in the selection column (here, major) are not matched with a NOT..IN. 7.5.3.1. Using NOT..IN in a subqueryA NOT..IN can also be used in a subquery. For example, assume that we have another table called Instructor, as shown here:
iname teaches
-------------------- --------------------
Richard COSC
Subhash MATH
Tapan BIOCHEM
(3 row(s) affected)
Now, if we want to find all the departments that do not have instructors, we could type the following query:
SELECT *
FROM department_to_major d
WHERE d.dcode NOT IN
(SELECT dcode
FROM department_to_major d, instructor i
WHERE d.dcode=i.teaches)
This query produces the following output (6 rows):
Dcode DNAME
----- --------------------
ACCT Accounting
ART Art
CHEM Chemistry
ENGL English
POLY Political Science
UNKN NULL
(6 row(s) affected)
Note that in this case, the NOT..IN "behaved" correctly and reported the NULL value for DNAME! |