10.3. Existence Queries and CorrelationCorrelated queries are often written so that the question in the inner query is one of existence. For example, suppose you want to find the names of students who have taken a computer science (COSC) class and have earned a grade of B in that course. This query can be written in several ways. For example, you can write it as a noncorrelated subquery as follows: SELECT s.sname FROM Student s WHERE s.stno IN (SELECT gr.student_number FROM Grade_report gr, Section WHERE Section.section_id = gr.section_id AND Section.course_num LIKE 'COSC%' AND gr.grade = 'B') This query produces the following output (17 rows): sname -------------------- Lineas Mary Brenda Lujack Reva Harley Chris Lynette Hillary Phoebe Holly George Cramer Fraiser Francis Lindsay Stephanie (17 row(s) affected) You can think of this query as first forming the set of student numbers of students who have made Bs in COSC coursesthe inner query result set. In the inner query, you must have both the Grade_report table (for the grades) and the Section table (for the course numbers). Once you form this set of student numbers (by completing the inner query), the outer query looks through the Student table and SELECTs only those students who are in the inner query set.
Had we chosen to write the query with an unnecessary correlation, it might look like this: SELECT s.sname FROM Student s WHERE s.stno IN (SELECT gr.student_number FROM Grade_report gr, Section WHERE Section.section_id = gr.section_id AND Section.course_num LIKE 'COSC%' AND gr.student_number = s.stno AND gr.grade = 'B') The output of this query would be the same as the previous query. In this case, the use of the Student table in the subquery is unnecessary. Although correlation is unnecessary, this example is included to show the following:
First, let's look at situations in which the correlation of a subquery is necessary, and introduce a new predicate: EXISTS. 10.3.1. Using EXISTSIn situations in which the correlation of a subquery is necessary, you can write the correlated subquery with the EXISTS predicate, which looks like this: SELECT s.sname FROM Student s WHERE EXISTS (SELECT 1 FROM Grade_report gr, Section WHERE Section.section_id = gr.section_id AND Section.course_num LIKE 'COSC%' AND gr.student_number = s.stno AND gr.grade = 'B') The output of this query would be the same as the output (17 rows) of both of the previous queries. Let's dissect this query. The EXISTS predicate says, "Choose the row from the Student table in the outer query if the subquery is true (that is, if a row in the subquery exists that satisfies the condition in the subquery WHERE clause)." Because no actual result set is formed, "SELECT 1" is used as a "dummy" result set to indicate that the subquery is true (1 is returned) or false (no rows are returned). In the noncorrelated case, we tied the student number in the Student table to the inner query by the IN predicate as follows: SELECT s.stno FROM Student s WHERE s.stno IN (SELECT "student number ...) When using the EXISTS predicate, we do not use any column of the Student table, but rather are seeking only to find whether the subquery WHERE can be satisfied. We have indicated that we are using EXISTS with (SELECT 1...). Using the EXISTS predicate, the subquery does not form a result set per se, but rather causes EXISTS to returns true or false. The use of SELECT * in the inner query is common among SQL programmers. However, from an "internal" standpoint, SELECT * causes the SQL engine to check the data dictionary unnecessarily. As the actual result of the inner query is not important, it is strongly suggested that you use SELECT 'X' (or SELECT 1 ...) instead of SELECT * ... so that a constant is SELECTed instead of some "sensible" entry. The SELECT 'X' .. or SELECT 1 ... is simply more efficient. In the EXISTS case, we do not specify any columns to be SELECTed in the inner query's result set; rather, we use a dummy result--SELECT 'X' (or we could use SELECT 1). If the subquery WHERE is satisfied, it returns true, and if the inner query is not satisfied, it selects nothing, then the subquery returns false. The EXISTS predicate forces us to correlate the query. To illustrate that correlation is usually necessary with EXISTS, consider the following query: SELECT s.sname FROM Student s WHERE EXISTS (SELECT 'X' FROM Grade_report gr, Section t WHERE t.section_id = gr.section_id AND t.course_num LIKE 'COSC%' AND gr.grade = 'B') This query produces 48 rows of output (of which we show the first 20 rows): sname -------------------- Lineas Mary Zelda Ken Mario Brenda Romona Richard Kelly Lujack Reva Elainie Harley Donald Chris Jake Lynette Susan Monica Bill. . . (48 row(s) affected) This query uses EXISTS, but has no correlation. This syntax infers that for each student row, we test the joined Grade_report and Section tables to see whether there is a course number like COSC and a grade of B (which, of course, there is). We unnecessarily ask the subquery question over and over again. The result from this latter, uncorrelated EXISTS query is the same as the following: SELECT s.sname FROM Student s The point is that the correlation is usually necessary when we use EXISTS. Consider another example in which a correlation could be used. Suppose that we want to find the names of all students who have three or more Bs. A first pass at a query might be something like this: SELECT s.sname FROM Student s WHERE "something" IN (SELECT "something" FROM Grade_report WHERE "count of grade = 'B'" > 2) This query can be done with a HAVING clause, as you saw previously (Chapter 9), but we want to show how to do this in yet another way. Suppose we arrange the subquery to use the student number (stno) from the Student table as a filter and count in the subquery only when a row in the Grade_report table correlates to that student. The query (this time with an implied EXISTS) looks like this: SELECT s.sname FROM Student s WHERE 2 < (SELECT COUNT(*) FROM Grade_report gr WHERE gr.student_number = s.stno AND gr.grade = 'B') which results in the following output (8 rows): sname -------------------- Lineas Mary Lujack Reva Chris Hillary Phoebe Holly (8 row(s) affected) Although there is no EXISTS in this query, it is implied. The syntax of the query does not allow an EXISTS, but the sense of the query is "WHERE EXISTS a COUNT of 2 which is less than..." In this correlated subquery, we have to examine the Grade_report table for each member of the Student table to see whether the student has more than two Bs. We test the entire Grade_report table for each student row in the outer query. If it were possible, a subquery without the correlation would be more desirable, because it would appear simpler to understand. The overall query might be as follows: SELECT s.sname FROM Student s WHERE s.stno IN (subquery that defines a set of students who have made 3 Bs) Therefore, we might attempt to write the following query: SELECT s.sname FROM Student s WHERE s.stno IN (SELECT gr.student_number FROM Grade_report gr WHERE gr.grade = 'B') However, as the following output (27 rows) shows, this query would give us only students who earned at least one B: sname -------------------- Lineas Mary Zelda Ken Mario Brenda Kelly Lujack Reva Harley Chris Lynette Hillary Phoebe Holly Sadie Jessica Steve Cedric George Cramer Fraiser Francis Smithly Sebastian Lindsay Stephanie (27 row(s) affected) To get a list of students who have earned at least three Bs, we could try the following query: SELECT s.sname FROM Student s WHERE s.stno IN (SELECT gr.student_number, COUNT(*) FROM Grade_report gr WHERE gr.grade = 'B' GROUP BY gr.student_number HAVING COUNT(*) > 2) However, this approach does not work, because the subquery cannot have two columns in its result set unless the main query has two columns in the WHERE .. IN. Here, the subquery must have only gr.student_number to match s.stno. So, we might try to construct an inline view, as shown in the following query: SELECT s.sname FROM Student s WHERE s.stno IN (SELECT vi.student_number FROM (SELECT student_number, ct = COUNT(*) FROM Grade_report gr WHERE gr.grade = 'B' GROUP BY student_number HAVING COUNT(*) > 2) AS vi) This is an example of the inline view, discussed in Chapter 6. This query succeeds in SQL Server , producing the following output (8 rows): sname -------------------- Lineas Mary Lujack Reva Chris Hillary Phoebe Holly (8 row(s) affected)
As you can see, several ways exist to query the database with SQL. In this case, the correlated subquery may be the easiest to see and perhaps the most efficient. 10.3.2. From IN to EXISTSA simple example of converting from IN to EXISTS--uncorrelated to correlated (or vice versa)--would be to move the set test in the WHERE .. IN of the uncorrelated subquery to the WHERE of the EXISTS in the correlated query. As an example, consider the following uncorrelated subquery: SELECT * FROM Student s WHERE s.stno IN (SELECT g.student_number FROM Grade_report g WHERE grade = 'B') The following is the same query written as a correlated subquery: SELECT * FROM Student s WHERE EXISTS (SELECT g.student_number FROM Grade_report g WHERE grade = 'B' AND s.stno = g.student_number) This query produces 27 rows of output (of which we show the first 15 rows): STNO SNAME MAJOR CLASS BDATE ------ -------------------- ----- ------ ----------------------- 2 Lineas ENGL 1 1980-04-15 00:00:00 3 Mary COSC 4 1978-07-16 00:00:00 5 Zelda COSC NULL 1978-02-12 00:00:00 6 Ken POLY NULL 1980-07-15 00:00:00 7 Mario MATH NULL 1980-08-12 00:00:00 8 Brenda COSC 2 1977-08-13 00:00:00 13 Kelly MATH 4 1980-08-12 00:00:00 14 Lujack COSC 1 1977-02-12 00:00:00 15 Reva MATH 2 1980-06-10 00:00:00 19 Harley POLY 2 1981-04-16 00:00:00 24 Chris ACCT 4 1978-02-12 00:00:00 34 Lynette POLY 1 1981-07-16 00:00:00 121 Hillary COSC 1 1977-07-16 00:00:00 122 Phoebe ENGL 3 1980-04-15 00:00:00 123 Holly POLY 4 1981-01-15 00:00:00. . . (27 row(s) affected) This example gives you a pattern to move from one kind of query to the other kind and to test the efficiency of both kinds of queries. Both of the preceding queries should produce the same output. 10.3.3. NOT EXISTSAs with the IN predicate, which has a NOT IN compliment, EXISTS may also be used with NOT. In some situations, the predicates EXISTS and NOT EXISTS are vital. For example, if we ask a "for all" question, it must be answered by "existence"--actually, the lack thereof (that is, "not existence"). In logic, the statement, "find x for all y" is logically equivalent to "do not find x where there does not exist a y." Or, there is no x for no y. Or, you cannot find an x when there is no y. In SQL, there is no "for all" predicate. Instead, SQL uses the idea of "for all" logic with NOT EXISTS. (A word of caution, howeverSQL is not simply a logic exercise, as you will see.) In this section, we look at how EXISTS and NOT EXISTS work in SQL. In the following section, we address the "for all" problem. Consider the following query: SELECT s.sname FROM Student s WHERE EXISTS (SELECT 'X' FROM Grade_report gr WHERE s.stno = gr.student_number AND gr.grade = 'C') which produces the following output (24 rows): sname -------------------- Zelda Ken Mario Brenda Richard Reva Donald Jake Susan Monica Bill Sadie Jessica Steve Alan Rachel Smithly Sebastian Losmith Genevieve Thornton Gus Benny Lionel (24 row(s) affected) For this correlated subquery, "student names" are SELECTed when:
Both statements must be true for the student row to be SELECTed. Recall that we use SELECT 1 or SELECT 'X' in our inner query, because we want the subquery to return something if the subquery is true. The actual value of the "something" does not matter. true means something is returned; false means nothing was returned from the subquery. Therefore, SELECT .. EXISTS "says" SELECT .. WHERE true. The inner query is true if any row is SELECTed in the inner query. Now consider the preceding query with a NOT EXISTS in it instead of EXISTS for students who do not have a grade of C: SELECT s.sname FROM Student s WHERE NOT EXISTS (SELECT 'X' FROM Grade_report gr WHERE s.stno = gr.student_number AND gr.grade = 'C') This query produces the following output (24 rows): sname -------------------- Lineas Mary Romona Kelly Lujack Elainie Harley Chris Lynette Smith Hillary Phoebe Holly Brad Cedric George Jerry Cramer Fraiser Harrison Francis Lindsay Stephanie Jake (24 row(s) affected) In this query, we are still SELECTing with the pattern SELECT .. WHERE true because all SELECTs with EXISTS work that way. But, the twist is that the subquery has to be false to be SELECTed with NOT EXISTS. If the subquery is false, then NOT EXISTS is true and the outer row is SELECTed. Now, logic implies that if either s.stno <> gr.student_number or gr.grade <> 'C', then the subquery "fails"--that is, it is false for that student row. As the subquery is false, the NOT EXISTS would return a TRue for that row. Unfortunately, this logic is not quite what happens. Recall that we characterized the correlated subquery as follows: LOOP1: For each row in Student s DO LOOP2: For each row in Grade_report DO IF (gr.student_number = s.stno) THEN IF (gr.grade = 'C') THEN TRUE END LOOP2; IF TRUE, THEN student row is SELECTed END LOOP1 Note that LOOP2 is completed before the next student is tested. In other words, just because a student number exists that is not equal, it will not cause the subquery to be false. Rather, the entire subquery table is parsed and the logic is more like this: For the case .. WHERE EXISTS s.stno = gr.student_number ..., is there a gr.grade = 'C'? If, when the student numbers are equal, no C can be found, then the subquery returns no rowsit is false for that student row. So, with NOT EXISTS, we will SELECT students who have student numbers equal in the Grade_report and Student tables, but who have no C in the Grade_report table. The point about "no C in the Grade_report table" can be answered true only by looking at all the rows in the inner query and finding no C for that student. ![]() |