Section 10.2. Correlated Subqueries

10.2. Correlated Subqueries

As stated at the beginning of the chapter, a correlated subquery is an inner subquery whose information is referenced by the main, outer query such that the inner query may be thought of as being executed repeatedly.

Correlated subqueries present a different execution scenario to the database manipulation language (DML) than do ordinary, noncorrelated subqueries. The correlated subquery cannot stand alone, as it depends on the outer query; therefore, completing the subquery prior to execution of the outer query is not an option. The efficiency of the correlated subquery varies; it may be worthwhile to test the efficiency of correlated subqueries versus joins or sets.

One situation in which you cannot avoid correlation is the "for all" query, which is discussed later in this chapter.

To illustrate how a correlated subquery works, the following is an example of the non-correlated subquery from the previous section revised as a correlated subquery:

SELECT  s.sname
FROM    Student s
WHERE   s.stno IN
  (SELECT    gr.student_number
   FROM      Grade_report gr
   WHERE     gr.student_number = s.stno
   AND       gr.grade = 'A')

This query produces the following output (14 rows), which is the same as the output of the noncorrelated subquery (shown earlier):

sname
--------------------
Lineas
Mary
Brenda
Richard
Lujack
Donald
Lynette
Susan
Holly
Sadie
Jessica
Steve
Cedric
Jerry
 
(14 row(s) affected)

In this example, the inner query (the part in parentheses) references the outer oneobserve the use of s.stno in the WHERE clause of the inner query. Rather than thinking of this query as creating a set of student numbers that have As, each row from the outer query can be considered to be SELECTed individually and tested against all rows of the inner query one at a time until it is determined whether a given student number is in the inner set and whether that student earned an A.

This query was illustrated with and without correlation. You might think that a correlated subquery is less efficient than doing a simple subquery, because the simple subquery is done once, whereas the correlated subquery is done once for each outer row. However, the internal handling of how the query executes depends on the SQL and the optimizer for that database engine.

The correlated subquery acts like a nested DO loop in a programming language, where the first row from the Student table is SELECTed and tested against all the rows in the Grade_report table, and then the second Student row is SELECTed and tested against all rows in the Grade_report table. The following is the DO loop in pseudocode:

LOOP1: For each row in Student  s  DO
      LOOP2: For each row in Grade_report  gr  DO
             IF (gr.student_number = s.stno) THEN
                   IF (gr.grade = 'B') THEN TRUE
      END LOOP2;
     IF TRUE, THEN Student row is SELECTed
END LOOP1