Previous Page
Next Page

9.5. Auditing in Subqueries

In this section, we consider a potential problem of using aggregation with subqueries. As with Cartesian products and joins, aggregation hides details and should always be audited. The two tables that follow will be used to illustrate this problem.

Table 9-2 is similar to the Grade_report table and contains a student section identifier (ssec), grades (gd), and student names (sname).

Table 9-2. GG table

ssec        gd   sname
----------- ---- ------------
100         A    Brenda
110         B    Brenda
120         A    Brenda
200         A    Brenda
210         A    Brenda
220         B    Brenda
100         A    Richard
100         B    Doug
200         A    Richard
110         B    Morris
 
(10 row(s) affected)


Tables 9-2 and 9-3 (GG and SS) have not been created for you. You have to create them (and insert the records shown) and then run the queries that follow.


Table 9-3 is similar to the Section table and contains a section identifier (sec) and an instructor name (iname).

Table 9-3. SS table

sec         iname
----------- ------------
100         Jones
110         Smith
120         Jones
200         Adams
210         Jones
 
(5 row(s) affected)


Now suppose that you want to find out how many As each instructor awarded. You might start with a join of the GG and SS tables. A normal equi-join would be as follows:

SELECT *
FROM   GG, SS
WHERE  GG.ssec = SS.sec

This query would produce the following output (nine rows):

ssec        gd   sname        sec         iname
----------- ---- ------------ ----------- ------------
100         A    Brenda       100         Jones
110         B    Brenda       110         Smith
120         A    Brenda       120         Jones
200         A    Brenda       200         Adams
210         A    Brenda       210         Jones
100         A    Richard      100         Jones
100         B    Doug         100         Jones
200         A    Richard      200         Adams
110         B    Morris       110         Smith
 
(9 row(s) affected)

In addition, the following query tells you that there are six As in the GG table:

SELECT COUNT(*) AS [Count of As]
FROM   GG
WHERE  gd = 'A'

giving:

Count of As
------------
6
 
(1 row(s) affected)
 

Now, if you want to find out which instructor gave the As, you would type this query:

SELECT  SS.iname
FROM    SS, GG
WHERE   SS.sec = GG.ssec
  AND   GG.gd = 'A'

You get the following six rows of output:

iname
------------
Jones
Jones
Adams
Jones
Jones
Adams
 
(6 row(s) affected)

Now, to find "how many" As each instructor gave, include a COUNT and GROUP BY as follows:

SELECT  SS.iname AS [iname], COUNT(*) AS [count]
FROM    SS, GG
WHERE   SS.sec = GG.ssec
  AND   GG.gd = 'A'
GROUP BY SS.iname

This query produces the following output:

iname        count
------------ -----------
Adams        2
Jones        4
 
(2 row(s) affected)

This shows that instructor Adams gave two As and instructor Jones gave four As. So far, so good. You should note that the final count/grouping has the same number of As as the original tablesthe sum of the counts equals 6. Now, if you had devised a COUNT query with a sub-SELECT, you could get an answer that looked correct but in fact was not. For example, consider the following subquery version of the preceding join query:

SELECT  SS.iname AS [iname], COUNT(*) AS [count]
FROM    SS
WHERE   SS.sec IN
    (SELECT  GG.ssec
     FROM    GG
     WHERE   GG.gd = 'A')
GROUP BY SS.iname

This query produces the following output:

iname        count
------------ -----------
Adams        1
Jones        3
 
(2 row(s) affected)

The reason that you get this output is that the second query is counting names of instructors and whether an A is present in the set of courses that this instructor teachesnot how many As are in the set, just whether any exist. The previous join query gives you all the As in the joined table and hence gives the correct answer to the question "How many As did each instructor award?" The sub-SELECTed query answers a different question: "In how many sections did the instructor award an A?"

The point in this example is that if you are SELECTing and COUNTing, it is a very good idea to audit your results often. If you want to COUNT the number of As by instructor, begin by first counting how many As there are. Then, you can construct a query to join and count. You should be able to total and reconcile the number of As to the number of As by instructor. The fact that the result makes sense is very useful in determining (albeit not proving) correctness.


Previous Page
Next Page