Previous Page
Next Page

9.2. The GROUP BY Clause

GROUP BY is used in conjunction with aggregate functions to group data on the basis of the same values in a column. GROUP BY returns one row for each value of the column(s) that is grouped. You can use GROUP BY to group by one column or multiple columns.

As an example of how to group by one column, the following statement shows how you can use the aggregate COUNT to extract the number of class groups (number of students in each class) from the Student table:

SELECT       class, COUNT(*) AS [count]
FROM         Student
GROUP BY  class

This query produces the following five rows of output, which is grouped by one column, class:

class count
----- -----------
NULL  10
1     11
2     10
3     7
4     10
 
(5 row(s) affected)

This type of statement gives you a new way to retrieve and organize aggregate data. Other aggregate functions would have a similar syntax.

You have to group by at least the attributes/expressions you are aggregating.


If a GROUP BY clause contains a two-column specification, the result is aggregated and grouped by two columns. For example, the following is COUNT of class and major from the Student table:

SELECT       class, major, COUNT(*)  AS [count]
FROM         Student
GROUP BY  class, major

This query produces the following output (24 rows), which is grouped by class within major:

class major   count
----- -----   -----------
NULL  NULL    3
2     ACCT    1
4     ACCT    4
3     ART     1
3     CHEM    1
4     CHEM    1
NULL  COSC    1
1     COSC    4
2     COSC    2
4     COSC    3
NULL  ENGL    1
1     ENGL    3
2     ENGL    2
3     ENGL    4
NULL  MATH    2
2     MATH    3
3     MATH    1
4     MATH    1
NULL  POLY    2
1     POLY    3
2     POLY    2
4     POLY    1
NULL  UNKN    1
1     UNKN    1
 
(24 row(s) affected)

The sequence of the columns in a GROUP BY clause has the effect of ordering the output. If we change the order of the GROUP BY like this:

SELECT      class, major, COUNT(*)  AS [count]
FROM        Student
GROUP BY major, class

our result will look like this:

class major count
----- ----- -----------
NULL  NULL  3
NULL  COSC  1
NULL  ENGL  1
NULL  MATH  2
NULL  POLY  2
NULL  UNKN  1
1     COSC  4
1     ENGL  3
1     POLY  3
1     UNKN  1
2     ACCT  1
2     COSC  2
2     ENGL  2
2     MATH  3
2     POLY  2
3     ART   1
3     CHEM  1
3     ENGL  4
3     MATH  1
4     ACCT  4
4     CHEM  1
4     COSC  3
4     MATH  1
4     POLY  1
 
(24 row(s) affected)

Here the output is grouped by major within class.

A statement like the following will cause a syntax error, because it says that you are to count both class and major, but GROUP BY class only:

SELECT       class, major, COUNT(*)
FROM         Student
GROUP BY  class

This query results in the following error message:

Msg 8120, Level 16, State 1, Line 1
Column 'Student.MAJOR' is invalid in the select list because it is not contained in either
 an aggregate function or the GROUP BY clause.

To be syntactically and logically correct, you must have all the non aggregate columns of the result set in the GROUP BY clause. For example, let's take a look at the data of Table 9-1.

Table 9-1. Room table

BLDG  ROOM  CAPACITY OHEAD
----- ----- -------- -----
13    101   85       Y
36    123   35       N
58    114   60       NULL
79    179   35       Y
79    174   22       Y
58    112   40       NULL
36    122   25       N
36    121   25       N
36    120   25       N
58    110   NULL     Y
 
(10 row(s) affected)


The following query would be improper, because you must GROUP BY "ohead" to SUM capacities for each ohead value:

SELECT  ohead, SUM(capacity)
FROM    Room

ohead, an attribute in the Room table (in our Student_Course database), is short for rooms with overhead projectors.


This query would produce an error message similar to what we saw previously:

Msg 8120, Level 16, State 1, Line 1
Column 'Room.OHEAD' is invalid in the select list because it is not contained in either an
 aggregate function or the GROUP BY clause.

If you SELECT columns and use an aggregate function, you must GROUP BY the non aggregate attributes. The correct version of the last statement is as follows:

SELECT       ohead, SUM(capacity) AS [sum]
FROM         Room
GROUP BY  ohead

which produces the following three rows of output:

ohead sum
----- -----------
NULL  100
N     110
Y     142
 
Warning: Null value is eliminated by an aggregate or other SET operation.
 
(3 row(s) affected)

This is the sum of room capacities for rooms that have no overhead projectors (N), rooms that have overhead projectors (Y), and rooms in which the overhead projector capacity is unknown (null).

Observe that in the Room table, some rooms have null values for ohead, and the null rows are summed and grouped along with the non-null rows.

9.2.1. GROUP BY and ORDER BY

To enhance the display of a GROUP BY clause, you can combine it with an ORDER BY clause. Consider the following example:

SELECT      class, major, COUNT(*) AS [count]
FROM        Student
GROUP BY class, major

The output for this query was presented earlier in the chapter.

This result set can also be ordered by any other column from the result set using the ORDER BY. For instance, the following example orders the output in descending order by COUNT(*):

SELECT class, major, COUNT(*)  AS [count]
FROM   Student
GROUP BY class, major
ORDER BY COUNT(*) DESC

This query produces the following output (24 rows):

class  major count
------ ----- -----------
4      ACCT  4
1      COSC  4
3      ENGL  4
2      MATH  3
4      COSC  3
1      ENGL  3
NULL   NULL  3
1      POLY  3
2      POLY  2
NULL   POLY  2
2      COSC  2
2      ENGL  2
NULL   MATH  2
3      MATH  1
4      MATH  1
NULL   ENGL  1
2      ACCT  1
3      ART   1
3      CHEM  1
4      CHEM  1
NULL   COSC  1
4      POLY  1
NULL   UNKN  1
1      UNKN  1
 
(24 row(s) affected)

9.2.2. GROUP BY and DISTINCT

When a SELECT clause includes all the columns specified in a GROUP BY clause, the use of the DISTINCT function is unnecessary and inefficient, because the GROUP BY clause groups rows in such a way that the column(s) that are grouped will not have duplicate values.


Previous Page
Next Page