Section 5.1. Aggregate Functions

5.1. Aggregate Functions

An aggregate function (or group function) is a function that returns a result (one number) after calculations based on multiple rows. We use the term "aggregate" (instead of "group"), because it avoids confusion later in the book (we discuss other GROUP functions in Chapter 9). An aggregate function basically combines multiple rows into a single number. Aggregate functions can be used to count the number of rows, find the sum or average of all the values in a given numeric column, and find the largest or smallest of the entries in a given column. In SQL, these aggregate functions are: COUNT, SUM, AVG, MAX, and MIN, respectively. In this section, we examine several of these aggregate functions.

5.1.1. The COUNT Function

The COUNT function is used to count how many (rows) of something there are, or the number of rows in a result set. Following is the general syntax for the COUNT function.

    SELECT COUNT(*)
    FROM   Table-name(s)

COUNT(*) returns a count of the number of rows in the table(s).

The following query counts the number of rows in the table, Grade_report:

    SELECT COUNT(*) AS [Count]
    FROM   Grade_report

The following is its output:

    Count
    -----------
    209

    (1 row(s) affected)

COUNT(*) counts all rows, including rows that have some (or even all) null values in some columns.

In Figure 5-1, we present the table definition of the Grade_report table to remind you of the columns available in the Grade_report table.

Figure 5-1. Table definition of the Grade_report table

Sometimes we want to count how many items we have in a specific column. The general syntax for counting the number of items in a specific column is:

    SELECT COUNT(attribute_name)
    FROM   Table-name(s)

For example, to count the number of grades in the grade column of the Grade_report table, we could type the following:

    SELECT COUNT(grade) AS [Count of Grade]
    FROM   Grade_report

This produces the following output:

    Count of Grade
    --------------
    114

    (1 row(s) affected)

COUNT(column) counts only non null columns. Although the Grade_report table has 209 rows, you get a count of 114 grades rather than 209 grades, because there are some null grades in the grade column.

The COUNT feature can be quite useful because it can save you from unexpectedly long results. Also, you can use it to answer "how many" queries without looking at the row-values themselves. In Chapter 4, which showed how Cartesian products are generated, you learned that SQL does not prevent programmers from asking questions that have very long or even meaningless answers. Thus, when dealing with larger tables, it is good to first ask the question, "How many rows can I expect in my answer?" This question may be vital if a printout is involved. For example, consider the question, "How many rows are there in the Cartesian product of the Student, Section, and Grade_report tables in our database?" This is answered by the query:

    SELECT COUNT(*) AS Count
    FROM   Student, Section, Grade_report

The following output shows the count from this query, which will be equal to the product of the table sizes of the three tables (the Cartesian product of the three tables). Obviously, in this example, it would be a good idea to first find out the number of rows in this result set before printing it.

    Count
    -----------
    321024

    (1 row(s) affected)

Contrast the previous COUNTing-query and its Cartesian product result to this query:

    SELECT  COUNT(*) AS [Count]
    FROM    Student, Grade_report, Section
    WHERE   Student.stno = Grade_report.student_number
    AND     Grade_report.section_id  = Section.section_id

The following is the result of this query:

    Count
    -----------
    209

    (1 row(s) affected)

What is requested here is a count of a three-way equi-join rather than a three-way Cartesian product, the result of which is something you probably would be much more willing to work with. Note also that you expect a count of about 209 from the sizes of the tables involved: Student (48 rows), Grade_report (209 rows), and Section (32 rows). The expected count of a join operation is of the order of magnitude of the larger number of rows in the tables.

SQL syntax will not allow you to count two or more columns at the same time. The following query will not work:

    SELECT COUNT (grade, section_id)
    FROM   Grade_report

You will get the following error message:

    Msg 174, Level 15, State 1, Line 2
    The COUNT function requires 1 argument(s).

5.1.2. The SUM Function

The SUM function totals the values in a numeric column. For example, suppose you have another table called Employee that looks like this:

    names           wage         hours
    --------------- ------------ -----------
    Sumon Bagui     10.0000      40
    Sudip Bagui     15.0000      30
    Priyashi Saha   18.0000      NULL
    Ed Evans        NULL         10
    Genny George    20.0000      40

    (5 row(s) affected)

In this Employee table, names is defined as a NVARCHAR column, wage is defined as a SMALLMONEY column, and hours is defined as SMALLINT.

This Employee table has not been created for you in the Student_course database. You have to create and insert rows into it in order to run the following queries.

To find the sum of hours worked, use the SUM function like this:

    SELECT SUM(Hours) AS [Total hours]
    FROM   Employee

This query produces the following output:

    Total hours
    ---------------------
    120

    Warning: Null value is eliminated by an aggregate or other SET operation.
    (1 row(s) affected)

Columns that contain null values are not included in the SUM function (and not in any aggregate numeric functions except COUNT(*)).

AS [Total hours] is an illustration of an alternative way of giving a title to a column.

5.1.3. The AVG Function

The AVG function calculates the arithmetic mean (the sum of non null values divided by the number of non null values) of a set of values contained in a numeric column (or attribute) in the result set of a query. For example, if you want to find the average hours worked from the Employee table, type:

    SELECT AVG(hours) AS [Average hours]
    FROM   Employee

This produces the following output:

    Average hours
    ---------------------
    30

    Warning: Null value is eliminated by an aggregate or other SET operation.
    (1 row(s) affected)

Again, note that the null value is ignored (not used) in the calculation of the average, so the total hours (120) is divided by 4 rather than 5.

5.1.4. The MIN and MAX Functions

The MIN function finds the minimum value from a column, and the MAX function finds the maximum value (once again, nulls are ignored). For example, to find the minimum and maximum wage from the Employee table, you could type the following:

    SELECT MIN(wage) AS [Minimum Wage], MAX(wage) AS [Maximum Wage]
    FROM   Employee

This query produces the following output:

    Minimum Wage Maximum Wage
    ------------ ------------
    20.0000

    Warning: Null value is eliminated by an aggregate or other SET operation.
    (1 row(s) affected)

The MIN and MAX functions also work with character and datetime columns. For example, if we type:

    SELECT "First name in alphabetical order" = MIN(names)
    FROM   Employee

We will get:

    First name in alphabetical order
    --------------------------------
    Ed Evans

    (1 row(s) affected)

And, if we type:

    SELECT "Last name in alphabetical order" = MAX(names)
    FROM   Employee

We will get:

    Last name in alphabetical order
    -------------------------------
    Sumon Bagui

    (1 row(s) affected)

In the case of strings, the MIN and MAX are related to the collating sequence of the letters in the string. Internally, the column that we are trying to determine the MIN or MAX of is sorted alphabetically. Then, MIN returns the first (top) of the alphabetical list, and MAX returns the last (bottom) of the alphabetical list.