Section 3.1. Data Types in SQL Server

3.1. Data Types in SQL Server

Every column in a table has a data type. The data type of a column specifies what kind of information or values can be stored in the column, and what kind of operations can be performed on those values. It is a matter of mapping the domain values you need to store to the corresponding data type. In selecting a data type, you should avoid wasting storage space, while allowing enough space for a sufficient range of possible values over the life of your application. SQL Server supports 30 different data types. We will discuss the most commonly used data types by dividing the data types into four major categories: numeric, character, date and time, and miscellaneous.

Domain values are the set of all possible values that a column can have. For example, the domain values for a GPA column may be 0 to 4.

Several of the primary data types also have valid synonyms that can be used instead of the regular data types. The synonyms are external names that are intended to make one SQL product compatible with another.

The more specific you are when selecting a data type for a column, the more accurate the information in your database will be. The following sections briefly describe each data type and its valid synonyms.

3.1.1. Numeric Data Types

Numeric data types should be used for storing numeric data, for data on which you want to perform numeric comparisons or arithmetic operations. Numeric data types can be divided into two groups: integers and decimals.

3.1.1.1. Integer data types

Integer data types have no digits after the decimal point, and range in size from 1 to 8 bytes of internal storage. Integer data types in SQL Server include:

BIGINT, which uses 8 bytes of storage and can be used to store numbers from -2⁶³ to 2⁶³ -1. Avoid using the BIGINT data type unless you really need its additional storage capacity.
INT, which uses 4 bytes of storage and can be used to store numbers from -2³¹ to 2³¹ -1.
SMALLINT, which uses 2 bytes of storage and can be used to store numbers from -2¹⁵ to 2¹⁵ -1.
TINYINT, which uses 1 byte of storage and can be used to store numbers from 0 to 255.
MONEY, which uses 8 bytes of storage.
SMALLMONEY, which uses 4 bytes of storage.

MONEY and SMALLMONEY are included among integer types because they are internally stored the same way as integers.

The synonym for INT is INTEGER.

3.1.1.2. Decimal data types

Decimal data types allow a larger range of values as well as a higher degree of accuracy than integer data types . For decimal data types, you can specify a precision and a scale. Precision is the total number of digits stored, and scale is the maximum number of digits to the right of the decimal point. The storage space of decimal data varies according to the precision. Decimals with a precision of 1 to 9 would take up 5 bytes of storage space; decimals with a precision of 10 to 19 would take up 9 bytes of storage, and so on.

Decimal data types include:

REAL, which uses 4 bytes for storage and has a precision of 7 digits. The synonym for REAL is FLOAT[(n)] for n = 1 to 7.
FLOAT, which uses 8 bytes for storage and has a precision of 15 digits. The synonym for FLOAT is DOUBLE PRECISION and FLOAT[(n)] for n = 8 to 15.
DECIMAL, whose storage size varies based on the specified precision and uses 217 bytes for storage. The synonyms for DECIMAL are DEC and NUMERIC.

Rounding errors can occur when using the FLOAT or REAL data types. NUMERIC or DECIMAL are better in such cases, because they give the precision and scale, without the problems of FLOAT or REAL.

When you are trying to select the numeric data type to use, your decision should be based on the maximum range of possible values that you want to store, and the precision and scale that you need. But, at the same time, you have to realize that data types that can store a greater range of values take up more space.

NUMERIC most closely resembles Oracle's NUMBER data type.

3.1.2. Character Data Types

Character data types are used to store any combination of letters, numbers and symbols. Single quotes have to be used when entering character data. SQL Server has five types of character data types: CHAR, VARCHAR, TEXT, NCHAR, NVARCHAR.

3.1.2.1. The CHAR data type

CHAR(n)s are fixed-length single-byte character strings that can be used to store up to 8,000 bytes of data. CHAR data is used when the column length is known and unvarying; for example, a Social Security number could be of CHAR(9) data type. Because CHARs use a fixed storage length, CHARs are accessed faster than VARCHARs (varying length character strings). You can and should specify the maximum byte length of a CHAR(n) data type with a value for n; otherwise, the default size will be used and the default size may be set to a size much higher than what you need. The synonym for CHAR is CHARACTER.

3.1.2.2. The VARCHAR data type

VARCHAR(n)s are variable length single-byte character strings that can also be used to store up to 8000 bytes of data. You can and should also specify the maximum byte length of VARCHARs with n, too; otherwise, as with the CHAR data type, the default size will be used, and the default size may be set to a size much higher than what you need. Variable length means that if less data than the specified n bytes is used, the storage size will be the actual length of the data entered. The synonym for VARCHAR is CHAR VARYING. VARCHAR is the most commonly used character (string) type.

VARCHAR2 is the Oracle equivalent of VARCHAR.

3.1.2.3. The TEXT data type

TEXTs are also variable-length single-byte character strings, but may be used to store more than 8,000 bytes. The TEXT data type, in SQL Server , is a large object data type, better used if you need to store large strings of data. TEXT has extra overhead that drags down performance. Therefore, the use of the TEXT data type is not encouraged.

LONG is the Oracle equivalent of TEXT.

3.1.2.4. The NCHAR data type

NCHARs are fixed-length Unicode character strings . You can also specify the maximum byte length of NCHAR with n. The synonym for NCHAR is NATIONAL CHAR.

3.1.2.5. The NVARCHAR data type

NVARCHARs are variable-length Unicode character strings. You can specify the maximum byte of NVARCHAR length with n. The synonym for NVARCHAR is NATIONAL CHARACTER VARYING.

3.1.2.6. Unicode character strings

Unicode character strings need two bytes for each stored character. Most English and European alphabets can, however, be stored in single-byte characters. Single-byte character strings can store up to 8,000 characters, and Unicode character strings can store up to 4,000 characters.

3.1.2.7. Selecting the character data types

Some general rules that you can follow to determine which character data type to use:

Use the variable-length data types (VARCHAR) over fixed-length data types (CHAR) when you expect a lot of null values or a lot of variation in the size of data.
If a column's data does not vary widely in number of characters, consider using CHAR instead of VARCHAR.
NVARCHAR or NCHAR data types should not be used unless you need to store 16-bit character (Unicode) data. NVARCHARs and NCHARs take up twice as much space as VARCHAR or CHAR data types, reducing I/O performance.

3.1.3. Date and Time Data Types

SQL Server has two data types for storing date and time information: DATETIME and SMALLDATETIME. DATETIME uses 8 bytes. SMALLDATETIME uses 4 bytes of storage. Internally, the DATETIME and SMALLDATETIME values are stored completely differently from how you enter them or how they are displayed. They are stored as two separate components, a date component and a time component.

DATE is the Oracle equivalent of DATETIME.

When creating primary keys, do not consider using the DATETIME and SMALLDATETIME data types. From a performance standpoint, it is better to use a data type that uses less space for a primary key. The less the space used for a primary key, the smaller the table and index, and the less I/O overhead will be required to access the primary key.

Creation of primary keys will be discussed in Chapter 11.

3.1.4. Miscellaneous Data Types

Among other data types available in SQL Server are BINARY, IMAGE, BIT, TABLE, SQL_VARIANT, UNIQUEIDENTIFIER, and the XML data type (one of SQL Server's newest enhancements).

3.1.4.1. The BINARY data type

The BINARY data types are BINARY and VARBINARY.

BINARY data types are used to store strings of bits, and values are entered and displayed using their hexadecimal (hex) representation. The maximum length of the BINARY data type is 8,000 bytes. You can specify the maximum byte length of BINARY data with n.

The VARBINARY data type can store up to 8,000 bytes of variable-length binary data. Once again, you can also specify the maximum byte length with n. The VARBINARY data type should be used (instead of the BINARY data type) when you expect to have null values or a variation in data size.

RAW is the Oracle equivalent of VARBINARY.

3.1.4.2. The IMAGE data type

The IMAGE data type is a large object binary data type that stores more than 8000 bytes. The IMAGE data type is used to store binary values and is also used to store pictures.

LONG RAW is the Oracle equivalent of IMAGE.

3.1.4.3. The BIT data type

The BIT data type is actually an integer data type that can store only a 0 or a 1 and can consume only a single bit of storage space. However, if there is only a one bit column in a table, it will actually take up a whole byte. Up to 8-bit columns are stored in a single byte. The BIT data type is usually used for true/false or yes/no types of data. BIT columns cannot be NULL and cannot have indexes on them.

3.1.4.4. The monetary data types

Monetary data types are generally used to store monetary values. SQL Server has two monetary data types:

MONEY, which uses 8 bytes of storage
SMALLMONEY, which uses 4 bytes of storage

3.1.4.5. The TABLE data type

The TABLE data type can be used to store the result of a function and can be used as the data type of local variables. Columns in tables, however, cannot be of type TABLE. Table variables are sometimes preferable to temporary tables, because table variables are cleaned up automatically at the end of a function or stored procedure.

Temporary tables are covered in Chapter 6. Discussing functions and stored procedures is beyond the scope of this book.

3.1.4.6. The SQL_VARIANT data type

Values stored in a SQL_VARIANT column can be any data type except TEXT or IMAGE. The usage of the SQL_VARIANT data type should be avoided for several reasons: (a) a SQL_VARIANT column cannot be part of a primary or foreign key; (b) a SQL_VARIANT column cannot be part of a computed column; (c) a SQL_VARIANT column can be used in indexes or as other unique keys only if they are shorter than 900 bytes; (d) a SQL_VARIANT column must convert the data to another data type when moving data to objects with other data types.

Foreign keys are discussed in Chapter 11.

3.1.4.7. The UNIQUEIDENTIFIER data type

The UNIQUEIDENTIFIER data type, also referred to as globally unique identifier (GUID) or universal unique identifier (UUID), is a 128-bit generated value that guarantees uniqueness worldwide, even among unconnected computers.

3.1.4.8. The XML data type

The XML data type is a new data type that has been added to SQL Server to handle XML data. XML can model complex data. The XML column can be typed or untyped. Like other data types, the XML data type must meet specific formatting criteria. It must conform to well-formatted XML criteria (which is untyped) and you can optionally add additional conformance criteria by specifying a Schema collection (typed). SQL Server will also allow you to store XML documents associated with multiple schema definitions. The XML data type will allow you to store complete XML documents or fragments of XML documents. XML documents are limited to two gigabytes of data.

3.1.5. Selecting Data Types

Here we present some general rules that you can follow to determine which data type to use to define a column:

Use the smallest possible column sizes. The smaller the column size, the lesser the amount of data that SQL Server has to store and process, and the faster SQL Server will be able to read and write the data. In addition, the narrower the column, the faster a sort will be performed on a column.
Use the smallest possible data type for a column that will hold your data. For example, if you are going to be storing numbers from 1 to 99 in a column, you would be better off selecting the TINYINT data type instead of the INT data type.
For numeric data, it is better to use a numeric data type such as INTEGER, instead of using VARCHAR or CHAR, because numeric data types generally require less space to hold numeric values then character data types. This saves space, and smaller columns can improve performance when the columns are searched, joined with other columns, or sorted.

Joins are discussed in Chapter 4.

FLOATs or REALs should not be used to define primary keys. Integer data types can be used for primary keys.
Avoid selecting the fixed length columns--CHAR or NCHAR--if your column will have a lot of nulls. The NULL in a CHAR or NCHAR field will take up the entire fixed length of 255 characters. This wastes much space and reduces SQL Server's overall performance.
If you are going to be using a column for frequent sorts, consider an integer-based column rather than a character-based column. SQL Server sorts integer data faster than character data. [1]