Summarizing Data in Microsoft SQL Server 2008

The GROUP BY statement to summarize data is almost as old as SQL itself. Microsoft introduced additional constructs of ROLLUP and CUBE to add power to the GROUP BY clause in SQL Server 6.5 itself. What I have found during my experiences while training SQL Server professionals is that awareness about ROLLUP and CUBE is low and consequently professionals spend a lot of time building queries that could have been easily accomplished by using these constructs. I would attempt to explain the two constructs in this article. I would also attempt to explain a new construct introduced in SQL Server 2008, the GROUPING SETS operator. For the purpose of examples I have downloaded the AdventureWorks sample database from the link provided in the box.

A Simple GROUP BY Clause
Let us say you want to analyze the sales AdventureWorks is doing customer wise and product wise. You could write a query like this.

SELECT CustomerID, ProductID, SUM(LineTotal) AS LineTotal
FROM Sales.SalesOrderHeader INNER JOIN Sales.SalesOrderDetail
ON Sales.SalesOrderHeader.SalesOrderID = Sales.SalesOrderDetail.SalesOrderID
GROUP BY CustomerID, ProductID
ORDER BY CustomerID, ProductID

In the sample output of the Group By Clause query, you see the sum on a per-customer and per-product basis but you do not see what the sum of the line total is for one customer across all products. For that you need to use the ROLLUP operator.
Sample output of the Group By Clause query. Here, you see the sum on a per-customer basis on the left and per-product basis on the right.

The ROLLUP operator
The query for the ROLLUP operator can be written as follows:

SELECT CustomerID, ProductID, SUM(LineTotal) AS LineTotal
FROM Sales.SalesOrderHeader INNER JOIN Sales.SalesOrderDetail
ON Sales.SalesOrderHeader.SalesOrderID = Sales.SalesOrderDetail.SalesOrderID
GROUP BY CustomerID, ProductID WITH ROLLUP
ORDER BY CustomerID, ProductID
The output of the query using the CUBE operator. If you compare the statistics on the right hand side of the yellow status bar at the bottom of each screenshot, you will find that the number of rows and the time taken to run the query has been increasing to keep pace with the complexity of the query.

In the screenshot on the left, you can see the output. The first row in the screenshot shows you the sum across all products and customers. The second row is for one particular customer across all products. You will see for the other customers when you scroll down.What do you do if in the same output you want the sum for one product across all customers? You use the CUBE operator.
The output using the Grouping Sets operator. Observe the missing rows here compared to the CUBE operator.

The CUBE Operator
A sample query written for the CUBE operator.

SELECT CustomerID, ProductID, SUM(LineTotal) AS LineTotal
FROM Sales.SalesOrderHeader INNER JOIN Sales.SalesOrderDetail
ON Sales.SalesOrderHeader.SalesOrderID = Sales.SalesOrderDetail.SalesOrderID
GROUP BY CustomerID, ProductID WITH CUBE
ORDER BY CustomerID, ProductID

The screenshot on top right shows the output of the query using the CUBE operator. If you compare the statistics on the right hand side of the yellow status bar at the bottom of each screenshot, note that the number of rows and the time taken to run the query has been increasing to keep pace with the complexity of the query.

I have been summarizing data all along on two dimensions � customer and product. If you take more than two dimensions, for example, customer, product and the sales person, the number of rows in the result of CUBE will start growing exponentially because the possible number of combinations will start growing exponentially.

The GROUPING SETS Operator
Examine the output of the CUBE operator. What if you want the sum only customer-wise and only product-wise and you are not interested in the breakup customer-product-wise? You use the GROUPING SETS operator.

Here is the query:

SELECT CustomerID, ProductID, SUM(LineTotal) AS LineTotal
FROM Sales.SalesOrderHeader INNER JOIN Sales.SalesOrderDetail
ON Sales.SalesOrderHeader.SalesOrderID = Sales.SalesOrderDetail.SalesOrderID
GROUP BY GROUPING SETS(CustomerID, ProductID)
ORDER BY CustomerID, ProductID

Summary
Presenting summary information to the level of detail desired is one of the important challenges when the consumer of data is the business decision maker. Not all of us today have the luxury of using decision support systems. In this article, I have tried to introduce an easy to build query, summarizing features into your day-to-day data driven applications.

Advertisements

2 responses to “Summarizing Data in Microsoft SQL Server 2008

  1. Hi there !

    i would like to say that i totally agree with original poster

    chaussures

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s