David Beahm
06/28 / 2007
Introduction
Developers and database administrators have long debated methods for paging recordset results from Microsoft SQL Server, trying to balance ease of use with performance. The simplest methods were less efficient because they retrieved entire datasets from SQL Server before eliminating records which were not to be included, while the best-performing methods handled all paging on the server with more complex scripting. The ROW_NUMBER() function introduced in SQL Server 2005 provides an efficient way to limit results relatively easily.
Paging Efficiency
In order to scale well, most applications only work with a portion of the available data at a given time. Web-based data maintenance applications are the most common example of this, and several data-bindable ASP.NET classes (such as GridView and Datagrid) have built-in support for paging results. While it is possible to handle paging within the web page code, this may require transferring all of the data from the database server to the web server every time the control is updated. To improve performance and efficiency, data which will not be used should be eliminated from processing as early as possible.
Paging Methods
Many popular databases offer functions allowing you to limit which rows are returned for a given query based upon their position within the record set. For example, MySQL provides the LIMIT qualifier, which takes two parameters. The first LIMIT parameter specifies which (zero-based) row number will be the first record returned, and the second parameter specifies the maximum number of records returned. The query: ...will return the 20th through the 32nd records -- assuming at least 33 records are available to return. If fewer than 33 records are available, the query will return all records from record 20 on. If fewer than 20 records are available, none will be returned. SQL Server does not have this functionality, however the 2005 release does have a number of other new tricks. For instance, support for CLR procedures means it is possible to use existing paging methods to write VB.NET or C# code that would execute within the SQL Server environment. Unfortunately, CLR procedures are not as efficient as native Transact SQL. To ensure best performance, queries should still be written in TSQL whenever practical. SELECT * FROM table LIMIT 20,13
Using ROW_NUMBER()
TSQL in the 2005 release includes the ROW_NUMBER() function, which adds an integer field to each record with the record's ordinal result set number. Stated more simply, it adds the record's position within the result set as an additional field so that the first record has a 1, the second a 2, etc. This may appear to be of little value, however by using nested queries we can use this to our advantage. To demonstrate ROW_NUMBER() and to explore how the paging solution works, create a simple salary table and populate it with random data using the following commands: The ROW_NUMBER() function has no parameters - it simply adds the row number to each record in the result set. To ensure the numbering is consistent, however, SQL Server needs to know how to sort the data. Because of this, ROW_NUMBER() must immediately be followed by the OVER() function. OVER() has one required parameter, which is an ORDER BY clause. The basic syntax for querying the Salaries table is: This returns the following result: The Salaries data now appears sorted by person, and it has an extra column indicating each record's position within the results. If for any reason you wanted the results to display in a different order than they were numbered in, you can include a different ORDER BY clause as part of the normal SELECT syntax: This returns the following result: If we want to limit the results displayed to a certain range, we need to nest this SELECT inside another one and provide a name for the ROW_NUMBER() column. To limit our results to records 5 through 9, we can use the following query: This returns the following result: Again, we can change the sort order by adding an ORDER BY clause. This is most easily accomplished by using the outer SELECT statement: This returns the following result: If we want to support the same type of arguments that MySQL's LIMIT() supports, we can create a stored procedure that accepts a beginning point and a maximum number of records to return. ROW_NUMBER requires that the data be sorted, so we will also have a required parameter for the ORDER BY clause. Execute the following statement to create a new stored procedure: The pageSalaries procedure begins with SET NOCOUNT ON to disable the record count message (a common step for optimizing query performance). We then declare two necessary variables, @STMT and @ubound. Because we want to be able to change what ORDER BY argument is used, we need to dynamically generate our query statement by storing it in @STMT. The next lines ensure that only positive numbers are used for the starting position and maximum size, then calculate the range of ROW_NUMBER() values being requested. (If we wanted to be zero-based like MySQL's LIMIT, we could do so with a few minor tweaks.) Once the dynamic SQL command has been strung together, it is executed so that the results are returned. Execute the following statement to test the stored procedure: This returns the following result: If we execute: we receive back: ... because the query goes beyond the number of records available. Taking this one step further, we can make a stored procedure that does a more general form of paging. In fact, it can be generalized to the point that it can be used to return any collection of fields, in any order, with any filtering clause. To create this wunderkind marvel, execute the following command: You may receive the following error message from SQL Server, which you can confidently ignore: The utilPage procedure accepts 6 parameters: The stored procedure needs the name of a data source to query against (such as a table) and one or more fields to sort by (since OVER() requires an ORDER BY clause). If @filter is blank (the default), it will be set to "1 = 1" as a simple way to select all records. If @pageSize is not supplied, the query will run without paging and will not return a record count. If, however, @pageSize is supplied, a version of the query is executed to get the total number of records. In order to have this record count available within the procedure and as a returned value, we use sp_executeSQL to support executing the statement while returning an output parameter. The record count is used to prevent returning empty results when possible, and to support paging interfaces that calculate the number of pages available (such as GridView). If we were calling this stored procedure to populate a GridView, we would return @recct as a ReturnValue parameter instead of using a result set, but we will use a result set for demonstration purposes. The procedure calculates what the actual record positions will be for the requested page. Rather than allow the query to fail, there are safety checks ensuring that @pageSize and @pageNum are greater than zero, and that the result set will not be empty. If the specified page is out of range, this procedure will return the last possible page of records. This is helpful if a user changes more than one setting before refreshing their data, or if a significant amount of data is deleted between requests. The remainder of the procedure is virtually identical to the pageSalaries procedure. To test the utilPAGE stored procedure, execute the following statement: This returns the following two result sets: If we execute: ...we receive back: Even though the request should be for records 36 through 38 - far outside of what is available - the procedure returns the last available page of records. In contrast, requesting the third page with seven records per page using: ...returns the last three records, as the page is not completely out of bounds: All of these examples are based on simple single-table queries, which may not reflect what you need in the real world. While the utilPAGE procedure does not support ad-hoc JOINs, it does work with SQL Views. If you want paging support for multi-table queries, you should create a View (with all of the necessary JOINs) to use as the data source. Using a View follows good design practices as it ensures that your Joins are performed consistently, allows easier ad-hoc querying from the command line, and is much easier to troubleshoot than a stored procedure's dynamic SELECT statement logic. CREATE TABLE [dbo].[Salaries](
[person] [nvarchar](50) NOT NULL,
[income] [money] NOT NULL,
CONSTRAINT [PK_salaries] PRIMARY KEY CLUSTERED(
[person] ASC
)) ON [PRIMARY]
GO
INSERT INTO Salaries VALUES ('Joe', '28000')
INSERT INTO Salaries VALUES ('Sue', '96000')
INSERT INTO Salaries VALUES ('Michael', '45000')
INSERT INTO Salaries VALUES ('John', '67000')
INSERT INTO Salaries VALUES ('Ralph', '18000')
INSERT INTO Salaries VALUES ('Karen', '73000')
INSERT INTO Salaries VALUES ('Waldo', '47000')
INSERT INTO Salaries VALUES ('Eva', '51000')
INSERT INTO Salaries VALUES ('Emerson', '84000')
INSERT INTO Salaries VALUES ('Stanley', '59000')
INSERT INTO Salaries VALUES ('Jorge', '48000')
INSERT INTO Salaries VALUES ('Constance', '51000')
INSERT INTO Salaries VALUES ('Amelia', '36000')
INSERT INTO Salaries VALUES ('Anna', '49000')
INSERT INTO Salaries VALUES ('Danielle', '68000')
INSERT INTO Salaries VALUES ('Stephanie', '47000')
INSERT INTO Salaries VALUES ('Elizabeth', '23000') SELECT ROW_NUMBER() OVER(ORDER BY person), person, income
FROM Salaries
(No column name) person income 1 Amelia 36000.00 2 Anna 49000.00 3 Constance 51000.00 4 Danielle 68000.00 5 Elizabeth 23000.00 6 Emerson 84000.00 7 Eva 51000.00 8 Joe 28000.00 9 John 67000.00 10 Jorge 48000.00 11 Karen 73000.00 12 Michael 45000.00 13 Ralph 18000.00 14 Stanley 59000.00 15 Stephanie 47000.00 16 Sue 96000.00 17 Waldo 47000.00 SELECT ROW_NUMBER() OVER(ORDER BY person), person, income
FROM Salaries
ORDER BY income
(No column name) person income 13 Ralph 18000.00 5 Elizabeth 23000.00 8 Joe 28000.00 1 Amelia 36000.00 12 Michael 45000.00 15 Stephanie 47000.00 17 Waldo 47000.00 10 Jorge 48000.00 2 Anna 49000.00 3 Constance 51000.00 7 Eva 51000.00 14 Stanley 59000.00 9 John 67000.00 4 Danielle 68000.00 11 Karen 73000.00 6 Emerson 84000.00 16 Sue 96000.00 SELECT *
FROM (SELECT ROW_NUMBER() OVER(ORDER BY person) AS
rownum, person, income FROM Salaries) AS Salaries1
WHERE rownum >= 5 AND rownum <= 9
rownum person income 5 Elizabeth 23000.00 6 Emerson 84000.00 7 Eva 51000.00 8 Joe 28000.00 9 John 67000.00 SELECT *
FROM (SELECT ROW_NUMBER() OVER(ORDER BY person) AS
rownum, person, income FROM Salaries) AS Salaries1
WHERE rownum >= 5 AND rownum <= 9
ORDER BY income
rownum person income 5 Elizabeth 23000.00 8 Joe 28000.00 7 Eva 51000.00 9 John 67000.00 6 Emerson 84000.00 CREATE PROCEDURE [dbo].[pageSalaries]
@start int = 1
,@maxct int = 5
,@sort nvarchar(200)
AS
SET NOCOUNT ON
DECLARE
@STMT nvarchar(max), -- SQL statement to execute
@ubound int
IF @start < 1 SET @start = 1
IF @maxct < 1 SET @maxct = 1
SET @ubound = @start + @maxct
SET @STMT = ' SELECT person, income
FROM (
SELECT ROW_NUMBER() OVER(ORDER BY ' + @sort + ') AS row, *
FROM Salaries
) AS tbl
WHERE row >= ' + CONVERT(varchar(9), @start) + ' AND
row < ' + CONVERT(varchar(9), @ubound)
EXEC (@STMT) -- return requested records pageSalaries 4, 7, 'income'
person income Amelia 36000.00 Michael 45000.00 Stephanie 47000.00 Waldo 47000.00 Jorge 48000.00 Anna 49000.00 Constance 51000.00 pageSalaries 13, 7, 'income'
person income John 67000.00 Danielle 68000.00 Karen 73000.00 Emerson 84000.00 Sue 96000.00 CREATE PROCEDURE [dbo].[utilPAGE]
@datasrc nvarchar(200)
,@orderBy nvarchar(200)
,@fieldlist nvarchar(200) = '*'
,@filter nvarchar(200) = ''
,@pageNum int = 1
,@pageSize int = NULL
AS
SET NOCOUNT ON
DECLARE
@STMT nvarchar(max) -- SQL to execute
,@recct int -- total # of records (for GridView paging interface)
IF LTRIM(RTRIM(@filter)) = '' SET @filter = '1 = 1'
IF @pageSize IS NULL BEGIN
SET @STMT = 'SELECT ' + @fieldlist +
'FROM ' + @datasrc +
'WHERE ' + @filter +
'ORDER BY ' + @orderBy
EXEC (@STMT) -- return requested records
END ELSE BEGIN
SET @STMT = 'SELECT @recct = COUNT(*)
FROM ' + @datasrc + '
WHERE ' + @filter
EXEC sp_executeSQL @STMT, @params = N'@recct INT OUTPUT', @recct = @recct OUTPUT
SELECT @recct AS recct -- return the total # of records
DECLARE
@lbound int,
@ubound int
SET @pageNum = ABS(@pageNum)
SET @pageSize = ABS(@pageSize)
IF @pageNum < 1 SET @pageNum = 1
IF @pageSize < 1 SET @pageSize = 1
SET @lbound = ((@pageNum - 1) * @pageSize)
SET @ubound = @lbound + @pageSize + 1
IF @lbound >= @recct BEGIN
SET @ubound = @recct + 1
SET @lbound = @ubound - (@pageSize + 1) -- return the last page of records if -- no records would be on the
-- specified page
END
SET @STMT = 'SELECT ' + @fieldlist + '
FROM (
SELECT ROW_NUMBER() OVER(ORDER BY ' + @orderBy + ') AS row, *
FROM ' + @datasrc + '
WHERE ' + @filter + '
) AS tbl
WHERE
row > ' + CONVERT(varchar(9), @lbound) + ' AND
row < ' + CONVERT(varchar(9), @ubound)
EXEC (@STMT) -- return requested records
ENDCannot add rows to sys.sql_dependencies for the stored procedure because it depends on the missing table 'sp_executeSQL'. The stored procedure will still be created; however, it cannot be successfully executed until the table exists.
@datasrc - the table (or stored procedure, etc.) name @orderBy - the ORDER BY clause @fieldlis - the fields to return (including calculated expressions) @filter - the WHERE clause @pageNum - the page to return (must be greater than or equal to one) @pageSize - the number of records per page utilPAGE 'Salaries', 'person', '*', 'income > 1000', 2, 4
recct 17
row person income 5 Elizabeth 23000 6 Emerson 84000 7 Eva 51000 8 Joe 28000 utilPAGE 'Salaries', 'person', 'person, income', '', 13, 3
recct 17
person income Stephanie 47000 Sue 96000 Waldo 47000 utilPAGE 'Salaries', 'person', 'person, income', '', 3, 7
person income Stephanie 47000 Sue 96000 Waldo 47000
Conclusion
While SQL Server does not have as simple a method for paging results as some other databases, features introduced in the 2005 release have made it possible to page results efficiently more easily than ever before. In the next article in this series, we will go a step further and integrate this paging logic with a GridView through a Data Access Layer.
No comments:
Post a Comment