There are many types of indexes in SQL Server, but not all of them get a lot of use or get used to their fullest potential. One of the less commonly-used types is a Filtered Index. Available beginning in SQL Server 2008, the filtered index feature is worth understanding, so that you know when it should be applied to be most effective. In this blog, I’ll explore the scenarios for using a filtered index.
What’s the Benefit of a Filtered Index?
A filtered index allows you to create an index with a WHERE clause, which reduces the number of rows stored in the index to the ones that are most common for queries in your environment. Like other types of indexes, you still specify your key columns and optionally other included columns, but this way you can keep the index size small if there is a lot of data in the table that isn’t relevant for your most common queries. Before implementing this type of index, you will need to work with your business users to understand their needs, so you can understand whether or not this will help query performance. You want to look for a table that has columns meeting the following criteria:
- Repeating values in a column (think states, countries, and statuses as good examples)
- One or more values in this column are queried frequently
- The values that are queried frequently can eliminate a lot of rows from the table.
To put the filtered index functionality into context, below are a few real-world scenarios where they have been applied:
- Scenario 1: A Minnesota-based company that primarily does business with in-state customers has a table that lists all of its customers. One of the columns in this table includes the customer’s state. This business frequently generates lists of its Minnesota customers since 90% of them are in-state, so this meets our criteria of repeating values and values that are queried frequently, but not other criteria. However, with 10% of the customers outside of Minnesota, a filtered index would contain 90% of the table rows. At that point, the query optimizer may find it more efficient to just scan the entire table. For these reasons, a filtered index may not be helpful in this scenario.
- Scenario 2: A manufacturing company’s database includes an order table in the database that includes an “Order Status” column. This column includes things such as:
- Order Received
- Queued for Production
- In Production
- Preparing for Shipment
- The Production Managers keep very close tabs on orders that are either “Queued for Production” or “In Production” by running reports several times a day. These requirements meet all three of our criteria: there are repeating values, we’re querying for specific values frequently, and a lot of rows are eliminated with the filter. Over time, as the factory completes more and more orders, the percentage of rows that are reported on gets lower and lower. Therefore, a filtered index may be helpful in this scenario.
- Scenario 3: A retail company has a database that includes a list of prospective clients. They purchased this list, which includes prospects from all over the world. However, the company primarily works in the upper-Midwest region of the U.S., and doesn’t frequently seek prospects from outside that region. For safe measure, the entire list was uploaded into the prospect table, in case it is need in the future. In this instance, three of our criteria have been met: it has repeating values (states), they are queried together frequently (our upper-Midwest region), and they eliminate a lot of rows from our worldwide prospect table. For this table, a filtered index would definitely be considered.
Filtered Index: Technical How-To
Now that we understand the situations where a filtered index would be helpful, let’s examine the technical how-to. For this example, I have a table called CustomerList, with a primary key (and therefore clustered index) on CustomerID. The other columns are not surprising for a customer table: FirstName, LastName, AddressLine1, AddressLine2, City, StateOrProvinceCode, and PostalCode. The table has about 19,000 rows, but only 7 rows with a “StateOrProvinceCode” of “MN”.
To show how the query optimizer is handling this, I wrote a query that gets all of our MN customers from the table. I turned on Statistics IO, and showed actual execution plans, so you can see the difference.
Table ‘CustomerList’. Scan count 1, logical reads 287, physical reads 1, read-ahead reads 285, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
As some interpretation of this information, the “Clustered Index Scan” shows that SQL Server had to read through the entire clustered index (which is the table), and look for the StateProvinceCode of MN. The Statistics IO output shows 287 logical reads, which doesn’t necessarily tell us much right now. Once we add the index, we’ll do the same thing and see how it looks for comparison. Below is the syntax for the index added to the table:
“FIStateMN” is simply the name given to the index. You can use your own conventions, but I think the “FI” prefix is a good idea for when you’re optimizing things later. [dbo].[CustomerList] is the table we’re adding the index, and (CustomerID) is the column we’re putting the with the index. We usually don’t want this to be the same thing we are filtering on since they will all have the same value or a limited subset of values after the filter, but columns that identify a row, along with any additional columns you may frequently be looking for should be included here.
Table ‘CustomerList’. Scan count 1, logical reads 16, physical reads 4, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
So, the execution plan diagram looks more complicated, but it’s actually better. SQL Server was able to use our FIStateMN index to find the 7 records we needed, then just look those values up in the clustered index, and match the data together to return the results we needed. This is corroborated with the Statistics IO output (4 physical reads from the disk, but now 16 reads from memory rather than 287.)
As with anything, filtered indexes are not without their constraints. For example, the WHERE clause in the index must be definite; it cannot be calculation (such as the last 90 days). It can, however, be a range or a list of values. You cannot create a filtered index on a view, but sometimes adding a filtered index will work better than creating a filtered view. You can find detailed information about filtered indexes on the Microsoft Developer Network.
Filtered indexes are definitely a feature worth exploring and understanding, but it must be used in the correct situations in order for it to be effective.