Multiple statistics sharing the same leading column
SQL Server will not prevent you from creating identical indexes or statistics. You may think that everything is under your control, but have you ever added a missing index that has been reported by the query optimizer, or the DTA? If so, you will certainly have created duplicate statistics. Have a look at the following example:
-- create test table with 3 columns |
Before invoking the following SELECT command, allow the actual execution plan being displayed (Strg-M or selecting Query/Include Actual Execution Plan from the menu).
-- get some data |
If you look at the execution plan, you see that the optimizer complains about a missing index on column c2. The prognosticated improvement is about 99%, so adding the index it's certainly a good idea. Let's do it:
-- add missing index |
Perfect! The query runs much faster now and needs a lot fewer resources. But have a look at the table's statistics:
You see three statistics, one for the primary key, a second one for our created index ix_1, and a third one that was automatically created during execution plan generation for the first SELECT statement. This is the statistics, named _WA_Sys.. If the AUTO CREATE STATISTICS option is set to ON, the optimizer will add missing statistics automatically. In our little experiment, the optimizer had to generate this column statistics on column c2 in order to make some assumptions about the number of rows that had to be processed.
And here's the problem: When creating the index on column c2, a statistics on this column is also created, since every index has a corresponding linked statistics. That's just the way it works. At the time the index was added, the column statistics on c2 (that _WA_Sys. statistics) already existed. If you don't remove it manually, this statistics will remain there forever, although it is needless now. All it's good for is to increase maintenance efforts during statistics updates. You can safely remove this statistics by executing:
drop statistics t1._WA_Sys_... |
If you didn't think about this before, there's a chance that you'll find some of those superfluous statistics duplicates inside your database(s). Here's a query that finds index-related and column-statistics that match on the first column. Looking for matches on the first column is sufficient here, since the optimizer only automatically adds missing single-column statistics.
with all_stats as |
With that query at hand, you may easily find redundant statistics. Here's a sample output:
If you'd like to find out more about SQL Server statistics, you may want to check out my series of two articles, published recently on the Simple-Talk platform.
Part 1: Queries, Damned Queries and Statistics
Part 2: SQL Server Statistics: Problems and Solutions
It's free, so you might want to give it a try. Feel free to vote, if you like it!
Print article | This entry was posted by Holger Schmeling on 04.11.10 at 19:12:15 . Follow any responses to this post through RSS 2.0. |