Tuesday, August 3, 2010

Exadata’s Best Kept Secret: Storage Indexes



via The Pythian Blog by Marc Fielding on 7/20/10

Although many electrons have been expended discussing Exadata's features, storage indexes tend to figure last, with a vague mention of row elimination in heavily clustered data. Even Oracle's Exadata software user guide devotes barely half a page to them. Unlike the better known smart scanning features though, storage indexes have an important advantage: rather than offloading workload to storage cells, they eliminate the need to do the I/O at all.
Here are some sample statistics taken from an actual production system:
SQL>; select name,value from v$sysstat where name in ('physical read total bytes','cell physical IO bytes saved by storage index');  
NAME                                                                       VALUE 
---------------------------------------------------------------- --------------- 
physical read total bytes                                        468779565615616 
cell physical IO bytes saved by storage index                    251319174832128 
That's right, over a third of all I/O was avoided entirely because of storage indexes. Interested now?

Storage indexes work by dividing to storage on each ASM griddisk into chunks. The first time an entire chunk is read (from a full-table scan, for example), the storage server keeps track of the highest and lowest values for each column in the chunk, and stores them in an in-memory structure. Whenever an insert or update comes in, each column value is checked against min and max values, and the values are updated if necessary. Deletes are a bit trickier, but I suspect the storage indexes are left as-is rather than undergoing the overhead of reading the entire chunk to determine new min-max values.
The next time a table scan request comes along, the storage server compares fixed predicated in the WHERE clause against the min and max values, and if the requested rows fall outside the range, they skip I/O against the chunk entirely.
Now you might think that typical column values in a chunk of data 1 MB or larger would have a wide range, and that storage indexes would give very little benefit. And it is true that a chunk with a single very high and a single very low value in a given column will match most queries and thus requiring the whole chunk to be read. However a lot of data, particularly the incrementing ID values so beloved of data architects, naturally tends to be correlated with the time the row was inserted into the database. And if you delete old rows in bulk or not at all, the storage chunks tend to have data inserted in the same time period. This correlation means that many values, ID value especially, tend to be clustered together within storage chunks.
When compared to regular indexes, storage indexes have a few important advantages. They consume no storage space at all, require negligible overhead to maintain or to use, and they cover all commonly queried columns of every table, even in combination.
Storage indexes however do have some drawbacks. First, they are designed for use with full-table scans. Index-based access paths will see fewer benefits because they tend to read fewer rows in the first place. Second, as in-memory structures, storage indexes do not survive a storage server shutdown, and must be rebuilt on subsequent startup. I suspect this drawback will disappear in future versions though, since saving storage index values to disk should be a simple exercise. And third, storage indexes don't store min and max values for encrypted tables, as this could inadvertently disclose some of the data.
But overall storage indexes can provide important performance benefits with almost zero overhead, and, joining Netezza and InfiniDB, expect to see more and more DBMS products offering them.