Thursday, March 15, 2012

Index Clustering Factor

The clustering_factor measures how synchronized an index is with the data in a table. A table with a high clustering factor is out-of-sequence with the rows and large index range scans will consume lots of I/O. Conversely, an index with a low clustering_factor is closely aligned with the table and related rows reside together of each data block, making indexes very desirable for optimal access.

Oracle provides a column called clustering_factor in the dba_indexes view that provides information on how the table rows are synchronized with the index. The table rows are synchronized with the index when the clustering factor is close to the number of data blocks and the column value is not row-ordered when the clustering_factor approaches the number of rows in the table.


select a.index_name, a.num_rows, a.clustering_factor, b.blocks,b.avg_row_len from user_indexes a, user_tables b
where a.num_rows !=0 and a.table_name = b.table_name order by 2 desc,1 desc;


Un-Clustered Table Rows
clustering_factor ~= num_rows

Clustered Table Rows
clustering_factor ~= blocks

------------------------------------------------------------------------------------------------------------------------------------------------
- A good CF is equal (or near) to the values of number of blocks of table.

- A bad CF is equal (or near) to the number of rows of table.

- Rebuilding of index can improve the CF.

Then how to improve the CF?

- To improve the CF, it’s the table that must be rebuilt (and reordered).
- If table has multiple indexes, careful consideration needs to be given by which index to order table.

------------------------------------------------------------------------------------------------------------------------------------------------
Four factors work together to help the CBO decide whether to use an index or a full-table scan: the selectivity of a column value, the db_block_size, the avg_row_len, and the cardinality. An index scan is usually faster if a data column has high selectivity and a low clustering_factor.


when a column has high selectivity, a high clustering_factor, and small avg_row_len, there is still indication that column values are randomly distributed in the table, and an additional I/O will be required to obtain the rows. An index range scan would cause a huge amount of unnecessary I/O as shown in below, thus making a full-table scan more efficient.

---------------------------------------Calculating the Clustering Factor

To calculate the clustering factor of an index during the gathering of index statistics, Oracle does the following.

For each entry in the index Oracle compares the entry's table rowid block with the block of the previous index entry.

If the block is different, Oracle increments the clustering factor by 1.

If the clustering factor is close to the number of entries in the index, then an index range scan of 1000 index entries may require nearly 1000 blocks to be read from the table.

If the clustering factor is close to the number of blocks in the table, then an index range scan of 1000 index entries may require only 50 blocks to be read from the table.

No comments:

Post a Comment

Followers