Saturday, September 1, 2007

Principles for Index Creation

MySQL 5 Certification Study Guide > Chapter 22. Basic Optimizations > 22.2. Using Indexes for Optimization > 22.2.2. Principles for Index Creation
An index helps MySQL perform retrievals more quickly than if no index is used, but indexes can be used with varying degrees of success. Keep the following index-related considerations in mind when designing tables:
  • Declare an indexed column NOT NULL if possible. Although NULL values can be indexed, NULL is a special value that requires additional decisions by the server when performing comparisons on key values. An index without NULL can be processed more simply and thus faster.
  • Avoid over indexing; don't index a column just because you can. If you never refer to a column in comparisons (such as in WHERE, ORDER BY, or GROUP BY clauses), there's no need to index it.
  • Another reason to avoid unnecessary indexing is that every index you create slows down table updates. If you insert a row, an entry must be added to each of the table's indexes. Indexes help when looking up values for UPDATE or DELETE statements, but any change to indexed columns require the appropriate indexes to be updated as well.
  • One strategy the MySQL optimizer uses is that if it estimates that an index will return a large percentage of the records in the table, it will be just as fast to scan the table as to incur the overhead required to process the index. As a consequence, an index on a column that has very few distinct values is unlikely to do much good. Suppose that a column is declared as ENUM('Y','N') and the values are roughly evenly distributed such that a search for either value returns about half of the records. In this case, an index on the column is unlikely to result in faster queries.
  • Choose unique and non-unique indexes appropriately. The choice might be influenced by the data type of a column. If the column is declared as an ENUM, the number of distinct column values that can be stored in it is fixed. This number is equal to the number of enumeration elements, plus one for the '' (empty string) element that is used when you attempt to store an illegal value. Should you choose to index an ENUM column, you likely should create a non-unique index. A PRIMARY KEY would allow only as many rows as the number of distinct enumeration values. A UNIQUE index enforces a similar restriction, except that unless the column is declared NOT NULL, the index allows NULL values.
  • Index a column prefix rather than the entire column. MySQL caches index information in memory whenever possible to avoid reading it from disk repeatedly. Shortening the length of key values can improve performance by reducing the amount of disk I/O needed to read the index and by increasing the number of key values that fit into the key cache. This technique is discussed in Section 22.2.3, "Indexing Column Prefixes."
  • Avoid creating multiple indexes that overlap (have the same initial columns). This is wasteful because MySQL can use a multiple-column index even when a query uses just the initial columns for lookups. For more information, see Section 22.2.4, "Leftmost Index Prefixes."
  • The index creation process itself can be optimized if you are creating more than one index for a given table. ALTER TABLE can add several indexes in the same statement, which is faster than processing each one separately. CREATE INDEX allows only one index to be added or dropped at a time.
For indexed MyISAM or InnoDB tables, keeping the internal index statistics up to date helps the query optimizer process queries more efficiently. You can update the statistics with the ANALYZE TABLE statement. See Section 38.3.2, "Keep Optimizer Information Up to Date."


Powered by ScribeFire.

No comments: