Sometimes you’ll need to run SELECT queries against very large tables (containing millions of records), with several WHERE conditions on columns that have poor index selectivity (i.e., return too many rows for a given WHERE condition) or could not be indexed at all. Common examples include searching for users in a social network and
searching for items on an auction site. Typical search interfaces let the user apply WHERE conditions to 10 or more columns, while requiring the results to be sorted by other columns.

With the proper schema and query optimizations, MySQL can work acceptably for such queries, as long as the WHERE clauses don’t contain too many columns. But as the number of columns grows, the number of indexes required to support all possible searches grows exponentially. Covering all the possible combinations for just four columns strains MySQL’s limits. It becomes very slow and expensive to maintain the indexes, too. This means it’s practically impossible to have all the required indexes for many WHERE conditions, and you have to run the queries without indexes. More importantly, even if you can add indexes, they won’t give much benefit unless they’re selective. The classic example is a gender column, which isn’t much help because it typically selects half of all rows. MySQL will generally revert to a full table scan when the index isn’t selective enough to help it. Sphinx can perform such queries much faster than MySQL. You can build a Sphinx index with only the required columns from the data. Sphinx then allows two types of access to the data: an indexed search on a keyword or a full scan. In both cases, Sphinx applies filters, which are its equivalent of a WHERE clause. Unlike MySQL, which decides internally whether to use an index or a full scan, Sphinx lets you choose which access method to use.

To use a full scan with filters, specify an empty string as the search query. To use an indexed search, add pseudokeywords to your full-text fields while building the index and then search for those keywords. For example, if you wanted to search for items in category 123, you’d add a “category123” keyword to the document during indexing and then perform a full-text search for “category123.” You can either add keywords to one of the existing fields using the CONCAT( ) function, or create a special full-text field for the pseudokeywords for more flexibility. Normally, you should use filters for nonselective values that cover over 30% of the rows, and fake keywords for selective ones that select 10% or less. If the values are in the 10–30% gray zone, your mileage may vary, and you should use benchmarks to find the best solution. Sphinx will perform both indexed searches and scans faster than MySQL. Sometimes Sphinx actually performs a full scan faster than MySQL can perform an index read.

Source of Information : OReIlly High Performance MySQL Second Edition Jun 2008

0 comments


Subscribe to Developer Techno ?
Enter your email address:

Delivered by FeedBurner