Why Use Sphinx - Finding the Top Results in Order

Web applications frequently need the top N results in order. As we discussed in “Optimizing LIMIT and OFFSET” on page 193, this is hard to optimize in MySQL.

The worst case is when the WHERE condition finds many rows (let’s say 1 million) and the ORDER BY columns aren’t indexed. MySQL uses the index to identify all the matching rows, reads the records one by one into the sort buffer with semirandom disk reads, sorts them all with a filesort, and then discards most of them. It will temporarily store and process the entire result, ignoring the LIMIT clause and churning RAM. And if the result set doesn’t fit in the sort buffer, it will need to go to disk, causing even more disk I/O.

This is an extreme case, and you might think it happens rarely in the real world, but in fact the problems it illustrates happen often. MySQL’s limitations on indexes for sorting—using only the leftmost part of the index, not supporting loose index scans, and allowing only a single range condition—mean many real-world queries can’t benefit from indexes. And even when they can, using semirandom disk I/O to retrieve rows is a performance killer.

Paginated result sets, which usually require queries of the form SELECT ... LIMIT N, M, are another performance problem in MySQL. They read N + M rows from disk, causing a large amount of random I/O and wasting memory resources. Sphinx can accelerate such queries significantly by eliminating the two biggest problems:

Memory usage
Sphinx’s RAM usage is always strictly limited, and the limit is configurable. Sphinx supports a result set offset and size similar to the MySQL LIMIT N, M syntax but also has a max_matches option. This controls the equivalent of the “sort buffer” size, on both a per-server and a per-query basis. Sphinx’s RAM footprint is guaranteed to be within the specified limits.

If attributes are stored in RAM, Sphinx does not do any I/O at all. And even if attributes are stored on disk, Sphinx will perform sequential I/O to read them, which is much faster than MySQL’s semirandom retrieval of rows from disks.

You can sort search results by a combination of relevance (weight), attribute values, and (when using GROUP BY) aggregate function values. The sorting clause syntax is similar to a SQL ORDER BY clause:

SetSortMode ( SPH_SORT_EXTENDED, 'price DESC, @weight ASC' );
// more code and Query( ) call here...

In this example, price is a user-specified attribute stored in the index, and @weight is a special attribute, created at runtime, that contains each result’s computed relevance. You can also sort by an arithmetic expression involving attribute values, common math operators, and functions:

SetSortMode ( SPH_SORT_EXPR, '@weight + log(pageviews)*1.5' );
// more code and Query( ) call here...

Source of Information : OReIlly High Performance MySQL Second Edition


Subscribe to Developer Techno ?
Enter your email address:

Delivered by FeedBurner