January 30, 2011

Sphinx vs. Microsoft Search Server - Part 2

In my last post I compared Sphinx and Microsoft SharePoint Search on indexing time and index size for a fresh (full) index build. Sphinx was well ahead on both counts. Here are the results for incremental indexing:

Sharepoint Statistics
Test 1
Total Number of Records: 1 million (already indexed) + 50,000 new records
Time Taken for new records to be searchable: 2 min 21 sec

Sphinx Statistics
Test 1
Total Number of Records: 1 million (already indexed) + 50,000 new records
Time Taken for new records to be searchable: 0.2 seconds

Test 2
Total Number of Records: 10 million (already indexed) + 75,000 new records
Time Taken for new records to be searchable: 0.8 seconds

Sphinx Incremental Indexing Tests in Detail

Incremental indexing in Sphinx is a two-step process:

1: Incremental Indexing
Sphinx supports "live" (near real-time) index updates, which can be implemented using the so-called "main+delta" scheme. The idea is to set up two sources and two indexes: a "main" index for the existing data and a "delta" index for new documents. For example, if we already have several million records, they stay in the main index, while all new documents go into a separate table that acts as the delta. This table can be re-indexed from time to time (how often depends on the application), and because it is small, the new data becomes searchable within seconds.

Test carried out: main index of 10 million records.
I created a new table (the delta) and added 75,000 new documents to it.
Time taken by Sphinx to index the delta and make it searchable: 0.8 seconds.
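
To make the main+delta idea concrete, here is a toy sketch in Python. It is not how Sphinx is implemented internally; it only models the scheme with two small in-memory inverted indexes. The function names (build_index, search) and the sample documents are made up for illustration.

```python
from collections import defaultdict


def build_index(docs):
    """Build a tiny inverted index: term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(term, *indexes):
    """Look the term up in every index given and union the matches."""
    hits = set()
    for index in indexes:
        hits |= index.get(term.lower(), set())
    return sorted(hits)


# Hypothetical sample data: the "main" table is large and rarely rebuilt,
# the "delta" table only holds documents added since the last merge.
main_docs = {1: "sphinx full text search", 2: "sharepoint search server"}
delta_docs = {3: "incremental sphinx indexing"}

main_index = build_index(main_docs)    # expensive, done rarely
delta_index = build_index(delta_docs)  # cheap, done often

# A query consults both indexes, so new documents show up right away.
print(search("sphinx", main_index, delta_index))
```

Only the small delta index has to be rebuilt when new documents arrive, which is why the time to make them searchable stays short even when the main index holds millions of records.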

2: Merging
Depending on our search requirements, we can merge the two indexes (main + delta) whenever needed and then empty the delta table.
Merging the 10 million records above with the 75,000 newly added records took 30 seconds.
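
Both steps can be driven from a small script around Sphinx's indexer tool. The sketch below only builds the command lines and, by default, prints them instead of running them. The index names "main" and "delta" are assumptions and must match whatever is defined in your sphinx.conf; the helper names are made up for illustration. Emptying the delta table after a merge is left to the application.

```python
import subprocess


def reindex_delta_cmd(delta="delta"):
    """Command to rebuild the delta index and rotate it into searchd."""
    return ["indexer", "--rotate", delta]


def merge_cmd(main="main", delta="delta"):
    """Command to merge the delta index into the main index."""
    return ["indexer", "--merge", main, delta, "--rotate"]


def run(cmd, dry_run=True):
    """Print the command, or execute it when dry_run is False."""
    if dry_run:
        print(" ".join(cmd))
        return 0
    return subprocess.run(cmd, check=True).returncode


# Step 1: run often, e.g. from cron every few minutes.
run(reindex_delta_cmd())
# Step 2: run rarely, e.g. nightly, then empty the delta table.
run(merge_cmd())
```

Keeping the two commands separate lets each run on its own schedule: the cheap delta rebuild frequently, the more expensive merge only when the delta has grown.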


Sphinx is a great information retrieval system! If you love algorithms, you will definitely enjoy reading the Sphinx code (written in C++), as its data structures and running times are highly optimized. Thanks to Andrew Aksyonoff for creating a wonderful product.
