Monday, May 19, 2008

In Search Of Search

My client was having troubles. The site was crashing, seemingly at random. QA was baffled, management concerned. After hunting thru logs, the culprit was found: the ferret search service.

Despite having configured acts_as_ferret to use the DRb server option, the "whole thing" would fall apart on a regular basis whenever the ferret daemon would crash... which was pretty often. The ruby on rails plugin acts_as_ferret would revert to a local mode of operation, which would only make matters worse when competing mongrels would corrupt the index files on disk.

Much has been written already about this by more famous Rubyists then myself, so the time had come to make the switch to Sphinx like all the rest of the cool kids. However, there were a few tricky issues that awaited.

One is that the client site was using PostgreSQL instead of the more common MySQL. This meant you had to have already downloaded the pgsql client libs, and also when compiling the sphinx server use the --with-pgsql flag. However, what they didn't mention was that you would need to also download and include the MySQL client libs AS WELL, even if you were not using MySQL. Good to know... seems like a big of baggage to carry, but whatever.

That was one obstacle avoided. The next would prove to be more subtle and complex. There are four different choices of Ruby on Rails clients for Sphinx:



I looked briefly at acts_as_sphinx, but my superficial prejudices against anything named "acts_as_" caused me to continue my search for search. Next, I tried Sphincter, just to prove that I could be accepting about project names. However, I was not successful as getting it working correctly, and project development seemed like it had slowed to near non-existence. The search moved on.

My search was getting desperate as half of the options had been eliminated. I briefly perused UltraSphinx and I started to get a little excited. Development was active, and proper functionality with Postgres was claimed. But there was just one little problem: this client's site is still Rails 1.2.x.

Now, before you all start throwing various disparaging comments my way, consider how much work some clients are willing to pay far vs. something that they can actually see. Multiply that by how long since they started their project, and you are starting to get the idea of where this thing is at. I probably should have MADE them upgrade at that point, but I'm just not that kind of guy.

So that left Thinking Sphinx. I dove in with the manic desperation of someone looking to avoid major amounts of work. That rarely works out well, but in this case it wasn't so bad. The latest trunk of thinking sphinx had comments that said "basic postgres functionality" so perhaps it was possible after all.

Several hours and some major hacking of the original plug-in to actually handle proper psql syntax later, and something was working! I will try to come up with patch to thinking-sphinx for freelancing-god, cause his work was a great help.

Lastly, the delta index. By adding a single field (boolean for MySQL, integer for Postgres) you can remove the need to reindex all data every time a record is updated. Sphinx allows storing the changed records in a separate index (delta) that it will search along with the main index. This way, you do not have to wait for reindexing to be able to search, but you also do not have to reindex the entire data set every time any data within it is changed. You still need to do a complete reindex, but that can occur at off-peak hours, instead of constantly.

The code was deployed, and now all was well with the world. My search was complete.

3 comments:

Unknown said...

Why didn't you just use PostgreSQL's built-in full text search functionality?

Unknown said...

OK, maybe that sounded a little obnoxious. Not intentional. Let me rephrase. I'm curious to know why you went through all that trouble instead of using PostgreSQL's built in search. I could understand why you wouldn't want to use MySQL's (various limitations described in MySQL docs). But PostgreSQL's seems pretty good. Is it deficient in some way I'm not aware of? I'm wondering because I'm building an app that requires search and sizing up options.

Ron Evans said...

No offense taken, it is a legit question! There were a few reasons why PostgreSQL tsearch full-text search seemed like a more difficult approach:
- Although there is a Rails plugin for Postgres called acts_as_tsearch it is in a very early stage of development, and does not support the latest Postgres 8.3 release which is where tsearch is built-in.

- The client is not running the 8.3 release anyhow, which means that the TSearch capability would need to be installed on the database cluster, which we do not have direct access to (being a shared cluster which is part of a managed hosting service).

- The full text syntax of pgsql to just write SQL to perform the search is rather ugly, and would probably be even harder to maintain, than what is generated for Sphinx.

- It is nice to pretend that we have database independent code