Published on 2009-04-10 by John Collins.
Lately I have been doing a lot of work with Lucene search engines and related search indexes. I came to the conclusion that for large, high-volume deployments, you really need to separate search from the main website. My main reason was that on the project I was working on, Lucene was consuming so many resources that it was dramatically degrading the main website of the application.
The best way to separate out search is to think of search as a service. We place our search functionality on a separate server, wrapped in a REST or SOAP API (take your pick). The main website then acts as a consumer of that service: it issues search requests, waits for the search results, and presents them to the user in a nice manner.
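To make the consumer side concrete, here is a minimal Python sketch of how a website might call such a search service over REST. Everything named here is an assumption for illustration: the service URL, the `q`, `page` and `per_page` parameters, and the `results` key in the JSON response are hypothetical, not part of any particular product.

```python
import json
from typing import Callable, Dict, List
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical address of the dedicated search server.
SEARCH_SERVICE_URL = "http://search.internal.example/api/search"


def http_get(url: str, timeout: float = 2.0) -> str:
    """Fetch a URL and return the response body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def build_search_url(query: str, page: int = 1, per_page: int = 10) -> str:
    """Build the request URL for the (assumed) REST search endpoint."""
    params = urlencode({"q": query, "page": page, "per_page": per_page})
    return f"{SEARCH_SERVICE_URL}?{params}"


def search(
    query: str,
    page: int = 1,
    per_page: int = 10,
    fetch: Callable[[str], str] = http_get,
) -> List[Dict]:
    """Send a search request and return the list of result documents.

    The response is assumed to be JSON of the form {"results": [...]}.
    """
    body = fetch(build_search_url(query, page, per_page))
    return json.loads(body).get("results", [])
```

The website would call something like `search("lucene")` and render the returned list. The `fetch` parameter is there so the HTTP call can be swapped for a stub in tests, since the search server now lives on another machine. The short timeout is a design choice: a slow search server should not be able to stall the main website.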
Luckily a lot of other people have already come to the same conclusion. For Java, if you want to stick with Lucene as the underlying technology, the Solr project looks interesting:
Introduction to The Solr Enterprise Search Server
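As a rough illustration of what consuming Solr over HTTP can look like, the Python sketch below builds a query URL for Solr's select handler and pulls the documents out of a JSON response. The base URL is an assumption: the host, port and path (including any core or collection name) depend on the Solr version and how it is deployed.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode

# Assumed base URL; adjust host, port and path to match your Solr deployment.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def solr_select_url(query: str, start: int = 0, rows: int = 10) -> str:
    """Build a select URL using Solr's q, start, rows and wt parameters."""
    params = urlencode({"q": query, "start": start, "rows": rows, "wt": "json"})
    return f"{SOLR_SELECT_URL}?{params}"


def parse_solr_response(body: str) -> List[Dict]:
    """Extract the matching documents from a Solr JSON response body."""
    return json.loads(body)["response"]["docs"]
```

The website would fetch the URL from `solr_select_url` with any HTTP client and pass the body to `parse_solr_response`. The fields inside each document depend entirely on your own index schema.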
For PHP, there is Majory:
Majory also uses Lucene indexes, but through the PHP implementation from Zend. Solr is the more mature project, but Majory might be a good option for those on shared hosting, where PHP is always a better option than Java. It is worth keeping an eye on Majory to see how it develops.
Updated 2021: note that the above post was originally published in 2009, but is left here for archival purposes. I have fixed several of the external links that were broken. Solr is still going strong, but the Majory client for PHP has not been updated since 2009, so Solarium would be a better option in 2021.