Lately I have been doing a lot of work with Lucene search engines and related search indexes. I came to the conclusion that for large, high-volume deployments, you really need to separate search from the main website. My main reason is that on the project I was working on, Lucene was consuming so many resources that it was dramatically degrading the main website of the application.
The best way to separate out search is to think of search as a service. We place the search functionality on a separate server and wrap it in a REST or SOAP API (take your pick). The main website then acts as a consumer of that service: it issues search requests, waits for the search result responses, and presents the results to the user in a nice manner.
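As a rough illustration of the website side of this arrangement, here is a minimal Python sketch of a consumer calling a REST-style search service. Everything specific in it is an assumption made for the example: the service address, the `q`, `page` and `size` parameters, and a JSON response containing a `results` list. A real search service would define its own interface.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical address of the separate search server (an assumption for this sketch).
SEARCH_SERVICE_URL = "http://search.internal.example:8080/search"


def build_search_url(query, page=1, page_size=10, base_url=SEARCH_SERVICE_URL):
    """Build the GET URL the website sends to the search service.

    The parameter names (q, page, size) are made up for illustration.
    """
    params = {"q": query, "page": page, "size": page_size}
    # urlencode escapes the user's query so it is safe to put in a URL.
    return base_url + "?" + urlencode(params)


def search(query, page=1, page_size=10, timeout=2.0):
    """Send a search request and return the list of results.

    Assumes the service answers with JSON shaped like {"results": [...]}.
    The short timeout keeps a slow search server from stalling the website.
    """
    url = build_search_url(query, page, page_size)
    with urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return payload.get("results", [])
```

The website would call `search("lucene index")` when handling a search page request and render whatever comes back. All of the indexing and query work stays on the search server, so the only cost to the website is one HTTP round trip.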
Luckily a lot of other people have already come to the same conclusion. For Java, if you want to stick with Lucene as the underlying technology, the Solr project looks interesting.
For PHP, there is Majory.
Majory also uses Lucene indexes, but through the PHP implementation from Zend. Solr is the more mature project, but Majory might be a good option for those on shared hosting, where PHP is usually easier to run than Java. It is worth keeping an eye on Majory to see how it develops.