When the visitor accesses the homepage, the region is determined from their IP address so only the ads for that region are displayed (the visitor can change the region) and every search is performed only for the ads on that region as well.
At first everything was working great, but when some regions started to have more than 500.000 ads, that’s when we started to get nervous. Every search on these regions took more than 30 seconds, which is obviously unacceptable. We tried everything: create a full-text field on the ads table, optimized the database queries, but could not bring the search time down to a reasonable amount. At this point I was considering fleeing the country, changing my name and closing all my email accounts so that our client couldn’t find me, and start a new life selling bananas in Brazil. I’m glad I didn’t do it (I think).
After a lot of research over the internet, we came across a miracle. That miracle is called Sphinx Search (http://www.sphinxsearch.com). This is an open-source solution for all your MySQL full-text search problems.
How does it work? You simply install the software on your server, create a configuration file for the table you want to perform the searches on and run the indexing process. This will create a dictionary file that allows you to perform searches in less than a second, on huge tables. You can also provide filters on any of the fields of the table as well as perform sorting on the result set. This software also provides a PHP API so that you can access it from your PHP pages and process the result set.
An important side note: since now all searches are performed on the data dictionary (which is created by a process than can take several minutes, so it’s advisable to run it only once or twice a day) and users can be adding new rows to the original table, the data dictionary will not have all the current data.
How can you solve this issue? Easy, you also create a delta dictionary file with the new rows and schedule a process to rebuild it many times a day (depending on how often new rows are added to your table). This delta dictionary is also included on the same configuration file as the master dictionary, but using a different query.
How do you combine both dictionaries? You can have a process that runs at midnight rebuilding the master dictionary with the overall query (e.g. SELECT * FROM ad) and another that runs every fifteen minutes rebuilding the delta dictionary with the new rows added that day. When the process runs again at midnight to rebuild the master index, the data from the delta dictionary will be incorporated to the master index and the delta index will be reset.
Now every search on the site is working great and we are even using it to obtain all the ads from a specific category without a keyword, because it also takes less than a second. It also allows you to extend the search filtering by fields, grouping and sorting.
Here’s an example configuration file for my classified ads site:
source ads {
type = mysql
sql_host = localhost
sql_user = user
sql_pass = password
sql_db = myDb
sql_port = 3306
sql_query = \
SELECT \
id, title, descr, \
catid, UNIX_TIMESTAMP(hw_added) as hw_added, \
UNIX_TIMESTAMP(exp_date) as exp_date \
FROM \
ad;
sql_attr_str2ordinal = v_title
sql_attr_uint = catid
sql_attr_timestamp = hw_added
sql_attr_timestamp = exp_date
sql_query_info = SELECT link_id FROM ec4_ad WHERE link_id=$id
}
source delta : ads
{
SELECT \
id, title, descr, \
catid, UNIX_TIMESTAMP(hw_added) as hw_added, \
UNIX_TIMESTAMP(exp_date) as exp_date \
FROM \
ad \
WHERE (TO_DAYS(hw_added) = TO_DAYS(NOW()) ;
}
index ads {
source = ads
path = /home/classifieds/sphinx/main/
# wordforms = /home/classifieds/wordforms.txt
# morphology = stem_en
min_word_len = 3
min_prefix_len = 0
min_infix_len = 3
}
index delta : ads {
source = delta
path = /home/classifieds/sphinx/delta/
}
indexer {
mem_limit = 256M
}
searchd {
port = 3312
log = /home/classifieds/searchd.log
query_log = /home/classifieds/query.log
pid_file = /home/classifieds/searchd.pid
}
I hope this will be useful for you and keeps you from fleeing the country!