Thursday, September 25, 2008

MySQL full-text search optimization

A while ago we were developing a classified ads site for one of our most important clients. On this site, every ad belongs to a region (city, country) and a category (e.g. vehicles>buy/sell>cars).

When the visitor accesses the homepage, the region is determined from their IP address so only the ads for that region are displayed (the visitor can change the region), and every search is likewise restricted to the ads in that region.

At first everything was working great, but when some regions grew past 500,000 ads, we started to get nervous. Every search in these regions took more than 30 seconds, which is obviously unacceptable. We tried everything: we created a full-text index on the ads table and optimized the database queries, but we could not bring the search time down to a reasonable level. At this point I was considering fleeing the country, changing my name and closing all my email accounts so that our client couldn’t find me, and starting a new life selling bananas in Brazil. I’m glad I didn’t do it (I think).

After a lot of research on the internet, we came across a miracle. That miracle is called Sphinx Search (http://www.sphinxsearch.com). It is an open-source full-text search engine that can index your MySQL tables and solve this kind of search problem.

How does it work? You install the software on your server, create a configuration file describing the table you want to search, and run the indexing process. This creates an index on disk (I’ll call it a data dictionary) that lets you search huge tables in less than a second. You can also filter on any column you declare as an attribute and sort the result set. Sphinx also ships with a PHP API, so you can query it from your PHP pages and process the result set.

An important side note: all searches now run against the data dictionary, which is built by a process that can take several minutes, so it’s advisable to run it only once or twice a day. Since users keep adding new rows to the original table in the meantime, the data dictionary will not contain all the current data.

How can you solve this issue? Easy: you also create a delta dictionary file with only the new rows and schedule a process to rebuild it many times a day (depending on how often new rows are added to your table). The delta dictionary is defined in the same configuration file as the master dictionary, but uses a different query.

How do you combine both dictionaries? You can have one process that runs at midnight and rebuilds the master dictionary with the overall query (e.g. SELECT * FROM ad), and another that runs every fifteen minutes and rebuilds the delta dictionary with the rows added that day. Searches are run against both dictionaries, so new ads show up as soon as the delta has been rebuilt. When the midnight process rebuilds the master dictionary again, it picks up the rows that were in the delta, and the next delta rebuild starts over with only the new day’s rows, because its query selects nothing older than today.
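The two scheduled jobs could be driven by a small PHP script called from cron. This is only a sketch: the script name rebuild_index.php, the function name buildIndexerCommand, the crontab lines and the configuration path are all assumptions for illustration. The indexer options used (--config and --rotate) are the ones Sphinx provides for rebuilding an index while searchd keeps running.

```php
<?php
// Hypothetical helper script (rebuild_index.php), meant to be called from cron.
// Assumed crontab entries:
//   0 0 * * *     php rebuild_index.php ads     (full rebuild at midnight)
//   */15 * * * *  php rebuild_index.php delta   (delta rebuild every 15 minutes)

/**
 * Build the indexer command line for one index.
 */
function buildIndexerCommand($index, $configPath)
{
    // --rotate lets a running searchd swap in the freshly built index files
    // without being stopped.
    return 'indexer --config ' . escapeshellarg($configPath)
         . ' --rotate ' . escapeshellarg($index);
}

if (PHP_SAPI === 'cli' && isset($argv[1])) {
    // The configuration path is an assumption; adjust it to your installation.
    $command = buildIndexerCommand($argv[1], '/home/classifieds/sphinx.conf');
    passthru($command, $exitCode);
    exit($exitCode);
}
```

Keeping the command in one function means the midnight job and the fifteen-minute job differ only in the index name they pass.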

Now every search on the site is working great and we are even using it to obtain all the ads from a specific category without a keyword, because it also takes less than a second. It also allows you to extend the search filtering by fields, grouping and sorting.
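To make the PHP side concrete, here is a minimal sketch of a search helper. It assumes that the sphinxapi.php client shipped with Sphinx is on the include path, that searchd is running, and that the indexes are named ads and delta with the catid and hw_added attributes, as in the configuration in this post. The function name searchAds and the page size of 20 are my own choices for illustration.

```php
<?php
// Assumption: sphinxapi.php comes from the api/ directory of the Sphinx
// distribution. It is loaded only if the client class is not available yet.
if (!class_exists('SphinxClient')) {
    require_once 'sphinxapi.php';
}

/**
 * Search the main and delta indexes, optionally restricted to one category.
 * Returns the ids of the matching ads, newest first, or an empty array
 * if the query fails.
 */
function searchAds($client, $keywords, $categoryId = null)
{
    // Filters stay set on the client between queries, so clear them first.
    $client->ResetFilters();
    $client->SetMatchMode(SPH_MATCH_ALL);
    $client->SetSortMode(SPH_SORT_ATTR_DESC, 'hw_added');
    $client->SetLimits(0, 20);

    if ($categoryId !== null) {
        $client->SetFilter('catid', array((int) $categoryId));
    }

    // Listing both index names searches the master and the delta together.
    $result = $client->Query($keywords, 'ads delta');
    if ($result === false) {
        error_log('Sphinx error: ' . $client->GetLastError());
        return array();
    }

    // By default, matches are keyed by document id (the ad id).
    return isset($result['matches']) ? array_keys($result['matches']) : array();
}
```

A caller would create the client with $client = new SphinxClient(); and $client->SetServer('localhost', 3312); and then call searchAds($client, 'honda civic', 12) to get the ad ids, which are then loaded from MySQL. Passing an empty string as the keywords should make Sphinx fall back to its full-scan mode (this relies on the default docinfo = extern storage), which is how a category can be listed without a keyword.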

Here’s an example configuration file for my classified ads site:


source ads {
type = mysql

sql_host = localhost
sql_user = user
sql_pass = password
sql_db = myDb
sql_port = 3306

sql_query = \
SELECT \
id, title, title AS v_title, descr, \
catid, UNIX_TIMESTAMP(hw_added) AS hw_added, \
UNIX_TIMESTAMP(exp_date) AS exp_date \
FROM \
ad

# assumption: v_title is meant to be a sortable copy of title,
# so it is selected under that alias in both queries
sql_attr_str2ordinal = v_title
sql_attr_uint = catid
sql_attr_timestamp = hw_added
sql_attr_timestamp = exp_date

sql_query_info = SELECT * FROM ad WHERE id=$id
}

source delta : ads
{
# same columns as the main source, limited to the rows added today
sql_query = \
SELECT \
id, title, title AS v_title, descr, \
catid, UNIX_TIMESTAMP(hw_added) AS hw_added, \
UNIX_TIMESTAMP(exp_date) AS exp_date \
FROM \
ad \
WHERE TO_DAYS(hw_added) = TO_DAYS(NOW())
}

index ads {
source = ads
path = /home/classifieds/sphinx/main/
# wordforms = /home/classifieds/wordforms.txt
# morphology = stem_en
min_word_len = 3
min_prefix_len = 0
min_infix_len = 3
}

index delta : ads {
source = delta
path = /home/classifieds/sphinx/delta/
}

indexer {
mem_limit = 256M
}

searchd {
port = 3312
log = /home/classifieds/searchd.log
query_log = /home/classifieds/query.log
pid_file = /home/classifieds/searchd.pid
}


I hope this is useful for you and keeps you from fleeing the country!

Wednesday, September 10, 2008

Combining PHP and HTML pages

A while ago, one of my clients gave me the source code of one of his websites so I could make some adjustments. The first thing that shocked me when I was going through the code was that all of the HTML was inside the PHP pages.

This is just awful. It’s almost like having a toilet inside of your office (although many of you would like that). PHP pages must only contain the logic of your website and HTML pages must be used for the GUI (Graphical user interface).

There is an easy way to combine PHP and HTML into a single output: Smarty (http://www.smarty.net). Smarty is a template engine that lets you render an HTML page from a PHP page. You don’t need to install anything. Just download the PHP code and include it in your project.

Here’s a quick example:

index.php
 
<?php

// Smarty initialization
require_once('Smarty.class.php');
$smarty = new Smarty();

$smarty->display('index.html');

?>

index.html

<html>
<head>
<title>Smarty test</title>
</head>
<body>
This is a test!
</body>
</html>

Of course, there are many other, more advanced uses for this tool, which you can read about in the Smarty Manual. Here are some examples:

Example 1: Variables

Let’s say you want the title of your page to be a variable which you can change from a PHP script. This is the way to do it using Smarty:

index.php

<?php

// Smarty initialization
require_once('Smarty.class.php');
$smarty = new Smarty();

$smarty->assign('title', 'My title');
$smarty->display('index.html');

?>

index.html

<html>
<head>
<title>{$title}</title>
</head>
<body>
This is a test!
</body>
</html>

You can also assign arrays or objects and use them in your HTML page as well.
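As a minimal sketch of the array case, the PHP side below assigns an associative array. The variable name ad, its keys and its values are made up for illustration.

```php
<?php

// Smarty initialization
require_once('Smarty.class.php');
$smarty = new Smarty();

// Hypothetical ad record; the keys and values are illustrative only
$ad = array(
    'title' => 'Red bicycle',
    'price' => 120,
);

$smarty->assign('ad', $ad);
$smarty->display('index.html');
```

In index.html the elements would then be read with Smarty’s dot syntax, for example {$ad.title} and {$ad.price}.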

Example 2: Loops

Imagine you want to display a list of all your friends, which are stored in an array. This is how you could do that with Smarty:

index.php

<?php

// Smarty initialization
require_once('Smarty.class.php');
$smarty = new Smarty();

$myFriends = array('Mike', 'Paul', 'Peter', 'John');
$smarty->assign('myFriends', $myFriends);
$smarty->display('index.html');

?>

index.html

<html>
<head>
<title>My Friends</title>
</head>
<body>
My friends are:
<ul>
{foreach from=$myFriends item=friend}
<li>{$friend}</li>
{/foreach}
</ul>
</body>
</html>

I hope this brief tutorial was useful for you. If you have any questions, please leave a comment and I’ll try to reply as soon as possible.