Ralf Eggert
2008-01-02 15:40:54 UTC
Hi,
I would like to start using Zend_Search_Lucene for the website search
engine of our travel community. We have a couple of different areas
which need to be searchable: forum, articles, gallery, destinations, and
maybe members. It should be possible to just search in one area, for
example the forum or the destinations. But it should also be possible to
search in all areas in one step.
Another requirement is to limit the results based on the currently
selected destination. For example, if the user is searching in the
Greece forum, the index should be searched only for documents of type
forum that belong to a destination in Greece. The destination data
structure is kept as a binary tree in a MySQL database.
The last important requirement is pagination: on page 1 I only want to
show results 1 to 10, on page 2 results 11 to 20, and so on.
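The page-to-result mapping is simple offset arithmetic. Below is a
minimal Python sketch of it, meant as a language-neutral illustration
and not as Zend_Search_Lucene API; `page_bounds` and `page_slice` are
hypothetical helpers that slice an already ranked hit list, which is
what a controller would have to do if the search call offers no offset.

```python
def page_bounds(page: int, per_page: int = 10) -> tuple:
    """Return the 1-based (first, last) result numbers shown on a page."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    first = (page - 1) * per_page + 1
    last = page * per_page
    return first, last


def page_slice(results: list, page: int, per_page: int = 10) -> list:
    """Return the hits for one page from an already ranked result list."""
    first, last = page_bounds(page, per_page)
    # Python slices are 0-based and end-exclusive, hence first - 1 and last.
    return results[first - 1:last]


print(page_bounds(1))  # (1, 10)
print(page_bounds(2))  # (11, 20)
hits = [f"hit{i}" for i in range(1, 26)]  # 25 sample hits
print(page_slice(hits, 3))  # ['hit21', 'hit22', 'hit23', 'hit24', 'hit25']
```

A page past the end simply yields an empty list, so the last page can
be detected by checking whether the slice is shorter than `per_page`.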
My first idea to solve this is to add two additional fields to each
document:
a) field 'area' which can only have one of the values 'forum',
'article', 'gallery', 'destination' or 'member'
b) field 'destination' which will be filled with a string that encodes
the destination hierarchy as a chain of destination primary keys, e.g.
'Athens' has key 322, 'Greece' has key 44 and 'South Europe' key 40,
so for an article about 'Athens' this field would be filled with the
value '40-44-322-'. If I want to search for all 'Greece' articles I
will search for '40-44-*'.
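The encoding in (b) can be sketched in a few lines of Python. This is
an illustration under assumptions: `PARENTS` is a hypothetical
child-to-parent map standing in for the MySQL destination tree, and
`ancestor_chain`, `destination_path` and `in_destination` are
hypothetical helpers. The prefix test models what the wildcard
'40-44-*' is meant to match, assuming the field ends up in the index as
one untokenized term; a tokenizing analyzer might split the value at
the dashes or drop the digits.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for the MySQL destination tree: child key -> parent key.
PARENTS: Dict[int, Optional[int]] = {322: 44, 44: 40, 40: None}


def ancestor_chain(key: int, parents: Dict[int, Optional[int]]) -> List[int]:
    """Return keys from the root down to `key`, e.g. [40, 44, 322]."""
    chain = []
    current: Optional[int] = key
    while current is not None:
        chain.append(current)
        current = parents[current]
    return list(reversed(chain))


def destination_path(key: int, parents: Dict[int, Optional[int]]) -> str:
    """Encode the chain as '40-44-322-' (every key followed by a dash)."""
    return "".join(f"{k}-" for k in ancestor_chain(key, parents))


def in_destination(path: str, prefix: str) -> bool:
    """Plain prefix test, i.e. what the wildcard '40-44-*' should match."""
    return path.startswith(prefix)


athens = destination_path(322, PARENTS)
greece = destination_path(44, PARENTS)
print(athens)                          # 40-44-322-
print(greece)                          # 40-44-
print(in_destination(athens, greece))  # True
print(in_destination(athens, "40-4-"))  # False
```

The trailing dash after every key matters: without it, a search for
destination 4 below region 40 ('40-4*') would also match everything
under Greece ('40-44...'), while '40-4-' cannot.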
Besides these, I want to add the following fields:
c) the document text to be indexed and searched
d) the document url
e) the page title
f) the date of the document (last changed)
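Fields (a) to (f) together describe one record per indexed document.
A hedged Python sketch of that record follows; the class name
`SearchDocument`, the attribute names and the validation rules are
assumptions for illustration, not Zend_Search_Lucene classes, and the
sketch deliberately leaves the choice of Lucene field types open.

```python
from dataclasses import dataclass
from datetime import date

# The only allowed values for field (a), as listed above.
AREAS = frozenset({"forum", "article", "gallery", "destination", "member"})


@dataclass(frozen=True)
class SearchDocument:
    """One index entry with the six planned fields (a)-(f)."""
    area: str          # a) one of AREAS, used to restrict a search to one area
    destination: str   # b) encoded path such as '40-44-322-', may be empty
    text: str          # c) body text to be indexed and searched
    url: str           # d) link to the document
    title: str         # e) page title
    changed: date      # f) date of last change

    def __post_init__(self) -> None:
        # Reject area values outside the fixed list.
        if self.area not in AREAS:
            raise ValueError(f"unknown area: {self.area!r}")
        # A non-empty encoded path must end with a dash, e.g. '40-44-322-'.
        if self.destination and not self.destination.endswith("-"):
            raise ValueError("destination path must end with '-'")


# Hypothetical sample values for an article about Athens.
doc = SearchDocument("article", "40-44-322-", "A day in the Plaka ...",
                     "/articles/1", "Athens in a day", date(2008, 1, 2))
print(doc.area, doc.destination)  # article 40-44-322-
```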
Finally, here are my questions:
1) Does my approach of restricting results via the area and destination
fields make sense, or did I overlook something?
2) I am not quite sure which field types I should use for the six
fields mentioned above. Any suggestions?
3) Does it make sense to create one index per area to improve
performance? If so, I might give up on the all-area search.
4) Using Zend_Search_Lucene for the destination area might be
overkill, since a destination basically consists only of its name. A
simple search directly in the MySQL database might be faster for this
area and would not need any indexing. What do others think about this?
5) The documentation shows a way to limit the total number of results,
but I did not find a way to set an offset for pagination. Do I really
need to fetch all results and then handle the pagination in my
controller, which would mean that each request returns all results?
Thanks for your comments and help.
Best Regards,
Ralf