Maxim Savenko
2008-07-24 11:48:14 UTC
Hi everybody,
I have a problem searching Russian strings, UTF-8 encoded, with
Zend_Search_Lucene. Here is my short sample code:
<?php
require_once 'ZendInit.php';
require_once 'Zend/Search/Lucene.php';
require_once 'Zend/Search/Lucene/Document.php';
// Create index
$index = Zend_Search_Lucene::create('data/index');
$doc = new Zend_Search_Lucene_Document();
$doc->addField(Zend_Search_Lucene_Field::Text('samplefield', 'русский текст; english text', 'utf-8'));
$index->addDocument($doc);
$index->commit();
// Open index and search:
$index = Zend_Search_Lucene::open('data/index');
Zend_Search_Lucene_Search_QueryParser::setDefaultEncoding('utf-8');
Zend_Search_Lucene::setDefaultSearchField('samplefield');
// Query the index:
$queryStr = 'english';
$query = Zend_Search_Lucene_Search_QueryParser::parse($queryStr, 'utf-8');
$hits = $index->find($query);
foreach ($hits as $hit) {
    /* @var $hit Zend_Search_Lucene_Search_QueryHit */
    $doc = $hit->getDocument();
    echo $doc->getField('samplefield')->value, PHP_EOL;
}
The 'samplefield' of the document contains a string in two languages, Russian
and English (see code). Searching for 'english' works fine: the document is
found. But searching for the Russian part of the field (setting $queryStr to
'русский') finds no documents.
What is the problem with my code? Please help me find a solution...
Thank you guys
Maxim Savenko ***@gmail.com