Discussion:
8EB-F5F Zend_Lucene + UTF8 search problem... Help!
Maxim Savenko
2008-07-24 12:12:37 UTC
8EB-F5F

Hi everybody,

I have a problem searching for UTF-8 encoded Russian strings with
Zend_Search_Lucene. Here is my short sample code:

-----------------code---------------
<?php
require_once 'ZendInit.php';
require_once 'Zend/Search/Lucene.php';
require_once 'Zend/Search/Lucene/Document.php';

// Create index
$index = Zend_Search_Lucene::create('data/index');
$doc = new Zend_Search_Lucene_Document();
$doc->addField(Zend_Search_Lucene_Field::Text('samplefield',
    '[russian text]; english text', 'utf-8'));
$index->addDocument($doc);
$index->commit();

// Open index and search:
$index = Zend_Search_Lucene::open('data/index');
Zend_Search_Lucene_Search_QueryParser::setDefaultEncoding('utf-8');
Zend_Search_Lucene::setDefaultSearchField('samplefield');

// Query the index:
$queryStr = 'english';
$query = Zend_Search_Lucene_Search_QueryParser::parse($queryStr, 'utf-8');
$hits = $index->find($query);
foreach ($hits as $hit) {
    /* @var $hit Zend_Search_Lucene_Search_QueryHit */
    $doc = $hit->getDocument();
    echo $doc->getField('samplefield')->value, PHP_EOL;
}
-----------------code---------------

The document's 'samplefield' contains a string in two languages, Russian
and English (see code). If we search for 'english', everything is fine and
the document is found, but if we try to find the Russian part of the field
(set $queryStr to the first Russian word of the field), no document is found.
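One plausible cause, not confirmed anywhere in this thread: Zend_Search_Lucene splits field text into terms with its default analyzer, and as far as I know that analyzer only treats ASCII letters as word characters, so the Russian words are never stored as index terms and a query for them cannot match. Recent ZF 1.x releases document UTF-8 aware analyzers (for example Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8) that can be made the default with Zend_Search_Lucene_Analysis_Analyzer::setDefault(...) before both indexing and query parsing; check the manual for the version in use. The standalone sketch below does not use Zend at all: tokenizeAscii() and tokenizeUtf8() are hypothetical helpers that only illustrate the difference between an ASCII-only and a UTF-8 aware tokenizer, and the Russian sample phrase is made up.

```php
<?php
// Hypothetical helpers, NOT part of Zend_Search_Lucene. They only mimic the
// difference between an ASCII-only tokenizer and a UTF-8 aware one.

// ASCII-only: a "word" is a run of A-Z / a-z. The UTF-8 bytes of Cyrillic
// letters never fall in that range, so Russian words yield no tokens.
function tokenizeAscii(string $text): array
{
    preg_match_all('/[A-Za-z]+/', $text, $matches);
    return $matches[0];
}

// UTF-8 aware: \p{L} matches any Unicode letter, and the /u modifier makes
// PCRE treat both the pattern and the subject as UTF-8.
function tokenizeUtf8(string $text): array
{
    preg_match_all('/\p{L}+/u', $text, $matches);
    return $matches[0];
}

// Made-up sample value (assumed), standing in for the field from the post.
$field = 'Русский текст; english text';

echo implode(', ', tokenizeAscii($field)), PHP_EOL; // english, text
echo implode(', ', tokenizeUtf8($field)), PHP_EOL;  // Русский, текст, english, text
```

If this is the cause, presumably the index also has to be rebuilt after switching analyzers, since the terms already stored were produced by the old one.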

What is the problem with my code? Please help me find a solution.

Thank you, guys

Ma
Maxim Savenko
2008-07-24 12:17:42 UTC
Wil Sinclair
2008-07-24 18:38:35 UTC
Post by Maxim Savenko
8EB-F5F
I think this code is meant to get through one user's spam filter. I recommend
mailing that list member directly rather than putting such codes in mail to the
list. I'm sure these codes could be pretty confusing for those reading the
messages in the archive. :)

,Wil
