Discussion:
Problems indexing many documents
Lucio Torrico
2008-04-14 14:14:04 UTC
Hi!

I tried this:
require_once 'Zend/Search/Lucene.php';
Zend_Search_Lucene_Analysis_Analyzer::setDefault(
    new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8Num()
);
$index = new Zend_Search_Lucene('myindex', true);

$indexSourceDir = 'dirofhtmlsource';
$dir = opendir($indexSourceDir);

while (($file = readdir($dir)) !== false) {
    if (is_dir($indexSourceDir . '/' . $file)) {
        continue;
    }
    // I need parameter TRUE
    $doc = Zend_Search_Lucene_Document_Html::loadHTMLFile($indexSourceDir . '/' . $file, true);
    $index->addDocument($doc);
    flush();
}
closedir($dir);


It works fine with 50 .htm documents,
but it does not work with 1500 documents (I need that many or more).
Why?

Thanks and regards
Lucio Torrico
Eric Marden
2008-04-14 17:46:56 UTC
What does your server infrastructure look like? This operation is slow on Windows (due to file I/O bottlenecks in the OS) and is probably just tripping your memory or execution time limits. If you have access to php.ini, you can raise those limits there.

- Eric Marden
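
For readers who cannot edit php.ini, the same two limits can usually be raised from inside the indexing script itself. The following is only a minimal sketch of that idea: the values `512M` and `0` are assumptions chosen for illustration, not settings taken from this thread, and some hosts forbid overriding them at runtime.

```php
<?php
// Assumed values for illustration; tune them to the machine doing the indexing.
$memoryLimit = '512M';   // hypothetical ceiling, not a recommendation
$timeLimitSeconds = 0;   // 0 means "no execution time limit"

// ini_set() returns false if the setting could not be changed
// (for example when the host disallows overriding it).
if (ini_set('memory_limit', $memoryLimit) === false) {
    echo "Could not raise memory_limit; edit php.ini instead.\n";
}

// set_time_limit() restarts the timeout counter with the new limit.
set_time_limit($timeLimitSeconds);

// Print the effective settings so you can confirm they took effect.
echo 'memory_limit: ', ini_get('memory_limit'), "\n";
echo 'max_execution_time: ', ini_get('max_execution_time'), "\n";
```

Placing these calls at the top of the indexing script, before the directory loop starts, lets you check whether the limits are really what stops the run at 1500 documents.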


-----Original Message-----
From: Lucio Torrico [mailto:luciotorrico-***@public.gmane.org]
Sent: Mon 4/14/2008 10:14 AM
To: fw-formats-***@public.gmane.org
Subject: [fw-formats] Problems indexing many documents

