Discussion:
Leftover .sti files in the search index
Marin Krkač
2007-06-20 08:31:35 UTC
I'm using CLucene to optimize my search index. Optimization is done
after each update and updating is done in batches of up to several
hundred documents. If a document is updated (i.e., if it needs to be
removed from the index and re-indexed), I'm also using CLucene to remove
it from the search index prior to re-indexing it with Zend_Search_Lucene.

If I empty the search index and re-index everything, there is only one
big .cfs file in the search index. Later on, as documents are getting
updated (removed and re-indexed), my index just keeps growing. If I look
inside it, I see dozens of smaller .sti files in addition to one big
.cfs file.

I would appreciate any help in figuring out what's causing this (it
might be CLucene rather than Zend_Search_Lucene).
Alexander Veremyev
2007-06-20 22:22:11 UTC
Hi Marin,

You can remove these .sti files.
They are not part of the Lucene index format; they are used to increase
search speed.

These files are auto-generated when the index has been prepared or
optimized with another Lucene implementation.


PS I already have an idea to add directory clean up functionality...

With best regards,
Alexander Veremyev.
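
Editorial sketch: since the .sti files are described above as safe to remove, a manual cleanup could look like the following. This is a minimal Python illustration, not part of Zend_Search_Lucene (which is PHP); the function name `remove_sti_files` is hypothetical, and the sketch assumes the .sti files sit directly in the index directory and that no search process is reading the index while it runs.

```python
from pathlib import Path


def remove_sti_files(index_dir):
    """Delete every *.sti file directly inside index_dir.

    Returns the number of bytes freed. The name and behaviour are
    illustrative assumptions, not a Zend_Search_Lucene API.
    """
    freed = 0
    for sti in Path(index_dir).glob("*.sti"):
        if sti.is_file():
            freed += sti.stat().st_size  # measure before deleting
            sti.unlink()
    return freed
```

Running it after each CLucene optimization pass would keep the directory from accumulating stale files, at the cost of the cache being rebuilt on the next search.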
Marin Krkač
2007-06-21 21:35:48 UTC
Post by Alexander Veremyev
You can remove these .sti files.
They are not part of the Lucene index format; they are used to increase
search speed.
These files are auto-generated when the index has been prepared or
optimized with another Lucene implementation.
Could you please clarify that a bit more? I'm not sure I understood it
correctly.

If an index is created or optimized using another Lucene implementation,
Zend_Search_Lucene creates these files to speed up *its* search over
such an index.

Does it mean that CLucene (or some other implementation) and
Zend_Search_Lucene are not fully compatible?

If I'm using Zend_Search_Lucene to search the index and CLucene to
optimize it, won't Zend_Search_Lucene be slower if I remove these files?

I have many .sti files. One of them has the same name as the .cfs file.
Should I at least keep that one (I'm assuming that it might still be
useful and that the other ones are outdated)?
Alexander Veremyev
2007-06-22 13:14:56 UTC
Post by Marin Krkač
Post by Alexander Veremyev
You can remove these .sti files.
They are not part of the Lucene index format; they are used to increase
search speed.
These files are auto-generated when the index has been prepared or
optimized with another Lucene implementation.
Could you please clarify that a bit more? I'm not sure I understood it
correctly.
If an index is created or optimized using another Lucene implementation,
Zend_Search_Lucene creates these files to speed up *its* search over
such an index.
Yes. These files are used by Zend_Search_Lucene to speed up loading of
the term dictionary index. Each file is actually the loaded dictionary
index, dumped with the serialize() function.

If these files don't exist, Zend_Search_Lucene loads the information from
the .cfs files and creates .sti files for the next search request.

This increases the overall index size but reduces search time.
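
Editorial sketch of the caching pattern described above: look for a serialized sidecar file next to the segment, load it if present, otherwise build the data from the segment and write the sidecar for next time. This is an illustration in Python using `pickle` as a stand-in for PHP's serialize(); it is not Zend_Search_Lucene's actual code. `load_term_index` and the `build_index` callback are hypothetical names.

```python
import pickle
from pathlib import Path


def load_term_index(segment_path, build_index):
    """Return the term index for a segment, using a .sti-style sidecar cache.

    build_index is a hypothetical callable that parses the segment file
    (the slow path). Only unpickle cache files you created yourself.
    """
    segment = Path(segment_path)
    cache = segment.with_suffix(".sti")  # same name as the segment, .sti extension
    if cache.exists():
        # Fast path: reuse the previously serialized index.
        with cache.open("rb") as fh:
            return pickle.load(fh)
    # Slow path: build from the segment, then cache for the next request.
    index = build_index(segment)
    with cache.open("wb") as fh:
        pickle.dump(index, fh)
    return index
```

Deleting the sidecar file is harmless in this scheme: the next call simply takes the slow path once and recreates it.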
Post by Marin Krkač
Does it mean that CLucene (or some other implementation) and
Zend_Search_Lucene are not fully compatible?
No.
The index folder is used as a storage container for these files, which are
in principle not part of the index. They are a kind of cache for
preprocessed data.
If Zend_Search_Lucene can't find these files in the index directory, it
generates them itself, so it remains fully compatible with other Lucene
implementations.
Post by Marin Krkač
If I'm using Zend_Search_Lucene to search the index and CLucene to
optimize it, won't Zend_Search_Lucene be slower if I remove these files?
No. The necessary files will be generated at the first find request.
Post by Marin Krkač
I have many .sti files. One of them has the same name as the .cfs file.
Should I at least keep that one (I'm assuming that it might still be
useful and that the other ones are outdated)?
Yes, .sti files that have a corresponding .cfs file are still used by
Zend_Search_Lucene, so you can keep them.
But you can also simply remove all .sti files. Zend_Search_Lucene will
automatically generate the necessary .sti files at the next find request.


With best regards,
Alexander Veremyev.
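
Editorial sketch: to keep the .sti files that are still in use and drop only the stale ones, a cleanup could compare file names against the existing .cfs files. This Python illustration assumes, as the thread suggests, that a live .sti file shares its base name with a .cfs file in the same directory; `remove_orphaned_sti` is a hypothetical name, not a Zend_Search_Lucene function.

```python
from pathlib import Path


def remove_orphaned_sti(index_dir):
    """Delete .sti files that have no .cfs file with the same base name.

    Returns the sorted list of removed file names. Assumes the naming
    correspondence described in the thread holds for this index.
    """
    index = Path(index_dir)
    live = {cfs.stem for cfs in index.glob("*.cfs")}  # segments still present
    removed = []
    for sti in sorted(index.glob("*.sti")):
        if sti.stem not in live:
            sti.unlink()
            removed.append(sti.name)
    return removed
```

Compared with removing every .sti file, this avoids rebuilding the cache for the current segment on the next search.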
Marin Krkač
2007-06-22 22:00:48 UTC
Thank you for the clarification.
