Swank Wiki

Swank v0.04.04

Swank::Lucene

All indexing and searching capabilities are provided by the Swank::Lucene class.

By default, the Lucene index files are stored just outside the document root.  For example, if the document root is "/var/www/swank/root", the index files are stored in "/var/www/swank/lucene/root".

Special Features:

Any page with a "noindex" field set to true will not be indexed.

Link href values are indexed in a "_link" field, so pages that link to a specific page can be found with $sys->search(' _link:"/my/page" ').
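A small helper for building such a query. It is hypothetical and not part of Swank. It is not certain that the clucene 0.9.20 query parser honors backslash escapes inside a phrase, so the sketch rejects paths that contain a double quote instead of escaping them:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Swank): build the phrase query that
# finds pages linking to $path through the "_link" field.
sub link_query {
    my ($path) = @_;
    # A double quote would end the phrase early. Rather than rely on escaping
    # rules the bundled clucene parser may not support, refuse such paths.
    die "cannot build a _link query for a path containing '\"'\n" if $path =~ /"/;
    return qq{_link:"$path"};
}

print link_query('/my/page'), "\n";   # prints _link:"/my/page"
```

With Swank loaded, it would be called as $sys->search( link_query('/my/page') ).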

Code is indexed in a "_code" field, so you can search for pages containing certain code strings.

Dates are recognized by their ISO format and are indexed by day only (without the time) for efficiency.
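Here is a sketch of the day-only reduction. It assumes a value counts as a date when the whole field is an ISO 8601 date or date-time. It also assumes the indexed term uses the YYYYMMDD form shown in the date search syntax below. Swank's actual detection rule is not documented here, and iso_to_day_term is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical helper (not Swank's code): if the whole value is an ISO 8601
# date or date-time, such as "2008-03-14" or "2008-03-14T09:30:00", return
# the day-only term "20080314". The time and any zone offset are discarded.
sub iso_to_day_term {
    my ($value) = @_;
    return unless defined $value;
    my ($y, $m, $d) = $value =~
        /^\s*(\d{4})-(\d{2})-(\d{2})(?:[T ]\d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?(?:Z|[+-]\d{2}:?\d{2})?)?\s*$/
        or return;    # not an ISO date: would be indexed as ordinary text
    return "$y$m$d";
}

print iso_to_day_term('2008-03-14T09:30:00'), "\n";   # prints 20080314
```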

TODO: there are plans to support per-field indexing options (private, unindexed, keyword, etc.), but that depends on field-level metadata being implemented.

Also, Lucene has no query syntax for searching non-parsed (keyword) fields.  If you need an exact value, with no additional text before or after it, you must check the field values of the pages returned by the search yourself.
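Here is a sketch of that check. It assumes the hits have already been fetched, for example with all() on the results object, and that a page's field can be read as a plain value. Hash references stand in for Swank page objects, and exact_matches is a hypothetical helper, not part of Swank:

```perl
use strict;
use warnings;

# Hypothetical sketch: Lucene matches tokens, so a search for status:done
# may also return pages whose status is "not done" or "done late".
# Re-check the field on each returned page to keep only exact values.
sub exact_matches {
    my ($field, $wanted, @pages) = @_;
    return grep { defined $_->{$field} && $_->{$field} eq $wanted } @pages;
}

# Stand-ins for the pages a search such as status:done might return.
my @hits = (
    { path => '/a', status => 'done' },
    { path => '/b', status => 'not done' },
    { path => '/c', status => 'done late' },
);
my @exact = exact_matches('status', 'done', @hits);
print join(',', map { $_->{path} } @exact), "\n";   # prints /a
```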

Advanced and/or dubious searches

See the Java Lucene documentation for the full query syntax.

Date search:   date:[YYYYMMDD YYYYMMDD]

All records with "field" defined (not sure if/why this works):  field:[0 TO 0]

All records without "field" defined, found by taking all pages (every page has a path field) and removing those that have the field:  path:[0 TO 0] AND NOT field:[0 TO 0]

Field defined but empty???
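The query strings above can be assembled in code. Below is a minimal sketch with hypothetical helper names. The defined and undefined forms only reproduce the field:[0 TO 0] trick recorded on this page. That behavior is not explained, and it may not carry over to other Lucene versions:

```perl
use strict;
use warnings;

# Hypothetical helpers that assemble the "advanced" query strings above.
sub date_range_query {
    my ($field, $from, $to) = @_;    # days as YYYYMMDD
    for ($from, $to) {
        die "expected YYYYMMDD, got '$_'\n" unless /^\d{8}$/;
    }
    return "$field:[$from $to]";
}

# Relies on the unexplained field:[0 TO 0] behaviour noted on this page.
sub field_defined_query   { my ($f) = @_; return "$f:[0 TO 0]" }
sub field_undefined_query { my ($f) = @_; return 'path:[0 TO 0] AND NOT ' . field_defined_query($f) }

print date_range_query('date', '20080101', '20080131'), "\n";
# prints date:[20080101 20080131]
print field_undefined_query('author'), "\n";
# prints path:[0 TO 0] AND NOT author:[0 TO 0]
```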

Requires:

Swank::Lucene is based on the Perl Lucene module, which uses clucene-0.9.20; that version is compatible with Java Lucene 1.9.1.

Lucene Perl module.

clucene-core-0.9.20

Provides:

search( 'search string', [ options ... ] )

Provides the search function for the Swank system. Returns a Swank::Lucene::Results object.

'search string' is a Lucene search string.  Syntax summary:

word  -- does a full text search for word in any field

field:word  -- does a search for word in the given field only

word AND word  -- does a boolean search. AND, OR, and NOT must be upper case

"a phrase" / field:"a phrase"  -- searches for an exact phrase

[begin end] / field:[begin end]  -- does a range search

options may be:

sort => 'fieldname [desc], fieldname ...'  -- sorts results by the given field names, in the order listed; add desc after a field name to sort that field in descending order. The default is to sort by RELEVANCE.

sort => \&sub  -- a sort subroutine may also be given, subject to the usual restriction on sort subroutines passed to code in another package: $a and $b will not work, so give the subroutine a ($$) prototype and read the two values from @_ instead:

                     sub ($$) { $_[0] cmp $_[1] } 
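This restriction on sort => \&sub comes from Perl itself. An unprototyped comparator reads its operands from $a and $b in the package where sort is called. When another class runs the sort, those are not the caller's $a and $b. With a ($$) prototype, sort passes the two elements in @_ instead. Below is a self-contained illustration, where the Sorter package is a hypothetical stand-in for Swank's internals:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a class that sorts on the caller's behalf.
# An unprototyped comparator would get $Sorter::a and $Sorter::b here,
# not the $main::a and $main::b a comparator written in main would read.
package Sorter;
sub sort_with {
    my ($cmp, @items) = @_;
    return sort $cmp @items;
}

package main;

# The ($$) prototype makes sort pass the two elements in @_.
my $by_length_then_text = sub ($$) {
    length($_[0]) <=> length($_[1]) or $_[0] cmp $_[1];
};

my @sorted = Sorter::sort_with($by_length_then_text, qw(ccc a bb ab));
print "@sorted\n";    # prints a ab bb ccc
```

Under use feature 'signatures' (enabled by use v5.36), sub ($$) is parsed as a signature. In that case write the prototype as an attribute instead: sub :prototype($$) { ... }.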

refresh => 1 -- closes and reopens the internal Lucene objects.

lucene()  -- provides access to the Swank::Lucene::Lucene helper class.

Overrides:

write()  -- indexes page objects after they are written.

delete()  -- de-indexes deleted objects.

Support pages:

/search

/searchbox

Swank::Lucene::Lucene

This helper class does the actual searching and indexing.  It is usually not necessary to access it directly.

API:

index()  -- called by Swank::Lucene::write to index a page.

delete() -- called by Swank::Lucene::delete to un-index a page.

search() -- called by Swank::Lucene::search to do searches.

reindex() -- clears the index and reindexes all pages returned by $sys->storage->_enumerate

optimize() -- optimizes the lucene index.
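The reindex() description amounts to "empty the index, then index every stored page". Below is a sketch of that control flow using in-memory stand-ins. MemIndex and the $enumerate callback are hypothetical. The callback is assumed to return plain page records in place of $sys->storage->_enumerate. Skipping pages with a true noindex field follows the Special Features rule, not anything documented for reindex() itself:

```perl
use strict;
use warnings;

# Hypothetical in-memory index standing in for the clucene-backed one.
package MemIndex;
sub new   { my ($class) = @_; return bless { docs => {} }, $class }
sub clear { my ($self) = @_; %{ $self->{docs} } = (); return }
sub index {
    my ($self, $page) = @_;
    return if $page->{noindex};            # pages with a true "noindex" are skipped
    $self->{docs}{ $page->{path} } = $page;
    return 1;
}
sub count { my ($self) = @_; return scalar keys %{ $self->{docs} } }

package main;

# Sketch of reindex(): start from an empty index, then index every page
# the enumerator returns. Returns the number of indexed pages.
sub reindex {
    my ($index, $enumerate) = @_;
    $index->clear;
    $index->index($_) for $enumerate->();
    return $index->count;
}

my $index = MemIndex->new;
my @pages = (
    { path => '/home' },
    { path => '/draft', noindex => 1 },
    { path => '/about' },
);
print reindex($index, sub { @pages }), "\n";   # prints 2
```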

Swank::Lucene::Results

Encapsulates the results from a lucene search.

API:

length() -- number of hits returned by the search.

next() -- returns the page object for the next hit in the search.

get( index )  -- returns a specific hit in the search, numbered 0 .. length()-1

all() -- returns a list of page objects for all hits from the search.
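Below is a sketch of how the four Results methods fit together. It uses a hypothetical MockResults class that holds plain strings instead of page objects. It assumes next() returns undef once the hits are exhausted, which this page does not state:

```perl
use strict;
use warnings;

# Hypothetical stand-in for Swank::Lucene::Results, holding plain values.
package MockResults;
sub new    { my ($class, @hits) = @_; return bless { hits => [@hits], pos => 0 }, $class }
sub length { my ($self) = @_; return scalar @{ $self->{hits} } }
sub get    { my ($self, $i) = @_; return $self->{hits}[$i] }
sub all    { my ($self) = @_; return @{ $self->{hits} } }
sub next {
    my ($self) = @_;
    return if $self->{pos} >= $self->length;   # undef in scalar context when exhausted
    return $self->{hits}[ $self->{pos}++ ];
}

package main;

my $results = MockResults->new('/a', '/b', '/c');

# Iterate with next() until it returns nothing.
my @seen;
while (defined(my $page = $results->next)) {
    push @seen, $page;
}

# Random access with get(), using indexes 0 .. length()-1.
my $last = $results->get($results->length - 1);

print "@seen | $last\n";   # prints /a /b /c | /c
```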