org.exist.indexing.impl
Class NGramIndexWorker

java.lang.Object
  extended by org.exist.indexing.impl.NGramIndexWorker
All Implemented Interfaces:
IndexWorker

public class NGramIndexWorker
extends Object
implements IndexWorker

Each index entry maps a key (collectionId, ngram) to a list of occurrences, which has the following structure:

[docId : int, nameType: byte, occurrenceCount: int, entrySize: long, [id: NodeId, offset: int, ...]* ]
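The layout above can be illustrated with a minimal sketch. It is not the actual eXist storage code. It assumes fixed-width big-endian fields, models each NodeId as a plain int (the real NodeId encoding is not described here), and treats every occurrence as exactly one (id, offset) pair. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

class OccurrenceListSketch {
    // Header: docId (int) + nameType (byte) + occurrenceCount (int) + entrySize (long)
    static final int HEADER_BYTES = 4 + 1 + 4 + 8;
    // Assumption: each occurrence is one int id followed by one int offset
    static final int PAIR_BYTES = 4 + 4;

    // Serialize the occurrences of one n-gram within one document.
    static byte[] encode(int docId, byte nameType, int[] nodeIds, int[] offsets) {
        if (nodeIds.length != offsets.length) {
            throw new IllegalArgumentException("ids and offsets must have equal length");
        }
        long entrySize = (long) nodeIds.length * PAIR_BYTES;
        ByteBuffer buf = ByteBuffer.allocate(HEADER_BYTES + (int) entrySize);
        buf.putInt(docId).put(nameType).putInt(nodeIds.length).putLong(entrySize);
        for (int i = 0; i < nodeIds.length; i++) {
            buf.putInt(nodeIds[i]).putInt(offsets[i]);
        }
        return buf.array();
    }

    // Read the header, jump over the occurrence pairs, and return the docId.
    // Storing entrySize lets a reader skip documents it is not interested in.
    static int skipBlock(ByteBuffer buf) {
        int docId = buf.getInt();
        buf.get();      // nameType
        buf.getInt();   // occurrenceCount
        long entrySize = buf.getLong();
        buf.position(buf.position() + (int) entrySize);
        return docId;
    }
}
```

With two occurrences, the encoded block is 17 header bytes plus 16 body bytes, and skipBlock leaves the buffer positioned at the start of the next document's block.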


Nested Class Summary
 class NGramIndexWorker.NGramMatch
           
 
Constructor Summary
NGramIndexWorker(NGramIndex index)
           
 
Method Summary
 Object configure(IndexController controller, NodeList configNodes, Map namespaces)
          Read an index configuration from a collection.xconf configuration document.
 void flush()
          Flush the index.
 String[] getDistinctNGrams(CharSequence text)
           
 Index getIndex()
           
 String getIndexId()
          Returns an ID which uniquely identifies this index.
 String getIndexName()
          Returns a name which uniquely identifies this index.
 StreamListener getListener(int mode, DocumentImpl document)
          Return a stream listener to index the specified document in the specified mode.
 MatchListener getMatchListener(NodeProxy proxy)
          Returns a MatchListener, which can be used to filter (and manipulate) the XML output generated by the serializer when serializing query results.
 int getN()
           
 StoredNode getReindexRoot(StoredNode node, NodePath path, boolean includeSelf)
          When adding or removing nodes to or from the document tree, it might become necessary to reindex some parts of the tree, in particular if indexes are defined on mixed content nodes.
 void removeCollection(Collection collection)
          Remove all indexes for the given collection, its subcollections and all resources.
 Occurrences[] scanIndex(DocumentSet docs)
           
 NodeSet search(int contextId, DocumentSet docs, List qnames, String ngram, XQueryContext context, NodeSet contextSet, int axis)
           
 void setDocument(DocumentImpl document, int newMode)
          Notify this worker to operate on the specified document, using the mode given.
 String[] tokenize(CharSequence text)
           
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

NGramIndexWorker

public NGramIndexWorker(NGramIndex index)
Method Detail

getIndexId

public String getIndexId()
Description copied from interface: IndexWorker
Returns an ID which uniquely identifies this index. This will usually be the class name.

Specified by:
getIndexId in interface IndexWorker
Returns:
a unique ID identifying this index.

getIndexName

public String getIndexName()
Description copied from interface: IndexWorker
Returns a name which uniquely identifies this index.

Specified by:
getIndexName in interface IndexWorker
Returns:
a unique name identifying this index.

getIndex

public Index getIndex()

getN

public int getN()

configure

public Object configure(IndexController controller,
                        NodeList configNodes,
                        Map namespaces)
                 throws DatabaseConfigurationException
Description copied from interface: IndexWorker
Read an index configuration from a collection.xconf configuration document. This method is called by the CollectionConfiguration while reading the collection.xconf configuration file for a given collection. The configNodes parameter lists all top-level child nodes below the <index> element in the collection.xconf. The IndexWorker should scan this list and handle those elements it understands. The returned Object will be stored in the collection configuration structure associated with each collection. It can later be retrieved from the collection configuration, e.g. to check if a given node should be indexed or not.

Specified by:
configure in interface IndexWorker
Parameters:
configNodes - lists the top-level child nodes below the <index> element in collection.xconf
namespaces - the active prefix/namespace map
Returns:
an arbitrary configuration object to be kept for this index in the collection configuration
Throws:
DatabaseConfigurationException - if a configuration error occurs
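
The scanning step described for configure can be sketched as follows. This is a hedged illustration using only the standard DOM API, not the worker's real implementation. The element name `ngram` and the attribute `qname` are assumptions made for the example, and `ConfigScanSketch` with its methods is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ConfigScanSketch {
    // Walk the top-level children of <index> and keep only the elements
    // this worker understands (assumed here: <ngram qname="..."/>).
    static List<String> collectQNames(NodeList configNodes) {
        List<String> qnames = new ArrayList<>();
        for (int i = 0; i < configNodes.getLength(); i++) {
            Node node = configNodes.item(i);
            if (node.getNodeType() == Node.ELEMENT_NODE
                    && "ngram".equals(node.getLocalName())) {
                qnames.add(((Element) node).getAttribute("qname"));
            }
        }
        return qnames;
    }

    // Helper for the example only: parse a string and return the
    // child nodes of its root element.
    static NodeList parseIndexChildren(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getLocalName()
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getChildNodes();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse example configuration", e);
        }
    }
}
```

Elements the worker does not recognize are skipped silently, which matches the instruction that each IndexWorker handles only the elements it understands. The returned list stands in for the arbitrary configuration object.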

flush

public void flush()
Description copied from interface: IndexWorker
Flush the index. This method will be called when indexing a document. The implementation should immediately process all data it has buffered (if there is any), release as many memory resources as it can and prepare for being reused for a different job.

Specified by:
flush in interface IndexWorker
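
The flush contract (process buffered data, release memory, be ready for reuse) might look like the sketch below. It is a simplified in-memory stand-in under stated assumptions: the `pending` and `store` maps, and every name in the class, are hypothetical and do not reflect how the worker actually persists entries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class BufferingWorkerSketch {
    // Entries collected while a document is being indexed.
    private final Map<String, List<Integer>> pending = new HashMap<>();
    // Stand-in for the persistent index (assumption: a sorted in-memory map).
    private final Map<String, List<Integer>> store = new TreeMap<>();

    // Buffer one occurrence of an n-gram at the given offset.
    void add(String ngram, int offset) {
        pending.computeIfAbsent(ngram, k -> new ArrayList<>()).add(offset);
    }

    // Write everything that is buffered, then drop the buffer so the
    // worker holds no per-document state and can be reused.
    void flush() {
        for (Map.Entry<String, List<Integer>> e : pending.entrySet()) {
            store.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).addAll(e.getValue());
        }
        pending.clear();
    }

    int pendingSize() {
        return pending.size();
    }

    List<Integer> stored(String ngram) {
        return store.getOrDefault(ngram, new ArrayList<>());
    }
}
```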

removeCollection

public void removeCollection(Collection collection)
Description copied from interface: IndexWorker
Remove all indexes for the given collection, its subcollections and all resources.

Specified by:
removeCollection in interface IndexWorker

search

public NodeSet search(int contextId,
                      DocumentSet docs,
                      List qnames,
                      String ngram,
                      XQueryContext context,
                      NodeSet contextSet,
                      int axis)
               throws TerminatedException
Throws:
TerminatedException

scanIndex

public Occurrences[] scanIndex(DocumentSet docs)
Specified by:
scanIndex in interface IndexWorker

getListener

public StreamListener getListener(int mode,
                                  DocumentImpl document)
Description copied from interface: IndexWorker
Return a stream listener to index the specified document in the specified mode. There will never be more than one StreamListener being used per thread, so it is safe for the implementation to reuse a single StreamListener. Parameter mode specifies the type of the current operation.

Specified by:
getListener in interface IndexWorker
Parameters:
mode - one of StreamListener.STORE, StreamListener.REMOVE_NODES or StreamListener.REMOVE_ALL_NODES.
document - the document to be indexed.
Returns:
a StreamListener

getMatchListener

public MatchListener getMatchListener(NodeProxy proxy)
Description copied from interface: IndexWorker
Returns a MatchListener, which can be used to filter (and manipulate) the XML output generated by the serializer when serializing query results. The method should return null if the implementation is not interested in receiving serialization events.

Specified by:
getMatchListener in interface IndexWorker
Parameters:
proxy - the NodeProxy which is being serialized
Returns:
a MatchListener or null if the implementation does not want to receive serialization events

getReindexRoot

public StoredNode getReindexRoot(StoredNode node,
                                 NodePath path,
                                 boolean includeSelf)
Description copied from interface: IndexWorker
When adding or removing nodes to or from the document tree, it might become necessary to reindex some parts of the tree, in particular if indexes are defined on mixed content nodes. This method will call IndexWorker.getReindexRoot(org.exist.dom.StoredNode, org.exist.storage.NodePath, boolean) on each configured index. It will then return the top-most root.

Specified by:
getReindexRoot in interface IndexWorker
Parameters:
node - the node to be modified.
path - the NodePath of the node
includeSelf - if set to true, the current node itself will be included in the check
Returns:
the top-most root node to be reindexed

tokenize

public String[] tokenize(CharSequence text)

getDistinctNGrams

public String[] getDistinctNGrams(CharSequence text)
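
Neither tokenize nor getDistinctNGrams is documented here, so the following is only a sketch of what n-gram tokenization generally means, not the behaviour of these methods. It assumes a sliding window of n UTF-16 chars (the real worker would take n from getN()), returns no n-grams for text shorter than n, and keeps distinct n-grams in first-seen order. All names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class NGramSketch {
    // Slide a window of n chars over the text, one position at a time.
    static String[] tokenize(CharSequence text, int n) {
        int count = Math.max(0, text.length() - n + 1);
        String[] grams = new String[count];
        for (int i = 0; i < count; i++) {
            grams[i] = text.subSequence(i, i + n).toString();
        }
        return grams;
    }

    // Same window, but each n-gram is reported once, in first-seen order.
    static String[] distinctNGrams(CharSequence text, int n) {
        Set<String> seen = new LinkedHashSet<>();
        for (String gram : tokenize(text, n)) {
            seen.add(gram);
        }
        return seen.toArray(new String[0]);
    }
}
```

For the text "banana" with n = 3, this tokenize yields ban, ana, nan, ana, while distinctNGrams yields ban, ana, nan.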

setDocument

public void setDocument(DocumentImpl document,
                        int newMode)
Description copied from interface: IndexWorker
Notify this worker to operate on the specified document, using the mode given. The mode will be one of StreamListener.STORE, StreamListener.REMOVE_NODES or StreamListener.REMOVE_ALL_NODES.

Specified by:
setDocument in interface IndexWorker
Parameters:
document - the document which is processed
newMode - the current operation mode


Copyright (C) Wolfgang Meier. All rights reserved.