Package org.carrot2.text.preprocessing
Class PreprocessedDocumentScanner
java.lang.Object
org.carrot2.text.preprocessing.PreprocessedDocumentScanner
Iterates over tokenized documents in PreprocessingContext.
Field Summary
Fields
static final com.carrotsearch.hppc.predicates.ShortPredicate ON_DOCUMENT_SEPARATOR
Predicate for splitting on document separator.
static final com.carrotsearch.hppc.predicates.ShortPredicate ON_FIELD_SEPARATOR
Predicate for splitting on field separator.
static final com.carrotsearch.hppc.predicates.ShortPredicate ON_SENTENCE_SEPARATOR
Predicate for splitting on sentence separator.
Constructor Summary
Constructors
PreprocessedDocumentScanner()
Method Summary
protected void document(PreprocessingContext context, int start, int length)
Invoked for each document.
static final com.carrotsearch.hppc.predicates.ShortPredicate equalTo(short t)
Return a new ShortPredicate returning true if the argument equals a given value.
protected void field(PreprocessingContext context, int start, int length)
Invoked for each document's field.
final void iterate(PreprocessingContext context)
Iterate over all documents, fields and sentences in PreprocessingContext.allTokens.
protected void sentence(PreprocessingContext context, int start, int length)
Invoked for each document's sentence.
Field Details
ON_DOCUMENT_SEPARATOR
public static final com.carrotsearch.hppc.predicates.ShortPredicate ON_DOCUMENT_SEPARATOR
Predicate for splitting on document separator.
ON_FIELD_SEPARATOR
public static final com.carrotsearch.hppc.predicates.ShortPredicate ON_FIELD_SEPARATOR
Predicate for splitting on field separator.
ON_SENTENCE_SEPARATOR
public static final com.carrotsearch.hppc.predicates.ShortPredicate ON_SENTENCE_SEPARATOR
Predicate for splitting on sentence separator.
Constructor Details
PreprocessedDocumentScanner
public PreprocessedDocumentScanner()
Method Details
equalTo
public static final com.carrotsearch.hppc.predicates.ShortPredicate equalTo(short t)
Return a new ShortPredicate returning true if the argument equals a given value.
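
As a rough illustration of what equalTo provides, the sketch below builds an equality predicate over short values. The nested ShortPredicate interface is a local stand-in, assumed to mirror the shape of the HPPC interface (a single apply(short) method), and the class name EqualToSketch is hypothetical. This is not Carrot2's source.

```java
public class EqualToSketch {
  /** Local stand-in for the HPPC ShortPredicate interface (assumed shape). */
  @FunctionalInterface
  public interface ShortPredicate {
    boolean apply(short value);
  }

  /** Returns a new predicate that is true only when its argument equals t. */
  public static ShortPredicate equalTo(short t) {
    return value -> value == t;
  }

  public static void main(String[] args) {
    // 7 is an arbitrary value chosen for the demonstration.
    ShortPredicate isSeven = equalTo((short) 7);
    System.out.println(isSeven.apply((short) 7));
    System.out.println(isSeven.apply((short) 8));
  }
}
```

Each call returns a fresh predicate bound to one value, which is presumably how separator predicates such as ON_SENTENCE_SEPARATOR can be expressed as constants.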
iterate
public final void iterate(PreprocessingContext context)
Iterate over all documents, fields and sentences in PreprocessingContext.allTokens.
document
protected void document(PreprocessingContext context, int start, int length)
Invoked for each document. Splits further into fields.
field
protected void field(PreprocessingContext context, int start, int length)
Invoked for each document's field. Splits further into sentences.
sentence
protected void sentence(PreprocessingContext context, int start, int length)
Invoked for each document's sentence.
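
The document, field and sentence callbacks described above form a three-level split of one token array. The following self-contained sketch mimics that pattern without depending on Carrot2: the class ScannerSketch, the RangeCallback interface, the token type codes and the decision to skip empty ranges are all assumptions made for illustration, and the real class works on a PreprocessingContext rather than a bare short[].

```java
import java.util.ArrayList;
import java.util.List;

public class ScannerSketch {
  // Hypothetical token type codes; these are not Carrot2's actual constants.
  public static final short TERM = 0;
  public static final short SENTENCE_SEP = 1;
  public static final short FIELD_SEP = 2;
  public static final short DOCUMENT_SEP = 3;

  /** Records every callback as "kind start length" so the order can be inspected. */
  public final List<String> events = new ArrayList<>();

  /** Callback receiving one token range (hypothetical helper type). */
  private interface RangeCallback {
    void accept(short[] types, int start, int length);
  }

  /** Entry point: splits the whole array on document separators. */
  public final void iterate(short[] types) {
    split(types, 0, types.length, DOCUMENT_SEP, this::document);
  }

  /** Invoked for each document; splits further into fields. */
  protected void document(short[] types, int start, int length) {
    events.add("document " + start + " " + length);
    split(types, start, length, FIELD_SEP, this::field);
  }

  /** Invoked for each field; splits further into sentences. */
  protected void field(short[] types, int start, int length) {
    events.add("field " + start + " " + length);
    split(types, start, length, SENTENCE_SEP, this::sentence);
  }

  /** Invoked for each sentence; the lowest level, so no further split. */
  protected void sentence(short[] types, int start, int length) {
    events.add("sentence " + start + " " + length);
  }

  /** Cuts [start, start + length) at each separator, skipping empty ranges. */
  private static void split(
      short[] types, int start, int length, short separator, RangeCallback callback) {
    int end = start + length;
    int from = start;
    for (int i = start; i <= end; i++) {
      if (i == end || types[i] == separator) {
        if (i > from) {
          callback.accept(types, from, i - from);
        }
        from = i + 1;
      }
    }
  }

  public static void main(String[] args) {
    short[] types = {TERM, TERM, SENTENCE_SEP, TERM, FIELD_SEP, TERM, DOCUMENT_SEP, TERM};
    ScannerSketch scanner = new ScannerSketch();
    scanner.iterate(types);
    scanner.events.forEach(System.out::println);
  }
}
```

A subclass would typically override only the level it cares about, for example sentence to collect per-sentence statistics. In this sketch, an override of document or field has to call the superclass method for the lower levels to be visited, which matches the "splits further" wording in the method descriptions.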