Package org.carrot2.text.preprocessing
Class DocumentAssigner
java.lang.Object
org.carrot2.attrs.AttrComposite
org.carrot2.text.preprocessing.DocumentAssigner
- All Implemented Interfaces:
AcceptingVisitor
Assigns documents to label candidates. For each label candidate from PreprocessingContext.AllLabels.featureIndex, a BitSet with the assigned documents is constructed. The assignment algorithm is simple: to be assigned to a label, a document must contain at least one occurrence of each non-stop word from the label.
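The assignment rule described above can be sketched in a few lines. This is only an illustration of the rule, not the Carrot2 implementation: the class name LabelAssignmentSketch, the method assign, and the representation of documents and labels as lists of lower-cased word strings are all assumptions made for this example. The real class works on the integer-coded data held in the PreprocessingContext.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative sketch only; names and data layout are hypothetical, not Carrot2 API. */
public class LabelAssignmentSketch {

  /**
   * Returns a BitSet in which bit i is set when document i contains
   * at least one occurrence of every non-stop word of the label.
   */
  static BitSet assign(
      List<String> labelWords, Set<String> stopWords, List<List<String>> documents) {
    // Only the non-stop words of the label are required to occur.
    List<String> required = new ArrayList<>();
    for (String word : labelWords) {
      if (!stopWords.contains(word)) {
        required.add(word);
      }
    }

    BitSet assigned = new BitSet(documents.size());
    if (required.isEmpty()) {
      // Assumption of this sketch: a label made only of stop words matches nothing.
      return assigned;
    }

    for (int i = 0; i < documents.size(); i++) {
      // Word order and repetition do not matter, so a set is enough.
      Set<String> docWords = new HashSet<>(documents.get(i));
      if (docWords.containsAll(required)) {
        assigned.set(i);
      }
    }
    return assigned;
  }
}
```

Using a BitSet indexed by document number mirrors the description above: one bit per document, one BitSet per label candidate.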
This class saves the following results to the PreprocessingContext:
This class requires that InputTokenizer, CaseNormalizer, StopListMarker, PhraseExtractor and LabelFilterProcessor be invoked first.
-
Field Summary
Fields
exactPhraseAssignment: Only exact phrase assignments.
minClusterSize: Minimum required number of documents in each cluster.
Fields inherited from class org.carrot2.attrs.AttrComposite
attributes
-
Constructor Summary
Constructors -
Method Summary
Methods inherited from class org.carrot2.attrs.AttrComposite
accept
-
Field Details
-
exactPhraseAssignment
Only exact phrase assignments. When set to true, clusters will contain only the documents that contain the cluster's label in its original form, including the order of words. Enabling this option puts fewer documents in clusters, which increases the precision of assignment but also enlarges the "Other Topics" group. Disabling it puts more documents in clusters, which shrinks the "Other Topics" group but lowers the precision of cluster-document assignments.
-
minClusterSize
Minimum required number of documents in each cluster. Clusters containing fewer documents will not be created.
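The effect of the two fields can be illustrated with a small sketch. It is not the Carrot2 implementation: the class AssignmentOptionsSketch, its methods, and the use of plain word lists and a label-to-BitSet map are assumptions made for this example. The sketch treats "exact phrase" as the label's words occurring contiguously and in order, and treats the minimum cluster size as a filter on the number of assigned documents.

```java
import java.util.BitSet;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; names and data layout are hypothetical, not Carrot2 API. */
public class AssignmentOptionsSketch {

  /** True when the label's words occur in the document contiguously and in the same order. */
  static boolean containsExactPhrase(List<String> document, List<String> label) {
    return !label.isEmpty() && Collections.indexOfSubList(document, label) >= 0;
  }

  /** Assigns to the label only those documents that contain it as an exact phrase. */
  static BitSet assignExact(List<String> label, List<List<String>> documents) {
    BitSet assigned = new BitSet(documents.size());
    for (int i = 0; i < documents.size(); i++) {
      if (containsExactPhrase(documents.get(i), label)) {
        assigned.set(i);
      }
    }
    return assigned;
  }

  /** Keeps only the candidate clusters with at least minClusterSize assigned documents. */
  static Map<String, BitSet> filterBySize(Map<String, BitSet> candidates, int minClusterSize) {
    Map<String, BitSet> kept = new LinkedHashMap<>();
    for (Map.Entry<String, BitSet> entry : candidates.entrySet()) {
      if (entry.getValue().cardinality() >= minClusterSize) {
        kept.put(entry.getKey(), entry.getValue());
      }
    }
    return kept;
  }
}
```

In this sketch, a document reading "the data are mining results" would match the label "data mining" under the word-based rule but not under the exact phrase rule, which is the precision trade-off described for exactPhraseAssignment.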
-
-
Constructor Details
-
DocumentAssigner
public DocumentAssigner()
-