public class Segmentation extends AbstractParser
Fields inherited from class AbstractParser: analyzer, cntManager
Constructor and Description |
---|
Segmentation() TODO some documentation... |
Modifier and Type | Method and Description |
---|---|
void | close() |
void | createBlankTrainingData(java.io.File file, java.lang.String pathFullText, java.lang.String pathTEI, int id) Get the content of the pdf and produce a blank training data TEI file, i.e. |
void | createTrainingSegmentation(java.lang.String inputFile, java.lang.String pathFullText, java.lang.String pathTEI, int id) Process the content of the specified pdf and format the result as training data. |
java.lang.String | getAllLinesFeatured(Document doc) Addition of the features at line level for the complete document. |
Document | prepareDocument(Document doc) |
Document | processing(DocumentSource documentSource, GrobidAnalysisConfig config) Segment a PDF document into high level zones: cover page, document header, page footer, page header, body, page numbers, biblio section and annexes. |
Document | processing(java.lang.String text) |
java.lang.StringBuffer | trainingExtraction(java.lang.String result, java.util.List<LayoutToken> tokenizations, Document doc) Extract results from a labelled full text in the training format without any string modification. |
Methods inherited from class AbstractParser: label, label
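
The description of `processing(DocumentSource, GrobidAnalysisConfig)` lists the high-level zones, and `getAllLinesFeatured` indicates that features are computed per line. As a rough illustration of what a line-level segmentation result could look like, the sketch below collapses a sequence of per-line zone labels into contiguous zone runs. It is not GROBID's implementation: `ZoneSketch`, `Zone`, `ZoneRun` and `groupByZone` are hypothetical names introduced only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ZoneSketch {
    // Hypothetical labels mirroring the zones named in the method description.
    enum Zone {
        COVER_PAGE, DOCUMENT_HEADER, PAGE_FOOTER, PAGE_HEADER,
        BODY, PAGE_NUMBER, BIBLIO_SECTION, ANNEX
    }

    // A run of consecutive lines that share the same zone label (hypothetical type).
    static final class ZoneRun {
        final Zone zone;
        final int firstLine;
        final int lastLine;

        ZoneRun(Zone zone, int firstLine, int lastLine) {
            this.zone = zone;
            this.firstLine = firstLine;
            this.lastLine = lastLine;
        }

        @Override
        public String toString() {
            return zone + "[" + firstLine + ".." + lastLine + "]";
        }
    }

    // Collapse per-line labels into contiguous runs, preserving document order.
    static List<ZoneRun> groupByZone(List<Zone> lineLabels) {
        List<ZoneRun> runs = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= lineLabels.size(); i++) {
            boolean atEnd = i == lineLabels.size();
            if (atEnd || lineLabels.get(i) != lineLabels.get(start)) {
                runs.add(new ZoneRun(lineLabels.get(start), start, i - 1));
                start = i;
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        // Assumed labels for a five-line toy document.
        List<Zone> labels = Arrays.asList(
            Zone.DOCUMENT_HEADER, Zone.DOCUMENT_HEADER,
            Zone.BODY, Zone.BODY,
            Zone.BIBLIO_SECTION);
        for (ZoneRun run : groupByZone(labels)) {
            System.out.println(run);
        }
    }
}
```

Grouping by contiguous runs rather than by label keeps repeated zones separate, for example a page header that recurs on every page.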
public Document processing(DocumentSource documentSource, GrobidAnalysisConfig config)
documentSource - document source
public Document processing(java.lang.String text)
public java.lang.String getAllLinesFeatured(Document doc)
public void createTrainingSegmentation(java.lang.String inputFile, java.lang.String pathFullText, java.lang.String pathTEI, int id)
inputFile - input file
pathFullText - path to fulltext
pathTEI - path to TEI
id - id
public void createBlankTrainingData(java.io.File file, java.lang.String pathFullText, java.lang.String pathTEI, int id)
file - input file
pathFullText - path to fulltext
pathTEI - path to TEI
id - id
public java.lang.StringBuffer trainingExtraction(java.lang.String result, java.util.List<LayoutToken> tokenizations, Document doc)
result - result
tokenizations - tokens
public void close() throws java.io.IOException
Specified by: close in interface java.io.Closeable
Specified by: close in interface java.lang.AutoCloseable
Overrides: close in class AbstractParser
Throws: java.io.IOException
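
Because `close()` is specified by `java.io.Closeable` (and therefore `java.lang.AutoCloseable`), an instance can be managed with try-with-resources. The sketch below shows that pattern with a stand-in class instead of the real parser, so it runs without GROBID on the classpath. `CloseSketch`, `StubParser`, and the `String` return type of its `processing` method are assumptions made for this illustration only.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

class CloseSketch {
    // Hypothetical stand-in for a parser that, like Segmentation, implements Closeable.
    static final class StubParser implements Closeable {
        boolean closed = false;

        // Placeholder for real work; the actual method returns a Document.
        String processing(String text) {
            if (closed) {
                throw new IllegalStateException("parser already closed");
            }
            return "processed:" + text;
        }

        @Override
        public void close() throws IOException {
            closed = true;
        }
    }

    // try-with-resources calls close() on exit, whether processing succeeds or throws.
    static String run(StubParser parser, String text) {
        try (StubParser p = parser) {
            return p.processing(text);
        } catch (IOException e) {
            // close() declares IOException, so the caller has to handle or wrap it.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StubParser parser = new StubParser();
        System.out.println(run(parser, "some text"));
        System.out.println("closed = " + parser.closed);
    }
}
```

With the real class the same shape would apply: declare the parser in the `try` header and handle the `java.io.IOException` that `close()` may throw.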