public class Segmentation extends AbstractParser

Fields inherited from class AbstractParser: analyzer, cntManager

| Constructor and Description |
|---|
| Segmentation()<br>TODO some documentation... |

| Modifier and Type | Method and Description |
|---|---|
| void | close() |
| void | createBlankTrainingData(java.io.File file, java.lang.String pathFullText, java.lang.String pathTEI, int id)<br>Get the content of the pdf and produce a blank training data TEI file, i.e. |
| void | createTrainingSegmentation(java.lang.String inputFile, java.lang.String pathFullText, java.lang.String pathTEI, int id)<br>Process the content of the specified pdf and format the result as training data. |
| java.lang.String | getAllLinesFeatured(Document doc)<br>Addition of the features at line level for the complete document. |
| Document | prepareDocument(Document doc) |
| Document | processing(DocumentSource documentSource, GrobidAnalysisConfig config)<br>Segment a PDF document into high level zones: cover page, document header, page footer, page header, body, page numbers, biblio section and annexes. |
| Document | processing(java.lang.String text) |
| java.lang.StringBuffer | trainingExtraction(java.lang.String result, java.util.List<LayoutToken> tokenizations, Document doc)<br>Extract results from a labelled full text in the training format without any string modification. |
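
The zone list in the description of processing(DocumentSource, GrobidAnalysisConfig) can be pictured as one label per line, with consecutive lines of the same label merged into a zone. The sketch below is only an illustration of that idea: the ZoneSketch class, the Zone enum and the Block type are hypothetical names that are not part of this API, and the labels are supplied by hand rather than predicted by a model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; none of these types belong to the documented API.
class ZoneSketch {

    // Zone names follow the description of processing(); the enum itself is an assumption.
    enum Zone { COVER_PAGE, HEADER, PAGE_FOOTER, PAGE_HEADER, BODY, PAGE_NUMBER, BIBLIO, ANNEX }

    // A run of consecutive lines that share one zone label.
    static final class Block {
        final Zone zone;
        final List<String> lines = new ArrayList<>();

        Block(Zone zone) {
            this.zone = zone;
        }
    }

    // Merge consecutive lines carrying the same label into blocks, keeping document order.
    static List<Block> group(List<String> lines, List<Zone> labels) {
        if (lines.size() != labels.size()) {
            throw new IllegalArgumentException("one label per line is required");
        }
        List<Block> blocks = new ArrayList<>();
        Block current = null;
        for (int i = 0; i < lines.size(); i++) {
            if (current == null || current.zone != labels.get(i)) {
                current = new Block(labels.get(i));
                blocks.add(current);
            }
            current.lines.add(lines.get(i));
        }
        return blocks;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "A Study of Things", "Jane Doe", "1 Introduction", "Some text", "[1] A reference");
        List<Zone> labels = Arrays.asList(
                Zone.HEADER, Zone.HEADER, Zone.BODY, Zone.BODY, Zone.BIBLIO);
        for (Block b : group(lines, labels)) {
            System.out.println(b.zone + ": " + b.lines.size() + " line(s)");
        }
    }
}
```

Grouping by consecutive runs rather than by label alone keeps repeated zones, such as a page header on every page, as separate blocks in reading order.
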
Methods inherited from class AbstractParser: label, label

public Document processing(DocumentSource documentSource,
GrobidAnalysisConfig config)
Parameters:
documentSource - document source

public Document processing(java.lang.String text)

public java.lang.String getAllLinesFeatured(Document doc)
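
This page does not say which features getAllLinesFeatured computes, so the following is only a rough sketch of what line-level featurization returning a single string could look like. The LineFeatureSketch class and its feature set (first token, length, capitalization, digit presence, position bucket) are assumptions made for illustration, and plain strings stand in for the Document type.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; the feature set is an assumption, not the one used by Segmentation.
class LineFeatureSketch {

    // Build one space-separated feature row for a single line.
    static String featureRow(String line, int index, int total) {
        String trimmed = line.trim();
        String firstToken = trimmed.isEmpty() ? "<empty>" : trimmed.split("\\s+")[0];
        boolean startsUpper = !trimmed.isEmpty() && Character.isUpperCase(trimmed.charAt(0));
        boolean hasDigit = trimmed.chars().anyMatch(Character::isDigit);
        // Relative position of the line in the document, bucketed into 0..9.
        int positionBucket = total <= 0 ? 0 : (index * 10) / total;
        return firstToken + " " + trimmed.length() + " " + (startsUpper ? 1 : 0)
                + " " + (hasDigit ? 1 : 0) + " " + positionBucket;
    }

    // Concatenate the rows of all lines, one row per line.
    static String allLinesFeatured(List<String> lines) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lines.size(); i++) {
            sb.append(featureRow(lines.get(i), i, lines.size())).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(allLinesFeatured(Arrays.asList("Abstract 2020", "we study things")));
    }
}
```
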

public void createTrainingSegmentation(java.lang.String inputFile,
java.lang.String pathFullText,
java.lang.String pathTEI,
int id)
Parameters:
inputFile - input file
pathFullText - path to fulltext
pathTEI - path to TEI
id - id

public void createBlankTrainingData(java.io.File file,
java.lang.String pathFullText,
java.lang.String pathTEI,
int id)
Parameters:
file - input file
pathFullText - path to fulltext
pathTEI - path to TEI
id - id

public java.lang.StringBuffer trainingExtraction(java.lang.String result,
java.util.List<LayoutToken> tokenizations,
Document doc)
Parameters:
result - result
tokenizations - tokens

public void close()
throws java.io.IOException
Specified by:
close in interface java.io.Closeable
close in interface java.lang.AutoCloseable
Overrides:
close in class AbstractParser
Throws:
java.io.IOException
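
Because close() is specified by java.io.Closeable and java.lang.AutoCloseable, an instance can in principle be managed with a try-with-resources statement. The sketch below shows the pattern with a hypothetical StubParser standing in for the real class, since constructing a Segmentation requires a setup that this page does not describe.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration; StubParser is a stand-in, not the real Segmentation class.
class CloseSketch {

    static final class StubParser implements Closeable {
        private boolean closed = false;

        String process(String text) {
            if (closed) {
                throw new IllegalStateException("parser already closed");
            }
            return text.trim();
        }

        @Override
        public void close() throws IOException {
            // A real parser would release whatever resources it holds here.
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }
    }

    // Use the parser inside try-with-resources so close() runs even if process() throws.
    static StubParser runAndClose(String text) {
        StubParser parser = new StubParser();
        try (StubParser p = parser) {
            p.process(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return parser;
    }

    public static void main(String[] args) {
        System.out.println("closed after use: " + runAndClose("  some text  ").isClosed());
    }
}
```
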