public class MonographParser extends AbstractParser
Fields inherited from AbstractParser: analyzer, cntManager

| Constructor and Description |
|---|
| MonographParser(): TODO some documentation... |

| Modifier and Type | Method and Description |
|---|---|
| Document | createTrainingFromPDF(java.io.File inputFile, java.lang.String pathRaw, java.lang.String pathTEI, int id): Processes the specified PDF and formats the result as training data for the monograph model. |
| java.lang.String | getAllBlocksFeatured(Document doc): Adds block-level features for the complete document. |
| java.lang.String | getAllLinesFeatured(Document doc): Adds line-level features for the complete document. |
| Document | prepareDocument(Document doc) |
| Document | processing(DocumentSource documentSource, GrobidAnalysisConfig config): Segments a PDF document into high-level subdocuments. |
Methods inherited from AbstractParser: close, label, label

public Document processing(DocumentSource documentSource, GrobidAnalysisConfig config)
Parameters:
documentSource - document source

public java.lang.String getAllLinesFeatured(Document doc)

public java.lang.String getAllBlocksFeatured(Document doc)

public Document createTrainingFromPDF(java.io.File inputFile, java.lang.String pathRaw, java.lang.String pathTEI, int id)
Parameters:
inputFile - input PDF file
pathFullText - path to raw monograph featured sequence (declared as pathRaw in the signature)
pathTEI - path to TEI
id - id
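
The createTrainingFromPDF signature takes one PDF and an integer id per call, so generating training data for a folder of monographs needs a small driver loop. The sketch below is illustrative only and is not part of GROBID: the class MonographTrainingBatch, the interface TrainingGenerator, and the method generateAll are hypothetical names, and the interface merely mirrors the documented parameter list so the example compiles without GROBID on the classpath. Assigning consecutive ids in file-name order is also an assumption, not documented behaviour.

```java
import java.io.File;
import java.util.Arrays;

// Illustrative batch driver; NOT part of GROBID.
class MonographTrainingBatch {

    // Hypothetical stand-in that mirrors the documented parameter list of
    // MonographParser.createTrainingFromPDF (the real method returns a Document).
    interface TrainingGenerator {
        Object createTrainingFromPDF(File inputFile, String pathRaw, String pathTEI, int id);
    }

    // Calls the generator once per PDF found directly in inputDir, in file-name
    // order, with consecutive ids starting at 0. Returns the number of PDFs handled.
    static int generateAll(File inputDir, String pathRaw, String pathTEI,
                           TrainingGenerator generator) {
        File[] pdfs = inputDir.listFiles((dir, name) -> name.toLowerCase().endsWith(".pdf"));
        if (pdfs == null) {
            throw new IllegalArgumentException("Not a readable directory: " + inputDir);
        }
        Arrays.sort(pdfs); // File is Comparable, so ids are assigned deterministically
        int id = 0;
        for (File pdf : pdfs) {
            generator.createTrainingFromPDF(pdf, pathRaw, pathTEI, id);
            id++;
        }
        return id;
    }

    public static void main(String[] args) {
        if (args.length < 3) {
            System.err.println("usage: MonographTrainingBatch <pdfDir> <pathRaw> <pathTEI>");
            return;
        }
        // Stub generator that only logs each call; no parsing happens here.
        TrainingGenerator stub = (file, raw, tei, id) -> {
            System.out.println(id + "\t" + file.getName() + " -> " + raw + ", " + tei);
            return null;
        };
        int count = generateAll(new File(args[0]), args[1], args[2], stub);
        System.out.println(count + " PDF(s) processed");
    }
}
```

In a project that has GROBID available, a method reference such as parser::createTrainingFromPDF on a MonographParser instance should fit TrainingGenerator, assuming the method keeps the signature listed above and declares no checked exceptions.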