public class MonographParser extends AbstractParser
Fields inherited from class AbstractParser: analyzer, cntManager
Constructor and Description |
---|
MonographParser(): TODO some documentation... |
Modifier and Type | Method and Description |
---|---|
Document | createTrainingFromPDF(java.io.File inputFile, java.lang.String pathRaw, java.lang.String pathTEI, int id): Process the specified PDF and format the result as training data for the monograph model. |
java.lang.String | getAllBlocksFeatured(Document doc): Adds features at block level for the complete document. |
java.lang.String | getAllLinesFeatured(Document doc): Adds features at line level for the complete document. |
Document | prepareDocument(Document doc) |
Document | processing(DocumentSource documentSource, GrobidAnalysisConfig config): Segment a PDF document into high-level subdocuments. |
Methods inherited from class AbstractParser: close, label, label
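As a rough illustration of what segmenting a document into high-level subdocuments can involve, the sketch below groups consecutive lines that share a label into segments. This is a generic technique and an assumption on our part, not a description of what `processing` does internally. `SegmentGroupingSketch`, `Segment`, `group`, and the label names in the usage note are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of MonographParser or GROBID.
class SegmentGroupingSketch {

    /** A run of consecutive lines sharing one label; start and end are inclusive line indices. */
    static final class Segment {
        final String label;
        final int start;
        final int end;

        Segment(String label, int start, int end) {
            this.label = label;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return label + "[" + start + "-" + end + "]";
        }
    }

    /** Groups per-line labels into maximal runs of identical labels. */
    static List<Segment> group(List<String> labels) {
        List<Segment> segments = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= labels.size(); i++) {
            // A segment ends at the end of input or where the label changes.
            boolean boundary = i == labels.size()
                    || !labels.get(i).equals(labels.get(start));
            if (boundary) {
                segments.add(new Segment(labels.get(start), start, i - 1));
                start = i;
            }
        }
        return segments;
    }
}
```

For example, `SegmentGroupingSketch.group(Arrays.asList("cover", "cover", "toc"))` yields two segments, `cover[0-1]` and `toc[2-2]`.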
public Document processing(DocumentSource documentSource, GrobidAnalysisConfig config)
documentSource - document source
public java.lang.String getAllLinesFeatured(Document doc)
public java.lang.String getAllBlocksFeatured(Document doc)
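To give a feel for what a "featured" line sequence might look like, here is a minimal sketch that turns plain text lines into one space-separated feature row per line. The feature set (first token, lowercased first token, capitalization flag, digit flag, token count) and the names `LineFeatureSketch` and `featurize` are assumptions made for illustration. The features actually produced by `getAllLinesFeatured` and `getAllBlocksFeatured` are not documented here and may differ.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of MonographParser or GROBID.
class LineFeatureSketch {

    /** Builds one space-separated feature row per non-blank line. */
    static String featurize(List<String> lines) {
        StringBuilder out = new StringBuilder();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // blank lines have no tokens to describe
            }
            String[] tokens = line.split("\\s+");
            String first = tokens[0];
            // Illustrative flags; the real feature names are unknown.
            String capital = Character.isUpperCase(first.charAt(0)) ? "INITCAP" : "NOCAPS";
            String digit = line.chars().anyMatch(Character::isDigit) ? "CONTAINSDIGITS" : "NODIGIT";
            out.append(first).append(' ')
               .append(first.toLowerCase(Locale.ROOT)).append(' ')
               .append(capital).append(' ')
               .append(digit).append(' ')
               .append(tokens.length).append('\n');
        }
        return out.toString();
    }
}
```

With this sketch, the line `Chapter 1` becomes the row `Chapter chapter INITCAP CONTAINSDIGITS 2`.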
public Document createTrainingFromPDF(java.io.File inputFile, java.lang.String pathRaw, java.lang.String pathTEI, int id)
inputFile - input PDF file
pathFullText - path to raw monograph featured sequence
pathTEI - path to TEI
id - id
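Since `createTrainingFromPDF` takes one input file and two output locations, a caller may want to work out in advance where results for a given PDF could land. The sketch below derives one raw-feature path and one TEI path from the input file name. The file suffixes and the names `TrainingPathsSketch`, `baseName`, `rawPath`, and `teiPath` are assumptions for illustration only. The naming scheme the method actually uses is not stated here, and the `id` parameter plays no part in this sketch.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of MonographParser or GROBID.
class TrainingPathsSketch {

    // Assumed suffixes; the real output names are not documented here.
    static final String RAW_SUFFIX = ".training.monograph";
    static final String TEI_SUFFIX = ".training.monograph.tei.xml";

    /** Returns the file name without its last extension, if it has one. */
    static String baseName(File inputFile) {
        String name = inputFile.getName();
        int dot = name.lastIndexOf('.');
        return dot > 0 ? name.substring(0, dot) : name;
    }

    /** Candidate location for the raw featured sequence under pathRaw. */
    static Path rawPath(File inputFile, String pathRaw) {
        return Paths.get(pathRaw, baseName(inputFile) + RAW_SUFFIX);
    }

    /** Candidate location for the TEI training file under pathTEI. */
    static Path teiPath(File inputFile, String pathTEI) {
        return Paths.get(pathTEI, baseName(inputFile) + TEI_SUFFIX);
    }
}
```

Under these assumptions, an input named `volume1.pdf` maps to `volume1.training.monograph` in the raw directory and `volume1.training.monograph.tei.xml` in the TEI directory.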