Engine

java.lang.Object
- org.grobid.core.engines.Engine

All Implemented Interfaces:

java.io.Closeable, java.lang.AutoCloseable
```
public class Engine
extends java.lang.Object
implements java.io.Closeable
```
Class for managing the extraction of bibliographical information from PDF documents or raw text.
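
Because the class implements `java.io.Closeable`, an instance can be scoped with try-with-resources. The sketch below illustrates only that pattern: `StandInEngine` and its `process` method are hypothetical stand-ins, not part of GROBID, so that the example runs without the library on the classpath.

```java
import java.io.Closeable;

// Hypothetical stand-in: mirrors only the Closeable contract of Engine.
final class StandInEngine implements Closeable {
    private boolean closed = false;

    // Hypothetical method used only to show work done inside the scope.
    String process(String input) {
        if (closed) {
            throw new IllegalStateException("engine already closed");
        }
        return "processed:" + input;
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() { // narrows Closeable.close(): no checked exception
        closed = true;
    }
}

final class CloseableEngineDemo {
    // Kept only so that the demo can show the state after the block exits.
    static StandInEngine lastEngine;

    static String run(String input) {
        // close() is called automatically when the block exits,
        // whether normally or through an exception.
        try (StandInEngine engine = new StandInEngine()) {
            lastEngine = engine;
            return engine.process(input);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("some raw text"));
        System.out.println("closed after block: " + lastEngine.isClosed());
    }
}
```

With the real class, the same shape would apply to an `Engine` obtained from the constructor or from `getEngine(boolean preload)`.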

Constructor Summary

Constructors
Constructor and Description
`Engine(boolean loadModels)` Constructor for the Grobid engine instance.

Method Summary

Modifier and Type	Method and Description
`void`	`addAcceptedLanguages(java.lang.String lang)` Add a language to the list of accepted languages.
`java.lang.String`	`annotateAllCitationsInPDFPatent(java.lang.String pdfPath, int consolidateCitations, boolean includeRawCitations)` Extract and parse both patent and non patent references within a patent in PDF format.
`int`	`batchCreateTraining(java.lang.String directoryPath, java.lang.String resultPath, int ind)` Process all the PDF in a given directory with a segmentation process and produce the corresponding training data format files for manual correction.
`int`	`batchCreateTrainingBlank(java.lang.String directoryPath, java.lang.String resultPath, int ind)` Process all the PDF files in a given directory with a PDF extraction and produce blank training data, i.e. TEI files with text only, without tags.
`int`	`batchCreateTrainingMonograph(java.lang.String directoryPath, java.lang.String resultPath, int ind)` Process all the PDF in a given directory with a monograph process and produce the corresponding training data format files for manual correction.
`int`	`batchCreateTrainingPatentcitations(java.lang.String directoryPath, java.lang.String resultPath)` Process all the XML patent documents in a given directory with a patent citation extraction and produce the corresponding training data format files for manual correction.
`void`	`close()`
`void`	`createTraining(java.io.File inputFile, java.lang.String pathRaw, java.lang.String pathTEI, int id)` Create training data for all models based on the application of the current full text model on a new PDF
`void`	`createTrainingBlank(java.io.File inputFile, java.lang.String pathRaw, java.lang.String pathTEI, int id)` Generate blank training data from provided directory of PDF documents, i.e. where TEI files are text only, without tags.
`void`	`createTrainingMonograph(java.io.File inputFile, java.lang.String pathRaw, java.lang.String pathTEI, int id)` Create training data for the monograph model based on the application of the current monograph text model on a new PDF
`void`	`createTrainingPatentCitations(java.lang.String pathXML, java.lang.String resultPath)` Process an XML patent document with a patent citation extraction and produce the corresponding training data format files for manual correction.
`java.lang.String`	`downloadPDF(java.lang.String url, java.lang.String dirName, java.lang.String name)` Download a PDF file.
`java.util.List<ChemicalEntity>`	`extractChemicalEntities(java.lang.String text)` Extract chemical names from text.
`java.lang.String`	`fullTextToTEI(java.io.File inputFile, GrobidAnalysisConfig config)` Parse and convert the current article into TEI; this method performs the whole parsing and conversion process.
`Document`	`fullTextToTEIDoc(DocumentSource documentSource, GrobidAnalysisConfig config)`
`Document`	`fullTextToTEIDoc(java.io.File inputFile, GrobidAnalysisConfig config)`
`java.lang.String`	`getAbstract(Document doc)` Print the abstract content.
`java.util.List<java.lang.String>`	`getAcceptedLanguages()` Give the list of languages for which an extraction is allowed.
`static CntManager`	`getCntManager()`
`static Engine`	`getEngine(boolean preload)`
`EngineParsers`	`getParsers()`
`static java.lang.String`	`header2BibTeX(BiblioItem resHeader)` Get the BibTeX string corresponding to the recognized header text
`static java.lang.String`	`header2TEI(BiblioItem resHeader)` Get the TEI XML string corresponding to the recognized header text
`java.lang.String`	`printRefTitles(java.util.List<BibDataSet> resBib)` Return all the reference titles.
`java.util.List<Affiliation>`	`processAffiliation(java.lang.String addressBlock)` Parse a text block corresponding to an affiliation+address.
`java.util.List<java.util.List<Affiliation>>`	`processAffiliations(java.util.List<java.lang.String> addressBlocks)` Parse a list of text blocks corresponding to an affiliation+address.
`java.lang.String`	`processAllCitationsInPatent(java.lang.String text, java.util.List<BibDataSet> nplResults, java.util.List<PatentItem> patentResults, int consolidateCitations, boolean includeRawCitations)` Extract and parse both patent and non patent references within a patent text.
`java.lang.String`	`processAllCitationsInPDFPatent(java.lang.String pdfPath, java.util.List<BibDataSet> nplResults, java.util.List<PatentItem> patentResults, int consolidateCitations, boolean includeRawCitations)` Extract and parse both patent and non patent references within a patent in PDF format.
`java.lang.String`	`processAllCitationsInXMLPatent(java.lang.String xmlPath, java.util.List<BibDataSet> nplResults, java.util.List<PatentItem> patentResults, int consolidateCitations, boolean includeRawCitations)` Extract and parse both patent and non patent references within a patent in ST.36 format.
`java.util.List<Person>`	`processAuthorsCitation(java.lang.String authorSequence)` Parse a sequence of authors from a citation, i.e. containing no reference markers.
`java.util.List<java.util.List<Person>>`	`processAuthorsCitationLists(java.util.List<java.lang.String> authorSequences)` Parse a list of independent sequences of authors from citations.
`java.util.List<Person>`	`processAuthorsHeader(java.lang.String authorSequence)` Parse a sequence of authors from a header, i.e. possibly containing reference markers.
`java.util.List<Date>`	`processDate(java.lang.String dateBlock)` Parse a raw string containing dates.
`java.lang.String`	`processHeader(java.lang.String inputFile, GrobidAnalysisConfig config, BiblioItem result)`
`java.lang.String`	`processHeader(java.lang.String inputFile, int consolidate, BiblioItem result)` Apply a parsing model for the header of a PDF file based on CRF, using dynamic range of pages as header
`java.lang.String`	`processHeader(java.lang.String inputFile, int consolidate, boolean includeRawAffiliations, BiblioItem result)` Apply a parsing model for the header of a PDF file based on CRF, using first three pages of the PDF
`BiblioItem`	`processRawReference(java.lang.String reference, int consolidate)` Apply a parsing model for a given single raw reference string based on CRF
`java.util.List<BiblioItem>`	`processRawReferences(java.util.List<java.lang.String> references, int consolidate)` Apply a parsing model for a set of raw reference text based on CRF
`java.util.List<BibDataSet>`	`processReferences(java.io.File inputFile, int consolidate)` Apply a parsing model to the reference block of a PDF file based on CRF
`static java.lang.String`	`reference2BibTeX(java.lang.String path, java.util.List<BibDataSet> resBib, int i)` Get the BibTeX string corresponding to the recognized citation section for a given citation
`static java.lang.String`	`reference2TEI(java.lang.String path, java.util.List<BibDataSet> resBib, int i)` Get the TEI XML string corresponding to the recognized citation section for a particular citation
`java.lang.String`	`references2BibTeX(java.lang.String path, java.util.List<BibDataSet> resBib)` Get the BibTeX string corresponding to the recognized citation section
`static java.lang.String`	`references2TEI(java.lang.String path, java.util.List<BibDataSet> resBib)` Get the TEI XML string corresponding to the recognized citation section, with pointers and advanced structuring
`static java.lang.String`	`references2TEI2(java.lang.String path, java.util.List<BibDataSet> resBib)` Get the TEI XML string corresponding to the recognized citation section
`Language`	`runLanguageId(java.lang.String filePath)` Basic run for language identification, default is on the body of the current document.
`Language`	`runLanguageId(java.lang.String filePath, java.lang.String ext)` Perform a language identification
`java.lang.String`	`segmentAndProcessHeader(java.io.File inputFile, int consolidate, BiblioItem result)` Use the segmentation model to identify the header section of a PDF file, then apply a parsing model for the header based on CRF
`static void`	`setCntManager(CntManager cntManager)`

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - Engine
```
public Engine(boolean loadModels)
```
    Constructor for the Grobid engine instance.
- Method Detail
  - processAuthorsHeader
```
public java.util.List<Person> processAuthorsHeader(java.lang.String authorSequence)
                                            throws java.lang.Exception
```
    Parse a sequence of authors from a header, i.e. possibly containing reference markers.
    
    Parameters:
    
    authorSequence - the string corresponding to a raw sequence of names
    
    Returns:
    
    the list of structured author objects
    
    Throws:
    
    java.lang.Exception
  - processAuthorsCitation
```
public java.util.List<Person> processAuthorsCitation(java.lang.String authorSequence)
                                              throws java.lang.Exception
```
    Parse a sequence of authors from a citation, i.e. containing no reference markers.
    
    Parameters:
    
    authorSequence - the string corresponding to a raw sequence of names
    
    Returns:
    
    the list of structured author objects
    
    Throws:
    
    java.lang.Exception
  - processAuthorsCitationLists
```
public java.util.List<java.util.List<Person>> processAuthorsCitationLists(java.util.List<java.lang.String> authorSequences)
                                                                   throws java.lang.Exception
```
    Parse a list of independent sequences of authors from citations.
    
    Parameters:
    
    authorSequences - the list of strings, each corresponding to a raw sequence of names.
    
    Returns:
    
    the list of all recognized structured author objects for each sequence of authors.
    
    Throws:
    
    java.lang.Exception
  - processAffiliation
```
public java.util.List<Affiliation> processAffiliation(java.lang.String addressBlock)
                                               throws java.io.IOException
```
    Parse a text block corresponding to an affiliation+address.
    
    Parameters:
    
    addressBlock - the string corresponding to a raw affiliation+address
    
    Returns:
    
    the list of all recognized structured affiliation objects.
    
    Throws:
    
    java.io.IOException
  - processAffiliations
```
public java.util.List<java.util.List<Affiliation>> processAffiliations(java.util.List<java.lang.String> addressBlocks)
                                                                throws java.lang.Exception
```
    Parse a list of text blocks corresponding to an affiliation+address.
    
    Parameters:
    
    addressBlocks - the list of strings, each corresponding to a raw affiliation+address.
    
    Returns:
    
    the list of all recognized structured affiliation objects for each sequence of affiliation + address block.
    
    Throws:
    
    java.lang.Exception
  - processDate
```
public java.util.List<Date> processDate(java.lang.String dateBlock)
                                 throws java.io.IOException
```
    Parse a raw string containing dates.
    
    Parameters:
    
    dateBlock - the string containing raw dates.
    
    Returns:
    
    the list of all structured date objects recognized in the string.
    
    Throws:
    
    java.io.IOException
  - processRawReference
```
public BiblioItem processRawReference(java.lang.String reference,
                                      int consolidate)
```
    Apply a parsing model for a given single raw reference string based on CRF
    
    Parameters:
    
    reference - the reference string to be processed
    
    consolidate - the consolidation option allows GROBID to exploit Crossref web services for improving header information. 0 (no consolidation, default value), 1 (consolidate the citation and inject extra metadata) or 2 (consolidate the citation and inject DOI only)
    
    Returns:
    
    the recognized bibliographical object
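    
    The three `consolidate` values described above are plain integers, so a caller may prefer to name them. The following is a minimal sketch, not part of GROBID: the `ConsolidationMode` enum and its constant names are hypothetical, and only the codes 0, 1 and 2 come from the parameter description.

```java
// Hypothetical helper enum naming the documented consolidate codes.
enum ConsolidationMode {
    NONE(0),          // no consolidation (documented default)
    FULL_METADATA(1), // consolidate and inject extra metadata
    DOI_ONLY(2);      // consolidate and inject DOI only

    private final int code;

    ConsolidationMode(int code) {
        this.code = code;
    }

    // The int value to pass as the consolidate argument.
    int code() {
        return code;
    }

    // Maps a raw int back to a named mode, rejecting undocumented values.
    static ConsolidationMode fromCode(int code) {
        for (ConsolidationMode mode : values()) {
            if (mode.code == code) {
                return mode;
            }
        }
        throw new IllegalArgumentException("unknown consolidate value: " + code);
    }
}

final class ConsolidationModeDemo {
    public static void main(String[] args) {
        for (ConsolidationMode mode : ConsolidationMode.values()) {
            System.out.println(mode + " -> " + mode.code());
        }
    }
}
```

    A call site would then pass, for example, `ConsolidationMode.DOI_ONLY.code()` as the `consolidate` argument.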
  - processRawReferences
```
public java.util.List<BiblioItem> processRawReferences(java.util.List<java.lang.String> references,
                                                       int consolidate)
                                                throws java.lang.Exception
```
    Apply a parsing model for a set of raw reference text based on CRF
    
    Parameters:
    
    references - the list of raw reference strings to be processed
    
    consolidate - the consolidation option allows GROBID to exploit Crossref web services for improving header information. 0 (no consolidation, default value), 1 (consolidate the citation and inject extra metadata) or 2 (consolidate the citation and inject DOI only)
    
    Returns:
    
    the list of recognized bibliographical objects
    
    Throws:
    
    java.lang.Exception
  - processReferences
```
public java.util.List<BibDataSet> processReferences(java.io.File inputFile,
                                                    int consolidate)
```
    Apply a parsing model to the reference block of a PDF file based on CRF
    
    Parameters:
    
    inputFile - the path of the PDF file to be processed
    
    consolidate - the consolidation option allows GROBID to exploit Crossref web services for improving header information. 0 (no consolidation, default value), 1 (consolidate the citation and inject extra metadata) or 2 (consolidate the citation and inject DOI only)
    
    Returns:
    
    the list of parsed references as bibliographical objects enriched with citation contexts
  - downloadPDF
```
public java.lang.String downloadPDF(java.lang.String url,
                                    java.lang.String dirName,
                                    java.lang.String name)
```
    Download a PDF file.
    
    Parameters:
    
    url - URL of the PDF to download
    
    dirName - directory where to store the downloaded PDF
    
    name - file name
  - getAcceptedLanguages
```
public java.util.List<java.lang.String> getAcceptedLanguages()
```
    Give the list of languages for which an extraction is allowed. If null, any language will be processed.
    
    Returns:
    
    the list of languages to be processed, coded in ISO 639 (the source Javadoc states ISO 3166, which is the country code standard).
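    
    The null convention above means that a caller has to distinguish "no restriction" from "empty list". A small sketch of that check follows; the `isAccepted` helper is hypothetical (it is not an `Engine` method), and the codes `"en"` and `"fr"` are illustrative values only.

```java
import java.util.Arrays;
import java.util.List;

final class AcceptedLanguagesSketch {
    // Hypothetical helper: a null list means that every language is accepted,
    // following the convention documented for getAcceptedLanguages().
    static boolean isAccepted(List<String> acceptedLanguages, String lang) {
        return acceptedLanguages == null || acceptedLanguages.contains(lang);
    }

    public static void main(String[] args) {
        List<String> restricted = Arrays.asList("en", "fr");
        System.out.println(isAccepted(null, "de"));       // no restriction set
        System.out.println(isAccepted(restricted, "fr")); // explicitly listed
        System.out.println(isAccepted(restricted, "de")); // not listed
    }
}
```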
  - addAcceptedLanguages
```
public void addAcceptedLanguages(java.lang.String lang)
```
    Add a language to the list of accepted languages.
    
    Parameters:
    
    lang - the code of the language to be added, in ISO 639 (the source Javadoc states ISO 3166, which is the country code standard)
  - runLanguageId
```
public Language runLanguageId(java.lang.String filePath,
                              java.lang.String ext)
```
    Perform a language identification
    
    Parameters:
    
    ext - part
    
    Returns:
    
    language
  - runLanguageId
```
public Language runLanguageId(java.lang.String filePath)
```
    Basic run for language identification, default is on the body of the current document.
    
    Returns:
    
    language id
  - processHeader
```
public java.lang.String processHeader(java.lang.String inputFile,
                                      int consolidate,
                                      boolean includeRawAffiliations,
                                      BiblioItem result)
```
    Apply a parsing model for the header of a PDF file based on CRF, using first three pages of the PDF
    
    Parameters:
    
    inputFile - the path of the PDF file to be processed
    
    consolidate - the consolidation option allows GROBID to exploit Crossref web services for improving header information. 0 (no consolidation, default value), 1 (consolidate the citation and inject extra metadata) or 2 (consolidate the citation and inject DOI only)
    
    result - bib result
    
    Returns:
    
    the TEI representation of the extracted bibliographical information
  - processHeader
```
public java.lang.String processHeader(java.lang.String inputFile,
                                      int consolidate,
                                      BiblioItem result)
```
    Apply a parsing model for the header of a PDF file based on CRF, using dynamic range of pages as header
    
    Parameters:
    
    inputFile - the path of the PDF file to be processed
    
    result - bib result
    
    Returns:
    
    the TEI representation of the extracted bibliographical information
  - processHeader
```
public java.lang.String processHeader(java.lang.String inputFile,
                                      GrobidAnalysisConfig config,
                                      BiblioItem result)
```
  - segmentAndProcessHeader
```
public java.lang.String segmentAndProcessHeader(java.io.File inputFile,
                                                int consolidate,
                                                BiblioItem result)
```
    Use the segmentation model to identify the header section of a PDF file, then apply a parsing model for the header based on CRF
    
    Parameters:
    
    inputFile - the path of the PDF file to be processed
    
    consolidate - the consolidation option allows GROBID to exploit Crossref web services for improving header information. 0 (no consolidation, default value), 1 (consolidate the citation and inject extra metadata) or 2 (consolidate the citation and inject DOI only)
    
    result - bib result
    
    Returns:
    
    the TEI representation of the extracted bibliographical information
  - createTrainingMonograph
```
public void createTrainingMonograph(java.io.File inputFile,
                                    java.lang.String pathRaw,
                                    java.lang.String pathTEI,
                                    int id)
```
    Create training data for the monograph model based on the application of the current monograph text model on a new PDF
    
    Parameters:
    
    inputFile - the path of the PDF file to be processed
    
    pathRaw - the path where to put the CRF feature file
    
    pathTEI - the path where to put the annotated TEI representation (the file to be corrected for gold-level training data)
    
    id - an optional ID to be used in the TEI file and the full text file, -1 if not used
  - createTrainingBlank
```
public void createTrainingBlank(java.io.File inputFile,
                                java.lang.String pathRaw,
                                java.lang.String pathTEI,
                                int id)
```
    Generate blank training data from provided directory of PDF documents, i.e. where TEI files are text only without tags. This can be used to start from scratch any new model.
    
    Parameters:
    
    inputFile - the path of the PDF file to be processed
    
    pathRaw - the path where to put the CRF feature file
    
    pathTEI - the path where to put the annotated TEI representation (the file to be annotated for "from scratch" training data)
    
    id - an optional ID to be used in the TEI file and the full text file, -1 if not used
  - createTraining
```
public void createTraining(java.io.File inputFile,
                           java.lang.String pathRaw,
                           java.lang.String pathTEI,
                           int id)
```
    Create training data for all models based on the application of the current full text model on a new PDF
    
    Parameters:
    
    inputFile - the path of the PDF file to be processed
    
    pathRaw - the path where to put the CRF feature file
    
    pathTEI - the path where to put the annotated TEI representation (the file to be corrected for gold-level training data)
    
    id - an optional ID to be used in the TEI file, -1 if not used
  - fullTextToTEI
```
public java.lang.String fullTextToTEI(java.io.File inputFile,
                                      GrobidAnalysisConfig config)
                               throws java.lang.Exception
```
    Parse and convert the current article into TEI. This method performs the whole parsing and conversion process. If onlyHeader is true, then only the TEI header data will be created. TODO: remove invalid JavaDoc once refactoring is done and tested (left for easier reference).
    
    Parameters:
    
    inputFile - absolute path to the PDF to be processed
    
    config - Grobid config
    
    Returns:
    
    the resulting structured document as a TEI string.
    
    Throws:
    
    java.lang.Exception
  - fullTextToTEIDoc
```
public Document fullTextToTEIDoc(java.io.File inputFile,
                                 GrobidAnalysisConfig config)
                          throws java.lang.Exception
```
    Throws:
    
    java.lang.Exception
  - fullTextToTEIDoc
```
public Document fullTextToTEIDoc(DocumentSource documentSource,
                                 GrobidAnalysisConfig config)
                          throws java.lang.Exception
```
    Throws:
    
    java.lang.Exception
  - batchCreateTraining
```
public int batchCreateTraining(java.lang.String directoryPath,
                               java.lang.String resultPath,
                               int ind)
```
    Process all the PDF files in a given directory with a segmentation process and produce the corresponding training data format files for manual correction. The goal of this method is to help produce additional training data based on an existing model.
    
    Parameters:
    
    directoryPath - the path to the directory containing the PDF files to be processed.
    
    resultPath - the path to the directory where the results as XML files shall be written.
    
    ind - identifier integer to be included in the resulting files to identify the training case. This is optional: no identifier will be included if ind = -1
    
    Returns:
    
    the number of processed files.
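    
    The contract described above (walk the PDF files, pass the optional identifier along, return a count) can be sketched without the library. Everything in this block is an assumption made for illustration: `processAll` is hypothetical, it works on file names rather than on a real directory, it passes `ind` unchanged to every file, and the per-file work is delegated to a caller-supplied callback instead of the segmentation process.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.ObjIntConsumer;

final class BatchTrainingSketch {
    // Documented convention: -1 means that no identifier is included.
    static final int NO_IDENTIFIER = -1;

    // Hypothetical batch loop: applies the processor to every PDF file name
    // and returns the number of files that were processed.
    static int processAll(List<String> fileNames, int ind,
                          ObjIntConsumer<String> processor) {
        int processed = 0;
        for (String name : fileNames) {
            // Skip anything that is not a PDF (case-insensitive extension check).
            if (!name.toLowerCase(Locale.ROOT).endsWith(".pdf")) {
                continue;
            }
            processor.accept(name, ind);
            processed++;
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a.pdf", "b.PDF", "notes.txt");
        int count = processAll(names, NO_IDENTIFIER,
                (name, ind) -> System.out.println(name + " ind=" + ind));
        System.out.println(count + " file(s) processed");
    }
}
```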
  - batchCreateTrainingMonograph
```
public int batchCreateTrainingMonograph(java.lang.String directoryPath,
                                        java.lang.String resultPath,
                                        int ind)
```
    Process all the PDF files in a given directory with a monograph process and produce the corresponding training data format files for manual correction. The goal of this method is to help produce additional training data based on an existing model.
    
    Parameters:
    
    directoryPath - the path to the directory containing the PDF files to be processed.
    
    resultPath - the path to the directory where the results as XML files and CRF feature files shall be written.
    
    ind - identifier integer to be included in the resulting files to identify the training case. This is optional: no identifier will be included if ind = -1
    
    Returns:
    
    the number of processed files.
  - batchCreateTrainingBlank
```
public int batchCreateTrainingBlank(java.lang.String directoryPath,
                                    java.lang.String resultPath,
                                    int ind)
```
    Process all the PDF files in a given directory with a PDF extraction and produce blank training data, i.e. TEI files with text only, without tags. This can be used to start any new model from scratch.
    
    Parameters:
    
    directoryPath - the path to the directory containing the PDF files to be processed.
    
    resultPath - the path to the directory where the results as XML files and default CRF feature files shall be written.
    
    ind - identifier integer to be included in the resulting files to identify the training case. This is optional: no identifier will be included if ind = -1
    
    Returns:
    
    the number of processed files.
  - header2TEI
```
public static java.lang.String header2TEI(BiblioItem resHeader)
```
    Get the TEI XML string corresponding to the recognized header text
  - header2BibTeX
```
public static java.lang.String header2BibTeX(BiblioItem resHeader)
```
    Get the BibTeX string corresponding to the recognized header text
  - references2TEI2
```
public static java.lang.String references2TEI2(java.lang.String path,
                                               java.util.List<BibDataSet> resBib)
```
    Get the TEI XML string corresponding to the recognized citation section
  - references2TEI
```
public static java.lang.String references2TEI(java.lang.String path,
                                              java.util.List<BibDataSet> resBib)
```
    Get the TEI XML string corresponding to the recognized citation section, with pointers and advanced structuring
  - references2BibTeX
```
public java.lang.String references2BibTeX(java.lang.String path,
                                          java.util.List<BibDataSet> resBib)
```
    Get the BibTeX string corresponding to the recognized citation section
  - reference2TEI
```
public static java.lang.String reference2TEI(java.lang.String path,
                                             java.util.List<BibDataSet> resBib,
                                             int i)
```
    Get the TEI XML string corresponding to the recognized citation section for a particular citation
  - reference2BibTeX
```
public static java.lang.String reference2BibTeX(java.lang.String path,
                                                java.util.List<BibDataSet> resBib,
                                                int i)
```
    Get the BibTeX string corresponding to the recognized citation section for a given citation
  - processAllCitationsInPatent
```
public java.lang.String processAllCitationsInPatent(java.lang.String text,
                                                    java.util.List<BibDataSet> nplResults,
                                                    java.util.List<PatentItem> patentResults,
                                                    int consolidateCitations,
                                                    boolean includeRawCitations)
                                             throws java.lang.Exception
```
    Extract and parse both patent and non-patent references within a patent text. Results are provided as a BibDataSet with offset positions instantiated relative to the input text, and as PatentItem containing both "WYSIWYG" results (the patent reference attributes as they appear in the text) and the attributes in DOCDB format (format according to WIPO and ISO standards). Patent references' offset positions are also given in the PatentItem object.
    
    Parameters:
    
    text - the string corresponding to the text body of the patent.
    
    nplResults - the list of extracted and parsed non patent references as BiblioItem object. This list must be instantiated before calling the method for receiving the results.
    
    patentResults - the list of extracted and parsed patent references as PatentItem object. This list must be instantiated before calling the method for receiving the results.
    
    consolidateCitations - the consolidation option allows GROBID to exploit Crossref web services for improving header information. 0 (no consolidation, default value), 1 (consolidate the citation and inject extra metadata) or 2 (consolidate the citation and inject DOI only)
    
    Returns:
    
    the list of extracted and parsed patent and non-patent references encoded in TEI.
    
    Throws:
    
    java.lang.Exception
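    
    The two result lists are output parameters: the caller creates them, and the method fills them. The sketch below shows that calling pattern with a toy stand-in. `extract` is hypothetical, uses plain strings instead of `BibDataSet` and `PatentItem`, and classifies tokens with a deliberately naive rule that has nothing to do with the real extraction; rejecting null lists is also an assumption made to keep the contract visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

final class OutParameterSketch {
    // Hypothetical stand-in for a method that fills caller-supplied lists.
    static String extract(String text, List<String> nplResults,
                          List<String> patentResults) {
        Objects.requireNonNull(nplResults, "nplResults must be instantiated by the caller");
        Objects.requireNonNull(patentResults, "patentResults must be instantiated by the caller");
        for (String token : text.split("\\s+")) {
            if (token.startsWith("US")) {
                patentResults.add(token); // toy rule: looks like a patent number
            } else if (token.startsWith("[")) {
                nplResults.add(token);    // toy rule: looks like a citation marker
            }
        }
        return "patents=" + patentResults.size() + ";npl=" + nplResults.size();
    }

    public static void main(String[] args) {
        // Both lists are created before the call, as the documentation requires.
        List<String> npl = new ArrayList<>();
        List<String> patents = new ArrayList<>();
        String summary = extract("see US1234567 and [1] elsewhere", npl, patents);
        System.out.println(summary);
        System.out.println("patents: " + patents + ", non-patent: " + npl);
    }
}
```

    After the call, the caller reads the structured results from the lists and the serialized form from the return value.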
  - processAllCitationsInXMLPatent
```
public java.lang.String processAllCitationsInXMLPatent(java.lang.String xmlPath,
                                                       java.util.List<BibDataSet> nplResults,
                                                       java.util.List<PatentItem> patentResults,
                                                       int consolidateCitations,
                                                       boolean includeRawCitations)
                                                throws java.lang.Exception
```
    Extract and parse both patent and non-patent references within a patent in ST.36 format. Results are provided as a BibDataSet with offset positions instantiated relative to the input text, and as PatentItem containing both "WYSIWYG" results (the patent reference attributes as they appear in the text) and the attributes in DOCDB format (format according to WIPO and ISO standards). Patent references' offset positions are also given in the PatentItem object.
    
    Parameters:
    
    nplResults - the list of extracted and parsed non-patent references as BiblioItem objects. This list must be instantiated before calling the method for receiving the results.
    
    patentResults - the list of extracted and parsed patent references as PatentItem objects. This list must be instantiated before calling the method for receiving the results.
    
    consolidateCitations - the consolidation option allows GROBID to exploit Crossref web services for improving header information. 0 (no consolidation, default value), 1 (consolidate the citation and inject extra metadata) or 2 (consolidate the citation and inject DOI only)
    
    Returns:
    
    the list of extracted and parsed patent and non-patent references encoded in TEI.
    
    Throws:
    
    java.lang.Exception - if something went wrong
  - processAllCitationsInPDFPatent
```
public java.lang.String processAllCitationsInPDFPatent(java.lang.String pdfPath,
                                                       java.util.List<BibDataSet> nplResults,
                                                       java.util.List<PatentItem> patentResults,
                                                       int consolidateCitations,
                                                       boolean includeRawCitations)
                                                throws java.lang.Exception
```
    Extract and parse both patent and non-patent references within a patent in PDF format. Results are provided as a BibDataSet with offset positions instantiated relative to the input text, and as PatentItem containing both "WYSIWYG" results (the patent reference attributes as they appear in the text) and the attributes in DOCDB format (format according to WIPO and ISO standards). Patent references' offset positions are also given in the PatentItem object.
    
    Parameters:
    
    pdfPath - the path to the patent PDF file
    
    nplResults - the list of extracted and parsed non-patent references as BiblioItem objects. This list must be instantiated before calling the method for receiving the results.
    
    patentResults - the list of extracted and parsed patent references as PatentItem objects. This list must be instantiated before calling the method for receiving the results.
    
    consolidateCitations - the consolidation option allows GROBID to exploit Crossref web services for improving header information. 0 (no consolidation, default value), 1 (consolidate the citation and inject extra metadata) or 2 (consolidate the citation and inject DOI only)
    
    Returns:
    
    the list of extracted and parsed patent and non-patent references encoded in TEI.
    
    Throws:
    
    java.lang.Exception - if something went wrong
  - annotateAllCitationsInPDFPatent
```
public java.lang.String annotateAllCitationsInPDFPatent(java.lang.String pdfPath,
                                                        int consolidateCitations,
                                                        boolean includeRawCitations)
                                                 throws java.lang.Exception
```
    Extract and parse both patent and non-patent references within a patent in PDF format. Results are provided as JSON annotations with coordinates of the annotations in the original PDF and reference information in DOCDB format (format according to WIPO and ISO standards).
    
    Parameters:
    
    pdfPath - the path to the patent PDF file
    
    consolidateCitations - the consolidation option allows GROBID to exploit Crossref web services for improving header information. 0 (no consolidation, default value), 1 (consolidate the citation and inject extra metadata) or 2 (consolidate the citation and inject DOI only)
    
    Returns:
    
    JSON annotations with extracted and parsed patent and non-patent references together with coordinates in the original PDF.
    
    Throws:
    
    java.lang.Exception
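    The three `consolidateCitations` values are easy to confuse when passed as bare integers. The sketch below names them with an enum. `ConsolidationMode` and its constant names are hypothetical helpers for illustration and are not part of GROBID; only the integer codes 0, 1 and 2 come from the parameter description above.

```java
// Hypothetical helper, not part of GROBID: names the three integer codes
// documented for the consolidateCitations parameter.
enum ConsolidationMode {
    NONE(0),           // no consolidation (default value)
    FULL_METADATA(1),  // consolidate the citation and inject extra metadata
    DOI_ONLY(2);       // consolidate the citation and inject the DOI only

    private final int code;

    ConsolidationMode(int code) {
        this.code = code;
    }

    /** The integer to pass as the consolidateCitations argument. */
    int code() {
        return code;
    }

    /** Maps a raw integer back to a mode, rejecting undocumented values. */
    static ConsolidationMode fromCode(int code) {
        for (ConsolidationMode mode : values()) {
            if (mode.code == code) {
                return mode;
            }
        }
        throw new IllegalArgumentException(
                "Unknown consolidateCitations value: " + code);
    }
}
```

    With such a helper, a caller could pass `ConsolidationMode.DOI_ONLY.code()` as the second argument of `annotateAllCitationsInPDFPatent` instead of a literal `2`.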
  - createTrainingPatentCitations
```
public void createTrainingPatentCitations(java.lang.String pathXML,
                                          java.lang.String resultPath)
                                   throws java.lang.Exception
```
    Process an XML patent document with a patent citation extraction and produce the corresponding training data format files for manual correction. The goal of this method is to help produce additional training data based on an existing model.
    
    Parameters:
    
    pathXML - the path to the XML patent document to be processed.
    
    resultPath - the path to the directory where the results as XML files shall be written.
    
    Throws:
    
    java.lang.Exception
  - batchCreateTrainingPatentcitations
```
public int batchCreateTrainingPatentcitations(java.lang.String directoryPath,
                                              java.lang.String resultPath)
                                       throws java.lang.Exception
```
    Process all the XML patent documents in a given directory with a patent citation extraction and produce the corresponding training data format files for manual correction. The goal of this method is to help produce additional training data based on an existing model.
    
    Parameters:
    
    directoryPath - the path to the directory containing XML files to be processed.
    
    resultPath - the path to the directory where the results as XML files shall be written.
    
    Returns:
    
    the number of processed files.
    
    Throws:
    
    java.lang.Exception
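    The batch contract described above (apply a per-file step to every XML file in a directory and return the number of processed files) can be sketched with the standard library alone. This is not GROBID's implementation: `FileProcessor`, `BatchSketch` and `processDirectory` are hypothetical names, and the callback merely stands in for a per-file call such as `createTrainingPatentCitations(pathXML, resultPath)`.

```java
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical callback standing in for a per-file call such as
// createTrainingPatentCitations(pathXML, resultPath).
interface FileProcessor {
    void process(String pathXML, String resultPath) throws Exception;
}

class BatchSketch {
    /**
     * Applies the processor to every regular *.xml file directly inside
     * directoryPath and returns how many files were processed.
     * Subdirectories are not traversed in this sketch.
     */
    static int processDirectory(String directoryPath, String resultPath,
                                FileProcessor processor) throws Exception {
        int count = 0;
        try (DirectoryStream<Path> xmlFiles =
                     Files.newDirectoryStream(Paths.get(directoryPath), "*.xml")) {
            for (Path file : xmlFiles) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip directories that happen to end in .xml
                }
                processor.process(file.toString(), resultPath);
                count++;
            }
        }
        return count;
    }
}
```

    Passing the per-file step as a callback keeps the directory walk independent of any particular engine, so the counting logic can be checked without loading models.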
  - extractChemicalEntities
```
public java.util.List<ChemicalEntity> extractChemicalEntities(java.lang.String text)
                                                       throws java.lang.Exception
```
    Extract chemical names from text.
    
    Parameters:
    
    text - text to be processed.
    
    Returns:
    
    List of chemical entities as POJOs.
    
    Throws:
    
    java.lang.Exception
  - getAbstract
```
public java.lang.String getAbstract(Document doc)
                             throws java.lang.Exception
```
    Print the abstract content. Useful for term extraction.
    
    Throws:
    
    java.lang.Exception
  - printRefTitles
```
public java.lang.String printRefTitles(java.util.List<BibDataSet> resBib)
                                throws java.lang.Exception
```
    Return all the reference titles. May be useful for term extraction.
    
    Throws:
    
    java.lang.Exception
  - close
```
public void close()
           throws java.io.IOException
```
    Specified by:
    
    close in interface java.io.Closeable
    
    Specified by:
    
    close in interface java.lang.AutoCloseable
    
    Throws:
    
    java.io.IOException
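    Because the class implements `java.io.Closeable`, an instance can be managed with try-with-resources so that `close()` runs automatically. The sketch below shows the pattern with a hypothetical stand-in class, `TrackedResource`, rather than a real engine; unlike `Engine.close()`, its `close()` declares no checked exception, so a real call site would also need to handle or declare `java.io.IOException`.

```java
import java.io.Closeable;

// Hypothetical stand-in for a Closeable resource such as Engine. It only
// records whether close() was called so the behavior can be observed.
class TrackedResource implements Closeable {
    private boolean closed = false;

    boolean isClosed() {
        return closed;
    }

    // Narrower than Closeable.close(): this stand-in throws nothing.
    @Override
    public void close() {
        closed = true;
    }
}

class CloseDemo {
    /** Uses the resource in try-with-resources and returns it for inspection. */
    static TrackedResource useAndClose() {
        TrackedResource resource = new TrackedResource();
        try (TrackedResource r = resource) {
            // Work with r here; close() runs automatically when this block
            // exits, whether normally or through an exception.
            if (r.isClosed()) {
                throw new IllegalStateException("resource closed too early");
            }
        }
        return resource;
    }

    public static void main(String[] args) {
        System.out.println(useAndClose().isClosed());
    }
}
```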
  - setCntManager
```
public static void setCntManager(CntManager cntManager)
```
  - getCntManager
```
public static CntManager getCntManager()
```
  - getParsers
```
public EngineParsers getParsers()
```
  - getEngine
```
public static Engine getEngine(boolean preload)
```
    Returns:
    
    a new engine from GrobidFactory if the execution is parallel; otherwise the existing engine instance.
