HeaderParser

java.lang.Object
- org.grobid.core.engines.AbstractParser
- - org.grobid.core.engines.HeaderParser

All Implemented Interfaces:

java.io.Closeable, java.lang.AutoCloseable, GenericTagger
```
public class HeaderParser
extends AbstractParser
```

Field Summary
- Fields inherited from class org.grobid.core.engines.AbstractParser
  analyzer, cntManager

Constructor Summary

Constructors
Constructor and Description
`HeaderParser(EngineParsers parsers)`
`HeaderParser(EngineParsers parsers, CntManager cntManager)`

Method Summary

Modifier and Type	Method and Description
`void`	`close()`
`BiblioItem`	`consolidateHeader(BiblioItem resHeader, int consolidate)` Consolidate an existing list of recognized citations based on access to external internet bibliographic databases.
`Document`	`createTrainingHeader(java.lang.String inputFile, java.lang.String pathHeader, java.lang.String pathTEI)` Process the header of the specified pdf and format the result as training data.
`<any>`	`getSectionHeaderFeatured(Document doc, java.util.SortedSet<DocumentPiece> documentHeaderParts, boolean withRotation)` Return the header section with features to be processed by the CRF model
`<any>`	`processing(java.io.File input, BiblioItem resHeader, GrobidAnalysisConfig config)` Processing with application of the segmentation model
`<any>`	`processing2(java.lang.String pdfInput, BiblioItem resHeader, GrobidAnalysisConfig config)` Processing without application of the segmentation model; regular expressions are used to identify the header zone.
`java.lang.String`	`processingHeaderBlock(GrobidAnalysisConfig config, Document doc, BiblioItem resHeader)` Header processing after identification of the header blocks with heuristics (old approach)
`java.lang.String`	`processingHeaderSection(GrobidAnalysisConfig config, Document doc, BiblioItem resHeader)` Header processing after application of the segmentation model (new approach)
`BiblioItem`	`resultExtraction(java.lang.String result, boolean intro, java.util.List<LayoutToken> tokenizations, BiblioItem biblio, Document doc)` Extract results from a labelled header.
`java.lang.StringBuilder`	`trainingExtraction(java.lang.String result, boolean intro, java.util.List<LayoutToken> tokenizations)` Extract results from a labelled header in the training format without any string modification.

Methods inherited from class org.grobid.core.engines.AbstractParser
label, label

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail

HeaderParser

public HeaderParser(EngineParsers parsers,
                    CntManager cntManager)

HeaderParser

public HeaderParser(EngineParsers parsers)

Method Detail

processing

public <any> processing(java.io.File input,
                        BiblioItem resHeader,
                        GrobidAnalysisConfig config)

Processing with application of the segmentation model

processing2

public <any> processing2(java.lang.String pdfInput,
                         BiblioItem resHeader,
                         GrobidAnalysisConfig config)

Processing without application of the segmentation model; regular expressions are used to identify the header zone.

processingHeaderBlock

public java.lang.String processingHeaderBlock(GrobidAnalysisConfig config,
                                              Document doc,
                                              BiblioItem resHeader)
                                       throws java.lang.Exception

Header processing after identification of the header blocks with heuristics (old approach)

Throws:

java.lang.Exception

processingHeaderSection

public java.lang.String processingHeaderSection(GrobidAnalysisConfig config,
                                                Document doc,
                                                BiblioItem resHeader)

Header processing after application of the segmentation model (new approach)

getSectionHeaderFeatured

public <any> getSectionHeaderFeatured(Document doc,
                                      java.util.SortedSet<DocumentPiece> documentHeaderParts,
                                      boolean withRotation)

Return the header section with features to be processed by the CRF model

createTrainingHeader

public Document createTrainingHeader(java.lang.String inputFile,
                                     java.lang.String pathHeader,
                                     java.lang.String pathTEI)

Process the header of the specified pdf and format the result as training data.

Parameters:

inputFile - path to input file

pathHeader - path to header

pathTEI - path to TEI

resultExtraction

public BiblioItem resultExtraction(java.lang.String result,
                                   boolean intro,
                                   java.util.List<LayoutToken> tokenizations,
                                   BiblioItem biblio,
                                   Document doc)

Extract results from a labelled header. If `intro` is true, extraction stops at the first "intro" tag identified (this tag marks the beginning of the description).

Parameters:

result - result

intro - if intro

tokenizations - list of tokens

biblio - biblio item

Returns:

a biblio item
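
The effect of the `intro` flag can be illustrated with a minimal sketch. It is not the GROBID implementation: the class `IntroCutoffSketch`, the method `headerLines`, the one-token-per-line layout with the label as the last field, and the exact label text `<intro>` are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names belong to GROBID.
final class IntroCutoffSketch {
    // Assumed label text; the real tag spelling may differ.
    static final String INTRO_LABEL = "<intro>";

    // Returns the labelled lines that precede the first intro-labelled line.
    // When stopAtIntro is false, every line is returned unchanged.
    static List<String> headerLines(List<String> labelledLines, boolean stopAtIntro) {
        List<String> kept = new ArrayList<>();
        for (String line : labelledLines) {
            String[] fields = line.trim().split("\\s+");
            // Assumption: the label is the last whitespace-separated field.
            String label = fields[fields.length - 1];
            if (stopAtIntro && label.endsWith(INTRO_LABEL)) {
                break; // stop at the first "intro" tag
            }
            kept.add(line);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> labelled = Arrays.asList(
                "Deep I-<title>",
                "Parsing <title>",
                "1 I-<intro>",
                "Introduction <intro>");
        System.out.println(headerLines(labelled, true));
        System.out.println(headerLines(labelled, false));
    }
}
```

With the flag set, only the two title lines are kept; with it cleared, all four lines pass through.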

trainingExtraction

public java.lang.StringBuilder trainingExtraction(java.lang.String result,
                                                  boolean intro,
                                                  java.util.List<LayoutToken> tokenizations)

Extract results from a labelled header in the training format without any string modification.

Parameters:

result - result

intro - if intro

tokenizations - list of tokens

Returns:

a result

consolidateHeader
```
public BiblioItem consolidateHeader(BiblioItem resHeader,
                                    int consolidate)
```
Consolidate an existing list of recognized citations based on access to external internet bibliographic databases.

Parameters:

resHeader - original biblio item

Returns:

consolidated biblio item

close
```
public void close()
           throws java.io.IOException
```
Specified by:

close in interface java.io.Closeable

Specified by:

close in interface java.lang.AutoCloseable

Overrides:

close in class AbstractParser

Throws:

java.io.IOException
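
Because the class implements `java.io.Closeable`, an instance can be managed with try-with-resources. The sketch below uses a hypothetical stand-in class (`StandInParser`), not the real `HeaderParser`, so that it runs without GROBID on the classpath; it only shows that `close()` is invoked automatically when the block exits and that the declared `IOException` has to be handled.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical driver; all names here are illustrative, not part of GROBID.
final class CloseSketch {
    // Runs one call inside try-with-resources and returns the parser so the
    // caller can confirm that close() was invoked on exit from the block.
    static StandInParser runOnce(String input) {
        StandInParser parser = new StandInParser();
        try (StandInParser managed = parser) {
            managed.process(input);
        } catch (IOException e) {
            // close() declares IOException, so it must be handled or rethrown.
            throw new UncheckedIOException(e);
        }
        return parser;
    }

    public static void main(String[] args) {
        System.out.println(runOnce("  Some header text ").isClosed()); // prints "true"
    }
}

// Hypothetical stand-in for a Closeable parser; NOT the real HeaderParser.
final class StandInParser implements Closeable {
    private boolean closed = false;

    // Placeholder for real work; refuses to run once the parser is closed.
    String process(String input) {
        if (closed) {
            throw new IllegalStateException("parser already closed");
        }
        return input.trim();
    }

    @Override
    public void close() throws IOException {
        closed = true; // a real parser would release its resources here
    }

    boolean isClosed() {
        return closed;
    }
}
```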
