public class UnicodeUtil
extends java.lang.Object
Modifier and Type | Field and Description |
---|---|
static java.lang.String | bullet_chars |
static java.lang.String | close_parenthesis |
static java.lang.String | horizontal_low_lines_chars |
static java.lang.String | my_whitespace_chars |
static java.lang.String | new_line_chars |
static java.lang.String | open_parenthesis |
static java.lang.String | vertical_lines_chars |
static java.lang.String | whitespace_chars |
Constructor and Description |
---|
UnicodeUtil() |
Modifier and Type | Method and Description |
---|---|
static java.lang.String | normaliseText(java.lang.String text): Normalise the space, EOL and punctuation Unicode characters. |
static java.lang.String | normaliseTextAndRemoveSpaces(java.lang.String text): Unicode normalisation of the token text. |
public static java.lang.String whitespace_chars
public static java.lang.String my_whitespace_chars
public static java.lang.String horizontal_low_lines_chars
public static java.lang.String vertical_lines_chars
public static java.lang.String new_line_chars
public static java.lang.String bullet_chars
public static java.lang.String open_parenthesis
public static java.lang.String close_parenthesis
public static java.lang.String normaliseText(java.lang.String text)
text - to be normalised
public static java.lang.String normaliseTextAndRemoveSpaces(java.lang.String text)
Like normaliseText(java.lang.String), but in addition also removes spaces.
text - to be normalised
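The two methods above can be illustrated with a minimal, self-contained sketch. It is not the implementation of UnicodeUtil: the class name NormaliseSketch and the character sets in the four patterns are assumptions chosen for illustration, since this page does not list the contents of whitespace_chars, new_line_chars, open_parenthesis or close_parenthesis. The sketch only shows the general idea of mapping Unicode variants of spaces, line breaks and parentheses to their plain ASCII forms, and of a second method that additionally drops spaces.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration; not the real UnicodeUtil.
final class NormaliseSketch {

    // Assumed character sets. The actual field values are not shown on this page.
    private static final Pattern WHITESPACE =
            Pattern.compile("[\\t\\u00A0\\u2000-\\u200A\\u202F\\u205F\\u3000]");
    private static final Pattern NEW_LINE =
            Pattern.compile("\\r\\n|[\\r\\u0085\\u2028\\u2029]");
    private static final Pattern OPEN_PARENTHESIS =
            Pattern.compile("[\\uFF08\\u207D\\u208D]");
    private static final Pattern CLOSE_PARENTHESIS =
            Pattern.compile("[\\uFF09\\u207E\\u208E]");

    private NormaliseSketch() {
    }

    // Maps line-break, whitespace and parenthesis variants to "\n", " ", "(" and ")".
    static String normaliseText(String text) {
        if (text == null) {
            return null;
        }
        String result = NEW_LINE.matcher(text).replaceAll("\n");
        result = WHITESPACE.matcher(result).replaceAll(" ");
        result = OPEN_PARENTHESIS.matcher(result).replaceAll("(");
        result = CLOSE_PARENTHESIS.matcher(result).replaceAll(")");
        return result;
    }

    // Same normalisation, then removes plain spaces (line breaks are kept in this sketch).
    static String normaliseTextAndRemoveSpaces(String text) {
        if (text == null) {
            return null;
        }
        return normaliseText(text).replace(" ", "");
    }
}
```

Because whitespace variants are first mapped to a plain space, the second method only has to remove that one character. Whether the real normaliseTextAndRemoveSpaces also removes line breaks is not stated here, so the sketch leaves them in place.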