public class EntityTweetTokeniser extends Object implements Iterable<gov.sandia.cognition.text.token.Token>
Constructor and Description |
---|
EntityTweetTokeniser(String s) |
Modifier and Type | Method and Description |
---|---|
List<String> | getProtectedStringTokens() |
List<String> | getStringTokens() |
List<gov.sandia.cognition.text.token.Token> | getTokens() |
List<String> | getUnprotectedStringTokens() |
static boolean | isValid(Locale locale): Check whether this locale is supported by this tokeniser. |
static boolean | isValid(String locale): Check whether this locale (specified by the two-letter country code, Locale) is supported by this tokeniser. |
Iterator<gov.sandia.cognition.text.token.Token> | iterator() |
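Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface java.lang.Iterable: forEach, spliterator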
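Because EntityTweetTokeniser implements Iterable, an instance can be used directly in an enhanced for loop, and forEach and spliterator are the default methods that Iterable builds on top of iterator(). The sketch below illustrates that contract with a hypothetical stand-in class, IterableTokeniserSketch. It is not the real tokeniser: it yields String tokens rather than gov.sandia.cognition.text.token.Token objects, and it simply splits on whitespace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.stream.StreamSupport;

// HYPOTHETICAL stand-in, not the real EntityTweetTokeniser: it yields String
// tokens (not gov.sandia.cognition.text.token.Token) and splits on whitespace only.
class IterableTokeniserSketch implements Iterable<String> {
    private final List<String> tokens;

    public IterableTokeniserSketch(String s) {
        // Placeholder tokenisation (assumption): split on runs of whitespace.
        String trimmed = s.trim();
        this.tokens = trimmed.isEmpty()
                ? new ArrayList<>()
                : new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    // Returns a defensive copy so callers cannot mutate the internal list.
    public List<String> getStringTokens() {
        return new ArrayList<>(tokens);
    }

    // iterator() is the only abstract method of Iterable; for-each loops and
    // the default forEach/spliterator methods are built on top of it.
    @Override
    public Iterator<String> iterator() {
        return Collections.unmodifiableList(tokens).iterator();
    }

    public static void main(String[] args) {
        IterableTokeniserSketch t = new IterableTokeniserSketch("hello   @world #java");

        // 1. Enhanced for loop (calls iterator()).
        for (String tok : t) {
            System.out.println(tok);
        }

        // 2. Default Iterable.forEach.
        t.forEach(tok -> System.out.println(tok.length()));

        // 3. Default Iterable.spliterator, wrapped in a sequential stream.
        long hashtags = StreamSupport.stream(t.spliterator(), false)
                .filter(tok -> tok.startsWith("#"))
                .count();
        System.out.println(hashtags);
    }
}
```

Returning an unmodifiable iterator keeps callers from removing tokens through Iterator.remove(); whether the real class does the same is not stated.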
public EntityTweetTokeniser(String s) throws UnsupportedEncodingException, TweetTokeniserException
Parameters:
s - the string to tokenise
Throws:
UnsupportedEncodingException
TweetTokeniserException
public static boolean isValid(Locale locale)
Check whether this locale is supported by this tokeniser.
Parameters:
locale
public static boolean isValid(String locale)
Check whether this locale (specified by the two-letter country code, Locale) is supported by this tokeniser. The unsupported languages are those that do not use space characters to delimit words, namely the CJK languages.
Parameters:
locale
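The description above amounts to an exclusion check. The following is a minimal sketch under stated assumptions, not the library's implementation: the hypothetical class LocaleCheckSketch treats "the CJK languages" as the ISO 639-1 language codes zh, ja and ko, and its Locale overload compares Locale.getLanguage() against them. The text above speaks of a two-letter country code; whether the real method checks language or country codes is not stated, so that choice is an assumption.

```java
import java.util.Locale;
import java.util.Set;

// HYPOTHETICAL re-creation for illustration only; the real isValid may differ.
final class LocaleCheckSketch {
    // Assumption: "CJK" = ISO 639-1 codes zh (Chinese), ja (Japanese), ko (Korean).
    private static final Set<String> UNSUPPORTED = Set.of("zh", "ja", "ko");

    // Returns false for languages that do not delimit words with spaces.
    static boolean isValid(String locale) {
        return !UNSUPPORTED.contains(locale.toLowerCase(Locale.ROOT));
    }

    // Assumption: the Locale overload looks only at the language part.
    static boolean isValid(Locale locale) {
        return isValid(locale.getLanguage());
    }

    public static void main(String[] args) {
        System.out.println(isValid("en"));
        System.out.println(isValid("ja"));
        System.out.println(isValid(Locale.CHINESE));
        System.out.println(isValid(Locale.UK));
    }
}
```

Delegating the Locale overload to the String overload keeps the list of unsupported languages in one place.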
public List<String> getStringTokens()
public List<String> getProtectedStringTokens()
public List<String> getUnprotectedStringTokens()