public class EntityTweetTokeniser extends Object implements Iterable<gov.sandia.cognition.text.token.Token>
| Constructor and Description |
|---|
| EntityTweetTokeniser(String s) |

| Modifier and Type | Method and Description |
|---|---|
| List<String> | getProtectedStringTokens() |
| List<String> | getStringTokens() |
| List<gov.sandia.cognition.text.token.Token> | getTokens() |
| List<String> | getUnprotectedStringTokens() |
| static boolean | isValid(Locale locale): Check whether this locale is supported by this tokeniser. |
| static boolean | isValid(String locale): Check whether this locale (specified by the two letter country code, Locale) is supported by this tokeniser. |
| Iterator<gov.sandia.cognition.text.token.Token> | iterator() |

Methods inherited from class Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface Iterable: forEach, spliterator

public EntityTweetTokeniser(String s) throws UnsupportedEncodingException, TweetTokeniserException

s - Tokenise this string

Throws: UnsupportedEncodingException, TweetTokeniserException

public static boolean isValid(Locale locale)

Check whether this locale is supported by this tokeniser.

locale - the locale to check

public static boolean isValid(String locale)

Check whether this locale (specified by the two letter country code, Locale) is supported by this tokeniser. The unsupported languages are those which don't need space characters to delimit words, namely the CJK languages.

locale - the locale to check

public List<String> getStringTokens()

public List<String> getProtectedStringTokens()

public List<String> getUnprotectedStringTokens()
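
The locale check described for isValid can be illustrated with a small standalone sketch. This is not the tokeniser's actual implementation. The class name LocaleSupportSketch, the method name isSupported, and the set of codes ("zh", "ja", "ko") are all assumptions made for illustration, and the String overload is assumed to take an ISO 639-1 language code. The sketch only shows the idea that languages which do not delimit words with spaces, namely the CJK languages, are rejected.

```java
import java.util.Locale;
import java.util.Set;

public class LocaleSupportSketch {
    // Assumption: CJK languages identified by their ISO 639-1 codes
    // (Chinese, Japanese, Korean). The real tokeniser may use a different list.
    private static final Set<String> CJK_LANGUAGES = Set.of("zh", "ja", "ko");

    // Hypothetical analogue of isValid(Locale): a locale is treated as
    // supported unless its language is one of the CJK languages.
    public static boolean isSupported(Locale locale) {
        return !CJK_LANGUAGES.contains(locale.getLanguage());
    }

    // Hypothetical analogue of isValid(String): builds a Locale from the
    // two-letter code and delegates to the Locale overload.
    public static boolean isSupported(String code) {
        return isSupported(Locale.forLanguageTag(code));
    }

    public static void main(String[] args) {
        System.out.println(isSupported(Locale.ENGLISH));  // prints "true"
        System.out.println(isSupported(Locale.JAPANESE)); // prints "false"
        System.out.println(isSupported("zh"));            // prints "false"
    }
}
```

With the real class, a caller would presumably run a check like this through isValid before constructing an EntityTweetTokeniser, so that text in an unsupported language is never passed to the constructor.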