EntityTweetTokeniser (OpenIMAJ master project 1.3.10 API)

java.lang.Object
- org.openimaj.text.nlp.EntityTweetTokeniser

All Implemented Interfaces:

Iterable<gov.sandia.cognition.text.token.Token>
```
public class EntityTweetTokeniser
extends Object
implements Iterable<gov.sandia.cognition.text.token.Token>
```
A tokeniser built to work with short text, such as that found on Twitter. It protects various elements of the text, on the assumption that if the user made a mark, the mark is important and carries meaning, given the relatively high premium on each keystroke. Based on the twokenise by Brendan O'Connor.
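The idea of protecting meaningful marks before splitting the remaining text can be sketched as follows. This is a hypothetical stand-in, not the OpenIMAJ implementation: the class name `ProtectedTokeniserSketch`, the `tokenise` method, and the regular expressions are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of "protect, then split"; NOT the OpenIMAJ code.
class ProtectedTokeniserSketch {
    // Assumed protected spans: URLs, @mentions, #hashtags and a few simple emoticons.
    private static final Pattern PROTECTED =
            Pattern.compile("https?://\\S+|[@#]\\w+|[:;]-?[()DP]");

    // Everything outside a protected span is split into word runs and
    // single punctuation characters; whitespace is discarded.
    private static final Pattern UNPROTECTED = Pattern.compile("\\w+|[^\\w\\s]");

    static List<String> tokenise(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = PROTECTED.matcher(text);
        int last = 0;
        while (m.find()) {
            // Split the ordinary text that precedes this protected span.
            splitUnprotected(text.substring(last, m.start()), tokens);
            // Keep the protected span intact as one token.
            tokens.add(m.group());
            last = m.end();
        }
        // Split whatever follows the final protected span.
        splitUnprotected(text.substring(last), tokens);
        return tokens;
    }

    private static void splitUnprotected(String chunk, List<String> out) {
        Matcher m = UNPROTECTED.matcher(chunk);
        while (m.find()) {
            out.add(m.group());
        }
    }

    public static void main(String[] args) {
        String tweet = "Loving #OpenIMAJ :) see http://example.com/a?b=1 cc @sina!";
        for (String token : tokenise(tweet)) {
            System.out.println(token);
        }
    }
}
```

In this sketch the hashtag, emoticon, URL and mention each survive as a single token, while the trailing `!` is split off from `@sina`. The real tokeniser may protect a different and much larger set of elements.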

Author:

Sina Samangooei (ss@ecs.soton.ac.uk)

Constructor Summary

Constructors
Constructor and Description

EntityTweetTokeniser(String s)

Method Summary

Modifier and Type	Method and Description
`List<String>`	`getProtectedStringTokens()`
`List<String>`	`getStringTokens()`
`List<gov.sandia.cognition.text.token.Token>`	`getTokens()`
`List<String>`	`getUnprotectedStringTokens()`
`static boolean`	`isValid(Locale locale)` Check whether this locale is supported by this tokeniser.
`static boolean`	`isValid(String locale)` Check whether this locale (specified by its two-letter code, as in `Locale`) is supported by this tokeniser.
`Iterator<gov.sandia.cognition.text.token.Token>`	`iterator()`

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface java.lang.Iterable
forEach, spliterator

- Constructor Detail
  - EntityTweetTokeniser
```
public EntityTweetTokeniser(String s)
                     throws UnsupportedEncodingException,
                            TweetTokeniserException
```
    Parameters:
    
    s - Tokenise this string
    
    Throws:
    
    UnsupportedEncodingException
    
    TweetTokeniserException
- Method Detail
  - isValid
```
public static boolean isValid(Locale locale)
```
    Check whether this locale is supported by this tokeniser. The unsupported languages are those which don't need space characters to delimit words, namely the CJK languages.
    
    Parameters:
    
    locale - the locale to check
    
    Returns:
    
    true if the locale is supported
  - isValid
```
public static boolean isValid(String locale)
```
    Check whether this locale (specified by its two-letter code, as in Locale) is supported by this tokeniser. The unsupported languages are those which don't need space characters to delimit words, namely the CJK languages.
    
    Parameters:
    
    locale - the two-letter code of the locale to check
    
    Returns:
    
    true if the locale is supported
  - iterator
```
public Iterator<gov.sandia.cognition.text.token.Token> iterator()
```
    Specified by:
    
    iterator in interface Iterable<gov.sandia.cognition.text.token.Token>
  - getTokens
```
public List<gov.sandia.cognition.text.token.Token> getTokens()
```
  - getStringTokens
```
public List<String> getStringTokens()
```
  - getProtectedStringTokens
```
public List<String> getProtectedStringTokens()
```
  - getUnprotectedStringTokens
```
public List<String> getUnprotectedStringTokens()
```
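
The locale check described under isValid can be sketched as a lookup against a small set of unsupported codes. This is a minimal sketch under stated assumptions, not the library's code: the class `LocaleSupportSketch`, the set contents (`zh`, `ja`, `ko`) and the use of `Locale.getLanguage()` to obtain the two-letter code are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of a CJK locale guard; NOT the OpenIMAJ code.
class LocaleSupportSketch {
    // Assumed two-letter codes for languages that do not delimit words
    // with spaces: Chinese, Japanese and Korean.
    private static final Set<String> UNSUPPORTED =
            new HashSet<>(Arrays.asList("zh", "ja", "ko"));

    // String variant: supported unless the code is in the unsupported set.
    static boolean isValid(String code) {
        return !UNSUPPORTED.contains(code.toLowerCase(Locale.ROOT));
    }

    // Locale variant: delegates using the locale's language code (an assumption).
    static boolean isValid(Locale locale) {
        return isValid(locale.getLanguage());
    }

    public static void main(String[] args) {
        // Guard tokenisation behind the check so unsupported text is skipped.
        for (Locale locale : new Locale[] { Locale.ENGLISH, Locale.JAPANESE }) {
            System.out.println(locale.getLanguage() + " supported: " + isValid(locale));
        }
    }
}
```

A caller would typically run such a check before constructing the tokeniser, and fall back to a different segmentation strategy when it returns false.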
