This tutorial will show you how to extract and analyse images posted on Twitter in real time. To get started, create a new project using the OpenIMAJ archetype and, in the main method, create a connection to the live Twitter sample stream as follows:
TwitterAPIToken token = DefaultTokenFactory.get(TwitterAPIToken.class);
TwitterStreamDataset stream = new TwitterStreamDataset(token);
At this point, run the code. The first time you run it you will be
instructed to register as a Twitter developer to get an API key, which
you will then need to enter at the prompt. If you’ve followed the
Image Datasets tutorial you should be familiar with this process from
using the FlickrImageDataset and BingImageDataset classes; as with
those classes you could also construct a TwitterAPIToken and set its
fields manually. You’ll notice that not much happens other than a few
debug messages. Also, notice that the program doesn’t end; this is
because there is a thread running in the background reading tweets
into your stream object. Stop the program manually.
To demonstrate that Tweets are really being added into your stream
object, add the following code to print the text content of each Tweet
and run it again:
stream.forEach(new Operation<Status>() {
    public void perform(Status status) {
        System.out.println(status.getText());
    }
});
You should now see a large volume of Tweet messages being written to
your console. Stop the program and remove the above forEach loop.
We’ll now look at how we can get images out of the Tweets.
Tweets themselves do not contain images; rather, Tweets might contain
URLs, which might correspond to images or to websites where an image
is hosted. URLs might be in the textual body of the Tweets and/or
in special fields that form part of the status objects. OpenIMAJ makes
it easy to extract the URLs by mapping a stream of Twitter Status
objects to URL objects:
Stream<URL> urlStream = stream.map(new TwitterURLExtractor());
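To make the idea of pulling URLs out of a Tweet body concrete, here is a minimal plain-JDK sketch. It is not how TwitterURLExtractor is implemented; the class name TweetUrlSketch, the method extractUrls and the deliberately simple pattern are all assumptions made for illustration. It only covers links in the text, not the special status fields mentioned above, and it shows that one Tweet can yield zero, one or several URLs.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of OpenIMAJ): finds http(s) links in raw tweet text.
class TweetUrlSketch {
    // Simplification: a scheme followed by any run of non-whitespace characters.
    // Trailing punctuation is not stripped.
    private static final Pattern LINK = Pattern.compile("https?://\\S+");

    static List<URI> extractUrls(String text) {
        List<URI> urls = new ArrayList<>();
        Matcher matcher = LINK.matcher(text);
        while (matcher.find()) {
            try {
                urls.add(new URI(matcher.group()));
            } catch (URISyntaxException e) {
                // Skip anything that is not a well-formed URI.
            }
        }
        return urls;
    }
}
```

Calling TweetUrlSketch.extractUrls on a Tweet with no links returns an empty list, which is why the mapping from Tweets to URLs is one-to-many rather than one-to-one.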
The ImageSiteURLExtractor class can be used to process the URLs and
return just those that correspond to images. The
ImageSiteURLExtractor is also aware of a number of standard image
hosting sites, and is able to resolve the actual image URL from the
web-page URL that would normally appear in a Tweet. The
ImageSiteURLExtractor is just another Function, so it can be applied
as another map:
Stream<URL> imageUrlStream = urlStream.map(new ImageSiteURLExtractor(false));
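As a rough illustration of the simpler half of this step (keeping URLs that already point directly at an image), the following plain-JDK sketch filters by file extension. It is an assumption-laden stand-in, not the ImageSiteURLExtractor logic: the class DirectImageFilterSketch, its methods and the extension list are hypothetical, and it makes no attempt to resolve image URLs from hosting-site pages.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical helper (not part of OpenIMAJ): keeps URLs whose path ends in an image extension.
class DirectImageFilterSketch {
    // Assumed list of extensions; a real extractor may recognise more.
    private static final List<String> EXTENSIONS = Arrays.asList(".jpg", ".jpeg", ".png", ".gif");

    static boolean looksLikeImage(URI url) {
        String path = url.getPath(); // excludes the query string; null for opaque URIs
        if (path == null) {
            return false;
        }
        String lower = path.toLowerCase(Locale.ROOT);
        for (String extension : EXTENSIONS) {
            if (lower.endsWith(extension)) {
                return true;
            }
        }
        return false;
    }

    static List<URI> keepImages(List<URI> urls) {
        return urls.stream()
                .filter(DirectImageFilterSketch::looksLikeImage)
                .collect(Collectors.toList());
    }
}
```

An extension check like this misses images served from pages or extension-less URLs, which is the gap that site-aware resolution is meant to fill.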
The boolean passed to the ImageSiteURLExtractor constructor (false
here) disables support for the Tumblr hosting site; you can enable it,
but you’ll need to create a Tumblr API key (which you will be prompted
for). Now, to get the images, we can apply a static instance of a
function class called ImageFromURL, which has been configured to read
MBFImages:
Stream<MBFImage> imageStream = imageUrlStream.map(ImageFromURL.MBFIMAGE_EXTRACTOR);
Now let’s display the images in the stream as they arrive:
imageStream.forEach(new Operation<MBFImage>() {
    public void perform(MBFImage image) {
        DisplayUtilities.displayName(image, "image");
    }
});
If you run the program at this point, you should be able to see the images that are currently being shared on Twitter.
Now let’s modify the code so we can generate a visualisation of the
faces that appear on Twitter. Add an extra inline map function before
the previous forEach loop, so that the code looks like this:
imageStream.map(new MultiFunction<MBFImage, MBFImage>() {
    public List<MBFImage> apply(MBFImage in) {
    }
}).forEach(new Operation<MBFImage>() {
    public void perform(MBFImage image) {
        DisplayUtilities.displayName(image, "image");
    }
});
and add the following to the apply method:
List<DetectedFace> detected = detector.detectFaces(in.flatten());
List<MBFImage> faces = new ArrayList<MBFImage>();
for (DetectedFace face : detected) {
    faces.add(in.extractROI(face.getBounds()));
}
return faces;
and finally add the following field declaration just before the apply method, inside the anonymous MultiFunction class:
HaarCascadeDetector detector = HaarCascadeDetector.BuiltInCascade.frontalface_default.load();
Now, if you run the software, you should be able to see the faces of people who are being tweeted at the current time.
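The apply method above turns one image into zero or more face crops, and the subsequent forEach receives each crop individually. The one-to-many cropping step can be sketched with plain JDK classes as follows. This is an analogy using java.awt types rather than OpenIMAJ's MBFImage and DetectedFace; the class CropRegionsSketch and its crop method are hypothetical, and the clipping of regions to the image frame is an added safety assumption, not something the tutorial code is known to do.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of OpenIMAJ): one image in, zero or more crops out.
class CropRegionsSketch {
    static List<BufferedImage> crop(BufferedImage image, List<Rectangle> regions) {
        Rectangle frame = new Rectangle(0, 0, image.getWidth(), image.getHeight());
        List<BufferedImage> crops = new ArrayList<>();
        for (Rectangle region : regions) {
            // Clip each region to the image so getSubimage never goes out of bounds.
            Rectangle clipped = region.intersection(frame);
            if (clipped.isEmpty()) {
                continue; // region lies entirely outside the image
            }
            crops.add(image.getSubimage(clipped.x, clipped.y, clipped.width, clipped.height));
        }
        return crops;
    }
}
```

An image with no detected regions simply produces an empty list, so nothing is displayed for it.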
The TwitterSearchStream class repeatedly calls the Twitter search API with a specific query in order to produce a stream of Tweets related to a specific subject. Try using the TwitterSearchStream to find and display tweeted images of your favourite animal.
Can you make a program that continuously shows the average colour of the last 100 tweeted images?
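One possible starting point for this last exercise is to keep a sliding window of per-image mean colours and update a running sum as images arrive. The sketch below covers only that bookkeeping in plain Java. It assumes you have already reduced each image to its mean red, green and blue values (how to do that with OpenIMAJ is left to you), and the class name RollingAverageColourSketch and its add method are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper (not part of OpenIMAJ): mean colour over the most recent N images.
class RollingAverageColourSketch {
    private final int capacity;
    private final Deque<float[]> window = new ArrayDeque<>();
    private final double[] sum = new double[3]; // running R, G, B totals for the window

    RollingAverageColourSketch(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    // Adds one image's mean RGB and returns the mean over the current window.
    float[] add(float r, float g, float b) {
        float[] rgb = {r, g, b};
        window.addLast(rgb);
        for (int i = 0; i < 3; i++) {
            sum[i] += rgb[i];
        }
        if (window.size() > capacity) {
            // Drop the oldest entry so the window never exceeds its capacity.
            float[] oldest = window.removeFirst();
            for (int i = 0; i < 3; i++) {
                sum[i] -= oldest[i];
            }
        }
        float[] mean = new float[3];
        for (int i = 0; i < 3; i++) {
            mean[i] = (float) (sum[i] / window.size());
        }
        return mean;
    }
}
```

With a capacity of 100, calling add once per incoming image from inside a forEach operation would give the colour to display after each new image.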