Chapter 11. Twitter Streams and Images

This tutorial will show you how to extract and analyse images posted on Twitter in real time. To get started, create a new project using the OpenIMAJ archetype and, in the main method, create a connection to the live Twitter sample stream as follows:

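// Load the stored Twitter API credentials (you are prompted for them the first time)
TwitterAPIToken token = DefaultTokenFactory.get(TwitterAPIToken.class);
// Connect to the live sample stream; tweets are read in on a background thread
TwitterStreamDataset stream = new TwitterStreamDataset(token);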

At this point, run the code. The first time you run it, you will be instructed to register as a Twitter developer to get an API key, which you will then need to enter at the prompt. If you’ve followed the Image Datasets tutorial, you should be familiar with this process from using the FlickrImageDataset and BingImageDataset classes; as with those classes, you could instead construct a TwitterAPIToken and set its fields manually. You’ll notice that not much happens other than a few debug messages. Also notice that the program doesn’t end; this is because a thread is running in the background, reading tweets into your stream object. Stop the program manually.
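The program keeps running because the JVM only exits once every non-daemon thread has finished. The stdlib-only sketch below illustrates that producer/consumer pattern with a hypothetical BackgroundFeed class, in which a reader thread pushes placeholder strings into a BlockingQueue. It illustrates the general idea under that assumption; it is not how TwitterStreamDataset is actually implemented.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration only: not the OpenIMAJ TwitterStreamDataset implementation.
class BackgroundFeed {
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(1000);

    // Starts a background "reader" thread. A non-daemon thread (daemon = false)
    // keeps the JVM alive even after main() returns.
    Thread start(boolean daemon) {
        Thread reader = new Thread(() -> {
            int n = 0;
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    // Placeholder for reading the next tweet from the network;
                    // put() blocks while the queue is full.
                    queue.put("tweet " + n++);
                }
            } catch (InterruptedException e) {
                // Interrupted: fall through and let the thread end.
            }
        });
        reader.setDaemon(daemon);
        reader.start();
        return reader;
    }

    // Blocks until the reader thread has produced the next item.
    String next() {
        try {
            return queue.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for data", e);
        }
    }

    public static void main(String[] args) {
        BackgroundFeed feed = new BackgroundFeed();
        Thread reader = feed.start(false);
        for (int i = 0; i < 3; i++) {
            System.out.println(feed.next());
        }
        // Without this interrupt the non-daemon reader would keep the program running.
        reader.interrupt();
    }
}
```

Calling start(false) mimics the behaviour described above: the program stays alive until the reader thread is interrupted or the process is stopped by hand.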

To demonstrate that Tweets really are being added to your stream object, add the following code to print the text of each Tweet, and run the program again:

stream.forEach(new Operation<Status>() {
    // Called once for each tweet as it arrives from the stream
    public void perform(Status status) {
        System.out.println(status.getText());
    }
});

You should now see a large volume of Tweet messages being written to your console. Stop the program and remove the above forEach loop. We’ll now look at how we can get images out of the Tweets.

Tweets themselves do not contain images; rather, they may contain URLs, which might point either directly to images or to web pages on which an image is hosted. URLs can appear in the textual body of a Tweet and/or in special fields of the Status object. OpenIMAJ makes it easy to extract the URLs by mapping a stream of Twitter Status objects to URL objects:

Stream<URL> urlStream = stream.map(new TwitterURLExtractor());
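TwitterURLExtractor handles this for you. Purely to illustrate the idea of pulling links out of tweet text, here is a deliberately simplified, hypothetical SimpleUrlExtractor built on java.util.regex. It only scans the text (not the Status object's URL fields) and uses a loose http(s) pattern, so it is a sketch of the concept rather than a substitute for the library class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified illustration; not how TwitterURLExtractor works internally.
class SimpleUrlExtractor {
    // Loose pattern: "http://" or "https://" followed by any non-whitespace characters.
    private static final Pattern URL = Pattern.compile("https?://\\S+");

    // Returns every URL-like substring of the text, in order of appearance.
    static List<String> extract(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL.matcher(text);
        while (m.find()) {
            // Drop trailing punctuation that usually belongs to the sentence, not the URL.
            urls.add(m.group().replaceAll("[.,!?)]+$", ""));
        }
        return urls;
    }

    public static void main(String[] args) {
        System.out.println(extract("Look! https://example.com/cat.jpg, via http://t.co/abc123."));
    }
}
```

Stripping trailing punctuation is a heuristic; it will, for example, also remove a closing parenthesis that genuinely belongs to a URL.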

The ImageSiteURLExtractor class can be used to process the URLs and return just those that correspond to images. It is also aware of a number of standard image-hosting sites, and can resolve the actual image URL from the web-page URL that would normally appear in a Tweet. ImageSiteURLExtractor is just another Function, so it can be applied as another map:

Stream<URL> imageUrlStream = urlStream.map(new ImageSiteURLExtractor(false));
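To make the filtering idea concrete, the following hypothetical ImageUrlFilter keeps only URLs whose path ends in a common image file extension. This is a much cruder rule than ImageSiteURLExtractor, which, as described above, also resolves pages on image-hosting sites to the underlying image URL; the list of extensions is an assumption of the sketch.

```java
import java.net.URI;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical sketch: a much simpler rule than OpenIMAJ's ImageSiteURLExtractor.
class ImageUrlFilter {
    // Assumed set of extensions that indicate a direct link to an image file.
    private static final List<String> EXTENSIONS = List.of(".jpg", ".jpeg", ".png", ".gif");

    // True if the URL's path (ignoring any query string) ends in an image extension.
    static boolean looksLikeImage(String url) {
        try {
            String path = URI.create(url).getPath();
            if (path == null) {
                return false; // opaque URIs such as "mailto:..." have no path
            }
            String lower = path.toLowerCase(Locale.ROOT);
            return EXTENSIONS.stream().anyMatch(lower::endsWith);
        } catch (IllegalArgumentException e) {
            return false; // not a syntactically valid URI
        }
    }

    // Keeps only the URLs that look like direct image links.
    static List<String> filter(List<String> urls) {
        return urls.stream()
                .filter(ImageUrlFilter::looksLikeImage)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(filter(List.of(
                "https://example.com/cat.JPG?size=large",
                "https://example.com/blog/post.html",
                "https://example.com/")));
    }
}
```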

Passing false to the ImageSiteURLExtractor constructor disables support for the Tumblr hosting site; you can enable it by passing true, but you’ll need to create a Tumblr API key (which you will be prompted for). Now, to get the images, we can apply a static instance of a function class called ImageFromURL that has been configured to read MBFImages:

Stream<MBFImage> imageStream = imageUrlStream.map(ImageFromURL.MBFIMAGE_EXTRACTOR);

Now let’s display the images in the stream as they arrive:

imageStream.forEach(new Operation<MBFImage>() {
    public void perform(MBFImage image) {
        // Show each image in the window named "image", replacing the previous one
        DisplayUtilities.displayName(image, "image");
    }
});

If you run the program at this point, you should be able to see the images that are currently being shared on Twitter.

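Now let’s modify the code so that we can generate a visualisation of the faces that appear on Twitter. Because each image may contain zero, one or many faces, we use a MultiFunction, whose apply method returns a list of outputs that are flattened into the resulting stream. Insert this extra inline map step between imageStream and the forEach call, so that the code looks like this: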

imageStream.map(new MultiFunction<MBFImage, MBFImage>() {
    public List<MBFImage> apply(MBFImage in) {
        // the face-detection code below goes here
    }
}).forEach(new Operation<MBFImage>() {
    public void perform(MBFImage image) {
        DisplayUtilities.displayName(image, "image");
    }
});

Then add the following to the body of the apply method:

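// Detect faces on a greyscale (flattened) version of the colour image
List<DetectedFace> detected = detector.detectFaces(in.flatten());

// Crop each detected face out of the original colour image
List<MBFImage> faces = new ArrayList<MBFImage>();
for (DetectedFace face : detected) {
    faces.add(in.extractROI(face.getBounds()));
}

return faces;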

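Finally, add the following field to the anonymous class, just before the apply method, so that the face detector is loaded once rather than every time an image arrives: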

HaarCascadeDetector detector = HaarCascadeDetector.BuiltInCascade.frontalface_default.load();

Now, if you run the software, you should be able to see the faces of people in the images that are being tweeted right now.
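The face step combines two ideas: a one-to-many map, in which each input image yields a list of zero or more crops and the lists are flattened into a single stream, and cropping rectangles out of an image. The stdlib-only sketch below mirrors both using java.awt.image.BufferedImage. The class name CropAll is hypothetical, the face detector is replaced by rectangles supplied by the caller, and clipping boxes to the image bounds is this sketch's own choice rather than a claim about how extractROI behaves.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stdlib analogue of the MultiFunction + extractROI step above.
class CropAll {
    // Applies a one-to-many function to every input and flattens the results,
    // so an input that yields an empty list simply contributes nothing.
    static <A, B> List<B> flatMap(List<A> inputs, Function<A, List<B>> f) {
        List<B> out = new ArrayList<>();
        for (A in : inputs) {
            out.addAll(f.apply(in));
        }
        return out;
    }

    // Crops each box out of the image. Boxes are clipped to the image bounds and
    // boxes lying entirely outside it are skipped (this sketch's own choice).
    // Note: getSubimage returns a view that shares pixel data with the original.
    static List<BufferedImage> crop(BufferedImage img, List<Rectangle> boxes) {
        Rectangle frame = new Rectangle(0, 0, img.getWidth(), img.getHeight());
        List<BufferedImage> crops = new ArrayList<>();
        for (Rectangle box : boxes) {
            Rectangle r = box.intersection(frame);
            if (!r.isEmpty()) {
                crops.add(img.getSubimage(r.x, r.y, r.width, r.height));
            }
        }
        return crops;
    }

    public static void main(String[] args) {
        BufferedImage photo = new BufferedImage(100, 80, BufferedImage.TYPE_INT_RGB);
        // Pretend a detector found two faces, one of them partly outside the frame.
        List<BufferedImage> faces = flatMap(List.of(photo),
                img -> crop(img, List.of(new Rectangle(10, 10, 30, 30), new Rectangle(90, 70, 20, 20))));
        for (BufferedImage f : faces) {
            System.out.println(f.getWidth() + "x" + f.getHeight());
        }
    }
}
```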

11.1. Exercises

11.1.1. Exercise 1: The TwitterSearchStream

The TwitterSearchStream class repeatedly calls the Twitter search API with a given query in order to produce a stream of Tweets related to that subject. Try using the TwitterSearchStream to find and display tweeted images of your favourite animal.
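One plausible way to turn repeated search calls into a stream is to poll the search endpoint and emit only results that have not been seen before. The sketch below shows that de-duplication idea with a hypothetical SearchClient interface and Result class standing in for the real Twitter search API; it makes no assumption about how TwitterSearchStream itself de-duplicates or paces its calls.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of polling a search API; not the TwitterSearchStream implementation.
class SearchPoller {
    // Hypothetical stand-in for a search API call.
    interface SearchClient {
        List<Result> search(String query);
    }

    // Minimal search result: a unique id plus the tweet text.
    static final class Result {
        final long id;
        final String text;

        Result(long id, String text) {
            this.id = id;
            this.text = text;
        }
    }

    private final SearchClient client;
    private final String query;
    private final Set<Long> seen = new HashSet<>();

    SearchPoller(SearchClient client, String query) {
        this.client = client;
        this.query = query;
    }

    // One search call; returns only results not returned by any earlier call.
    List<Result> poll() {
        List<Result> fresh = new ArrayList<>();
        for (Result r : client.search(query)) {
            if (seen.add(r.id)) { // add() returns false if the id was already present
                fresh.add(r);
            }
        }
        return fresh;
    }

    // Repeats the search, sleeping between calls so as not to hammer a rate-limited API.
    List<Result> pollRepeatedly(int rounds, long delayMillis) {
        List<Result> all = new ArrayList<>();
        for (int i = 0; i < rounds; i++) {
            all.addAll(poll());
            if (i < rounds - 1) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return all;
    }

    public static void main(String[] args) {
        // A fake client that always returns the same two results.
        SearchClient fake = q -> List.of(new Result(1, q + " photo"), new Result(2, "another " + q));
        SearchPoller poller = new SearchPoller(fake, "otter");
        System.out.println(poller.pollRepeatedly(3, 10).size()); // prints 2
    }
}
```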

11.1.2. Exercise 2: The colour of Twitter

Can you make a program that continuously shows the average colour of the last 100 tweeted images?
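This exercise boils down to maintaining a fixed-size window over a stream. A stdlib-only sketch of one approach (the class name RollingMeanColour is hypothetical) reduces each image to its mean RGB value and keeps a running sum over the most recent N of those means, so each update is O(1) once an image's mean has been computed. Note the design choice: every image gets equal weight regardless of its size, whereas pooling all pixels together would favour large images.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical sketch: average colour of the most recent N images.
class RollingMeanColour {
    private final int capacity;
    private final Deque<double[]> window = new ArrayDeque<>();
    private final double[] sum = new double[3]; // running R, G, B totals of the window

    RollingMeanColour(int capacity) {
        this.capacity = capacity;
    }

    // Mean R, G and B of one image, each in the range [0, 255].
    static double[] meanRgb(BufferedImage img) {
        double r = 0, g = 0, b = 0;
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                int rgb = img.getRGB(x, y);
                r += (rgb >> 16) & 0xFF;
                g += (rgb >> 8) & 0xFF;
                b += rgb & 0xFF;
            }
        }
        double n = (double) img.getWidth() * img.getHeight();
        return new double[] {r / n, g / n, b / n};
    }

    // Adds an image's mean colour, evicting the oldest once the window is over capacity.
    void add(BufferedImage img) {
        double[] m = meanRgb(img);
        window.addLast(m);
        for (int c = 0; c < 3; c++) sum[c] += m[c];
        if (window.size() > capacity) {
            double[] old = window.removeFirst();
            for (int c = 0; c < 3; c++) sum[c] -= old[c];
        }
    }

    // Average of the per-image means currently in the window (NaN if empty).
    double[] average() {
        int n = window.size();
        return new double[] {sum[0] / n, sum[1] / n, sum[2] / n};
    }

    public static void main(String[] args) {
        RollingMeanColour last100 = new RollingMeanColour(100);
        last100.add(new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB)); // an all-black image
        System.out.println(Arrays.toString(last100.average()));
    }
}
```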

11.1.3. Exercise 3: Trending images

Images are often tweeted, retweeted or otherwise shared. Using the histogram features and similarity measures you learned about in the Global image features tutorial, can you make a program that works out which images have been retweeted the most within a fixed number of tweets?
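One simple approach, sketched here with only the standard library, is to describe each image by a normalised 4×4×4 RGB colour histogram, treat two images as the same picture when the Euclidean distance between their histograms falls below a threshold, and count how often each group is seen. The class name RepostCounter, the bin count, the greedy first-match grouping and the threshold are all assumptions of this sketch. Because colour histograms ignore spatial layout, different pictures with similar colours can end up in the same group.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: group near-duplicate images by colour histogram and count them.
class RepostCounter {
    static final int BINS = 4; // bins per channel, so 4 * 4 * 4 = 64 bins in total

    // Normalised joint RGB histogram; the entries sum to 1.
    static double[] histogram(BufferedImage img) {
        double[] h = new double[BINS * BINS * BINS];
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                int rgb = img.getRGB(x, y);
                int r = ((rgb >> 16) & 0xFF) * BINS / 256;
                int g = ((rgb >> 8) & 0xFF) * BINS / 256;
                int b = (rgb & 0xFF) * BINS / 256;
                h[(r * BINS + g) * BINS + b]++;
            }
        }
        double n = (double) img.getWidth() * img.getHeight();
        for (int i = 0; i < h.length; i++) h[i] /= n;
        return h;
    }

    static double euclidean(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return Math.sqrt(s);
    }

    private final List<double[]> representatives = new ArrayList<>(); // first histogram of each group
    private final List<Integer> counts = new ArrayList<>();
    private final double threshold;

    RepostCounter(double threshold) {
        this.threshold = threshold;
    }

    // Adds an image to the first group within the threshold (or a new group); returns the group index.
    int add(BufferedImage img) {
        double[] h = histogram(img);
        for (int i = 0; i < representatives.size(); i++) {
            if (euclidean(h, representatives.get(i)) < threshold) {
                counts.set(i, counts.get(i) + 1);
                return i;
            }
        }
        representatives.add(h);
        counts.add(1);
        return representatives.size() - 1;
    }

    int count(int group) {
        return counts.get(group);
    }

    // Index of the most frequently seen group, or -1 if no images have been added.
    int mostFrequent() {
        int best = -1;
        for (int i = 0; i < counts.size(); i++) {
            if (best < 0 || counts.get(i) > counts.get(best)) best = i;
        }
        return best;
    }
}
```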