Sub-modules containing support for analysing and processing web-pages.
This project has declared the following modules:
Name | Description |
---|---|
core-web | Implementation of a programmatic offscreen web browser and utility functions. |
webpage-analysis | Utilities for analysing the content and visual layout of a web-page. |
readability4j | Readability4J is a partial re-implementation in Java of the original readability.js script, with many modifications. |
twitter | The twitter project contains tools for reading JSON data from the Twitter API and processing that data. |
data-scraping | Utility methods and classes for extracting data and information from the web. |