org.erowid.sperowider

Package org.erowid.sperowider

Provides the core Sperowider functionality of downloading, spidering, rectifying, and indexing (for the SperoSearch applet) a website.


Interface Summary
IInitializableObject	Objects that can get automatically instantiated by config implement this.
ISperowiderModel	This interface defines the core model for data tracking.
IThrottle	The interface for ensuring that file downloads do not happen too rapidly.

Class Summary
AHandler	Base class that all download handlers must extend.
ASpiderBase	Downloads files to the local drive.
BasicSperowiderModel	An in-memory implementation of `ISperowiderModel`.
Downloader	Downloads files to the local drive.
DownloaderRobotsFilter	Provides robots.txt filtering for the Downloader.
DownloadRunner	Does the downloading, using repeated calls to a Downloader class.
FileNameManager	Maps URLs to file names.
FileUtils	Simple file utilities.
GenericHandler	This class downloads generically.
HandlerPool	A pool of `AHandler` objects, and a map from MIME types and file extensions to those objects.
Indexer	Performs indexing as a separate step. Although it would be more efficient to do this as part of rectification, it is broken out so that it can be run stand-alone.
IndexerRunner	Runs the Sperowider indexing.
NonThrottle	A concrete implementation of `IThrottle` that does not ever block.
PatternMatchingHandler	Uses the contents of a Sperowider custom tag inside the passed-in file to identify a regex pattern as the mongling policy.
Rectifier	Once the files are downloaded, the rectifier does a second pass and converts all of the URLs to local URLs, flattening redirects, making them all relative, etc.
RectifierRunner	Loops through the files to be rectified, and rectifies them using `Rectifier` objects.
SperoLog	Centralized logging location.
Sperowider	The core class for Sperowider, this class is configured by a SperowiderRunner and then run.
SperowiderCommandInterpreter	This class is used to perform certain transforms to comments in HTML, if they match the Sperowider command syntax.
SperowiderContext	This class holds references to all of the high level "global" objects used in Sperowider.
SperowiderRunner	The main class for the Sperowider, this class handles reading and using the configuration file to configure the `Sperowider` class, and then delegating to that class.
SummaryReportGenerator	Generates an HTML report after a download run.
TextCssHandler	A Handler for dealing with CSS files, it replaces URLs inside url().
TextHtmlHandler	This class does the downloading and spidering of HTML files.
Throttle	A concrete implementation of `IThrottle`, this class is constructed with the minimum number of milliseconds that must pass between consecutive times that `Throttle.throttle()` will unblock.
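
The throttling behaviour described for `IThrottle`, `NonThrottle`, and `Throttle` can be illustrated with a minimal sketch. This is not the Sperowider source: the type names `ThrottleSketch`, `NoOpThrottleSketch`, and `MinIntervalThrottleSketch` are hypothetical, and the no-argument `throttle()` signature and the interrupt handling are assumptions, since this page does not show the real method signatures.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for IThrottle; the real signature is an assumption.
interface ThrottleSketch {
    // Blocks until the caller is allowed to proceed with the next download.
    void throttle();
}

// Analogous to NonThrottle: never blocks.
final class NoOpThrottleSketch implements ThrottleSketch {
    @Override
    public void throttle() {
        // Intentionally empty: callers proceed immediately.
    }
}

// Analogous to Throttle: enforces a minimum gap between consecutive unblocks.
final class MinIntervalThrottleSketch implements ThrottleSketch {
    private final long minIntervalNanos;
    private long lastReleaseNanos;
    private boolean releasedOnce = false;

    MinIntervalThrottleSketch(long minIntervalMillis) {
        if (minIntervalMillis < 0) {
            throw new IllegalArgumentException("minIntervalMillis must be >= 0");
        }
        this.minIntervalNanos = TimeUnit.MILLISECONDS.toNanos(minIntervalMillis);
    }

    @Override
    public synchronized void throttle() {
        if (releasedOnce) {
            long remaining = minIntervalNanos - (System.nanoTime() - lastReleaseNanos);
            // Loop in case the sleep returns slightly early.
            while (remaining > 0) {
                try {
                    TimeUnit.NANOSECONDS.sleep(remaining);
                } catch (InterruptedException e) {
                    // Assumption: on interrupt, restore the flag and stop waiting.
                    Thread.currentThread().interrupt();
                    break;
                }
                remaining = minIntervalNanos - (System.nanoTime() - lastReleaseNanos);
            }
        }
        // Record the moment this caller was released.
        lastReleaseNanos = System.nanoTime();
        releasedOnce = true;
    }
}
```

In a sketch like this, a download loop would call `throttle()` once before each fetch. Swapping in the no-op variant removes the delay without changing the calling code, which is presumably why the package offers both a blocking and a non-blocking implementation.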

Package org.erowid.sperowider Description