Package org.erowid.sperowider.urlfilter

URL filter interfaces and implementations that control Simple Spider spidering decisions and Sperowider spidering, rectifying, downloading, and indexing decisions.


Interface Summary
IDownloadFilter A filter that indicates whether a URL should be downloaded.
IIndexFilter A filter that indicates whether a URL or filename should be indexed.
ISperowiderFilter A convenience interface that wraps all required Sperowider filter interfaces.
ISpiderFilter A filter for spidering URLs found in web pages.
 

Class Summary
ADumbIndexFilter A temporary way of implementing index filtering for Sperowider.
AIncludeExcludeFilter Provides a basic frame for file/url filtering.
BlocksAllFilter A URL Filter that says "no" to every candidate URL.
NoHopRegexSperowiderFilter This class functions as a filter that implements No-Hop logic using regex for downloading and spidering.
NoHopSimpleSperowiderFilter This class functions as a filter that implements No-Hop logic using regex for downloading and spidering.
OneHopRegexSperowiderFilter This class functions as a filter that implements One-Hop logic using regex for downloading and spidering.
OneHopSimpleSperowiderFilter This class functions as a filter that implements One-Hop logic using regex for downloading and spidering.
PatternMatcher Performs regex-style pattern matching in support of regex-based URL filters such as RegexFilter.
RegexFilter A regex-based implementation of AIncludeExcludeFilter.
RegexURLFilter Deprecated. Use NoHopRegexSperowiderFilter instead of this class.
SimpleFilter An implementation of AIncludeExcludeFilter that uses the filter rules from SimpleMatcher.
SimpleMatcher Performs simple-style pattern matching in support of simple-pattern URL filters such as SimpleFilter.
SimpleURLFilter Deprecated. Use NoHopSimpleSperowiderFilter instead of this class.
URLFilter Deprecated. Use NoHopSimpleSperowiderFilter instead.
 

Package org.erowid.sperowider.urlfilter Description

URL filter interfaces and implementations that control Simple Spider spidering decisions and Sperowider spidering, rectifying, downloading, and indexing decisions.

The best place to start in this package is NoHopRegexSperowiderFilter, the most commonly used URL filter. It uses regular expressions to select which URLs should be downloaded, and it indexes every file that has been downloaded.
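
As a rough illustration of the regex-based selection described above, the following is a minimal sketch and not the actual NoHopRegexSperowiderFilter API. The class name RegexIncludeExcludeSketch, its constructor, the methods shouldDownload and shouldIndex, and the example URLs are all hypothetical. It also assumes that a URL is accepted when it matches at least one include pattern and no exclude pattern, which this page does not confirm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of a regex include/exclude URL filter.
 * None of these names come from the Sperowider source.
 */
final class RegexIncludeExcludeSketch {
    private final List<Pattern> includes;
    private final List<Pattern> excludes;

    RegexIncludeExcludeSketch(List<String> includeRegexes, List<String> excludeRegexes) {
        this.includes = compileAll(includeRegexes);
        this.excludes = compileAll(excludeRegexes);
    }

    // Compile each regex once so that per-URL checks stay cheap.
    private static List<Pattern> compileAll(List<String> regexes) {
        List<Pattern> compiled = new ArrayList<>();
        for (String regex : regexes) {
            compiled.add(Pattern.compile(regex));
        }
        return compiled;
    }

    // Assumed rule: at least one include pattern matches and no exclude pattern matches.
    boolean shouldDownload(String url) {
        return matchesAny(includes, url) && !matchesAny(excludes, url);
    }

    // Mirrors the description above: whatever is downloaded is also indexed.
    boolean shouldIndex(String url) {
        return shouldDownload(url);
    }

    // find() matches anywhere in the URL, so patterns carry their own ^ and $ anchors.
    private static boolean matchesAny(List<Pattern> patterns, String url) {
        for (Pattern pattern : patterns) {
            if (pattern.matcher(url).find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        RegexIncludeExcludeSketch filter = new RegexIncludeExcludeSketch(
                Arrays.asList("^https?://example\\.org/docs/"),
                Arrays.asList("\\.pdf$"));
        System.out.println(filter.shouldDownload("http://example.org/docs/a.html"));
        System.out.println(filter.shouldDownload("http://example.org/docs/a.pdf"));
    }
}
```

Compiling the patterns in the constructor keeps each per-URL decision to a handful of matcher calls, which matters when a spider evaluates every link on every page.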


Sperowider is © 2005 Erowid.org