org.erowid.sperowider
Class ASpiderBase

java.lang.Object
  extended by org.erowid.sperowider.ASpiderBase
Direct Known Subclasses:
Downloader, SimplePageSpider

public abstract class ASpiderBase
extends Object

Downloads files to the local drive.

Version:
$Header: /cvsroot/sperowider/SPEROWIDER_MODULE/javasource/org/erowid/sperowider/ASpiderBase.java,v 1.7 2005/04/21 15:19:23 gurustu Exp $
Author:
Stu Statman

Field Summary
static int ALREADY_GRABBED
          Returned by spider(String) to indicate that the download was not done because this file was downloaded under another name.
static int BAD_HTTP_RESPONSE
          Returned by spider(String) to indicate that the download failed because of an HTTP error (a 404, for example).
static int EXCEPTION
          Returned by spider(String) to indicate that the download failed because of a thrown exception.
static int FILTER_FAILURE
          Returned by spider(String) to indicate that the download was not done because it was blocked by a filter (either robots.txt or the Sperowider filter itself).
static String SPEROWIDER_USER_AGENT
          The actual user agent sent to websites : "Sperowider/1.1"
static String SPEROWIDER_USER_AGENT_NAME
          The name of the user agent, without the version number : "Sperowider"
static String SPEROWIDER_USER_AGENT_VERSION
          The current version number of the Sperowider : "1.1"
static int SUCCESS
          Returned by spider(String) to indicate that the download succeeded.
 
Constructor Summary
ASpiderBase()
          Instantiates a spider base with a throttle that does not throttle.
ASpiderBase(IThrottle throttle)
          Instantiates a spider base, with a given throttle.
 
Method Summary
 int getDownloadStatisticCount(int downloadStatus)
          Returns the number of downloads that have resulted in the passed-in status.
 int getHttpResponseCodeCount(int httpResponseCode)
          Returns the number of HTTP responses received with the given response code.
 int getTotalDownloadAttempts()
          Returns the total number of download attempts.
 int getTotalHttpAttempts()
          Returns the total number of HTTP attempts.
abstract  int handleConnection(String sourceUrl, HttpURLConnection connection)
          When a connection is actually established, this method will be called.
abstract  void handleConnectionException(String sourceUrl, Throwable e)
          Called by spider(String) when an exception is thrown while attempting to load the URL.
 int spider(String url)
          Downloads and spiders the passed in URL.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

SPEROWIDER_USER_AGENT_NAME

public static final String SPEROWIDER_USER_AGENT_NAME
The name of the user agent, without the version number : "Sperowider"

See Also:
Constant Field Values

SPEROWIDER_USER_AGENT_VERSION

public static final String SPEROWIDER_USER_AGENT_VERSION
The current version number of the Sperowider : "1.1"

See Also:
Constant Field Values

SPEROWIDER_USER_AGENT

public static final String SPEROWIDER_USER_AGENT
The actual user agent sent to websites : "Sperowider/1.1"

See Also:
Constant Field Values
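
The three user agent constants fit together as name, "/", version. The sketch below shows how such a value could be attached to a request with the standard java.net API. The class UserAgentSketch and its method are hypothetical, and the constants are redeclared locally with the documented values; how ASpiderBase itself sets the header is not described here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper; not part of org.erowid.sperowider.
final class UserAgentSketch {
    // Local copies of the documented constant values.
    static final String NAME = "Sperowider";
    static final String VERSION = "1.1";
    static final String USER_AGENT = NAME + "/" + VERSION;

    // Prepares a connection and sets the User-Agent header.
    // openConnection() does not contact the server; that happens on connect().
    static HttpURLConnection openWithUserAgent(String url) {
        try {
            HttpURLConnection connection =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            connection.setRequestProperty("User-Agent", USER_AGENT);
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```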

SUCCESS

public static final int SUCCESS
Returned by spider(String) to indicate that the download succeeded.
The value of this is : 0

See Also:
Constant Field Values

FILTER_FAILURE

public static final int FILTER_FAILURE
Returned by spider(String) to indicate that the download was not done because it was blocked by a filter (either robots.txt or the Sperowider filter itself).
The value of this is : 1

See Also:
Constant Field Values

EXCEPTION

public static final int EXCEPTION
Returned by spider(String) to indicate that the download failed because of a thrown exception.
The value of this is : 2

See Also:
Constant Field Values

BAD_HTTP_RESPONSE

public static final int BAD_HTTP_RESPONSE
Returned by spider(String) to indicate that the download failed because of an HTTP error (a 404, for example).
The value of this is : 3

See Also:
Constant Field Values

ALREADY_GRABBED

public static final int ALREADY_GRABBED
Returned by spider(String) to indicate that the download was not done because this file was downloaded under another name.
The value of this is : 4

See Also:
Constant Field Values
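
Because the five status codes are plain ints, a small lookup can make logs easier to read. This is a minimal sketch: SpiderStatus is a hypothetical helper that redeclares the documented values (0 through 4); real code would reference the ASpiderBase constants directly.

```java
// Hypothetical helper; the values mirror the documented ASpiderBase constants.
final class SpiderStatus {
    static final int SUCCESS = 0;
    static final int FILTER_FAILURE = 1;
    static final int EXCEPTION = 2;
    static final int BAD_HTTP_RESPONSE = 3;
    static final int ALREADY_GRABBED = 4;

    // Maps a status code, as returned by spider(String), to its constant name.
    static String describe(int status) {
        switch (status) {
            case SUCCESS:           return "SUCCESS";
            case FILTER_FAILURE:    return "FILTER_FAILURE";
            case EXCEPTION:         return "EXCEPTION";
            case BAD_HTTP_RESPONSE: return "BAD_HTTP_RESPONSE";
            case ALREADY_GRABBED:   return "ALREADY_GRABBED";
            default:                return "UNKNOWN(" + status + ")";
        }
    }
}
```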
Constructor Detail

ASpiderBase

public ASpiderBase(IThrottle throttle)
Instantiates a spider base, with a given throttle.


ASpiderBase

public ASpiderBase()
Instantiates a spider base with a throttle that does not throttle.

Method Detail

handleConnection

public abstract int handleConnection(String sourceUrl,
                                     HttpURLConnection connection)
When a connection is actually established, this method will be called.


handleConnectionException

public abstract void handleConnectionException(String sourceUrl,
                                               Throwable e)
Called by spider(String) when an exception is thrown while attempting to load the URL.
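
A subclass must supply both abstract methods. The sketch below illustrates one possible shape. SpiderBaseStandIn is a hypothetical stand-in that copies only the two documented signatures so the example compiles on its own, and LoggingSpider is likewise invented. That handleConnection returns one of the status constants is an assumption; the documentation does not say what the int means.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in mirroring the two abstract signatures;
// it is NOT the real org.erowid.sperowider.ASpiderBase.
abstract class SpiderBaseStandIn {
    static final int SUCCESS = 0;           // documented value
    static final int EXCEPTION = 2;         // documented value
    static final int BAD_HTTP_RESPONSE = 3; // documented value

    public abstract int handleConnection(String sourceUrl, HttpURLConnection connection);

    public abstract void handleConnectionException(String sourceUrl, Throwable e);
}

// Hypothetical subclass: inspects the response code and records failures.
class LoggingSpider extends SpiderBaseStandIn {
    final List<String> failures = new ArrayList<>();

    @Override
    public int handleConnection(String sourceUrl, HttpURLConnection connection) {
        try {
            int code = connection.getResponseCode();
            // Assumption: the return value is one of the status constants.
            return (code == HttpURLConnection.HTTP_OK) ? SUCCESS : BAD_HTTP_RESPONSE;
        } catch (IOException e) {
            handleConnectionException(sourceUrl, e);
            return EXCEPTION;
        }
    }

    @Override
    public void handleConnectionException(String sourceUrl, Throwable e) {
        // Keep a readable record of which URL failed and why.
        failures.add(sourceUrl + ": " + e);
    }
}
```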


spider

public int spider(String url)
Downloads and spiders the passed in URL.


getDownloadStatisticCount

public int getDownloadStatisticCount(int downloadStatus)
Returns the number of downloads that have resulted in the passed-in status. Valid values are currently: SUCCESS, ALREADY_GRABBED, BAD_HTTP_RESPONSE, EXCEPTION, and FILTER_FAILURE.
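
One way to use this method is to query it once per status and build a one-line summary. The sketch below is hypothetical: DownloadReport is not part of the library, and it takes the counting function as an IntUnaryOperator so it can run without a real spider instance.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical reporting helper; not part of org.erowid.sperowider.
final class DownloadReport {
    // Documented status values and their constant names, in value order.
    private static final int[] STATUSES = {0, 1, 2, 3, 4};
    private static final String[] NAMES = {
        "SUCCESS", "FILTER_FAILURE", "EXCEPTION", "BAD_HTTP_RESPONSE", "ALREADY_GRABBED"
    };

    // Builds "NAME=count" pairs by asking the supplied function for each status.
    static String format(IntUnaryOperator countForStatus) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < STATUSES.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(NAMES[i]).append('=').append(countForStatus.applyAsInt(STATUSES[i]));
        }
        return sb.toString();
    }
}
```

With a real spider instance, the call would presumably look like DownloadReport.format(spider::getDownloadStatisticCount).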


getHttpResponseCodeCount

public int getHttpResponseCodeCount(int httpResponseCode)
Returns the number of HTTP responses received with the given response code.


getTotalHttpAttempts

public int getTotalHttpAttempts()
Returns the total number of HTTP attempts. This will be more than getTotalDownloadAttempts() whenever redirects occur, because each 302 counts as a separate HTTP attempt.


getTotalDownloadAttempts

public int getTotalDownloadAttempts()
Returns the total number of download attempts. This is one per filename found while spidering.
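
The relationship between the two totals can be modelled with a pair of counters. This is only an illustrative model, not the library's implementation: AttemptCounters is hypothetical, and it assumes the final, non-redirect request also counts as one HTTP attempt.

```java
// Hypothetical model of the two counters; not part of org.erowid.sperowider.
final class AttemptCounters {
    private int totalDownloadAttempts;
    private int totalHttpAttempts;

    // Records one download attempt that went through `redirects` 302 responses
    // before its final request.
    void recordDownload(int redirects) {
        if (redirects < 0) {
            throw new IllegalArgumentException("redirects must be >= 0");
        }
        totalDownloadAttempts += 1;
        // Assumption: each 302 plus the final request is one HTTP attempt.
        totalHttpAttempts += redirects + 1;
    }

    int getTotalDownloadAttempts() {
        return totalDownloadAttempts;
    }

    int getTotalHttpAttempts() {
        return totalHttpAttempts;
    }
}
```

Under this model, one direct download plus one download that followed two redirects gives 2 download attempts and 4 HTTP attempts.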


Sperowider is © 2005 Erowid.org