java.lang.Object org.erowid.sperowider.Sperowider
The core class for Sperowider. It is configured by a SperowiderRunner and then run, which fires off the whole download-spider-rectify-index cycle.
Field Summary | |
static int |
MINIMUM_THROTTLE
The system won't allow a smaller throttle than 100. |
Constructor Summary | |
Sperowider(SperowiderConfiguration configuration)
Constructs a new Sperowider on the basis of a SperowiderConfiguration. |
Method Summary | |
SperowiderContext |
getContext()
Returns the SperowiderContext. |
int |
getDownloadStatisticCount(int downloadStatus)
Returns the number of files counted under the given download status (ASpiderBase.ALREADY_GRABBED, ASpiderBase.BAD_HTTP_RESPONSE, ASpiderBase.EXCEPTION, ASpiderBase.FILTER_FAILURE, or ASpiderBase.SUCCESS). |
int |
getFileRectifyCount()
Gets the number of files rectified. |
int |
getGrabbedUrlCount()
The count of URLs that have been grabbed for download. |
int |
getHttpResponseCodeCount(int httpResponseCode)
Gets the number of responses per HTTP code. |
int |
getIndexedFileCount()
Gets the number of files indexed. |
int |
getInvalidURLCount()
The count of all bad URLs, both found and real. |
int |
getRectifiedHTMLFileCount()
The count of all HTML files that have been "rectified", that is, processed to replace all found URLs with relative URLs to the mapped file names. |
int |
getTotalDownloadAttempts()
Gets the total number of download attempts. |
int |
getTotalHttpAttempts()
Gets the total number of HTTP attempts. This is higher than the number of downloads, because each 302 response counts here as well. |
int |
getUncheckedUrlCount()
A count of URLs that have not yet been checked. |
int |
getUnRectifiedFileCount()
The count of downloaded HTML files that are not yet rectified. |
void |
run()
Downloads, spiders, rectifies, and indexes, depending on earlier calls to the various setters, including setShouldDownload(boolean), setShouldIndex(boolean), and setShouldRectify(boolean). |
void |
setConfigurationSource(String configurationSource)
Sets an arbitrary string that identifies the source of the configuration. |
void |
setLimit(int limit)
|
void |
setShouldDownload(boolean val)
Set this to true if you want downloading to happen when run() is called. |
void |
setShouldIndex(boolean val)
Set this to true if you want indexing to happen when run() is called. |
void |
setShouldRectify(boolean val)
Set this to true if you want rectifying to happen when run() is called. |
void |
setSummaryFileName(String summaryFileName)
|
void |
setSummaryFooterFileName(String summaryFileFooter)
|
void |
setSummaryHeaderFileName(String summaryFileHeader)
|
void |
setThrottle(long throttle)
Sets the throttle length, in milliseconds. |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
public static final int MINIMUM_THROTTLE
Constructor Detail |
public Sperowider(SperowiderConfiguration configuration) throws SperowiderInstantiationException
Constructs a new Sperowider on the basis of a SperowiderConfiguration.
Method Detail |
public void setThrottle(long throttle)
Sets the throttle length, in milliseconds. If the value is less than MINIMUM_THROTTLE, it will be set to MINIMUM_THROTTLE.
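The clamping behavior described for setThrottle(long) can be illustrated with a minimal sketch. This is not the Sperowider source: the class ThrottleSketch, its getThrottle() accessor, and its default value are hypothetical, and the sketch assumes that values below MINIMUM_THROTTLE (100) are raised to that floor.

```java
// Hypothetical stand-in for illustration only; not the Sperowider source.
final class ThrottleSketch {
    // Mirrors the documented floor of 100; assumed to be milliseconds.
    static final int MINIMUM_THROTTLE = 100;

    // Starting value is an assumption made for this sketch.
    private long throttle = MINIMUM_THROTTLE;

    // Values below the floor are raised to the floor; others are stored as given.
    void setThrottle(long throttle) {
        this.throttle = Math.max(throttle, MINIMUM_THROTTLE);
    }

    // Hypothetical accessor, added so the effect of the setter can be observed.
    long getThrottle() {
        return throttle;
    }

    public static void main(String[] args) {
        ThrottleSketch sketch = new ThrottleSketch();
        sketch.setThrottle(25);
        System.out.println(sketch.getThrottle()); // prints 100
        sketch.setThrottle(500);
        System.out.println(sketch.getThrottle()); // prints 500
    }
}
```

Clamping silently, rather than throwing on a too-small value, keeps a misconfigured caller from hammering a remote server while still letting the run proceed.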
public void setShouldDownload(boolean val)
Set this to true if you want downloading to happen when run() is called.
By default, this is false.
public void setShouldIndex(boolean val)
Set this to true if you want indexing to happen when run() is called.
By default, this is false.
public void setShouldRectify(boolean val)
Set this to true if you want rectifying to happen when run() is called.
By default, this is false.
public void setConfigurationSource(String configurationSource)
public void setLimit(int limit)
limit - The limit to set.
public void setSummaryFooterFileName(String summaryFileFooter)
summaryFileFooter - The summaryFileFooter to set.
public void setSummaryHeaderFileName(String summaryFileHeader)
summaryFileHeader - The summaryFileHeader to set.
public void setSummaryFileName(String summaryFileName)
summaryFileName - The summaryFileName to set.
public void run() throws IOException
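Downloads, spiders, rectifies, and indexes, depending on earlier calls to the various setters, including setShouldDownload(boolean), setShouldIndex(boolean), and setShouldRectify(boolean).
Throws: IOException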
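The way the three setShould... flags gate the work done by run() can be sketched as follows. This is a hedged illustration, not the actual implementation: StageFlagsSketch is a hypothetical class, the stage order is taken from the "download-spider-rectify-index" description of the class, spidering is assumed to belong to the download stage, and run() here returns the stage names only so that the result can be inspected (the documented run() returns void and may throw IOException).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; not the Sperowider source.
final class StageFlagsSketch {
    // All three flags start out false, matching the documented defaults.
    private boolean shouldDownload;
    private boolean shouldRectify;
    private boolean shouldIndex;

    void setShouldDownload(boolean val) { shouldDownload = val; }
    void setShouldRectify(boolean val) { shouldRectify = val; }
    void setShouldIndex(boolean val) { shouldIndex = val; }

    // Returns the names of the stages that were enabled, in execution order.
    List<String> run() {
        List<String> stagesRun = new ArrayList<>();
        if (shouldDownload) {
            stagesRun.add("download"); // spidering assumed to happen in this stage
        }
        if (shouldRectify) {
            stagesRun.add("rectify");
        }
        if (shouldIndex) {
            stagesRun.add("index");
        }
        return stagesRun;
    }

    public static void main(String[] args) {
        StageFlagsSketch sketch = new StageFlagsSketch();
        System.out.println(sketch.run()); // prints []
        sketch.setShouldDownload(true);
        sketch.setShouldIndex(true);
        System.out.println(sketch.run()); // prints [download, index]
    }
}
```

Because every flag defaults to false, a caller that forgets to enable any stage gets a run that does nothing, so each desired stage has to be switched on explicitly before calling run().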
public int getDownloadStatisticCount(int downloadStatus)
Returns the number of files counted under the given download status (ASpiderBase.ALREADY_GRABBED, ASpiderBase.BAD_HTTP_RESPONSE, ASpiderBase.EXCEPTION, ASpiderBase.FILTER_FAILURE, or ASpiderBase.SUCCESS).
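A per-status counter of the kind getDownloadStatisticCount(int) reports on could be kept as shown below. This is a sketch under stated assumptions: DownloadStatsSketch and its record(int) method are hypothetical, the integer values given to the status constants are placeholders (this page does not list the real ASpiderBase values), and returning 0 for a status that was never recorded is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; not the Sperowider source.
final class DownloadStatsSketch {
    // Placeholder values; the real ASpiderBase constants may differ.
    static final int ALREADY_GRABBED = 0;
    static final int BAD_HTTP_RESPONSE = 1;
    static final int EXCEPTION = 2;
    static final int FILTER_FAILURE = 3;
    static final int SUCCESS = 4;

    // Maps a download status to the number of times it was recorded.
    private final Map<Integer, Integer> counts = new HashMap<>();

    // Hypothetical hook: called once per finished download attempt.
    void record(int downloadStatus) {
        counts.merge(downloadStatus, 1, Integer::sum);
    }

    // Returns the count for one status, or 0 if that status never occurred.
    int getDownloadStatisticCount(int downloadStatus) {
        return counts.getOrDefault(downloadStatus, 0);
    }

    public static void main(String[] args) {
        DownloadStatsSketch stats = new DownloadStatsSketch();
        stats.record(SUCCESS);
        stats.record(SUCCESS);
        stats.record(EXCEPTION);
        System.out.println(stats.getDownloadStatisticCount(SUCCESS));        // prints 2
        System.out.println(stats.getDownloadStatisticCount(FILTER_FAILURE)); // prints 0
    }
}
```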
public int getHttpResponseCodeCount(int httpResponseCode)
public int getTotalDownloadAttempts()
public int getTotalHttpAttempts()
public int getIndexedFileCount()
public int getFileRectifyCount()
public int getUncheckedUrlCount()
public int getGrabbedUrlCount()
public int getInvalidURLCount()
public int getUnRectifiedFileCount()
public int getRectifiedHTMLFileCount()
public SperowiderContext getContext()
Returns the SperowiderContext.