org.erowid.sperowider
Class BasicSperowiderModel

java.lang.Object
  extended by org.erowid.sperowider.BasicSperowiderModel
All Implemented Interfaces:
IInitializableObject, ISperowiderModel

public class BasicSperowiderModel
extends Object
implements ISperowiderModel, IInitializableObject

An in-memory implementation of ISperowiderModel.

Version:
$Header: /cvsroot/sperowider/SPEROWIDER_MODULE/javasource/org/erowid/sperowider/BasicSperowiderModel.java,v 1.21 2005/04/19 08:05:54 gurustu Exp $
Author:
Stu Statman

Constructor Summary
BasicSperowiderModel()
           
 
Method Summary
 void addFileToRectificationQueue(String fileName)
          Adds a filename to the rectification queue
 void addFoundURL(String foundIn, String found, boolean excludeFromDownloadQueue)
          The Downloader calls this when it finds a URL in a downloaded page.
 void destroy()
          Called by the Sperowider to close all open resources
 String getFileForRectifying()
          Returns a file to be rectified; this will be done after the downloads are all done
 String getFileNameForURL(String url)
          Returns the filename for a mapped URL.
 List getFoundURLs(String sourceURL)
          This is too expensive for the BasicSperowiderModel, in terms of memory.
 int getGrabbedUrlCount()
          The count of URLs that have been grabbed for download.
 int getInvalidURLCount()
          The count of all bad URLs, both found and real.
 Collection getInvalidURLs()
          Returns the list of invalid URLs
 String getRealURLForFoundURL(String foundURL)
          Returns the mapping data as set by mapFoundURLToRealURL(String, String)
 int getRectifiedHTMLFileCount()
          The count of all HTML files that have been "rectified", that is, processed to replace all found URLs with relative URLs to the mapped file names.
 List getSourceURLs(String foundURL)
          This is too expensive for the BasicSperowiderModel, in terms of memory.
 int getUncheckedUrlCount()
          A count of URLs that have not yet been checked.
 int getUnRectifiedFileCount()
          The count of downloaded HTML files that are not yet rectified.
 String getUnspideredUrl()
          Returns a URL that has yet to be downloaded
 boolean grabForSpidering(String url)
          If this URL has already been downloaded, return false.
 void init(Element configNode)
          Initialize this class with the passed-in XML configuration element.
 boolean isSpiderMapSupported()
          Returns false, because the BasicSperowiderModel does not support getFoundURLs(String) or getSourceURLs(String).
 void mapFoundURLToRealURL(String foundURL, String realURL)
          Maps a found URL to a "real URL".
 void mapRealURLToFileName(String realURL, String fileName)
          Maps a "real" URL to a file name.
 void markInvalidURL(String url, int http_code, String http_message)
          Mark a URL as invalid
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

BasicSperowiderModel

public BasicSperowiderModel()
Method Detail

addFoundURL

public void addFoundURL(String foundIn,
                        String found,
                        boolean excludeFromDownloadQueue)
Description copied from interface: ISperowiderModel
The Downloader calls this when it finds a URL in a downloaded page. The excludeFromDownloadQueue flag indicates URLs that are not to be downloaded, typically because they failed a filter. This method is still called for such URLs so that spider mapping can still happen.

Note that just because excludeFromDownloadQueue is set to false does not mean that the URL need be added to the queue. If the URL has already been downloaded, or is already in the queue, this request can be ignored.

Specified by:
addFoundURL in interface ISperowiderModel
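
The queueing rules described above can be sketched as follows. This is a minimal illustration, not the actual BasicSperowiderModel source: the class name FoundUrlQueueSketch, the "seen" set, and the choice to return null from getUnspideredUrl() when the queue is empty are all assumptions made for the example. It also drops duplicates on insert, which the interface permits but which the real class may not do, given that getUncheckedUrlCount() is documented as likely to include duplicates.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

/** Hypothetical stand-in for the download-queue part of a model. */
class FoundUrlQueueSketch {
    private final Queue<String> queue = new ArrayDeque<>();
    private final Set<String> seen = new HashSet<>(); // assumed dedupe structure

    synchronized void addFoundURL(String foundIn, String found,
                                  boolean excludeFromDownloadQueue) {
        // A model that supports spider mapping would record foundIn -> found here.
        if (excludeFromDownloadQueue) {
            return; // the URL was filtered out, so it is never queued
        }
        // Set.add returns false for a URL that was already queued or handed out
        if (seen.add(found)) {
            queue.add(found);
        }
    }

    /** Returns the next queued URL, or null when the queue is empty (assumption). */
    synchronized String getUnspideredUrl() {
        return queue.poll();
    }

    synchronized int getUncheckedUrlCount() {
        return queue.size();
    }
}
```

With this sketch, adding the same URL twice and an excluded URL once leaves exactly one entry in the queue.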

getUnspideredUrl

public String getUnspideredUrl()
Description copied from interface: ISperowiderModel
Returns a URL that has yet to be downloaded

Specified by:
getUnspideredUrl in interface ISperowiderModel

mapFoundURLToRealURL

public void mapFoundURLToRealURL(String foundURL,
                                 String realURL)
Description copied from interface: ISperowiderModel
Maps a found URL to a "real URL". A "real URL" is the final URL after all 302 redirects and server processing are done.

Specified by:
mapFoundURLToRealURL in interface ISperowiderModel

mapRealURLToFileName

public void mapRealURLToFileName(String realURL,
                                 String fileName)
Description copied from interface: ISperowiderModel
Maps a "real" URL to a file name. These file names will be important for "rectifying" the downloaded files.

Specified by:
mapRealURLToFileName in interface ISperowiderModel

addFileToRectificationQueue

public void addFileToRectificationQueue(String fileName)
Description copied from interface: ISperowiderModel
Adds a filename to the rectification queue

Specified by:
addFileToRectificationQueue in interface ISperowiderModel

grabForSpidering

public boolean grabForSpidering(String url)
Description copied from interface: ISperowiderModel
If this URL has already been downloaded, return false. Otherwise, mark it as already downloaded and return true. This method really should be synchronized in the implementation.

Specified by:
grabForSpidering in interface ISperowiderModel
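
The check-and-mark contract above is the reason synchronization matters: two downloader threads must not both receive true for the same URL. A minimal sketch of one way to satisfy it, assuming a simple in-memory set (the class name GrabSketch and its field are hypothetical, not taken from the Sperowider source):

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in showing an atomic check-and-mark. */
class GrabSketch {
    private final Set<String> grabbed = new HashSet<>();

    /** Returns true only for the first caller to claim the given URL. */
    synchronized boolean grabForSpidering(String url) {
        // Set.add performs the check and the mark in one step:
        // it returns false when the URL was already present.
        return grabbed.add(url);
    }

    synchronized int getGrabbedUrlCount() {
        return grabbed.size();
    }
}
```

Because the method is synchronized, the test and the insertion happen under one lock, so a second call with the same URL returns false regardless of which thread makes it.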

markInvalidURL

public void markInvalidURL(String url,
                           int http_code,
                           String http_message)
Description copied from interface: ISperowiderModel
Mark a URL as invalid

Specified by:
markInvalidURL in interface ISperowiderModel

getFileForRectifying

public String getFileForRectifying()
Description copied from interface: ISperowiderModel
Returns a file to be rectified; this will be done after the downloads are all done

Specified by:
getFileForRectifying in interface ISperowiderModel

getRealURLForFoundURL

public String getRealURLForFoundURL(String foundURL)
Description copied from interface: ISperowiderModel
Returns the mapping data as set by ISperowiderModel.mapFoundURLToRealURL(String, String)

Specified by:
getRealURLForFoundURL in interface ISperowiderModel

getFileNameForURL

public String getFileNameForURL(String url)
Description copied from interface: ISperowiderModel
Returns the filename for a mapped URL. Note that this will not attempt to get the real URL from a found URL.

Specified by:
getFileNameForURL in interface ISperowiderModel
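
Since getFileNameForURL(String) does not resolve found URLs itself, a caller that starts from a found URL needs two lookups. The following sketch shows that chain with two in-memory maps. The class UrlMappingSketch and the helper resolveFileName are hypothetical, and returning null for an unmapped URL is an assumption; this page does not say what the real class returns in that case.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the two URL mappings of a model. */
class UrlMappingSketch {
    private final Map<String, String> foundToReal = new HashMap<>();
    private final Map<String, String> realToFile = new HashMap<>();

    void mapFoundURLToRealURL(String foundURL, String realURL) {
        foundToReal.put(foundURL, realURL);
    }

    void mapRealURLToFileName(String realURL, String fileName) {
        realToFile.put(realURL, fileName);
    }

    String getRealURLForFoundURL(String foundURL) {
        return foundToReal.get(foundURL); // null when unmapped (assumption)
    }

    /** Looks up the given URL only; it does not follow found -> real. */
    String getFileNameForURL(String url) {
        return realToFile.get(url); // null when unmapped (assumption)
    }

    /** Hypothetical helper: found URL -> real URL -> file name. */
    String resolveFileName(String foundURL) {
        String realURL = getRealURLForFoundURL(foundURL);
        return (realURL == null) ? null : getFileNameForURL(realURL);
    }
}
```

Passing a found URL straight to getFileNameForURL in this sketch yields null unless the found URL and the real URL happen to be identical.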

init

public void init(Element configNode)
          throws SperowiderInstantiationException
Description copied from interface: IInitializableObject
Initialize this class with the passed-in XML configuration element.

Specified by:
init in interface IInitializableObject
Throws:
SperowiderInstantiationException

destroy

public void destroy()
Description copied from interface: ISperowiderModel
Called by the Sperowider to close all open resources

Specified by:
destroy in interface ISperowiderModel

getFoundURLs

public List getFoundURLs(String sourceURL)
                  throws UnsupportedOperationException
Tracking this is too expensive for the BasicSperowiderModel in terms of memory, so this method always throws an UnsupportedOperationException, and isSpiderMapSupported() returns false.

Specified by:
getFoundURLs in interface ISperowiderModel
Throws:
UnsupportedOperationException - If the model does not support this method

getSourceURLs

public List getSourceURLs(String foundURL)
                   throws UnsupportedOperationException
Tracking this is too expensive for the BasicSperowiderModel in terms of memory, so this method always throws an UnsupportedOperationException, and isSpiderMapSupported() returns false.

Specified by:
getSourceURLs in interface ISperowiderModel
Throws:
UnsupportedOperationException - If the model does not support this method

isSpiderMapSupported

public boolean isSpiderMapSupported()
Returns false, because the BasicSperowiderModel does not support getFoundURLs(String) or getSourceURLs(String).

Specified by:
isSpiderMapSupported in interface ISperowiderModel
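
Callers can use isSpiderMapSupported() as a guard instead of catching UnsupportedOperationException. A minimal sketch of that pattern follows. The nested SpiderMap interface is a cut-down, hypothetical stand-in for the relevant part of ISperowiderModel, and foundURLsOrEmpty is an illustrative helper, not part of Sperowider.

```java
import java.util.Collections;
import java.util.List;

class SpiderMapGuardSketch {
    /** Hypothetical subset of the model interface, for illustration only. */
    interface SpiderMap {
        boolean isSpiderMapSupported();
        List<String> getFoundURLs(String sourceURL);
    }

    /** Mimics a model without spider-map support. */
    static final class NoSpiderMap implements SpiderMap {
        public boolean isSpiderMapSupported() {
            return false;
        }

        public List<String> getFoundURLs(String sourceURL) {
            throw new UnsupportedOperationException("spider map not supported");
        }
    }

    /** Checks the capability flag first, so the unsupported call is never made. */
    static List<String> foundURLsOrEmpty(SpiderMap model, String sourceURL) {
        if (!model.isSpiderMapSupported()) {
            return Collections.emptyList();
        }
        return model.getFoundURLs(sourceURL);
    }
}
```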

getInvalidURLs

public Collection getInvalidURLs()
Returns the list of invalid URLs

Specified by:
getInvalidURLs in interface ISperowiderModel

getGrabbedUrlCount

public int getGrabbedUrlCount()
Description copied from interface: ISperowiderModel
The count of URLs that have been grabbed for download. These URLs are "real", which is to say that all 302s have been followed, and thus are a good indicator of URLs downloaded.

Specified by:
getGrabbedUrlCount in interface ISperowiderModel

getInvalidURLCount

public int getInvalidURLCount()
Description copied from interface: ISperowiderModel
The count of all bad URLs, both found and real.

Specified by:
getInvalidURLCount in interface ISperowiderModel

getRectifiedHTMLFileCount

public int getRectifiedHTMLFileCount()
Description copied from interface: ISperowiderModel
The count of all HTML files that have been "rectified", that is, processed to replace all found URLs with relative URLs to the mapped file names.

Specified by:
getRectifiedHTMLFileCount in interface ISperowiderModel
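
To make "rectifying" concrete, here is a rough sketch of the idea: rewrite each link in a downloaded page so that it points at the local file name its URL was mapped to. This is not Sperowider's rectifier. The class RectifySketch, the href-only regular expression, and the assumption that every file sits in one directory (so the file name alone works as a relative URL) are simplifications made for the example.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of link rewriting; handles double-quoted href values only. */
class RectifySketch {
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]*)\"");

    /** Replaces each href whose URL has a mapped file name; leaves other links untouched. */
    static String rectify(String html, Map<String, String> urlToFileName) {
        Matcher m = HREF.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String url = m.group(1);
            String fileName = urlToFileName.get(url);
            String target = (fileName != null) ? fileName : url;
            // quoteReplacement keeps '$' and '\' in the target from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement("href=\"" + target + "\""));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

In a real run the map would be built from the two model lookups, found URL to real URL and real URL to file name, rather than supplied directly.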

getUncheckedUrlCount

public int getUncheckedUrlCount()
Description copied from interface: ISperowiderModel
A count of URLs that have not yet been checked. There are likely to be duplicates included, but it represents a good measure of the queue size.

Specified by:
getUncheckedUrlCount in interface ISperowiderModel

getUnRectifiedFileCount

public int getUnRectifiedFileCount()
Description copied from interface: ISperowiderModel
The count of downloaded HTML files that are not yet rectified.

Specified by:
getUnRectifiedFileCount in interface ISperowiderModel

Sperowider is © 2005 Erowid.org