Overview

Sperowider is a Java application and applet that mirrors a site, builds an index, and replaces the native site search with an applet-based search. Sperowider is the shorthand term we use for the SperoTools suite, which consists of the Sperowider server spider and the Sperosearch client-side search applet.

Reason for Development

The design purpose of Sperowider is to make high-quality, highly usable flat-HTML versions of websites that use (simple) JavaScript DHTML. Although there are quite a few spiders and archivers available, we were unable to get any of them to produce reliable archives of complex sites. Many archivers, in fact, lose their brains on large, complex sites. We wanted an archiver that could produce a flat version of sites with dynamic content, suitable for putting on a CD or other local media and using offline. Another major feature we wanted was a client-side search applet that would provide some basic ability to search the offline archive.

Design Features

Open-source, platform-independent, Java-based, throttleable, able to combine multiple spidering sessions, equipped with a client-side search applet, and controllable from the command line and through configuration.
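
To illustrate what "throttleable" can mean for a spider, here is a minimal sketch that enforces a minimum delay between requests. It is not Sperowider's own code: the class name ThrottleSketch, the method awaitTurn, and the delay value are all hypothetical, and it is written for a current Java release rather than Java 1.4.

```java
// Hypothetical illustration only; this is not taken from the Sperowider source.
class ThrottleSketch {
    private final long minDelayNanos;
    private long lastRequestNanos;
    private boolean first = true;

    ThrottleSketch(long minDelayMillis) {
        this.minDelayNanos = minDelayMillis * 1_000_000L;
    }

    // Blocks until at least the minimum delay has passed since the previous call returned.
    synchronized void awaitTurn() throws InterruptedException {
        if (!first) {
            long remaining = minDelayNanos - (System.nanoTime() - lastRequestNanos);
            while (remaining > 0) {
                Thread.sleep(remaining / 1_000_000L, (int) (remaining % 1_000_000L));
                remaining = minDelayNanos - (System.nanoTime() - lastRequestNanos);
            }
        }
        first = false;
        lastRequestNanos = System.nanoTime();
    }

    public static void main(String[] args) throws InterruptedException {
        ThrottleSketch throttle = new ThrottleSketch(200); // assumed delay: 200 ms between requests
        for (int i = 1; i <= 3; i++) {
            throttle.awaitTurn();
            // A real spider would issue its HTTP request at this point.
            System.out.println("fetching page " + i);
        }
    }
}
```

A fixed delay is the simplest form of throttling; it keeps the load on the mirrored site predictable at the cost of a slower crawl.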

System Requirements

Sperowider: Running the spider requires Java 1.4.x. Sperowider has only been tested on *nix platforms. We have no plans to test it on Windows.

To view the HTML: a CSS-compatible HTML browser (Mozilla / Firebird, OS X Safari, Internet Explorer 5+, Opera 7+, etc.)

SperoSearch: HTML & Search Applet: You must have the Java plugin 1.4+ installed, and your browser MUST support it. Note that on OS X, as of March 10, 2004, only Safari supports this.

Status

Sperowider 1.3 was released on June 3, 2005.

Version 1.3 represents a hundred hours of testing and adds a number of new features, including a gzip-compressed file system for the search index, which makes Sperosearch much smaller and faster.
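
As a rough illustration of the gzip idea, the sketch below compresses a block of bytes and restores it using the standard java.util.zip classes. It is not the Sperowider index format or code: the class name GzipIndexSketch and the sample data are hypothetical, and it uses current Java syntax rather than Java 1.4.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper; not part of the Sperowider code base.
class GzipIndexSketch {

    // Compress raw bytes (for example, the contents of one index file) with gzip.
    static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(raw);
        }
        return buffer.toByteArray();
    }

    // Restore the original bytes from gzip-compressed data.
    static byte[] decompress(byte[] packed) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(packed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Repetitive sample text standing in for index data.
        StringBuilder sample = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sample.append("term:document ");
        }
        byte[] raw = sample.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(raw);
        System.out.println(raw.length + " bytes raw, " + packed.length + " bytes gzipped");
    }
}
```

A smaller index means less data for the applet to load from a CD or other local media, which is one plausible reason compression would also make searching feel faster.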

Developers

Version 1.x was primarily the work of GuruStu and Earth. If you are interested in helping with coding, documentation, or design, please use the SourceForge forums to let us know.

Downloads

Sperowider is available via CVS and SourceForge Downloads.

SourceForge Page

http://sourceforge.net/projects/sperowider/

Documents

Sperowider HowTo : A brief introduction to how to use Sperowider to spider your site.
Sourcecode Documentation : JavaDoc code documentation for the software.
Roadmap for Development : A description of the future development goals.

Software Components

Sperowider is built using other software packages compatible with the BSD license: Log4j, Lucene, Hypersonic SQL, and JDOM.

This product includes software developed by the Apache Software Foundation (http://www.apache.org/).
This product includes Hypersonic SQL.
This product includes software developed by the JDOM Project (http://www.jdom.org/).

Help Wanted

Sperowider is functional at 1.x, but there are several things we could use help with: mirror sites for Sperowider flat-file versions, download sites for Sperowidered archives of Erowid, logo design help, and dedicated Java programmers.

Related Projects

Heritrix : Archive.org's Java-based web crawler, in active development as of late 2004

License & Copyright Summary

Generally, Sperowider's source code is free for use as long as attribution and copyright notice are retained. Sperowider's source code is compliant with the Creative Commons Attribution License, but there are some additional restrictions around the use of the name Sperowider. Absolutely no warranty of any kind is implied. Use of the name Sperowider or other unique names for this project in advertising is prohibited without prior approval. Erowid is a registered trademark of Erowid.org and Erowid retains control of the name Sperowider. For more information about the licensing issues, see the Sperowider License.

Erowid is a non-commercial project, supported by donations, dedicated to improving the quality, quantity, and availability of information.