REVIEW - Programming Spiders, Bots, and Aggregators in Java

Title:

Programming Spiders, Bots, and Aggregators in Java

Author:

Jeff Heaton

ISBN:

0782140408

Publisher:

John Wiley and Sons (2002)

Pages:

516pp

Reviewer:

Mathew Davies

Reviewed:

June 2003

Rating:

3 out of 5

Prior to reading this book, I knew very little about how e-mail works or the mechanics of browsing the World Wide Web. After reading the book, I feel fairly confident that I could build an application to cruise around the Web and gather information on my behalf.

I am no expert on networks; nor am I one of the Java cognoscenti. I was therefore relieved to find that the opening chapters of this book provide a basic introduction to how PCs communicate with remote servers. What's more, the author provides a series of small Java applications to illustrate the process. For example, the book's initial description of SMTP includes a Java application that lets you watch the conversation between your own machine and the server at the other end. Call me childish if you will, but I was hooked.
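To give a flavour of what "watching the conversation" means, here is a minimal sketch of my own; it is not the author's code, and the class name `SmtpWatcher` and method `converse` are hypothetical. It sends each command, reads one reply line, and records both sides. It assumes single-line replies, so multi-line responses such as those to EHLO are not handled, and the demo uses canned server replies rather than a real mail server.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SmtpWatcher {

    /**
     * Sends each command to the server and records both sides of the exchange.
     * Assumption: every server reply fits on a single line.
     */
    public static List<String> converse(BufferedReader in, Writer out, List<String> commands) {
        List<String> transcript = new ArrayList<>();
        try {
            // The server speaks first with a greeting line (normally code 220).
            transcript.add("S: " + in.readLine());
            for (String command : commands) {
                out.write(command + "\r\n"); // SMTP lines end in CRLF
                out.flush();
                transcript.add("C: " + command);
                transcript.add("S: " + in.readLine());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return transcript;
    }

    public static void main(String[] args) {
        // Canned replies stand in for a real server so the demo runs offline.
        BufferedReader in = new BufferedReader(new StringReader(
                "220 mail.example.com ready\r\n250 Hello\r\n221 Bye\r\n"));
        StringWriter out = new StringWriter();
        List<String> commands = Arrays.asList("HELO client.example.com", "QUIT");
        for (String line : converse(in, out, commands)) {
            System.out.println(line);
        }
    }
}
```

To talk to a real server, the same method could be handed a reader and writer wrapped around the streams of a `java.net.Socket` connected to the mail host on port 25.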

As the book progresses, the author focuses on the HTTP (and HTTPS) protocols, showing how to access, parse and then use the contents of files (HTML pages, for example) distributed around the World Wide Web. This leads naturally to the development of the bots, aggregators and spiders mentioned in the book's title. What are they? A bot collects data from a web site on your behalf; an aggregator collects, combines and processes data from a number of different sites; and a spider carries out an autonomous trawl around the Web, following links from page to page.
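The spider idea in particular is easy to sketch. The following is my own illustration, not taken from the author's package; `TinySpider`, `extractLinks` and `crawl` are hypothetical names. It pulls `href` values out of a page with a simple regular expression and follows them breadth-first. It assumes double-quoted `href` attributes, does no resolution of relative URLs, and takes the page-fetching step as a function so that the demo can run against an in-memory "site" instead of the real Web.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TinySpider {

    // Matches the double-quoted href value of an <a ...> tag (a simplification of real HTML).
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    /** Returns the href values found in the page, in document order. */
    public static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    /**
     * Breadth-first crawl from start, visiting at most maxPages pages.
     * fetch maps a URL to its HTML, or null if the page is unavailable.
     */
    public static Set<String> crawl(String start, Function<String, String> fetch, int maxPages) {
        Set<String> visited = new LinkedHashSet<>(); // keeps visiting order
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty() && visited.size() < maxPages) {
            String url = queue.poll();
            if (!visited.add(url)) {
                continue; // already seen this page
            }
            String html = fetch.apply(url);
            if (html == null) {
                continue; // nothing to parse
            }
            for (String link : extractLinks(html)) {
                if (!visited.contains(link)) {
                    queue.add(link);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // A three-page in-memory site stands in for real HTTP fetches.
        Map<String, String> site = new HashMap<>();
        site.put("/a", "<a href=\"/b\">B</a> <a class=\"x\" href=\"/c\">C</a>");
        site.put("/b", "<a href=\"/a\">back</a>");
        site.put("/c", "no links here");
        System.out.println(crawl("/a", site::get, 10)); // prints [/a, /b, /c]
    }
}
```

In these terms, a bot would stop after fetching and parsing a single site, and an aggregator would apply the same fetch-and-parse step to a fixed list of sites and merge the results.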

As the book's title suggests, the author uses Java as his language of choice to demonstrate working applications. There are a couple of important points to be made here: first, you don't have to use Java to build bots, aggregators and spiders; and second, the CD that accompanies the book includes not only the full Java source for the applications in the main text but also the author's complete package for building applications of this type. On this latter point, each chapter of the main text includes a detailed explanation of the corresponding classes from the Java package, while there is a separate appendix dedicated to a description of the package as a whole.

All in all, I consider this to be an informative and well-written book. If you're interested in building applications to ferret around the Internet on your behalf, I can wholeheartedly recommend this one, even if, like me, you are not a Java guru!