REVIEW - Programming Spiders, Bots, and Aggregators in Java


Programming Spiders, Bots, and Aggregators in Java


Jeff Heaton



John Wiley and Sons (2002)




Mathew Davies


June 2003




Prior to reading this book, I knew very little about how e-mail works or the mechanics of browsing the World Wide Web. After reading the book, I feel fairly confident that I could build an application to cruise around the Web and gather information on my behalf.

I am no expert on networks; nor am I one of the Java cognoscenti. I was therefore relieved to find that the opening chapters of this book provide a basic introduction to how PCs communicate with remote servers. What's more, the author provides a series of small Java applications to illustrate the process. For example, the book's initial description of SMTP includes a Java application that lets you watch the conversation between your own machine and the server at the other end. Call me childish if you will, but I was hooked.
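To give a flavour of that exercise, here is a minimal sketch of an SMTP transcript logger. It is not the book's code: the class name `SmtpTranscript` and its methods are hypothetical. The client waits for the server's greeting, sends each command terminated by CRLF, and records every reply line. It treats a reply line of the form `250-...` as a continuation and one of the form `250 ...` as the last line of a reply. The streams are passed in, so the same code runs against canned text or a live connection.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: logs both sides of an SMTP conversation ("C:" = client, "S:" = server). */
final class SmtpTranscript {

    /** Reads one (possibly multi-line) SMTP reply and returns its three-digit code. */
    static int readReply(BufferedReader in, List<String> log) throws IOException {
        String line;
        do {
            line = in.readLine();
            if (line == null || line.length() < 3) {
                throw new IOException("Malformed or missing SMTP reply: " + line);
            }
            log.add("S: " + line);
            // "250-..." means more lines follow; "250 ..." ends the reply.
        } while (line.length() > 3 && line.charAt(3) == '-');
        return Integer.parseInt(line.substring(0, 3));
    }

    /** Reads the server greeting, then sends each command and collects the reply codes. */
    static List<Integer> converse(BufferedReader in, Writer out,
                                  List<String> commands, List<String> log) throws IOException {
        List<Integer> codes = new ArrayList<>();
        codes.add(readReply(in, log));      // the server speaks first (220 greeting)
        for (String cmd : commands) {
            log.add("C: " + cmd);
            out.write(cmd + "\r\n");        // SMTP command lines end with CRLF
            out.flush();
            codes.add(readReply(in, log));
        }
        return codes;
    }
}
```

To watch a real server, you could wrap the input and output streams of a `java.net.Socket` connected to the server's SMTP port (25) in a `BufferedReader` and a `Writer`. You would then pass commands such as `EHLO` and `QUIT` and print the log as it grows.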

As the book progresses, the author turns to the HTTP (and HTTPS) protocols, showing how to retrieve, parse and then use the contents of files (HTML pages, for example) distributed around the World Wide Web. This leads naturally to the development of the bots, aggregators and spiders mentioned in the book's title. What are they? A bot collects data from a web site on your behalf; an aggregator collects, combines and processes data for you from a number of different sites; and a spider carries out an autonomous trawl around the Web on your behalf.
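The spider idea can be sketched as a breadth-first traversal: fetch a page, pull out its links, and queue any URL not seen before, until a page limit is reached. The sketch below is not the author's package. `MiniSpider`, its methods and the user-agent string are hypothetical. Link extraction uses the JDK's own lenient HTML parser (`javax.swing.text.html.parser.ParserDelegator`). The fetcher is passed in as a function so the crawl logic can be exercised offline. `httpFetch` is one possible fetcher built on `HttpURLConnection`, and it assumes pages are UTF-8 encoded. The sketch ignores `robots.txt` and politeness delays, which a real spider should respect.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical minimal spider: breadth-first crawl over <a href> links. */
final class MiniSpider {

    /** Returns the href of every <a> tag, resolved against the page's own URL. */
    static List<String> extractLinks(String pageUrl, String html) throws IOException {
        List<String> links = new ArrayList<>();
        URI base = URI.create(pageUrl);
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                Object href = (tag == HTML.Tag.A) ? attrs.getAttribute(HTML.Attribute.HREF) : null;
                if (href != null) {
                    try {
                        links.add(base.resolve(href.toString()).toString());
                    } catch (IllegalArgumentException e) {
                        // skip malformed hrefs
                    }
                }
            }
        };
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return links;
    }

    /** Visits at most maxPages pages, breadth first; fetch returns null for unusable pages. */
    static List<String> crawl(String startUrl, int maxPages,
                              Function<String, String> fetch) throws IOException {
        List<String> visited = new ArrayList<>();
        Set<String> seen = new HashSet<>(List.of(startUrl));
        Deque<String> queue = new ArrayDeque<>(List.of(startUrl));
        while (!queue.isEmpty() && visited.size() < maxPages) {
            String url = queue.poll();
            String html = fetch.apply(url);
            if (html == null) continue;             // unreachable or non-HTML page
            visited.add(url);
            for (String link : extractLinks(url, html)) {
                if (seen.add(link)) queue.add(link);  // enqueue each URL only once
            }
        }
        return visited;
    }

    /** One possible fetcher: plain GET, HTML only, body assumed to be UTF-8. */
    static String httpFetch(String url) {
        if (!url.startsWith("http://") && !url.startsWith("https://")) return null;
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestProperty("User-Agent", "MiniSpider/0.1 (example)");
            String type = conn.getContentType();
            if (conn.getResponseCode() != 200 || type == null || !type.startsWith("text/html")) {
                return null;
            }
            try (InputStream in = conn.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException | IllegalArgumentException e) {
            return null;
        }
    }
}
```

A call such as `MiniSpider.crawl("https://example.com/", 20, MiniSpider::httpFetch)` would list up to 20 pages in the order they were reached. In the same spirit, a bot is roughly a single fetch-and-parse step. An aggregator runs that step against several sites and merges the results.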

As the book's title suggests, the author uses Java as his language of choice to demonstrate working applications. There are a couple of important points to be made here: first, you don't have to use Java to build bots, aggregators and spiders; and second, the CD that accompanies the book includes not only the full Java source for the applications in the main text but also the author's complete package for building applications of this type. On this latter point, each chapter of the main text includes a detailed explanation of the corresponding classes from the Java package, while there is a separate appendix dedicated to a description of the package as a whole.

All in all, I consider this to be an informative and well written book. If you're interested in building applications to ferret around the Internet on your behalf, I can wholeheartedly recommend this one; even if, like me, you are not a Java guru!

