REVIEW - Lucene in action


Lucene in action


Otis Gospodnetić, Erik Hatcher



Manning Publications (2005)




Derek Jones


February 2007



Lucene is an open source search engine written in Java (it requires that the pages to index and search have already been collected, e.g., by a web crawler such as Nutch). To be exact, Lucene is a search engine library containing classes and methods that programmers can call to create their own customized search engine.

This book is essentially a 'how-to' guide for creating a search engine using Lucene. I found it very readable, and the extensive code snippets were on the whole useful (i.e., not just padding).

The discussion starts with how to index web pages, the tradeoffs involved and how the Lucene options can be used to tune an index to have the desired characteristics. This is followed by a very interesting discussion of how to parse search queries, dealing with issues such as what constitutes a token and possible ways of handling different forms of the same root word (e.g., past/future tense, singular vs. plural). Subsequent chapters deal with more advanced topics, including extending the search engine, performance testing and parsing common document formats. The book ends with a discussion of various applications (written by people involved in implementing them). The thickness of the book is kept down by not duplicating the online documentation: there is no detailed listing of the API.
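To give a feel for the issues mentioned above (what counts as a token, and how different forms of the same root word can be folded together), here is a minimal sketch in plain Java. It does not use Lucene and is not taken from the book; the class `TinySearchSketch` and its methods are hypothetical names invented for illustration, and the suffix stripping is deliberately naive compared with a real stemmer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only: none of these names come from Lucene or the book.
class TinySearchSketch {

    // Inverted index: normalized term -> ids of the documents containing it.
    private final Map<String, TreeSet<Integer>> postings = new TreeMap<>();

    // Split text into lower-case tokens on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Very naive suffix stripping (an assumption for this sketch, not a real stemmer).
    static String stem(String token) {
        if (token.length() > 5 && token.endsWith("ing")) {
            return token.substring(0, token.length() - 3);
        }
        if (token.length() > 4 && token.endsWith("ed")) {
            return token.substring(0, token.length() - 2);
        }
        if (token.length() > 3 && token.endsWith("s") && !token.endsWith("ss")) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    // Index a document: every token is stemmed before being recorded.
    void add(int docId, String text) {
        for (String token : tokenize(text)) {
            postings.computeIfAbsent(stem(token), k -> new TreeSet<>()).add(docId);
        }
    }

    // Return the ids of documents containing every term of the query.
    // The query goes through the same tokenize/stem steps as the documents.
    TreeSet<Integer> search(String query) {
        TreeSet<Integer> result = null;
        for (String token : tokenize(query)) {
            TreeSet<Integer> docs = postings.getOrDefault(stem(token), new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        TinySearchSketch index = new TinySearchSketch();
        index.add(1, "Indexing web pages with a crawler");
        index.add(2, "The crawler indexed one page");
        index.add(3, "Parsing search queries");
        System.out.println(index.search("index pages")); // prints [1, 2]
    }
}
```

The point of the sketch is that documents and queries must pass through the same analysis steps: "Indexing", "indexed" and "index" only match each other because all three are reduced to the same term before lookup.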

If you are building a search engine using Lucene, this book is a must-have. Even if you don't plan to build your own search engine, this book provides a fascinating discussion of the nuts-and-bolts issues involved in creating one.

