REVIEW - Spoken Language Processing - A Guide to Theory, Algorithm, and System Development


Spoken Language Processing

A Guide to Theory, Algorithm, and System Development


Xuedong Huang, Alex Acero, Hsiao-Wuen Hon



Prentice Hall (2001)




Jamie Allsop


December 2001



If you work in speech processing, buy this book; you will not regret it.

Let me begin by highly recommending this book. It is destined to become a classic in its field. Here's why.

Spoken Language Processing is written by three well-known speech processing researchers: Xuedong Huang, Alex Acero and Hsiao-Wuen Hon. Current researchers should recognise at least one of these names. Currently working for Microsoft (don't let that put you off!) in an R&D capacity, they all hail from Carnegie Mellon University, which is well known for its contribution to speech processing research. The raison d'être for the book, in the authors' words, is to bridge the gap between industry gurus and newcomers to speech processing. Why is this necessary? The main reason is that there are very few books on speech processing available, and most information can only be found in the highly inaccessible form of research papers and conference proceedings, which often deliberately lack key information to prevent techniques being directly copied.

So what does this book cover? Quite a lot, as it happens. Weighing in at 980 pages, the book is a comprehensive overview of most of the major topics associated with speech processing. Divided into five main sections, the book is well structured with a clear division of concerns. The title, "Spoken Language Processing", may be misleading to some, as language processing topics account for only one section of the book. In this review I have opted to use the more general term speech processing.

The first two sections cover the fundamental theories that should be understood before embarking on an in-depth study of speech processing. This may seem an obvious approach, but many texts do not follow this pattern, which limits their use as reference tomes. Separating background theory from its use is also useful in that it allows a rigorous approach to its description. Too often texts give a hurried, imprecise overview of the theories used before launching into a long and complex application of the theory, losing the reader instantly in a quagmire of formulae.

The first two sections of the book, totalling approximately 360 pages, deal with background material whose key concepts the reader should at least understand. The first section concentrates on speech in general (including production and perception), probability and statistics, and pattern classification. The last two topics are both important parts of the book and are dealt with in their own chapters. Both are well written, with the right amount of explanation and background. Much of the remainder of the book expects at least some familiarity with the material presented here. These chapters, like all chapters in the book, finish with a section entitled "Historical Perspective and Further Reading", giving the reader just that. The inclusion of recommended further reading, in addition to the vast number of references appearing in each chapter, makes the book as a whole a very good starting point for any work in speech processing.

The second section concerns itself with the DSP topics that relate to speech processing. In this section the reader will find everything from FFTs to multi-rate signal processing, and from speech signal representations to speech coding. Again the section is well written, and the reader is not forced to refer to other texts to understand what is written. If a topic is not expanded upon here, that is an indication that it is not dealt with in any great depth in the remainder of the book.

The third section of the book covers speech recognition and is probably the section that most readers will use the most. This section alone is over 300 pages long and is thorough in its treatment of the subject. It starts immediately with a discussion of HMMs (Hidden Markov Models), which are almost exclusively the method employed in the pattern matching stage of speech recognition. Any algorithms that are mentioned are also detailed, which really makes the book useful. In fact, algorithms are presented throughout the book, making it a practical reference as much as a theoretical one. This is important because there is a big jump from understanding theory to having the know-how to implement an algorithm that exploits that theory. Other topics covered include an excellent chapter on environmental robustness, with one of the best discussions of microphones I have seen. Language modelling and search algorithms are given a thorough treatment. I would like to have seen more detailed information on front-end processing and endpoint detection, as this remains a critical stage of the recognition process. Perhaps the level of detail reflects the fact that this is currently a hot research topic with potential for significant advancement.

The penultimate section of the book covers speech synthesis. Speech synthesis is an important aspect of any dialogue-driven speech application. Much of the problem of text-to-speech synthesis results from the synthesiser's lack of understanding of what it is saying. Much of the research in this field is geared towards addressing this problem, and a comprehensive overview of the areas of interest is presented here.

The final section covers spoken language systems, ranging from a discussion of spoken language understanding to speech interfaces in real systems. The book finishes with a look at Microsoft's research prototype, MiPad. Using MiPad as a case study is a great way of putting the previous material of the book into context and giving the reader a glimpse of the many considerations involved in creating real recognition engines and applications.

In summary, this is a great book. The sub-title of the book is "A Guide to Theory, Algorithm, and System Development", and that, I think, sums up the content quite well. It is reminiscent of the famous "Discrete-Time Processing of Speech Signals" by Deller, Proakis and Hansen, and could almost be considered an up-to-date version of that classic tome, which covers similar material to that presented in this book, though with a different emphasis. If you work in speech processing, buy this book; you will not regret it. And if you don't already own Discrete-Time Processing of Speech Signals, buy that as well.

Book cover image courtesy of Open Library.
