REVIEW - Effective Awk Programming - Text Processing and Pattern Matching



Arnold Robbins



O'Reilly (2001)




Peter Tillier


December 2001



The book is very well written and full of useful information for the awk/gawk programmer

Some years ago I read 'sed and awk' by Dale Dougherty and became interested in these tools as an aid to text processing. Two years later I bought a copy of 'sed and awk, 2nd Edition' by Dale Dougherty and Arnold Robbins; this, combined with considerable use of sed and awk, confirmed my liking for both tools. In practice I have used awk more, and I have ported both it and sed to some platforms that have limited memory and disk space, where I use them in preference to perl, whose implementation is usually much larger.

'Effective awk Programming, 3rd Edition' (EaP3) is a very good extension to the awk section of the second of the 'sed and awk' books. EaP3 and its previous paper and electronic editions are effectively the manual for GNU awk (gawk), the FSF's version of awk. Arnold Robbins has been the principal maintainer of gawk for several years, so the information about gawk is definitive.

EaP3 discusses the features that are common to all awk implementations and highlights those that are different in gawk. The gawk discussion is based upon version 3.1.0 of gawk and so includes the recently added networking capabilities. These make gawk a much more powerful high-level language.

The first part, 'The awk Language and gawk', discusses awk's statements, variables, arrays and general usage. The explanation is clear and concise, and it clarifies a number of 'dark corners' of the language. There are plenty of useful examples that demonstrate the language's features and can be extended into practical awk programs. This part finishes with three chapters that cover major extensions only available in gawk, such as those that support internationalisation, inter-process communication and profiling.

The second part, 'Using awk and gawk', comprises three chapters of examples: awk functions, awk programs and gawk networking. The examples are working code that can form the basis of your own programs. The chapter about awk functions provides a library of these, which are then used in the working programs in the following chapter. Those programs show how many UNIX utilities, such as cut, egrep and tee, can be implemented in awk. The chapter ends with a shell script, igawk, which supports the production of large gawk programs by providing an @include facility that works in a similar way to the C preprocessor's #include. In the networking chapter, TCP, POP3 and HTTP service routines are described and then used to implement a simple web server and CGI script; the final example is a simple mobile agent that is designed to migrate through a network of servers.
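To give a flavour of what reimplementing a UNIX utility in awk involves, here is a minimal sketch of a cut-like field extractor. It is not the book's cut program, which is considerably more complete; the function name awk_cut and the fixed colon delimiter are assumptions made purely for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not from the book): print field N of each
# colon-separated input line, roughly what `cut -d: -fN` does.
awk_cut() {
    # -F: sets the field separator; -v passes the field number into awk,
    # where $n selects that field of the current record.
    awk -F: -v n="$1" '{ print $n }'
}

# Usage: extract the first field from two passwd-style lines.
printf 'root:x:0\nalice:x:1000\n' | awk_cut 1
# prints:
# root
# alice
```

The whole job is done by awk's automatic field splitting, which is why such utilities tend to be only a few lines long when written in awk.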

The last part of the book, 'Appendices', discusses the history of awk and gawk, how to compile and install gawk, how to extend gawk's functionality by adding extra built-in functions, basic programming concepts and the relevant GNU Licences.

The book is very well written and full of useful information for the awk/gawk programmer and so I rate it very highly. One question that many will ask is why pay $39.95 (I don't have the UK price to hand) for information that is free (the book's content is supplied in various forms with the gawk distribution)? My reasons in favour of buying the book are: a) as I am a frequent user of awk I find it useful to have a paper copy of the material that I can carry with me, and b) part of the profits from the book sales go to the FSF and Arnold Robbins and help to fund further development by the FSF - an endeavour that I wholeheartedly support. For those who can't afford to or who don't wish to buy the book, O'Reilly are providing a copy in DocBook form on their web site, and a Windows Help file version is part of the gawk 3.1.0 Win32 distribution.

My copy of 'sed and awk, 2nd ed.' is the third most heavily thumbed on my bookshelves; 'Algorithms + Data Structures = Programs' (Niklaus Wirth) and K&R I are first and second. I suspect that this book may well eventually move into third place, but there are very useful example programs in 'sed and awk, 2nd ed.', so the two complement one another well.

