Bootstrapped by Boost

By Thomas Guest

C++ the language has developed faster than C++ the library. So, whilst there are many ways to initialise an object, there’s no standard way to parse a command line, let alone to serve a website. Fortunately the Boost libraries exist to fill the gap, extending the range of functionality available to C++ programmers.

Boost is no panacea. Although its libraries are extensively reviewed and tested, they sometimes seem designed to show what can be done with C++ rather than what should be done. Some libraries could double up as compiler test suites, exposing toolchain performance and conformance issues. Using Boost means builds take longer and upgrades require attention.

As a result, the code bases I’ve worked on have either avoided or cherry-picked from Boost, often preferring to hand-roll functionality present in the libraries. Recently, though, I have been working on a code base which uses Boost unreservedly.

This code base is a search engine which uses natural language processing (NLP) techniques to locate information in unstructured medical narratives. It’s largely the work of a single programmer, and could not have been constructed without leaning on Boost for graph processing, parsing, memory-mapped files, logging, serialisation and exposing a Python API, as well as for its extended suites of containers, algorithms and utilities.

This case study explores in detail the use of Boost in this code base. Although the code is proprietary, I will present real code showing how the libraries fit in and what they do. To provide context, there’ll be an overview of some NLP and search techniques. The session will provide a practical introduction to several of the Boost libraries, and an honest reflection on my experience of using them.

The code examples used in the session can be found in this GitHub repository.