CPU Optimized Data Structures - More Fun with Assembler

By CB Bailey

Many modern CPUs provide an extremely rich set of instructions and features that enable highly specialized optimizations for particular use cases.

This talk takes an educational problem and investigates whether we can optimize the representation of our problem in a way that allows for a much higher performance solution than an "obvious" solution in a generic programming language, such as C++, might achieve.

Our example problem will be the evaluation of poker hands: looking for the optimal way to test for straights, flushes, and full houses using all the features of a reasonably modern x86-64 architecture CPU.
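To make the problem concrete, here is a minimal sketch of one bit-oriented representation in portable C++. It is not the representation the talk arrives at. The Card layout, the rank encoding (0 = two through 12 = ace), and all function names are assumptions made for illustration only.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical layout: rank 0 = two ... 12 = ace, suit 0..3.
struct Card { std::uint8_t rank; std::uint8_t suit; };
using Hand = std::array<Card, 5>;

// One bit per rank present in the hand (13 bits used).
inline std::uint32_t rank_mask(const Hand& h) {
    std::uint32_t m = 0;
    for (const Card& c : h) m |= 1u << c.rank;
    return m;
}

// Number of set bits, i.e. number of distinct ranks.
inline int distinct_ranks(std::uint32_t m) {
    int n = 0;
    for (; m != 0; m &= m - 1) ++n;  // clear the lowest set bit each step
    return n;
}

inline bool is_flush(const Hand& h) {
    for (const Card& c : h)
        if (c.suit != h[0].suit) return false;
    return true;
}

inline bool is_straight(const Hand& h) {
    const std::uint32_t m = rank_mask(h);
    if (m == 0x100Fu) return true;            // wheel: A,2,3,4,5
    const std::uint32_t lowest = m & (~m + 1u);  // isolate lowest set bit
    return m / lowest == 0x1Fu;               // five consecutive bits
}

inline bool is_full_house(const Hand& h) {
    // One 4-bit counter per rank, packed into 52 bits.
    std::uint64_t counts = 0;
    for (const Card& c : h) counts += std::uint64_t{1} << (4 * c.rank);
    // Bit 2 of any counter means that rank appears four times.
    const std::uint64_t fours = 0x4444444444444ULL;
    // Two distinct ranks in five cards is either 4+1 or 3+2.
    return distinct_ranks(rank_mask(h)) == 2 && (counts & fours) == 0;
}

// Small usage example.
inline void self_check() {
    const Hand wheel = {{ {12, 0}, {0, 1}, {1, 2}, {2, 3}, {3, 0} }};
    assert(is_straight(wheel) && !is_flush(wheel));
}
```

The point of the sketch is that once a hand is packed into a few machine words, each test reduces to a handful of bitwise operations, which is the kind of representation that wider CPU instructions can then exploit.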

We’ll take a brief tour of some of the available SIMD instructions, weighing their performance benefits against the cost of manipulating our data into a form where they can be used.

Finally, we will pose the question: is it possible to match the performance of our custom solution using "generic" C++ and an optimizing compiler?