Farewell to Disks: Efficient Processing of Obstinate Data

By Diomidis Spinellis

Questions whose answer requires sophisticated processing of huge data sets come up increasingly often in our networked and interlinked, and (increasingly) DNA-sequenced world. Attacking such problems with traditional techniques, such as loading data into memory for processing or querying a relational database, is cumbersome and inefficient. Data sizes are growing inexorably, while disk-based data structures and applications relying on them, optimized to handle sequential retrievals and relational joins, often prove inadequate for running complex algorithms. Therefore, sophisticated processing of huge complex data sets requires us to rethink the relationship between disk-based storage and main-memory processing.

Some features of modern systems, namely 64-bit architectures, memory mapped sparse files, virtual memory, and copy on write support, allow us to process our data with readable and efficient RAM-based algorithms, using slow disks and filesystems only for their large capacity and to secure the data's persistence. I demonstrate this approach through a series of C++ programs that run on Wikipedia's data looking for matching words and links between unrelated entries. Through these programs I will show how we can use STL containers, iterators, and algorithms to access disk-based data without performing any system calls. Although RAM-based processing opens up again many problems that database systems already solve, I will argue that such processing is the right move, because it provides us with a unified programming and performance model for all our data operations irrespective of where the data resides.





Your Privacy

By clicking "Accept Non-Essential Cookies" you agree ACCU can store non-essential cookies on your device and disclose information in accordance with our Privacy Policy and Cookie Policy.

Current Setting: Non-Essential Cookies REJECTED


By clicking "Include Third Party Content" you agree ACCU can forward your IP address to third-party sites (such as YouTube) to enhance the information presented on this site, and that third-party sites may store cookies on your device.

Current Setting: Third Party Content EXCLUDED



Settings can be changed at any time from the Cookie Policy page.