Andy Malakov software blog

Tuesday, August 5, 2008

Java NIO with large data files

We have a tool that sequentially reads financial data from massive multi-gigabyte files. It performs a lot of bytes-to-long, bytes-to-double, and utf-string conversions that were implemented in optimized fashion on top of BufferedInputStream.

It turns out that replacing that code with NIO's MappedByteBuffer processes 2 Gb file twice faster, while memory consumption remains the same.


Update: I have been warned and it turned out to be true: Reading 20 Gb file shows reverse results. Now NIO version is 8x slower than "old" stream-based approach. In this case I implemented paging because Java NIO cannot map file regions larger than 2Gb at once (Integer.MAX_VALUE).

On Windows XP 8K is a sweet spot for BufferedInputStream buffer size. Allocating larger buffer actually reduces performance significantly. May be size of mapped region has similar optimum.