CALL GLOP(): Java NIO with large data files

We have a tool that sequentially reads financial data from massive multi-gigabyte files. It performs a lot of bytes-to-long, bytes-to-double, and utf-string conversions that were implemented in optimized fashion on top of BufferedInputStream.

It turns out that replacing that code with NIO's MappedByteBuffer processes 2 Gb file twice faster, while memory consumption remains the same.

Update: I have been warned and it turned out to be true: Reading 20 Gb file shows reverse results. Now NIO version is 8x slower than "old" stream-based approach. In this case I implemented paging because Java NIO cannot map file regions larger than 2Gb at once (Integer.MAX_VALUE).

On Windows XP 8K is a sweet spot for BufferedInputStream buffer size. Allocating larger buffer actually reduces performance significantly. May be size of mapped region has similar optimum.

CALL GLOP()

Tuesday, August 5, 2008

Java NIO with large data files

Blog Archive