Reading large files in Java - virtual7 GmbH

Recently I ran into a challenging problem: reading from a large file (over 500,000 lines), starting at a random line and retrieving a chunk of configurable size. Keeping this operation fast is essential in a web application: if the server does not respond quickly enough when the client needs its data, the user may lose interest in the page because it feels slow.

Eventually I arrived at two solutions, both fairly fast. Through some testing I found that one of them is faster than the other. Both solutions are described below, followed by further details.

The first solution was to use the RandomAccessFile class from the java.io package. Instances of this class support both reading from and writing to a random access file, which behaves like a large array of bytes stored in the file system. Reading can start at any position given by the file pointer; each read consumes the requested number of bytes and advances the file pointer by that amount.
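A minimal sketch of this approach could look like the following. It is not the original snippet: the class name RandomAccessChunkReader and the method readChunk are hypothetical, and it assumes the caller already knows the byte offset at which the wanted line starts, because seek works on byte positions rather than line numbers.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
final class RandomAccessChunkReader {

    /**
     * Reads up to maxLines lines, starting at byteOffset.
     * Assumption: byteOffset points at the first byte of a line.
     */
    static List<String> readChunk(String path, long byteOffset, int maxLines) throws IOException {
        List<String> chunk = new ArrayList<>();
        // "r" opens the file read-only; try-with-resources closes it afterwards.
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            file.seek(byteOffset); // only moves the file pointer
            String line;
            // readLine() turns each byte into one char, so this is only
            // reliable for ASCII / ISO-8859-1 content.
            while (chunk.size() < maxLines && (line = file.readLine()) != null) {
                chunk.add(line);
            }
        }
        return chunk;
    }
}
```

If the lines do not have a fixed length, the byte offset of each line has to come from somewhere else, for example an index built in a single pass over the file.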

The second solution was to use the Files class from the java.nio.file package, which consists exclusively of static methods that operate on files. Among them is lines(Path path), which returns the lines of a file as a Stream. Notably, the contents of the file are read and processed lazily, so only a small portion of the file is held in memory at any given time, which is very important when working with large files.
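To illustrate the lazy behaviour, here is a small sketch that stops reading as soon as a matching line is found. The class LazyLineSearch and the method firstLineContaining are hypothetical names, and UTF-8 is an assumed encoding.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical helper, for illustration only.
final class LazyLineSearch {

    /** Returns the first line that contains keyword, if there is one. */
    static Optional<String> firstLineContaining(Path path, String keyword) throws IOException {
        // The stream holds an open file handle, so it must be closed;
        // try-with-resources takes care of that.
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            // findFirst() short-circuits: lines after the first match are not processed.
            return lines.filter(line -> line.contains(keyword)).findFirst();
        }
    }
}
```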

Streams do not provide a means to directly access or manipulate their elements. Instead, they declaratively describe their source and the computational operations that will be performed in aggregate on that source. The Stream interface from the java.util.stream package provides a skip(long n) method, which returns a stream consisting of the remaining elements of the original stream after discarding the first n elements.
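Combining Files.lines with skip gives a compact way to fetch a chunk that starts at a given line. The sketch below is one possible shape of it, not the original snippet: StreamChunkReader and readChunk are hypothetical names, and limit(long) is used here to cap the chunk size.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, for illustration only.
final class StreamChunkReader {

    /**
     * Returns up to chunkSize lines, starting at the zero-based line index startLine.
     */
    static List<String> readChunk(Path path, long startLine, int chunkSize) throws IOException {
        try (Stream<String> lines = Files.lines(path)) { // decodes as UTF-8 by default
            return lines
                    .skip(startLine)   // discard the first startLine lines
                    .limit(chunkSize)  // keep at most chunkSize lines
                    .collect(Collectors.toList());
        }
    }
}
```

For example, readChunk(path, 250_000, 100) would return lines 250,000 to 250,099 of the file, or fewer if the file ends earlier. Note that on a sequential stream the skipped lines are still read and discarded; they are simply never collected.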

The second solution was the faster one. RandomAccessFile's seek(long pos) method delegates to seek0(long pos), a native method. Native methods are typically used to reuse existing non-Java code, to communicate at machine or memory level, and to improve performance. Even so, it was slower than the java.util.stream skip(long n) method applied to the stream returned by java.nio.file.Files.lines().

Digging into this, I found that the functionality in java.io dates from before 2000, meaning it was the best that could be provided at the time. Later, J2SE 1.4 (codename: Merlin) brought major changes, among them library improvements such as NIO. NIO picks up where the original I/O leaves off, providing high-speed, block-oriented I/O in Java code. By defining classes to hold data, and by processing that data in blocks, NIO takes advantage of low-level optimizations in a way that the original I/O package could not, without using native code.
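As a rough illustration of what block-oriented means here, the sketch below reads a file through a FileChannel into a ByteBuffer, one block at a time, and counts the newline bytes. The class BlockReader, the method countNewlines and the blockSize parameter are hypothetical and only serve the example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, for illustration only.
final class BlockReader {

    /** Counts '\n' bytes by reading the file in blocks of blockSize bytes. */
    static long countNewlines(Path path, int blockSize) throws IOException {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        long count = 0;
        ByteBuffer buffer = ByteBuffer.allocate(blockSize); // the class that holds the data
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            while (channel.read(buffer) != -1) { // fill the buffer with the next block
                buffer.flip();                   // switch the buffer from writing to reading
                while (buffer.hasRemaining()) {
                    if (buffer.get() == '\n') {
                        count++;
                    }
                }
                buffer.clear();                  // make the buffer ready for the next block
            }
        }
        return count;
    }
}
```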

Another possible explanation for why NIO (which supports non-blocking operation) is faster than the original I/O (blocking) could be that blocking I/O waits until an operation on the data completes, whereas non-blocking I/O returns without waiting for the operation to complete.

The table below shows a series of paired requests for the two methods, run with the same parameters and variable values, to compare how much time elapsed until the response was received.
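Timings of this kind can be collected with a small harness along the following lines. This is a simplified sketch and not the setup used for the measurements: Stopwatch, IoTask and averageNanos are hypothetical names, and a few warm-up runs are included on the assumption that the first calls are distorted by JIT compilation and a cold file cache.

```java
import java.io.IOException;

// Hypothetical helper, for illustration only.
final class Stopwatch {

    /** A unit of work that may perform I/O. */
    @FunctionalInterface
    interface IoTask<T> {
        T run() throws IOException;
    }

    /**
     * Runs the task warmups times without timing it, then runs times with timing,
     * and returns the average duration of the timed runs in nanoseconds.
     */
    static <T> long averageNanos(IoTask<T> task, int warmups, int runs) throws IOException {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        for (int i = 0; i < warmups; i++) {
            task.run(); // result ignored, only warms up the JVM and the file cache
        }
        long total = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            total += System.nanoTime() - start;
        }
        return total / runs;
    }
}
```

A call such as Stopwatch.averageNanos(() -> Files.readAllLines(path), 3, 10) would time ten reads of a file after three warm-up reads; each of the two solutions can be passed in the same way. For more reliable numbers a dedicated benchmarking tool such as JMH is the safer choice.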

Posted by admin on 19 March 2020.