torch.async-hdf5-reader
20 Apr 2016
I decided to share this script, which I have been using to speed up the training of neural nets by having them gather data in parallel. I wrote some tests to check that its basic functions work correctly and commented the code a little.
Dependencies
It depends on the following libraries (they can be installed with luarocks):
- threads
- hdf5
- cutorch (will be optional in the future)
Usage
Given an HDF5 file containing a 4D dataset (e.g. num_examples, channels, height, width) and 2D label data (e.g. num_examples, labels), it provides a class for fetching mini-batches asynchronously.
Initializing the class copies the necessary information to the thread pool. After that, calling asyncReader:fetchData() makes a thread retrieve a batch from the database. The call is asynchronous, so other code can run while the batch is being prefetched.
When the data is needed, the blocking call asyncReader:getNextBatch() returns the data and label tensors. Memory is allocated once, at class initialization, so the returned tensors always reuse the same memory. The tensors are in fact duplicated, so one pair can be read and written while a thread fills the other.
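As a rough illustration of the fetch/get pattern with duplicated buffers, here is a minimal sketch written in Python rather than Lua. It is not the code of this script: the class DoubleBufferedReader, its methods fetch_data and get_next_batch, and the plain list standing in for the HDF5 dataset are all hypothetical names chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor


class DoubleBufferedReader:
    """Hypothetical stand-in for the reader; illustrates the pattern only."""

    def __init__(self, dataset, batch_size):
        self.dataset = dataset          # any indexable sequence (assumption)
        self.batch_size = batch_size
        # Two buffers allocated once: the worker fills one while the
        # caller is still using the other.
        self.buffers = [[None] * batch_size, [None] * batch_size]
        self.fill_index = 0             # buffer the worker writes into next
        self.position = 0               # read cursor into the dataset
        self.pending = None             # future of the batch being fetched
        self.pool = ThreadPoolExecutor(max_workers=1)

    def _fill(self, buffer, start):
        # Runs in the worker thread: copy one batch into the buffer,
        # wrapping around at the end of the dataset.
        n = len(self.dataset)
        for i in range(self.batch_size):
            buffer[i] = self.dataset[(start + i) % n]
        return buffer

    def fetch_data(self):
        # Non-blocking: schedule the next batch and return immediately.
        if self.pending is not None:
            raise RuntimeError("a batch is already being fetched")
        buffer = self.buffers[self.fill_index]
        self.pending = self.pool.submit(self._fill, buffer, self.position)
        self.position = (self.position + self.batch_size) % len(self.dataset)
        self.fill_index = 1 - self.fill_index

    def get_next_batch(self):
        # Blocking: wait until the prefetched batch is ready.
        if self.pending is None:
            raise RuntimeError("call fetch_data() first")
        batch = self.pending.result()
        self.pending = None
        return batch

    def close(self):
        self.pool.shutdown()


# Usage: prefetch the next batch while the current one is being used.
reader = DoubleBufferedReader(list(range(10)), batch_size=4)
reader.fetch_data()
first = reader.get_next_batch()
reader.fetch_data()                 # worker fills the other buffer
second = reader.get_next_batch()    # `first` is still intact here
reader.close()
```

Because there are only two buffers, a batch returned by get_next_batch is overwritten by the second fetch_data call made after it, so anything that must outlive that point has to be copied first.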
If you find any errors or have any questions, please let me know.