neuralgym.data

class neuralgym.data.Dataset

Bases: object

Base class for datasets.

Dataset members are logged automatically, except members whose names end with ‘_’, e.g. ‘self.fnamelists_’.

data_pipeline(batch_size)

Return a batch of data of the given batch size, e.g. return batch_image or return (batch_data, batch_label).

Parameters: batch_size (int) – Batch size.

maybe_download_and_extract()

Abstract method: download and extract the dataset's items if needed.

view_dataset_info()

Function to view current dataset information.
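
The logging rule for Dataset members (everything is logged unless its name ends with ‘_’) can be illustrated without neuralgym. The following is a minimal sketch of how such a filter might work, not neuralgym's actual implementation; loggable_members and ToyDataset are hypothetical names introduced only for this example.

```python
def loggable_members(obj):
    """Return instance attributes whose names do not end with '_'."""
    return {
        name: value
        for name, value in vars(obj).items()
        if not name.endswith('_')
    }


class ToyDataset(object):
    """Hypothetical stand-in for a Dataset subclass (not part of neuralgym)."""

    def __init__(self, fnamelists, shapes):
        self.shapes = shapes            # short value, reasonable to log
        self.fnamelists_ = fnamelists   # trailing '_' keeps the long list out of the log


toy = ToyDataset(['img001.png', 'img002.png'], [256, 256, 3])
logged = loggable_members(toy)
```

Under this assumption, only shapes appears in logged, which is why long members such as a filename list are a natural fit for the trailing-underscore convention.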

class neuralgym.data.DataFromFNames(fnamelists, shapes, random=False, random_crop=False, fn_preprocess=None, dtypes=tf.float32, enqueue_size=32, queue_size=256, nthreads=16, return_fnames=False, filetype='image')

Bases: neuralgym.data.dataset.Dataset

Data pipeline from list of filenames.

Parameters:
  • fnamelists (list) – A list of filenames or a list of tuples of filenames, e.g. [‘image_001.png’, …] or [(‘pair_image_001_0.png’, ‘pair_image_001_1.png’), …].
  • shapes (tuple) – Shapes of data, e.g. [256, 256, 3] or [[256, 256, 3], [1]].
  • random (bool) – Read from fnamelists in random order (defaults to False).
  • random_crop (bool) – If True, randomly crop raw images to the shape; otherwise resize raw images directly to the shape.
  • dtypes (tf.Type) – Data types, defaults to tf.float32.
  • enqueue_size (int) – Enqueue size for pipeline.
  • queue_size (int) – Queue size for pipeline.
  • nthreads (int) – Number of parallel threads for reading data.
  • return_fnames (bool) – If True, data_pipeline will also return fnames (last tensor).
  • filetype (str) – Currently only ‘image’ is supported.
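
To make the random_crop option concrete, the sketch below picks a random crop window of the target shape inside a raw image. It is an illustration under stated assumptions, not neuralgym's implementation: random_crop_box is a hypothetical helper, and raising an error when the raw image is smaller than the target is a choice made here, since the source does not say how that case is handled.

```python
import random


def random_crop_box(raw_size, out_size, rng=random):
    """Pick a random (top, left, bottom, right) window of out_size inside raw_size.

    raw_size and out_size are (height, width) pairs. Hypothetical helper,
    shown only to illustrate what a random crop to a fixed shape means.
    """
    raw_h, raw_w = raw_size
    out_h, out_w = out_size
    if out_h > raw_h or out_w > raw_w:
        # Assumption: refuse to crop when the raw image is too small.
        raise ValueError('target shape is larger than the raw image')
    top = rng.randint(0, raw_h - out_h)    # randint is inclusive on both ends
    left = rng.randint(0, raw_w - out_w)
    return top, left, top + out_h, left + out_w


rng = random.Random(0)  # seeded generator so the demo is repeatable
box = random_crop_box((480, 640), (256, 256), rng)
```

With random_crop=False the whole raw image would instead be resized to the target shape, so no window needs to be chosen.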

Examples

>>> import tensorflow as tf
>>> import neuralgym as ng
>>> fnames = ['img001.png', 'img002.png', ..., 'img999.png']
>>> data = ng.data.DataFromFNames(fnames, [256, 256, 3])
>>> images = data.data_pipeline(128)
>>> sess = tf.Session(config=tf.ConfigProto())
>>> tf.train.start_queue_runners(sess)
>>> for i in range(5): sess.run(images)

To get the file list, you can either read it from a file:

with open('data/images.flist') as f:
    fnames = f.read().splitlines()

or glob:

import glob
fnames = glob.glob('data/*.png')
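
The two approaches can be combined: glob once, save the result as a file list, and read that file in later runs. The sketch below assumes the one-filename-per-line format implied by the splitlines() call above; write_flist and read_flist are hypothetical helper names, not part of neuralgym. The paths are sorted because glob.glob does not guarantee any particular order.

```python
import glob
import os
import tempfile


def write_flist(pattern, flist_path):
    """Write the sorted paths matching `pattern` to `flist_path`, one per line."""
    fnames = sorted(glob.glob(pattern))
    with open(flist_path, 'w') as f:
        f.write('\n'.join(fnames))
    return fnames


def read_flist(flist_path):
    """Read a file list back into a Python list of filenames."""
    with open(flist_path) as f:
        return f.read().splitlines()


# Demo in a temporary directory with empty placeholder files.
tmp_dir = tempfile.mkdtemp()
for name in ('b.png', 'a.png', 'notes.txt'):
    open(os.path.join(tmp_dir, name), 'w').close()

flist_path = os.path.join(tmp_dir, 'images.flist')
written = write_flist(os.path.join(tmp_dir, '*.png'), flist_path)
fnames = read_flist(flist_path)
```

The resulting fnames list can then be passed as the fnamelists argument.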

data_pipeline(batch_size)

Batch data pipeline.

Parameters: batch_size (int) – Batch size.
Returns:
A tensor, or a list of tensors, each with the batch dimension [batch_size] prepended to the corresponding entry of self.shapes,
e.g. if self.shapes = ([256, 256, 3], [1]), the returned shapes are [[batch_size, 256, 256, 3], [batch_size, 1]].
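
The relationship between self.shapes and the returned batch shapes can be sketched in plain Python. batched_shapes is a hypothetical helper used only to show the arithmetic; it does not build tensors and is not part of neuralgym.

```python
def batched_shapes(shapes, batch_size):
    """Prepend batch_size to a single shape or to each shape in a sequence."""
    if isinstance(shapes[0], int):
        # A single shape such as [256, 256, 3].
        return [batch_size] + list(shapes)
    # Several shapes such as ([256, 256, 3], [1]).
    return [[batch_size] + list(shape) for shape in shapes]


single = batched_shapes([256, 256, 3], 128)
multi = batched_shapes(([256, 256, 3], [1]), 128)
```

Here single is [128, 256, 256, 3] and multi is [[128, 256, 256, 3], [128, 1]], matching the example in the Returns description.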