DataLoader

class DataLoader(dataset, sampler=None, transform=None, collator=None, num_workers=0, timeout=0, preload=False, parallel_stream=False)[source]

Data loader. Combines a dataset and a sampler, and provides a convenient way to iterate over a given dataset. The process is as follows:

Dataset.__len__ and batch_size --(Sampler)--> Indices --(Dataset.__getitem__)--> Samples --(Transform + Collator)--> mini-batch

See Use Data to build the input pipeline for more details.
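The following framework-free sketch mirrors this pipeline for a single mini-batch; ToyDataset and the inline sampler/transform/collate steps are illustrative stand-ins, not part of the DataLoader API:

>>> import numpy as np
>>> class ToyDataset:
...     def __init__(self):
...         self.data = np.arange(12, dtype="float32").reshape(6, 2)
...     def __len__(self):
...         return len(self.data)
...     def __getitem__(self, idx):
...         return self.data[idx]
>>> dataset = ToyDataset()
>>> indices = [0, 1, 2]                        # Sampler: choose indices for one batch
>>> samples = [dataset[i] for i in indices]    # Dataset.__getitem__: fetch the samples
>>> samples = [s * 2.0 for s in samples]       # Transform: per-sample processing
>>> batch = np.stack(samples)                  # Collator: merge samples into a mini-batch
>>> batch.shape
(3, 2)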

Parameters
  • dataset (Dataset) – dataset from which to load the minibatch.

  • sampler (Optional[Sampler]) – defines the strategy to sample data from the dataset (see the sketch after this parameter list). If None, data is sampled from the dataset sequentially, one item at a time.

  • transform (Optional[Transform]) – defines the transform strategy applied to a sampled batch.

  • collator (Optional[Collator]) – defines the merging strategy for a transformed batch.

  • num_workers (int) – the number of sub-processes used to load, transform and collate the batch. 0 means using a single process. Default: 0

  • timeout (int) – if positive, the timeout in seconds for collecting a batch from the workers. Default: 0

  • preload (bool) – whether to enable the preloading strategy of the dataloader. When enabled, the dataloader preloads one batch to device memory to speed up the whole training process.

  • parallel_stream (bool) – whether to split the workload across all workers when the dataset is a StreamDataset and num_workers > 0. When enabled, each worker collects data from a different part of the stream in order to speed up the whole loading process. See the streamdataset-example section for more details.
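A minimal sketch of how the optional arguments fit together, assuming a user-defined CustomDataset that yields (image, label) pairs; RandomSampler, Compose, Normalize and ToMode come from megengine.data and megengine.data.transform, and the concrete values here are purely illustrative:

>>> import megengine.data as data
>>> import megengine.data.transform as T
>>> dataset = CustomDataset()                            # user-defined map-style dataset
>>> sampler = data.RandomSampler(dataset, batch_size=64)
>>> transform = T.Compose([
...     T.Normalize(mean=128.0, std=64.0),               # per-sample preprocessing
...     T.ToMode("CHW"),                                 # HWC -> CHW layout
... ])
>>> dataloader = data.DataLoader(
...     dataset,
...     sampler=sampler,
...     transform=transform,                             # default collator merges the transformed samples
...     num_workers=4,                                   # load/transform/collate in 4 sub-processes
... )
>>> for batch_image, batch_label in dataloader:
...     pass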

Examples

>>> import megengine.data as data
>>> dataset = CustomDataset()              # a user-defined Dataset
>>> dataloader = data.DataLoader(dataset)
>>> for batch_data in dataloader:
...     print(batch_data.shape)

The effect of enabling preload

  • All elements in maps, lists, and tuples will be converted to Tensor by preloading, so you will get Tensor instead of the original NumPy array or Python built-in data structure (see the sketch below).

  • Tensors’ host2device copy and device kernel execution will be overlapped, which improves training speed at the cost of higher device memory usage (due to one extra batch of data kept in device memory). This feature saves more time when your NN training time is short or when your machine’s host PCIe bandwidth per device is low.
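A quick way to observe the first point above is to compare the element types yielded with and without preloading. This is an illustrative sketch: CustomDataset is assumed to be a user-defined dataset yielding (image, label) pairs as NumPy arrays.

>>> import megengine.data as data
>>> dataset = CustomDataset()
>>> plain = data.DataLoader(dataset)
>>> preloaded = data.DataLoader(dataset, preload=True)
>>> batch = next(iter(plain))
>>> type(batch[0])                     # numpy.ndarray
>>> batch = next(iter(preloaded))
>>> type(batch[0])                     # megengine.Tensor, already on device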