Dataset
- class Dataset
- An abstract base class for all map-style datasets.

  Abstract methods
  All subclasses should override these two methods:
  - __getitem__(): fetch a data sample for a given key.
  - __len__(): return the size of the dataset.
  Both methods are used by the data pipeline, as described below.

  Dataset in the Data Pipeline
  Usually a dataset works with DataLoader, Sampler, Collator and other components.
  For example, the sampler generates the indexes of each batch in advance according to the size of the dataset (by calling __len__). When the dataloader needs to yield a batch of data, it passes those indexes to the __getitem__ method and then collates the fetched samples into a batch.
  - Reading "Use Dataset to define a data set" is highly recommended for more details.
  - It may also be helpful to read the implementation of MNIST, CIFAR10 and other existing subclasses.
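The interplay described above can be sketched in plain Python. This is a minimal illustration, not MegEngine's implementation: SquaresDataset, batch_indices, collate and iterate_batches are hypothetical names that stand in for a Dataset subclass, a Sampler, a Collator and a DataLoader. In real code the dataset class would subclass Dataset; a plain class is used here so the sketch runs without MegEngine installed.

```python
import random


class SquaresDataset:
    """Hypothetical map-style dataset: sample i is the pair (i, i * i)."""

    def __init__(self, size):
        self._size = size

    def __getitem__(self, index):
        # Fetch one sample for the given key.
        if not 0 <= index < self._size:
            raise IndexError(index)
        return index, index * index

    def __len__(self):
        # Report the dataset size so batches can be planned in advance.
        return self._size


def batch_indices(dataset_len, batch_size, shuffle=False, seed=0):
    """Stand-in for a sampler: plan every batch of indexes up front."""
    indices = list(range(dataset_len))
    if shuffle:
        random.Random(seed).shuffle(indices)
    return [indices[i:i + batch_size] for i in range(0, dataset_len, batch_size)]


def collate(samples):
    """Stand-in for a collator: split (data, label) pairs into two lists."""
    data, labels = zip(*samples)
    return list(data), list(labels)


def iterate_batches(dataset, batch_size):
    """Stand-in for a data loader: fetch samples by index, then collate."""
    for batch in batch_indices(len(dataset), batch_size):
        yield collate([dataset[i] for i in batch])


batches = list(iterate_batches(SquaresDataset(5), batch_size=2))
for data, labels in batches:
    print(data, labels)
```

Note that the stand-in loader only ever touches the dataset through len(dataset) and dataset[i], which is why those two methods are the ones every subclass must provide.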
  Warning
  By default, all elements in a dataset are numpy.ndarray. This means that if you want to perform Tensor operations, it is better to do the conversion explicitly, for example:

      import megengine
      from megengine import Tensor

      dataset = MyCustomDataset()  # a subclass of Dataset
      data, label = dataset[0]  # equivalent to dataset.__getitem__(0)
      data = Tensor(data, dtype="float32")  # convert to a MegEngine Tensor explicitly
      megengine.functional.ops(data)  # "ops" stands for any functional operator

  Applying Tensor ops directly to an ndarray is undefined behavior.