
Vision DataModules

The following are pre-built datamodules for computer vision.


Supervised learning

These are standard vision datasets with the train, val, and test splits pre-generated as DataLoaders, using each dataset's standard transforms (and normalization) values.

BinaryMNIST

class pl_bolts.datamodules.binary_mnist_datamodule.BinaryMNISTDataModule(data_dir, val_split=5000, num_workers=16, normalize=False, seed=42, *args, **kwargs)[source]

Bases: pytorch_lightning.LightningDataModule

MNIST
Specs:
  • 10 classes (1 per digit)

  • Each image is (1 x 28 x 28)

Binary MNIST, train, val, test splits and transforms

Transforms:

mnist_transforms = transform_lib.Compose([
    transform_lib.ToTensor()
])

Example:

from pl_bolts.datamodules import BinaryMNISTDataModule

dm = BinaryMNISTDataModule('.')
model = LitModel()

Trainer().fit(model, dm)
Parameters
  • data_dir (str) – where to save/load the data

  • val_split (int) – how many of the training images to use for the validation split

  • num_workers (int) – how many workers to use for loading data

  • normalize (bool) – if True, applies image normalization
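
The val_split parameter simply carves the validation set out of the training data. A minimal sketch of the arithmetic, assuming the standard 60,000-image MNIST training set (the helper name is illustrative, not part of the pl_bolts API):

```python
def split_sizes(num_train_images: int, val_split: int) -> tuple:
    """Return (train_size, val_size) after holding out val_split images."""
    if val_split > num_train_images:
        raise ValueError("val_split cannot exceed the number of training images")
    return num_train_images - val_split, val_split

# MNIST ships 60,000 training images; the default val_split=5000
# leaves 55,000 for training and 5,000 for validation.
train_size, val_size = split_sizes(60_000, 5_000)
print(train_size, val_size)  # 55000 5000
```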

prepare_data()[source]

Saves MNIST files to data_dir

test_dataloader(batch_size=32, transforms=None)[source]

MNIST test set uses the test split

Parameters
  • batch_size – size of batch

  • transforms – custom transforms

train_dataloader(batch_size=32, transforms=None)[source]

MNIST train set removes a subset to use for validation

Parameters
  • batch_size – size of batch

  • transforms – custom transforms

val_dataloader(batch_size=32, transforms=None)[source]

MNIST val set uses a subset of the training set for validation

Parameters
  • batch_size – size of batch

  • transforms – custom transforms

property num_classes[source]

Return: 10

CityScapes

class pl_bolts.datamodules.cityscapes_datamodule.CityscapesDataModule(data_dir, val_split=5000, num_workers=16, batch_size=32, seed=42, *args, **kwargs)[source]

Bases: pytorch_lightning.LightningDataModule

Cityscapes

Standard Cityscapes, train, val, test splits and transforms

Specs:
  • 30 classes (road, person, sidewalk, etc…)

  • (image, target) - image dims: (3 x 1024 x 2048), target dims: (1024 x 2048)

Transforms:

transforms = transform_lib.Compose([
    transform_lib.ToTensor(),
    transform_lib.Normalize(
        mean=[0.28689554, 0.32513303, 0.28389177],
        std=[0.18696375, 0.19017339, 0.18720214]
    )
])

Example:

from pl_bolts.datamodules import CityscapesDataModule

dm = CityscapesDataModule(PATH)
model = LitModel()

Trainer().fit(model, dm)

Or you can set your own transforms

Example:

dm.train_transforms = ...
dm.test_transforms = ...
dm.val_transforms  = ...
Parameters
  • data_dir – where to save/load the data

  • val_split – how many of the training images to use for the validation split

  • num_workers – how many workers to use for loading data

  • batch_size – number of examples per training/eval step

prepare_data()[source]

Saves Cityscapes files to data_dir

test_dataloader()[source]

Cityscapes test set uses the test split

train_dataloader()[source]

Cityscapes train set with removed subset to use for validation

val_dataloader()[source]

Cityscapes val set uses a subset of the training set for validation

property num_classes[source]

Return: 30

CIFAR-10

class pl_bolts.datamodules.cifar10_datamodule.CIFAR10DataModule(data_dir=None, val_split=5000, num_workers=16, batch_size=32, seed=42, *args, **kwargs)[source]

Bases: pytorch_lightning.LightningDataModule

CIFAR-10
Specs:
  • 10 classes (1 per category)

  • Each image is (3 x 32 x 32)

Standard CIFAR10, train, val, test splits and transforms

Transforms:

cifar10_transforms = transform_lib.Compose([
    transform_lib.ToTensor(),
    transform_lib.Normalize(
        mean=[x / 255.0 for x in [125.3, 123.0, 113.9]],
        std=[x / 255.0 for x in [63.0, 62.1, 66.7]]
    )
])

Example:

from pl_bolts.datamodules import CIFAR10DataModule

dm = CIFAR10DataModule(PATH)
model = LitModel()

Trainer().fit(model, dm)

Or you can set your own transforms

Example:

dm.train_transforms = ...
dm.test_transforms = ...
dm.val_transforms  = ...
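
A Compose transform is just a chain of callables applied in order, so any callable pipeline can be assigned to these attributes. A minimal pure-Python sketch of the chaining idea (MyCompose is a hypothetical stand-in, not the torchvision class):

```python
class MyCompose:
    """Chain transforms: each callable receives the previous one's output."""

    def __init__(self, transforms):
        self.transforms = list(transforms)

    def __call__(self, x):
        for t in self.transforms:
            x = t(x)
        return x

# Toy "transforms" acting on a plain number instead of an image tensor:
# scale to [0, 1], then normalize with mean 0.5 and std 0.5.
pipeline = MyCompose([lambda x: x / 255.0, lambda x: (x - 0.5) / 0.5])
print(pipeline(255))  # 1.0
```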
Parameters
  • data_dir (Optional[str]) – where to save/load the data

  • val_split (int) – how many of the training images to use for the validation split

  • num_workers (int) – how many workers to use for loading data

  • batch_size (int) – number of examples per training/eval step

prepare_data()[source]

Saves CIFAR10 files to data_dir

test_dataloader()[source]

CIFAR10 test set uses the test split

train_dataloader()[source]

CIFAR train set removes a subset to use for validation

val_dataloader()[source]

CIFAR10 val set uses a subset of the training set for validation

property num_classes[source]

Return: 10

FashionMNIST

class pl_bolts.datamodules.fashion_mnist_datamodule.FashionMNISTDataModule(data_dir, val_split=5000, num_workers=16, seed=42, *args, **kwargs)[source]

Bases: pytorch_lightning.LightningDataModule

Fashion MNIST
Specs:
  • 10 classes (1 per type)

  • Each image is (1 x 28 x 28)

Standard FashionMNIST, train, val, test splits and transforms

Transforms:

fashion_mnist_transforms = transform_lib.Compose([
    transform_lib.ToTensor()
])

Example:

from pl_bolts.datamodules import FashionMNISTDataModule

dm = FashionMNISTDataModule('.')
model = LitModel()

Trainer().fit(model, dm)
Parameters
  • data_dir (str) – where to save/load the data

  • val_split (int) – how many of the training images to use for the validation split

  • num_workers (int) – how many workers to use for loading data

prepare_data()[source]

Saves FashionMNIST files to data_dir

test_dataloader(batch_size=32, transforms=None)[source]

FashionMNIST test set uses the test split

Parameters
  • batch_size – size of batch

  • transforms – custom transforms

train_dataloader(batch_size=32, transforms=None)[source]

FashionMNIST train set removes a subset to use for validation

Parameters
  • batch_size – size of batch

  • transforms – custom transforms

val_dataloader(batch_size=32, transforms=None)[source]

FashionMNIST val set uses a subset of the training set for validation

Parameters
  • batch_size – size of batch

  • transforms – custom transforms

property num_classes[source]

Return: 10

Imagenet

class pl_bolts.datamodules.imagenet_datamodule.ImagenetDataModule(data_dir, meta_dir=None, num_imgs_per_val_class=50, image_size=224, num_workers=16, batch_size=32, *args, **kwargs)[source]

Bases: pytorch_lightning.LightningDataModule

Imagenet
Specs:
  • 1000 classes

  • Each image is (3 x varies x varies) (here we default to 3 x 224 x 224)

Imagenet train, val and test dataloaders.

The train set is the imagenet train.

The val set is taken from the train set with num_imgs_per_val_class images per class. For example if num_imgs_per_val_class=2 then there will be 2,000 images in the validation set.

The test set is the official imagenet validation set.
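
The validation-set size is just num_imgs_per_val_class times the number of classes. A quick sketch of that arithmetic (the function name is illustrative, not part of the pl_bolts API):

```python
NUM_IMAGENET_CLASSES = 1000

def val_set_size(num_imgs_per_val_class: int,
                 num_classes: int = NUM_IMAGENET_CLASSES) -> int:
    """Number of images held out of the train split for validation."""
    return num_imgs_per_val_class * num_classes

print(val_set_size(2))   # 2000
print(val_set_size(50))  # 50000, the default
```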

Example:

from pl_bolts.datamodules import ImagenetDataModule

dm = ImagenetDataModule(IMAGENET_PATH)
model = LitModel()

Trainer().fit(model, dm)
Parameters
  • data_dir (str) – path to the imagenet dataset file

  • meta_dir (Optional[str]) – path to meta.bin file

  • num_imgs_per_val_class (int) – how many images per class for the validation set

  • image_size (int) – final image size

  • num_workers (int) – how many data workers

  • batch_size (int) – batch_size

prepare_data()[source]

This method assumes you have already downloaded imagenet2012. It validates the data using meta.bin.

Warning

Please download imagenet on your own first.

test_dataloader()[source]

Uses the validation split of imagenet2012 for testing

train_dataloader()[source]

Uses the train split of imagenet2012 and sets aside a portion of it for the validation split

train_transform()[source]

The standard imagenet transforms

transform_lib.Compose([
    transform_lib.RandomResizedCrop(self.image_size),
    transform_lib.RandomHorizontalFlip(),
    transform_lib.ToTensor(),
    transform_lib.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    ),
])
val_dataloader()[source]

Uses the portion of the imagenet2012 train split that was held out for validation via num_imgs_per_val_class

Parameters
  • batch_size – the batch size

  • transforms – the transforms

val_transform()[source]

The standard imagenet transforms for validation

transform_lib.Compose([
    transform_lib.Resize(self.image_size + 32),
    transform_lib.CenterCrop(self.image_size),
    transform_lib.ToTensor(),
    transform_lib.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    ),
])
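
The validation pipeline resizes to image_size + 32 and then center-crops back to image_size (256 → 224 with the default). A sketch of the center-crop offset arithmetic (the helper is illustrative, not the torchvision implementation):

```python
def center_crop_box(resized: int, crop: int) -> tuple:
    """Top-left offset and bottom-right extent of a centered crop."""
    offset = (resized - crop) // 2
    return offset, offset + crop

# Default: resize to 224 + 32 = 256, then crop 224 from the center.
print(center_crop_box(256, 224))  # (16, 240)
```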
property num_classes[source]

Return:

1000

MNIST

class pl_bolts.datamodules.mnist_datamodule.MNISTDataModule(data_dir='./', val_split=5000, num_workers=16, normalize=False, seed=42, batch_size=32, *args, **kwargs)[source]

Bases: pytorch_lightning.LightningDataModule

MNIST
Specs:
  • 10 classes (1 per digit)

  • Each image is (1 x 28 x 28)

Standard MNIST, train, val, test splits and transforms

Transforms:

mnist_transforms = transform_lib.Compose([
    transform_lib.ToTensor()
])

Example:

from pl_bolts.datamodules import MNISTDataModule

dm = MNISTDataModule('.')
model = LitModel()

Trainer().fit(model, dm)
Parameters
  • data_dir (str) – where to save/load the data

  • val_split (int) – how many of the training images to use for the validation split

  • num_workers (int) – how many workers to use for loading data

  • normalize (bool) – if True, applies image normalization

prepare_data()[source]

Saves MNIST files to data_dir

test_dataloader(batch_size=32, transforms=None)[source]

MNIST test set uses the test split

Parameters
  • batch_size – size of batch

  • transforms – custom transforms

train_dataloader(batch_size=32, transforms=None)[source]

MNIST train set removes a subset to use for validation

Parameters
  • batch_size – size of batch

  • transforms – custom transforms

val_dataloader(batch_size=32, transforms=None)[source]

MNIST val set uses a subset of the training set for validation

Parameters
  • batch_size – size of batch

  • transforms – custom transforms

property num_classes[source]

Return: 10


Semi-supervised learning

The following datasets have support for unlabeled training and semi-supervised learning where only a few examples are labeled.

Imagenet (ssl)

class pl_bolts.datamodules.ssl_imagenet_datamodule.SSLImagenetDataModule(data_dir, meta_dir=None, num_workers=16, *args, **kwargs)[source]

Bases: pytorch_lightning.LightningDataModule

STL-10

class pl_bolts.datamodules.stl10_datamodule.STL10DataModule(data_dir=None, unlabeled_val_split=5000, train_val_split=500, num_workers=16, batch_size=32, seed=42, *args, **kwargs)[source]

Bases: pytorch_lightning.LightningDataModule

STL-10
Specs:
  • 10 classes (1 per type)

  • Each image is (3 x 96 x 96)

Standard STL-10, train, val, test splits and transforms. STL-10 supports creating the validation split from either the labeled or the unlabeled split.

Transforms:

stl10_transforms = transform_lib.Compose([
    transform_lib.ToTensor(),
    transform_lib.Normalize(
        mean=(0.43, 0.42, 0.39),
        std=(0.27, 0.26, 0.27)
    )
])

Example:

from pl_bolts.datamodules import STL10DataModule

dm = STL10DataModule(PATH)
model = LitModel()

Trainer().fit(model, dm)
Parameters
  • data_dir (Optional[str]) – where to save/load the data

  • unlabeled_val_split (int) – how many images from the unlabeled training split to use for validation

  • train_val_split (int) – how many images from the labeled training split to use for validation

  • num_workers (int) – how many workers to use for loading data

  • batch_size (int) – the batch size

prepare_data()[source]

Downloads the unlabeled, train and test splits

test_dataloader()[source]

Loads the test split of STL10

Parameters
  • batch_size – the batch size

  • transforms – the transforms

train_dataloader()[source]

Loads the ‘unlabeled’ split minus a portion set aside for validation via unlabeled_val_split.

train_dataloader_mixed()[source]

Loads a portion of the ‘unlabeled’ training data and the ‘train’ (labeled) data. Both portions have a subset removed for validation via unlabeled_val_split and train_val_split

Parameters
  • batch_size – the batch size

  • transforms – a sequence of transforms

val_dataloader()[source]

Loads the portion of the ‘unlabeled’ training data set aside for validation (the unlabeled_val_split images held out of the unlabeled split)

Parameters
  • batch_size – the batch size

  • transforms – a sequence of transforms

val_dataloader_mixed()[source]

Loads a portion of the ‘unlabeled’ training data set aside for validation along with the portion of the ‘train’ dataset to be used for validation

unlabeled_val = (the unlabeled_val_split images held out of the unlabeled split)

labeled_val = (the train_val_split images held out of the labeled ‘train’ split)

full_val = unlabeled_val + labeled_val

Parameters
  • batch_size – the batch size

  • transforms – a sequence of transforms
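
With the default splits, the full_val formula above works out as follows. A sketch under the assumption that STL-10 ships 100,000 unlabeled and 5,000 labeled training images (the function name is illustrative, not part of the pl_bolts API):

```python
UNLABELED_SIZE = 100_000      # STL-10 unlabeled split
LABELED_TRAIN_SIZE = 5_000    # STL-10 labeled train split

def mixed_val_size(unlabeled_val_split: int = 5_000,
                   train_val_split: int = 500) -> int:
    """full_val = unlabeled_val + labeled_val (images held out of each split)."""
    assert unlabeled_val_split <= UNLABELED_SIZE
    assert train_val_split <= LABELED_TRAIN_SIZE
    return unlabeled_val_split + train_val_split

print(mixed_val_size())  # 5500 with the datamodule defaults
```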

Read the Docs v: 0.2.5