pl_bolts.datamodules.lightning_datamodule module

class pl_bolts.datamodules.lightning_datamodule.LightningDataModule(train_transforms=None, val_transforms=None, test_transforms=None)

Bases: object

A DataModule standardizes the training, validation, and test splits, the data preparation, and the transforms. The main advantage is that splits and transforms stay consistent across models.

Example:

class MyDataModule(LightningDataModule):

    def __init__(self):
        super().__init__()

    def prepare_data(self):
        # download, split, etc...
        pass

    def train_dataloader(self):
        train_split = Dataset(...)
        return DataLoader(train_split)

    def val_dataloader(self):
        val_split = Dataset(...)
        return DataLoader(val_split)

    def test_dataloader(self):
        test_split = Dataset(...)
        return DataLoader(test_split)

A DataModule implements 4 key methods:

  1. prepare_data: work that should run on only 1 GPU, not on every GPU, in distributed mode (for example, downloading).

  2. train_dataloader: returns the training dataloader.

  3. val_dataloader: returns the validation dataloader.

  4. test_dataloader: returns the test dataloader.

This allows you to share a full dataset without having to explain its splits, transforms, or download process.

classmethod add_argparse_args(parent_parser)

Extends an existing argparse parser with the default LightningDataModule attributes.

Return type

ArgumentParser

classmethod from_argparse_args(args, **kwargs)

Create an instance from CLI arguments.

Parameters
  • args (Union[Namespace, ArgumentParser]) – The parser or namespace to take arguments from. Only known arguments will be parsed and passed to the LightningDataModule.

  • **kwargs – Additional keyword arguments that may override ones in the parser or namespace. These must be valid LightningDataModule arguments.
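The filtering and override behaviour described in these parameters can be sketched with the standard library alone. This is a minimal illustration, not the pl_bolts implementation: ToyDataModule and its data_dir and batch_size arguments are hypothetical stand-ins.

```python
import inspect
from argparse import ArgumentParser, Namespace


class ToyDataModule:
    """Hypothetical stand-in, not the pl_bolts class."""

    def __init__(self, data_dir="./data", batch_size=32):
        self.data_dir = data_dir
        self.batch_size = batch_size

    @classmethod
    def from_argparse_args(cls, args, **kwargs):
        # Accept either a parser (parse it first) or an already-parsed namespace.
        if isinstance(args, ArgumentParser):
            args = args.parse_args()
        params = vars(args)
        # Keep only the names that __init__ actually accepts.
        valid = inspect.signature(cls.__init__).parameters
        init_kwargs = {k: v for k, v in params.items() if k in valid}
        # Explicit keyword arguments win over values from the namespace.
        init_kwargs.update(kwargs)
        return cls(**init_kwargs)


# max_epochs is unknown to ToyDataModule, so it is silently dropped.
args = Namespace(data_dir="/tmp/mnist", batch_size=64, max_epochs=10)
dm = ToyDataModule.from_argparse_args(args, batch_size=128)
```

Here dm takes data_dir from the namespace, while batch_size comes from the explicit keyword argument.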

Example:

from argparse import ArgumentParser

parser = ArgumentParser(add_help=False)
parser = LightningDataModule.add_argparse_args(parser)
args = parser.parse_args()
module = LightningDataModule.from_argparse_args(args)

classmethod get_init_arguments_and_types()

Scans the LightningDataModule signature and returns argument names, types and default values.

Returns

(argument name, set with argument types, argument default value).

Return type

List with tuples of 3 values
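A signature scan of this kind can be approximated with inspect. The sketch below is an assumption about the general technique, not the library's code: init_arguments_and_types and ToyDataModule are hypothetical names, and the types are collected into a tuple.

```python
import inspect
from typing import Optional, Union, get_args, get_origin


def init_arguments_and_types(cls):
    """Return (name, types, default) for each __init__ parameter of cls."""
    result = []
    for name, param in inspect.signature(cls.__init__).parameters.items():
        if name == "self":
            continue
        annotation = param.annotation
        # Unpack Union[...] / Optional[...] into its member types.
        if get_origin(annotation) is Union:
            arg_types = tuple(get_args(annotation))
        else:
            arg_types = (annotation,)
        result.append((name, arg_types, param.default))
    return result


class ToyDataModule:
    """Hypothetical stand-in, not the pl_bolts class."""

    def __init__(self, data_dir: str = "./data", num_workers: Optional[int] = None):
        self.data_dir = data_dir
        self.num_workers = num_workers


rows = init_arguments_and_types(ToyDataModule)
```

Each entry in rows is a 3-tuple, which is the shape an argparse helper needs to register one command-line flag per constructor argument.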

abstract prepare_data(*args, **kwargs)

Use this to download and prepare data. In distributed training (multi-GPU, TPU), this is called only once, before the dataloaders are requested.

Warning

Do not assign anything to the model in this step since this will only be called on 1 GPU.
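The reason for this warning can be simulated without any GPUs. In the sketch below, which uses a hypothetical ToyDataModule and treats two plain objects as two processes, only the first "process" runs prepare_data: files written to disk are visible to both, but an attribute set inside prepare_data exists on only one of them.

```python
import os
import tempfile


class ToyDataModule:
    """Hypothetical stand-in used to simulate two distributed processes."""

    def __init__(self, data_dir):
        self.data_dir = data_dir

    def prepare_data(self):
        # Safe: write to disk, which every process can read later.
        with open(os.path.join(self.data_dir, "data.txt"), "w") as f:
            f.write("1 2 3")
        # Unsafe: this attribute exists only on the process that ran prepare_data.
        self.cached = [1, 2, 3]

    def load(self):
        with open(os.path.join(self.data_dir, "data.txt")) as f:
            return [int(x) for x in f.read().split()]


data_dir = tempfile.mkdtemp()
# One object per simulated process; in a real run each process has its own copy.
processes = [ToyDataModule(data_dir) for _ in range(2)]
processes[0].prepare_data()  # only the first process prepares the data

disk_results = [p.load() for p in processes]
has_state = [hasattr(p, "cached") for p in processes]
```

Both simulated processes can load the data from disk, but only the first one has the cached attribute.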

Pseudocode:

model.prepare_data()
model.train_dataloader()
model.val_dataloader()
model.test_dataloader()

Example:

def prepare_data(self):
    download_imagenet()
    clean_imagenet()
    cache_imagenet()

size(dim=None)

Return the dimension of each input, either as a tuple or as a list of tuples.

Return type

Union[Tuple, int]
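One plausible reading of this signature, given the return type, is that size() returns the full shape and size(dim) returns a single entry. The sketch below assumes the shape is stored in a dims attribute; both that attribute and ToyDataModule are hypothetical and may not match the library's implementation.

```python
class ToyDataModule:
    """Hypothetical stand-in; the `dims` attribute is an assumption."""

    def __init__(self, dims):
        self.dims = dims

    def size(self, dim=None):
        # With no index, return the full shape; otherwise return one entry of it.
        if dim is not None:
            return self.dims[dim]
        return self.dims


dm = ToyDataModule(dims=(1, 28, 28))
full = dm.size()     # the whole shape tuple
height = dm.size(1)  # a single dimension
```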

abstract test_dataloader(*args, **kwargs)

Implement a PyTorch DataLoader for testing.

Return type

Union[DataLoader, List[DataLoader]]

Returns

Single PyTorch DataLoader.

Note

Lightning adds the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.

Note

You can also return a list of DataLoaders

Example:

def test_dataloader(self):
    dataset = MNIST(root=PATH, train=False, transform=transforms.ToTensor(), download=False)
    loader = torch.utils.data.DataLoader(dataset=dataset, shuffle=False)
    return loader

abstract train_dataloader(*args, **kwargs)

Implement a PyTorch DataLoader for training.

Return type

DataLoader

Returns

Single PyTorch DataLoader.

Note

Lightning adds the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.

Example:

def train_dataloader(self):
    dataset = MNIST(root=PATH, train=True, transform=transforms.ToTensor(), download=False)
    loader = torch.utils.data.DataLoader(dataset=dataset)
    return loader

abstract val_dataloader(*args, **kwargs)

Implement a PyTorch DataLoader for validation.

Return type

Union[DataLoader, List[DataLoader]]

Returns

Single PyTorch DataLoader.

Note

Lightning adds the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.

Note

You can also return a list of DataLoaders

Example:

def val_dataloader(self):
    dataset = MNIST(root=PATH, train=False, transform=transforms.ToTensor(), download=False)
    loader = torch.utils.data.DataLoader(dataset=dataset, shuffle=False)
    return loader

name: str = Ellipsis

property test_transforms

property train_transforms

property val_transforms