
Sklearn Datamodule

Utilities to map sklearn or numpy datasets to PyTorch DataLoaders, with automatic data splits and GPU/TPU support.

from sklearn.datasets import load_boston
from pl_bolts.datamodules import SklearnDataModule

X, y = load_boston(return_X_y=True)
loaders = SklearnDataModule(X, y)

train_loader = loaders.train_dataloader(batch_size=32)
val_loader = loaders.val_dataloader(batch_size=32)
test_loader = loaders.test_dataloader(batch_size=32)

Or build your own torch datasets:

from sklearn.datasets import load_boston
from torch.utils.data import DataLoader
from pl_bolts.datamodules import SklearnDataset

X, y = load_boston(return_X_y=True)
dataset = SklearnDataset(X, y)
loader = DataLoader(dataset)

Sklearn Dataset Class

Transforms a sklearn or numpy dataset to a PyTorch Dataset.

class pl_bolts.datamodules.sklearn_datamodule.SklearnDataset(X, y, X_transform=None, y_transform=None)[source]

Bases: torch.utils.data.Dataset

Maps a numpy (or sklearn) dataset to a PyTorch dataset.

Parameters

X – input features (numpy array or sklearn feature matrix).
y – targets (numpy array).
X_transform – optional callable applied to each X sample. Default: None.
y_transform – optional callable applied to each y sample. Default: None.

Example

>>> from sklearn.datasets import load_boston
>>> from pl_bolts.datamodules import SklearnDataset
...
>>> X, y = load_boston(return_X_y=True)
>>> dataset = SklearnDataset(X, y)
>>> len(dataset)
506
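To see what the `X_transform` / `y_transform` hooks do, here is a minimal standalone sketch (an assumption of the per-sample behavior implied by the constructor signature, not the actual pl_bolts implementation) that applies an optional transform each time a sample is fetched:

```python
import numpy as np

class TransformingDataset:
    """Minimal sketch of a SklearnDataset-like wrapper with transform hooks."""

    def __init__(self, X, y, X_transform=None, y_transform=None):
        self.X, self.y = X, y
        self.X_transform, self.y_transform = X_transform, y_transform

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        # Fetch one sample and apply the optional transforms on the fly.
        x, y = self.X[idx], self.y[idx]
        if self.X_transform is not None:
            x = self.X_transform(x)
        if self.y_transform is not None:
            y = self.y_transform(y)
        return x, y

# Example: standardize each feature vector as it is fetched.
X = np.random.rand(10, 3)
y = np.random.rand(10)
ds = TransformingDataset(
    X, y, X_transform=lambda x: (x - x.mean()) / (x.std() + 1e-8)
)
```

Each fetched `x` then has (approximately) zero mean, while the stored `X` stays untouched.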

Sklearn DataModule Class

Automatically generates the train, validation and test splits for a Numpy dataset. They are set up as dataloaders for convenience. Optionally, you can pass in your own validation and test splits.

class pl_bolts.datamodules.sklearn_datamodule.SklearnDataModule(X, y, x_val=None, y_val=None, x_test=None, y_test=None, val_split=0.2, test_split=0.1, num_workers=2, random_state=1234, shuffle=True, *args, **kwargs)[source]

Bases: pl_bolts.datamodules.lightning_datamodule.LightningDataModule

Automatically generates the train, validation and test splits for a Numpy dataset. They are set up as dataloaders for convenience. Optionally, you can pass in your own validation and test splits.

Example

>>> from sklearn.datasets import load_boston
>>> from pl_bolts.datamodules import SklearnDataModule
...
>>> X, y = load_boston(return_X_y=True)
>>> loaders = SklearnDataModule(X, y)
...
>>> # train set
>>> train_loader = loaders.train_dataloader(batch_size=32)
>>> len(train_loader.dataset)
355
>>> len(train_loader)
11
>>> # validation set
>>> val_loader = loaders.val_dataloader(batch_size=32)
>>> len(val_loader.dataset)
100
>>> len(val_loader)
3
>>> # test set
>>> test_loader = loaders.test_dataloader(batch_size=32)
>>> len(test_loader.dataset)
51
>>> len(test_loader)
1
test_dataloader(batch_size=16)[source]

Implement a PyTorch DataLoader for testing.

Returns

Single PyTorch DataLoader.

Note

Lightning adds the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.

Note

You can also return a list of DataLoaders.

Example:

def test_dataloader(self):
    dataset = MNIST(root=PATH, train=False, transform=transforms.ToTensor(), download=False)
    loader = torch.utils.data.DataLoader(dataset=dataset, shuffle=False)
    return loader
train_dataloader(batch_size=16)[source]

Implement a PyTorch DataLoader for training.

Returns

Single PyTorch DataLoader.

Note

Lightning adds the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.

Example:

def train_dataloader(self):
    dataset = MNIST(root=PATH, train=True, transform=transforms.ToTensor(), download=False)
    loader = torch.utils.data.DataLoader(dataset=dataset)
    return loader
val_dataloader(batch_size=16)[source]

Implement a PyTorch DataLoader for validation.

Returns

Single PyTorch DataLoader.

Note

Lightning adds the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.

Note

You can also return a list of DataLoaders.

Example:

def val_dataloader(self):
    dataset = MNIST(root=PATH, train=False, transform=transforms.ToTensor(), download=False)
    loader = torch.utils.data.DataLoader(dataset=dataset, shuffle=False)
    return loader