Reinforcement Learning

These are common losses used in RL.


DQN Loss

pl_bolts.losses.rl.dqn_loss(batch, net, target_net, gamma=0.99)[source]

Warning

The feature dqn_loss is currently marked as under review. Compatibility with other Lightning projects is not guaranteed, and the API and functionality may change without warning in future releases. More details: https://lightning-bolts.readthedocs.io/en/latest/stability.html

Calculates the MSE loss using a mini batch from the replay buffer.

Parameters
  • batch (Tuple[Tensor, Tensor]) – current mini batch of replay data

  • net (Module) – main training network

  • target_net (Module) – target network of the main training network

  • gamma (float) – discount factor

Return type

Tensor

Returns

loss
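
The description above does not spell out the regression target. The sketch below assumes the standard one-step Q-learning target, r + gamma * max_a Q_target(s', a), with the bootstrap term dropped for terminal transitions. It uses plain Python lists instead of tensors to keep the arithmetic visible. The helper names (dqn_targets, mse) and the batch values are hypothetical and are not part of pl_bolts.

```python
from typing import List, Sequence


def dqn_targets(
    rewards: Sequence[float],
    dones: Sequence[bool],
    next_q_values: Sequence[Sequence[float]],
    gamma: float = 0.99,
) -> List[float]:
    """One-step targets: reward plus the discounted best target-network value."""
    targets = []
    for reward, done, next_q in zip(rewards, dones, next_q_values):
        # Terminal transitions have no future value to bootstrap from.
        bootstrap = 0.0 if done else max(next_q)
        targets.append(reward + gamma * bootstrap)
    return targets


def mse(predictions: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error over the mini batch."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)


# Hypothetical mini batch: two transitions, two actions per state.
q_values = [[1.0, 2.0], [0.5, 0.0]]       # stands in for net(states)
actions = [1, 0]                          # actions taken in each transition
rewards = [1.0, 0.0]
dones = [False, True]
next_q_values = [[0.0, 3.0], [4.0, 5.0]]  # stands in for target_net(next_states)

# Q-value of the action that was actually taken in each state.
chosen_q = [q[a] for q, a in zip(q_values, actions)]
targets = dqn_targets(rewards, dones, next_q_values, gamma=0.5)
loss = mse(chosen_q, targets)
print(targets, loss)
```

In a real training step the target values would be computed without gradient tracking, so that only the main network is updated by the loss.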

Double DQN Loss

pl_bolts.losses.rl.double_dqn_loss(batch, net, target_net, gamma=0.99)[source]

Warning

The feature double_dqn_loss is currently marked as under review. Compatibility with other Lightning projects is not guaranteed, and the API and functionality may change without warning in future releases. More details: https://lightning-bolts.readthedocs.io/en/latest/stability.html

Calculates the MSE loss using a mini batch from the replay buffer. This improves on the original DQN loss by applying Double DQN: the training network selects the next action, and the target network supplies the value of that action. The source code is heavily commented to explain the process clearly.

Parameters
  • batch (Tuple[Tensor, Tensor]) – current mini batch of replay data

  • net (Module) – main training network

  • target_net (Module) – target network of the main training network

  • gamma (float) – discount factor

Return type

Tensor

Returns

loss
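
The action-selection step described above can be sketched as follows. This is a minimal illustration in plain Python, assuming the usual Double DQN target r + gamma * Q_target(s', argmax_a Q_net(s', a)). The helper name double_dqn_targets and the sample values are hypothetical and are not taken from the pl_bolts source.

```python
from typing import List, Sequence


def double_dqn_targets(
    rewards: Sequence[float],
    dones: Sequence[bool],
    next_q_online: Sequence[Sequence[float]],
    next_q_target: Sequence[Sequence[float]],
    gamma: float = 0.99,
) -> List[float]:
    """Targets where the training network picks the action and the
    target network evaluates it."""
    targets = []
    for reward, done, q_online, q_target in zip(
        rewards, dones, next_q_online, next_q_target
    ):
        if done:
            # No bootstrap value after a terminal transition.
            bootstrap = 0.0
        else:
            # Action selection uses the training (online) network...
            best_action = max(range(len(q_online)), key=lambda a: q_online[a])
            # ...while its value is read from the target network.
            bootstrap = q_target[best_action]
        targets.append(reward + gamma * bootstrap)
    return targets


# Hypothetical next-state Q-values for two transitions.
rewards = [1.0, 0.0]
dones = [False, True]
next_q_online = [[2.0, 1.0], [0.0, 1.0]]  # stands in for net(next_states)
next_q_target = [[0.5, 3.0], [4.0, 5.0]]  # stands in for target_net(next_states)

targets = double_dqn_targets(rewards, dones, next_q_online, next_q_target, gamma=0.5)
print(targets)
```

For the first transition the training network prefers action 0, so the target uses the target network's value for action 0 (0.5) rather than its maximum (3.0). Decoupling selection from evaluation in this way is what reduces the overestimation that a plain max over the target network tends to produce.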

PER DQN Loss

pl_bolts.losses.rl.per_dqn_loss(batch, batch_weights, net, target_net, gamma=0.99)[source]

Warning

The feature per_dqn_loss is currently marked as under review. Compatibility with other Lightning projects is not guaranteed, and the API and functionality may change without warning in future releases. More details: https://lightning-bolts.readthedocs.io/en/latest/stability.html

Calculates the MSE loss, weighted by the priority weights of the batch sampled from the prioritized experience replay (PER) buffer.

Parameters
  • batch (Tuple[Tensor, Tensor]) – current mini batch of replay data

  • batch_weights (List) – priority weight of each sample in the batch

  • net (Module) – main training network

  • target_net (Module) – target network of the main training network

  • gamma (float) – discount factor

Return type

Tuple[Tensor, ndarray]

Returns

loss and batch_weights
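
The weighting can be illustrated with a small sketch in plain Python. It assumes that each sample's squared error is multiplied by its priority weight before averaging, and that the per-sample weighted errors are handed back so a PER buffer can refresh its priorities. Whether this matches the second value returned by per_dqn_loss exactly is an assumption. The helper name per_weighted_mse and the sample values are hypothetical.

```python
from typing import List, Sequence, Tuple


def per_weighted_mse(
    predictions: Sequence[float],
    targets: Sequence[float],
    batch_weights: Sequence[float],
) -> Tuple[float, List[float]]:
    """Return the weighted mean squared error and the per-sample weighted errors."""
    weighted_errors = [
        w * (p - t) ** 2 for p, t, w in zip(predictions, targets, batch_weights)
    ]
    loss = sum(weighted_errors) / len(weighted_errors)
    return loss, weighted_errors


# Hypothetical values: predicted Q-values, their targets, and sample weights.
predictions = [2.0, 0.5]
targets = [2.5, 0.0]
batch_weights = [1.0, 0.5]

loss, weighted_errors = per_weighted_mse(predictions, targets, batch_weights)
print(loss, weighted_errors)
```

Both samples have the same squared error (0.25), but the second contributes half as much to the loss because its weight is 0.5.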
