Reinforcement Learning

These are common loss functions used in reinforcement learning (RL).

DQN Loss

pl_bolts.losses.rl.dqn_loss(batch, net, target_net, gamma=0.99)

Warning

The feature dqn_loss is currently marked under review. Compatibility with other Lightning projects is not guaranteed, and the API and functionality may change without warning in future releases. More details: https://lightning-bolts.readthedocs.io/en/latest/stability.html

Calculates the MSE loss using a mini-batch from the replay buffer.

Parameters

batch (Tuple[Tensor, Tensor]) – current mini-batch of replay data
net (Module) – main training network
target_net (Module) – target network of the main training network
gamma (float) – discount factor

Return type

Tensor

Returns

loss
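
As a rough illustration of the computation described above, the following PyTorch sketch builds a one-step target from the target network and takes the MSE against the training network's Q-values. It is not the pl_bolts source: the name dqn_loss_sketch and the five-tensor batch layout (states, actions, rewards, dones, next_states) are assumptions made for this example.

```python
import torch
from torch import nn


def dqn_loss_sketch(batch, net, target_net, gamma=0.99):
    """Illustrative one-step DQN loss (hypothetical helper, not the library code)."""
    # Assumed batch layout: five tensors sampled from the replay buffer.
    states, actions, rewards, dones, next_states = batch

    # Q(s, a) from the training network for the actions actually taken.
    q_taken = net(states).gather(1, actions.long().unsqueeze(-1)).squeeze(-1)

    with torch.no_grad():
        # max over a' of Q_target(s', a'), zeroed where the episode ended.
        next_q = target_net(next_states).max(dim=1).values
        next_q = next_q.masked_fill(dones.bool(), 0.0)

    # One-step target: r + gamma * max_a' Q_target(s', a').
    target = rewards + gamma * next_q
    return nn.functional.mse_loss(q_taken, target)


# Smoke run with made-up shapes: 4-dimensional observations, 2 actions.
torch.manual_seed(0)
net = nn.Linear(4, 2)
target_net = nn.Linear(4, 2)
target_net.load_state_dict(net.state_dict())  # target starts as a copy

batch = (
    torch.randn(8, 4),                  # states
    torch.randint(0, 2, (8,)),          # actions
    torch.randn(8),                     # rewards
    torch.zeros(8, dtype=torch.bool),   # dones
    torch.randn(8, 4),                  # next_states
)
loss = dqn_loss_sketch(batch, net, target_net)
loss.backward()  # gradients flow only into `net`
```

Computing the target under torch.no_grad() keeps the target network out of the backward pass, which is the usual reason for holding a separate target_net.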

Double DQN Loss

pl_bolts.losses.rl.double_dqn_loss(batch, net, target_net, gamma=0.99)

Warning

The feature double_dqn_loss is currently marked under review. Compatibility with other Lightning projects is not guaranteed, and the API and functionality may change without warning in future releases. More details: https://lightning-bolts.readthedocs.io/en/latest/stability.html

Calculates the MSE loss using a mini-batch from the replay buffer. This improves on the original DQN loss by applying Double DQN: the training network selects the next action, and the target network supplies the value of that action. The source code is heavily commented to explain each step.

Parameters

batch (Tuple[Tensor, Tensor]) – current mini-batch of replay data
net (Module) – main training network
target_net (Module) – target network of the main training network
gamma (float) – discount factor

Return type

Tensor

Returns

loss
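
The split between action selection and action evaluation described above can be sketched as follows. This is a minimal PyTorch illustration, not the pl_bolts implementation: double_dqn_loss_sketch and the five-tensor batch layout (states, actions, rewards, dones, next_states) are assumed for the example.

```python
import torch
from torch import nn


def double_dqn_loss_sketch(batch, net, target_net, gamma=0.99):
    """Illustrative Double DQN loss (hypothetical helper, not the library code)."""
    # Assumed batch layout: five tensors sampled from the replay buffer.
    states, actions, rewards, dones, next_states = batch

    # Q(s, a) from the training network for the actions actually taken.
    q_taken = net(states).gather(1, actions.long().unsqueeze(-1)).squeeze(-1)

    with torch.no_grad():
        # Selection: the training network picks the greedy next action.
        next_actions = net(next_states).argmax(dim=1, keepdim=True)
        # Evaluation: the target network scores that chosen action.
        next_q = target_net(next_states).gather(1, next_actions).squeeze(-1)
        next_q = next_q.masked_fill(dones.bool(), 0.0)

    target = rewards + gamma * next_q
    return nn.functional.mse_loss(q_taken, target)


# Smoke run with made-up shapes: 4-dimensional observations, 3 actions.
torch.manual_seed(0)
net = nn.Linear(4, 3)
target_net = nn.Linear(4, 3)

batch = (
    torch.randn(6, 4),                  # states
    torch.randint(0, 3, (6,)),          # actions
    torch.randn(6),                     # rewards
    torch.zeros(6, dtype=torch.bool),   # dones
    torch.randn(6, 4),                  # next_states
)
loss = double_dqn_loss_sketch(batch, net, target_net)
```

Compared with taking the maximum directly over the target network's outputs, this decoupling is intended to reduce the overestimation of Q-values.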

Per DQN Loss

pl_bolts.losses.rl.per_dqn_loss(batch, batch_weights, net, target_net, gamma=0.99)

Warning

The feature per_dqn_loss is currently marked under review. Compatibility with other Lightning projects is not guaranteed, and the API and functionality may change without warning in future releases. More details: https://lightning-bolts.readthedocs.io/en/latest/stability.html

Calculates the MSE loss, weighted by the priority weights of the batch sampled from the prioritized experience replay (PER) buffer.

Parameters

batch (Tuple[Tensor, Tensor]) – current mini-batch of replay data
batch_weights (List) – how each sample is weighted in terms of priority
net (Module) – main training network
target_net (Module) – target network of the main training network
gamma (float) – discount factor

Return type

Tuple[Tensor, ndarray]

Returns

loss and batch_weights
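
A priority-weighted variant of the same loss might look like the sketch below. It is an illustration under stated assumptions, not the pl_bolts source: the name per_dqn_loss_sketch, the five-tensor batch layout, the reading of the second return value as per-sample values for updating priorities, and the small 1e-5 offset are all choices made for this example.

```python
import torch
from torch import nn


def per_dqn_loss_sketch(batch, batch_weights, net, target_net, gamma=0.99):
    """Illustrative priority-weighted DQN loss (hypothetical helper)."""
    # Assumed batch layout: five tensors sampled from the PER buffer.
    states, actions, rewards, dones, next_states = batch
    weights = torch.as_tensor(batch_weights, dtype=torch.float32)

    # Q(s, a) from the training network for the actions actually taken.
    q_taken = net(states).gather(1, actions.long().unsqueeze(-1)).squeeze(-1)

    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        next_q = next_q.masked_fill(dones.bool(), 0.0)
    target = rewards + gamma * next_q

    # Per-sample squared error, scaled by each sample's priority weight.
    per_sample = weights * (q_taken - target) ** 2

    # Assumed convention: a small offset keeps every returned value positive.
    new_priorities = (per_sample + 1e-5).detach().cpu().numpy()
    return per_sample.mean(), new_priorities


# Smoke run with made-up shapes: 4-dimensional observations, 2 actions.
torch.manual_seed(0)
net = nn.Linear(4, 2)
target_net = nn.Linear(4, 2)

batch = (
    torch.randn(5, 4),                  # states
    torch.randint(0, 2, (5,)),          # actions
    torch.randn(5),                     # rewards
    torch.zeros(5, dtype=torch.bool),   # dones
    torch.randn(5, 4),                  # next_states
)
batch_weights = [1.0, 0.8, 0.6, 0.4, 0.2]  # made-up weights
loss, priorities = per_dqn_loss_sketch(batch, batch_weights, net, target_net)
```

The scalar loss is what gets backpropagated; the returned array has one entry per sample, so a buffer could use it to refresh the priorities of the sampled transitions.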