
pl_bolts.datamodules.experience_source module

Datamodules for RL models that rely on experiences generated during training. Based on the implementations found here: https://github.com/Shmuma/ptan/blob/master/ptan/experience.py

class pl_bolts.datamodules.experience_source.BaseExperienceSource(env, agent)[source]

Bases: abc.ABC

Simplest form of the experience source.

Parameters

env: Environment that is being used
agent: Agent being used to make decisions

runner()[source]

Iterable method that yields steps from the experience source

Return type

Experience

class pl_bolts.datamodules.experience_source.DiscountedExperienceSource(env, agent, n_steps=1, gamma=0.99)[source]

Bases: pl_bolts.datamodules.experience_source.ExperienceSource

Outputs experiences with a discounted reward over N steps

discount_rewards(experiences)[source]

Calculates the discounted reward over N experiences.

Parameters

experiences (Tuple[Experience]): Tuple of Experience

Return type

float

Returns

total discounted reward
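The discounted sum described above can be sketched as follows. This is a minimal illustration, not the library source: it assumes the conventional n-step form r_0 + gamma * r_1 + gamma^2 * r_2 + ..., takes gamma as a plain argument instead of reading it from the instance, and redefines Experience locally so the snippet is self-contained.

```python
from collections import namedtuple
from typing import Tuple

# Local stand-in for pl_bolts' Experience tuple (same field names).
Experience = namedtuple("Experience", ["state", "action", "reward", "done", "new_state"])


def discount_rewards(experiences: Tuple[Experience, ...], gamma: float = 0.99) -> float:
    """Fold rewards from the last step back to the first."""
    total_reward = 0.0
    for exp in reversed(experiences):
        # Everything accumulated so far lies one step further in the future.
        total_reward = gamma * total_reward + exp.reward
    return total_reward


# Three steps with reward 1.0 each and gamma = 0.5: 1 + 0.5 + 0.25
steps = tuple(
    Experience(state=i, action=0, reward=1.0, done=False, new_state=i + 1)
    for i in range(3)
)
result = discount_rewards(steps, gamma=0.5)
```

Iterating in reverse avoids computing explicit powers of gamma.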

runner(device)[source]

Iterates through the experience tuples and calculates the discounted experience.

Parameters

device (device): current device to be used for executing experience steps

Yields

Discounted Experience

Return type

Experience

split_head_tail_exp(experiences)[source]

Takes in a tuple of experiences and returns the last state and the tail experiences, depending on whether the last state is the end of an episode.

Parameters

experiences (Tuple[Experience]): Tuple of N Experience

Return type

Tuple[List, Tuple[Experience]]

Returns

last state (Array or None) and remaining Experience
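One plausible reading of this split is sketched below. The exact condition is an assumption: the sketch supposes that a full tuple carries one more experience than n_steps (the extra one only supplies the bootstrap state), and that a shorter tuple ending in done is an end-of-episode tail. The n_steps argument and the local Experience definition are introduced for illustration only.

```python
from collections import namedtuple

# Local stand-in for pl_bolts' Experience tuple (same field names).
Experience = namedtuple("Experience", ["state", "action", "reward", "done", "new_state"])


def split_head_tail_exp(experiences, n_steps):
    """Return (last_state, experiences_to_discount)."""
    if experiences[-1].done and len(experiences) <= n_steps:
        # Episode ended inside the window: no state to bootstrap from.
        return None, experiences
    # Otherwise the final experience only provides the last state.
    return experiences[-1].state, experiences[:-1]


full = (
    Experience(state="s0", action=0, reward=1.0, done=False, new_state="s1"),
    Experience(state="s1", action=1, reward=1.0, done=False, new_state="s2"),
)
tail = (Experience(state="s2", action=0, reward=1.0, done=True, new_state="s3"),)

full_split = split_head_tail_exp(full, n_steps=1)
tail_split = split_head_tail_exp(tail, n_steps=1)
```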

class pl_bolts.datamodules.experience_source.Experience(state, action, reward, done, new_state)[source]

Bases: tuple

Create new instance of Experience(state, action, reward, done, new_state)

_asdict()[source]

Return a new OrderedDict which maps field names to their values.

classmethod _make(iterable)[source]

Make a new Experience object from a sequence or iterable

_replace(**kwds)[source]

Return a new Experience object replacing specified fields with new values

_fields = ('state', 'action', 'reward', 'done', 'new_state')[source]
_fields_defaults = {}[source]
property action[source]

Alias for field number 1

property done[source]

Alias for field number 3

property new_state[source]

Alias for field number 4

property reward[source]

Alias for field number 2

property state[source]

Alias for field number 0
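Because Experience is a named tuple, the members listed above behave like those of any collections.namedtuple. The snippet below is a small sketch that rebuilds an equivalent tuple locally rather than importing it from pl_bolts; the field values are made up for illustration.

```python
from collections import namedtuple

# Same field order as documented: state, action, reward, done, new_state.
Experience = namedtuple("Experience", ["state", "action", "reward", "done", "new_state"])

exp = Experience(state=[0.0, 1.0], action=1, reward=1.0, done=False, new_state=[0.5, 1.0])

# Fields are reachable by name or by position (reward is field number 2).
reward_by_name = exp.reward
reward_by_index = exp[2]

# _asdict maps field names to values; _replace returns a modified copy.
as_dict = exp._asdict()
terminal = exp._replace(done=True)

# _make builds an Experience from any iterable with five items.
rebuilt = Experience._make([[0.0, 1.0], 1, 1.0, False, [0.5, 1.0]])
```

Tuples are immutable, so _replace leaves the original exp unchanged.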

class pl_bolts.datamodules.experience_source.ExperienceSource(env, agent, n_steps=1)[source]

Bases: pl_bolts.datamodules.experience_source.BaseExperienceSource

Experience source class handling single and multiple environment steps.

Parameters

env: Environment that is being used
agent: Agent being used to make decisions
n_steps (int): Number of steps to return from each environment at once

env_actions(device)[source]

For each environment in the pool, get the correct action.

Return type

List[List[int]]

Returns

List of actions for each env, with size (num_envs, action_size)

env_step(env_idx, env, action)[source]

Carries out a step through the given environment using the given action.

Parameters

env_idx (int): index of the current environment
env (Env): env at index env_idx
action (List[int]): action for this environment step

Return type

Experience

Returns

Experience tuple

init_environments()[source]

For each environment in the pool, sets up lists for tracking the history of size n, the state, the current reward and the current step.

Return type

None

pop_rewards_steps()[source]

Returns the list of the current total rewards and steps collected.

Returns

list of total rewards and steps for all completed episodes for each environment since last pop

pop_total_rewards()[source]

Returns the list of the current total rewards collected.

Return type

List[float]

Returns

list of total rewards for all completed episodes for each environment since last pop
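The "since last pop" behaviour can be illustrated with a small pop-and-reset pattern. The RewardLog class and its record_episode method are hypothetical names used only for this sketch; they are not part of pl_bolts.

```python
from typing import List


class RewardLog:
    """Hypothetical helper that collects episode totals until they are popped."""

    def __init__(self) -> None:
        self.total_rewards: List[float] = []

    def record_episode(self, total_reward: float) -> None:
        # Called once per completed episode.
        self.total_rewards.append(total_reward)

    def pop_total_rewards(self) -> List[float]:
        # Hand back everything collected so far and start a fresh list.
        rewards = self.total_rewards
        self.total_rewards = []
        return rewards


log = RewardLog()
log.record_episode(10.0)
log.record_episode(12.5)
first_pop = log.pop_total_rewards()
second_pop = log.pop_total_rewards()
```

A second pop with no newly completed episodes returns an empty list.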

runner(device)[source]

Experience Source iterator yielding Tuple of experiences for n_steps. These come from the pool of environments provided by the user.

Parameters

device (device): current device to be used for executing experience steps

Return type

Tuple[Experience]

Returns

Tuple of Experiences

update_env_stats(env_idx)[source]

To be called at the end of the history tail generation during the termination state. Updates the stats tracked for all environments.

Parameters

env_idx (int): index of the environment used to update stats

Return type

None

update_history_queue(env_idx, exp, history)[source]

Updates the experience history queue with the latest experiences. If an experience step is in the done state, the history is incrementally appended to the queue, removing the tail of the history each time.

Parameters

env_idx: index of the environment
exp: the current experience
history: history of experience steps for this environment

Return type

None
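The incremental flushing on a done step can be sketched in simplified form. This is an illustration rather than the library code: it assumes that "removing the tail" means dropping the oldest entry, it returns the queued windows instead of appending to an instance attribute, and it leaves out the per-environment stats update. The function name flush_history is hypothetical.

```python
from collections import deque
from typing import Deque, List


def flush_history(history: Deque, n_steps: int, done: bool) -> List[list]:
    """Return the windows that would be queued for the current step."""
    queued = []
    if len(history) == n_steps:
        # A full window is always queued.
        queued.append(list(history))
    if done:
        if 0 < len(history) < n_steps:
            # The episode ended before a full window was collected.
            queued.append(list(history))
        # Queue progressively shorter windows until one step is left.
        while len(history) > 1:
            history.popleft()
            queued.append(list(history))
        history.clear()
    return queued


running = flush_history(deque(["e1", "e2", "e3"], maxlen=3), n_steps=3, done=False)
finished_history = deque(["e1", "e2", "e3"], maxlen=3)
finished = flush_history(finished_history, n_steps=3, done=True)
```

Queuing the shrinking windows lets the last steps of an episode still contribute experiences even though no full n-step window follows them.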

class pl_bolts.datamodules.experience_source.ExperienceSourceDataset(generate_batch)[source]

Bases: torch.utils.data.IterableDataset

Basic experience source dataset. Takes a generate_batch function that returns an iterator. The logic for the experience source and how the batch is generated is defined in the Lightning model itself.
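The pattern of wrapping a generate_batch callable can be sketched without any dependencies. The class below is a stand-in written for illustration: the real ExperienceSourceDataset subclasses torch.utils.data.IterableDataset, which is replaced here with a plain class so the snippet runs without PyTorch. The toy generate_batch function is likewise an assumption.

```python
from typing import Callable, Iterator


class ExperienceSourceDatasetSketch:
    """Stand-in for a dataset that defers iteration to a user-supplied callable."""

    def __init__(self, generate_batch: Callable[[], Iterator]) -> None:
        self.generate_batch = generate_batch

    def __iter__(self) -> Iterator:
        # Each pass over the dataset asks the callable for a fresh iterator.
        return self.generate_batch()


def generate_batch() -> Iterator:
    # In a Lightning model this would step the environment and yield experiences.
    for step in range(3):
        yield (step, step * 2)


dataset = ExperienceSourceDatasetSketch(generate_batch)
batches = list(dataset)
```

Keeping the generation logic in the callable means the dataset itself holds no environment or agent state.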
