pl_bolts.models.rl.common.memory module

Series of memory buffers used

class pl_bolts.models.rl.common.memory.Buffer(capacity)[source]

Bases: object

Basic Buffer for storing a single experience at a time

Parameters

capacity (int) – size of the buffer

append(experience)[source]

Add experience to the buffer

Parameters

experience (Experience) – tuple (state, action, reward, done, new_state)

Return type

None

sample(*args)[source]

Returns everything currently in the buffer; the buffer is then reset.

Return type

Union[Tuple, List[Tuple]]

Returns

a batch of tuple np arrays of state, action, reward, done, next_state
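
Example (a minimal usage sketch based on the signatures above; the state and action values are illustrative):

    from pl_bolts.models.rl.common.memory import Buffer, Experience
    import numpy as np

    # a buffer that holds up to 100 experiences
    buffer = Buffer(capacity=100)

    # store a single step of interaction with the environment
    exp = Experience(state=np.zeros(4), action=0, reward=1.0, done=False, new_state=np.ones(4))
    buffer.append(exp)

    # sample() returns everything collected so far and then resets the buffer;
    # per the docs the result is np arrays of state, action, reward, done, next_state
    batch = buffer.sample()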

class pl_bolts.models.rl.common.memory.Experience(state, action, reward, done, new_state)[source]

Bases: tuple

Create new instance of Experience(state, action, reward, done, new_state)

_asdict()[source]

Return a new OrderedDict which maps field names to their values.

classmethod _make(iterable)[source]

Make a new Experience object from a sequence or iterable

_replace(**kwds)[source]

Return a new Experience object replacing specified fields with new values

_fields = ('state', 'action', 'reward', 'done', 'new_state')[source]
_fields_defaults = {}[source]
property action[source]

Alias for field number 1

property done[source]

Alias for field number 3

property new_state[source]

Alias for field number 4

property reward[source]

Alias for field number 2

property state[source]

Alias for field number 0
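
Example (since Experience is a named tuple, the standard named-tuple helpers documented above apply; the values are illustrative):

    from pl_bolts.models.rl.common.memory import Experience

    exp = Experience(state=[0.0, 1.0], action=1, reward=0.5, done=False, new_state=[1.0, 1.0])

    exp.reward                          # access by field name -> 0.5
    exp[1]                              # access by position -> 1 (the action)
    finished = exp._replace(done=True)  # new Experience with the done field swapped
    as_dict = exp._asdict()             # OrderedDict mapping field names to values
    rebuilt = Experience._make([[0.0, 1.0], 1, 0.5, False, [1.0, 1.0]])  # build from any iterable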

class pl_bolts.models.rl.common.memory.MeanBuffer(capacity)[source]

Bases: object

Stores a deque of items and calculates the mean

add(val)[source]

Add to the buffer

Return type

None

mean()[source]

Retrieve the mean

Return type

float
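
Example (a minimal sketch of tracking a running mean, e.g. of recent rewards; the capacity and values are illustrative):

    from pl_bolts.models.rl.common.memory import MeanBuffer

    reward_tracker = MeanBuffer(capacity=100)  # keeps the most recent 100 values

    for reward in [1.0, 0.0, 2.0]:
        reward_tracker.add(reward)

    reward_tracker.mean()  # -> 1.0, the mean of the stored values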

class pl_bolts.models.rl.common.memory.MultiStepBuffer(buffer_size, n_step=2)[source]

Bases: object

N Step Replay Buffer

Deprecated: use the NStepExperienceSource with the standard ReplayBuffer

append(experience)[source]

Add an experience to the buffer by collecting n steps of experiences.

Parameters

experience – tuple (state, action, reward, done, next_state)

Return type

None

get_transition_info(gamma=0.9)[source]

Get the accumulated transition info for the n_step_buffer.

Parameters

gamma – discount factor

Return type

Tuple[float, array, int]

Returns

multi step reward, final observation and done

sample(batch_size)[source]

Takes a sample of the buffer.

Parameters

batch_size (int) – current batch_size

Return type

Tuple

Returns

a batch of tuple np arrays of Experiences
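
For reference, the multi-step reward returned by get_transition_info is, in the standard n-step formulation this buffer is assumed to follow, the discounted sum of the buffered rewards. A standalone sketch of that accumulation (an illustration, not a verbatim copy of the buffer internals):

    # standalone sketch of the standard n-step discounted reward accumulation;
    # assumed behaviour for illustration, not MultiStepBuffer's exact code
    def n_step_reward(rewards, gamma=0.9):
        total = 0.0
        for k, reward in enumerate(rewards):
            total += (gamma ** k) * reward
        return total

    n_step_reward([1.0, 1.0], gamma=0.9)  # -> 1.0 + 0.9 * 1.0 = 1.9 for n_step=2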

class pl_bolts.models.rl.common.memory.PERBuffer(buffer_size, prob_alpha=0.6, beta_start=0.4, beta_frames=100000)[source]

Bases: pl_bolts.models.rl.common.memory.ReplayBuffer

Simple list-based Prioritized Experience Replay buffer, based on the implementation found here: https://github.com/Shmuma/ptan/blob/master/ptan/experience.py#L371

append(exp)[source]

Adds experiences from exp_source to the PER buffer

Parameters

exp – experience tuple being added to the buffer

Return type

None

sample(batch_size=32)[source]

Takes a prioritized sample from the buffer

Parameters

batch_size – size of sample

Return type

Tuple

Returns

sample of experiences chosen with ranked probability

update_beta(step)[source]

Update the beta value which accounts for the bias in the PER

Parameters

step – current global step

Return type

float

Returns

beta value for this indexed experience

update_priorities(batch_indices, batch_priorities)[source]

Update the priorities from the last batch; this should be called after the loss for this batch has been calculated.

Parameters
  • batch_indices (List) – index of each datum in the batch

  • batch_priorities (List) – priority of each datum in the batch

Return type

None
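
Example (a minimal sketch of the intended PER workflow based on the methods above; the buffer size, batch size, indices and priorities are illustrative placeholders, and in practice the priorities would come from the TD errors of the sampled batch):

    from pl_bolts.models.rl.common.memory import Experience, PERBuffer
    import numpy as np

    per_buffer = PERBuffer(buffer_size=10000)

    # fill the buffer with experiences gathered from the environment
    for _ in range(64):
        exp = Experience(state=np.zeros(4), action=0, reward=1.0, done=False, new_state=np.ones(4))
        per_buffer.append(exp)

    # anneal beta towards 1.0 as training progresses
    beta = per_buffer.update_beta(step=1000)

    # draw a prioritized batch of experiences
    batch = per_buffer.sample(batch_size=32)

    # after the loss for this batch has been calculated, push the new
    # priorities back; the indices and priorities below are placeholders
    per_buffer.update_priorities(batch_indices=list(range(32)), batch_priorities=[1.0] * 32)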

class pl_bolts.models.rl.common.memory.ReplayBuffer(capacity)[source]

Bases: pl_bolts.models.rl.common.memory.Buffer

Replay Buffer for storing past experiences, allowing the agent to learn from them

Parameters

capacity (int) – size of the buffer

sample(batch_size)[source]

Takes a sample of the buffer.

Parameters

batch_size (int) – current batch_size

Return type

Tuple

Returns

a batch of tuple np arrays of state, action, reward, done, next_state
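
Example (a minimal sketch of filling and sampling the replay buffer; the state shapes and batch size are illustrative):

    from pl_bolts.models.rl.common.memory import Experience, ReplayBuffer
    import numpy as np

    replay_buffer = ReplayBuffer(capacity=1000)

    # collect transitions from the environment
    for _ in range(200):
        exp = Experience(state=np.zeros(4), action=0, reward=1.0, done=False, new_state=np.ones(4))
        replay_buffer.append(exp)

    # sample a training batch; per the docs it is np arrays of
    # state, action, reward, done, next_state
    batch = replay_buffer.sample(batch_size=32)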
