Usage

Installation

To use Lumache, first install it using pip:

(.venv) $ pip install lumache

To simulate the interaction between an agent and an environment, you can use the irlc.ex01.agent.train() function:

irlc.ex01.agent.train(env, agent=None, experiment_name=None, num_episodes=1, verbose=True, reset=True, max_steps=10000000000.0, max_runs=None, return_trajectory=False, resume_stats=None, log_interval=1, delete_old_experiments=False)[source]

Implements the main training loop; see (Her21, Subsection 4.4.4). It simulates the interaction between the agent agent and the environment env. The function has a lot of special functionality, so it is useful to consider the most common cases. An example:

>>> stats, _ = train(env, agent, num_episodes=2)

Simulate interaction for two episodes (i.e. the environment terminates twice and is reset). stats will be a list of length two containing information from each episode.

>>> stats, trajectories = train(env, agent, num_episodes=2, return_trajectory=True)

trajectories will be a list of length two containing information from the two trajectories.

>>> stats, _ = train(env, agent, experiment_name='experiments/my_run', num_episodes=2)

Save stats and trajectories to a file which can easily be loaded/plotted (see the course software for examples of this). The file will be time-stamped, so by making several calls you can repeat the same experiment (run) many times.

>>> stats, _ = train(env, agent, experiment_name='experiments/my_run', num_episodes=2, max_runs=10)

As above, but do not perform more than 10 runs. Useful for repeated experiments.
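A sketch of this repeated-experiment pattern as a loop, assuming env and agent have already been constructed (a self-contained construction is sketched after the parameter list below):

from irlc.ex01.agent import train

# Each call appends a new time-stamped run under the same experiment name;
# max_runs=10 ensures that no more than 10 runs accumulate.
for _ in range(10):
    stats, _ = train(env, agent, experiment_name='experiments/my_run',
                     num_episodes=2, max_runs=10)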

Parameters
  • env – Environment (Gym instance)

  • agent – Agent instance

  • experiment_name – Save outcome to file for easy plotting (Optional)

  • num_episodes – Number of episodes to simulate

  • verbose – Display progress bar

  • reset – Call env.reset() before simulation start.

  • max_steps – Terminate if this many steps have elapsed (for non-terminating environments)

  • max_runs – Maximum number of repeated experiments (requires experiment_name)

  • return_trajectory – Return trajectories list (Off by default since it might consume lots of memory)

  • resume_stats – Resume stat collection from last run (requires experiment_name)

  • log_interval – Log stats less frequently

  • delete_old_experiments – Delete old experiments with the same name

Returns

stats, trajectories (both as lists)
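To make the calls above concrete, here is a self-contained sketch. The environment name ("CartPole-v1"), the use of the classic gym package, and the random-action subclass are illustrative assumptions; only Agent and train are the objects documented on this page.

import gym
from irlc.ex01.agent import Agent, train

class RandomAgent(Agent):
    """Illustrative agent: samples a random action at every step."""
    def __init__(self, env):
        super().__init__(env)
        self._action_space = env.action_space  # stored under our own attribute name

    def pi(self, s, k=None):
        return self._action_space.sample()

env = gym.make("CartPole-v1")   # any Gym environment should work here
agent = RandomAgent(env)
stats, trajectories = train(env, agent, num_episodes=2, return_trajectory=True)
print(len(stats), len(trajectories))  # one entry per episode, i.e. 2 and 2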

Next, we will do an entire class. What about some math before that?

\begin{eqnarray}
y & = & ax^2 + bx + c \\
f(x) & = & x^2 + 2xy + y^2
\end{eqnarray}
class irlc.ex01.agent.Agent(env)[source]

Main agent class. See (Her21, Subsection 4.4.3) for additional details.

Example

>>> print("Hello World")
"Hello world"
\begin{eqnarray}
y & = & ax^2 + bx + c \\
f(x) & = & x^2 + 2xy + y^2
\end{eqnarray}
extra_stats()[source]

Optional: Can be used to record extra information from the Agent while training. You can safely ignore this method; it will only be used for control theory to create nicer plots.

hello(greeting)[source]

The canonical hello world example.

A longer description with some RST.

Parameters

greeting (str) – The person to say hello to.

Returns

The greeting

Return type

str
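A hypothetical call, assuming an Agent instance agent has been constructed:

message = agent.hello("world")   # exact wording of the greeting depends on the implementation
assert isinstance(message, str)  # the documented return type is str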

pi(s, k=None)[source]

Evaluate the Agent’s policy at time step k in state s

The details will differ depending on whether the agent interacts in a discrete-time or continuous-time setting.

  • For discrete applications (dynamic programming/search and reinforcement learning), k is discrete, k=0, 1, 2, …

  • For control applications, k is continuous and denotes the simulation time t, i.e. it should be called as

> agent.pi(x, t)

Parameters
  • s – Current state

  • k – Current time index.

Returns

action
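As an illustration of the control convention, a hypothetical linear state-feedback policy could be written as follows; the gain matrix K, the use of numpy, and returning the action as an array are assumptions of this sketch.

import numpy as np
from irlc.ex01.agent import Agent

class LinearFeedbackAgent(Agent):
    """Illustrative continuous-time policy: u = -K x, independent of the time t."""
    def __init__(self, env, K):
        super().__init__(env)
        self.K = np.asarray(K)  # feedback gain; an assumption for this sketch

    def pi(self, x, t=None):
        # t is the (continuous) simulation time; a time-invariant policy ignores it.
        return -self.K @ np.asarray(x)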

train(s, a, r, sp, done=False)[source]

Called at each step of the simulation after a = pi(s,k) and the environment has transitioned to sp.

Allows the agent to learn from experience

Parameters
  • s – Current state x_k

  • a – Action taken

  • r – Reward obtained by taking action a_k in x_k

  • sp – The state that the environment transitioned to \({\bf x}_{k+1}\)

  • done – Whether environment terminated when transitioning to sp

Returns

None
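To see how pi and train fit together, here is a purely illustrative agent that learns nothing but tallies the reward it receives; the attribute names are our own.

from irlc.ex01.agent import Agent

class RewardTrackingAgent(Agent):
    """Illustrative agent: random actions, plus a running total of the reward."""
    def __init__(self, env):
        super().__init__(env)
        self._action_space = env.action_space
        self.total_reward = 0.0

    def pi(self, s, k=None):
        return self._action_space.sample()

    def train(self, s, a, r, sp, done=False):
        # Called after each transition s -(a)-> sp with reward r.
        self.total_reward += r
        if done:
            print(f"Episode finished; accumulated reward so far: {self.total_reward}")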
