Usage

Installation

To use Lumache, first install it using pip:

(.venv) $ pip install lumache

To simulate the interaction between an agent and an environment, you can use the irlc.ex01.agent.train() function:

irlc.ex01.agent.train(env, agent=None, experiment_name=None, num_episodes=1, verbose=True, reset=True, max_steps=10000000000.0, max_runs=None, return_trajectory=False, resume_stats=None, log_interval=1, delete_old_experiments=False)[source]

Implements the main training loop; see (Her21, Subsection 4.4.4). It simulates the interaction between the agent agent and the environment env. The function has a lot of special functionality, so it is useful to consider the most common cases. An example:

>>> stats, _ = train(env, agent, num_episodes=2)

Simulate interaction for two episodes (i.e. the environment terminates twice and is reset). stats will be a list of length two containing information from each episode.

>>> stats, trajectories = train(env, agent, num_episodes=2, return_trajectory=True)

trajectories will be a list of length two containing information from the two trajectories.

>>> stats, _ = train(env, agent, experiment_name='experiments/my_run', num_episodes=2)

Save stats and trajectories to a file which can easily be loaded/plotted (see the course software for examples of this). The file will be time-stamped, so by making several calls you can repeat the same experiment (run) many times.

>>> stats, _ = train(env, agent, experiment_name='experiments/my_run', num_episodes=2, max_runs=10)

As above, but do not perform more than 10 runs. Useful for repeated experiments.
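A sketch of this repeated-experiment pattern as a loop, assuming env and agent have already been constructed (a self-contained construction is sketched after the parameter list below):

from irlc.ex01.agent import train

# Each call appends a new time-stamped run under the same experiment name;
# max_runs=10 ensures that no more than 10 runs accumulate.
for _ in range(10):
    stats, _ = train(env, agent, experiment_name='experiments/my_run',
                     num_episodes=2, max_runs=10)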

Parameters
  • env – Environment (Gym instance)

  • agent – Agent instance

  • experiment_name – Save outcome to file for easy plotting (Optional)

  • num_episodes – Number of episodes to simulate

  • verbose – Display progress bar

  • reset – Call env.reset() before simulation start.

  • max_steps – Terminate if this many steps have elapsed (for non-terminating environments)

  • max_runs – Maximum number of repeated experiments (requires experiment_name)

  • return_trajectory – Return trajectories list (Off by default since it might consume lots of memory)

  • resume_stats – Resume stat collection from last run (requires experiment_name)

  • log_interval – Log stats less frequently

  • delete_old_experiments – Delete old experiments with the same name

Returns

stats, trajectories (both as lists)
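To make the calls above concrete, here is a self-contained sketch. The environment name ("CartPole-v1"), the use of the classic gym package, and the random-action subclass are illustrative assumptions; only Agent and train are the objects documented on this page.

import gym
from irlc.ex01.agent import Agent, train

class RandomAgent(Agent):
    """Illustrative agent: samples a random action at every step."""
    def __init__(self, env):
        super().__init__(env)
        self._action_space = env.action_space  # stored under our own attribute name

    def pi(self, s, k=None):
        return self._action_space.sample()

env = gym.make("CartPole-v1")   # any Gym environment should work here
agent = RandomAgent(env)
stats, trajectories = train(env, agent, num_episodes=2, return_trajectory=True)
print(len(stats), len(trajectories))  # one entry per episode, i.e. 2 and 2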

Next, we will do an entire class. What about some math before that?

\begin{eqnarray}
y & = & ax^2 + bx + c \\
f(x) & = & x^2 + 2xy + y^2
\end{eqnarray}
class irlc.ex01.agent.Agent(env)[source]

Main agent class. See (Her21, Subsection 4.4.3) for additional details.

Example

>>> print("Hello World")
"Hello world"
\begin{eqnarray}
y & = & ax^2 + bx + c \\
f(x) & = & x^2 + 2xy + y^2
\end{eqnarray}
extra_stats()[source]

Optional: Can be used to record extra information from the Agent while training. You can safely ignore this method; it will only be used for control theory to create nicer plots.

hello(greeting)[source]

The canonical hello world example.

A longer description with some RST.

Parameters

greeting (str) – The person to say hello to.

Returns

The greeting

Return type

str
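A hypothetical call, assuming an Agent instance agent has been constructed:

message = agent.hello("world")   # exact wording of the greeting depends on the implementation
assert isinstance(message, str)  # the documented return type is str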

pi(s, k=None)[source]

Evaluate the Agent’s policy at time step k in state s

The details will differ depending on whether the agent interacts in a discrete-time or continuous-time setting.

  • For discrete applications (dynamic programming/search and reinforcement learning), k is discrete, k=0, 1, 2, …

  • For control applications, k is continuous and denotes the simulation time t, i.e. it should be called as

> agent.pi(x, t)

Parameters
  • s – Current state

  • k – Current time index.

Returns

action
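As an illustration of the control convention, a hypothetical linear state-feedback policy could be written as follows; the gain matrix K, the use of numpy, and returning the action as an array are assumptions of this sketch.

import numpy as np
from irlc.ex01.agent import Agent

class LinearFeedbackAgent(Agent):
    """Illustrative continuous-time policy: u = -K x, independent of the time t."""
    def __init__(self, env, K):
        super().__init__(env)
        self.K = np.asarray(K)  # feedback gain; an assumption for this sketch

    def pi(self, x, t=None):
        # t is the (continuous) simulation time; a time-invariant policy ignores it.
        return -self.K @ np.asarray(x)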

train(s, a, r, sp, done=False)[source]

Called at each step of the simulation after a = pi(s,k) and the environment has transitioned to sp.

Allows the agent to learn from experience

Parameters
  • s – Current state x_k

  • a – Action taken

  • r – Reward obtained by taking action a_k in x_k

  • sp – The state that the environment transitioned to \({\bf x}_{k+1}\)

  • done – Whether environment terminated when transitioning to sp

Returns

None
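To see how pi and train fit together, here is a purely illustrative agent that learns nothing but tallies the reward it receives; the attribute names are our own.

from irlc.ex01.agent import Agent

class RewardTrackingAgent(Agent):
    """Illustrative agent: random actions, plus a running total of the reward."""
    def __init__(self, env):
        super().__init__(env)
        self._action_space = env.action_space
        self.total_reward = 0.0

    def pi(self, s, k=None):
        return self._action_space.sample()

    def train(self, s, a, r, sp, done=False):
        # Called after each transition s -(a)-> sp with reward r.
        self.total_reward += r
        if done:
            print(f"Episode finished; accumulated reward so far: {self.total_reward}")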
