Usage
To simulate the interaction between an agent and an environment, you can use the irlc.ex01.agent.train() function:
- irlc.ex01.agent.train(env, agent=None, experiment_name=None, num_episodes=1, verbose=True, reset=True, max_steps=10000000000.0, max_runs=None, return_trajectory=False, resume_stats=None, log_interval=1, delete_old_experiments=False)
Implements the main training loop; see (Her21, Subsection 4.4.4). It simulates the interaction between the agent agent and the environment env. The function has a lot of special functionality, so it is useful to consider the common cases first (a complete end-to-end sketch is given after the parameter list below). An example:
>>> stats, _ = train(env, agent, num_episodes=2)
Simulates the interaction for two episodes (i.e. the environment terminates and is reset twice). stats will be a list of length two containing information from each episode.
>>> stats, trajectories = train(env, agent, num_episodes=2, return_trajectory=True)
trajectories will be a list of length two containing the two recorded trajectories.
>>> stats, _ = train(env, agent, experiment_name='experiments/my_run', num_episodes=2)
Saves stats (and trajectories) to a file which can easily be loaded/plotted (see the course software for examples of this). The file is time-stamped, so several calls let you repeat the same experiment (run) many times.
>>> stats, _ = train(env, agent, experiment_name='experiments/my_run', num_episodes=2, max_runs=10)
As above, but do not perform more than 10 runs. Useful for repeated experiments.
- Parameters
env – Environment (Gym instance)
agent – Agent instance
experiment_name – Save outcome to file for easy plotting (Optional)
num_episodes – Number of episodes to simulate
verbose – Display progress bar
reset – Call env.reset() before the simulation starts.
max_steps – Terminate if this many steps have elapsed (for non-terminating environments)
max_runs – Maximum number of repeated experiments (requires experiment_name)
return_trajectory – Return the trajectories list (off by default since it might consume a lot of memory)
resume_stats – Resume stat collection from last run (requires experiment_name)
log_interval – Log stats less frequently
delete_old_experiments – Delete old experiments with the same name
- Returns
stats, trajectories (both as lists)
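Putting the pieces together, the following is a minimal end-to-end sketch. The gymnasium package and the CartPole-v1 environment name are illustrative assumptions (any Gym-style environment should work), and the base Agent is assumed to act with a default policy:
import gymnasium as gym                      # assumption: any Gym-style environment works here
from irlc.ex01.agent import Agent, train

env = gym.make("CartPole-v1")                # illustrative environment choice
agent = Agent(env)                           # base Agent, assumed to supply a default policy
# Run two episodes and also collect the trajectories (off by default to save memory).
stats, trajectories = train(env, agent, num_episodes=2, return_trajectory=True)
print(len(stats), len(trajectories))         # one entry per episode in each list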
Next, we will look at an entire class; the example below also includes a bit of math.
- class irlc.ex01.agent.Agent(env)
Main agent class. See (Her21, Subsection 4.4.3) for additional details.
- Example
>>> print("Hello World")
Hello World
\begin{eqnarray} y & = & ax^2 + bx + c \\ f(x) & = & x^2 + 2xy + y^2 \end{eqnarray}
- extra_stats()
Optional: can be used to record extra information from the Agent while training. You can safely ignore this method; it will only be used for control theory to create nicer plots.
- hello(greeting)
The canonical hello world example.
A longer description with some RST. A small usage sketch is given after the parameter list below.
- Parameters
greeting (str) – The person to say hello to.
- Returns
The greeting
- Return type
str
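For illustration only, a call might look as follows; the exact string returned is whatever greeting the implementation constructs:
>>> agent = Agent(env)
>>> agent.hello("World")  # returns the greeting as a str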
- pi(s, k=None)
Evaluates the Agent’s policy at time step k in state s.
The details will differ depending on whether the agent interacts in a discrete-time or continuous-time setting (a subclassing sketch is given after the parameter list below).
For discrete applications (dynamic programming/search and reinforcement learning), k is discrete, k = 0, 1, 2, …
For control applications, k is continuous and denotes the simulation time t, i.e. it should be called as
>>> agent.pi(x, t)
- Parameters
s – Current state
k – Current time index.
- Returns
action
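As a sketch of how pi() is typically specialized, the hypothetical subclass below always returns a fixed action; the class name and constructor argument are illustrative and not part of the course API:
from irlc.ex01.agent import Agent

class ConstantActionAgent(Agent):
    """Hypothetical example: a policy that ignores the state and time index."""
    def __init__(self, env, action):
        super().__init__(env)
        self.action = action

    def pi(self, s, k=None):
        # Discrete setting: k = 0, 1, 2, ...; control setting: k is the time t.
        return self.action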
- train(s, a, r, sp, done=False)
Called at each step of the simulation after a = pi(s, k) has been computed and the environment has transitioned to sp.
Allows the agent to learn from experience (a subclassing sketch is given after the parameter list below).
- Parameters
s – Current state \(x_k\)
a – Action taken \(a_k\)
r – Reward obtained by taking action \(a_k\) in state \(x_k\)
sp – The state \(x_{k+1}\) that the environment transitioned to
done – Whether the environment terminated when transitioning to sp
- Returns
None
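To illustrate how this hook can be used, the hypothetical subclass below accumulates the reward observed during each episode; the class name and the bookkeeping are illustrative only:
from irlc.ex01.agent import Agent

class RewardTrackingAgent(Agent):
    """Hypothetical example: track the total reward obtained in each episode."""
    def __init__(self, env):
        super().__init__(env)
        self.episode_reward = 0.0

    def train(self, s, a, r, sp, done=False):
        # Called once per step with the observed transition (s, a, r, sp).
        self.episode_reward += r
        if done:  # the episode ended when transitioning to sp
            print(f"Episode reward: {self.episode_reward}")
            self.episode_reward = 0.0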