irlc.ex01.agent.train

irlc.ex01.agent.train(env, agent=None, experiment_name=None, num_episodes=1, verbose=True, reset=True, max_steps=10000000000.0, max_runs=None, return_trajectory=False, resume_stats=None, log_interval=1, delete_old_experiments=False)

Implements the main training loop; see (Her21, Subsection 4.4.4). Simulates the interaction between the agent agent and the environment env. The function has many special-purpose options, so it is easiest to understand through the common cases. An example:

>>> stats, _ = train(env, agent, num_episodes=2)

Simulate interaction for two episodes (i.e. the environment terminates twice and is reset each time). stats will be a list of length two containing information from each episode.

>>> stats, trajectories = train(env, agent, num_episodes=2, return_trajectory=True)

trajectories will be a list of length two, with one entry per simulated episode.

>>> stats, _ = train(env, agent, experiment_name='experiments/my_run', num_episodes=2)

Save stats and trajectories to a file that can easily be loaded and plotted (see the course software for examples). The file is time-stamped, so calling train several times with the same experiment_name repeats the same experiment (run) many times.

>>> stats, _ = train(env, agent, experiment_name='experiments/my_run', num_episodes=2, max_runs=10)

As above, but do not perform more than 10 runs. Useful for repeated experiments.
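The core of such a training loop can be sketched in a few lines. The following is a minimal, self-contained illustration and not the actual irlc implementation: CountdownEnv, ConstantAgent and train_sketch are hypothetical names, the environment is assumed to return the classic Gym 4-tuple from step, the agent is assumed to expose pi(s, k) and train(s, a, r, sp, done), and max_steps is assumed to cap the total number of steps across all episodes. The keys of the stats dictionaries are also made up for this sketch.

```python
class CountdownEnv:
    """Hypothetical toy environment: every episode lasts exactly `horizon` steps."""
    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return self.t, 1.0, done, {}  # state, reward, done, info


class ConstantAgent:
    """Hypothetical agent: always takes action 0 and counts its learning updates."""
    def __init__(self):
        self.updates = 0

    def pi(self, s, k):
        return 0  # policy: action to take in state s at step k of the episode

    def train(self, s, a, r, sp, done):
        self.updates += 1  # a real agent would update its parameters here


def train_sketch(env, agent, num_episodes=1, max_steps=1e10, return_trajectory=False):
    """Simplified training loop (assumption: max_steps caps the total step count)."""
    stats, trajectories = [], []
    steps = 0
    for episode in range(num_episodes):
        s = env.reset()
        total_reward, length, transitions = 0.0, 0, []
        done = False
        while not done and steps < max_steps:
            a = agent.pi(s, length)          # agent selects an action
            sp, r, done, _ = env.step(a)     # environment responds
            agent.train(s, a, r, sp, done)   # agent learns from the transition
            if return_trajectory:
                transitions.append((s, a, r, sp))
            total_reward += r
            length += 1
            steps += 1
            s = sp
        stats.append({"episode": episode, "reward": total_reward, "length": length})
        if return_trajectory:
            trajectories.append(transitions)
        if steps >= max_steps:
            break
    return stats, trajectories


stats, trajectories = train_sketch(CountdownEnv(), ConstantAgent(),
                                   num_episodes=2, return_trajectory=True)
print(len(stats), len(trajectories))
```

Storing every transition is what makes return_trajectory potentially memory-hungry, which is why the sketch, like train, only collects trajectories on request.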

Parameters
  • env – Environment (Gym instance)

  • agent – Agent instance

  • experiment_name – Save outcome to file for easy plotting (Optional)

  • num_episodes – Number of episodes to simulate

  • verbose – Display progress bar

  • reset – Call env.reset() before simulation start.

  • max_steps – Terminate if this many steps have elapsed (for non-terminating environments)

  • max_runs – Maximum number of repeated experiments (requires experiment_name)

  • return_trajectory – Return trajectories list (Off by default since it might consume lots of memory)

  • resume_stats – Resume stat collection from last run (requires experiment_name)

  • log_interval – Interval between logged stats; values above 1 log stats less frequently

  • delete_old_experiments – Delete old experiments with the same name

Returns

stats, trajectories (both as lists)