About us

Some stuff here.

def myfun(a):
    # Return the argument unchanged (a was previously undefined).
    return a

More things about the code, now with autodoc. Display math can be written as $$a = b$$, inline math as $a = b$, or with the RST role :math:`a^2 = b`. See the class irlc.ex01.agent.Agent for further details.

class irlc.ex01.agent.Agent(env)[source]

Main agent class. See (Her21, Subsection 4.4.3) for additional details.

Example

>>> print("Hello World")
Hello World
\begin{eqnarray}
y    & = & ax^2 + bx + c \\
f(x) & = & x^2 + 2xy + y^2
\end{eqnarray}
extra_stats()[source]

Optional: can be used to record extra information from the Agent during training. You can safely ignore this method; it is only used in the control-theory exercises to create nicer plots.

hello(greeting)[source]

The canonical hello world example.

A longer description with some RST.

Parameters

greeting (str) – The person to say hello to.

Returns

The greeting

Return type

str

pi(s, k=None)[source]

Evaluate the Agent’s policy at time step k in state s.

The details will differ depending on whether the agent interacts in a discrete-time or continuous-time setting.

  • For discrete applications (dynamic programming/search and reinforcement learning), k is the discrete step index k = 0, 1, 2, …

  • For control applications, k is continuous and denotes the simulation time t, i.e. it should be called as

    agent.pi(x, t)

Parameters
  • s – Current state

  • k – Current time index.

Returns

The action to take in state s.
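As a sketch, a minimal discrete-time agent might override pi() like this. RandomAgent and its action list are hypothetical illustrations, not part of irlc:

```python
import random

# Hypothetical minimal agent (not part of irlc) showing how pi() can be
# overridden in a discrete-time setting.
class RandomAgent:
    """Picks an action uniformly at random from a fixed action list."""
    def __init__(self, actions):
        self.actions = list(actions)

    def pi(self, s, k=None):
        # In discrete applications k is the step index k = 0, 1, 2, ...
        return random.choice(self.actions)

agent = RandomAgent(actions=[0, 1])
a = agent.pi(s=0, k=0)  # a is either 0 or 1
```

A control-style agent would instead interpret the second argument as the simulation time t and compute a continuous control signal.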

train(s, a, r, sp, done=False)[source]

Called at each step of the simulation, after a = pi(s,k) and the environment has transitioned to sp.

Allows the agent to learn from experience.

Parameters
  • s – Current state x_k

  • a – Action taken

  • r – Reward obtained by taking action a_k in x_k

  • sp – The state that the environment transitioned to, x_{k+1}

  • done – Whether the environment terminated when transitioning to sp

Returns

None
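The call order described above — pi() first, then train() on the resulting transition — can be sketched as follows. CountingAgent and run_episode are hypothetical stand-ins for the real irlc classes:

```python
# Hypothetical agent that just counts how many transitions it trained on.
class CountingAgent:
    def __init__(self):
        self.steps = 0

    def pi(self, s, k=None):
        return 0  # always take action 0

    def train(self, s, a, r, sp, done=False):
        # Learn from the transition (s, a, r, sp); here we only count it.
        self.steps += 1

def run_episode(agent, transitions):
    # transitions: list of (s, r, sp, done) tuples standing in for an env.
    for k, (s, r, sp, done) in enumerate(transitions):
        a = agent.pi(s, k)               # 1) query the policy
        agent.train(s, a, r, sp, done)   # 2) let the agent learn
        if done:
            break

agent = CountingAgent()
run_episode(agent, [(0, 1.0, 1, False), (1, 1.0, 2, True)])
# agent.steps is now 2
```

The loop stops when done is True, matching the convention that done flags the terminal transition into sp.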