[[public:t-720-atai:atai-20:main|ATAI-20 Main]] \\

====ATAI-20 Reykjavik University====

\\ \\

====Engineering Assignment 1:====
=====Deep-Reinforcement Learner=====

\\

**Aim:** This assignment is meant to give you better insight into the complexity dimensions and properties of task-environments.

**Summary:** In this first exercise you are asked to evaluate a given deep-reinforcement learner (an actor-critic learner, to be specific; for further information see [[https://papers.nips.cc/paper/1786-actor-critic-algorithms.pdf|Konda & Tsitsiklis 2000]]) in different task-environments, coded in Python. The task-environment (or just task, really) is the well-known cart-pole task: balancing a pole on a moving platform, in 1-D (left and right movements).

===Your task:===

  - **Plain Vanilla.** Evaluate the actor-critic's performance on the cart-pole task given to you as Python code:
    - Run the learner repeatedly and collect the data. Stop each run when either 1000 epochs are reached or the agent averages more than 200 iterations per epoch over at least 100 consecutive epochs (this is usually the case at around 400-500 epochs).
    - Plot its improvement in performance over time.
  - **Modified Version.** Evaluate the learner's performance on a modified version of the cart-pole task. For this you should evaluate at least 3 of the following modifications of the environment and compare the results to those from task 1:
    - Noise on the observations, the actions, and/or the environment dynamics.
    - Hide each variable once (x, v, theta, omega) and run the setup with only three observables.
    - Introduce extremely high noise on one observable at a time, for each of the four observables (three normal, one noisy variable).
    - Change the task after a certain number of epochs. Think of at least three different changes; one is given as an example in the code.
    - Change the discretization of time and/or the observables, increasing or decreasing the variable resolution.
  - **New Task-Environment.** Design your own simple task-environment in which you can demonstrate your own ideas about the complexity of task-environments, including aspects that may not be present in the cart-pole task.
  - **Report.** Write a 1-2 page report in which you describe your results. Draw some insights in relation to learning in general, try to make some generalizations based on them, and discuss, e.g.:
    - When does the actor-critic learner fail?
    - Which changes will be //impossible// for the actor-critic to adjust to? (Try to work this out yourself from what you know of neural networks; hint: the input and output layers of ANNs.)
    - What is your opinion of the //generality// and //adaptability// of the actor-critic learner with respect to **//novelty//** (novel task-environments)?
    - Is this in any way similar to how humans learn? If 'yes', how? If 'no', what's different, and why?
    - ...more

| {{/public:t-720-atai:cart-pole-task.jpg?500}} |
| The cart-pole task. |

\\ \\

=== Setup ===

Install Python 3 on your computer (https://www.python.org/downloads/).\\
Download the attached zip file, extract it to some location (e.g. .../assignment_1/) and cd into the folder.\\
Install the dependencies listed in the included requirements.txt file:\\
  $ pip install -r requirements.txt
Run the code:\\
  $ python main.py

Zip files:\\
{{:public:t-720-atai:atai-20:exercise_1.zip|Old zip file}}\\
{{:public:t-720-atai:atai-20:exercise_1_updated.zip|Updated zip file}}

\\ \\

=== Further information ===

In the file "env.py" you can find the parent class of the cart-pole environment; you should inherit from this class when you write your own environment in task 3 and include all abstract methods.
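As a rough starting point for task 3, a subclass might look something like the sketch below. Note that the parent-class name ("Environment") and the reset/step signatures used here are assumptions made for illustration only; the actual abstract methods you must implement are defined in "env.py", so adapt the names and signatures to that file.

<code python>
# Sketch of a custom task-environment for task 3 (illustration only).
# Assumption: the parent class in env.py is called "Environment" and expects
# reset()/step() methods roughly like the ones below -- check env.py for the
# actual class name and abstract method signatures and adapt accordingly.
import numpy as np
from env import Environment  # assumed class name


class BouncingBallEnv(Environment):
    """Toy 1-D task: keep a ball hovering inside a target height band."""

    def reset(self):
        # Assumed abstract method: return the initial observation.
        self.height = 1.0
        self.velocity = 0.0
        return np.array([self.height, self.velocity])

    def step(self, action):
        # Assumed abstract method: advance the dynamics by one time step.
        dt = 0.02
        thrust = 20.0 if action == 1 else 0.0   # binary action, like cart-pole
        self.velocity += (thrust - 9.81) * dt
        self.height += self.velocity * dt
        if self.height <= 0.0:                  # bounce off the floor
            self.height = 0.0
            self.velocity = -0.8 * self.velocity
        reward = 1.0 if 0.5 <= self.height <= 1.5 else 0.0
        done = self.height > 3.0                # episode ends if the ball escapes
        return np.array([self.height, self.velocity]), reward, done

    # Methods named in the assignment text; left as pass-throughs here.
    def apply_observation_noise(self, observation):
        return observation

    def apply_action_noise(self, action):
        return action

    def apply_environment_noise(self):
        pass

    def adjust_task(self):
        pass

    def apply_discretization(self, observation):
        return observation
</code>

The exact dynamics, reward, and termination logic are up to you; the point of task 3 is to exercise dimensions of complexity (e.g. noise, hidden variables, non-stationarity) that the plain cart-pole task does not.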
In the file "ac_learner.py" you can find the source code of the actor-critic learner. In the file "cart_pole_env.py" you can find the source code of the cart-pole task-environment. In order to evaluate the learner under the different environment settings, the following methods are of importance:

  * apply_observation_noise: apply noise only to the observations received by the agent, NOT to the internal environment variables.
  * apply_action_noise: apply noise to the force applied to the cart-pole.
  * apply_environment_noise: apply noise to the dynamics of the environment, but not to the observations of the agent.
  * adjust_task: change the task in a chosen manner.
  * apply_discretization: discretize the data passed to the agent.

In each of these methods you can implement a different way of adjusting the task, or of modifying the information passed to or from the agent. A helper class for noise is included in "env.py", which you can use. (A hedged sketch of one possible implementation of apply_observation_noise is given at the bottom of this page.) After the agent has run for the defined maximum number of epochs, a plot of iterations per epoch is created; use it to evaluate the learning performance of the learner.

**HINT:** You may change any part of the learner or the environment; just make sure to document all changes and explain how and why they influence the learning performance of the agent.

\\ \\ \\

=== Some more reading ===

Among many others:

  * Thórisson, K.R., Bieger, J., Schiffel, S., Garrett, D.: [[http://alumni.media.mit.edu/~kris/ftp/AGIEvaluationFlexibleFramework-ThorissonEtAl2015.pdf|Towards Flexible Task-Environments for Comprehensive Evaluation of Artificial Intelligent Systems and Automatic Learners]]. In: International Conference on Artificial General Intelligence, pp. 187-196. Springer (2015)
  * Russell, S.J., Norvig, P.: Artificial Intelligence: A Modern Approach. Malaysia; Pearson Education Limited (2016)
  * Eberding, L.M., Sheikhlar, A., Thórisson, K.R.: SAGE: Task-Environment Platform for Autonomy and Generality Evaluation. In: International Conference on Artificial General Intelligence. Springer (2020)
  * Konda, V.R., Tsitsiklis, J.N.: [[https://papers.nips.cc/paper/1786-actor-critic-algorithms.pdf|Actor-Critic Algorithms]]. In: Advances in Neural Information Processing Systems, pp. 1008-1014 (2000)
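As referenced in the "Further information" section above, here is a minimal sketch of one possible modification for task 2: zero-mean Gaussian noise applied to the observations only. The method signature below is an assumption (check "cart_pole_env.py" / "env.py" for the actual one), and you may prefer to use the provided noise helper class instead of calling numpy directly.

<code python>
# Minimal sketch (not the provided implementation): add zero-mean Gaussian
# noise to the observation vector the agent sees, leaving the environment's
# internal state untouched. The signature is an assumption; match it to the
# declaration in env.py / cart_pole_env.py.
import numpy as np

def apply_observation_noise(self, observation):
    observation = np.asarray(observation, dtype=float)
    noise = np.random.normal(loc=0.0, scale=0.1, size=observation.shape)  # tune the scale
    return observation + noise
</code>

The same pattern can be reused for apply_action_noise and apply_environment_noise; only the place where the noise is injected differs (the applied force and the environment dynamics, respectively).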