public:t-720-atai:atai-20:engineering_assignment_1

Differences

This shows you the differences between two versions of the page.

public:t-720-atai:atai-20:engineering_assignment_1 [2020/08/24 09:31] thorisson
public:t-720-atai:atai-20:engineering_assignment_1 [2024/04/29 13:33] (current) – external edit 127.0.0.1
Line 13: Line 13:
 **Aim:** This assignment is meant to give you a better insight into complexity dimensions and properties of task-environments. **Aim:** This assignment is meant to give you a better insight into complexity dimensions and properties of task-environments.
  
-**Summary:** In this first exercise you are asked to evaluate a given Deep-Reinforcement Learner (an actor-critic learner, to be specific; for further information see Konda & Tsitsiklis 2000) in different task-environments, coded in Python. The task-environment (or just task, really) is the well-known cart-pole task: Balancing a pole on a moving platform, in 1-D (left and right movements). +**Summary:** In this first exercise you are asked to evaluate a given Deep-Reinforcement Learner (an actor-critic learner, to be specific; for further information see [[https://papers.nips.cc/paper/1786-actor-critic-algorithms.pdf|Konda & Tsitsiklis 2000]]) in different task-environments, coded in Python. The task-environment (or just task, really) is the well-known cart-pole task: Balancing a pole on a moving platform, in 1-D (left and right movements). 
  
 ===Your task:=== ===Your task:===
  
   - **Plain Vanilla.** Evaluate the actor-critic’s performance on the cart-pole task given to you as python code:   - **Plain Vanilla.** Evaluate the actor-critic’s performance on the cart-pole task given to you as python code:
-    - Run the learner repeatedly; collect the data. Stop each run when either 1000 epochs are reached or the agent manages to get more than 200 iterations on average per epoch over at least 100 continuous epochs (This is usually the case at around 400-500 epochs)+    - Run the learner repeatedly; collect the data. Stop each run when either 1000 epochs are reached or the agent manages to get more than 200 iterations on average per epoch over at least 100 continuous epochs (this is usually the case at around 400-500 epochs).
     - Plot its improvement in performance over time.      - Plot its improvement in performance over time. 
-  - **Modified Version.** Evaluate the learner’s performance on a modified version of the cart-pole task. For this you should evaluate at least 3 of the following modifications of the environments:+  - **Modified Version.** Evaluate the learner’s performance on a modified version of the cart-pole task. For this you should evaluate at least 3 of the following modifications of the environments and compare them to the results from 1.:
     - Noise on observation/ action/ and environment dynamics.     - Noise on observation/ action/ and environment dynamics.
     - Hide each variable once (x, v, theta, omega) and run the setup with only three observables.     - Hide each variable once (x, v, theta, omega) and run the setup with only three observables.
Line 26: Line 26:
     - Change the task after a certain amount of epochs. Think of at least three different changes, one is given as an example in the code.     - Change the task after a certain amount of epochs. Think of at least three different changes, one is given as an example in the code.
     - Change the discreteness of time/ observables increasing or decreasing the variable resolution.     - Change the discreteness of time/ observables increasing or decreasing the variable resolution.
-    - And compare them to the results from 1. 
   - **New Task-Environment.** Design your own simple task-environment in which you can show your own ideas of complexity of task-environments which might not have been included in the cart-pole.   - **New Task-Environment.** Design your own simple task-environment in which you can show your own ideas of complexity of task-environments which might not have been included in the cart-pole.
-  - **Report.** Write a 1-2 page report where you describe your results. Draw some insights and try to make some generalizations based on them, and discuss, e.g.:+  - **Report.** Write a 1-2 page report where you describe your results. Draw some insights in relation to learning in general and try to make some generalizations based on them, and discuss, e.g.:
     - When does the actor-critic learner fail?     - When does the actor-critic learner fail?
-    - Which changes are impossible for the actor-critic to adjust to (Try this out yourself from what you know of neural networks hint: input and output layers of ANNs) +    - Which changes will be //impossible// for the actor-critic to adjust to (try this out yourself from what you know of neural networks; hint: input and output layers of ANNs). 
-    - What is your general opinion of the generality and adaptability to novelty of the actor-critic learner?+    - What is your opinion of the //generality// and //adaptability// of the actor-critic learner with respect to **//novelty//** (novel task-environments)?  
 +    - Is this in any way similar to how humans learn? If 'yes', how? If 'no', what's different, and why?
     - ...more     - ...more
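
The stopping rule in the Plain Vanilla step can be sketched as a small helper. This is only one possible reading of the rule (the mean over the most recent 100 consecutive epochs must exceed 200 iterations); the function name `should_stop` and its parameters are hypothetical and do not come from the provided code.

```python
def should_stop(episode_lengths, max_epochs=1000, window=100, target=200.0):
    """Decide whether a run should end.

    episode_lengths: iterations survived in each epoch so far (oldest first).
    Assumed interpretation: stop at max_epochs, or once the mean over the
    last `window` consecutive epochs is strictly greater than `target`.
    """
    if len(episode_lengths) >= max_epochs:
        return True  # hard cap on the number of epochs
    if len(episode_lengths) < window:
        return False  # not enough epochs yet to judge the average
    recent = episode_lengths[-window:]
    return sum(recent) / window > target
```

Calling this after every epoch with the list of recorded iteration counts also leaves you with exactly the data needed for the performance plot.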
  
-{{/public:t-720-atai:cart-pole-task.jpg?500}}+|  {{/public:t-720-atai:cart-pole-task.jpg?500}}  |  
 +|  The cart-pole task.  |
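
Several of the modifications listed under Modified Version act on the observation vector before it reaches the learner. The following is a minimal sketch under the assumption that an observation is a list ordered as (x, v, theta, omega); the helper names are hypothetical and are not taken from the provided zip file, which may expose these options differently.

```python
import random


def add_noise(obs, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma to each value."""
    return [value + rng.gauss(0.0, sigma) for value in obs]


def hide_variable(obs, index):
    """Drop one variable, leaving three observables.

    Assumed index order: 0 = x, 1 = v, 2 = theta, 3 = omega.
    """
    return [value for i, value in enumerate(obs) if i != index]


def discretize(obs, step):
    """Snap each value to the nearest multiple of step (coarser resolution)."""
    return [round(value / step) * step for value in obs]


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so that repeated runs are comparable
    obs = [0.1, -0.5, 0.02, 0.3]
    print(add_noise(obs, 0.05, rng))
    print(hide_variable(obs, 2))
    print(discretize(obs, 0.25))
```

Note that hiding a variable shortens the observation vector, so the size of the learner's input would have to change with it.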
  
 \\ \\
Line 48: Line 49:
   $ python main.py   $ python main.py
  
-Zip File: {{:public:t-720-atai:atai-20:exercise_1.zip|Zip file}}+Zip Files:\\ 
 +{{:public:t-720-atai:atai-20:exercise_1.zip|Old zip file}}\\ 
 +{{:public:t-720-atai:atai-20:exercise_1_updated.zip|Updated zip file}} 
 + 
 +\\ 
 +\\
  
 === Further information === === Further information ===
Line 87: Line 93:
 Besides many more: Besides many more:
  
-Thórisson, K.R., Bieger, J., Schiffel, S., Garrett, D.: Towards flexible task environments for comprehensive evaluation of artificial intelligent systems and automatic learners. In: International Conference on Artificial General Intelligence. pp. 187–196. Springer (2015)+Thórisson, K.R., Bieger, J., Schiffel, S., Garrett, D.: [[http://alumni.media.mit.edu/~kris/ftp/AGIEvaluationFlexibleFramework-ThorissonEtAl2015.pdf|Towards Flexible Task-Environments for Comprehensive Evaluation of Artificial Intelligent Systems and Automatic Learners]]. In: International Conference on Artificial General Intelligence. pp. 187–196. Springer (2015)
  
 Russell, S.J., Norvig, P.: Artificial intelligence: A modern approach. Malaysia; Pearson Education Limited, (2016) Russell, S.J., Norvig, P.: Artificial intelligence: A modern approach. Malaysia; Pearson Education Limited, (2016)
/var/www/cadia.ru.is/wiki/data/attic/public/t-720-atai/atai-20/engineering_assignment_1.1598261516.txt.gz · Last modified: 2024/04/29 13:32 (external edit)