public:t-720-atai:atai-20:engineering_assignment_2 (revised 2020/09/07 10:11 by leonard; current revision 2024/04/29 13:33)
The same installation as for assignment 1 can be used.
  
====The Game====
**Condition 1: SYMBOLIC**\\
You are presented with an alphanumeric display of a continuously updated state of the observable variables relevant to the task (e.g. x, v, theta, and omega). With the arrow keys you can apply, as the reinforcement learner did previously, a force of -10 N (left) or +10 N (right) to the cart. Your task is to keep the pole upright for as long as possible. In the top-right you can see your current score (the total reward achieved in this epoch: +1 for each successful iteration); in the center are the values of the observables. You can set the environment to run synchronously or asynchronously: if sync is not set, the environment updates automatically every 100 ms. Further, you can invert the forces by pressing the "i" key on your keyboard.\\
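As a reminder of what the four observables mean, here is a minimal sketch of one classic cart-pole update step under the ±10 N force. The physical constants and the Euler integration step below are textbook assumptions, not values taken from the provided cart_pole_env.py, which may differ:

```python
import math

# Illustrative only: standard cart-pole parameters, assumed, not read
# from the assignment's environment.
GRAVITY = 9.8           # m/s^2
CART_MASS = 1.0         # kg
POLE_MASS = 0.1         # kg
POLE_HALF_LENGTH = 0.5  # m
DT = 0.1                # s; the async environment updates every 100 ms

def step(x, v, theta, omega, force):
    """Advance the observables (x, v, theta, omega) one step under a
    horizontal force of -10 N (left arrow) or +10 N (right arrow)."""
    total_mass = CART_MASS + POLE_MASS
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    temp = (force + POLE_MASS * POLE_HALF_LENGTH * omega**2 * sin_t) / total_mass
    alpha = (GRAVITY * sin_t - cos_t * temp) / (
        POLE_HALF_LENGTH * (4.0 / 3.0 - POLE_MASS * cos_t**2 / total_mass))
    a = temp - POLE_MASS * POLE_HALF_LENGTH * alpha * cos_t / total_mass
    # Simple Euler integration, as in the classic formulation.
    return x + DT * v, v + DT * a, theta + DT * omega, omega + DT * alpha

# Pushing right (+10 N) from the upright rest state accelerates the cart
# to the right and tips the pole the other way:
x, v, theta, omega = step(0.0, 0.0, 0.0, 0.0, +10.0)
```

This is only a sketch of the dynamics behind the display; for the assignment itself, the environment from Assignment 1 is reused unchanged.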
The code for the cart-pole task-environment used here is exactly the same as in Assignment 1 and can, and should, be reused completely.
  
====Your task:====
**Read the full instructions before beginning.**\\
  - Your task is to get good at performing the task in all four Conditions. Record the data for your training in each condition.
  - Apply the following settings to the environment (if you downloaded and use the provided environment, this should already be the case):
    - All variables (x, v, theta, omega) are observables.
    - Apply noise to all of these variables at the same time, with a mean of 0 and a standard deviation of:
      - x: 0.2 m
      - v: 0.2 m/s
      - omega: 0.2 rad/s
    - Set the environment to run asynchronously.
  - Play the game in Conditions 1, 2, 3, and 4, in that order, until you are confident playing in each condition, but for at least 10 epochs each, and note for each condition your
    - highest score,
    - average score and standard deviation,
    - median score.
  - Invert the forces by pressing the "i" key on your keyboard during a run (after 5-10 restarts/fails) and continue for another 5-10 episodes. Do this in all four conditions in the same order as previously (redo instruction 3 with this inversion). What can you say about your learning speed with force inversion?
  - Apply the following settings to the environment (all of them at the same time):
    - Only the variables x, v, and omega are observables.
    - Do not apply noise.
    - Set the environment to run asynchronously.
  - Replay the game as stated under instruction 3.
  - Change the environment in any way you like to make it harder (or easier) to play; note your changes and your scores accordingly.
  - Reset the settings back to the ones from the beginning and replay the game (as described in instruction 3).
  - Compare your results from the first tries of instruction 3 with the later ones, especially the last tries from instruction 7. What can you conclude about the possibilities of cumulative, life-long learning?
  - **Report.** Write a report on your results, including the different scores and comparisons between the different tries. Compare your results with the results from the last assignment and discuss them. Discuss the advantages and disadvantages of human learning (and human nature); this might include (but is not restricted to):
    - Previously acquired knowledge used in this game.
    - Cumulative learning.
    - Transfer learning.
    - Boredom.
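The noise setting in instruction 2 amounts to adding zero-mean Gaussian noise to each observable before it reaches the display. A minimal sketch, assuming the state is a dict of observable values (the function name `add_observation_noise` is hypothetical, not from the provided environment); theta's standard deviation is not shown in this page excerpt, so only the listed values appear:

```python
import random

# Standard deviations from instruction 2 (mean is 0 for all).
# theta's value is elided in this page excerpt, so it is omitted here.
NOISE_STD = {"x": 0.2, "v": 0.2, "omega": 0.2}

def add_observation_noise(state, rng=random):
    """Return a copy of the state with zero-mean Gaussian noise added to
    every observable that has a configured standard deviation."""
    return {name: value + rng.gauss(0.0, NOISE_STD.get(name, 0.0))
            for name, value in state.items()}

noisy = add_observation_noise({"x": 0.0, "v": 0.0, "theta": 0.1, "omega": 0.0})
```

Observables without a configured deviation (here theta) pass through unchanged.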
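For the per-condition bookkeeping in instruction 3, Python's built-in statistics module gives all three quantities directly. A sketch with made-up example scores (the numbers are purely illustrative):

```python
import statistics

# Hypothetical epoch scores for one condition (+1 per successful iteration).
scores = [12, 45, 30, 27, 51, 33, 29, 40, 38, 44]

highest = max(scores)
average = statistics.mean(scores)
std_dev = statistics.stdev(scores)  # sample standard deviation
median = statistics.median(scores)

print(f"highest={highest}  avg={average:.1f} +/- {std_dev:.1f}  median={median}")
```

Recording one such line per condition (and again after each force inversion) makes the comparisons in instructions 4 and 8 straightforward.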
  
====Further information====
Try to not only do what you are asked, but rather investigate learning further by yourself (if you have the time to do so, of course). Some of the things you could do:
  - Change dt; make it faster or slower.
  - Try out the synchronous environment; this is actually how the RL plays the game.
  - Try the things you changed in assignment 1 for the RL now on yourself, for example changing the task in the middle of a run.
  - Try even more (things the RL could not do, for example changing the observation state during a run, e.g. after 100 iterations).
  - What else can you think of?
Try to make a point on the advantages and disadvantages of cumulative/human learning.
\\
\\
**Set up the conditions:**\\
For most of the tasks given to you, all you need is to change parts of main.py or cart_pole_env.py. I hope it is self-explanatory; if any questions arise, please ask!
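Since the exact variable names in main.py and cart_pole_env.py are not reproduced on this page, here is a hypothetical sketch of how the settings from instructions 2 and 5 could be bundled, so each condition can be switched in one place (all names are illustrative, not the environment's real ones):

```python
from dataclasses import dataclass, field

@dataclass
class EnvSettings:
    """Hypothetical bundle of the settings this assignment toggles;
    the real names in cart_pole_env.py may differ."""
    observables: tuple = ("x", "v", "theta", "omega")
    noise_std: dict = field(default_factory=dict)  # mean is always 0
    synchronous: bool = False                      # async: auto-update every 100 ms

# Instruction 2: everything observable, noise on, asynchronous
# (theta's deviation is elided in this page excerpt, so it is omitted).
instruction_2 = EnvSettings(noise_std={"x": 0.2, "v": 0.2, "omega": 0.2})

# Instruction 5: theta hidden, no noise, asynchronous.
instruction_5 = EnvSettings(observables=("x", "v", "omega"))
```

Keeping each instruction's configuration as one value like this makes it easy to note, alongside the scores, exactly which settings produced them.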
Last modified: 2024/04/29 13:32 (external edit)
