====== Engineering Assignment 1 ======
Run the code:\\
$ python main.py
For the first task (Deep Reinforcement Learning) you will need to install PyTorch. Since the installation differs depending on your OS and on whether you have a GPU that supports CUDA (Nvidia GPUs only), you should follow the official PyTorch installation instructions.
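As a quick sanity check after installing (a minimal sketch, not part of the assignment code), the following prints the installed PyTorch version and whether CUDA is usable:
<code python>
import torch

print(torch.__version__)           # installed PyTorch version
print(torch.cuda.is_available())   # True only if a CUDA-capable GPU and matching drivers are present
</code>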
Zip Files:\\
{{:
{{:
{{:
- Plot its improvement in performance over time.
- **Modified Version.** Evaluate the learner’s performance on a modified version of the cart-pole task. For this you should evaluate at least two of the following modifications of the environment and compare them to the results from 1.:
  - Noise on observation and action (see the wrapper sketch after this list for one way to inject observation noise).
  - Hide each variable once (x, v, theta, omega) and run the setup with only three observables.
  - Introduce extremely high noise on one observable while the other three stay normal; repeat this for each of the four observables.
  - Change the task after a certain number of epochs. Think of at least three different changes; one is given as an example in the code.
  - Change the discretization of time or of the observables, increasing or decreasing the variable resolution.
  - Calculate the average score, median score, maximum score, and standard deviation for each task (see the statistics snippet under Further information).
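If your environment follows the OpenAI Gym interface (an assumption; adapt the idea to the provided environment code otherwise), one way to realize the observation-noise and hidden-variable modifications is with small wrappers. The names `sigma` and `hidden_index` are illustrative parameters, not part of the assignment code:
<code python>
import gym
import numpy as np

class NoisyObservationWrapper(gym.ObservationWrapper):
    """Adds zero-mean Gaussian noise to every observable (x, v, theta, omega)."""
    def __init__(self, env, sigma=0.05):
        super().__init__(env)
        self.sigma = sigma

    def observation(self, obs):
        return obs + np.random.normal(0.0, self.sigma, size=np.shape(obs))

class HiddenVariableWrapper(gym.ObservationWrapper):
    """Zeroes out one observable so the learner effectively sees only three variables."""
    def __init__(self, env, hidden_index=0):
        super().__init__(env)
        self.hidden_index = hidden_index

    def observation(self, obs):
        obs = np.array(obs, dtype=float)
        obs[self.hidden_index] = 0.0
        return obs

# Example usage (hypothetical environment id):
# env = NoisyObservationWrapper(gym.make("CartPole-v1"), sigma=0.1)
</code>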
\\
=== Further information ===
In the file “ac_learner.py” you can find the source code of an actor-critic learner.
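That file is the authoritative reference; purely for orientation, a generic one-step actor-critic update in PyTorch looks roughly like the sketch below (the network sizes and the `ac_loss` helper are illustrative and not taken from ac_learner.py):
<code python>
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """Shared body with a policy head (actor) and a state-value head (critic)."""
    def __init__(self, n_obs=4, n_actions=2, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(n_obs, hidden), nn.ReLU())
        self.policy = nn.Linear(hidden, n_actions)  # action logits
        self.value = nn.Linear(hidden, 1)           # state-value estimate

    def forward(self, obs):
        h = self.body(obs)
        return self.policy(h), self.value(h)

def ac_loss(logits, value, action, reward, next_value, gamma=0.99):
    """One-step actor-critic loss: advantage-weighted log-probability plus value regression."""
    advantage = reward + gamma * next_value.detach() - value
    log_prob = torch.log_softmax(logits, dim=-1)[action]
    actor_loss = -log_prob * advantage.detach()   # policy-gradient term
    critic_loss = advantage.pow(2)                # squared TD error
    return actor_loss + critic_loss
</code>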
After the agent has run for the defined maximum number of epochs, a plot of iterations per epoch is created. Use this to evaluate the learning performance of the learner. Extend the plots to include whatever information you deem important.
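For the score statistics asked for above and for richer plots, something along these lines works; `episode_lengths` is assumed to be the per-epoch iteration counts you collect from the learner (the variable and function names are illustrative):
<code python>
import numpy as np
import matplotlib.pyplot as plt

def summarize_and_plot(episode_lengths, window=20):
    """Print mean/median/max/std of the scores and plot them with a running mean."""
    scores = np.asarray(episode_lengths, dtype=float)
    print(f"mean: {scores.mean():.1f}  median: {np.median(scores):.1f}  "
          f"max: {scores.max():.0f}  std: {scores.std():.1f}")

    plt.plot(scores, label="iterations per epoch")
    if len(scores) >= window:
        running = np.convolve(scores, np.ones(window) / window, mode="valid")
        plt.plot(np.arange(window - 1, len(scores)), running,
                 label=f"running mean ({window} epochs)")
    plt.xlabel("epoch")
    plt.ylabel("iterations")
    plt.legend()
    plt.show()
</code>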
**HINT** You are allowed to change any parts of the learner or the environment,
- **From now on only play on the two conditions in which you performed worst**
- Invert the forces by pressing the “i” key on your keyboard during a run (after 5-10 restarts/fails) and continue for another 5-10 episodes. Do this in the two conditions in which you performed worst earlier, in the same order as previously (redo instruction 3 with this inversion). What can you say about your learning speed with force inversion?
- Apply the following settings to the environment (all of them at the same time):
  - Only the variables x, v, omega are observables.
  - Do not apply noise.
  - Set the environment to run asynchronously.
  - Replay the game as stated under instruction 3.
- Reset the settings back to the ones from the beginning and replay the game (as described in the third instruction) on the two conditions you performed worst in the first tries.
- Compare your results from the first tries (instruction 3) to the others. What can you conclude about the possibilities of cumulative, lifelong learning?
\\
- Is RL in any way similar to how humans learn? If '
- Compare RL to human learning.
- | |||
- | |||
- | |||
- | |||
/var/www/cadia.ru.is/wiki/data/attic/public/t-720-atai/atai-21/engineering_assignment_1.1631787161.txt.gz · Last modified: 2024/04/29 13:32 (external edit)