====== Engineering Assignment 1 ======
Run the code:\\
$ python main.py
For the first task (Deep Reinforcement Learning) you will need to install PyTorch. Since the installation differs depending on your OS and on whether you have a GPU that supports CUDA (Nvidia GPUs only), you should follow the official PyTorch installation instructions.
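As a quick sanity check after installing (a minimal sketch, not part of the assignment code), the following prints the installed PyTorch version and whether CUDA is usable:
<code python>
import torch

print(torch.__version__)           # installed PyTorch version
print(torch.cuda.is_available())   # True only if a CUDA-capable GPU and matching drivers are present
</code>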
Zip Files:\\
{{:
{{:
{{:
- Plot its improvement in performance over time.
- **Modified Version.** Evaluate the learner’s performance on a modified version of the cart-pole task. For this you should evaluate at least two of the following modifications of the environment and compare them to the results from 1.:
  - Noise on observation and action (see the wrapper sketch after this list for one way to inject observation noise).
  - Hide each variable once (x, v, theta, omega) and run the setup with only three observables.
  - Introduce extremely high noise on one observable while the other three stay normal; repeat this for each of the four observables.
  - Change the task after a certain number of epochs. Think of at least three different changes; one is given as an example in the code.
  - Change the discretization of time or of the observables, increasing or decreasing the variable resolution.
  - Calculate the average score, median score, maximum score, and standard deviation for each task (see the statistics snippet under Further information).
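If your environment follows the OpenAI Gym interface (an assumption; adapt the idea to the provided environment code otherwise), one way to realize the observation-noise and hidden-variable modifications is with small wrappers. The names `sigma` and `hidden_index` are illustrative parameters, not part of the assignment code:
<code python>
import gym
import numpy as np

class NoisyObservationWrapper(gym.ObservationWrapper):
    """Adds zero-mean Gaussian noise to every observable (x, v, theta, omega)."""
    def __init__(self, env, sigma=0.05):
        super().__init__(env)
        self.sigma = sigma

    def observation(self, obs):
        return obs + np.random.normal(0.0, self.sigma, size=np.shape(obs))

class HiddenVariableWrapper(gym.ObservationWrapper):
    """Zeroes out one observable so the learner effectively sees only three variables."""
    def __init__(self, env, hidden_index=0):
        super().__init__(env)
        self.hidden_index = hidden_index

    def observation(self, obs):
        obs = np.array(obs, dtype=float)
        obs[self.hidden_index] = 0.0
        return obs

# Example usage (hypothetical environment id):
# env = NoisyObservationWrapper(gym.make("CartPole-v1"), sigma=0.1)
</code>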
\\
=== Further information ===
In the file “ac_learner.py” you can find the source code of an actor-critic learner.
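That file is the authoritative reference; purely for orientation, a generic one-step actor-critic update in PyTorch looks roughly like the sketch below (the network sizes and the `ac_loss` helper are illustrative and not taken from ac_learner.py):
<code python>
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """Shared body with a policy head (actor) and a state-value head (critic)."""
    def __init__(self, n_obs=4, n_actions=2, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(n_obs, hidden), nn.ReLU())
        self.policy = nn.Linear(hidden, n_actions)  # action logits
        self.value = nn.Linear(hidden, 1)           # state-value estimate

    def forward(self, obs):
        h = self.body(obs)
        return self.policy(h), self.value(h)

def ac_loss(logits, value, action, reward, next_value, gamma=0.99):
    """One-step actor-critic loss: advantage-weighted log-probability plus value regression."""
    advantage = reward + gamma * next_value.detach() - value
    log_prob = torch.log_softmax(logits, dim=-1)[action]
    actor_loss = -log_prob * advantage.detach()   # policy-gradient term
    critic_loss = advantage.pow(2)                # squared TD error
    return actor_loss + critic_loss
</code>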
After the agent has run for the defined maximum number of epochs, a plot of iterations per epoch is created. Use this to evaluate the learning performance of the learner. Extend the plots to include whatever information you deem important.
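For the score statistics asked for above and for richer plots, something along these lines works; `episode_lengths` is assumed to be the per-epoch iteration counts you collect from the learner (the variable and function names are illustrative):
<code python>
import numpy as np
import matplotlib.pyplot as plt

def summarize_and_plot(episode_lengths, window=20):
    """Print mean/median/max/std of the scores and plot them with a running mean."""
    scores = np.asarray(episode_lengths, dtype=float)
    print(f"mean: {scores.mean():.1f}  median: {np.median(scores):.1f}  "
          f"max: {scores.max():.0f}  std: {scores.std():.1f}")

    plt.plot(scores, label="iterations per epoch")
    if len(scores) >= window:
        running = np.convolve(scores, np.ones(window) / window, mode="valid")
        plt.plot(np.arange(window - 1, len(scores)), running,
                 label=f"running mean ({window} epochs)")
    plt.xlabel("epoch")
    plt.ylabel("iterations")
    plt.legend()
    plt.show()
</code>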
**HINT** You are allowed to change any parts of the learner or the environment,
- **From now on only play on the two conditions in which you performed worst**
- Invert the forces by pressing the “i” key on your keyboard during a run (after 5-10 restarts/fails) and continue for another 5-10 episodes. Do this in the two conditions in which you performed worst earlier, in the same order as previously (redo instruction 3 with this inversion). What can you say about your learning speed with force inversion?
- Apply the following settings to the environment (all of them at the same time):
  - Only the variables x, v, omega are observables.
  - Do not apply noise.
  - Set the environment to run asynchronously.
  - Replay the game as stated under instruction 3.
- Reset the settings back to the ones from the beginning and replay the game (as described in the third instruction) on the two conditions you performed worst in the first tries.
- Compare your results from the first tries (instruction 3) to the others. What can you conclude about the possibilities of cumulative, lifelong learning?
\\
- Is RL in any way similar to how humans learn? If '
- Compare RL to human learning.
- | |||
- | |||
- | |||
- | |||
/var/www/cadia.ru.is/wiki/data/attic/public/t-720-atai/atai-21/engineering_assignment_1.1631787161.txt.gz · Last modified: 2024/04/29 13:32 (external edit)