====== Engineering Assignment 1 ======
\\
**Aim:** This assignment is meant to give you a better insight into deep reinforcement learning.\\
**Summary:** You will evaluate a deep reinforcement (actor-critic) learner on the cart-pole task, evaluate it again on modified versions of the task, play the task yourself, and report on the results.\\
The files for this assignment are attached to this page as zip files (see below). Additionally, you will need to install PyTorch (see Setup).
\\

| The cart-pole task. |
=== Setup ===
Install python3 on your computer (https://www.python.org/).\\
Download the attached zip file, extract it to some location, and cd into the folder.\\
Install the included requirements.txt file:\\
  $ pip install -r requirements.txt
Run the code:\\
  $ python main.py

For the first task (Deep Reinforcement Learning) you will need to install PyTorch. Since the installation differs depending on which OS you use and on whether you have a GPU that supports CUDA (Nvidia GPUs only), you should follow the official PyTorch installation instructions [[https://pytorch.org/get-started/locally/|here]].
Zip Files:\\
(The zip archives with the assignment code are attached to this wiki page.)

The requirements.txt file was tested using python 3.9. If you use a different python version or have problems with the installation, please contact us early enough before the deadline so that we can help sort it out.

====Assignment 1.1: Deep Reinforcement Learning====

===Your task:===

  - **Plain Vanilla.** Evaluate the actor-critic's performance on the cart-pole task given to you as python code:
    - Run the learner repeatedly and collect the data. Stop each run when either 1000 epochs are reached or the agent manages more than 200 iterations on average per epoch over at least 100 consecutive epochs (this is usually the case at around 400-500 epochs). A sketch of this stopping rule is given below the list.
    - Plot its improvement in performance over time.
  - **Modified Version.** Evaluate the learner's performance on a modified version of the cart-pole task. For this you should evaluate at least two of the following modifications of the environment and compare them to the results from 1.:
    - Noise on observation and action.
    - Hide each variable once (x, v, theta, omega) and run the setup with only three observables.
    - Introduce extremely high noise on one observable, in turn for each of the four observables (three normal, one noisy variable).
    - Change the task after a certain number of epochs. Think of at least three different changes; one is given as an example in the code.
    - Change the discreteness of time/observables, increasing or decreasing the variable resolution.
  - Calculate the average score, median score, maximum score, and standard deviation of each task.
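
The stopping rule in the first task can be sketched as follows. This is only an outline, not part of the provided code: run_epoch is a placeholder for whatever function in main.py / ac_learner.py runs one epoch and returns the number of iterations the pole stayed up.

<code python>
from collections import deque

def train_until_solved(run_epoch, max_epochs=1000, window=100, target=200.0):
    """Collect iterations per epoch; stop at max_epochs or once the recent average is high enough.

    run_epoch: placeholder callable that plays one epoch and returns the
    number of iterations the pole was kept balanced.
    """
    history = []                   # iterations per epoch, kept for plotting later
    recent = deque(maxlen=window)  # sliding window over the last `window` epochs
    for epoch in range(max_epochs):
        iterations = run_epoch()
        history.append(iterations)
        recent.append(iterations)
        # Stop early once the average over the last 100 epochs exceeds 200 iterations.
        if len(recent) == window and sum(recent) / window > target:
            break
    return history
</code>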
\\

=== Further information ===
In the file “ac_learner.py” you can find the source code of an actor-critic learner.

In the file “cart_pole_env.py” you can find the source code for the task environment. In order to evaluate the learner under the different environment settings, the following methods are of importance:
+ | |||
+ | apply_observation_noise: | ||
+ | Apply noise only to the observations received by the agent, NOT the internal environment variables | ||
+ | |||
+ | apply_action_noise: | ||
+ | Apply noise to the force applied to the cart-pole | ||
+ | |||
+ | apply_environment_noise: | ||
+ | Apply noise to the dynamics of the environment, | ||
+ | |||
+ | adjust_task: | ||
+ | Change the task in a chosen manner | ||
+ | |||
+ | apply_discretization: | ||
+ | discretize the data passed to the agent | ||
+ | |||
+ | |||
+ | In each of these methods you can implement a different method to adjust the task or the information passed to or from the agent. In “env.py” a helper class for noise is included, which you can use. | ||
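
As an illustration, two of the methods above could be filled in roughly as follows. This is only a sketch: the actual method signatures in “cart_pole_env.py” may differ, the noise level and resolution values are arbitrary, and numpy is assumed to be available.

<code python>
import numpy as np

# Sketch only -- intended as methods of the environment class in cart_pole_env.py;
# adapt names and signatures to the provided code.

def apply_observation_noise(self, observation, std=0.05):
    """Add Gaussian noise to the observation (x, v, theta, omega) passed to the agent.

    Only the returned copy is perturbed; the internal environment variables stay untouched.
    """
    observation = np.asarray(observation, dtype=float)
    return observation + np.random.normal(0.0, std, size=observation.shape)

def apply_discretization(self, observation, resolution=0.1):
    """Round each observable to a fixed grid, lowering the variable resolution."""
    observation = np.asarray(observation, dtype=float)
    return np.round(observation / resolution) * resolution
</code>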
+ | |||
+ | After the agent runs for the defined max number of epochs a plot is created of iterations per epoch. Use this to evaluate the learning performance of the learner. Extend the plots to include whatever information you deem to be important. | ||
+ | |||
+ | **HINT** You are allowed to change any parts of the learner or the environment, | ||
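
Similarly, the task change mentioned in the second task could follow this pattern. The attribute names pole_length and force_mag are hypothetical; use whatever variables cart_pole_env.py actually stores its physics parameters in.

<code python>
# Sketch only: switch the task once a chosen epoch is reached.
# pole_length and force_mag are hypothetical attribute names.

def adjust_task(self, epoch, switch_epoch=300):
    """Make the task harder partway through training."""
    if epoch == switch_epoch:
        self.pole_length *= 2.0   # a longer pole changes the dynamics
        self.force_mag *= 0.5     # weaker actions make balancing harder
</code>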
+ | |||
+ | |||
\\ | \\ | ||
+ | \\ | ||
+ | \\ | ||
+ | |||
+ | ====Assignment 1.2: Human Learning==== | ||
    - average score and standard deviation,
    - median score (a small helper for computing these statistics is sketched after this list).
  - Reset the settings back to the ones from the beginning and replay the game (as described in the third instruction).
  - Compare your results from the first tries from number 3 to the others. What can you conclude about the possibilities of cumulative, life-long learning?

\\
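If you log your scores in a Python list, a small standard-library sketch like the one below computes the statistics asked for above (the example scores are made up):

<code python>
import statistics

def summarize_scores(scores):
    """Summary statistics for one play condition: mean, std dev, median, max."""
    return {
        "average": statistics.mean(scores),
        "std_dev": statistics.stdev(scores) if len(scores) > 1 else 0.0,
        "median": statistics.median(scores),
        "maximum": max(scores),
    }

# Hypothetical scores from ten runs under one condition.
print(summarize_scores([34, 51, 28, 60, 45, 39, 72, 55, 48, 41]))
</code>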
====Further information====
  - Change dt, make it faster or slower.
  - Try out the synchronous environment - this is actually how reinforcement learners would play the game.
  - Try the other functions which you maybe did not implement.
  - Try even more (things the RL learner could not do), for example change the observation state during a run (e.g. after 100 iterations).
  - What else can you think of?
  - You can adjust the plot_iterations function in the env.py file to plot additional information like mean, std_dev, etc. (a sketch is given below).
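
For example, a running mean and a standard-deviation band could be added to the iterations-per-epoch plot roughly as follows. This assumes matplotlib and numpy are available and that the iterations are passed in as a list; the actual plot_iterations signature in env.py may differ.

<code python>
import numpy as np
import matplotlib.pyplot as plt

def plot_iterations(iterations_per_epoch, window=100):
    """Plot iterations per epoch plus a running mean and +/- one std-dev band."""
    data = np.asarray(iterations_per_epoch, dtype=float)
    epochs = np.arange(len(data))

    # Running mean/std over a sliding window (simple loop for clarity).
    means = np.array([data[max(0, i - window + 1):i + 1].mean() for i in epochs])
    stds = np.array([data[max(0, i - window + 1):i + 1].std() for i in epochs])

    plt.plot(epochs, data, alpha=0.4, label="iterations per epoch")
    plt.plot(epochs, means, label=f"running mean ({window} epochs)")
    plt.fill_between(epochs, means - stds, means + stds, alpha=0.2, label="+/- 1 std dev")
    plt.xlabel("epoch")
    plt.ylabel("iterations")
    plt.legend()
    plt.show()
</code>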
\\
\\


====Assignment 1.3: Report====

  - **Report.** Write a 4-5 page report where you describe your results. Draw some insights in relation to learning in general, try to make some generalizations based on them, and discuss, e.g.:
    - Regarding RL:
      - When does the actor-critic learner fail?
      - Which changes are //hard// for the learner to handle, and why?
      - What is your opinion of the //plain vanilla// actor-critic approach?
    - Regarding human learning:
      - Discuss the advantages and disadvantages of human learning (and human nature).
      - This might include (but is not restricted to):
        - Previously acquired knowledge used in this game.
        - Cumulative learning.
        - Transfer learning.
        - Boredom.
      - Is RL in any way similar to how humans learn? If 'yes', in what way? If 'no', why not?
      - Compare the RL to human learning.


\\
**Set up the conditions:**\\
For most of the tasks given to you, all you need is to change parts of the main.py or cart_pole_env.py files.