Further, ONA itself needs to be adjusted to work with the cart-pole environment, so that it is restricted to using only “^left” and “^right” as actions (similar to the actor-critic, or to yourself, in assignments 1 and 2).\\
For this you will have to change a few lines in two C files of ONA:\\
\\
1. Open …/OpenNARS-for-Applications/src/Shell.c and comment out lines 75-82 in the Shell_NARInit() function, so that only the atomic terms “^left” and “^right” remain registered.\\
\\
2. Open .../OpenNARS-for-Applications/Config.h and change the value of “OPERATIONS_MAX” in line 86 to 2 (instead of 10):\\
//Maximum amount of operations which can be registered
#define OPERATIONS_MAX 2
\\
3. Rebuild ONA.\\
===Your Task:===
  - **Plain Vanilla.** Evaluate ONA’s performance on the cart-pole task given to you as Python code:
    * 1.a Run the learner repeatedly (at least 5 times); collect the data. Stop each run when 300 epochs are reached.
    * 1.b Plot its improvement in performance over time (for example, the mean of the 5+ runs, a running average, or similar).
  - **Modified Version.** Evaluate the learner’s performance on modified versions of the cart-pole task and compare the results to those from the plain vanilla runs:
    * 2.a **Limited Observations.** Hide each variable one by one (from the start of the experiment), run ONA at least 5 times per condition for at least 300 epochs (increase this number if you believe it is necessary), and plot its improvement in performance over time (for example, the mean of the 5+ runs, a running average, or similar).
    * 2.b **Sudden Availability.** Hide each variable one by one from the start of the experiment; then expose it after 200 epochs. Let the system run until it relearns the task, continue for another 200 epochs, then hide the variable again and continue for another 200 epochs. Do this at least 5 times per variable. Stop if ONA cannot relearn the task after 1500 epochs. Plot its improvement in performance over time (for example, the mean of the 5+ runs, a running average, or similar).
    * 2.c **Sudden Disappearance.** The exact opposite of 2.b: all variables (x, v, theta, omega) are exposed at the beginning, and a single variable is hidden after 200 epochs before being re-exposed. Apply the same epoch rules of hiding/exposure as described in 2.b (just the other way around). Again, do this for the variables one by one, at least 5 times each, and plot your results.
    * 2.d **Custom Task Mod.** Think of a way to change the task after a certain number of epochs (e.g. 200). Think of at least three different changes; remember the first assignment.
    * 2.e **Try Out Your Own Ideas** with ONA. For example, what happens if you change the discretization of the observations (see the data-to-Narsese parsing)? What happens if you change the reward conditions? Try things out that you are curious about and try to figure out some of the possibilities (and limitations) of ONA.
  - **Report.** Summarize your results in a report. Compare them to the results from the first assignment (where appropriate). Try to explain your results. What makes ONA different from a Deep Reinforcement Learner?
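For plotting (1.b), a minimal sketch of how the 5+ runs could be aggregated is shown below. It assumes each run produced a list of per-epoch scores (e.g. time balanced per epoch); the function names and the window size are illustrative, not part of the provided code:

```python
def mean_over_runs(runs):
    """Element-wise mean across runs (all runs assumed same length)."""
    return [sum(scores) / len(runs) for scores in zip(*runs)]

def running_average(values, window=10):
    """Trailing running average over the last `window` epochs."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - window + 1)
        out.append(sum(values[lo:i + 1]) / (i + 1 - lo))
    return out

if __name__ == "__main__":
    runs = [[10, 20, 30, 40], [20, 30, 40, 50]]  # two toy "runs"
    curve = running_average(mean_over_runs(runs), window=2)
    print(curve)  # smoothed mean curve, ready to plot with e.g. matplotlib
```

The resulting curve can then be passed directly to a plotting library of your choice.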
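For hiding and exposing variables (2.a-2.c), one option is to mask entries of the observation before they reach the data-to-Narsese parsing step. The sketch below uses the variable names from the assignment; the dict-based observation and the fixed 200-epoch phases are simplifying assumptions (in 2.b the middle phase additionally waits until the task is relearned):

```python
VARIABLES = ("x", "v", "theta", "omega")

def mask_observation(obs, hidden):
    """Return a copy of the observation dict without the hidden variables,
    so they never reach the data-to-Narsese parsing step."""
    return {k: val for k, val in obs.items() if k not in hidden}

def hidden_at(epoch, variable, mode):
    """Which variables are hidden at a given epoch.
    'sudden_availability' (2.b): hidden for epochs 0-199, exposed for
    200-399, hidden again afterwards (simplified to fixed phases).
    'sudden_disappearance' (2.c): the other way around."""
    phase = (epoch // 200) % 2  # 0 for epochs 0-199, 1 for 200-399, ...
    if mode == "sudden_availability":
        return {variable} if phase == 0 else set()
    if mode == "sudden_disappearance":
        return {variable} if phase == 1 else set()
    raise ValueError(mode)

if __name__ == "__main__":
    obs = {"x": 0.1, "v": -0.5, "theta": 0.02, "omega": 0.3}
    print(mask_observation(obs, hidden_at(150, "theta", "sudden_availability")))
```

The same masking function covers 2.a as well, by keeping the hidden set constant for the whole run.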
The summary of the results from the first assignment can be found here: {{:public:t-720-atai:atai-20:summary-general-remarks.pdf|}}