--- public:t-720-atai:atai-20:engineering_assignment_3 [2020/09/20 10:59] – [Part 2 - ONA on the cart-pole task] thorisson
+++ public:t-720-atai:atai-20:engineering_assignment_3 [2024/04/29 13:33] (current) – external edit 127.0.0.1
@@ Line 50: / Line 50: @@
 Further, ONA itself needs to be adjusted to work with the cart-pole environment: it must be restricted to using only “^left” and “^right” as actions (just like the actor-critic, or you yourself, in assignments 1 and 2).\\
 To do this, you will have to change a few lines in two files of ONA:\\
+\\
 . Open …/OpenNARS-for-Applications/src/Shell.c and comment out lines 75-82 so that the Shell_NARInit() function looks like this:\\
@@ Line 72: / Line 74: @@
 Only the atomic terms “^left” and “^right” should remain.\\
+\\
 . Open .../OpenNARS-for-Applications/Config.h and change the value of “OPERATIONS_MAX” on line 86 from 10 to 2:\\
     //Maximum amount of operations which can be registered
     #define OPERATIONS_MAX 2
+\\
 . Rebuild ONA.\\
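If you prefer to script the Config.h change (for example, to reapply it after updating your ONA checkout), a minimal Python sketch could look like the following. It assumes the define sits on a single line of the form #define OPERATIONS_MAX <n>; set_operations_max and patch_config are hypothetical helpers, not part of ONA. The Shell.c edit is still done by hand.

```python
import re

# Matches a line such as "#define OPERATIONS_MAX 10" (leading spaces/tabs allowed)
# and captures everything up to the numeric value.
_OPERATIONS_MAX_RE = re.compile(r"^([ \t]*#define[ \t]+OPERATIONS_MAX[ \t]+)\d+", re.MULTILINE)


def set_operations_max(config_text: str, value: int = 2) -> str:
    """Return config_text with the OPERATIONS_MAX value replaced by `value`."""
    new_text, count = _OPERATIONS_MAX_RE.subn(lambda m: m.group(1) + str(value), config_text)
    if count != 1:
        # Refuse to guess if the define is missing or appears more than once.
        raise ValueError(f"expected exactly one OPERATIONS_MAX define, found {count}")
    return new_text


def patch_config(path: str, value: int = 2) -> None:
    """Rewrite the Config.h file at `path` in place (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(set_operations_max(text, value))
```

Call patch_config with the path to Config.h in your own ONA checkout, then rebuild ONA.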
@@ Line 84: / Line 89: @@
 ===Your Task:===
   - **Plain Vanilla.** Evaluate ONA’s performance on the cart-pole task, which is given to you as Python code:
-    * **1.a** Run the learner repeatedly (at least 5 times); collect the data. Stop each run when 300 epochs are reached.
+    * 1.a Run the learner repeatedly (at least 5 times) and collect the data. Stop each run once 300 epochs have been reached.
-    * **1.b** Plot its improvement in performance over time (for example the mean of the 5+ runs or a running average or similar).
+    * 1.b Plot its improvement in performance over time (for example, the mean over the 5+ runs, a running average, or similar).
-  - **Modified Version.** Evaluate the learner’s performance on a modified version of the cart-pole task and compare them to the results from the plain vanilla runs:
+  - **Modified Version.** Evaluate the learner’s performance on modified versions of the cart-pole task and compare it with the results from the plain vanilla runs:
-    * **2.a** **Limited Observations** Hide each variable one by one (from the start of the experiment) and run ONA at least 5 times for each condition, at least 300 epochs (increase this number, if you believe that it is necessary), and plot its improvement in performance over time (for example the mean of the 5+ runs or a running average or similar).
+    * 2.a **Limited Observations.** Hide each variable, one at a time, from the start of the experiment, and run ONA at least 5 times per condition for at least 300 epochs (increase this number if you believe it is necessary). Plot its improvement in performance over time (for example, the mean over the 5+ runs, a running average, or similar).
-    * **2.b** **Sudden Availability** Hide each variable one by one from the start of the experiment; then expose it after 200 epochs. Let the system run until it relearns the new task, and continue for another 200 epochs before hiding the variable again and continuing for another 200 epochs. Do this at least 5 times per variable. Stop if ONA cannot relearn the new task after 1500 episodes. Plot its improvement in performance over time (for example the mean of the 5+ runs or a running average or similar).
+    * 2.b **Sudden Availability.** Hide each variable, one at a time, from the start of the experiment, then expose it after 200 epochs. Let the system run until it has relearned the new task, continue for another 200 epochs, then hide the variable again and continue for a final 200 epochs. Do this at least 5 times per variable. Stop if ONA has not relearned the new task after 1500 episodes. Plot its improvement in performance over time (for example, the mean over the 5+ runs, a running average, or similar).
-    * **2.c** **The Exact Opposite of b):** All variables (x, v, theta, omega) are exposed at the beginning and a single variable hidden after 200 epochs before re-exposed. Apply the same epoch rules of hiding/ exposure as described in b) (just the other way around). Do this for the variables one by one again at least 5 times and plot your results.
+    * 2.c **Sudden Disappearance.** The exact opposite of 2.b: all variables (x, v, theta, omega) are exposed at the beginning; a single variable is hidden after 200 epochs and later re-exposed. Apply the same epoch rules for hiding and exposure as in 2.b, just the other way around. Do this for each variable, one at a time, at least 5 times, and plot your results.
-    * **2.d** **Task Mod** Change the task in some way after a certain amount of epochs (e.g. 200). Think of at least three different changes, remember the first assignment.
+    * 2.d **Custom Task Mod.** Change the task in some way after a certain number of epochs (e.g. 200). Come up with at least three different changes (remember the first assignment).
-    * **2.e** **Try Out Your Own Ideas** With ONA. For example, what happens if you change the discretization of the observations (see the data to Narsese parsing)? What happens if you change the reward conditions? Try things out that you are curious about and try to figure out some of the possibilities (and limitations) of ONA.
+    * 2.e **Try Out Your Own Ideas** with ONA. For example, what happens if you change the discretization of the observations (see the data-to-Narsese parsing)? What happens if you change the reward conditions? Experiment with whatever you are curious about and try to map out some of the possibilities (and limitations) of ONA.
   - **Report.** Summarize your results in a report. Where appropriate, compare them with the results from the first assignment. Try to explain your results: what makes ONA different from a deep reinforcement learner?
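For the plots in 1.b and 2.a to 2.c, the per-epoch results of several runs have to be combined. A minimal Python sketch, assuming each run is recorded as a list with one score per epoch (for example, the number of steps the pole stayed up); the function names are hypothetical and the default window of 10 epochs is an arbitrary choice:

```python
from statistics import mean
from typing import List, Sequence


def mean_per_epoch(runs: Sequence[Sequence[float]]) -> List[float]:
    """Mean score of each epoch across all runs.

    Runs of different length (e.g. in 2.b) are truncated to the shortest one.
    """
    n_epochs = min(len(run) for run in runs)
    return [mean(run[i] for run in runs) for i in range(n_epochs)]


def running_average(values: Sequence[float], window: int = 10) -> List[float]:
    """Trailing moving average over the last `window` values.

    The first window-1 entries average over the values seen so far.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    result = []
    total = 0.0
    for i, value in enumerate(values):
        total += value
        if i >= window:
            total -= values[i - window]  # drop the value that left the window
        result.append(total / min(i + 1, window))
    return result
```

The resulting lists can then be plotted against the epoch index with any plotting library, for example matplotlib’s pyplot.plot.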
 The summary of the results from the first assignment can be found here: {{:public:t-720-atai:atai-20:summary-general-remarks.pdf|}}
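One way to implement the hiding in 2.a to 2.c is to drop the hidden variable from each observation before it is translated into Narsese, and to decide per epoch whether it is visible. The sketch below assumes the observation arrives as a tuple (x, v, theta, omega), that you choose your own criterion for when ONA has relearned the task (passed in as relearned_at, e.g. the first epoch at which the running average exceeds a threshold you pick), and that the 1500-epoch limit in 2.b counts from the start of the run. All names are hypothetical and not part of the provided Python code.

```python
from typing import Dict, Optional, Tuple

VARIABLES = ("x", "v", "theta", "omega")


def mask_observation(obs: Tuple[float, float, float, float],
                     hidden: Optional[str]) -> Dict[str, float]:
    """Return the observation as a dict, omitting the hidden variable (if any)."""
    return {name: value for name, value in zip(VARIABLES, obs) if name != hidden}


def is_exposed(epoch: int, relearned_at: Optional[int],
               expose_at: int = 200, keep_for: int = 200) -> bool:
    """Visibility of the chosen variable in 2.b (Sudden Availability).

    Hidden until `expose_at`, exposed until `keep_for` epochs after the task
    has been relearned, then hidden again. For 2.c use `not is_exposed(...)`.
    """
    if epoch < expose_at:
        return False
    if relearned_at is None:
        return True  # still waiting for ONA to relearn the task
    return epoch < relearned_at + keep_for


def should_stop(epoch: int, relearned_at: Optional[int],
                keep_for: int = 200, give_up_at: int = 1500) -> bool:
    """Stop after the final phase, or give up if ONA never relearns the task."""
    if relearned_at is None:
        return epoch >= give_up_at
    return epoch >= relearned_at + 2 * keep_for
```

During a run, relearned_at stays None until your relearning criterion is met; from then on the remaining schedule is fixed.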