--- public:t-720-atai:atai-20:engineering_assignment_3 [2020/09/20 11:02] – [Part 2 - ONA on the cart-pole task] thorisson
+++ public:t-720-atai:atai-20:engineering_assignment_3 [2024/04/29 13:33] (current) – external edit 127.0.0.1
@@ Line 91: / Line 91: @@
     * 1.a Run the learner repeatedly (at least 5 times); collect the data. Stop each run when 300 epochs are reached.
     * 1.b Plot its improvement in performance over time (for example the mean of the 5+ runs or a running average or similar).
-  - **Modified Version.** Evaluate the learner’s performance on a modified version of the cart-pole task and compare the results to those from the plain vanilla runs:
+  - **Modified Version**. Evaluate the learner’s performance on a modified version of the cart-pole task and compare the results to those from the plain vanilla runs:
     * 2.a **Limited Observations.** Hide each variable one by one (from the start of the experiment) and run ONA at least 5 times for each condition, at least 300 epochs (increase this number, if you believe that it is necessary), and plot its improvement in performance over time (for example the mean of the 5+ runs or a running average or similar).
-    * 2.b **Sudden Availability.** Hide each variable one by one from the start of the experiment; then expose it after 200 epochs. Let the system run until it relearns the new task, and continue for another 200 epochs before hiding the variable again and continuing for another 200 epochs. Do this at least 5 times per variable. Stop if ONA cannot relearn the new task after 1500 episodes. Plot its improvement in performance over time (for example the mean of the 5+ runs or a running average or similar).
+    * 2.b **Sudden Availability**. Hide each variable one by one from the start of the experiment; then expose it after 200 epochs. Let the system run until it relearns the new task, and continue for another 200 epochs before hiding the variable again and continuing for another 200 epochs. Do this at least 5 times per variable. Stop if ONA cannot relearn the new task after 1500 episodes. Plot its improvement in performance over time (for example the mean of the 5+ runs or a running average or similar).
-    * 2.c **Sudden Disappearance.** The exact opposite of 2.b: all variables (x, v, theta, omega) are exposed at the beginning, and a single variable is hidden after 200 epochs before being re-exposed. Apply the same epoch rules for hiding/exposure as described in 2.b (just the other way around). Do this for each variable in turn, again at least 5 times, and plot your results.
+    * 2.c **Sudden Disappearance**. The exact opposite of 2.b: all variables (x, v, theta, omega) are exposed at the beginning, and a single variable is hidden after 200 epochs before being re-exposed. Apply the same epoch rules for hiding/exposure as described in 2.b (just the other way around). Do this for each variable in turn, again at least 5 times, and plot your results.
-    * 2.d **Custom Task Mod.** Think of a way to change the task after a certain number of epochs (e.g. 200). Think of at least three different changes; remember the first assignment.
+    * 2.d **Custom Task Mod**. Think of a way to change the task after a certain number of epochs (e.g. 200). Think of at least three different changes; remember the first assignment.
     * 2.e **Try Out Your Own Ideas** with ONA. For example, what happens if you change the discretization of the observations (see the data-to-Narsese parsing)? What happens if you change the reward conditions? Try out things you are curious about and try to figure out some of the possibilities (and limitations) of ONA.
   - **Report.** Summarize your results in a report. Compare them to the results from the first assignment (where appropriate). Try to explain your results. What makes ONA different from a Deep Reinforcement Learner?
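
The hiding/exposure schedules in 2.b and 2.c and the plotting step in 1.b can be sketched as a few small helpers. This is only an illustrative sketch, not part of the assignment code: the names `is_hidden`, `mask_observation`, `mean_curve`, and `running_average` are hypothetical, observations are assumed to be a dict keyed by variable name, and the switch epochs are assumed to be supplied by you (for example, after you have decided that ONA has relearned the task). How the masked observation is passed to ONA is not shown.

```python
from statistics import mean

# Variable names as listed in the assignment text.
VARIABLES = ("x", "v", "theta", "omega")


def is_hidden(epoch, switch_epochs, start_hidden):
    """Return True if the chosen variable is hidden during `epoch`.

    `switch_epochs` lists the epochs at which visibility flips.
    `start_hidden=True` gives a 2.b-style run (hidden first),
    `start_hidden=False` gives a 2.c-style run (exposed first).
    """
    flips = sum(1 for s in switch_epochs if epoch >= s)
    return start_hidden if flips % 2 == 0 else not start_hidden


def mask_observation(obs, hidden_var):
    """Return a copy of the observation dict without `hidden_var`.

    Assumption: the observation is a dict such as {"x": ..., "v": ...}.
    """
    return {name: value for name, value in obs.items() if name != hidden_var}


def mean_curve(runs):
    """Per-epoch mean over several runs (each run is a list of scores).

    zip() stops at the shortest run, so longer runs are truncated.
    """
    return [mean(epoch_values) for epoch_values in zip(*runs)]


def running_average(values, window):
    """Trailing running average; early entries use the values seen so far."""
    out = []
    total = 0.0
    for i, value in enumerate(values):
        total += value
        if i >= window:
            total -= values[i - window]
        out.append(total / min(i + 1, window))
    return out
```

For example, `is_hidden(epoch, [200, 600], start_hidden=True)` hides the variable for epochs 0 to 199, exposes it for epochs 200 to 599, and hides it again from epoch 600 on. The curve returned by `running_average(mean_curve(runs), window)` can then be passed to whatever plotting tool you prefer.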