public:t-713-mers:mers-25:task_theory (current revision 2025/09/02 08:17 by leonard)
|  \\ What it is  | A systematic framework for describing, comparing, and analyzing **tasks**, independent of any specific agent. Provides the foundations for evaluating intelligent systems empirically (measurable outcomes, repeatable experiments, controlled variables). |
|  Purpose  | In AI today, evaluation is benchmark-driven (ImageNet, Atari, Go) but lacks a unifying science. Task theory proposes such a foundation, comparable to what physics provides for engineering. It lets us treat **tasks as objects of study**, enabling systematic experimentation in Empirical Reasoning Systems (ERS). |
|  Task vs. Environment  | A **Task** is a desired transformation of the world (e.g., getting a ball into a goal). The **Environment** is the context in which variables evolve. Together, ⟨Task, Environment⟩ is the Task-Environment. In ERS, this separation allows us to ask *What is the structure of the problem?* before asking *how the agent solves it*. |
|  Agent Separation  | Describing tasks independently of the agent prevents conflating “what is to be achieved” with “who/what achieves it.” This is central for ERS: it allows us to evaluate reasoning systems across different domains and agents. |
|  Why Important  | Enables: (1) **Comparison** of tasks across domains; (2) **Abstraction** into task classes; (3) **Estimation** of resource needs (time, energy, precision); (4) **General evaluation** of reasoning systems, beyond one-off benchmarks. |
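The ⟨Task, Environment⟩ separation above can be sketched in code. The following is a minimal, hypothetical Python sketch; the class names, the `achieved` helper, and the one-variable ball example are illustrative assumptions, not definitions from task theory itself:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Illustrative sketch of the <Task, Environment> separation (names are assumptions).

@dataclass
class Environment:
    """The context in which variables evolve: state variables V and dynamics F."""
    variables: Dict[str, float]                               # V: named state variables
    dynamics: Callable[[Dict[str, float]], Dict[str, float]]  # F: one evolution step

@dataclass
class Task:
    """A desired transformation of the world, independent of any agent."""
    goal: Callable[[Dict[str, float]], bool]                  # goal predicate over states
    constraints: List[Callable[[Dict[str, float]], bool]] = field(default_factory=list)  # C

def achieved(task: Task, env: Environment) -> bool:
    """A task counts as achieved when the goal holds and no constraint is violated."""
    state = env.variables
    return task.goal(state) and all(c(state) for c in task.constraints)

# Toy example: "get the ball into the goal" reduced to a single variable.
env = Environment(variables={"ball_x": 9.0},
                  dynamics=lambda s: {"ball_x": s["ball_x"] + 1.0})
task = Task(goal=lambda s: s["ball_x"] >= 10.0)

env.variables = env.dynamics(env.variables)   # the world evolves one step
print(achieved(task, env))                    # True: ball_x reached 10.0
```

Note that the agent appears nowhere in these definitions: the same `Task` can later be posed to any agent acting in the same `Environment`.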
|  Simple Task  | Few variables, deterministic (press a button). |
|  Complex Task  | Many variables, uncertainty, multi-step (cooking, multi-agent negotiation). |
\\
|  Intricacy (Observer)  | Structural complexity of a task, derived from the number of variables, their couplings, and the constraints in {V, F, C}. Defined **independently of the agent**. |
|  Effective Intricacy (Agent)  | How complicated the task **appears to an agent**, given its sensors, prior knowledge, reasoning, and precision. For a perfect agent, effective intricacy → 0. |
|  Intricacy of Tasks  | Based on (at least) three dimensions: |
|  | The minimal number of causal-relational models needed to represent the causal structure relevant to the goal(s). |
|  | The number, length, and type of causal chains whose mechanisms affect observable variables on a causal path to at least one goal. |
|  | The number of hidden confounders influencing causal structures related to the goal(s). |
|  Difficulty  | A relation: **Difficulty(T, Agent) = f(Intricacy(T), Agent Capacities)**. The same task can be easy for one agent and impossible for another. |
|  Example  | Catching a ball: the observer sees physical intricacy (variables: position, velocity, gravity, timing). On the agent side, a human child has low effective intricacy after learning; a simple robot has very high effective intricacy. |
|  Connection to ERS  | Difficulty is the bridge between **objective task description** (for observers) and **empirical performance measures** (for agents). ERS requires both views: tasks must be defined **in the world** (observer) but evaluated **through agent behavior**. |
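One way to make the intricacy/difficulty distinction concrete is a toy calculation over a causal graph. The scoring formula below is an assumption made for this sketch, not the definition used in task theory (which rests on {V, F, C} and the three intricacy dimensions listed above); what matters is that `intricacy` depends only on the task's structure, while `difficulty` also depends on the agent:

```python
from typing import List, Set, Tuple

def intricacy(edges: List[Tuple[str, str]], hidden: Set[str]) -> int:
    """Agent-independent structural score: variables + couplings + hidden confounders.
    (Toy formula; task theory's actual dimensions are listed in the table above.)"""
    variables = {v for edge in edges for v in edge}
    return len(variables) + len(edges) + len(hidden)

def difficulty(task_intricacy: int, agent_capacity: int) -> float:
    """Difficulty(T, Agent) = f(Intricacy(T), agent capacities): here, the part of
    the intricacy the agent cannot absorb. Zero for a sufficiently capable agent."""
    return max(0.0, float(task_intricacy - agent_capacity))

# Catching a ball: position, velocity, gravity, and timing all couple into the catch.
edges = [("gravity", "velocity"), ("velocity", "position"),
         ("timing", "catch"), ("position", "catch")]
t = intricacy(edges, hidden=set())        # same score for every observer

print(difficulty(t, agent_capacity=12))   # practiced human: 0.0 (easy)
print(difficulty(t, agent_capacity=3))    # simple robot: 6.0 (hard)
```

The same task value `t` yields very different difficulty values, mirroring the child-versus-robot example in the table.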

\\
==== Example of a Task with Different Intricacy ====

{{ :public:t-713-mers:tasktheoryflowchart.png?nolink&700 |}}
Taken from [[https://www.researchgate.net/profile/Kristinn-Thorisson/publication/357637172_About_the_Intricacy_of_Tasks/links/620d1c8fc5934228f9701333/About-the-Intricacy-of-Tasks.pdf|About the Intricacy of Tasks]] by L. M. Eberding et al.
  
\\
|  For Engineering (Agent & System Design)  | Allows construction of benchmarks that measure **generality** (performance across task classes), not just single skills. Supports systematic curricula for training agents. |
|  For Empirical Evaluation (ERS Core)  | Clarifies whether failure is due to the **task** (high intricacy, under-specified goals) or the **agent** (limited sensors, reasoning). Enables falsifiable claims about system capability. |
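The generality benchmark mentioned in the Engineering row can be sketched as an evaluation loop over a parameterized task class. Everything named below (the sum-estimation task family, the fixed-heuristic agent, the success tolerance) is a toy stand-in assumed for illustration; the point is the evaluation shape, scoring an agent across task classes rather than on one benchmark:

```python
import random

def make_task(n_vars: int, seed: int):
    """One task instance: n hidden variables and the target value (their sum)."""
    rng = random.Random(seed)
    values = [rng.uniform(0, 1) for _ in range(n_vars)]
    return values, sum(values)

def agent_estimate(values):
    """A crude agent that assumes every hidden variable equals 0.5."""
    return 0.5 * len(values)

def generality(task_sizes, trials=50, tolerance=0.5):
    """Success rate per task class; the profile across classes measures generality."""
    rates = []
    for n in task_sizes:                      # each n is one task class
        wins = 0
        for seed in range(trials):            # repeatable: seeds fix the instances
            values, target = make_task(n, seed)
            if abs(agent_estimate(values) - target) < tolerance:
                wins += 1
        rates.append(wins / trials)
    return rates

print(generality([1, 5, 20]))   # success typically drops as task intricacy grows
```

Because the task parameters and seeds are controlled, a failure profile like this supports falsifiable claims: it localizes where the agent's heuristic stops absorbing the task's intricacy, rather than reporting a single aggregate score.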
\\
