T-713-MERS-2025 Main
What it is | A systematic framework for describing, comparing, and analyzing tasks, independent of any specific agent. Provides the foundations for evaluating intelligent systems empirically (measurable outcomes, repeatable experiments, controlled variables). |
Purpose | In AI today, evaluation is benchmark-driven (ImageNet, Atari, Go) but lacks a unifying science. Task theory proposes such a foundation, playing a role comparable to that of physics for engineering. It lets us treat tasks as objects of study, enabling systematic experimentation in Empirical Reasoning Systems (ERS). |
Task vs. Environment | A Task is a desired transformation of the world (e.g., get ball into goal). The Environment is the context where variables evolve. Together: ⟨Task, Environment⟩. In ERS, this separation allows us to ask: *what is the structure of the problem?* before asking *how the agent solves it*. |
Agent Separation | Describing tasks independently of the agent prevents conflating “what is to be achieved” with “who/what achieves it.” This is central for ERS: it allows us to evaluate reasoning systems across different domains and agents. |
Why Important | Enables: (1) Comparison of tasks across domains; (2) Abstraction into task classes; (3) Estimation of resource needs (time, energy, precision); (4) General evaluation of reasoning systems, beyond one-off benchmarks. |
Example Analogy | In physics, wind tunnels test many airplane designs under the same controlled conditions. In ERS, task theory plays a similar role: controlling task variables so that reasoning systems can be compared fairly. |
T: A Task | T = { G, V, F, C } | |
G: Goal | Set of desired states or outcomes. Goals define what counts as “success” from the observer’s perspective. Example: robot reaches waypoint within 1m tolerance. | |
V: Variables | V = { v₁, v₂, … }. Measurable and manipulatable aspects of the environment relevant to the task. Observer defines these formally (e.g., position, temperature); agent may only have partial/noisy access. | |
F: Transformation Rules | Describe how variables evolve (physics, rules of a game, causal dynamics). These are objective world relations, available in principle to the observer. Agents must infer or approximate them. | |
C: Constraints | Boundaries of what is possible (time, energy, error bounds, resource limits). Again, observer’s perspective = formal definition; agent’s perspective = experienced as difficulty or failure when limits are exceeded. | |
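A minimal sketch of how the tuple T = { G, V, F, C } could be represented as data. This is an assumed Python encoding; the names Task, goal_test, transform, and constraints are illustrative choices, not part of the formal theory.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, float]  # V as a snapshot: variable name -> current value

@dataclass
class Task:
    variables: List[str]                      # V: measurable, manipulatable aspects
    goal_test: Callable[[State], bool]        # G: does a state count as success?
    transform: Callable[[State, str], State]  # F: how variables evolve under an action
    constraints: Dict[str, float] = field(default_factory=dict)  # C: e.g. time/step limits

# Example task: "reach the waypoint at x = 10 within 1 m tolerance".
reach_waypoint = Task(
    variables=["x"],
    goal_test=lambda s: abs(s["x"] - 10.0) <= 1.0,
    transform=lambda s, a: {"x": s["x"] + (1.0 if a == "forward" else -1.0)},
    constraints={"max_steps": 50},
)

state = {"x": 0.0}
for _ in range(int(reach_waypoint.constraints["max_steps"])):
    if reach_waypoint.goal_test(state):
        break
    state = reach_waypoint.transform(state, "forward")
print("goal reached:", reach_waypoint.goal_test(state))
```

Note that the loop at the end is a stand-in agent; the Task object itself stays agent-independent, as required above.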
Simple Task | Few variables, deterministic (press a button). | |
Complex Task | Many variables, uncertainty, multi-step (cooking, multi-agent negotiation). |
Intricacy (Observer) | Structural complexity of a task, derived from number of variables, their couplings, and constraints in {V, F, C}. Defined independently of the agent. |
Effective Intricacy (Agent) | How complicated the task appears to an agent, given its sensors, prior knowledge, reasoning, and precision. For a perfect agent, effective intricacy → 0. |
Intricacy of Tasks | Based on (at least) three dimensions: |
1. The minimal number of causal-relational models needed to represent the relations of the causal structure related to the goal(s). |
2. The number, length and type of mechanisms of causal chains that affect observable variables on a causal path to at least one goal. |
3. The number of hidden confounders influencing causal structures related to the goal. |
Difficulty | A relation: Difficulty(T, Agent) = f(Intricacy(T), Agent Capacities). Same task can be easy for one agent, impossible for another. |
Example | Catching a ball: Observer sees physical intricacy (variables: position, velocity, gravity, timing). Agent: a human child has low effective intricacy after learning; a simple robot has very high effective intricacy. |
Connection to ERS | Difficulty is the bridge between objective task description (for observers) and empirical performance measures (for agents). ERS requires both views: tasks must be defined in the world (observer) but evaluated through agent behavior. |
Taken from *About the Intricacy of Tasks* by L.M. Eberding et al.
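To make the relation Difficulty(T, Agent) = f(Intricacy(T), Agent Capacities) concrete, here is a toy sketch. The three intricacy fields mirror the dimensions listed above; the capacity fields and the scoring function f are invented stand-ins, not taken from the lecture or the paper.

```python
from dataclasses import dataclass

@dataclass
class Intricacy:
    causal_models: int       # dimension 1: minimal number of causal-relational models
    causal_chains: int       # dimension 2: number/length of causal chains to a goal
    hidden_confounders: int  # dimension 3: hidden confounders on goal-relevant structure

@dataclass
class AgentCapacities:
    model_capacity: int       # how many causal models the agent can represent
    inference_depth: int      # how long a causal chain it can reason over
    confounder_handling: int  # how many hidden confounders it can compensate for

def difficulty(t: Intricacy, a: AgentCapacities) -> float:
    """Toy f: ratio of task demands to agent capacities (values > 1 mean 'beyond the agent')."""
    return max(
        t.causal_models / max(a.model_capacity, 1),
        t.causal_chains / max(a.inference_depth, 1),
        t.hidden_confounders / max(a.confounder_handling, 1),
    )

# Same task, two agents: the ball-catching example above.
catch_ball = Intricacy(causal_models=2, causal_chains=3, hidden_confounders=1)
child = AgentCapacities(model_capacity=10, inference_depth=10, confounder_handling=5)
simple_robot = AgentCapacities(model_capacity=1, inference_depth=1, confounder_handling=0)
print(difficulty(catch_ball, child))         # low difficulty: easy for this agent
print(difficulty(catch_ball, simple_robot))  # high difficulty: same task, different agent
```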
Determinism | Whether the same action in the same state always leads to the same result (deterministic) or whether outcomes vary (stochastic). |
Ergodicity | The degree to which all relevant states can in principle be reached, and how evenly/consistently they can be sampled through interaction. |
Controllable Continuity | Whether small changes in agent output produce small, continuous changes in the environment (high continuity) or abrupt/discontinuous ones (low continuity). |
Asynchronicity | Whether the environment changes only in response to the agent (synchronous) or independently of it, on its own time (asynchronous). |
Dynamism | Extent to which the environment changes over time without agent input; static vs. dynamic worlds. |
Observability | How much of the environment state is accessible to the agent (full, partial, noisy). |
Controllability | The extent to which the agent can influence the environment state; fully controllable vs. only partially or weakly controllable. |
Multiple Parallel Causal Chains | Whether multiple independent processes can run in parallel, influencing outcomes simultaneously. |
Number of Agents | Whether there is only a single agent or multiple agents (cooperative, competitive, or mixed). |
Periodicity | Whether the environment exhibits cycles or repeating structures that can be exploited for prediction. |
Repeatability | Whether experiments in the environment can be repeated under the same conditions, producing comparable results. |
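These properties could be collected into a single profile so that environments can be compared along the same axes. This is a sketch under the assumption that each dimension is reduced to a boolean or a rough 0-to-1 score; the field names and the two example profiles are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class EnvironmentProfile:
    deterministic: bool            # same action in same state -> same result?
    ergodicity: float              # 0 = many states unreachable, 1 = fully reachable/samplable
    controllable_continuity: float # 0 = abrupt responses to agent output, 1 = smooth
    asynchronous: bool             # changes on its own time, independent of the agent?
    dynamism: float                # 0 = static, 1 = highly dynamic
    observability: float           # 0 = hidden, 1 = fully observable
    controllability: float         # 0 = uncontrollable, 1 = fully controllable
    parallel_causal_chains: int    # independent processes running in parallel
    num_agents: int                # single vs. multi-agent
    periodic: bool                 # exploitable cycles / repeating structure?
    repeatable: bool               # can experiments be rerun under the same conditions?

# Two rough, illustrative profiles compared along the same dimensions:
chess = EnvironmentProfile(True, 1.0, 0.0, False, 0.0, 1.0, 0.5, 1, 2, False, True)
urban_driving = EnvironmentProfile(False, 0.3, 0.8, True, 0.9, 0.4, 0.3, 10, 50, True, False)
print(chess)
print(urban_driving)
```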
Levels of Detail | Tasks can be described at different levels of detail, from coarse abstract goals to fine-grained physical variables. The chosen level shapes both evaluation (observer) and execution (agent). |
Observer’s Perspective | The observer can choose how finely to specify variables, transformations, and constraints. A higher level of detail allows precise measurement but may make analysis intractable. |
Agent’s Perspective | The agent perceives and reasons at its own level of detail, often coarser than the environment’s “true” detail. Mismatch between observer’s definition and agent’s accessible level creates difficulty. |
Coarse Level | Only abstract goals and broad categories of variables are specified. Example: “Deliver package to location.” |
Intermediate Level | Includes some measurable variables and causal relations. Example: “Move package from x to y using navigation map.” |
Fine Level | Explicit representation of detailed physical dynamics, constraints, and noise. Example: “Motor torque, wheel slip, GPS error bounds, battery usage.” |
Implications for ERS | Enables systematic scaling of task complexity in experiments. Supports fair comparison: two agents can be tested at the same or different levels of detail. Clarifies where errors originate: poor reasoning vs. inadequate detail in task definition. |
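As an illustration of the levels above, here is an assumed encoding of the same delivery task at a coarse and a fine level of detail; the variable names and constraint values are invented for the example.

```python
# Coarse level: only the abstract goal and a broad variable category.
coarse_task = {
    "goal": "package at destination",
    "variables": ["package_location"],
}

# Fine level: explicit physical variables, noise sources, and constraints.
fine_task = {
    "goal": "package within 0.5 m of destination coordinates",
    "variables": [
        "motor_torque", "wheel_slip", "gps_error_bound",
        "battery_level", "package_position_xyz",
    ],
    "constraints": {"max_energy_Wh": 120, "max_time_s": 1800},
}

# A crude proxy for how the chosen level of detail inflates intricacy:
# how many variables (and hence couplings) the observer makes explicit.
for name, task in [("coarse", coarse_task), ("fine", fine_task)]:
    print(f"{name}: {len(task['variables'])} explicit variables")
```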
Maximum Intricacy | Any agent constrained by resources (time, energy, computational power, etc.) has a maximum intricacy of tasks it can solve.
Problem | Even simple tasks like walking to the bus station, if defined at the finest level of detail (every motor command, etc.), carry massive intricacy. Planning through every step is computationally infeasible.
Changing the task | If a task is too intricate to be performed, the task must be adjusted to fit the agent's capabilities. However, we still want to get the task done! |
Changing the Level of Detail | The only way to change the task, and thereby its intricacy, without losing the goal of the task.
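A sketch of what changing the level of detail can look like in practice: plan at a coarse level and refine each abstract step only when it is executed, so the full fine-grained intricacy is never confronted all at once. The step names and the refinement table are toy assumptions, not the lecture's algorithm.

```python
# Coarse plan: few variables, low intricacy, goal preserved.
coarse_plan = ["leave_house", "walk_to_bus_station"]

# Each coarse step expands into finer-grained steps only when reached,
# instead of planning over every motor command up front.
refinements = {
    "leave_house": ["open_door", "step_outside", "close_door"],
    "walk_to_bus_station": ["follow_sidewalk", "cross_street", "arrive_at_stop"],
}

def execute(step: str, depth: int = 0) -> None:
    print("  " * depth + step)
    for sub in refinements.get(step, []):  # lazy refinement, one level at a time
        execute(sub, depth + 1)

for step in coarse_plan:
    execute(step)
```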
For Science (Observer) | Provides a systematic, measurable, repeatable description of tasks, which is necessary for the empirical study of reasoning systems. Comparable to controlled experiments in physics or biology. |
For Engineering (Agent & System Design) | Allows construction of benchmarks that measure generality (performance across task classes), not just single skills. Supports systematic curricula for training agents. |
For Empirical Evaluation (ERS Core) | Clarifies whether failure is due to the task (high intricacy, under-specified goals) or the agent (limited sensors, reasoning). Enables falsifiable claims about system capability. |
Reflection | In ERS, intelligence boils down to: *Given a formally defined task, how well does an agent reason about it empirically, under uncertainty and constraints?* Task theory provides the shared language to answer this. |