T-713-MERS-2025 Main
What it is | A systematic framework for describing, comparing, and analyzing tasks, independent of any specific agent. Provides the foundations for evaluating intelligent systems empirically (measurable outcomes, repeatable experiments, controlled variables). |
Purpose | In AI today, evaluation is benchmark-driven (ImageNet, Atari, Go) but lacks a unifying science. Task theory proposes such a foundation, playing a role comparable to that of physics for engineering. It lets us treat tasks as objects of study, enabling systematic experimentation in Empirical Reasoning Systems (ERS). |
Task vs. Environment | A Task is a desired transformation of the world (e.g., get ball into goal). The Environment is the context where variables evolve. Together: ⟨Task, Environment⟩. In ERS, this separation allows us to ask: *what is the structure of the problem?* before asking *how the agent solves it*. |
Agent Separation | Describing tasks independently of the agent prevents conflating “what is to be achieved” with “who/what achieves it.” This is central for ERS: it allows us to evaluate reasoning systems across different domains and agents. |
Why Important | Enables: (1) Comparison of tasks across domains; (2) Abstraction into task classes; (3) Estimation of resource needs (time, energy, precision); (4) General evaluation of reasoning systems, beyond one-off benchmarks. |
Example Analogy | In physics, wind tunnels test many airplane designs under the same controlled conditions. In ERS, task theory plays a similar role: controlling task variables so that reasoning systems can be compared fairly. |
T: A Task | T = { G, V, F, C } | |
G: Goal | Set of desired states or outcomes. Goals define what counts as “success” from the observer’s perspective. Example: robot reaches waypoint within 1m tolerance. | |
V: Variables | V = { v₁, v₂, … }. Measurable and manipulatable aspects of the environment relevant to the task. Observer defines these formally (e.g., position, temperature); agent may only have partial/noisy access. | |
F: Transformation Rules | Describe how variables evolve (physics, rules of a game, causal dynamics). These are objective world relations, available in principle to the observer. Agents must infer or approximate them. | |
C: Constraints | Boundaries of what is possible (time, energy, error bounds, resource limits). Again, observer’s perspective = formal definition; agent’s perspective = experienced as difficulty or failure when limits are exceeded. | |
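A minimal sketch of how the tuple T = { G, V, F, C } could be represented as data. This is an assumed Python encoding; the names Task, goal_test, transform, and constraints are illustrative choices, not part of the formal theory.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, float]  # V as a snapshot: variable name -> current value

@dataclass
class Task:
    variables: List[str]                      # V: measurable, manipulatable aspects
    goal_test: Callable[[State], bool]        # G: does a state count as success?
    transform: Callable[[State, str], State]  # F: how variables evolve under an action
    constraints: Dict[str, float] = field(default_factory=dict)  # C: e.g. time/step limits

# Example task: "reach the waypoint at x = 10 within 1 m tolerance".
reach_waypoint = Task(
    variables=["x"],
    goal_test=lambda s: abs(s["x"] - 10.0) <= 1.0,
    transform=lambda s, a: {"x": s["x"] + (1.0 if a == "forward" else -1.0)},
    constraints={"max_steps": 50},
)

state = {"x": 0.0}
for _ in range(int(reach_waypoint.constraints["max_steps"])):
    if reach_waypoint.goal_test(state):
        break
    state = reach_waypoint.transform(state, "forward")
print("goal reached:", reach_waypoint.goal_test(state))
```

Note that the loop at the end is a stand-in agent; the Task object itself stays agent-independent, as required above.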
Simple Task | Few variables, deterministic (press a button). | |
Complex Task | Many variables, uncertainty, multi-step (cooking, multi-agent negotiation). |
Intricacy (Observer) | Structural complexity of a task, derived from number of variables, their couplings, and constraints in {V, F, C}. Defined independently of the agent. |
Effective Intricacy (Agent) | How complicated the task appears to an agent, given its sensors, prior knowledge, reasoning, and precision. For a perfect agent, effective intricacy → 0. |
Intricacy of Tasks | Based on (at least) three dimensions: |
1. The minimal number of causal-relational models needed to represent the relations of the causal structure related to the goal(s). |
2. The number, length and type of mechanisms of causal chains that affect observable variables on a causal path to at least one goal. |
3. The number of hidden confounders influencing causal structures related to the goal. |
Difficulty | A relation: Difficulty(T, Agent) = f(Intricacy(T), Agent Capacities). Same task can be easy for one agent, impossible for another. |
Example | Catching a ball: Observer sees physical intricacy (variables: position, velocity, gravity, timing). Agent: a human child has low effective intricacy after learning; a simple robot has very high effective intricacy. |
Connection to ERS | Difficulty is the bridge between objective task description (for observers) and empirical performance measures (for agents). ERS requires both views: tasks must be defined in the world (observer) but evaluated through agent behavior. |
Taken from *About the Intricacy of Tasks* by L.M. Eberding et al.
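To make the relation Difficulty(T, Agent) = f(Intricacy(T), Agent Capacities) concrete, here is a toy sketch. The three intricacy fields mirror the dimensions listed above; the capacity fields and the scoring function f are invented stand-ins, not taken from the lecture or the paper.

```python
from dataclasses import dataclass

@dataclass
class Intricacy:
    causal_models: int       # dimension 1: minimal number of causal-relational models
    causal_chains: int       # dimension 2: number/length of causal chains to a goal
    hidden_confounders: int  # dimension 3: hidden confounders on goal-relevant structure

@dataclass
class AgentCapacities:
    model_capacity: int       # how many causal models the agent can represent
    inference_depth: int      # how long a causal chain it can reason over
    confounder_handling: int  # how many hidden confounders it can compensate for

def difficulty(t: Intricacy, a: AgentCapacities) -> float:
    """Toy f: ratio of task demands to agent capacities (values > 1 mean 'beyond the agent')."""
    return max(
        t.causal_models / max(a.model_capacity, 1),
        t.causal_chains / max(a.inference_depth, 1),
        t.hidden_confounders / max(a.confounder_handling, 1),
    )

# Same task, two agents: the ball-catching example above.
catch_ball = Intricacy(causal_models=2, causal_chains=3, hidden_confounders=1)
child = AgentCapacities(model_capacity=10, inference_depth=10, confounder_handling=5)
simple_robot = AgentCapacities(model_capacity=1, inference_depth=1, confounder_handling=0)
print(difficulty(catch_ball, child))         # low difficulty: easy for this agent
print(difficulty(catch_ball, simple_robot))  # high difficulty: same task, different agent
```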
Determinism | Whether the same action in the same state always leads to the same result (deterministic) or whether outcomes vary (stochastic). |
Ergodicity | The degree to which all relevant states can in principle be reached, and how evenly/consistently they can be sampled through interaction. |
Controllable Continuity | Whether small changes in agent output produce small, continuous changes in the environment (high continuity) or abrupt/discontinuous ones (low continuity). |
Asynchronicity | Whether the environment changes only in response to the agent (synchronous) or independently of it, on its own time (asynchronous). |
Dynamism | Extent to which the environment changes over time without agent input; static vs. dynamic worlds. |
Observability | How much of the environment state is accessible to the agent (full, partial, noisy). |
Controllability | The extent to which the agent can influence the environment state; fully controllable vs. only partially or weakly controllable. |
Multiple Parallel Causal Chains | Whether multiple independent processes can run in parallel, influencing outcomes simultaneously. |
Number of Agents | Whether there is only a single agent or multiple agents (cooperative, competitive, or mixed). |
Periodicity | Whether the environment exhibits cycles or repeating structures that can be exploited for prediction. |
Repeatability | Whether experiments in the environment can be repeated under the same conditions, producing comparable results. |
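These properties could be collected into a single profile so that environments can be compared along the same axes. This is a sketch under the assumption that each dimension is reduced to a boolean or a rough 0-to-1 score; the field names and the two example profiles are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class EnvironmentProfile:
    deterministic: bool            # same action in same state -> same result?
    ergodicity: float              # 0 = many states unreachable, 1 = fully reachable/samplable
    controllable_continuity: float # 0 = abrupt responses to agent output, 1 = smooth
    asynchronous: bool             # changes on its own time, independent of the agent?
    dynamism: float                # 0 = static, 1 = highly dynamic
    observability: float           # 0 = hidden, 1 = fully observable
    controllability: float         # 0 = uncontrollable, 1 = fully controllable
    parallel_causal_chains: int    # independent processes running in parallel
    num_agents: int                # single vs. multi-agent
    periodic: bool                 # exploitable cycles / repeating structure?
    repeatable: bool               # can experiments be rerun under the same conditions?

# Two rough, illustrative profiles compared along the same dimensions:
chess = EnvironmentProfile(True, 1.0, 0.0, False, 0.0, 1.0, 0.5, 1, 2, False, True)
urban_driving = EnvironmentProfile(False, 0.3, 0.8, True, 0.9, 0.4, 0.3, 10, 50, True, False)
print(chess)
print(urban_driving)
```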
Levels of Detail | Tasks can be described at different levels of detail, from coarse abstract goals to fine-grained physical variables. The chosen level shapes both evaluation (observer) and execution (agent). |
Observer’s Perspective | The observer can choose how finely to specify variables, transformations, and constraints. A higher level of detail allows precise measurement but may make analysis intractable. |
Agent’s Perspective | The agent perceives and reasons at its own level of detail, often coarser than the environment’s “true” detail. Mismatch between observer’s definition and agent’s accessible level creates difficulty. |
Coarse Level | Only abstract goals and broad categories of variables are specified. Example: “Deliver package to location.” |
Intermediate Level | Includes some measurable variables and causal relations. Example: “Move package from x to y using navigation map.” |
Fine Level | Explicit representation of detailed physical dynamics, constraints, and noise. Example: “Motor torque, wheel slip, GPS error bounds, battery usage.” |
Implications for ERS | Enables systematic scaling of task complexity in experiments. Supports fair comparison: two agents can be tested at the same or different levels of detail. Clarifies where errors originate: poor reasoning vs. inadequate detail in task definition. |
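As an illustration of the levels above, here is an assumed encoding of the same delivery task at a coarse and a fine level of detail; the variable names and constraint values are invented for the example.

```python
# Coarse level: only the abstract goal and a broad variable category.
coarse_task = {
    "goal": "package at destination",
    "variables": ["package_location"],
}

# Fine level: explicit physical variables, noise sources, and constraints.
fine_task = {
    "goal": "package within 0.5 m of destination coordinates",
    "variables": [
        "motor_torque", "wheel_slip", "gps_error_bound",
        "battery_level", "package_position_xyz",
    ],
    "constraints": {"max_energy_Wh": 120, "max_time_s": 1800},
}

# A crude proxy for how the chosen level of detail inflates intricacy:
# how many variables (and hence couplings) the observer makes explicit.
for name, task in [("coarse", coarse_task), ("fine", fine_task)]:
    print(f"{name}: {len(task['variables'])} explicit variables")
```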
Maximum Intricacy | Any agent constrained by resources (time, energy, computational power, etc.) has a maximum intricacy of tasks it can solve.
Problem | Even simple tasks like walking to the bus station, if defined at the finest level of detail (every motor command, etc.), carry massive intricacy. Planning through every step is computationally infeasible.
Changing the task | If a task is too intricate to be performed, the task must be adjusted to fit the agent's capabilities. However, we still want to get the task done! |
Changing the Level of Detail | The only way to change the task, and thereby its intricacy, without losing the goal of the task.
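A sketch of what changing the level of detail can look like in practice: plan at a coarse level and refine each abstract step only when it is executed, so the full fine-grained intricacy is never confronted all at once. The step names and the refinement table are toy assumptions, not the lecture's algorithm.

```python
# Coarse plan: few variables, low intricacy, goal preserved.
coarse_plan = ["leave_house", "walk_to_bus_station"]

# Each coarse step expands into finer-grained steps only when reached,
# instead of planning over every motor command up front.
refinements = {
    "leave_house": ["open_door", "step_outside", "close_door"],
    "walk_to_bus_station": ["follow_sidewalk", "cross_street", "arrive_at_stop"],
}

def execute(step: str, depth: int = 0) -> None:
    print("  " * depth + step)
    for sub in refinements.get(step, []):  # lazy refinement, one level at a time
        execute(sub, depth + 1)

for step in coarse_plan:
    execute(step)
```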
For Science (Observer) | Provides a systematic, measurable, repeatable description of tasks, which is necessary for the empirical study of reasoning systems. Comparable to controlled experiments in physics or biology. |
For Engineering (Agent & System Design) | Allows construction of benchmarks that measure generality (performance across task classes), not just single skills. Supports systematic curricula for training agents. |
For Empirical Evaluation (ERS Core) | Clarifies whether failure is due to the task (high intricacy, under-specified goals) or the agent (limited sensors, reasoning). Enables falsifiable claims about system capability. |
Reflection | In ERS, intelligence boils down to: *Given a formally defined task, how well does an agent reason about it empirically, under uncertainty and constraints?* Task theory provides the shared language to answer this. |