
T-720-ATAI-2025 Main
Links to Lecture Notes



CONTROL




What is a Controller?



Abstract Controller

An abstraction of a controller: A set of processes P that receive an input i_t ∈ I, produced by and selected from a task-environment, together with a current state S and at least one goal G (implicit or explicit - see table below), and that produce output o_t ∈ O in the form of atomic actions (selected from a set of possible atomic outputs O) that (in the limit) achieve goal(s) G.
The internals of a controller for the complex, adaptive control of a situated agent are referred to as a cognitive architecture.
Any practical, operational controller is embodied, in that it interacts with its environment through interfaces whereby its internal computations are turned into physical actions of some form or other. Input i_t ∈ I enters via measuring devices or sensors, and o_t ∈ O exits the controller via effectors.
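
The abstraction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names AbstractController and bang_bang, the numeric goal, and the three-action output set are all hypothetical and not part of the course material.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

I = TypeVar("I")  # type of the inputs i_t
O = TypeVar("O")  # type of the atomic outputs o_t


@dataclass
class AbstractController(Generic[I, O]):
    """Hypothetical container for processes P, state S and goal G."""
    goal: float                      # G: here simply a target value
    policy: Callable[[float, I], O]  # P: maps (goal, input) to an atomic action
    state: int = 0                   # S: here just a count of steps taken

    def step(self, i_t: I) -> O:
        """Receive one input i_t and emit one atomic action o_t."""
        self.state += 1
        return self.policy(self.goal, i_t)


def bang_bang(goal: float, reading: float) -> str:
    """Select one action from the assumed output set O = {up, down, hold}."""
    if reading < goal:
        return "up"
    if reading > goal:
        return "down"
    return "hold"


controller = AbstractController(goal=20.0, policy=bang_bang)
actions = [controller.step(x) for x in (18.0, 20.0, 23.5)]
print(actions)
```

Note that the sketch is not yet embodied: nothing connects step() to a sensor or an effector.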


Key Concepts in Control

Sensor A transducer that changes one type of energy to another type.
Transducer A device that changes one type of energy to another, typically amplifying and/or dampening the energy in the process.
Actuator A physical (or virtual) transduction mechanism that implements an action that a controller has committed to.
Control Connection Predefined causal connection between a measured variable v and a controllable variable v_c where v = f(v_c).
Mechanical Controller Fuses control mechanism with measurement mechanism via mechanical coupling. Adaptation would require mechanical structure to change. Makes adaptation very difficult to implement.
Digital Controller Separates the stages of measurement, analysis, and control. Makes adaptive control in machines feasible.

Feedback
For a variable v, information of its value at time t1 is transmitted back to the controller through a feedback mechanism as v', where
t(v') > t(v)
that is, v' reaches the controller later than t1: there is a latency in the transmission, which is a function of the speed of transmission (encoding (measurement) time + transmission time + decoding (read-back) time).
Latency A measure for the size of the time difference between v and v'.
Jitter The change in Latency over time. Second-order latency.
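
As a rough illustration of the two definitions above, latency and jitter can be computed from timestamps. The sample values are made up, and measuring jitter as the absolute change in latency between successive samples is one possible convention, assumed here.

```python
from statistics import mean

# (time v was measured, time v' reached the controller), in milliseconds.
# These values are invented for illustration only.
samples = [(0, 50), (100, 160), (200, 240), (300, 370)]

# Latency: time difference between v and v' for each sample.
latencies = [received - measured for measured, received in samples]

# Jitter: change in latency from one sample to the next (assumed: absolute value).
jitter = [abs(later - earlier) for earlier, later in zip(latencies, latencies[1:])]

mean_latency = mean(latencies)
mean_jitter = mean(jitter)
print(latencies, jitter, mean_latency, mean_jitter)
```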


Early Implementation: Centrifugal Governor

What it is A mechanical system for controlling power of a motor or engine. Centrifugal governors were used to regulate the distance and pressure between millstones in windmills in the 17th century. REF
Why it's important Earliest example of automatic regulation with proportional feedback.
Modern equivalents Servo motors (servos), PID control.
Introduction to PID control.
Curious feature The signal represented and the control (Action) output uses the same mechanical system, fusing the information represented with the control mechanism. This is the least flexible way of implementing control.


Centrifugal Governor

What it is: Mechanical controller invented in the 17th century REF
Model of Watt's Governor
A model of Watt's Governor, used by James Watt in 1788 and used to govern the steam engine of the first trains. As the vertical shaft spins faster, the weights get pulled outwards due to the centrifugal force, lifting up a small threaded wheel at the bottom, which in turn pulls up a shaft used to open a valve or other device. As that valve opens it reduces the power output of the engine which rotates the vertical shaft, so the shaft spins slower and the balls are lowered, turning the power back up. Thus this mechanism regulates fluctuations in the power source and keeps them around a set target.
Excellent explanation of Watt's Governor (albeit a little slow.)
Watt's Governor is an example of a proportional controller: its control signal is proportional to the difference between the desired value (called the “setpoint”) and the actual value. Later implementations added the integral of this difference to the control signal, resulting in what is called PI (proportional-integral) control. To keep such a controller stable under aggressive corrective policies (which are useful when you want the controller to react quickly to a change in the actual value), the derivative of the difference can be calculated and added to the control signal as well, resulting in the common PID controller design. This is great for controlling a single-valued continuous (or discrete) quantitative system.
REF
Of course, AI systems – and especially systems with general intelligence – must deal with a whole lot more than that!
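
The PID design described above can be sketched as follows. This is a minimal textbook-style illustration, not code from the course: the class name PID, the gains, and the toy first-order plant are all assumptions. The run compares proportional-only control, which settles short of the setpoint, with PI control, which removes that steady-state offset.

```python
from dataclasses import dataclass


@dataclass
class PID:
    kp: float  # proportional gain
    ki: float  # integral gain
    kd: float  # derivative gain
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, setpoint: float, measured: float, dt: float) -> float:
        """Return the control signal for one time step of length dt."""
        error = setpoint - measured
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(pid: PID, setpoint: float = 1.0, steps: int = 2000, dt: float = 0.01) -> float:
    """Toy plant dy/dt = -y + u, integrated with Euler steps (an assumption)."""
    y = 0.0
    for _ in range(steps):
        u = pid.update(setpoint, y, dt)
        y += (-y + u) * dt
    return y


p_only = simulate(PID(kp=2.0, ki=0.0, kd=0.0))  # settles near 2/3, below the setpoint
pi = simulate(PID(kp=2.0, ki=1.0, kd=0.0))      # integral term closes the gap
print(round(p_only, 3), round(pi, 3))
```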


Generalization of the Centrifugal Governor

Diagram
c Controller.
Plant That which is to be controlled.
Delta Comparison of r and feedback signal.
r Reference value or signal / setpoint.
o Current value / setting of the plant (steam engine, etc.).
Method Uses error signal to correct operation.
SISO Single-input / single-output.
MIMO Multiple-input / multiple-output.



What is an 'agent' ?



Agent = Embodied Controller

“Agent” The term comes from the concept of 'agenthood' or 'agent of change' – the ability of some physical structure to cause (systematic, planned) change.
Controller + Embodiment An (embodied) controller is an agent that consists of at least one sensor, at least one effector, and a controller.
Theory vs. Practice While abstract controllers are good for teaching about control theory, only an embodied controller is good for actual control.
Why it Matters The purpose of intelligence is to 'get stuff done'. To get anything done requires some sort of physical embodiment, whether it is a digital display, a loudspeaker through which the agent barks orders at an audience, or a city-wide network involving fully interconnected telecommunications, heating, traffic routing, power, etc. through which direct control of all can be asserted.


Sense-Control-Act Pipeline

A simple control pipeline consists of at least one sensor, at least one control process of some sort, and at least one end effector. The goal resides in the controller. Based on what the Sensor senses and sends to the Controller, the Controller produces (in some way, e.g. via computations) an action plan and sends it to an end-effector (Act) that executes it. (If the plan is really simple it is a bit counterintuitive to call it a “plan”, but it is technically a plan since it and its desired effect are already known before it has been performed.)
The Controller may keep a copy of what was sent to the end-effector (inner loop a := efferent copy) as well as monitor the effect of what the end-effector does to the outside world.
Modern robots have sensors on all actuators; for instance, the Ghost MiniTaur 4-legged robot uses a technique called “torque estimation” that allows it to use its motors as sensors (video).
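
A minimal sketch of such a pipeline, including the efferent copy, might look like the following. All names (sense, act, Controller, the one-dimensional "position" world) are hypothetical. The end effector saturates at a maximum step, so comparing the efferent copy with the observed change reveals when the action did not have the commanded effect.

```python
def sense(world: dict) -> float:
    """Sensor: read the current position from the (simulated) world."""
    return world["position"]


def act(world: dict, command: float, max_step: float = 1.0) -> None:
    """End effector: moves the position, but never by more than max_step."""
    step = max(-max_step, min(max_step, command))
    world["position"] += step


class Controller:
    def __init__(self, goal: float) -> None:
        self.goal = goal          # the goal resides in the controller
        self.efferent_copy = 0.0  # copy of the last command sent to the effector

    def plan(self, sensed: float) -> float:
        """Produce a (trivial) plan: move by the remaining distance to the goal."""
        command = self.goal - sensed
        self.efferent_copy = command
        return command


world = {"position": 0.0}
controller = Controller(goal=2.5)
surprises = []
for _ in range(4):
    before = sense(world)
    act(world, controller.plan(before))
    observed_change = sense(world) - before
    # Difference between what was commanded and what actually happened.
    surprises.append(controller.efferent_copy - observed_change)

print(surprises, world["position"])
```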


Complexity of Agents

Complexity of AI Agents The complexity of AI agents stems in part from the fact that not only do they control a pipeline, or a set of pipelines, but the reference signals (setpoints) keep changing depending on circumstances, and the agents must also learn how this should happen.
How Can That Be Done? How can an AI architecture learn not only multiple co-dependent control pipelines, but also how they may be relevant in some situations and not others, as well as how their setpoints may change depending on context?
Models Conant & Ashby showed in 1970 that any good controller of a system must harbor a model of that system.
We will address models in the SYMBOLS, MODELS, CAUSATION sprint.
Agent complexity Determined by I × P × O, not just P, i, or o.
Agent action complexity potential Potential for P to control combinatorics of, or change, o, beyond initial i (at “birth”).
Agent input complexity potential Potential for P to structure i in post-processing, and to extend i.
Agent P complexity potential Potential for P to acquire and effectively and efficiently store and access past i (learning); potential for P to change P.
Agent intelligence potential Potential for P to coherently coordinate all of the above to improve its own ability to use its resources, acquire more resources, in light of drives (top-level goals).




Control Architectures



What is an (Intelligent) Architecture?


What it is
In CS: the organization of the software that implements a system.
In AI: The total system that has direct and independent control of the behavior of an Agent via its sensors and effectors.
The controller view in AI means that the architecture is the controller.

Why it's important
The system architecture determines what kind of information processing an agent controller can do, and what the system as a whole is capable of in a particular Task-Environment.
A controller view helps us remember that the system exists for a purpose: To get something done.

Key concepts
- Process types
- process initiation
- information storage
- information flow

Relevance in AI
The term “system” not only includes the processing components, the functions these implement, their input and output, and relationships, but also temporal aspects of the system's behavior as a whole.
This is important in AI because any controller of an agent is supposed to control it in such a way that its behavior can be classified as being “intelligent” over time.
So what are the necessary and sufficient components of that behavior set?

Rationality
The “rationality hypothesis” models an intelligent agent as a “rational” agent: An agent that would always do the most “sensible” thing at any point in time.
The problem with the rationality hypothesis is that given insufficient resources, including time, the concept of rationality doesn't hold up, because it assumes you have time to weigh all alternatives (or, if you have limited time, that you can choose to evaluate the most relevant options and choose among those). But since such decisions are always about the future, and we cannot predict the future perfectly, for most decisions that we get a choice in how to proceed there is no such thing as a rational choice.

“Satisficing”
Herbert Simon proposed the concept of 'satisficing' to replace the concept of “pseudo-optimizing” when talking about intelligent action in a complex task-environment: Actions that meet a particular minimum requirement in light of a particular goal 'satisfy' and 'suffice' for the purposes of that goal.
We don't care (and don't have the time) to consider whether an action is “optimal” if it gets the job done in a reasonable way.
Intelligence is in part a systemic phenomenon Thought experiment: Take any system we deem intelligent, e.g. a 10-year old human, and isolate any of his/her skills and features. A machine that implements any single one of these is unlikely to seem worthy of being called “intelligent” (viz chess programs), without further qualification (e.g. “a limited expert in a sub-field”).
“The intelligence resides in the architecture.” - KRTh
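
The difference between satisficing and optimizing, as described above, can be illustrated with a toy choice problem. The options, utilities, and threshold below are invented for illustration; the point is only that satisficing stops at the first option that is good enough, while optimizing must evaluate every option.

```python
def satisfice(options, utility, threshold):
    """Return the first option meeting the threshold, and how many were evaluated."""
    evaluated = 0
    for option in options:
        evaluated += 1
        if utility(option) >= threshold:
            return option, evaluated
    return None, evaluated


def optimize(options, utility):
    """Return the best option; every option has to be evaluated."""
    return max(options, key=utility), len(options)


# Hypothetical routes with made-up utilities.
routes = {"A": 0.4, "B": 0.7, "C": 0.9, "D": 0.8}

good_enough = satisfice(routes, routes.get, threshold=0.6)
best = optimize(routes, routes.get)
print(good_enough, best)
```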


CS Software Architectural Concepts & Building Blocks


Pipes & filters
Extension of functions.
Component: Each component has a set of inputs and a set of outputs. A component reads streams of data on its inputs and produces streams of data on its outputs, delivering a complete instance of the result in a standard order.
Pipes: Connectors in a system of such components transmit outputs of one filter to inputs of others.
Object-orientation Abstract compound data types with associated operations.
Event-based invocation Pre-defined event types trigger particular computation sequences in pre-defined ways.
Layered systems System is deliberately separated into layers, a layer being a grouping of one or more sub-functions.
Hierarchical systems System is deliberately organized into a hierarchy, where the position in the hierarchy represents one or more important (key system) parameters.
Blackboards System employs a common data store, accessible by more than a single sub-process of the system (often all).
Hybrid architectures Take two or more of the above and mix together to suit your tastes.
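
Of the building blocks listed above, pipes and filters are the easiest to show in a few lines. In this sketch, Python generators play the role of filters and nesting the calls plays the role of the pipes; the filter names and the sample data are hypothetical.

```python
def read_sensor(values):
    """Source filter: emits raw readings one at a time."""
    for value in values:
        yield value


def smooth(stream, window=2):
    """Filter: running mean over the last `window` readings."""
    recent = []
    for value in stream:
        recent = (recent + [value])[-window:]
        yield sum(recent) / len(recent)


def above(stream, limit):
    """Filter: turns each reading into a boolean alarm signal."""
    for value in stream:
        yield value > limit


# Pipes: the output stream of one filter is the input stream of the next.
pipeline = above(smooth(read_sensor([1.0, 3.0, 5.0, 2.0])), limit=2.5)
result = list(pipeline)
print(result)
```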


Reactive ("Feedback") Agent Architecture

Feedback Reacts to measurements. Change happens in light of a received measurement, in which case a control signal v can be produced after perturbations of v happen, so that the output of the plant o can catch up with the change.
What it requires This requires data from sensors.
Signal Behavior When reacting to a time-varying signal v, the frequency of change, the possible patterns of change, and the magnitude of change of v are of key importance; latency and jitter can produce unstoppable fluctuations.
Architecture Largely fixed for the entire lifetime of the agent.
Agent may learn but acts only in reaction to experience (no prediction).
Learning reactive control
Associating reactions to situations.
The Challenge Learning requires repeated direct experimentation. Unless we know beforehand which of the signals that cause perturbations in o are dangerous, the controller may destroy itself. In task-domains where the number of available signals is vastly greater than the controller's search resources, it may take an unacceptable time for the controller to find good associations for doing its work.


Reactive Architectures: Levels of Complexity

Super-simple Sensors connected directly to motors, e.g. Braitenberg Vehicles.
Basic Deterministic connections between components with small memory.
Complex Grossly modular architecture (< 30 modules) with multiple relationships at more than one level of control detail (LoC).
Examples: Speech-controlled dialogue systems like Siri and Alexa.
Super-complex Large number of modules (> 30) at various sizes, each with multiple relationships to others, at more than one LoC.
Example: Subsumption architecture.


Bottom Line
Most architectures in AI are of this kind - towards the higher end of this spectrum very few exist.
What about complex control systems for power plants, manufacturing plants, etc - don't they contain an equal level of complexity (or greater) than the complex or super-complex end of this spectrum? Answer is: Perhaps so (depending on how it's measured), but bear in mind that none of them learn or adapt (except in some extremely trivial ways that are highly localized and specific).
Relying exclusively on constructionist methodologies, it is difficult to put learning into architectures at the high end of this spectrum, and when it's done it's difficult to keep the side effects securely within safe limits.



Example of Reactive Control: Braitenberg Vehicles

Braitenberg vehicle example control scheme: “love”. Steers towards (and crashes into) that which its sensors sense. Braitenberg vehicle example control scheme: “hate”. Avoids that which it senses.
Braitenberg vehicle example control scheme: “curious”. Changing the behavior of “love” by avoiding crashing into things.
(Thinner wires mean weaker signals.)
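
The "love" and "hate" schemes described above can be sketched for a two-sensor, two-motor vehicle. The original diagrams are not reproduced here, so the wiring is an assumption: excitatory connections, crossed for "love" (steers towards the stimulus) and uncrossed for "hate" (steers away). Function names and sensor values are hypothetical.

```python
def motor_speeds(left_sensor, right_sensor, crossed):
    """Return (left_motor, right_motor) for excitatory sensor-to-motor wires."""
    if crossed:
        return right_sensor, left_sensor
    return left_sensor, right_sensor


def turn_direction(left_motor, right_motor):
    """A differential-drive vehicle turns towards its slower wheel."""
    if left_motor > right_motor:
        return "right"
    if left_motor < right_motor:
        return "left"
    return "straight"


# Stimulus is on the vehicle's left, so the left sensor reads stronger.
left_s, right_s = 0.9, 0.2

love = turn_direction(*motor_speeds(left_s, right_s, crossed=True))
hate = turn_direction(*motor_speeds(left_s, right_s, crossed=False))
print(love, hate)
```

No goal is represented anywhere in this code; the approach or avoidance behavior falls out of the wiring alone.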


Braitenberg Vehicles Online Code Example

Another Example of a Reactive Control: Subsumption Architecture



Subsumption control architecture building block.
(Numbers in circles indicate timing in seconds before the unit reverts after getting a positive pulse.)


Example subsumption architecture for robot.
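
A heavily simplified sketch of layered, reactive control in the spirit of subsumption follows. It reduces the idea to fixed-priority suppression: the first layer that produces a command suppresses all layers below it. The timing mechanism mentioned in the building-block caption is left out, and the layer names, thresholds, and priorities are hypothetical.

```python
def seek_charger(sensors):
    """Layer: becomes active only when the battery is low."""
    return "go_to_charger" if sensors["battery"] < 0.2 else None


def avoid(sensors):
    """Layer: becomes active only when an obstacle is close."""
    return "turn_left" if sensors["obstacle_distance"] < 0.5 else None


def wander(sensors):
    """Layer: default behavior, always active."""
    return "forward"


# Assumed priority order: earlier layers suppress the output of later ones.
LAYERS = [seek_charger, avoid, wander]


def arbitrate(sensors):
    """Return the command of the highest-priority active layer."""
    for layer in LAYERS:
        command = layer(sensors)
        if command is not None:
            return command


print(arbitrate({"obstacle_distance": 2.0, "battery": 0.9}))
print(arbitrate({"obstacle_distance": 0.3, "battery": 0.9}))
print(arbitrate({"obstacle_distance": 0.3, "battery": 0.1}))
```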


Problems with Feedback-Only Controllers



Thermostat
A cooling thermostat has a built-in super-simple model of its task-environment, one that is sufficient for it to do its job. It consists of a few variables, an on-off switch, two thresholds, and two simple rules that tie these together: the sensed temperature variable, the upper threshold for when to turn the cooler on, and the lower threshold for when to turn the cooler off. The thermostat never has to decide which model is appropriate; it is “baked into it” by the thermostat's designer. It is not a predictive (forward) model; it is a strict feedback model.
The thermostat cannot change its model, this can only be done by the user opening it and twiddling some thumbscrews.
Limitation Because the system designer knows beforehand which signals cause perturbations in o and can hard-wire these from the get-go in the thermostat, there is no motivation to create a model-creating controller (it is much harder!).
Other “state of the art” systems The same is true for expert systems, subsumption robots, and general game playing machines: their model is too tightly baked into their architecture by the designer. Yes, there are some variables in these that can be changed automatically “after the machine leaves the lab” (without designer intervention), but they are parameters inside a (more or less) already-determined model.
What Can We Do? Feed-forward control !
This requires models.
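
The thermostat's baked-in model fits in a dozen lines. This sketch assumes a cooling thermostat whose upper threshold switches the cooler on and whose lower threshold switches it off; the class name, threshold values, and temperature readings are hypothetical. The controller can only react to the sensed temperature, and nothing in it can change the model itself.

```python
class CoolingThermostat:
    """Strict feedback controller: one variable, two thresholds, two rules."""

    def __init__(self, upper: float = 25.0, lower: float = 22.0) -> None:
        self.upper = upper    # at or above this, turn the cooler on
        self.lower = lower    # at or below this, turn the cooler off
        self.cooling = False  # the on-off switch

    def update(self, temperature: float) -> bool:
        if temperature >= self.upper:
            self.cooling = True
        elif temperature <= self.lower:
            self.cooling = False
        # Between the thresholds the switch keeps its previous setting.
        return self.cooling


thermostat = CoolingThermostat()
readings = [23.0, 25.5, 24.0, 22.5, 21.9, 23.0]
states = [thermostat.update(t) for t in readings]
print(states)
```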


Predictive ("Feedforward") Agent Architecture

Feedforward Using prediction, the change of a control signal v can be done before perturbations of v happen, so that the output of the plant o stays constant.
What it requires This requires information about the entity controlled in the form of a predictive model, and a second set of signals p that are antecedents of o and can thus be used to predict the behavior of o.
Signal behavior When predicting a time-varying signal v the frequency of change, the possible patterns of change, and the magnitude of change of v are of key importance, as are these factors for the information used to predict its behavior p.
Architecture Largely fixed for the entire lifetime of the agent.
Subsumes the reactive agent architecture, adding that the agent may learn to predict and can use predictions to steer its actions.
Learning predictive control By deploying a learner capable of learning predictive control a more robust behavior can be achieved in the controller, even with low sampling rates.
The Challenge Unless we know beforehand which signals cause perturbations in o and can hard-wire these from the get-go in the controller, the controller must search for these signals. In task-domains where the number of available signals is vastly greater than the controller's search resources, it may take an unacceptable time for the controller to find good predictive variables.
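
The contrast between reacting and predicting can be shown with a toy simulation. Everything here is assumed for illustration: the plant (o = u - disturbance), the antecedent signal p that precedes its effect on o by one time step, the feedback gain, and a perfect predictive model. With feedback only, the error appears first and is then corrected; with feedforward added, the control signal changes before the perturbation reaches o.

```python
def predict_disturbance(p_previous: float) -> float:
    """Assumed perfect predictive model: the disturbance equals the last p."""
    return p_previous


def run(use_feedforward: bool, steps: int = 10, r: float = 20.0) -> float:
    """Return the worst absolute deviation of plant output o from setpoint r."""
    p = [0.0] * 3 + [5.0] * (steps - 3)  # antecedent signal, jumps at t = 3
    u_feedback = r
    o_previous = r
    worst_error = 0.0
    for t in range(steps):
        p_previous = p[t - 1] if t > 0 else 0.0
        disturbance = p_previous                 # p affects o one step later
        u_feedback += 0.5 * (r - o_previous)     # react to the last measured error
        u = u_feedback
        if use_feedforward:
            u += predict_disturbance(p_previous)  # act before the error shows up
        o = u - disturbance
        worst_error = max(worst_error, abs(r - o))
        o_previous = o
    return worst_error


feedback_only = run(use_feedforward=False)
with_feedforward = run(use_feedforward=True)
print(feedback_only, with_feedforward)
```

In practice the model is never perfect, which is why the feedback path is kept to correct whatever the prediction misses.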


Predictive Control: Levels of Complexity

Super-simple These have fixed topology; mostly hard-wired control and perception. Prediction limited to one or a few hard-wired topics. No learning.
Basic Deterministic connections between components with small memory, where the memory makes learning and prediction possible. Small number of variables.
Example: Google Nest “intelligent” thermostat.
Complex Grossly modular architecture (< 30 modules) with multiple relationships at more than one level of control detail (LoC).
Example: Predictive management for powergrid of a state or nation.
Super-complex Large number of modules (> 30) at various sizes, each with multiple relationships to others, at more than one LoC.
Examples: No obvious ones come to mind.
Bottom Line It is difficult but possible to integrate predictive learning and behavior control into complex agent architectures using constructionist approaches (hand-coding); better methodologies are needed.


Benefits of Combined Feedforward + Feedback Controllers

Ability to Predict With the ability to predict comes the ability to deal with events that happen faster than the perception-action cycle of the controller, as well as the ability to anticipate events far into the future.

Greater Potential to Learn
A machine that is free to create, select, and evaluate models operating on observable and hypothesized variables has potential to learn anything (within the confines of the algorithms it has been given for these operations) because as long as the range of possible models is reasonably broad and general, the range of topics, tasks, domains, and worlds it could (in theory) handle becomes vastly larger than that of systems where a particular model is given to the system a priori (I say ‘in theory’ because there are other factors, e.g. the ergodicity of the environment and resource constraints, that must be favorable to e.g. the system’s speed of learning).
Greater Potential for Cognitive Growth A system that can build models of its own model creation, selection, and evaluation has the ability to improve its own nature. This is in some sense the ultimate AGI (depending on the original blueprint, original seed, and some other factors of course) and therefore we only need two levels of this, in theory, for a self-evolving potentially omniscient/omnipotent (as far as the universe allows) system.
Bottom Line AGI without both feed-forward and feed-back mechanisms is fairly unthinkable.


Reflective Agent Architecture


Architecture
Architecture changes over the history of the agent. Can demonstrate cognitive growth (cognitive developmental stages).
Subsume features of reactive and predictive architectures, adding introspection (reflection) and some form of (meta-)reasoning (as necessary for managing the introspection).
Super-simple Reflective architectures are above the complexity of super-simple architectures.
Simple Reflective architectures are above the complexity of simple architectures.

Complex
Complexity stems from interaction among parts, many of which are generated by the system at runtime and whose complexity may mirror some parts of the task-environment (if task-environment is complex, and lifetime is long, the resulting control structures are likely to be complex as well).
Examples: NARS, AERA.
Super-Complex Complexity stems from two-level (or more) systems of the complex kind, where a meta-control layer is in charge of changing the lower level (self-rewriting architecture).
Examples: AERA (in theory - not experimentally validated).





Architectures for Learning Controllers



Learning RPR Controllers: What General Machine Intelligence Calls For

RPR Reactive-predictive-reflective
What it is A controller that learns its job and is capable of reactive, predictive and reflective behavior is a learning RPR controller.
Why it's Important Learning complicates the design!
Instead of fixed information for taking action, the controller creates such information autonomously.
To be powerful, goals, subgoals, etc. will have to be generatable by the system autonomously.
Self-Generated Information includes
Goals To execute complex tasks, viable sub-goals must be generated.
Models To predict what will happen, the learner must create models of the task-environment.
Information Any relevant information needs to be stored.
Assessment of Relevance When information is abundant, the controller must figure out what is relevant.
… etc. This list is very long in a full listing.
In Complex Worlds If the world is too complicated for a designer to come up with the necessary principles, and for the system to invent those on its own from scratch, meta-mechanisms that let the learner autonomously create such principles may be sought. In the animal kingdom we call such learning “development”.



Goal


What it is
G_top = [ G_sub-1, G_sub-2, … G_sub-n, G-_sub-1, G-_sub-2, … G-_sub-n ],
i.e. a set of zero or more subgoals, where
G- indicates states to be avoided (i.e. constraints/“negative goals”) and
G = [ s_1, s_2, … s_n, R ], where each s_i describes a sub-state s ⊂ S of a (subset of a) World and
R are relevant relations between these.
Components of s s = [ v_1, v_2, … v_n, R ]: A set of patterns, expressed as variables with error/precision constraints, that refer to the world.
What we can do with it Define a task: task := goal + timeframe + initial world state
Why it is important Goals are needed for concrete tasks, and tasks are a key part of why we would want AI in the first place. For any complex tasks there will be identifiable sub-goals – talking about these in compressed manners (e.g. using natural language) is important for learning and for monitoring of task progress.
Historically speaking Goals have been with the field of AI from the very beginning, but definitions vary.

What to be aware of
We can assign goals to an AI without the AI having an explicit data structure that we can say matches the goal directly (see e.g. Braitenberg Vehicles - above). These are called implicit goals. We may conjecture that if we want an AI to be able to talk about its goals they will have to be – in some sense – explicit, that is, having a discrete representation in the AI's mind (information structures) that can be manipulated, inspected, compressed / decompressed, and related to other data structures for various purposes, in isolation (without affecting in any unnecessary, unwanted, or unforeseen way, other (irrelevant) information structures).
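
An explicit goal in the sense above can be sketched as a small data structure that can be inspected and related to other structures. This is one possible encoding under simplifying assumptions: the relations R are omitted, each variable is a numeric target with a tolerance, and the names Variable, Goal, and Task as well as the example world are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Variable:
    """One v_i: a pattern referring to the world, with a precision constraint."""
    name: str
    target: float
    tolerance: float

    def satisfied(self, world: Dict[str, float]) -> bool:
        return abs(world[self.name] - self.target) <= self.tolerance


@dataclass
class Goal:
    states: List[Variable]                                # s_1 .. s_n (R omitted)
    subgoals: List["Goal"] = field(default_factory=list)  # G_sub-1 .. G_sub-n
    avoid: List["Goal"] = field(default_factory=list)     # G-: negative goals

    def achieved(self, world: Dict[str, float]) -> bool:
        return (all(v.satisfied(world) for v in self.states)
                and all(g.achieved(world) for g in self.subgoals)
                and not any(g.achieved(world) for g in self.avoid))


@dataclass
class Task:
    """task := goal + timeframe + initial world state"""
    goal: Goal
    timeframe: float
    initial_world: Dict[str, float]


too_humid = Goal(states=[Variable("humidity", 90.0, 10.0)])
comfortable = Goal(states=[Variable("temperature", 21.0, 1.0)], avoid=[too_humid])
task = Task(goal=comfortable, timeframe=3600.0,
            initial_world={"temperature": 15.0, "humidity": 40.0})

ok = comfortable.achieved({"temperature": 21.4, "humidity": 40.0})
violated = comfortable.achieved({"temperature": 21.4, "humidity": 85.0})
print(ok, violated, task.goal.achieved(task.initial_world))
```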


Inferred GMI Architectural Features


Large architecture
From the above we can readily infer that if we want GMI, an architecture that is considerably more complex than systems being built in most AI labs today is likely unavoidable. In a complex architecture the issue of concurrency of processes must be addressed, a problem that has not yet been sufficiently resolved in present software and hardware. This scaling problem cannot be addressed by the usual “we’ll wait for Moore’s law to catch up” because the issue does not primarily revolve around speed of execution but around the nature of the architectural principles of the system and their runtime operation.
Predictable Robustness in Novel Circumstances The system must have a robustness in light of all kinds of task-environment and embodiment perturbations, otherwise no reliable plans can be made, and thus no reliable execution of tasks can ever be reached, no matter how powerful the learning capacity. This robustness must be predictable a-priori at some level of abstraction – for a wide range of novel circumstances it cannot be a complete surprise that the system “holds up” (halts). (If this were the case then the system itself would not be able to predict its chances of success in face of novel circumstances, thus eliminating an important part of the “G” from its “GMI” label.)

Graceful Degradation
Part of the robustness requirement is that the system be constructed in a way as to minimize potential for catastrophic (and unpredictable) failure. A programmer forgets to delimit a command in a compiled program and the whole application crashes; this kind of brittleness is not an option for cognitive systems operating in partially stochastic environments, where perturbations may come in any form at any time (and perfect prediction is impossible): It cannot be hyper-sensitive to tiny deviations or details; quite the contrary, it must be super-robust.
Transversal Functions The system must have pan-architectural characteristics that enable it to operate consistently as a whole, to be highly adaptive (yet robust) in its own operation across the board, including metacognitive abilities. Some functions likely to be needed to achieve this include attention, learning, analogy-making capabilities, and self-inspection (reflection).

Transversal Time
Ignoring (general) temporal constraints is not an option if we want AGI. (Move over Turing!) Time is a semantic property, and the system must be able to understand – and be able to learn to understand – time as a real-world phenomenon in relation to its own skills and architectural operation. Time is everywhere, and is different from other resources in that there is a global clock which cannot, for many task-environments, be turned backwards. Energy must also be addressed, but may not be as fundamentally detrimental to ignore as time while we are in the early stages of exploring methods for developing auto-catalytic knowledge acquisition and cognitive growth mechanisms.
Time must be a tightly integrated phenomenon in any GMI architecture in its very design - management and understanding of time cannot be retrofitted into a complex architecture!

Transversal Learning
The system should be able to learn anything and everything, which means learning is probably not best located in a particular “module” or “modules” in the architecture.
Learning must be a tightly integrated phenomenon in any GMI architecture, and must be part of the design from the beginning - implementing general learning into an existing architecture is out of the question: Learning cannot be retrofitted to a complex architecture!
Transversal Resource Management Resource management - attention - must be tightly integrated.
Attention must be part of the system design from the beginning - retrofitting resource management into an architecture that didn't include this from the beginning is next to impossible!
Transversal Analogies Analogies must be included in the system design from the beginning - retrofitting the ability to make general analogies between anything and everything is impossible!

Transversal Self-Inspection
Reflectivity, as it is known, is a fundamental property of knowledge representation. The fact that we humans can talk about the stuff that we think about, and can talk about the fact that we talk about the fact that we can talk about it, is a strong implication that reflectivity is a key property of AGI systems.
Reflectivity must be part of the architecture from the beginning - retrofitting this ability into any architecture is virtually impossible!
Transversal Integration A general-purpose system must tightly and finely coordinate a host of skills, including their acquisition, transitions between skills at runtime, how to combine two or more skills, and transfer of learning between them over time at many levels of temporal and topical detail.



Learning RPR Controllers Must Also Classify


Classification
To act, one needs to know what to act on.
To know what to act on, one needs to classify.
To classify, one needs to sense.
To sense, one needs measurement devices.
Learning to Classify In a world with a lot of variation, a learning controller must also learn to classify.
To learn to classify, one must learn what to control to classify appropriately.
Learning to control, therefore, requires learning two kinds of classification (at the very least).
ANNs Contemporary ANNs (e.g. Deep Neural Networks, Double-Deep Q-Learners, etc.) can only do classification. They can only do classification by going through a long continuous training session, after which the learning is turned off.



Example of misguided use of classification (image: tesla-classification-fail1.jpg; video available for download).





2025©K.R.Thórisson

/var/www/cadia.ru.is/wiki/data/pages/public/t-720-atai/atai-25/agents_and_control.txt · Last modified: 2025/01/06 18:35 by thorisson
