Center for Analysis and Design of Intelligent Agents

T-720-ATAI-2016

Lecture Notes, F-11 16.02.2016

Key Features of Feedforward Predictive Control

Feedforward	Based on correct and accurate predictions, a control signal <m>v</m> can be changed before perturbations of <m>v</m> happen, so that the output of the plant <m>o</m> stays constant.
What it requires	This requires information about <m>v</m> in the form of a predictive model – or simply model, and a set of signals that can map this model's state to the current state of <m>v</m>.
Learning predictive control	By deploying a learner capable of learning predictive control over a period of time, a more robust behavior can be achieved in the controller, even if each instance of the application of control is limited by sampling rates lower than the frequency of change in <m>v</m>.
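
The difference between reacting to and anticipating a perturbation can be sketched in a few lines. This is a toy illustration only: the plant (output = control + disturbance, no dynamics), the gain, and the names `simulate`, `predict` and `model` are assumptions made for the example, not part of the lecture material.

```python
def simulate(disturbance, predict=None, setpoint=0.0, gain=0.5):
    """Return the plant output at each step.

    Hypothetical plant: output = control + disturbance (no dynamics).
    predict(t) is an optional model that forecasts the disturbance at step t.
    """
    outputs = []
    correction = 0.0  # feedback term, built from past errors only
    for t, d in enumerate(disturbance):
        # Feedforward acts before the perturbation shows up in the output.
        feedforward = -predict(t) if predict else 0.0
        control = setpoint + feedforward + correction
        output = control + d
        outputs.append(output)
        # Feedback can only react after the error has been observed.
        correction -= gain * (output - setpoint)
    return outputs


# A step perturbation that begins at t = 5.
disturbance = [0.0] * 5 + [1.0] * 5
model = lambda t: 1.0 if t >= 5 else 0.0  # an accurate predictive model (assumed given)

feedback_only = simulate(disturbance)
with_feedforward = simulate(disturbance, predict=model)
```

With an accurate model the output never leaves the setpoint, whereas the feedback-only run shows the full perturbation at t = 5 and then only gradually recovers.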

What Are Models?

Model	A model of something is an information structure that behaves in some ways like the thing being modeled. ‘Model’ here means exactly the same as the word when used in the vernacular; look up any dictionary definition and that is what it means. A model of something is not the thing itself; it is in some way a ‘mirror image’ of it, typically with some unimportant details removed, and represented in a way that allows for various manipulations for the purpose of making predictions (answering questions), where the form of allowed manipulations is particular to the representation of the model and the questions to be answered.
Example	A model of Earth sits on a shelf in my daughter’s room. With it I can answer questions about the gross layout of continents, and names assigned to various regions as they were around 1977 (because that’s when I got it for my confirmation). A model requires a process for using it. In this example that process is a human who can read and manipulate smallish objects.
Computational models	A typical type of question to be answered with computational (mathematical) models is the what-if question, and a typical method of manipulation is running simulations (producing deductions). Along with this we need the appropriate computational machine.
Model (again)	A 'model' in this conception has a target phenomenon that it applies to, and it has a form of representation, comprehensiveness, and level of detail; these are the primary features that determine what a model is good for. A computational model of the world in raw machine-readable form is not very efficient for quickly identifying all the countries adjacent to Switzerland - for that a traditional globe is much better.
Model acquisition	The ability to create models of (observed) phenomena.
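
The point that the form of representation determines what a model is good for can be illustrated with the Switzerland question. The sketch below holds the same (deliberately partial) set of border facts in two forms; the function names and the choice of data structures are assumptions made for illustration.

```python
from collections import defaultdict

# Representation 1: a flat list of border facts (a small, partial sample).
borders = [
    ("France", "Spain"), ("Switzerland", "France"), ("Germany", "Switzerland"),
    ("Austria", "Switzerland"), ("Italy", "Switzerland"),
    ("Switzerland", "Liechtenstein"), ("Germany", "Austria"),
]


def neighbours_by_scan(country):
    """Answer the question by scanning every fact in the raw list."""
    return {b if a == country else a for a, b in borders if country in (a, b)}


# Representation 2: the same facts, reorganised around the question being asked.
adjacency = defaultdict(set)
for a, b in borders:
    adjacency[a].add(b)
    adjacency[b].add(a)


def neighbours_by_lookup(country):
    """Answer the same question with a single lookup."""
    return set(adjacency.get(country, ()))
```

Both functions return the same answer; what differs is how much manipulation the representation demands before the answer is available.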

Causal Chains

Consider a potential physical causal relationship between 7 variables in a task-environment. The 7 variables might be causally related to each other in various ways, such that a change in one causes changes in others. Partial observation of their behavior may provide insufficient clues to generate a complete and correct model of their relations, but some data is better than no data.
Here we'll say that <m>V_1</m> has a causal connection to <m>V_2</m>, which is causally coupled to <m>V_3</m>. Variables <m>[V_1, V_7]</m> hold particular values over the period of the observed causal relationship between the others – they can be considered constants.

Causal relations between variables <m>[V_1, V_6]</m>. I, II and III, left-hand side: Causes; right-hand side: Effects.
Part I depicts physical causal relationships between variables (A – linear relation; B – logarithmic relation; C – hyperbolic relation). Alternatively, part I may represent theoretical models of physical or hypothetical constructs.
In II, these relationships have been implemented as three modular simulation models, one module per causal factor and one per measured effect. The functions a and b connecting the modules have also been quantized from what they were in I. The left-hand side represents transmitting modules and the right-hand side receiving modules.
In III, two modules are used to represent all causal relationships of I.
In both II and III the modules' internal state represents the state of the causes and effects in I, respectively. Modularization is thus theoretically independent of the theoretical model. However, implementations following either II or III may produce different results due to artifacts in how the simulations are scheduled on a processor (for instance, if the implementation in Y-III has shortcomings in its scheduling, variable <m>v_2</m> may be updated at a different rate than <m>v_4</m> and <m>v_6</m>, and since the latter two are updated together this may cause a spurious correlation between them).
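
A minimal sketch of such a scheduling artifact follows. Since the figure is not reproduced here, the pairing of causes and effects (v1 to v2 linear, v3 to v4 logarithmic, v5 to v6 hyperbolic), the coefficients, and the `period` mechanism are all assumptions chosen for illustration.

```python
import math

# Assumed pairing: v1 -> v2 linear, v3 -> v4 logarithmic, v5 -> v6 hyperbolic.
# Coefficients are arbitrary.
RELATIONS = {
    "v2": ("v1", lambda x: 2.0 * x),
    "v4": ("v3", lambda x: math.log(x)),
    "v6": ("v5", lambda x: 1.0 / x),
}


def run(causes_over_time, period):
    """Simulate the effects; period[name] = update that effect every n-th tick."""
    effects = {name: None for name in RELATIONS}
    history = []
    for tick, causes in enumerate(causes_over_time):
        for name, (cause, fn) in RELATIONS.items():
            if tick % period[name] == 0:
                effects[name] = fn(causes[cause])
        history.append(dict(effects))  # snapshot of all effect states at this tick
    return history


inputs = [{"v1": t, "v3": t, "v5": t} for t in (1.0, 2.0, 3.0, 4.0)]
ideal = run(inputs, {"v2": 1, "v4": 1, "v6": 1})
skewed = run(inputs, {"v2": 2, "v4": 1, "v6": 1})  # v2 is scheduled half as often
```

In the skewed run, v2 holds a stale value on every odd tick while v4 and v6 stay in step with each other, even though the underlying relations are identical in both runs.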

Learner Observing Causal Chains

A learning agent is situated so as to perceive the effects of the relationships this physical causal network implements. The agent perceives the interaction between the variables for a while, rendering some data about their relations, but not enough to be certain about it, and certainly not enough to create a complete model of it.

Autonomous Model Acquisition

What it is

The ability to create a model of some target phenomenon automatically.

Challenge

Unless we know beforehand which signals cause perturbations in <m>o</m> and can hard-wire these from the get-go in the controller, the controller must search for these signals. In task-domains where the number of available signals is vastly greater than the controller's resources available for such a search, it may take an unacceptably long time for the controller to find good predictive variables to create models with.
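
One naive way to picture this search is to score each candidate signal by how well its past values line up with later values of the output, under a fixed budget of candidates that can be examined. The sketch below uses lagged Pearson correlation as the score; this choice, the toy signals, and the names `lagged_correlation` and `best_predictor` are assumptions made for illustration, not a method given in the lecture.

```python
import math


def lagged_correlation(signal, output, lag=1):
    """Pearson correlation between signal[t] and output[t + lag] (lag >= 1)."""
    xs, ys = signal[:-lag], output[lag:]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / spread if spread else 0.0


def best_predictor(candidates, output, budget):
    """Score at most `budget` candidate signals and return the strongest one."""
    examined = list(candidates.items())[:budget]
    scores = {name: abs(lagged_correlation(sig, output)) for name, sig in examined}
    return max(scores, key=scores.get), scores


# Toy data: the output simply repeats signal "b" one step later.
b = [0, 1, 0, 0, 1, 1, 0, 1, 0, 0]
output = [0] + b[:-1]
candidates = {
    "a": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "c": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    "b": b,
}

found_all, scores_all = best_predictor(candidates, output, budget=3)
found_limited, _ = best_predictor(candidates, output, budget=2)
```

With a budget of three the search finds "b"; with a budget of two it never examines "b" and settles for a weaker candidate, which is the resource problem described above in miniature.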

Model Acquisition Function

The agent has a model generation function <m>P_M</m> implemented in its controller. The role of the function is to take observed chains of events and produce models intended to capture the events' causal relationships.

Model Generation

Based on prior observations of the variables and their temporal execution in some context, the controller's model generation function <m>P_M</m> may have captured their causal relationship in three alternative models, <m>M_1, M_2, M_3</m>, each slightly but measurably different from the others. Each can be considered a hypothesis of the actual relationship between the included variables, when in the context provided by <m>V_5, V_6</m>.

The agent's model generation mechanisms allow it to produce models of events it sees. Here it creates models (a) <m>M_1</m> and (b) <m>M_2</m>. The usefulness / utility of these models can be tested by performing an operation on the world (c) as prescribed by the models. (Ideally, when one wants to find out which one is best, the most efficient method is an (energy-preserving) intervention that can only leave one as the winner.)

Experimentation

The agent then reaches out via its senctor (<m>m</m>) and affects the world, in this case variable <m>V_2</m>. The result is perceived: in this case neither <m>V_3</m> nor <m>V_7</m> was affected (red “X”), but variable <m>V_4</m> was affected according to the predictions of <m>M_2</m> (green V). Model <m>M_1</m> is not involved in this “experiment”.

Environmental Reinforcement

The results of the observed events, in light of the agent's own perturbation, are used by the agent to give a score to the models that may be relevant to the variables in question. At a minimum, only the success and failure of the various models is recorded, but typically a host of new models is generated as a result, and possibly some are erased.
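
The minimal form of this bookkeeping, recording only success and failure, can be sketched as follows. The `Model` record, the contents assigned to M_1 and M_3, and the pruning threshold are invented for illustration; only the outcome for M_2 (it predicted the change in V_4) mirrors the experiment described above.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    cause: str                     # variable the model says to intervene on
    predicted_effects: frozenset   # variables the model expects to change
    successes: int = 0
    failures: int = 0


def reinforce(models, intervened, changed):
    """Score every model that makes a claim about the intervened variable."""
    for m in models:
        if m.cause != intervened:
            continue  # model not involved in this experiment
        if m.predicted_effects == changed:
            m.successes += 1
        else:
            m.failures += 1


def prune(models, max_failures=3):
    """Erase models that have failed too often (threshold is arbitrary)."""
    return [m for m in models if m.failures < max_failures]


models = [
    Model("M_1", "V_1", frozenset({"V_2"})),  # hypothetical content
    Model("M_2", "V_2", frozenset({"V_4"})),
    Model("M_3", "V_2", frozenset({"V_3"})),  # hypothetical content
]
reinforce(models, intervened="V_2", changed={"V_4"})
survivors = prune(models, max_failures=1)
```

A fuller mechanism would also generate new candidate models from the failures rather than only counting them.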

Two Nested Concurrent Loops

An effective and efficient agent has two simultaneous information loops at work at all times. The first (top illustration) is a feed-forward loop, in which contextually relevant models are selected for producing predictions based on the current context and immediate goals. The second (bottom illustration) is a feedback loop, wherein the environment presents the results of actions taken, informing the agent's controller whether the predictions were correct or not. Only the latter is classified as reinforcement.

Model Operations

Creating	A model of <m>phi</m> does not spring forward automatically; someone or something must create it.
Retrieving	If you have a large collection of models in a large task-environment or world (i.e. the set of variables on the phenomena you are interacting with is rather large), to use any of the models you must know what model to use with respect to what phenomenon (which means you need to recognize a context and apply the appropriate pattern matching to retrieve the appropriate model(s)). You may even have different models for different kinds of questions.
Usage	Once you select the appropriate model you must set up the computational conditions such that the goal of the model manipulation may be met. This requires reading the current situation, setting the parameters of the model appropriately to match the condition, and then running (forward) simulations to deduce what may happen.
	Example	If you are predicting where a pingpong ball will be 0.4 seconds from now so that you can hit it with your paddle, you must a) run a simulation with the appropriate parameter settings, including the speed and direction of the ball, plus the amount and direction of its spin if any, b) use the result to program the motor control sequence to get your hand and paddle in the vicinity of the ball at that future predicted state, 0.4 seconds in the future (which by now may be 0.3 seconds if it took 100 ms to do all this), c) initiate the execution of that motor sequence, and d) hope that you meet the goal of hitting the ball. The model appropriate for predicting ball direction need not be complex; it has only a few inputs (the 3D direction of the ball’s path so far, plus probably the angle at which your opponent hit the ball). It need not be a giant “pingpong model” that includes the color of a typical pingpong table, the size of typical paddles, etc.; it could simply be a model using only the relevant parameters already mentioned. The context of the pingpong game (everything else in your surroundings) gives rise to selecting the appropriate model(s) at the appropriate time(s), and makes all of it time-dependent in such a way that your arm’s motions are not lagging 10 minutes behind your intention to hit the ball but rather tend toward being synchronized with the event context in which they are intended to happen.
Evaluating	All models created for some phenomenon <m>phi</m> must be evaluated on their utility for predicting aspects of <m>phi</m>. This is done via *experience*. Note that experience can also help improve the model creation and evaluation process itself (more on this later).
Erasing	Useless and bad models must be removed.
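
The usage step in the pingpong example can be sketched as a very small forward model. This is an illustration under strong assumptions: a purely ballistic ball (no spin, drag or bounce), positions in metres with z pointing up, and hypothetical names `predict_position` and `plan_interception`.

```python
G = -9.81  # gravitational acceleration along z, in m/s^2


def predict_position(pos, vel, dt):
    """Ballistic forward model: where the ball will be dt seconds from now.

    Ignores spin, drag and bounces, which a fuller model would include.
    """
    x, y, z = pos
    vx, vy, vz = vel
    return (x + vx * dt, y + vy * dt, z + vz * dt + 0.5 * G * dt * dt)


def plan_interception(pos, vel, horizon=0.4, latency=0.1):
    """Return the predicted contact point and the time left to reach it."""
    target = predict_position(pos, vel, horizon)
    time_left = horizon - latency  # deliberation has already used up `latency`
    return target, time_left


# Ball 0.3 m above the table, moving 2 m/s forward and 1.5 m/s upward.
target, time_left = plan_interception((0.0, 0.0, 0.3), (2.0, 0.0, 1.5))
```

The model takes only the handful of parameters relevant to the prediction, and the time budget for the motor sequence shrinks by however long the prediction itself took.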

Feedback (reinforcement) may result in the deletion, rewriting, or some other modification of the original model selected for prediction. Here the feedback has resulted in a modified model <m>M{prime}_2</m>.

Requirements for Using Models

Effectiveness	Creation of models must be effective - otherwise a system will spend too much time creating useless or bad models. Making the model creation effective may require e.g. parallelizing the execution of operations on them.
Efficiency	Operations on models listed above must be efficient lest they interfere with the normal operation of the system / agent. One way to achieve temporal efficiency is to parallelize their execution, and make them simple.
Scalability	For any moderately interesting / complex environment, a vast number of models may be entertained and considered at any point in time, and thus a large set of potential models must be manipulatable by the system / agent.

Problems with Feedback-Only Controllers

Thermostat	A cooling thermostat has a built-in supersimple model of its task-environment, one that is sufficient for it to do its job. It consists of a few variables and two simple rules that tie them together: an on-off switch, the sensed temperature, the upper threshold for when to turn the cooler on, and the lower threshold for when to turn the cooler off. The thermostat never has to decide which model is appropriate; it is “baked into” it by the thermostat’s designer. It is not a predictive (forward) model; it is a strict feedback model. The thermostat cannot change its model; this can only be done by the user opening it and twiddling some thumbscrews.
Limitation	Because the system designer knows beforehand which signals cause perturbations in <m>o</m> and can hard-wire these from the get-go in the thermostat, there is no motivation to create a model-creating controller (it is much harder).
Other “state of the art” systems	The same is true for expert systems, subsumption robots, and general game playing machines: their model is too tightly baked into their architecture by the designer. Yes, there are some variables in these that can be changed automatically “after the machine leaves the lab” (without designer intervention), but they are parameters inside a (more or less) already-determined model.
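
The thermostat's entire “model” fits in a few lines, which makes the limitation concrete. The sketch assumes a cooling device that is switched on above the upper threshold and off below the lower one; the class name and the temperatures are made up for the example.

```python
class CoolingThermostat:
    """Feedback-only controller: two thresholds, one switch, two rules."""

    def __init__(self, lower, upper):
        self.lower, self.upper = lower, upper  # fixed by the designer or user
        self.cooling = False

    def step(self, temperature):
        if temperature > self.upper:
            self.cooling = True    # rule 1: too warm, switch the cooler on
        elif temperature < self.lower:
            self.cooling = False   # rule 2: cool enough, switch the cooler off
        return self.cooling        # between thresholds: keep the current state


thermostat = CoolingThermostat(lower=20.0, upper=24.0)
states = [thermostat.step(t) for t in (22.0, 25.0, 22.0, 19.0, 22.0)]
```

Nothing in the controller predicts anything, and nothing in it can change the rules; only the two threshold parameters can be adjusted from outside.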

Benefits of Combined Feed-forward + Feedback Controllers

Greater Potential to Learn

A machine that is free to create, select, and evaluate models, on the other hand, has the potential to learn anything (within the confines of the algorithms it has been given for these operations): as long as the range of possible models is reasonably broad and general, the set of topics, tasks, domains, and worlds it could (in theory) operate in becomes vastly larger than for systems where a particular model is given to the system a priori. (I say ‘in theory’ because other factors, e.g. the ergodicity of the environment and resource constraints, must be favorable to e.g. the system’s speed of learning.)

Greater Potential for Cognitive Growth

A system that can build models of its own model creation, selection, and evaluation has the ability to improve its own nature. This is in some sense the ultimate AGI (depending on the original blueprint, original seed, and some other factors of course) and therefore we only need two levels of this, in theory, for a self-evolving potentially omniscient/omnipotent (as far as the universe allows) system.

EOF
