|Feedforward||Based on correct and accurate predictions, the change of a control signal can be done before perturbations of happens, so that the output of the plant stays constant.|
|What it requires||This requires information about in the form of a predictive model – or simply model, and a set of signals that can map this model's state to the current state of .|
|Learning predictive control||By deploying a learner capable of learning predictive control over a period of time, a more robust behavior can be achieved in the controller, even if each instance of the application of control is limited by sampling rates lower than the frequency of change in .|
|Model|| A model of something is an information structure that behaves in some ways like the thing being modeled.
‘Model’ here means exactly what the word means in the vernacular; look up any dictionary definition. A model of something is not the thing itself; it is in some way a ‘mirror image’ of it, typically with some unimportant details removed, and represented in a way that allows for various manipulations for the purpose of making predictions (answering questions), where the forms of allowed manipulation are particular to the representation of the model and the questions to be answered.
|Example||A model of Earth sits on a shelf in my daughter’s room. With it I can answer questions about the gross layout of continents, and the names assigned to various regions as they were around 1977 (because that’s when I got it for my confirmation). A model requires a process for using it. In this example that process is a human who can read and manipulate smallish objects.|
|Computational models||A typical type of question to be answered with computational (mathematical) models is the what-if question, and a typical method of manipulation is running simulations (producing deductions). Along with this we need the appropriate computational machine.|
|Model (again)||A 'model' in this conception has a target phenomenon that it applies to, and it has a form of representation, comprehensiveness, and level of detail; these are the primary features that determine what a model is good for. A computational model of the world in raw machine-readable form is not very efficient for quickly identifying all the countries adjacent to Switzerland - for that a traditional globe is much better.|
|Model acquisition||The ability to create models of (observed) phenomena.|
| Consider a potential physical causal relationship between 7 variables in a task-environment. The 7 variables might be causally related to each other in various ways, such that a change in one causes changes in others. Partial observation of their behavior may provide insufficient clues to generate a complete and correct model of their relations, but some data is better than no data.
Here we'll say that has a causal connection to , which is causally coupled to . Variables hold particular values over the period of the observed causal relationship between the others – they can be considered constants.
| Causal relations between variables . I, II and III, left-hand side: Causes; right-hand side: Effects.
Part I depicts physical causal relationships between variables (A – linear relation; B – logarithmic relation; C – hyperbolic relation). Alternatively, part I may represent theoretical models of physical or hypothetical constructs.
In II, these relationships have been implemented as three modular simulation models, one module per causal factor and one per measured effect. The functions a and b connecting the modules have also been quantized from what they were in I. The left-hand side represents transmitting modules and the right-hand side receiving modules.
In III, two modules are used to represent all causal relationships of I.
In both II and III the modules' internal state represents the state of the causes and effects in I, respectively. Modularization is thus theoretically independent of the theoretical model. However, implementations following either II or III may produce different results due to artifacts in how the simulations are scheduled on a processor (for instance, if the implementation of in Y-III has shortcomings in its scheduling, variable may be updated at a different rate than and , and since the latter two are updated together this may cause a spurious correlation between them).
|A learning agent is situated so as to perceive the effects of the relationships this physical causal network implements. The agent perceives the interaction between the variables for a while, rendering some data about their relations, but not enough to be certain about it, and certainly not enough to create a complete model of it.|
|What it is||The ability to create a model of some target phenomenon automatically.|
|Challenge||Unless we know beforehand which signals cause perturbations in and can hard-wire these from the get-go in the controller, the controller must search for these signals. In task-domains where the number of available signals is vastly greater than the controller's resources available to do such search, it may take an unacceptable time for the controller to find good predictive variables to create models with.|
|The agent has a model generation function implemented in its controller. The role of the function is to take observed chains of events and produce models intended to capture the events' causal relationships.|
|Based on prior observations, of the variables and their temporal execution in some context, the controller's model generation function may have captured their causal relationship in three alternative models, , each slightly but measurably different from the others. Each can be considered a hypothesis of the actual relationship between the included variables, when in the context provided by .|
|The agent's model generation mechanisms allow it to produce models of events it sees. Here it creates models (a) and (b) . The usefulness / utility of these models can be tested by performing an operation on the world (c) as prescribed by the models. (Ideally, when one wants to find out which one is best, the most efficient method is an (energy-preserving) intervention that can only leave one as the winner.)|
|The agent then reaches out via its senctor () and affects the world, in this case variable . The result is perceived, in this case or were not affected (red “X”), but variable was affected according to the predictions of (green V). Model is not involved in this “experiment”.|
|The results of the observed events, in light of the agent's own perturbation, are used by the agent to give a score to the models that may be relevant to the variables in question. At a minimum, only the success and failure of the various models are recorded, but typically a host of new models is generated as a result, and possibly some are erased.|
|An effective and efficient agent has two simultaneous information loops at work at all times. The first (top illustration) is a feed-forward loop, where contextually relevant models are selected for producing predictions based on the current context and immediate goals. The second (bottom illustration) is a feedback loop, wherein the environment presents the results of actions taken, informing the agent's controller whether the predictions were correct or not. Only the latter is classified as reinforcement.|
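
The scoring step described above can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the variable names (`v1` to `v5`), the three models, the `toy_world` ground truth, and the simple success/failure counters are hypothetical stand-ins, not the notation or mechanism of the original figures. The sketch only shows the idea that an intervention on one variable scores the models that make a prediction about it and leaves the others untouched.

```python
from dataclasses import dataclass


@dataclass
class Model:
    """A hypothesis: perturbing `cause` is predicted to change `effect`."""
    name: str
    cause: str
    effect: str
    successes: int = 0
    failures: int = 0


def toy_world(perturbed: str) -> set:
    """Hypothetical hidden ground truth: which variables change when one is perturbed."""
    true_links = {"v1": {"v3"}, "v3": {"v5"}}
    return true_links.get(perturbed, set())


def run_experiment(models, perturbed, world=toy_world):
    """Intervene on one variable, observe the outcome, and score the relevant models."""
    changed = world(perturbed)
    for m in models:
        if m.cause != perturbed:
            continue  # this model makes no prediction about the experiment
        if m.effect in changed:
            m.successes += 1  # prediction confirmed by feedback
        else:
            m.failures += 1   # prediction contradicted by feedback
    return changed


# Three alternative hypotheses (all hypothetical).
models = [
    Model("M1", cause="v1", effect="v2"),
    Model("M2", cause="v1", effect="v3"),
    Model("M3", cause="v4", effect="v5"),
]

run_experiment(models, perturbed="v1")
for m in models:
    print(m.name, m.successes, m.failures)
```

In this toy run one model's prediction is confirmed, one is contradicted, and the third is not involved in the experiment, mirroring the situation described above. A fuller agent would also generate new models and erase poor ones based on these scores.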
|Creating||A model of does not spring forth automatically; someone or something must create it.|
|Retrieving||If you have a large collection of models in a large task-environment or world (i.e. the set of variables on the phenomena you are interacting with is rather large), to use any of the models you must know what model to use with respect to what phenomenon (which means you need to recognize a context and apply the appropriate pattern matching to retrieve the appropriate model(s)). You may even have different models for different kinds of questions.|
|Usage||Once you select the appropriate model you must set up the computational conditions such that the goal of the model manipulation may be met. This requires reading the current situation, setting the parameters of the model appropriately to match the condition, and then running (forward) simulations to deduce what may happen.|
|Example|| If you are predicting where a pingpong ball will be 0.4 seconds from now so that you can hit it with your paddle you must
a) run a simulation with the appropriate parameter settings, including the speed and direction of the ball, plus amount and direction of its spin if any,
b) use the result to program the motor control sequence to get your hand and paddle in the vicinity of the ball at that future predicted state, 0.4 seconds in the future (which now may be 0.3 if it took 100 msecs to do all this),
c) initiate the execution of that motor sequence, and
d) hope that you met the goal of hitting the ball.
The model appropriate for predicting ball direction need not be complex; it has only a few inputs (the 3D direction of the ball’s path so far, plus probably the angle at which your opponent hit the ball). It need not be a giant “pingpong model” that includes the color of a typical pingpong table, the size of typical paddles, etc.; it could simply be a model using only the relevant parameters already mentioned. The context of the pingpong game (everything else in your surroundings) gives rise to selecting the appropriate model(s) at the appropriate time(s), and makes all of it time-dependent, so that your arm’s motions do not lag 10 minutes behind your intention to hit the ball but rather tend toward being synchronized with the event context in which they are intended to happen.
|Evaluating||All models created for some phenomenon must be evaluated on their utility for predicting aspects of . This is done via experience. Note that experience can also help improve the model creation and evaluation process itself (more on this later).|
|Erasing||Useless and bad models must be removed.|
|Feedback (reinforcement) may result in the deletion, rewriting, or some other modification of the original model selected for prediction. Here the feedback has resulted in a modified model .|
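
Steps (a) and (b) of the pingpong example above can be sketched as a small forward simulation in Python. This is a minimal sketch under stated assumptions, not a model taken from the source: it uses a hypothetical two-dimensional ball state, plain gravity, and explicit Euler integration, and it ignores spin, air drag, and bounces. The names `BallState`, `simulate`, and `plan_hit` are invented for illustration.

```python
from dataclasses import dataclass

G = 9.81  # gravitational acceleration in m/s^2


@dataclass
class BallState:
    x: float   # horizontal position, m
    z: float   # height, m
    vx: float  # horizontal velocity, m/s
    vz: float  # vertical velocity, m/s


def simulate(state: BallState, horizon: float, dt: float = 0.001) -> BallState:
    """Forward-simulate the ball for `horizon` seconds using explicit Euler steps."""
    steps = round(horizon / dt)
    x, z, vx, vz = state.x, state.z, state.vx, state.vz
    for _ in range(steps):
        x += vx * dt
        z += vz * dt
        vz -= G * dt  # gravity is the only force in this simplified model
    return BallState(x, z, vx, vz)


def plan_hit(state: BallState, horizon: float = 0.4, compute_latency: float = 0.1):
    """Predict where the ball will be and how much time remains to move the paddle.

    `compute_latency` stands for the time spent perceiving and simulating,
    which shortens the time left for the motor sequence.
    """
    target = simulate(state, horizon)
    time_left = horizon - compute_latency
    return target, time_left


# Hypothetical observed state: ball 0.3 m above the table, moving toward the player.
observed = BallState(x=0.0, z=0.3, vx=5.0, vz=2.0)
target, time_left = plan_hit(observed)
print(f"aim for x={target.x:.2f} m, z={target.z:.2f} m with {time_left:.1f} s left")
```

The model has only four state variables and one parameter, which is the point made above: a small model with the relevant parameters is enough, provided it is selected and run in time.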
|Effectiveness||Creation of models must be effective - otherwise a system will spend too much time creating useless or bad models. Making the model creation effective may require e.g. parallelizing the execution of operations on them.|
|Efficiency||Operations on models listed above must be efficient lest they interfere with the normal operation of the system / agent. One way to achieve temporal efficiency is to parallelize their execution, and make them simple.|
|Scalability||For any moderately interesting / complex environment, a vast number of models may be entertained and considered at any point in time, and thus a large set of potential models must be manipulatable by the system / agent.|
|Thermostat||A cooling thermostat has a built-in supersimple model of its task-environment, one that is sufficient for it to do its job. It consists of a few variables, an on-off switch, two thresholds, and two simple rules that tie these together: the sensed temperature variable, the upper threshold for when to turn the cooler on, and the lower threshold for when to turn the cooler off. The thermostat never has to decide which model is appropriate; it is “baked into” it by the thermostat’s designer. It is not a predictive (forward) model; it is a strict feedback model. The thermostat cannot change its model; this can only be done by the user opening it and twiddling some thumbscrews.|
|Limitation||Because the system designer knows beforehand which signals cause perturbations in and can hard-wire these from the get-go in the thermostat, there is no motivation to create a model-creating controller (it is much harder).|
|Other “state of the art” systems||The same is true for expert systems, subsumption robots, and general game playing machines: their model is too tightly baked into their architecture by the designer. Yes, there are some variables in these that can be changed automatically “after the machine leaves the lab” (without designer intervention), but they are parameters inside a (more or less) already-determined model.|
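
The thermostat's baked-in model described above is small enough to write out in full. The following Python sketch is illustrative only and assumes a cooling device that is switched on at the upper threshold and off at the lower one; the class name, the threshold values, and the temperature readings are hypothetical.

```python
class CoolingThermostat:
    """A strict feedback controller: one sensed variable, two thresholds, two rules."""

    def __init__(self, upper: float, lower: float):
        if lower >= upper:
            raise ValueError("lower threshold must be below upper threshold")
        self.upper = upper       # at or above this temperature, switch cooling on
        self.lower = lower       # at or below this temperature, switch cooling off
        self.cooling_on = False  # the on-off switch

    def step(self, sensed_temperature: float) -> bool:
        """Apply the two fixed rules to the current reading; no prediction is made."""
        if sensed_temperature >= self.upper:
            self.cooling_on = True
        elif sensed_temperature <= self.lower:
            self.cooling_on = False
        # Between the thresholds the switch keeps its previous state.
        return self.cooling_on


thermostat = CoolingThermostat(upper=26.0, lower=23.0)
readings = [24.0, 26.5, 25.0, 22.5, 24.0]
states = [thermostat.step(t) for t in readings]
print(states)  # [False, True, True, False, False]
```

Nothing in this controller can create, select, or revise a model. Only the two threshold parameters can change, and only from outside, which is the limitation the rows above describe.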
|Greater Potential to Learn||A machine that is free to create, select, and evaluate models, on the other hand, has the potential to learn anything (within the confines of the algorithms it has been given for these operations): as long as the range of possible models is reasonably broad and general, the set of topics, tasks, domains, and worlds it could (in theory) operate in becomes vastly larger than for systems where a particular model is given to the system a priori. (I say ‘in theory’ because there are other factors, e.g. the ergodicity of the environment and resource constraints, that must be favorable to e.g. the system’s speed of learning.)|
|Greater Potential for Cognitive Growth||A system that can build models of its own model creation, selection, and evaluation has the ability to improve its own nature. This is in some sense the ultimate AGI (depending on the original blueprint, original seed, and some other factors of course) and therefore we only need two levels of this, in theory, for a self-evolving potentially omniscient/omnipotent (as far as the universe allows) system.|
2016©K. R. Thórisson