T-720-ATAI-2019
Lecture Notes: AERA
High-Level View of AERA
General Form of AERA Models
Autonomous Model Acquisition
What it is | The ability to create a model of some target phenomenon automatically. |
Challenge | Unless we (the designers of an intelligent controller) know beforehand which signals from the controller cause desired perturbations in <m>o</m>, and can hard-wire these from the start, the controller must find such signals itself. In task-domains where the number of available signals is vastly greater than the resources the controller can devote to the search, finding good predictive variables to create models with may take an unacceptably long time. Formally, <m>V_{te} \gg V_{mem}</m>, where <m>V_{te}</m> is the total number of potentially observable and manipulable variables in the task-environment and <m>V_{mem}</m> is the number of variables the agent can hold in its memory at any point in time. |
Model Acquisition Function
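The challenge above can be made concrete with a small sketch. This is a hypothetical illustration only, not AERA's actual model-acquisition function (AERA is implemented in Replicode); the names predictive_score, select_model_variables, v_mem and the toy environment are invented for this example. The idea: with <m>V_{te} \gg V_{mem}</m>, the agent must rank candidate variables by how well they predict the target <m>o</m> and keep only the few it can afford to model.

<code python>
import random

def predictive_score(history, var, target):
    """Fraction of changes in `var` at time t that are followed by a
    change in `target` at time t+1."""
    hits, changes = 0, 0
    for t in range(len(history) - 2):
        if history[t + 1][var] != history[t][var]:
            changes += 1
            if history[t + 2][target] != history[t + 1][target]:
                hits += 1
    return hits / changes if changes else 0.0

def select_model_variables(history, variables, target, v_mem):
    """Keep only the v_mem best-scoring predictors: the agent cannot
    hold a model for every variable in the task-environment."""
    scored = sorted(((predictive_score(history, v, target), v)
                     for v in variables), reverse=True)
    return scored[:v_mem]

# Toy run: 100 observable variables; a hidden one-step causal link
# makes o follow v7. The agent can only afford 5 candidate models.
random.seed(0)
prev = {f"v{i}": random.randint(0, 1) for i in range(100)}
prev["o"] = 0
history = [prev]
for _ in range(500):
    cur = {f"v{i}": random.randint(0, 1) for i in range(100)}
    cur["o"] = prev["v7"]            # o lags v7 by one step
    history.append(cur)
    prev = cur

print(select_model_variables(history, [f"v{i}" for i in range(100)], "o", v_mem=5))
</code>

In the toy run only v7 actually drives <m>o</m>, so it scores near 1.0 while the other 99 variables hover around chance. The point is that with a realistic <m>V_{te}</m> even this brute-force scan becomes infeasible, which is why model acquisition must be resource-aware.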
Model Generation & Evaluation
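A minimal, hypothetical sketch of the generate-and-test cycle this heading refers to (all names are invented here; AERA's actual models are Replicode structures and are considerably richer, e.g. usable both forward for prediction and backward for goal pursuit): candidate models are hypothesized from observed event successions, every prediction a model makes is scored against what actually happens, and persistently poor predictors are discarded.

<code python>
class Model:
    """Hypothetical stand-in for an AERA-style model: 'when lhs is
    observed, predict rhs next'. Tracks its own prediction record."""
    def __init__(self, lhs, rhs):
        self.lhs, self.rhs = lhs, rhs
        self.evidence = 0        # number of predictions made
        self.successes = 0       # number of predictions that came true

    def confidence(self):
        return self.successes / self.evidence if self.evidence else 0.0

def generate(event, next_event):
    # Naive generation: hypothesize that the observed succession
    # (event, then next_event) is a repeatable regularity.
    return Model(event, next_event)

def evaluate(models, event, next_event, threshold=0.3, min_evidence=10):
    """Score every model whose lhs matched the current event, then
    prune models whose confidence has settled below threshold."""
    for m in models:
        if m.lhs == event:
            m.evidence += 1
            if m.rhs == next_event:
                m.successes += 1
    return [m for m in models
            if m.evidence < min_evidence or m.confidence() >= threshold]
</code>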
Demo Of AERA In Action
Demos | The most complex demo of an AERA system was the S1 agent learning to do an interview (in the EU-funded HUMANOBS research project). Main HUMANOBS page |
TV Interview | The agent S1 watched two humans engaged in a “TV-style” interview about the recycling of six everyday objects made of various materials. |
Data | S1 received realtime timestamped data from the 3D movement of the humans (digitized via appropriate tracking methods at 20 Hz), words generated by a speech recognizer, and prosody (fundamental pitch of voice at 60 Hz, along with timestamped starts and stops of speech); see the data-record sketch below this table. |
Seed | The seed consisted of a handful of top-level goals for each agent in the interview (interviewer and interviewee), and a small knowledge base about entities in the scene. |
What Was Given |
* actions: grab, release, point-at, look-at (defined as event types constrained by geometric relationships)
* stopping the interview clock ends the session
* objects: glass-bottle, plastic-bottle, cardboard-box, wooden-cube, newspaper
* objects have properties (e.g. made-of)
* interviewee-role
* interviewer-role
* Model for interviewer
  * top-level goal of interviewer: prompt interviewee to communicate
  * in the interruption case: an imposed interview duration time limit
* Models for interviewee
  * top-level goal of interviewee: to communicate
  * never communicate unless prompted
  * communicate about properties of objects being asked about, for as long as there still are properties available
  * don’t communicate about properties that have already been mentioned
What Had To Be Learned |
GENERAL INTERVIEW PRINCIPLES
* word order in sentences (with no a-priori grammar)
* disambiguation via co-verbal deictic references
* the roles of interviewer and interviewee
* an interview involves serialization of joint actions (a series of Qs and As by each participant)
MULTIMODAL COORDINATION & JOINT ACTION
* taking turns speaking
* co-verbal deictic reference:
  * manipulation as deictic reference
  * looking as deictic reference
  * pointing as deictic reference
INTERVIEWER
* to ask a series of questions, without repeating questions about objects already addressed
* “thank you” stops the interview clock
* interruption: saying “hold on, let’s go to the next question” can be used to keep the interview within its time limit
INTERVIEWEE
* what to answer based on what is asked
* an object property is not spoken of if it is not asked for
* a silence from the interviewer means “go on”
* a nod from the interviewer means “go on”
Result |
After having observed two humans interact in a simulated TV interview for some time, the AERA agent S1 takes the role of interviewee, continuing the interview in precisely the same fashion as before, answering the questions of the human interviewer (see videos HH.no_interrupt.mp4 and HH.interrupt.mp4 for the human-human interaction that S1 observed; see HM.no_interrupt.mp4 and HM.interrupt.mp4 for other examples of the skills that S1 acquired by observation). In the “interrupt” scenario S1 has learned to use interruption as a method to keep the interview from going over a pre-defined time limit. The results are recorded in a set of three videos:
* Human-human interaction (what S1 observes)
* Human-S1 interaction (S1 interviewing a human)
* S1-Human interaction (S1 being interviewed by a human)
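The Data row above lists three asynchronous, timestamped input streams. A minimal sketch of how such multimodal samples might be represented (record names and fields are hypothetical; S1's actual Replicode encoding differs):

<code python>
from dataclasses import dataclass
from typing import Tuple

@dataclass
class MovementSample:          # 3D body tracking, sampled at 20 Hz
    timestamp: float           # seconds since session start
    joint: str                 # e.g. "right-hand"
    position: Tuple[float, float, float]

@dataclass
class WordEvent:               # output of the speech recognizer
    timestamp: float
    word: str

@dataclass
class ProsodySample:           # fundamental pitch of voice, 60 Hz
    timestamp: float
    pitch_hz: float
    speaking: bool             # timestamped starts/stops of speech
</code>

Keeping every sample on a common clock is what allows a learner to correlate events across streams, e.g. a pointing gesture co-occurring with a spoken deictic such as “this one”.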