T-720-ATAI-2019 Main
Links to Lecture Notes

T-720-ATAI-2019

Lecture Notes, W9: Autonomy, AERA, NARS





Autonomy

Autonomy Implies that the machine “does it alone”.
Predictability Predictability is a desired feature of any useful AI.
An autonomous machine that is not predictable has severely limited utility.
Reliability Reliability is another desired feature of any useful AI.
An autonomous machine with low reliability has severely compromised utility.
Explainability Explainability is a third desired feature of any useful AI.
An autonomous machine whose actions cannot be explained also cannot be predicted.



Autonomous-X

What It Is Autonomy is a key feature of intelligence - the ability of a system to “act on its own”.
Autonomous-X is anything that “autonomy” is relevant for or applies to in a system's operation.
Why It Is Important This table highlights some key features of autonomy that any human-level intelligence probably must have. We say “probably” because no such system exists yet and there is no proper theory of intelligence, so we cannot be sure.
Take Action Any autonomy requires some primitive actions - some action or inaction must be an option for an autonomous system, otherwise no other features are relevant. Robots are a typical example of what is meant by “taking action”: an arm or a hand moves, or stands still, as a result of a computation - a decision made by a controller whose autonomy we are about to inspect.
Learning We already have machines that learn autonomously, although most of the available methods are limited in that they (a) rely heavily on quality selection of learning material/environments, (b) require careful setup of training, and (c) require careful and detailed specification of how progress is evaluated.
Selection of Variables Few if any existing learning methods can decide for themselves, given a set of variables with potential relevance for their learning, whether any one of them (a) is relevant, (b) if so, how much, and (c) in what way it is relevant.
Goal-Generation Very few if any existing learning methods can generate their own (sub-) goals. Of those that might be said to be able to, none can do so freely for any topic or domain.
Control of Resources By “resources” we mean computing power (think time), time, and energy, at the very least. Few if any existing learning methods are any good at (a) controlling their resource use, (b) planning for it, (c) assessing it, or (d) explaining it.
Self-Inspection Virtually no systems exist as of yet that have been demonstrated to inspect (measure, quantify, compare, track, make use of) their own development for use in their continued growth - whether in learning, goal-generation, selection of variables, resource usage, or other self-X.
Self-Growth No system has yet been demonstrated to autonomously manage its own self-growth. Self-growth is necessary for autonomous learning in task-environments whose complexity is far higher than that of the controller operating in them. It is even more important where certain bootstrapping thresholds must be reached before a safe transition into more powerful/different learning schemes.
For instance, if only a few bits of knowledge can be programmed into a controller's seed (“DNA”), because we want it to have maximal flexibility in what it can learn, then we want to put something there that is essential to protect the controller while it develops more sophisticated learning. An example is that nature programmed human babies with an innate fear of heights.



Autonomy & Closure

Autonomy The ability to do tasks without interference / help from others in a particular task-environment in a particular world.
Cognitive Autonomy Refers to the mental (control-) independence of agents - the more independent they are (of their designers, of outside aid, etc.) the more autonomous they are. Systems without it could hardly be considered to have general intelligence.
Structural Autonomy Refers to the process through which cognitive autonomy is achieved: Motivations, goals and behaviors as dynamically and continuously (re)constructed by the machine as a result of changes in its internal structure.
Operational closure The system's own operations are all that is required to maintain (and improve) the system itself.
Semantic closure The system's own operations and experience produce/define the meaning of its constituents. Meaning can thus be seen as being defined/given by the operation of the system as a whole: the actions it has taken, is taking, could be taking, and has thought about (simulated) taking, both cognitive actions and external actions in its physical domain. For instance, the meaning of punching your best friend is the set of implications - actual and potential - that this action has or may have, including its impact on your own cognition.
Self-Programming in Autonomy The global process that animates computational structurally autonomous systems, i.e. the implementation of both the operational and semantic closures.
System evolution A controlled and planned reflective process; a global and never-terminating process of architectural synthesis.



Approaches Compared on their Potential for Autonomy

“Autonomy comparison framework focusing on mental capabilities. Embodiment is not part of the present framework, but is included here for contextual completeness.” From Thórisson & Helgason 2012 source



Self-Programming

What it is Self-programming here means, with respect to some virtual machine <m>M</m>, the production of one or more programs created by <m>M</m> itself, whose principles for creation were provided to <m>M</m> at design time, but whose details were decided by <m>M</m> at runtime based on its experience.
Self-Generated Program Determined by some factors in the interaction between the system and its environment.
Historical note The concept of self-programming is old (J. von Neumann was one of the first to talk about self-replication in machines). However, few if any proposals for how to achieve this have been fielded. Von Neumann's universal constructor on Wikipedia
No guarantee The fact that a system has the ability to program itself is not a guarantee that it is in a better position than a traditional system. In fact, it may be in a worse one, because there are more ways in which its performance can go wrong.
Why we need it The inherent limitations of hand-coding methods make traditional manual programming approaches unlikely to reach a level of a human-grade generally intelligent system, simply because to be able to adapt to a wide range of tasks, situations, and domains, a system must be able to modify itself in more fundamental ways than a traditional software system is capable of.
Remedy Sufficiently powerful principles are needed to ensure against the system going rogue.
The Self of a machine C1: The processes that act on the world and the self (via senctors) evaluate the structure and execution of code in the system and, respectively, synthesize new code.
C2: The models that describe the processes in C1, entities and phenomena in the world – including the self in the world – and processes in the self. Goals contextualize models and they also belong to C2.
C3: The states of the self and of the world – past, present and anticipated – including the inputs/outputs of the machine.
Bootstrap code A.k.a. the “seed”. Bootstrap code may consist of ontologies, states, models, internal drives, exemplary behaviors and programming skills.



Programming for Self-Programming

Can we use LISP? Any language with features similar to LISP's (e.g. Haskell, Prolog, etc.), i.e. the ability to inspect itself and to turn data into code and code into data, should in theory be capable of sustaining a self-programming machine. (We say “in theory” because no theory of intelligence exists that properly takes time pressure into account.)
Theory vs. practice “In theory” is most of the time not good enough if we want to see something soon (as in the next decade or two), and this is the case here too; what is good for a human programmer is not so good for a system having to synthesize its own code in real-time - in a way that makes its behavior temporally predictable.
Why is that important? Because the world presents deadlines, and if the controller is not capable of temporally predictable behavior deadlines cannot be dealt with properly by that controller.
Why? Building a machine that can write (sensible, meaningful!) programs means that that machine is smart enough to understand (to a pragmatically meaningful level) the code it produces. If the purpose of its programming is to become smart, and the programming language we give to it assumes it's smart already, we have defeated the purpose of creating the self-programming machine in the first place.
What can we do? We must create a programming language with simple enough semantics so that a simple machine (perhaps with some clever emergent properties) can use it to bootstrap itself in learning to write programs.
Does such a language exist? Yes. It's called Replicode.
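A minimal sketch of the code-as-data idea such a language must support (plain Python for illustration only, not Replicode; all names here are hypothetical): models are ordinary data structures that the running system can inspect, score, and rewrite.

<code python>
# Hypothetical sketch (not Replicode): models as inspectable, rewritable data.
# A "model" is plain data: a precondition pattern, an action, and a predicted effect.

model = {
    "pre":    {"light": "red"},     # context in which the model applies
    "action": "wait",               # the action the model is about
    "post":   {"light": "green"},   # predicted outcome
    "evidence": 0, "successes": 0,  # bookkeeping used for self-evaluation
}

def matches(pattern, state):
    """True if every (key, value) pair of the pattern holds in the state."""
    return all(state.get(k) == v for k, v in pattern.items())

def specialize(m, extra_condition):
    """Rewrite a model by tightening its precondition - code operating on code,
    possible only because the model is ordinary data."""
    return {**m, "pre": {**m["pre"], **extra_condition}, "evidence": 0, "successes": 0}

print(matches(model["pre"], {"light": "red", "time": "night"}))  # True

# Example: after a failed prediction the system tightens its own model.
refined = specialize(model, {"pedestrian_button": "pressed"})
print(refined["pre"])  # {'light': 'red', 'pedestrian_button': 'pressed'}
</code>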



Levels of Self-Programming

Level 1 Level one self-programming capability is the ability of a system to make programs that exclusively make use of the primitive actions in its action set.
Level 2 Level two self-programming systems can do Level 1, and additionally generate new primitives.
Level 3 Level three self-programming adds the ability to change the principles by which Levels one and two operate; in other words, Level three self-programming systems are capable of what we would here call meta-programming. This would involve changing or replacing some or all of the programs provided to the system at design time. Of course, the generation of primitives and the changing of principles are themselves controlled by some programs (a toy sketch of Levels 1 and 2 follows below).
Infinite regress? Though the process of self-programming can be carried out on more than one level, eventually the regress stops at some level. The more levels are involved, the more flexible the system will be, though at the same time it will be less stable and more complicated to analyze.
Likely to be many ways? For AGI the set of relevant self-programming approaches is likely to be a much smaller set than that typically discussed in computer science, and in all likelihood much smaller than often implied in AGI.
Architecture The possible solutions for effective and efficient self-programming are likely to be strongly linked to what we generally think of as the architectural structure of AI systems, since self-programming for AGI may fundamentally have to change, modify, or partly duplicate, some aspect of the architecture of the system, for the purpose of being better equipped to perform some task or set of tasks.
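To make the Level 1 / Level 2 distinction concrete, here is a toy sketch (hypothetical Python, assuming a fixed design-time action set): Level 1 composes existing primitives into new programs; Level 2 additionally extends the action set by promoting such compositions into new primitives; Level 3 (not shown) would rewrite the composition mechanism itself.

<code python>
# Toy illustration of self-programming levels (not a real AGI mechanism).

# Design-time primitives: the fixed action set the system starts with.
primitives = {
    "step":   lambda x: x + 1,
    "double": lambda x: x * 2,
}

def compose(names, library):
    """Level 1: build a new program purely out of existing actions."""
    def program(x):
        for name in names:
            x = library[name](x)
        return x
    return program

# Level 1: a program made only from primitives.
add_then_double = compose(["step", "double"], primitives)
print(add_then_double(3))   # (3 + 1) * 2 = 8

# Level 2: the system extends its own action set - the composed program becomes
# a new primitive that future programs can build on.
primitives["add_then_double"] = add_then_double
bigger = compose(["add_then_double", "step"], primitives)
print(bigger(3))            # 8 + 1 = 9

# Level 3 (not shown) would change or replace compose() itself, i.e. rewrite
# the principles by which Levels 1 and 2 operate.
</code>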



Existing Systems Which Target Self-Programming

Label What Example Description
[S] State-space search GPS (Newell et al. 1963) The atomic actions are state-changing operators, and a program is represented as a path from the initial state to a final state. Variants of this approach include program search (examples: Gödel Machine (Schmidhuber 2006)): Given the action set A, in principle all programs formed by it can be exhaustively listed and evaluated to find an optimal one according to certain criteria.
[P] Production system SOAR (Laird 1987) Each production rule specifies the condition for a sequence of actions that correspond to a program. Mechanisms that produce new production rules, such as chunking, can be considered self-programming.
[R] Reinforcement learning AIXI (Hutter 2007) When an action of an agent changes the state of the environment, and each state has a reward value associated, a program corresponds to a policy in reinforcement learning. When the state transition function is probabilistic, this becomes a Markov decision process.
[G] Genetic programming Koza’s Invention Machine (Koza et al. 2000) A program is formed from the system’s actions, initially randomly but subsequently via genetic operators over the best performers from prior solutions, possibly by using the output of some actions as input of some other actions. An evolution process provides a utility function that is used to select the best programs, and the process is repeated.
[I] Inductive logic programming Muggleton 1994 A program is a statement with a procedural interpretation, which can be learned from given positive and negative examples, plus background knowledge.
[E] Evidential reasoning NARS (Wang 2006) A program is a statement with a procedural interpretation, and it can be learned using multi-strategy (ampliative) uncertain reasoning.
[A] Autocatalytic model-driven bi-directional search AERA (Nivel et al. 2014) & Ikon Flux (Nivel 2007) In this context the architecture is in large part comprised of a large collection of models, acting as hierarchically organized controllers, executed through a contextually-informed, continuous auto-catalytic process. New models are produced automatically, based on experience, their quality evaluated in light of this experience, and improvements produced as a result. Self-programming occurs at two levels: the lower one is concerned with performance in a set of domains, making models of how best to achieve goals in the external world at any point in time; the higher level is concerned with the operation of the lower one, implementing integrated cognitive control and meta-learning capabilities. Semantically closed auto-catalytic processes maintain the system's growth after they are deployed.
source Thórisson & Helgason 2012



Design Assumptions in The Above Approaches

How does the system represent a basic action? a) As an operator that transforms one state into another, either deterministically or probabilistically, with a goal represented as a state to be reached [R, S]
b) As a function that maps some input arguments to some output arguments [G]
c) As a realizable statement with preconditions and consequences [A, E, I, P]
Relevant assumptions:
Is the knowledge about an action complete and certain?
Is the action set discrete and finite?
Can a program be used as an “action” in other programs? a) Yes, programs can be built recursively [A, E, G, I]
b) No, a program can only contain basic actions [R, S, P]
Relevant assumptions:
Do the programs and actions form a hierarchy?
Can these recursions have closed loops?
How does the system represent goals? a) As states to be reached [S]
b) As values to be optimized [G, R]
c) As statements to be realized [E, P, A]
d) As functions to be approximated [I]
Relevant assumptions:
Is the knowledge about goals complete?
Is the knowledge about goals certain?
Can all the goals be reached with a concrete action set?
Are there derived goals? a) Yes, and they are logically dependent on the original goals [I, S, P]
b) Yes, and they may become logically independent of the original goals [A, E]
c) No, all goals are given or innate [G, R]
Relevant assumptions:
Are the goals constant or variable?
Are the goals externally imposed or internally generated?
Can the system learn new knowledge about actions and goals? a) Yes, and the learning process normally converges [G, I, R]
b) Yes, and the learning process may not converge [A, E, P]
c) No, all the knowledge is given or innate [S]
Relevant assumptions:
Are the goals constant or variable?
Are the actions constant or variable?
What is the extent of resources demanded? a) Unlimited time and/or space [I, R, S, P]
b) Limited time and space [A, E, G]
Relevant assumption: Are the resources used an attribute of the problem, or of the solution?
When is the quality of a program evaluated? a) After execution, according to its actual contribution [G]
b) Before execution, according to its definition or historical record [I, S, P]
c) Both of the above [A, E, R]
Relevant assumption:
Are adaptation and prediction necessary?
source Thórisson et al. 2012



Integrated Cognitive Control

What it is The ability of a controller / cognitive system to steer its own structural development - architectural growth (cognitive growth). The (sub-) system responsible for meta-learning.
Cognitive Growth The structural change resulting from learning in a structurally autonomous cognitive system - the target of which is self-improvement.



Cognitive Growth

What it is Changes in the cognitive controller (the core “thinking” part) over and beyond basic learning: After a growth burst of this kind the controller can learn differently/better/new things, especially new categories of things.
Human example Piaget's Stages of Development (youtube video)



Predictability

What It Is The ability of an outsider to predict the behavior of a controller based on some information.
Why It Is Important Predicting the behavior of (semi-) autonomous machines is important if we want to ensure their safe operation, or be sure that they do what we want them to do.
How To Do It Predicting the future behavior of ANNs (of any kind) is easier if we switch off their learning after they have been trained, because no method exists for predicting where their development will lead if they continue to learn after they leave the lab. Predicting ANN behavior on novel input can be done statistically, but there is no way to be sure that novel input will not completely reverse their behavior. There are very few if any methods for giving ANNs the ability to judge the “novelty” of an input, which might help with this issue to some extent (a sketch of the idea follows below). Reinforcement learning sidesteps the problem by scaling only to a handful of variables with known minima and maxima.
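A hedged sketch of the kind of novelty check mentioned above (illustration only; the distance measure, data, and threshold are arbitrary assumptions, not an established method): flag inputs that lie far from anything seen during training, since predictions on such inputs cannot be trusted.

<code python>
import numpy as np

# Illustrative distance-based novelty flag (thresholds and data are made up).
rng = np.random.default_rng(0)
training_inputs = rng.random((1000, 4))              # stand-in for the training set
reference, calibration = training_inputs[:800], training_inputs[800:]

def novelty(x, data=reference):
    """Distance from x to its nearest neighbour in the reference data."""
    return np.min(np.linalg.norm(data - x, axis=1))

# Calibrate a threshold on held-out training samples.
threshold = np.percentile([novelty(x) for x in calibration], 99)

query = np.array([5.0, 5.0, 5.0, 5.0])               # far outside the training range
if novelty(query) > threshold:
    print("novel input: the network's prediction here should not be trusted")
</code>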



Reliability

What It Is The ability of a machine to always return the same - or similar - answer to the same input.
Why It Is Important Simple machine learning algorithms are very good in this respect, delivering high reliability. Human-level AI, on the other hand, may have the same limitations as humans in this respect, i.e. not being able to give any guarantees.
Human-Level AI To make human-level AI reliable is important because a human-level AI without reliability cannot be trusted, and hence would defeat most of the purpose for creating it in the first place. AERA proposes a method for this - through continuous pee-wee model generation and refinement.



Explainability

What It Is The ability of a controller to explain, after the fact or before, why it did or intends to do something.
Why It Is Important If a controller does something we don't want it to repeat - e.g. crash an airplane full of people - it needs to be able to explain why it did what it did. If it can't it means we can never be sure of why this autonomous system did what it did, or even whether it had any other choice.
Human-Level AI Even more importantly, to grow and learn and self-inspect the AI system must be able to sort out causal chains. If it can't, it will not only be incapable of explaining to others why it is the way it is, it will be incapable of explaining to itself why things are the way they are, and thus, it will be incapable of sorting out whether something it did is better for its own growth than something else. Explanation is the big black hole of ANNs: in principle ANNs are black boxes, and thus they are in principle unexplainable - whether to themselves or to others.
AERA tries to address this by encapsulating knowledge as hierarchical models that are built up over time, and can be de-constructed at any time.




High-Level View of AERA

AERA The Auto-Catalytic Endogenous Reflective Architecture is an AGI-aspiring self-programming system that combines feedback and feed-forward control in a model-based and model-driven system that is programmed with a seed.
High-level view of the three main functions at work in a running AERA system and their interaction with its knowledge store.

Models
All models are stored in a central memory, and the three processes of planning, attention (resource management) and learning happen as a result of programs that operate on models by matching, activating, and scoring them. Models that predict correctly – not just “what happens next?” but also “what will happen if I do X?” – earn a success point. Every time a model 'fires' like that it is counted, so the ratio of successes over firings gives the “goodness” of a model.
Models with the lowest scores are deleted; models with a good score that suddenly fail trigger the generation of new versions of themselves (think of these as hypotheses for why the model failed this time). Over time this process increases the quality and utility of the controller's knowledge – in other words, it learns. (A toy sketch of this scoring scheme follows below.)
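Below is a minimal sketch of this scoring scheme (hypothetical Python; AERA's actual bookkeeping is richer): each model counts how often it fires and how often its prediction held, the ratio is its “goodness”, chronic low scorers are removed, and a previously good model that suddenly fails spawns a modified variant.

<code python>
# Illustrative sketch of success-over-evidence model scoring (not AERA code).

class Model:
    def __init__(self, name):
        self.name = name
        self.evidence = 0        # how often the model fired
        self.successes = 0       # how often its prediction held
        self.last_held = True    # outcome of the most recent firing

    def score(self):
        return self.successes / self.evidence if self.evidence else 0.0

    def record(self, prediction_held):
        self.evidence += 1
        self.last_held = prediction_held
        if prediction_held:
            self.successes += 1

def maintain(models, min_score=0.2, min_evidence=10):
    """Delete unreliable models; spawn a variant when a good model suddenly fails."""
    kept = []
    for m in models:
        if m.evidence >= min_evidence and m.score() < min_score:
            continue                              # forget chronically bad models
        kept.append(m)
        if m.evidence >= min_evidence and m.score() > 0.8 and not m.last_held:
            kept.append(Model(m.name + "'"))      # hypothesis for why it just failed
    return kept

# Usage: a model that predicted correctly nine times, then failed once.
m = Model("push->moves")
for held in [True] * 9 + [False]:
    m.record(held)
print([x.name for x in maintain([m])])   # ["push->moves", "push->moves'"]
</code>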

Attention
Attention is nothing more than resource management; in the case of cognitive controllers it typically involves management of knowledge, time, energy, and computing power. Attention in AERA is the set of functions that decides how the controller uses its compute time, how long it “mulls things over”, and how far into the future it allows itself to “think”. It also determines which models the system works with at any point in time, and how much it explores models outside of the obvious candidate set (a time-budgeting sketch follows below).
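As an illustration of the “how long to mull things over” aspect (hypothetical sketch, not AERA's scheduler): a controller with a fixed time budget processes its most promising models first and stops deliberating when the deadline is reached.

<code python>
import time

# Illustrative sketch: attention as time-budgeted selection of which models to run.

def attend(scored_models, budget_seconds, evaluate):
    """scored_models: list of (score, model) pairs, higher score = more promising.
    Spend at most budget_seconds, best models first."""
    deadline = time.monotonic() + budget_seconds
    results = []
    for score, model in sorted(scored_models, reverse=True):
        if time.monotonic() >= deadline:
            break           # out of time: stop deliberating, act on what we have
        results.append(evaluate(model))
    return results

# Usage: evaluate as many of the best models as half a second allows.
out = attend([(0.9, "m1"), (0.4, "m2"), (0.7, "m3")],
             budget_seconds=0.5,
             evaluate=lambda m: f"prediction from {m}")
print(out)
</code>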

Planning
Planning is the set of operations involved with looking at alternative ways of proceeding, based on predictions into the future and the quality of the solutions found so far, at any point in time. The plans produced by AERA are of a mixed opportunistic (short time horizon) / firm-commitment (long time horizon) kind, and their stability (how drastically they may change over their course) depends solely on the dependability of the models involved – i.e. how well the models represent what is actually going on in the world (including the controller's “mind”).
Learning Learning happens as a result of the accumulation of models; as they describe “reality” (i.e. their target phenomena) increasingly well, they become more useful for planning and attention, which in turn improves the learning.
Memory AERA's “global knowledge base” is in some ways similar to the idea of blackboards: AERA stores all its knowledge in a “global workspace” or memory. Unlike (Selfridge's idea of) blackboards, AERA's memory contains executive functions that manage the knowledge dynamically, in addition to “the experts”, which in AERA's case are very tiny and better thought of as “models with codelet helpers”.
Pervasive Use of Codelets A codelet is a piece of code that is smaller than a typical self-contained program, typically a few lines long, and can only be executed in particular contexts. Programs are constructed on the fly by the operation of the whole system selecting which codelets to run when, based on the knowledge of the system, the active goals, and the state it finds itself in at any point in time.

No “Modules”
Note that the diagram above may give the false impression that AERA consists of these four software “modules”, or “classes”, or the like. Nothing could be further from the truth: all of AERA's mechanisms above are sets of functions “welded into” the operation of the whole system, distributed across a myriad of mechanisms and actions.
Does this mean that AERA is spaghetti code, or a mess of a design? On the contrary, the integration and overlap of various mechanisms to achieve the high-level functions depicted in the diagram are surprisingly clean, simple, and coherent in their implementation and operation.
This does not mean, however, that AERA is easy to understand – mainly because it implements mechanisms and relies on concepts that are very different from those of most traditional software systems commonly recognized in computer science.



Autonomous Model Acquisition

What it is The ability to create a model of some target phenomenon automatically.
Challenge Unless we know beforehand which signals cause perturbations in a target variable <m>o</m> and can hard-wire these from the get-go in the controller, the controller must search for these signals.
In task-domains where the number of available signals is vastly greater than the controller's resources available to do such search, it may take an unacceptable time for the controller to find good predictive variables to create models with.
<m>V_te » V_mem</m>, where the former is the total number of potentially observable and manipulatable variables in the task-environment and the latter is the number of variables that the agent can hold in its memory at any point in time.
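A crude sketch of the search problem this inequality creates (illustration only, with made-up data; this is not AERA's mechanism): score each observable variable by how well it correlates with perturbations in the target <m>o</m>, and keep only the <m>V_mem</m> best candidates for model building.

<code python>
import numpy as np

# Illustrative sketch: picking candidate predictive variables when V_te >> V_mem.
rng = np.random.default_rng(0)

V_te, V_mem, T = 10_000, 8, 500           # many observables, tiny working memory
X = rng.normal(size=(T, V_te))            # observed variable streams
o = 2.0 * X[:, 42] - 1.5 * X[:, 1337] + rng.normal(scale=0.1, size=T)  # target

# Crude relevance score: absolute correlation of each variable with o.
Xc = X - X.mean(axis=0)
oc = o - o.mean()
scores = np.abs(Xc.T @ oc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(oc))

candidates = np.argsort(scores)[-V_mem:]  # keep only what fits in memory
print(sorted(candidates.tolist()))        # should include 42 and 1337
</code>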



Model Acquisition Function

The agent has a model generation function <m>P_M</m> implemented in its controller. The role of the function is to take observed chains of events and produce models intended to capture the events' causal relationships.
A learning agent is situated so as to perceive the effects of the relationships between variables.
The agent observes the interaction between the variables for a while, rendering some data about their relations (but not enough to be certain about it, and certainly not enough to create a complete model of it).
This generates hypotheses about the relation between variables, in the form of candidate relational models of the observed events.



Model Generation & Evaluation

Based on prior observations, of the variables and their temporal execution in some context, the controller's model generation function <m>P_M</m> may have captured their causal relationship in three alternative models, <m>M_1, M_2, M_3</m>, each slightly but measurably different from the others. Each can be considered a hypothesis of the actual relationship between the included variables, when in the context provided by <m>V_5, V_6</m>.
The agent's model generation mechanisms allow it to produce models of events it sees. Here it creates models (a) <m>M_1</m> and (b) <m>M_2</m>. The usefulness / utility of these models can be tested by performing an operation on the world (c) as prescribed by the models. (Ideally, when one wants to find out which one is best, the most efficient method is an (energy-preserving) intervention that can leave only one as the winner.)
Feedback (reinforcement) may result in the deletion, rewriting, or some other modification of the original model selected for prediction. Here the feedback has resulted in a modified model <m>M{prime}_2</m>. (A toy sketch of this generate-test-refine loop follows below.)
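A toy sketch of this generate-test-refine loop (hypothetical Python, not AERA's mechanism): several candidate models explain the same prior observations, interventions on the world eliminate the ones whose predictions fail, and the survivor plays the role of the refined model.

<code python>
# Illustrative sketch: competing hypothesis models tested by intervention.

# Three candidate models of "what makes the lamp turn on", all consistent with
# the agent's prior (passive) observations.
candidates = {
    "M1": lambda s: s["switch"] == "up",
    "M2": lambda s: s["switch"] == "up" and s["plugged_in"],
    "M3": lambda s: s["plugged_in"],
}

def world(s):
    """Ground truth the agent does not know: both conditions are needed."""
    return s["switch"] == "up" and s["plugged_in"]

# Interventions chosen so that each one splits the remaining hypotheses.
probes = [
    {"switch": "up",   "plugged_in": False},   # separates M1 from M2 and M3
    {"switch": "down", "plugged_in": True},    # separates M3 from M2
]

survivors = dict(candidates)
for probe in probes:
    outcome = world(probe)
    survivors = {n: m for n, m in survivors.items() if m(probe) == outcome}

print(sorted(survivors))   # ['M2'] - the surviving (refined) hypothesis
</code>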





Demo Of AERA In Action

Demos The most complex demo of an AERA system was the S1 agent learning to do an interview (in the EU-funded HUMANOBS research project). Main HUMANOBS page
TV Interview The agent S1 watched two humans engaged in a “TV-style” interview about the recycling of six everyday objects made out of various materials.
Data S1 received realtime timestamped data from the 3D movement of the humans (digitized via appropriate tracking methods at 20 Hz), words generated by a speech recognizer, and prosody (fundamental pitch of voice at 60 Hz, along with timestamped starts and stops).
Seed The seed consisted of a handful of top-level goals for each agent in the interview (interviewer and interviewee), and a small knowledge base about entities in the scene.
What Was Given * actions: grab, release, point-at, look-at (defined as event types constrained by geometric relationships)
* stopping the interview clock ends the session
* objects: glass-bottle, plastic-bottle, cardboard-box, wooden-cube, newspaper, wooden-cube
* objects have properties (e.g. made-of)
* interviewee-role
* interviewer-role
* Model for interviewer
* top-level goal of interviewer: prompt interviewee to communicate
* in interruption case: an imposed interview duration time limit
* Models for interviewee
* top-level goal of interviewee: to communicate
* never communicate unless prompted
* communicate about properties of objects being asked about, for as long as there still are properties available
* don’t communicate about properties that have already been mentioned
What Had To Be Learned GENERAL INTERVIEW PRINCIPLES
* word order in sentences (with no a-priori grammar)
* disambiguation via co-verbal deictic references
* role of interviewer and interviewee
* interview involves serialization of joint actions (a series of Qs and As by each participant)

MULTIMODAL COORDINATION & JOINT ACTION
* take turns speaking
* co-verbal deictic reference
* manipulation as deictic reference
* looking as deictic reference
* pointing as deictic reference

INTERVIEWER
* to ask a series of questions, not repeating questions about objects already addressed
* “thank you” stops the interview clock
* interruption condition: “hold on, let’s go to the next question” can be used to keep the interview within time limits

INTERVIEWEE
* what to answer based on what is asked
* an object property is not spoken of if it is not asked for
* a silence from the interviewer means “go on”
* a nod from the interviewer means “go on”
Result After having observed two humans interact in a simulated TV interview for some time, the AERA agent S1 takes the role of interviewee, continuing the interview in precisely the same fashion as before, answering the questions of the human interviewer (see videos HH.no_interrupt.mp4 and HH.no_interrupt.mp4 for the human-human interaction that S1 observed; see HM.no_interrupt_mp4 and HM_interrupt_mp4 for other examples of the skills that S1 has acquired by observation). In the “interrupt” scenario S1 has learned to use interruption as a method to keep the interview from going over a pre-defined time limit.

The results are recorded in a set of three videos:
Human-human interaction (what S1 observes)
Human-S1 interaction (S1 interviewing a human)
S1-Human Interaction (S1 being interviewed by a human)





2019©K. R. Thórisson

EOF
