public:t-720-atai:atai-19:lecture_notes_architectures [2019/10/01 07:54] thorisson
public:t-720-atai:atai-19:lecture_notes_architectures [2024/04/29 13:33] (current) – external edit 127.0.0.1
Line 51: Line 51:
  
 ====Self-Programming====
-[S] State-space search (example: GPS (Newell 1963)). The atomic actions are state-changing operators, and a program is represented as a path from the initial state to a final state. Variants of this approach include program search (example: Gödel Machine (Schmidhuber 2006)): Given the action set A, in principle all programs formed by it can be exhaustively listed and evaluated to find an optimal one according to certain criteria.
  
-[P] Production system (example: SOAR (Laird 1987)). Each production rule specifies the condition for a sequence of actions that corresponds to a program. Mechanisms that produce new production rules, such as chunking, can be considered self-programming.
-[R] Reinforcement learning (example: AIXI (Hutter 2007)). When an action of an agent changes the state of the environment, and each state has a reward value associated, a program corresponds to a policy in reinforcement learning. When the state transition function is probabilistic, this becomes a Markov decision process. 
  
-[G] Genetic programming (example: Koza's Invention Machine (Koza et al. 2000)). A program is formed from the system's actions, initially randomly but subsequently via genetic operators over the best performers from prior solutions, possibly by using the output of some actions as input of some other actions. An evolution process provides a utility function that is used to select the best programs, and the process is repeated.
 +|  What it is  | //Self-programming// here means, with respect to some virtual machine <m>M</m>, the production of one or more programs created by <m>M</m> itself, whose //principles// for creation were provided to <m>M</m> at design time, but whose details were //decided by// <m>M</m> //at runtime// based on its //experience//.  |
 +|  Self-Generated Program  | Its form and content are determined by factors in the interaction between the system and its environment.   | 
 +|  Historical note  | The concept of self-programming is old (J. von Neumann was one of the first to discuss self-replication in machines). However, few if any proposals for how to achieve it have been fielded.  [[https://en.wikipedia.org/wiki/Von_Neumann_universal_constructor|Von Neumann's universal constructor on Wikipedia]]   | 
 +|  No guarantee  | The fact that a system has the ability to program itself is not a guarantee that it is in a better position than a traditional system. In fact, it may be in a worse situation, because there are more ways in which its performance can go wrong.    | 
 +|  Why we need it  | The inherent limitations of hand-coding methods make traditional manual programming approaches unlikely to reach the level of a human-grade generally intelligent system, simply because to be able to adapt to a wide range of tasks, situations, and domains, a system must be able to modify itself in more fundamental ways than a traditional software system is capable of.   | 
 +|  Remedy  | Sufficiently powerful principles are needed to insure against the system going rogue.    | 
 +|  The Self of a machine  | **C1:** The processes that act on the world and the self (via senctors), evaluate the structure and execution of code in the system and, respectively, synthesize new code. \\  **C2:** The models that describe the processes in C1, entities and phenomena in the world -- including the self in the world -- and processes in the self. Goals contextualize models and they also belong to C2. \\ **C3:** The states of the self and of the world -- past, present and anticipated -- including the inputs/outputs of the machine.  |
 +|  Bootstrap code  | A.k.a. the "seed". Bootstrap code may consist of ontologies, states, models, internal drives, exemplary behaviors and programming skills.  |
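A hypothetical illustration of the "seed" row above: a minimal Python sketch of the kinds of things a bootstrap seed might bundle. All names and the structure are assumptions for illustration, not taken from any particular system.

<code python>
# Hypothetical "seed" container: ontologies, states, models, drives and
# exemplary behaviors handed to the system at design time (illustrative only).
from dataclasses import dataclass, field

@dataclass
class Seed:
    ontology: dict = field(default_factory=dict)        # initial entities and relations
    initial_states: list = field(default_factory=list)  # known starting states of self and world
    models: list = field(default_factory=list)          # exemplary hand-written models
    drives: list = field(default_factory=list)          # internal top-level drives/goals
    exemplars: list = field(default_factory=list)       # exemplary behaviors to learn from

seed = Seed(
    ontology={"objects": ["bottle", "box"], "relations": ["made-of"]},
    drives=["acquire-models-of-the-environment"],
)
print(seed.drives)
</code>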
  
 +\\
 +\\
 +==== Programming for Self-Programming ====
 +|  Can we use LISP?   | Any language with features similar to LISP's (e.g. Haskell, Prolog, etc.), i.e. the ability to inspect itself, turn data into code and code into data, should //in theory// be capable of sustaining a self-programming machine.   |
 +|  Theory vs. practice  | "In theory" is most of the time //not good enough// if we want to see something soon (as in the next decade or two), and this is the case here too; what is good for a human programmer is not so good for a system having to synthesize its own code in real-time.  |
 +|  Why?  | Building a machine that can write (sensible, meaningful!) programs means that the machine must be smart enough to understand the code it produces. If the purpose of its programming is to //become// smart, and the programming language we give it //assumes it's smart already//, we have defeated the purpose of creating a self-programming machine in the first place.    |
 +|  What can we do?  | We must create a programming language with //simple enough// semantics so that a simple machine (perhaps with some clever emergent properties) can use it to bootstrap itself in learning to write programs.  |
 +|  Does such a language exist?  | Yes. It's called [[http://alumni.media.mit.edu/~kris/ftp/nivel_thorisson_replicode_AGI13.pdf|Replicode]].   |
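As a concrete (if trivial) illustration of the "turn data into code and code into data" requirement discussed above, here is a minimal sketch in plain Python standing in for a LISP-like language; it is not Replicode and not how any self-programming system actually represents code.

<code python>
# A program fragment is built as a data structure (an AST), inspected,
# rewritten, and only then turned back into executable code.
import ast

# The expression (x + 1) * 2, represented as data.
expr = ast.Expression(
    body=ast.BinOp(
        left=ast.BinOp(left=ast.Name(id="x", ctx=ast.Load()),
                       op=ast.Add(), right=ast.Constant(value=1)),
        op=ast.Mult(), right=ast.Constant(value=2)))
ast.fix_missing_locations(expr)

print(ast.dump(expr))                    # the system can inspect its own "code"

expr.body.right = ast.Constant(value=3)  # ...and rewrite it: (x + 1) * 3
ast.fix_missing_locations(expr)

code = compile(expr, filename="<generated>", mode="eval")
print(eval(code, {"x": 4}))              # data turned back into code: prints 15
</code>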
  
-[I] Inductive logic programming (cf. Muggleton 1994). A program is a statement with a procedural interpretation, which can be learned from given positive and negative examples, plus background knowledge.
 +\\ 
-Besides the above fairly well-known AI approaches, we add here two lesser-known ones that have recently become relevant in the context of self-programming:
 +\\
-[E] Evidential reasoning (example: NARS (Wang 2006)). A program is a statement with a procedural interpretation, and it can be learned using multi-strategy (ampliative) uncertain reasoning. The details of this approach are described in the article by Wang in this issue.
 +
  
-[A] Autocatalysis (example: Ikon Flux (Nivel 2007)). In this context the architecture consists in large part of a large collection of models, acting as hierarchically organized controllers, executed through a contextually-informed, continuous auto-catalytic process. New models are produced automatically, based on experience, their quality evaluated in light of this experience, and improvements produced as a result. Self-programming occurs at two levels: The lower one is concerned with performance in a set of domains, making models of how best to achieve goals in the external world at any point in time; the higher level is concerned with the operation of the lower one, implementing integrated cognitive control and meta-learning capabilities. Semantically closed auto-catalytic processes maintain the system's growth after they are deployed.
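To give a feel for the [S] program-search idea at the top of this list, here is a toy sketch: enumerate action sequences (programs) over a small action set and keep the best one under an explicit criterion. This is purely illustrative; systems like the Gödel Machine search over provably beneficial self-modifications, not flat action strings, and the action set and scoring below are made up.

<code python>
# Toy exhaustive program search over a small set of state-changing operators.
from itertools import product

ACTIONS = {                      # atomic operators on an integer state
    "inc":    lambda s: s + 1,
    "double": lambda s: s * 2,
    "dec":    lambda s: s - 1,
}

def run(program, state=0):
    for name in program:
        state = ACTIONS[name](state)
    return state

def search(target, max_len=4):
    best = None
    for length in range(1, max_len + 1):
        for program in product(ACTIONS, repeat=length):
            score = -abs(run(program) - target)   # criterion: closeness to target
            if best is None or score > best[0]:
                best = (score, program)
    return best

print(search(target=6))   # e.g. (0, ('inc', 'inc', 'inc', 'double'))
</code>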
  
 \\ \\
Line 72: Line 82:
 ====The SOAR Architecture====
  
-TBD
 +|  What it is  | One of the oldest cognitive architectures in history.   | 
 +|  Why is it important  | One of the oldest AGI-aspiring systems in history.    | 
 +|  How does it work  | The reasoning engine does pattern-matching with hand-coded 'production' rules and 'operators' to solve problems, with an ability to "chunk": create 'shortcuts' for long transitive reasoning chains. Upon an 'impasse' (a break in the flow of reasoning/problem solving), a reasoning process tries to resolve it via successive application of relevant rules.   | 
 +|  Recent Additions  | Reinforcement learning for steering reasoning. Sub-symbolic processing for low-level perception.    | 
 +|  Missing in Action  | Attention (resource control, self-control), symbolic learning (other than chunking).   | 
 + 
 + 
 + 
 +SOAR is a relatively mature cognitive architecture that has been used by many researchers worldwide during its 20-year life span. During this time it has also been revised and extended in a number of ways. The architecture consists of heterogeneous components that interact during each decision cycle. These are working memory and three types of long-term memory: semantic, procedural and episodic. Working memory is where information related to the present is stored, with its contents being supplied by sensors or copied from other memory structures based on relevance to the present situation. Working memory also contains an activation mechanism, used in conjunction with episodic memory, that indicates the relevancy and usefulness of working memory elements. Production rules are matched and fired on the contents of working memory during the decision cycle, implementing both an associative memory mechanism (as rules can bring data from long-term memory into working memory) and action selection (as rules propose, evaluate and apply operators). Operators are procedural data stored in procedural memory. The application of an operator is carried out by a production rule and either causes changes in the working memory or triggers an external action. In cases where operator selection fails due to insufficient knowledge, an impasse event occurs and a process to resolve the impasse is started. This process involves reasoning and inference upon existing knowledge using the same decision cycle in a recursive fashion; the results of this process are converted to production rules by a process termed chunking. Reinforcement learning is used for production rules relating to operator selection to maximize future rewards in similar situations. One of the most recent additions to the SOAR architecture is sub-symbolic processing used for visual capabilities, where the bridge between sub-symbolic and symbolic processing consists of feature detection. As the working memory can contain execution traces, introspective abilities are possible. 
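The decision cycle just described can be caricatured in a few lines. The sketch below is a drastic simplification with made-up rule and operator formats (it is not SOAR code); it only illustrates the propose-operator / impasse / chunking flow.

<code python>
# Minimal caricature of a SOAR-like decision cycle: rules propose operators
# from working memory; with no proposal an impasse is raised, resolved in a
# stand-in sub-state, and the result is chunked into a new rule.
working_memory = {"goal": "stack(A,B)", "clear(A)": True, "clear(B)": True}
procedural_memory = []   # production rules; learned chunks are appended here

def propose_operators(wm):
    """Match every rule against working memory and collect proposed operators."""
    return [op for rule in procedural_memory if (op := rule(wm)) is not None]

def resolve_impasse(wm):
    """Stand-in for recursive problem solving in a sub-state."""
    return "move(A,B)"

def decision_cycle(wm):
    proposals = propose_operators(wm)
    if not proposals:                                   # impasse event
        op = resolve_impasse(wm)
        condition = frozenset(wm.items())               # chunk: condition -> operator
        procedural_memory.append(
            lambda w, op=op, cond=condition: op if frozenset(w.items()) == cond else None)
        return op
    return proposals[0]   # real SOAR evaluates preferences before selecting

print(decision_cycle(working_memory))   # impasse, then chunked: 'move(A,B)'
print(decision_cycle(working_memory))   # now proposed directly by the learned chunk
</code>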
 + 
 +The SOAR architecture provides one of the largest collections of simultaneously running cognitive processes of any cognitive architecture so far. However, there is no explicit mechanism for control of attention and the architecture is not designed for real-time operation. The latter may be especially problematic, as execution proceeds in strict lock-step form and, in particular, the duration (amount of computation) of each decision cycle can vary greatly due to impasse events that are raised occasionally. One might argue that the development of SOAR has been somewhat characterized by "adding boxes" (components) to the architecture, when it might be better to follow a more unified approach that puts integration at the forefront. 
 + 
 +There are a few cognitive architectures that somewhat resemble SOAR and can be placed categorically on the same track. These include ICARUS, which has a strong emphasis on embodiment and has shown promise in terms of generality in a number of toy problems such as in-city driving, and LIDA, which was developed for the US Navy to automatically organize and negotiate assignments with sailors but does not have embodiment as a design goal. As in SOAR, both of these implement different types of memory in specialized components and have a lock-step decision cycle. 
 + 
 +2013(c)Helgi P. Helgason
 \\ \\
 \\ \\
Line 78: Line 102:
 \\ \\
  
 +====The AERA System====
  
 +The Auto-catalytic Endogenous Reflective Architecture – AERA – is an AGI-aspiring architectural blueprint that was produced as part of the HUMANOBS FP7 project. It encompasses several fundamentally new ideas in the history of AI, among them a new programming language specifically conceived to address some major limitations of prior efforts, including self-inspection and self-representation, distributed representation of knowledge, and distributed reasoning. AERA systems are any-time, real-time, on-line, incremental/continuous learning systems.
 +
 +AERA's knowledge is stored in models, which essentially encode transformations on input to produce output. Models have a trigger side (left-hand side) and a result side (right-hand side). In a forward-chaining scenario, when a particular piece of data matches the left-hand side of a model (it is only allowed to test the match if the data has high enough saliency and the program has sufficient activation), the model fires, producing the output specified by its right-hand side and injecting it into a global memory store. The semantics of the output is prediction, and the semantics of the input is either fact or prediction. Notice that a model in AERA is not a production rule; a model relating A to B does not mean “A entails B”, it means A predicts B, and it has an associated confidence value. Such models stem invariably (read: most of the time) from the system's experience, and in early stages of learning an AERA-based system's set of models may mostly consist of fairly useless and bad models, all with relatively low confidence values (“not all models are created equal – some are in fact better than others”).
 +
 +In backward-chaining – to implement the process of abduction – models act the other way around: when some data match the right-hand side, a model produces new data patterned after its left-hand side, whose semantics essentially state that “if you want a B (on the right-hand side), perhaps it would help to get an A (the term on the left-hand side)”. The semantics of both the input and the output is “goal”.
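A schematic sketch of the model semantics described in the last two paragraphs, in plain Python (this is not Replicode and not AERA's actual data structures; names and the confidence arithmetic are assumptions): a left-hand-side match yields a prediction of the right-hand side, while a right-hand-side match against a goal yields a sub-goal for the left-hand side.

<code python>
# Forward chaining produces predictions; backward chaining produces sub-goals.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Fact:
    content: str
    kind: str = "fact"          # "fact" | "prediction" | "goal"
    confidence: float = 1.0

@dataclass
class Model:
    lhs: str                    # trigger pattern
    rhs: str                    # result pattern
    confidence: float           # how much the system trusts this model

    def forward(self, datum: Fact) -> Optional[Fact]:
        """LHS match on a fact/prediction -> inject a prediction of the RHS."""
        if datum.content == self.lhs:
            return Fact(self.rhs, "prediction", datum.confidence * self.confidence)
        return None

    def backward(self, goal: Fact) -> Optional[Fact]:
        """RHS match on a goal -> inject the LHS as a sub-goal."""
        if goal.content == self.rhs:
            return Fact(self.lhs, "goal", goal.confidence * self.confidence)
        return None

m = Model(lhs="press(button)", rhs="light(on)", confidence=0.8)
print(m.forward(Fact("press(button)")))       # prediction: light(on), confidence 0.8
print(m.backward(Fact("light(on)", "goal")))  # sub-goal: press(button)
</code>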
 +
 +A key principle of AERA operation is that of a distributed production process: Each program in AERA has a level of activation that determines if it is allowed to run or not. Every piece of data has a corresponding saliency level that determines how visible it is inside the system. In AERA there is a single memory, but it (typically) embeds groups that allow sets of data and programs to be addressed, e.g. changing their activation, saliency, or existence (via creation or deletion). 
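The activation/saliency gating mentioned above might be caricatured as follows; the threshold values and function names are assumptions, not AERA parameters.

<code python>
# A model is only allowed to test a match if the datum is salient enough and
# the model (program) is sufficiently activated.
SALIENCY_THRESHOLD = 0.5
ACTIVATION_THRESHOLD = 0.5

def may_match(model_activation: float, datum_saliency: float) -> bool:
    return (model_activation >= ACTIVATION_THRESHOLD and
            datum_saliency >= SALIENCY_THRESHOLD)

print(may_match(0.9, 0.7))   # True: the model gets to test the match
print(may_match(0.9, 0.2))   # False: the datum is not visible enough
</code>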
 +
 +\\
 +\\
  
 ====High-Level View of AERA====
Line 95: Line 130:
 \\ \\
  
-====Autonomous Model Acquisition==== 
-|  What it is   | The ability to create a model of some target phenomenon //automatically//  | 
-|  Challenge  | Unless we know beforehand which signals cause perturbations in <m>o</m> and can hard-wire these from the get-go in the controller, the controller must search for these signals. \\ In task-domains where the number of available signals is vastly greater than the controller's resources available to do such search, it may take an unacceptable time for the controller to find good predictive variables to create models with. \\ <m>V_te >> V_mem</m>, where the former is the total number of potentially observable and manipulatable variables in the task-environment and the latter is the number of variables that the agent can hold in its memory at any point in time.   | 
  
 \\ \\
 \\ \\
  
-====Model Acquisition Function==== 
-|  {{public:t-720-atai:agent-with-model-gen-function1.png?300}}  || 
-|  The agent has a model generation function <m>P_M</m> implemented in its controller. The role of the function is to take observed chains of events and produce models intended to capture the events' causal relationships.   || 
-|  {{public:t-720-atai:causal-chain_agent1.png?400}}  || 
-|  A learning agent is situated so as to perceive the effects of the relationships between variables. \\ The agent observes the interaction between the variables for a while, rendering some data about their relations (but not enough to be certain about it, and certainly not enough to create a complete model of it). \\ This generates hypotheses about the relation between variables, in the form of candidate relational models of the observed events.     || 
  
- 
-\\ 
-\\ 
-==== Model Generation & Evaluation ==== 
- 
-|  {{public:t-720-atai:three-models-1.png?400}}  | 
-|  Based on prior observations of the variables and their temporal execution in some context, the controller's model generation function <m>P_M</m> may have captured their causal relationship in three alternative models, <m>M_1, M_2, M_3</m>, each slightly but measurably different from the others. Each can be considered a //hypothesis of the actual relationship between the included variables//, when in the context provided by <m>V_5, V_6</m>.  | 
-|  {{public:t-720-atai:agent-with-models-1.png?300}}  | 
-|  The agent's model generation mechanisms allow it to produce models of events it sees. Here it creates models (a) <m>M_1</m> and (b) <m>M_2</m>. The usefulness / utility of these models can be tested by performing an operation on the world (c) as prescribed by the models. (Ideally, when one wants to find out which one is best, the most efficient method is an (energy-preserving) intervention that can only leave one as the winner.)   | 
-|  {{public:t-720-atai:model-m2-prime-1.png?150}}  | 
-|  Feedback (reinforcement) may result in the deletion, rewriting, or some other modification of the original model selected for prediction. Here the feedback has resulted in a modified model <m>M{prime}_2</m>.  | 
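The generate/evaluate loop laid out in the tables above could be caricatured as follows. This is an illustrative sketch only: here the model-generation function <m>P_M</m> simply guesses linear relations between two variables, interventions set one variable and observe the other, and feedback strengthens, rewrites, or weakens the candidates.

<code python>
# Toy acquire-evaluate loop: propose candidate models, test them by acting on
# the world, and adjust them from the observed feedback (illustrative only).
import random

def P_M(observations):
    """Propose candidate models; here: guessed linear relations v2 = a * v1."""
    return [{"a": random.uniform(0.5, 2.0), "confidence": 0.1} for _ in range(3)]

def predict(model, v1):
    return model["a"] * v1

def intervene_and_update(models, world):
    v1 = 1.0                                    # intervention: set v1
    v2 = world(v1)                              # observe the consequence
    for m in models:
        error = v2 - predict(m, v1)
        if abs(error) < 0.1:
            m["confidence"] += 0.1              # reinforce good predictors
        else:
            m["a"] += 0.5 * error               # rewrite the model
            m["confidence"] = max(0.0, m["confidence"] - 0.05)
    return models

world = lambda v1: 1.5 * v1                     # hidden causal relation to model
models = P_M(observations=[])
for _ in range(20):
    models = intervene_and_update(models, world)
print(max(models, key=lambda m: m["confidence"]))
</code>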
- 
-\\ 
-\\ 
- 
- 
- 
-\\ 
-\\ 
- 
- 
- 
-====Demo Of AERA In Action==== 
-|  Demos  | The most complex demo of an AERA system was the S1 agent learning to do an interview (in the EU-funded HUMANOBS research project). [[http://www.mindmakers.org/projects/humanobs/wiki/HUMANOBS_Videos|Main HUMANOBS page]]  | 
-|  TV Interview  | The agent S1 watched two humans engaged in a "TV-style" interview about the recycling of six everyday objects made out of various materials.   | 
-|  Data  | S1 received realtime timestamped data from the 3D movement of the humans (digitized via appropriate tracking methods at 20 Hz), words generated by a speech recognizer, and prosody (fundamental pitch of voice at 60 Hz, along with timestamped starts and stops).   | 
-|  Seed  | The seed consisted of a handful of top-level goals for each agent in the interview (interviewer and interviewee), and a small knowledge base about entities in the scene.     | 
-|  What Was Given  | * actions: grab, release, point-at, look-at (defined as event types constrained by geometric relationships) \\ * stopping the interview clock ends the session \\ * objects: glass-bottle, plastic-bottle, cardboard-box, wooden-cube, newspaper, wooden-cube \\ * objects have properties (e.g. made-of) \\ * interviewee-role \\ * interviewer-role \\ * Model for interviewer \\ * top-level goal of interviewer: prompt interviewee to communicate \\ * in interruption case: an imposed interview duration time limit \\ * Models for interviewee \\ * top-level goal of interviewee: to communicate \\ * never communicate unless prompted \\ * communicate about properties of objects being asked about, for as long as there still are properties available \\ * don’t communicate about properties that have already been mentioned    | 
-|  What Had To Be Learned  | GENERAL INTERVIEW PRINCIPLES \\ * word order in sentences (with no a-priori grammar) \\ * disambiguation via co-verbal deictic references \\ * role of interviewer and interviewee \\ * interview involves serialization of joint actions (a series of Qs and As by each participant) \\ \\ MULTIMODAL COORDINATION & JOINT ACTION \\ * take turns speaking \\ * co-verbal deictic reference \\ * manipulation as deictic reference \\ * looking as deictic reference \\ * pointing as deictic reference \\ \\ INTERVIEWER \\ * to ask a series of questions, not repeating questions about objects already addressed \\ * “thank you” stops the interview clock \\ * interruption condition: using “hold on, let’s go to the next question” can be used to keep interview within time limits \\ \\ INTERVIEWEE \\ * what to answer based on what is asked \\ * an object property is not spoken of if it is not asked for \\ * a silence from the interviewer means “go on” \\ * a nod from the interviewer means “go on”   | 
-|  Result  | After having observed two humans interact in a simulated TV interview for some time, the AERA agent S1 takes the role of interviewee, continuing the interview in precisely the same fashion as before, answering the questions of the human interviewer (see videos HH.no_interrupt.mp4 and HH.no_interrupt.mp4 for the human-human interaction that S1 observed; see HM.no_interrupt_mp4 and HM_interrupt_mp4 for other examples of the skills that S1 has acquired by observation). In the "interrupt" scenario S1 has learned to use interruption as a method to keep the interview from going over a pre-defined time limit. \\ \\ The results are recorded in a set of three videos: \\ [[https://www.youtube.com/watch?v=SH6tQ4fgWA4|Human-human interaction]] (what S1 observes) \\ [[https://www.youtube.com/watch?v=SH6tQ4fgWA4|Human-S1 interaction]] (S1 interviewing a human) \\ [[https://www.youtube.com/watch?v=x96HXLPLORg|S1-Human Interaction]] (S1 being interviewed by a human)  | 
- 
- 
-\\ 
-\\ 
- 
-====The X Architecture==== 
- 
-TBD 
-\\ 
-\\ 
-\\ 
-\\ 
  
  
-2018(c)K. R. Thórisson
 +2019(c)K. R. Thórisson
 \\ \\
 \\ \\
 //EOF//