//Course notes.// \\ \\

====== AGI Architecture ======

===== The Importance of Architecture =====

Main points:
  * Several cognitive functions are transversal in significant ways, including learning and attention
  * Many identifiable cognitive functions interact in substantial and important ways, including short-term and long-term memory, inferencing, generalization, prediction and anticipation, and goal acquisition, to name a few
  * Because of this, cognitive systems must be studied as a whole; the architecture of mind must be a first-class citizen in those studies

When we say "architecture" we refer to a system's information-structural layout and the key principles behind its operation: the constraints that are put on information intake, transfer, and storage; how computations are implemented, performed, and controlled; how these may differ over the lifetime of the system; and the principles guiding their change over time. Just as a building limits where the people inside it can walk (doors and walls) and what they can look at (windows and glass walls), a software architecture is a blueprint for the components and operations of a software system. The architecture of a system, as implemented in a substrate (hardware), controls how and where information flows, what computations are performed, in what order, and in what manner.

The choice of programming language determines to a significant extent what kinds of systems can (theoretically and practically) be implemented. Programming languages impose constraints on what kinds of architecture are viable -- for example, Lisp presents very easy ways to write code that writes code, while C++ makes this very difficult. If automatic generation of code, and subsequent runtime-dependent running of such code, is part of the requirements of a system we are planning to construct, we would naturally feel less constrained if we could choose Lisp over C++.
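As a rough illustration of what "code that writes code" means in practice, the sketch below generates the source of a function at runtime, compiles it, and runs it. Python is used here only for readability; it is not one of the languages discussed in these notes. The names ''make_polynomial_source'' and ''build_function'' are hypothetical and invented for this example. This is a minimal sketch of runtime code generation under those assumptions, not a description of how any particular AGI system does it.

```python
# Hypothetical illustration: a program that writes, compiles, and runs new code.

def make_polynomial_source(name, coefficients):
    """Return Python source text for a function that evaluates a polynomial.

    Coefficients are given highest order first; the generated expression
    uses Horner's scheme.
    """
    expr = "0"
    for c in coefficients:
        expr = f"({expr}) * x + {c!r}"
    return f"def {name}(x):\n    return {expr}\n"


def build_function(name, coefficients):
    """Generate source at runtime, compile it, and return (function, source)."""
    source = make_polynomial_source(name, coefficients)
    namespace = {}
    # compile() turns the generated text into a code object; exec() runs it,
    # which defines the new function inside `namespace`.
    exec(compile(source, f"<generated {name}>", "exec"), namespace)
    return namespace[name], source


# Build 2*x**2 + 1 while the program is running, then call it.
poly, poly_source = build_function("poly", [2, 0, 1])
value_at_three = poly(3)
```

Lisp macros make this kind of construction a native part of the language, operating on code as data structures rather than on strings. The string-based approach above is cruder, but it shows the requirement itself: the system produces new executable behavior while it is running.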
There are of course other key factors that influence the choice of programming language, and we will get to these shortly. Since the vast majority of current programming languages can in theory support any feature encountered in any other language, the choice depends primarily on practical factors: how effectively and efficiently does a language allow us to implement the data and control structures that we already know we will need?

Prolog seemed at one point to be a good programming language for programming logic-based systems. Logic of some kind seems to be necessary for any AGI if we want it to be capable of reasoning. While humans can -- and quite often do -- engage in activities that most would agree to call "logic", it is not clear that an exclusively logic-based environment provides the best platform for AGI research. One of the problems with most Prolog-based development environments is that the control of runtime operation is typically out of the programmer's hands. Another is that time is not handled well: anything involving time, such as moving a robot arm from X to Y, can be extremely cumbersome, if it can be done reasonably at all. A third problem is fairly limited support for extensions beyond first-order logic. A fourth -- and very serious -- problem has been a lack of thread and parallelization support.

Lisp is another programming language with roots in artificial intelligence. Lisp development frameworks have typically provided more control of a system's behavior than Prolog, and object-oriented extensions such as CLOS (Common Lisp Object System) have been hailed as among the most powerful commercially available object-oriented frameworks. Multithreading has been part of Lisp environments since the early 1990s.
Lately both Lisp and Prolog have fallen by the wayside in AI research, being in part replaced by Java (for convenience and library reasons), C++ (for efficiency reasons), and Haskell (for clean design and good support of parallelism).

Many of the cognitive functions that have been studied implement essentially anytime algorithms, in one way or another. For example, if we decide that //right now// is the moment to try to memorize the country code of a phone number that keeps coming up when we receive foreign calls, we can do that with reasonable expectations of improved memorization results -- the memorization function can be called like that at virtually any point in time. It could even be said that intelligence as a whole is itself an anytime system, because as a unit it must be interruptible at any time in order to balance its various goals. Because of the impoverished handling of time in modern programming languages, coupled with a lack of good support for parallelization, implementing an algorithm as an anytime algorithm is quite a challenge.

What building blocks we use, how we put them together, how they interact over time to produce the dynamics of a system: the discussion ultimately revolves around architecture. The types of system architectures we choose to explore for building intelligent systems will determine the capabilities of the system as a whole. The nature of these architectures will of course directly dictate the methodologies that we use for building them.

One issue that cuts to the core of intelligence architectures is that of transversal functions -- functions that affect the design and organization of the whole system. A generally intelligent machine must be able to learn anything, meaning essentially an enormously large range of things, regarding the world as well as itself.
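To make the idea of an anytime algorithm more concrete: in the usual sense of the term, such an algorithm can be stopped at any moment and still hand back a usable answer, with the answer improving the longer it is allowed to run. The Python sketch below is one hypothetical way to express this, using a generator that yields successively better estimates of a square root and a driver that stops at a deadline. The names ''anytime_sqrt'' and ''run_until'', the ''max_steps'' cap, and the choice of Newton's method are all assumptions made for illustration.

```python
import time


def anytime_sqrt(value):
    """Yield successively better estimates of sqrt(value) for value > 0.

    Every yielded number is a usable (if rough) answer; each Newton step
    refines the previous one.
    """
    estimate = max(value, 1.0)
    while True:
        yield estimate
        estimate = 0.5 * (estimate + value / estimate)


def run_until(deadline_seconds, refinements, max_steps=1000):
    """Consume refinements until the deadline or step cap; return the latest.

    At least one candidate is always taken, so the caller gets an answer
    even when the deadline has already passed.
    """
    stop_at = time.monotonic() + deadline_seconds
    best = None
    for step, candidate in enumerate(refinements):
        best = candidate
        if step + 1 >= max_steps or time.monotonic() >= stop_at:
            break
    return best


# Interrupted immediately: only the initial, coarse estimate is available.
rough = run_until(0.0, anytime_sqrt(2.0))

# Given more time (capped at 50 refinement steps), the estimate improves.
refined = run_until(1.0, anytime_sqrt(2.0), max_steps=50)
```

Note that the interruption here is cooperative: the driver only checks the clock between refinement steps. A system that must respond to external events at arbitrary moments, while balancing several goals, needs support from the surrounding architecture, which is where the difficulty described above comes in.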
It must be possible to apply learning not only to external things and tasks, but also to internal functions: a //generally// intelligent system should be able to improve its own bodily //and// cognitive performance on any task in any setting. We don't mean, of course, that it is equally good at everything -- that is an impossibility -- but we should make the assumption that it degrades gracefully across a wide range of extremes. The learning needs to be reasonably context- and content-independent, and it needs to be applicable to various external and internal processes, across a fairly wide range of levels of detail. This has many implications, which could be summarized as **general-purpose** learning.

A critical feature of the learning mechanism in any AGI, and one that is typically not part of any discussion where the "G" in "AGI" is ignored, is that the learning must be **system-wide**. A requirement of **transversality** means that the learning must be able to address, or be applied to, virtually anything in the cognitive architecture. Transversal cognitive functions are a system architect's nightmare: they affect anything and everything in the system, and make its design orders of magnitude more difficult.

There are several cognitive functions besides learning that are transversal, and still others that have transversal aspects. Among those that clearly fall into the class of transversal functions are attention, temporal grounding, and goal acquisition.

By **goal acquisition** we mean a cognitive system's ability to identify the need for creating a new goal, and the ability to create an appropriate goal from a set of circumstances, oftentimes defined by either the lack of a particular goal or the need for a goal to bridge between other goals. Goals should in fact be generatable from a variety of contexts, including from the instructions of another cognitive system, e.g. when participating in some activity for the first time.
To take an example, a goal of being on average //slightly less tense every day// may be all that is needed for someone to get rid of sore muscles. Identifying the non-desired state (muscle soreness), inferring potential causes, and generating a plan with a root goal to reduce or remove those causes is a necessary function of any AGI. The source of the need for the new goal should not matter: without the //general ability// to perform this feat, a cognitive system is less likely to be an AGI.

Attention is of course closely related to an AGI's ability to select the correct tasks to perform, attend to the right details, and control sensory apparatuses appropriately in light of present goal(s). By the term "attention" is meant a rather broad set of skills and abilities relating to managing a cognitive system's resources. Like learning, the attention process -- or more appropriately, processes -- must be applicable not only to information that has its origins outside of the system, but also to processes inside it. Attention is a "helper" function to learning and to a system's ability to perform tasks and achieve goals, yet it is a central function without which any AGI would be unable to focus on the right information sources (and, conversely, filter out unwanted information and noise), stick to the right goals for sufficiently long to achieve them, and yet stay appropriately open to interruptions.

For engineers, "real-time" means time as it elapses in the real world. //Hard real-time// systems have real-world deadlines imposed on them by their designer -- without information that allows the systems to understand their purpose or meaning. Intelligent autonomous systems, on the other hand, are bound to the laws governing the maximization of their //utility function//. To operate in the world -- in real-time -- therefore means something very different in the context of cognitive systems: machine-time must be expressed by the semantics of the system-world’s state space.
Internal processes of the system are essentially mapped onto world-time with regard to their contribution towards achieving goals. For example, a deadline in world-time could be grounded in a (time-bounded) process, getting the cake out of the oven before it burns, and contextualized by the goal to eat something sweet with this afternoon's cup of coffee. In the act of opening the oven door, grabbing the cake (with gloves on), and putting it on the table, every movement of every muscle is part of -- and contextualized by -- achieving the overarching goal of eating the cake. The actions and their goal are temporally grounded -- they have a direct and observable relation to a real-world clock, in light of the cognitive agent's utility function. This temporal grounding affects pretty much any action, whether mental or physical, of a generally intelligent system at any point in time, and must therefore, by definition, be transversal.

\\ \\

===== How Does Architecture Matter? =====

Main points:
  * Compared to a linear composition of modules -- a LEGO-like plug and play of cognitive functions -- transversal functions set AGI architecture design apart from virtually all other such undertakings
  * Architecture affects in significant ways how transversal functions such as temporal grounding are implemented
  * Software/hardware architecture has an effect in mainly two ways: the ability to parallelize processes, and the speed and nature of their interaction
  * These features fundamentally affect both the theoretical possibilities and practical implementations of cognitive systems

As already mentioned, architecture is the structure and inherent nature of a system, with regard to its operation. The organization of an architecture's components includes anything and everything that cannot be derived from a simple list of the components/functions and their internal structure.
The architecture imposes higher-level constraints on the components, making them serve a purpose -- a role in a bigger context.

Transversal functions are necessary for achieving AGI. Designing systems with functions that have global effects puts requirements on the system's architect, as taking multiple complex constraints into account makes the design a much more intensive undertaking. Take the design of a modern passenger airplane: numerous iterations through the various goals of its design, from customer comfort to aerodynamics and fuel efficiency to controllability, are needed in the various design departments responsible for its various components. A head engineer is made responsible for all decisions, which are informed by a set of committees, coordination meetings, individual analyses, risk assessments, etc. At the top level is the goal of building a unified whole that can be used to carry passengers in a competitive way across vast distances. Increasing the number of seats will affect the weight of the airplane, which in turn affects the fuel efficiency, which in turn affects the airplane's competitiveness in the global market.

Now imagine that airplane having the additional requirement of not only being able to learn to adjust its wings based on its acquired experience of flying for several years, but also to change their operational characteristics by dynamically selecting an appropriate wing shape depending on weather conditions. This is what we need an AGI to be capable of -- in which the wings are e.g. the methods for learning, the ways of doing tasks, the invention of tools when the hands/grippers/manipulators don't suffice. The ability to do reasonably risk-free trial-and-error runs, to collect information and produce hypotheses about how some goal(s) could be achieved sooner, better, faster, etc., is a big part of such a system's abilities.
The implication of system-wide, transversal functions is that architectural design plays //a critical role// in achieving AGI. Depending on our choice of programming languages, hardware setup, and processing speed, certain architectural designs may impact information access, transfer, and processing differently than others, and information access, transfer, and processing are in fact critical to a system's practical viability. And as we have already established, practical viability is in fact a key factor of intelligence, because intelligence exists in essence to handle complexities with non-obvious implications for an intelligent system's utility function. The ratio between environment complexity and the computing power available to the cognitive system, from the bottom of atomic actions to the top of the perception-action loop, determines in fact whether the system has any chance of being called intelligent.

Then there is the issue of temporal grounding. Imparting anytime capabilities to an AI architecture is not just a simple matter of combining a bunch of anytime algorithms -- at the gross architectural level the //system itself// must have certain (soft real-time) capabilities, to be dynamically interruptible and to be able to balance several goals. Parallelization is one obvious way to go, but developing even small programs with many threads is currently a challenge; doing so for large architectures -- and AGI systems are bound to be fairly large -- is yet another level, with vastly greater challenges.

{{ :public:temporal-grounding.png?500 }}

The real world has a clock -- the passage of time -- that all matter must obey; to operate in the real world a cognitive agent must understand the meaning of this clock in relation to its own goals.
This is done by sensing the external world, creating perceptions out of the sensory operations, and presenting these perceptions to internal processes that create correlations or a predictable relationship (r) between internal events, such as intending to catch a frisbee, and the sensed data, e.g. the color and boundary of the frisbee -- which are perceptual data created from raw sensory data -- indicating the particular path of the frisbee in three-space. The more closely this correspondence is predicted, and the more generally it holds across all cognitive and perceptual events and processes, the more temporally grounded we can say the agent is.

\\ \\

//2012(c)Kristinn R. Thórisson//