Course notes.
What is Intelligence?
It seems like an easy question to answer, yet people frequently argue about what intelligence is. And not just laypeople: experts in fields claiming to study some of the very phenomena related to intelligence don't seem to agree on what it is either. Yet there is one example of intelligence that everyone seems to agree on – most people agree that most, if not all, average human beings harbor intelligence. Even those who do not agree with that will tend to agree that intelligence is a capability inherent in human minds, one that they exhibit if not every day then at least every once in a while, or at the very least that some individuals on planet Earth have exhibited intelligence at some point in the past.
Intelligence exists because there is complexity in the world. The purpose of intelligence is to deal with the complexity of the world. Complexity, as some readers have probably already discovered, is different from both simplicity – which of course is, as the word implies, simple – and complete randomness, because randomness has no structure and therefore cannot be measured on a scale from simple to complex. Randomness is not totally irrelevant to complexity, as we shall see, but it is more of a footnote than a main player. Complexity can be attributed to any system implementing long causal chains. The causation can be of various kinds, but let's for a moment stick to completely deterministic mechanical couplings, of which a mechanical wind-up clock is an excellent example. The complex combination of intertwining gears, springs and escapement (a way to generate mechanical rhythm) causes the hands on the clock to map in a predictable and reasonably consistent way (for all practical purposes) to an external reality (called the real world), so that those who can read a clock can coordinate temporally-dependent actions ahead of time. Complex as it may be, the complexity inherent in a mechanical clock is nowhere near that of many of the subjects that science has made a point of studying through the ages, some examples being ecosystems, societies, natural language, biochemistry, genetics, and so on. What do these topics of study have, beyond the complex mechanical causal chains of the wind-up clock? They have a mixture of causal chain types.
There are at least two axes along which causal chains can be classified: coupling density and coupling strength. Density refers to the number of causal connections between any identifiable intermediate stages in the causal chain; strength refers to the strength of the effect of each identifiable cause on subsequent events. The following figure helps explain this:
Nodes stand for identified causal factors, edges stand for the causal relationship between these. a. Sparsely coupled system; b. Densely coupled system; c. Tightly coupled system; d. Loosely coupled system; e. System with multi-variable couplings; f. Densely coupled system with mixed loose and tight, and multi-variable causal chains.
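To make the two axes above concrete, here is a minimal sketch – not from the original notes – of how coupling density and coupling strength might be quantified for a causal system represented as a weighted directed graph. The graph representation, the 0-to-1 strength scale, and the function names are illustrative assumptions, not a prescribed formalism.

    # Illustrative sketch: two crude measures over a weighted directed causal graph.

    def coupling_density(nodes, edges):
        """Fraction of possible directed couplings that are actually present."""
        possible = len(nodes) * (len(nodes) - 1)
        return len(edges) / possible if possible else 0.0

    def mean_coupling_strength(edges):
        """Average effect strength over the couplings that do exist (0..1 scale)."""
        return sum(w for (_, _, w) in edges) / len(edges) if edges else 0.0

    # Example: three causal factors, two couplings; one tight (0.9), one loose (0.2).
    nodes = ["A", "B", "C"]
    edges = [("A", "B", 0.9), ("B", "C", 0.2)]
    print(coupling_density(nodes, edges))      # 0.33... -> sparsely coupled
    print(mean_coupling_strength(edges))       # 0.55    -> mixed tight/loose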
In most natural systems worthy of study, causal chains include subsystems whose causal connections vary to different degrees along these two dimensions. In the real world of course there are oftentimes many nodes that are not directly observable in any way (as, for example, atoms were not observable before the scanning tunneling microscope), yet they can be inferred, and their existence can be verified by experimentation (whether formal or informal). In cases where causal chains have factors that are not directly observable and whose couplings with the rest of the system are loose, the best way to characterize the system is to note correlations between surface phenomena and/or system behavior. For example, if there are two types of plant that look very similar to each other, one being edible and the other deadly poisonous, and the main cause of this difference can only be traced to the plants' DNA (an unobservable cause for anyone without DNA inspection capabilities), an intelligent being's only recourse would be to identify key correlates between edibility and surface features such as subtle differences in color, leaf structure, or perhaps the place where the plant grows. The picture gets significantly more complex when the edibility of a particular type of plant depends in fact on the type of soil, place of growth, freshness, etc. This is of course how most things in the real world are: there is a lot of gray area – difficult-to-identify-and-classify features – that can mean the difference between life and death for a being. This is where intelligence comes in. This is in fact what intelligence exists for. And this example is not a coincidence, for the reason intelligence exists at all is precisely that energy, which is necessary for life to exist, is a scarce resource. To some mammals, of course, energy comes in the form of plants. This, and the fact that there is a real-world clock ticking for every biological entity on the planet, are the only reasons intelligence exists at all.
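As a minimal sketch of the "note correlations between surface phenomena" strategy in the plant example, the following toy code tallies how often each observable feature value co-occurs with edibility. The data, feature names, and scoring are made up for illustration; a real forager (or learner) would of course work from far noisier evidence.

    # Illustrative sketch: correlating observable surface features with a hidden cause.

    observations = [  # (surface features, edible?)
        ({"color": "pale", "leaf": "serrated", "soil": "wet"}, False),
        ({"color": "dark", "leaf": "smooth",   "soil": "dry"}, True),
        ({"color": "pale", "leaf": "serrated", "soil": "dry"}, False),
        ({"color": "dark", "leaf": "smooth",   "soil": "wet"}, True),
    ]

    def correlates_of_edibility(observations):
        """For each surface feature value, how often did it co-occur with edibility?"""
        counts = {}
        for features, edible in observations:
            for key, value in features.items():
                seen, edible_seen = counts.get((key, value), (0, 0))
                counts[(key, value)] = (seen + 1, edible_seen + (1 if edible else 0))
        return {fv: edible_seen / seen for fv, (seen, edible_seen) in counts.items()}

    # Here "dark"/"smooth" correlate perfectly with edibility; soil type does not.
    print(correlates_of_edibility(observations))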
In the real world, the kinds of systems that are important to various intelligent species will of course vary depending on the species and on the complexity of the basic environments that the beings typically inhabit. There is, for example, no animal other than humans that could consider living in outer space (some would say this is beyond our capabilities at present, and even for the foreseeable future). Communication can help overcome some of the limitations that time imposes on us, so that time-consuming exploration of how the world works, which cannot fit in a lifetime, can be communicated from generation to generation instead of having to be repeated by every individual – a natural-language summary of many years' worth of research efforts does not have to list all the dead ends that nevertheless were explored; symbols whose transmission may take 300 milliseconds can stand in for actions that took days to perform (e.g. “He rode his motorbike across most of Europe”). Telling your fellow citizens not to eat a certain plant because it is poisonous can save hundreds of lives, so once on the scene, the ability to communicate and the motivation to share knowledge are traits that survive.
And so it is with a huge number of functions that we normally count as part of the repertoire of any system we call “intelligent”: being able to perceive at a sufficient spatio-temporal resolution, being able to direct this perception for various purposes (steerable, learnable attention), being able to mentally classify and catalogue stimuli along a vast number of multi-modal dimensions (and even to learn new such dimensions as necessary), being able to retrieve any and all (or most) relevant such information when it is needed, being able to communicate physical and cognitive events to fellow beings as deemed necessary, whether those events lie in the past, present, or future (and of course being cognizant of whether they actually are past, present, or future), etc. – these are hallmark traits of intelligent systems.
So, because the behavior of systems that are built out of a variety of causal chains is predictable yet non-obvious, intelligence exists. Intelligence is a practical solution to a practical problem. Since all events in the real world take time, but of course not infinite time, the speed of thought matters. For many tasks that require intelligence to be solved, individuals – the bearers of the “natural intelligence engines” (brains) – may sometimes take days or weeks to ponder what to do. In other cases immediate action is required. In some cases even the fastest and most immediate action possible (approximately 70 milliseconds – the fastest possible choice reaction time recorded in humans) is not fast enough to steer the individual away from deadly danger (being hit by lightning is one example). But by far the most numerous cases of human-world encounters in which a fast reaction is required are much slower than this lower limit. Because the world consists of a mixture of systems implementing various types of causal chains, intelligence too employs a mixture of techniques to deal with them. However, at the core of any intelligent system are perception capabilities (selectively sampling states of the inhabited world), one or more types of memory, decision-making capabilities (deciding to do something, or nothing – which is also a decision, whether deliberate or not), and the ability to affect the world in some way. All of this, put together into a perception-action loop whereby the same entity samples the world, makes some decision about taking some cognitive and/or real-world action, and subsequently executes those decisions, is what all natural intelligences consist of. There exist of course possibilities to do things differently when designing an artificial intelligence, but by and large it is difficult to fit any system that does not have these features under the definition of being “intelligent”. That is, there is something about the model for intelligence, provided to us by nature, that makes it a unified whole: take any one part out and the whole thing becomes less like the thing we are trying to imitate/understand and starts to look more like something else (and we are invariably hard pressed to call that something else “intelligent” or “intelligence”).
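The perception-action loop just described can be sketched in a few lines of code. This is only an illustrative skeleton under assumed interfaces – the World and Agent classes, the percept fields, and the action names are all made up – not a claim about how such a loop should be engineered.

    # Illustrative sketch of a perception-action loop: sample, decide, act, repeat.
    import random

    class World:
        def sample(self):
            return {"danger_distance": random.uniform(0.0, 10.0)}
        def apply(self, action):
            if action:
                print("acting:", action)

    class Agent:
        def decide(self, percept):
            # Doing nothing is also a decision.
            return "move_away" if percept["danger_distance"] < 2.0 else None

    world, agent = World(), Agent()
    for _ in range(5):                     # each iteration is one pass of the loop
        percept = world.sample()           # selective sampling of world state
        decision = agent.decide(percept)   # decide on a cognitive/real-world action
        world.apply(decision)              # execute the decision, affecting the world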
The ability to adapt and learn is an important feature of naturally intelligent systems – it seems so critical to our understanding of the phenomenon of intelligence that many researchers (and laymen alike) will not call a system “intelligent” unless it can learn and adapt. There are other ways to approach a definition of intelligence that do not require adaptation and learning, emphasizing instead the ability to act in environments where the number of possible states is not enumerable. In the context of necessary and sufficient features a useful concept is that of “marginal necessity”. It is defined such that for a given entity <m>E</m> to belong to category <m>C</m> it must possess features <m>f_1</m>, <m>f_2</m> … <m>f_n</m>, some notable number of which are marginally necessary for the entity to qualify as being of category <m>C</m>. Such features are thus called “marginally necessary” for <m>E \in C</m> to hold – i.e. it is in some sense disputable whether they are necessary or optional. Any system of reasonable complexity that we nevertheless regularly use a single concept label for – including “society”, “ecosystem”, “social system”, “living system”, and of course “intelligent system” – has a large number of such marginally necessary features. Instead of spending our time arguing whether something is or isn't intelligent, we should instead spend our time trying to characterize better the systems within immediate and semi-distal range of our target category or categories. Some of that work will of course include talking about definitions, but we should remember that it is not the definitions that we are after; what we are after is understanding particular natural (and artificial) phenomena better.
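As a minimal, purely illustrative sketch of the marginal-necessity idea, membership in a category can be treated as graded: count how many of the features <m>f_1</m> … <m>f_n</m> an entity possesses rather than insisting on all of them. The particular feature list, threshold-free score, and example entities below are assumptions made for illustration only.

    # Illustrative sketch: graded category membership via marginally necessary features.

    CATEGORY_FEATURES = {"perceives", "remembers", "decides", "acts", "learns", "adapts"}

    def membership_score(entity_features, category_features=CATEGORY_FEATURES):
        """Fraction of the category's (marginally necessary) features the entity has."""
        return len(entity_features & category_features) / len(category_features)

    thermostat = {"perceives", "decides", "acts"}
    squirrel = {"perceives", "remembers", "decides", "acts", "learns", "adapts"}
    print(membership_score(thermostat))  # 0.5 -- a disputable member of the category
    print(membership_score(squirrel))    # 1.0 -- an uncontroversial member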
Since intelligence exists because the world is complex yet predictable, and since to do anything in the world with a particular goal in mind one needs to know some of the states of the world, some sort of feedback loop is inevitably a necessary part of any and all intelligence. We call this the perception-action feedback loop. Put this way it may sound to some like a pipeline process, sample-decide-act, but in fact that is not true of any intelligence in nature. Brains, being the only means we are aware of (as of yet) for producing intelligence, are inherently parallel processing systems. At any point in time they carry on many “threads” of processing, so to speak, at the same time. Otherwise we would not be able to avoid that tree falling towards us while we recite our favorite poem. In some sense the parallelism of the brain is trivially obvious, and thus one might be led to believe that it should not be too hard to replicate artificially. But when trying to build a machine that does the same with its processing tasks, it turns out to be surprisingly difficult to implement the kind of “parallel multitasking” that even simple brains, such as those of squirrels, must be able to implement.
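The point about parallelism can be illustrated, very loosely, with two concurrent threads: one keeps monitoring the surroundings while the other carries out a slower, deliberate task. This is only a toy sketch – the task names, timings, and the use of operating-system threads are assumptions for illustration, not a model of how brains achieve their parallelism.

    # Illustrative sketch: monitoring the world while performing a slower task.
    import threading, time

    def recite_poem():
        for line in ["line one of the poem", "line two", "line three"]:
            print("reciting:", line)
            time.sleep(0.5)             # slow, deliberate processing

    def monitor_surroundings(stop):
        while not stop.is_set():
            print("scanning surroundings ... no falling trees")  # fast perceptual check
            time.sleep(0.4)

    stop = threading.Event()
    watcher = threading.Thread(target=monitor_surroundings, args=(stop,))
    watcher.start()
    recite_poem()                       # both activities proceed concurrently
    stop.set()
    watcher.join()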
One of the core assumptions of artificial intelligence is that intelligence and thought are essentially information processes. While for practical purposes it may matter how the processing is implemented, in principle one can do computations in a number of ways: a change in the substrate on which they are carried out does not change the outcome; addition is still addition, whether one does it on a souped-up modern computer or on an abacus. (Granted – there is no “run” button on an abacus.)
Some have taken this computational foundation to mean that it is not necessary to discuss matters of implementation in A.I. – any limitations on the speed of the processes would be solved over time by faster processors. This seemingly innocent and minor assumption has permeated the whole field for three decades or more. It is a totally incorrect assumption. It is incorrect because intelligence is a response to an environment; the relationship between a mind and its environment is key to defining numerous aspects of an intelligence, including how smart it is. The speed of change of the environment places requirements on the intelligence; solutions for computing actions in environments moving at one pace may prove significantly different from those suited to an environment moving at another pace. If the environment changes rapidly, a mind capable of computing useful responses at rates that match the environment better than those produced by another mind should by all accounts be considered more intelligent than the other. This is an important fact about the phenomenon of intelligence: it can only be evaluated in relation to the world in which it operates. In the real world, of course, we are bound to a certain reality, to certain physics, and it is this reality that is the ultimate target of any and all A.I. systems that we might implement. Therefore, to discuss A.I. systems without reference to the limitations of the real world is to make two mistakes. First, the relevance of those systems to any real-world application is cast seriously in doubt. Second, because intelligence is a response to our physical reality, the systems we may come up with during those discussions are bound to ignore critical features that are necessary (but not necessarily sufficient on their own) for implementing intelligence. One example of this classic mistake is ignoring time. Time is one of the very reasons for which intelligence exists – that is, the passage of time, and our lack of infinite time and resources to simply lie in bed and think. By de-emphasizing time to the point of ignoring it completely, a critical feature of the environment for which intelligence exists is banished from the discussion. The result can only be to reduce the concept of intelligence to a level where it does not resemble that which was its inspiration – natural intelligence. As a result, people end up arguing endlessly about what “is and isn't intelligent” – i.e. the nature and definition of intelligence.
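The claim that the pace of the environment constrains what counts as an adequate response can be made concrete with a deliberately crude sketch. The numbers and the all-or-nothing "keeps up / falls behind" rule below are illustrative assumptions, not measurements.

    # Illustrative sketch: a mind's decision latency relative to the environment's pace.

    def usable_decisions(event_interval_ms, decision_latency_ms, horizon_ms=1000):
        """How many events within the horizon can still be responded to in time?"""
        events = horizon_ms // event_interval_ms
        return events if decision_latency_ms <= event_interval_ms else 0

    fast_world, slow_world = 100, 500        # ms between relevant world events
    quick_mind, slow_mind = 80, 300          # ms the mind needs per decision

    print(usable_decisions(fast_world, quick_mind))   # 10 -- keeps up
    print(usable_decisions(fast_world, slow_mind))    # 0  -- falls behind
    print(usable_decisions(slow_world, slow_mind))    # 2  -- adequate in a slower world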
What must be done is to create a list of reasonable marginally necessary features, so that we can replicate what we perceive to be valuable in natural intelligences and thereby fulfill the full vision of artificial intelligence: to replicate human intelligence in all major ways, in a way that allows us to mold it into a form that is of practical use for various ends and means, and that allows us to bring it to the next level – superhuman levels – at which point we hope to start using it to help solve the many complex problems that humanity faces.
One way to start listing the things that an intelligence must be capable of to deserve the label is to look at one extreme end of the intelligence spectrum, starting with human intelligence, and strip it down – inspecting key functions of this system one by one to see whether each is a nice-to-have or a must-have.
For this exercise I will start with a somewhat unconventional feature: the human mind's ability to break down problems into smaller, more addressable problems. Given a problem <m>P</m>, a human generates a goal <m>G</m> of solving the problem. Identifying potential roadblocks to achieving the goal – typically more than one – the human breaks the goal down into sub-goals that could remove those roadblocks. This is called sub-goaling, and is illustrated in this picture:
A root goal (top) represents an end-state that is supposed to solve a particular novel problem. Part of meeting the root goal is generating simpler goals (sub-goals) for which a solution can more easily be found. In this figure sub-goal 4 cannot be further sub-goaled because the knowledge needed to generate further sub-goals is missing. In some cases new sub-goals may be blocked by other goals, i.e. depend on other goals being achieved (d).
As the sub-goals are created, action can be taken immediately to start to meet some of them while the rest of the space is being generated (continuous planning and execution), or the full sub-goal space may be generated before any action is taken (pre-planning). When doing sub-goaling the intelligence brings in knowledge from prior experience, and its ability to do this effectively for an unknown problem depends in part on how quickly it can retrieve relevant knowledge to bring to bear on finding a solution: creating a sub-goal that there is little or no hope of meeting will be highly detrimental to solving the root goal. As the sub-goals are solved one by one, the intelligence gets closer and closer to having solved the initial problem <m>P</m>.
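A recursive decomposition like the one in the figure can be sketched in a few lines. The knowledge tables, goal names, and printout below are illustrative assumptions; the point is only that a goal is either acted on directly, broken into sub-goals, or blocked for lack of knowledge (like sub-goal 4 above).

    # Illustrative sketch of sub-goaling: decompose a goal until known actions are reached.

    KNOWN_ACTIONS = {          # goals the system already knows how to achieve directly
        "have_kindling": "gather_twigs",
        "have_spark": "strike_flint",
    }
    DECOMPOSITIONS = {         # goals it knows how to break into sub-goals
        "have_fire": ["have_kindling", "have_spark"],
    }

    def solve(goal, depth=0):
        indent = "  " * depth
        if goal in KNOWN_ACTIONS:
            print(indent + goal + " -> do: " + KNOWN_ACTIONS[goal])
            return True
        if goal in DECOMPOSITIONS:
            print(indent + goal + " -> sub-goals: " + ", ".join(DECOMPOSITIONS[goal]))
            return all(solve(sub, depth + 1) for sub in DECOMPOSITIONS[goal])
        print(indent + goal + " -> blocked: missing knowledge (cf. sub-goal 4 above)")
        return False

    solve("have_fire")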
While it is unclear whether intelligences other than humans (e.g. dogs, birds, sea lions, etc.) do such sub-goaling explicitly in some way, the behavior of any animal solving a multi-step puzzle of one kind or another (e.g. how to get around an obstacle) can be mapped by an observer onto such a goal-subgoal analysis – whenever any intelligence solves any problem, it can in an abstract sense be said to be doing goal-generation. Any intelligence worthy of being called general must certainly be capable of goal-subgoal generation, because for certain classes of problems it is the only sensible and available way of attacking them. It can also be argued that, whether the cognitive mechanism implementing it is explicit, some sort of approximation or emulation of such activity, or a realtime lookup into a giant goal+condition+action triplet table, sub-goaling is what any mind must do when faced with any challenge for the first time, no matter how trivial: a cat chasing a mouse, a bee centering its flight on a flower, a sea lion balancing a ball on its nose. This is because solving a problem (e.g. catching a mouse) requires a logical combination of actions without which the goal would never be met, and because certain combinations of conditions, sometimes drawn from a very large set of possibilities, must be perceived and evaluated as constituting the meeting of the goal.
In any case, from the above analysis we see that one of the functions that any intelligence must have is memory. A memory system that is implemented in the real world will of course have features of its physical operation that include the speed of writing to it and the speed of retrieving from it. It will also have a certain character that may make it more or less suited for particular tasks and environments. Humans have a highly associative memory that seems to be reasonably general for the many kinds of things that humans do and are capable of doing. Another required function is some way of perceiving a complex environment: an intelligence's sensors must be suited to retrieve information from the environment in a sufficiently fast manner so as to allow processing of the information and production of decisions quickly enough to ensure the survival of the being – this is of course not part of a system's intelligence but a necessary precondition. While sensors are always limited in some ways, e.g. by field of view or color spectrum sensitivity, sufficient planning and anticipatory action may enable the being to live with those constraints. The capabilities of a perceptual system must also encompass an ability to separate unimportant from important information in the environment. An action repertoire will also be required, for if the intelligence doesn't do anything in the world there is really no point in its existence.
As we have seen above, the ability to dissect problems into smaller parts, each of which moves the system in some way towards achieving a target goal, must be possessed (in some form) by this system – this is part of the system's intelligence. In fact, the ability to generate goal-subgoal structures, and accompanying actions to achieve the sub-goals as appropriate, follows almost deductively from an assumption that can be made for any intelligent system: that it must operate, with limited knowledge and memory, in environments vastly more complex than it is itself able to process at any point in time (limited computational resources).
A system limited at any point in time by what it can hold in memory and what it can compute must, by definition, pick and choose what it works on at any point in time. To perform tasks that in an ideal world would be given more “working memory” than is available, planning and scheduling can be called on to offset some of these limitations. Given limited computational speed and memory capacity, and the fact that the environment presents vastly more potential information at any point in time than the system needs, or for that matter can ever hope to process, some sort of resource management and coordination system is needed. In human psychology these are called attention and task planning capabilities. The latter depends a lot on the particular task that a system is doing, and the class of tasks it belongs to, so it is task-dependent (and hence possibly not eligible for being counted as part of the intelligence proper); the former tends to be fairly task-independent, having more to do with internally generated goals referencing the management of the memory itself, the external sensors, from moment to moment, and the short-term coordination of external actuators.
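A crude sketch of such resource management: given more percepts than can be processed, select the highest-priority ones that fit within a fixed processing budget. The priority values, costs, and the budget are illustrative assumptions; real attention is of course learned, steerable, and far more dynamic.

    # Illustrative sketch: a simple priority-and-budget model of attention.

    def attend(percepts, budget):
        """Pick the highest-priority percepts that fit within the processing budget."""
        ranked = sorted(percepts, key=lambda p: p["priority"], reverse=True)
        selected, cost = [], 0
        for p in ranked:
            if cost + p["cost"] <= budget:
                selected.append(p["name"])
                cost += p["cost"]
        return selected

    percepts = [
        {"name": "falling_tree", "priority": 0.95, "cost": 3},
        {"name": "birdsong",     "priority": 0.10, "cost": 1},
        {"name": "poem_recital", "priority": 0.40, "cost": 4},
        {"name": "path_ahead",   "priority": 0.70, "cost": 2},
    ]
    print(attend(percepts, budget=5))   # ['falling_tree', 'path_ahead']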
One aspect of intelligence that we must admit to being central is learning. A system that cannot learn cannot adapt; a system that cannot adapt is essentially incapable of operating in complex environments, because by definition these will have so many possible “legal” states as to be innumerable, and thus, as long as they are logical worlds (i.e. worlds with a significant number of reliable causal connections), they will have a large set of operational rules that the intelligence is not provided with from the start. The intelligence must therefore have the capability of extracting “rules” or regularities from the environment. The more it can generalize these, the smarter it is. The faster and better it can figure out which generalizations to use to act in the world, understand it, and solve problems in it, the more intelligent it is. Acquisition and “insightful” or creative application of knowledge is therefore a requirement for any higher intelligence.
For any system whose intelligence we may want to assess, removing the above features one by one will leave us increasingly hard-pressed to agree on applying the label “intelligent” to describe the system.
There are at least two different perspectives to take when talking about what intelligence is. One references the absolute complexity of the real world and of the combination of the world and the intelligent being acting in it; the other references relative changes in the being itself – its starting point (e.g. complexity of system architecture or initial knowledge) and growth over time in a particular environment.
Taking the first perspective, we can say that some real-world phenomenon such as a rock is less intelligent than another one, e.g. a bee. In fact, most would agree that a rock is not intelligent, so we can take this as level zero: a static structure such as a rock does not act on its own and hence is not intelligent. It moves if you kick it, and it can roll down hills, so it does act, but these are passive actions devoid of goals. Goals can, however, sensibly be ascribed to thermostats: simple ones have (at least) three states, “too hot”, “too cold”, and “just right”. These conditions are monitored by the device, and its actions are mapped to observations so as to maximize the last-mentioned state, i.e. keeping the temperature “just right”. So in some sense a thermostat is more intelligent than a rock. Even if we don't like calling a thermostat “intelligent” we have started to define a scale that can be extrapolated relatively easily upwards, to complex tasks in complex environments, and at some point presumably one might agree that a threshold into the class of “intelligent systems” has been reached. To make this approach work we would need some measure of complexity for task and environment, along with a measure of a system's ability to identify problems and to create goals whose achievement would solve those problems. Various tasks and environments could then be ranked by how much intelligence they call for, based purely on their complexity and the demands they make on memory systems, planning capabilities, goal-subgoal generation, and action potential. A system's ability to perform complex tasks would be one way to measure its intelligence, and another would be to measure its ability to solve problems. One more level up would be its ability, in general, to solve problems of a relatively new kind – a sort of second-order measure of problem-solving intelligence – which would of course call for some way of measuring “newness”.
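The thermostat example can be written out directly: observations are classified into the three states and mapped to actions that keep the device in the “just right” state. The thresholds and action names below are illustrative assumptions.

    # Illustrative sketch: the three-state thermostat as a minimal goal-driven system.

    def classify(temp, low=19.0, high=23.0):
        if temp < low:
            return "too cold"
        if temp > high:
            return "too hot"
        return "just right"

    def act(state):
        return {"too cold": "heat_on", "too hot": "cool_on", "just right": "idle"}[state]

    for temp in [17.5, 21.0, 25.2]:
        state = classify(temp)
        print(temp, "->", state, "->", act(state))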
The second way of talking about intelligence is in relative terms: given some initial starting conditions of a system, how quickly can it learn to adapt in particular environments, and how far can it go in mastering those environments? In this case too we would have to have some measure of the complexity of the surroundings, the complexity of the initial system, and the complexity or power of the system at some point later in time. The amount of intelligence ascribed to the system would in this case be based on how quickly it moved from some initial knowledge level or architectural structure to a more powerful one, or conversely, the distance it moved, from relatively dumb to relatively intelligent, in a given time period. Another way to measure intelligence relatively would be to look at the ratio between the complexity of the system and the complexity of the world, and the ability of the system to operate properly (achieve its goals) in that world. If one of the system's tasks is to sensibly create goals and subgoals, then that would of course also enter into the evaluation.
Neither the relative nor the absolute approach is better than the other – some combination of the two could probably become a foundation for a reasonably comprehensive and useful measure of the capabilities and levels of any artificial or natural intelligence.
A few more words about goals, which can be of many kinds. For any intelligent system there is a boundary outside of which the system is not expected to operate at all; for example, humans are not built for breathing under water, and no matter how much we practice we have no hope of learning to do it. We can call the range of things that the intelligence is supposed to address the scope of the system. Within its scope we can oftentimes identify a system's top-level goal, <m>G_{top}</m>, which is implicitly part of the system's operation. For the animal kingdom this is survival, for the purpose of creating offspring. Several sub-goals may be identified as necessarily belonging to, or being generated from, any top-level goal, but the top-level goal is a fundamental concept and therefore has a special name in biology and psychology: drive. For most intelligences we can assume that <m>G_{top}</m> is implicit, in that the system cannot help but work towards achieving that goal, and does not generally have to do any deliberation to do so. But there are other kinds of goals, and goals can be at very different levels (to take some examples, existential goals such as survival, task-level goals such as avoiding burning one's fingers when lighting a match, and internal operations goals such as trying to avoid thinking bad thoughts). While we could enter into a deep and detailed discussion on how to classify goals, the many uses and abuses of third-party goal ascription, and the potential pitfalls of relying too much on goal-based analysis when thinking about the phenomenon of intelligence, we can leave that discussion for now and assume that a moderate use of analysis and thinking based on the concept of goal structures will help us, rather than hinder us, in studying what intelligence is.
While it is now time to start thinking about the requirements for artificial general intelligence, we have by no means left the discussion of what intelligence is – quite the contrary, we are just beginning.
What is Understanding?
Understanding seems like a key component of intelligence: The better one understands something, the better equipped one is to deal with that something. The faster you are able to understand – “grasp” – something, the better equipped you are to deal with reality. Without understanding there is no telling what you will do in response to that something – probably the wrong thing!
So what is understanding? Understanding something is the potential to act towards a particular thing in a way that enables goals related to that thing to be successfully achieved. The goal could be explanation – some say you don't really understand something until you can explain it simply to others. That may be a good way to measure the level of understanding, but it's certainly not the only way. In any case, understanding something always means relating that particular something to your prior knowledge in a way that enables you to use it, to act in relation to it, to achieve one or more goals involving that thing. By relating a particular fact, phenomenon, event, or object to what you already know – putting it in context with previously learned knowledge – you are to some extent classifying this phenomenon or fact using your prior experience. For example, to see that a particular event was similar to another one is to say that some generalizations applicable to the old event might be relevant for the new event. By understanding that a cog can hook onto other cogs and is a kind of spinning object with an axis, you set yourself up for predicting that cogs are not meant to be free-rolling like tumbleweeds – something must hold them in place so that their teeth can interlock. Given even this meager level of understanding you may be able to outline the plans for a simple drive where one cogwheel turns another. But you are unlikely to be able to construct a clock – to do that you need to understand more about the behavior and nature of cogwheels.
The more goals you are able to achieve with respect to some phenomenon, event, concept, or object, the deeper your understanding of that thing is said to be. You understand something better if the goals you are able to achieve involving it are numerous and diverse; the fewer goals you can achieve and the more similar they are, the less you understand it.
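As a minimal, purely illustrative sketch, such a measure could score depth of understanding by both the number of achievable goals involving a thing and the diversity of the kinds of goals they are. The goal categories and the crude count-times-diversity score below are assumptions made for illustration only.

    # Illustrative sketch: depth of understanding as number x diversity of achievable goals.

    def understanding_depth(achievable_goals):
        """achievable_goals: list of (goal_name, goal_category) pairs."""
        count = len(achievable_goals)
        diversity = len({category for _, category in achievable_goals})
        return count * diversity          # crude: rewards both number and spread

    shallow = [("name_it", "classification"), ("recognize_it", "classification")]
    deep = [("name_it", "classification"), ("cook_it", "manipulation"),
            ("predict_migration", "prediction"), ("explain_it", "explanation")]
    print(understanding_depth(shallow))   # 2  (2 goals, 1 kind)
    print(understanding_depth(deep))      # 16 (4 goals, 4 kinds)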
Even for very simple pieces of knowledge – e.g. X is a bird; X lives in Iceland; X is on average 15 cm long – you don't have to collect very many before your prior knowledge allows you to make a vast number of useful inferences and assumptions about the bird: you might be able to cook it for dinner, or at least be able to say whether a skilled chef might or might not be able to create a feast featuring this bird. You might even be able to identify precisely what type of bird this is, simply based on 6 or 8 such facts. In short, you would be able to understand the phenomenon X very well based on a few facts. This goes for all knowledge for which we already have a lot of related knowledge. For other things, ones that are far from what we already know, we may have to study them longer, gather more information about them, and experience them firsthand before we can cook them for dinner. But the process is the same – it just takes longer to select the right connections to prior knowledge and to fill the gaps to the extent of being able to achieve goals related to those things.
2013©K.R.Thórisson