public:t-720-atai:atai-22:understanding: created 2022/09/16 13:19 by thorisson; last edit 2024/04/29 13:33 (current)
[[public:t-720-atai:atai-22:main|T-720-ATAI-2022 Main]] \\
[[public:t-720-atai:atai-22:Lecture_Notes|Links to Lecture Notes]]

\\
\\
======UNDERSTANDING: Understanding, Curiosity, Creativity======

\\
\\

===== Curiosity =====

\\
\\
====What is Curiosity?====

| What It Is | The tendency of a learner to seek out novel inputs that may not have any relevance to its currently active goals. |
| Why It Matters | Curiosity may be an inherent, even inevitable, feature of any intelligent system that lives in an uncertain environment: because one of its top-level goals will always be self-preservation, and because it cannot fully predict what threats to its existence the future may hold, it is forced to collect information which //might// become useful at a later time. |
| Other Meanings | We sometimes call people "curious" who keep sticking their nose into things which may be relevant (or even irrelevant) to them but which societal norms consider outside their obvious "permissible" range of access. This is a different, more anthropocentric side of 'curiosity' which is less interesting for our purposes. |
\\
====Why Curiosity?====
| Why Are We Talking About Curiosity? | 'Curiosity' is a convenient term for a very complex systemic phenomenon of significant importance to AGI: **motivation**. |
| Motivation | A learner without "internal motivation" will not have any reason to learn anything. We call it 'internal motivation' because it is a mechanism (complex or simple) of the cognitive architecture itself (without which "nothing would happen") that gives an agent a tendency to act in certain ways in certain circumstances (and possibly in general). |
| How Is Motivation Programmed? \\ \\ **Drives** | Fundamental motivation is not something that a learner can learn (unless we assume that, as it is "born", there is something in the environment to program that in; assuming that something so highly specific exists and is available in the environment is not a good strategy for ensuring survival if the creature is intended to grow cognitively in a predictable way). \\ The way nature does this is to provide newborns with some sort of impetus to act in certain ways in certain situations, e.g. cry when hungry. This works most of the time because all living creatures have parents. \\ We call internal motivational factors //**drives**//. |
| Baby Machines | General learners can learn over their lifetime vastly larger amounts of knowledge than they are born with. Such machines are sometimes called 'baby machines'. The drives of baby machines typically must change over their lifetime, especially if they are very good and general learners. \\ In psychology this is called //cognitive development//. \\ Very few, if any, AI systems exist that have demonstrated such a capability.[1] But some form of cognitive development is probably unavoidable in any powerful learning scheme, because the motivational mechanisms needed when you know very little are likely to be very different from those that work well when you know a lot (once you have learned most of the fundamental principles of how your world works, your old learning mechanisms are unlikely to be as efficient or relevant as they were in the beginning). |
| Footnote | [1] Mind you, it should not be too hard to create a system that //appears// to demonstrate cognitive development, just as it isn't difficult to write a for-loop called "thinking". A real cog-dev system should also demonstrate the //need// for such a capacity, and that it arises //autonomously// in the learning process that the system implements. |
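The idea of a drive as a built-in impetus that biases action selection can be made concrete with a toy sketch. This is a minimal illustration, not anything from a real cognitive architecture: the class name ''CuriousAgent'' and the count-based notion of novelty are invented here purely for demonstration. A curiosity drive scores candidate inputs by novelty and steers the agent toward the least-familiar one, independently of any task goal.

```python
# Toy sketch of a "curiosity drive": an innate mechanism that biases
# action selection toward novel inputs. All names here are invented
# for illustration; novelty is crudely approximated by inverse
# observation frequency.

class CuriousAgent:
    def __init__(self):
        self.seen = {}  # observation -> how often it has been encountered

    def novelty(self, obs):
        # Less-frequently-seen observations score higher; unseen ones score 1.0.
        return 1.0 / (1 + self.seen.get(obs, 0))

    def act(self, candidate_observations):
        # The drive biases the agent toward the most novel input,
        # regardless of any currently active task goal.
        return max(candidate_observations, key=self.novelty)

    def observe(self, obs):
        self.seen[obs] = self.seen.get(obs, 0) + 1

agent = CuriousAgent()
agent.observe("red door")
agent.observe("red door")
agent.observe("window")
choice = agent.act(["red door", "window", "trapdoor"])
print(choice)  # "trapdoor" - never seen before, hence most novel
```

Note that this bias is fixed, whereas the Baby Machines row above argues that a real system's drives must change over its lifetime.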
\\
\\

===== Creativity =====
====What is Creativity?====

| The Word | The word 'creativity' has many meanings. \\ The simplest meaning is typically that "you're creative if you think of something that nobody else thought of". \\ A better meaning in our context is the ability of an intelligent system to produce non-obvious solutions to problems. \\ Creativity is about **producing** something. |
| Why It's Important | Ultimately we want creative machines. It is difficult to tease apart the concepts of intelligence and creativity: it is hard to imagine a great intelligence that is not creative. Likewise, it is difficult to imagine a creative agent that is not intelligent. |
| Creativity Without Intelligence? | The relation between creativity and intelligence may not be symmetric: while it is difficult to imagine a highly intelligent system that is not creative, it is not AS difficult to imagine an (artificial) system that is creative but not intelligent. This is especially true if we assume there are other (natural) intelligences around to make sense of what this "non-intelligent creative system" produces. |
| Are Only Artists Creative? | Short answer: No. \\ Longer answer: The word "creativity" has many meanings, and is used in everyday language in numerous ways. |
| How It Is Measured | Creativity is always measured with respect to some goal: if I just "do something", how could anyone tell whether I am creative? It is only when I tell you what my goal was (and, even better, show you what others did with respect to that goal) that you can say for sure whether what I did qualifies as "creative" in your view. (Jackson Pollock was not creative because he splattered paint onto canvas; his work was creative because of the context in which it was done.) \\ Is creativity always subjective? It probably doesn't have to be, but until we have good theories of goals, actions, and tasks, it will probably remain a rather loose concept. |
\\

==== Creativity & AI ====

| Examples of Creative Machines | Do good examples of creative machines exist? |
| Aaron | http://prostheticknowledge.tumblr.com/post/20734326468/aaron-the-first-artificial-intelligence-creative \\ https://www.youtube.com/watch?v=3PA-XApZkso |
| Thaler's \\ Creativity Machine | http://www.imagination-engines.com \\ The CM was patented in 1994. \\ A few years later the CM made an invention that was itself granted a patent by the United States Patent & Trademark Office (USPTO). \\ [[http://patft.uspto.gov/netacgi/nph-Parser?Sect2=PTO1&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=1&f=G&l=50&d=PALL&RefSrch=yes&Query=PN%2F5659666|CM patent]] \\ What it is: an ANN that becomes "creative" by "relaxing some parameters" so that the ANN "begins to hallucinate". |
| Are These Machines Creative? | Maybe - in some sense of the concept. \\ Are they //truly// creative? Probably not. \\ How so? What does it mean to be "truly creative"? \\ That would be the **full monty**: the ability to see unique, valuable solutions to a wide range of challenging problems better than others. |
| Creativity Is \\ a Relative Term | It is somewhat unavoidable to interpret the concept of 'creativity' as a relative term - i.e. "person (or system) X is more creative than person (or system) Y" - since no absolute scale for it exists as of yet. (It is possible that AI/AGI research may one day develop such a scale.) |
| Intermediate Conclusion | To answer the question "Do creative machines exist?" we must inspect the concept of creativity in more detail. |
\\

==== Theories & Definitions ====

| \\ Attempts at Definitions | **1.** In its simplest sense, creativity is the //ability to produce solutions to problems//. This meaning treats it as a single continuous dimension (or many that may be collapsed into one) along which we simply put a threshold for when we will classify something as "creative". \\ **2.** A more complex version references in some way the complexity of a problem, such that //solutions that address the problem in a better way (other things being equal) or achieve a similar solution at lower cost// (other things being equal) are more //creative// than others. \\ **3.** In reference to some sort of "obviousness": //a solution to a problem may be more creative if it is "less obvious"//, with respect to some population, time, society, education, etc. In this case a more creative agent is one that repeatedly uncovers solutions that are "less obvious" - even to itself. \\ This last definition is relative to the knowledge it operates on (which is a good thing, because it removes the reference to background knowledge): out of a set of processes <m>P</m> that can produce solutions to problems from a set of knowledge <m>K</m>, the process <m>p \in P</m> that reliably and repeatedly uncovers valid solutions with small or no intersection with the output of the others is "more creative" than the others. \\ (The trouble with this approach is that the difference between these processes might also be construed as knowledge, in which case we cannot in principle keep the knowledge constant.) |
| \\ Schmidhuber's \\ Theory of Creativity | [[http://people.idsia.ch/~juergen/creativity.html|Schmidhuber's theory of creativity]]: the observer's //learning process causes a reduction of the subjective complexity of the data, yielding a temporarily high derivative of subjective beauty// - a temporarily steep learning curve. \\ The observer's current predictor/compressor tries to compress its history of (e.g. acoustic and other) inputs where possible (whatever you can predict you can compress, since you don't have to store it separately). The action selector tries to find history-influencing actions such that the continually growing historic data allows for improving the performance of the predictor/compressor. The interesting or aesthetically rewarding (e.g. musical) subsequences are precisely those with previously unknown yet learnable regularities, because they lead to compressor improvements. The boring patterns are those that are either already perfectly known, arbitrary or random, or whose structure seems too hard to understand. |
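Definition 3 above, with the process set <m>P</m> and knowledge <m>K</m>, can be illustrated with a small sketch. This is an invented toy, not an established measure: it keeps the knowledge fixed, ignores reliability and repeatability, and simply equates "more creative" with "smaller overlap of valid solution sets"; the function name and example data are hypothetical.

```python
# Toy illustration of definition 3: given several solution-producing
# processes (all operating on the same knowledge), rank each process
# by how little its output overlaps with the others' outputs.
# All names and data here are invented for this sketch.

def creativity_rank(outputs):
    """outputs: dict mapping process name -> set of valid solutions."""
    scores = {}
    for name, sols in outputs.items():
        # Union of every other process's solutions.
        others = set().union(*(s for n, s in outputs.items() if n != name))
        overlap = len(sols & others)
        # Fewer shared solutions -> higher "creativity" score.
        scores[name] = len(sols) - overlap
    # Most "creative" (least overlapping) process first.
    return sorted(scores, key=scores.get, reverse=True)

outputs = {
    "p1": {"A", "B", "C"},        # all of its solutions are found by others
    "p2": {"A", "B", "C", "D"},   # one unique solution
    "p3": {"E", "F"},             # entirely unique output
}
print(creativity_rank(outputs))  # ['p3', 'p2', 'p1']
```

Note the caveat from the table: if the processes' differing behavior is itself counted as knowledge, the comparison is no longer on equal footing.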
\\

==== Creativity & Understanding ====

| Understanding | To consistently solve problems regarding a phenomenon <m>X</m> requires //understanding// <m>X</m>. \\ Understanding <m>X</m> means the ability to extract and analyze the //meaning// of any phenomenon <m>\phi</m> related to <m>X</m>. |
| Meaning | Meaning is closely coupled with understanding - the two cannot exist without each other. Are they irreducible? |
| Bottom Line | We can't talk about creativity without talking about understanding, and we can't talk about understanding without talking about meaning. No good scientific theory of meaning exists yet (though some philosophical ones do). |
\\ | \\ |
| |
====Reasoning==== | ====Reasoning==== |
| |
| What It Is | The establishment of axioms for the world and applying logic to these. | | | What It Is | The establishment of axioms for the world and applying logic to these. \\ (Creating reasonable assumption and using these as a basis for applying logic according to given rules.) | |
| \\ But The World Is Non-Axiomatic ! | Yes. But there is no way to apply logic unless we hypothesize some pseudo-axioms. The only difference between this and mathematics is that in science we must accept that the so-called "laws" of physics may be only conditionally correct (or possibly even completely incorrect, in light of our goal of figuring out the "ultimate" truth about how the universe works). |
| Deduction | Deriving a conclusion that is necessarily true given two true statements. \\ //Example: If it's true that all swans are white, and Joe is a swan, then Joe must be white//. |
| \\ Abduction | Reasoning from conclusions to (likely) causes. \\ //Example: If the light is now on, but it was off just a minute ago, someone must have flipped the switch//. \\ Note that in the reverse case different abductions may be entertained, because of the way the world works: //If the light is off now, and it was on just a minute ago, someone may have flipped the switch OR a fuse may have been blown.// |
| \\ Induction | Generalization from observation. \\ //Example: All the swans I have ever seen have been white, hence I hypothesize that all swans are white//. \\ Induced knowledge can always be refuted by new evidence. This is the general principle of empirical science. |
| Analogy | The ability to find similarity between even the most disparate phenomena. |
| \\ Why This Matters | Logic is one of the most effective ways to compress information. Reasoning is the process of applying logic to information according to rules. Because of the high ratio of possible states in the physical world to the storage capacity of the human (and machine) mind/memory, it is not conceivable that an understanding ("//true// knowledge" - i.e. useful, reliable knowledge) of a large number of phenomena in the physical world can be achieved without the use of reasoning. |
| \\ How is Reasoning Applied in Understanding & Creativity? | Based on knowledge about objects, parts and the relations between these, as well as the transformation rules by which they behave and interact, one can \\ \\ - construct predictions for what will happen next (predictive control), \\ - abduce what may have happened before (how the world got to where it is - constructing explanations), \\ - determine what to do next (make plans), and \\ - make analogies - draw parallels to create hypotheses about novel things. \\ \\ It is in particular this last item (not in isolation but in tandem with the others) that is a very useful tool for producing novel insights (i.e. "being creative"). |
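The reasoning modes in the table above can be illustrated with a toy sketch. Everything here - the rule base, the function names and the string labels - is invented for illustration and is not part of the lecture material:

```python
# Toy illustration of three of the reasoning modes above,
# over a tiny hand-made cause -> effect rule base.
RULES = {
    "switch_flipped_on": "light_on",
    "switch_flipped_off": "light_off",
    "fuse_blown": "light_off",
}

def deduce(cause):
    """Deduction: given a cause covered by a rule, its effect follows."""
    return RULES.get(cause)

def abduce(effect):
    """Abduction: reason back from an observed effect to candidate causes.
    Several hypotheses may be entertained for the same observation."""
    return sorted(c for c, e in RULES.items() if e == effect)

def induce(observations):
    """Induction: generalize from instances ('all swans seen are white').
    The hypothesis stands only until a counterexample refutes it."""
    colors = {color for _, color in observations}
    return "all swans are white" if colors == {"white"} else "refuted"

print(deduce("fuse_blown"))   # -> light_off
print(abduce("light_off"))    # -> ['fuse_blown', 'switch_flipped_off']
```

Note how `abduce("light_off")` returns two hypotheses, mirroring the "flipped switch OR blown fuse" example: abduction, unlike deduction, is not guaranteed a unique answer.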
| |
| \\ |
| |
| |
| ===== Understanding ===== |
\\
| |
\\
====In the Vernacular====
| What It Is | A concept that people use all the time about each other's cognition. With respect to achieving a task, given that the target of the understanding is all or some aspects of the task, more of it is generally considered better than less of it. |
| Why It Is Important | It seems connected to "real intelligence": When a machine does <m>X</m> reliably and repeatedly, we may say that it is "capable" of doing <m>X</m> but qualify this with "... but it doesn't 'really' understand what it's doing". |
| What Does It Mean? | No well-known scientific theory exists. \\ Normally we do not hand control of anything over to anyone who doesn't understand it. All other things being equal, this is a recipe for disaster. |
| Evaluating Understanding | Understanding of any <m>X</m> can be evaluated along four dimensions: \\ 1. Being able to predict <m>X</m>, \\ 2. being able to achieve goals with respect to <m>X</m>, \\ 3. being able to explain <m>X</m>, and \\ 4. being able to "re-create" <m>X</m> \\ ("re-create" here means e.g. creating a simulation that produces <m>X</m> and many or all of its side-effects.) |
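The four evaluation dimensions can be turned into a crude checklist. A minimal sketch, where the equal-weight scoring in [0, 1] per dimension is an invented assumption, not part of the course material:

```python
# Crude sketch: score understanding of some X along the four
# dimensions named above. Weights and scale are assumptions.
DIMENSIONS = ("predict", "achieve_goals", "explain", "recreate")

def understanding_level(scores):
    """scores: dict mapping each dimension to a value in [0, 1].
    Returns the mean over the four dimensions as a rough level L."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"unscored dimensions: {missing}")
    return sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS)

level = understanding_level(
    {"predict": 0.9, "achieve_goals": 0.7, "explain": 0.5, "recreate": 0.1})
print(round(level, 2))  # -> 0.55
```

The example deliberately scores "re-create" lowest: being able to predict a phenomenon (e.g. via a black-box regressor) is much weaker evidence of understanding than being able to rebuild it.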
| |
\\
| |
====It Used To Be Called "Common Sense" (in AI circles)====
| |
| Status of Understanding in AI | Since the 70s the concept of //understanding// has been relegated to the fringes of AI research. The only AI contexts the term regularly appears in are "language understanding", "scene understanding" and "image understanding". |
| What Took Its Place | What took the place of understanding in AI is //common sense//. Unfortunately the concept of common sense does not necessarily capture what we generally mean by "understanding". |
| Projects | The best-known project on common sense is the CYC project, which started in the 80s and is apparently still going. It is the best-funded, longest-running AI project in history. |
| Main Methodology | The foundation of CYC is formal logic, represented in predicate logic statements and compound structures. |
| Key Results | Results from the CYC project are similar to the expert systems of the 80s - these systems are brittle and unpredictable. \\ The state of the CYC system in 2016 is described in this nicely written essay: [[https://www.technologyreview.com/s/600984/an-ai-with-30-years-worth-of-knowledge-finally-goes-to-work/|REF]]. |
| \\ Two Problems | Upon further scrutiny, no good analysis or argument exists for why 'understanding' should be equated with 'common sense'. The two are simply not the same thing. \\ Furthermore, progress under the rubric of 'common sense' in AI has neither produced any grand results nor evidence that the methodology followed is a promising one. And it certainly doesn't seem to have inspired fresh ideas in a very long time. |
\\
| |
==== A Scientific Theory of Understanding ====
| |
| \\ Why Do We Need One \\ ? | Normally we do not hand over control of anything to anyone who //doesn't understand it//. Other things being equal, this is a recipe for disaster. \\ If we want machines to control ever-more complex processes, we //must// give them some level of understanding of what they are doing. \\ We need to build systems that we can trust. \\ We cannot trust an agent that doesn't understand what it's doing or the context it operates in. |
| Cumulative Understanding | A scientific theory of understanding proposed by K. R. Thórisson et al. \\ [[http://alumni.media.mit.edu/~kris/ftp/AGI16_understanding.pdf|About Understanding]] by Thórisson et al. |
| Why a Scientific Theory \\ ? | A scientific theory, as opposed to a philosophical one, proposes actionable principles for //how to construct the phenomenon in question//. \\ An AI system is a //control// system: It affects the world in some way. |
| What Does It Mean \\ for AI? | Understanding is a prerequisite for being trustworthy. AI that //truly understands// has met this prerequisite. With a scientific theory of understanding we can build //trustworthy AI//. |
| |
\\
| |
====Theory of Cumulative Understanding====
| |
| What It Is | The only theory of understanding in the field of AI and, to our knowledge, the only scientific theory of understanding. |
| In a Nutshell | Understanding involves the manipulation of causal-relational models (e.g. those in the AERA AGI-aspiring architecture - see next set of lecture notes). |
| \\ Phenomenon \\ <-> \\ Model | Phenomenon <m>Phi</m>: Any group of inter-related variables in the world, some or all of which can be measured. \\ Models <m>M</m>: A set of information structures that reference the variables of <m>Phi</m> and their relations <m>R</m> such that they can be used, applying processes <m>P</m> that manipulate <m>M</m>, to \\ (a) predict, \\ (b) achieve goals with respect to, \\ (c ) explain, and \\ (d) (re-)create <m>Phi</m>. |
| \\ Definition of Understanding | An agent **understands** a phenomenon <m>Phi</m> to some level <m>L</m> when it possesses a set of models <m>M</m> and relevant processes <m>P</m> such that it can use <m>M</m> to \\ (a) predict, \\ (b) achieve goals with respect to, \\ (c ) explain, and \\ (d) (re-)create <m>Phi</m>. \\ \\ Insofar as the nature of the relations between variables in <m>Phi</m> determines their behavior, the level <m>L</m> to which the phenomenon <m>Phi</m> is understood by the agent is determined by the //completeness// and the //accuracy// with which <m>M</m> matches the variables and their relations in <m>Phi</m>. |
| \\ Why \\ 'Cumulative' \\ ? | According to the theory, 'understanding' is a learning process: A dynamic process that creates knowledge of a particular functional kind, namely the kind that can be used to predict, achieve goals, explain and (re-)create (see the point above in this table). \\ Unlike 'learning', however, in the vernacular the term 'understanding' is also used to describe a //state// - the state reached when the process completes (the output of 'learning' is 'knowledge' - the output of 'understanding' is '//an// understanding'). We can see the difference if we replace 'understanding' with the term 'know': "A understands X" vs. "A //knows// X". \\ The term 'cumulative' emphasizes the 'learning' part of understanding - a //process of creation//. \\ This generation process is key to human understanding because \\ (a) it is used many times a day - every time we learn anything new: It is an integral part of the //learning process// proper, \\ (b) it is a //general// process of knowledge acquisition that builds knowledge graphs incrementally, from experience, relying on reasoning processes as well as existing knowledge, and \\ ( c) the processes that //build// the knowledge and those that //use it// are the //**same**// processes. |
| REF | [[http://alumni.media.mit.edu/~kris/ftp/AGI16_understanding.pdf|About Understanding]] by Thórisson et al. |
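The Phenomenon <-> Model pairing can be made concrete with a deliberately tiny sketch. The phenomenon (free fall), the model's wrong constant, and the accuracy measure below are all invented for illustration; they merely stand in for the theory's //accuracy// component of level <m>L</m>:

```python
# Toy Phenomenon <-> Model pairing: a model M references the variables
# of a phenomenon Phi; its accuracy is one ingredient of level L.

def phi(t):
    """Ground truth Phi: distance fallen after t seconds (g = 9.81)."""
    return 0.5 * 9.81 * t * t

def model_m(t):
    """The agent's model M of Phi: right form, slightly wrong constant."""
    return 0.5 * 10.0 * t * t

def accuracy(model, truth, samples):
    """Crude accuracy term: 1 minus the mean relative prediction error."""
    errs = [abs(model(t) - truth(t)) / truth(t) for t in samples]
    return 1 - sum(errs) / len(errs)

print(round(accuracy(model_m, phi, [1.0, 2.0, 3.0]), 3))
```

Because `model_m` has the right //structure//, it still supports prediction, goal achievement and explanation reasonably well; a model with the right constant but the wrong structure would fail the four evaluation dimensions much more badly.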
\\
| |
| |
| |
==== Self-Explaining Systems ====
| |
| What It Is | The ability of a controller to explain, after the fact or beforehand, why it did something or intends to do it. |
| 'Explainability' \\ ≠ \\ 'self-explanation' | If an intelligence X can explain a phenomenon Y, Y is 'explainable' by X, through some process chosen by X. \\ \\ In contrast, if an intelligence X can explain itself - its own actions, knowledge, understanding, beliefs, and reasoning - it is capable of self-explanation. The latter is stronger and subsumes the former. |
| Why It Is Important | If a controller does something we don't want it to repeat - e.g. crash an airplane full of people (in simulation mode, hopefully!) - it needs to be able to explain why it did what it did. If it can't, it means that it - and //we// - can never be sure why it did what it did, whether it had any other choice, whether it's an evil machine that actually meant to do it, or how likely it is to do it again. |
| \\ Human-Level AI | Even more importantly, to grow, learn and self-inspect, an AI system must be able to sort out causal chains. If it can't, it will not only be incapable of explaining to others why it is like it is, it will be incapable of explaining to itself why things are the way they are, and thus it will be incapable of sorting out whether something it did is better for its own growth than something else. \\ Explanation is the big black hole of ANNs: In principle ANNs are black boxes, and thus they are in principle unexplainable - whether to themselves or to others. \\ One way to address this is by encapsulating knowledge as hierarchical models that are built up over time and can be de-constructed at any time (as AERA does). |
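The idea of keeping the causal chain behind each action inspectable can be sketched in a few lines. This is a hypothetical design for illustration only - it is not AERA code, and all names are invented:

```python
# Sketch: a controller that records why it took each action,
# so its decisions can later be queried and explained.
class SelfExplainingController:
    def __init__(self):
        self.trace = []  # (action, reason) pairs, oldest first

    def act(self, action, because):
        """Take an action, recording the reason alongside it."""
        self.trace.append((action, because))
        return action

    def explain(self, action):
        """Return the recorded reason(s) for every time `action` was taken."""
        return [why for a, why in self.trace if a == action]

ctrl = SelfExplainingController()
ctrl.act("descend", because="altitude above glide path")
ctrl.act("flare", because="radio altimeter below 10 m")
print(ctrl.explain("flare"))  # -> ['radio altimeter below 10 m']
```

The contrast with a black-box ANN controller is the point: here the reason is a first-class, queryable structure, which is the property the hierarchical-model approach aims for at scale.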
| |
\\
\\
| |
| |
| |
\\
\\
\\
\\
| |
//2022(c)K.R.Thórisson//