
DCS-T-713-MERS-2023 Main
Lecture Notes



Empirical Reasoning (2)




Uncertainty in Physical Worlds

What it is In a dynamic world with a large number of elements and processes, presenting infinite combinatorics, knowing everything is impossible and thus predicting everything is also impossible.
Stems From —— Unknown Things / Phenomena ——  
Variable Values E.g. we know it will eventually rain, but not exactly when.
Variables E.g. a gust of wind that hits us as we come around a skyscraper's corner.
Goals of Others E.g. when we meet someone in the street and move to our right, but they also move in that direction (to their left), at which point we move to our left, but they move to their right, etc., resulting in a sequence of synchronized stalemates.
Imprecision in Measurements E.g. the position of your car on the road relative to other cars and the boundaries of the road.
—— Unknowable Things / Phenomena —— 
Chains of Events E.g. chains of events involving anything that is not possible (or utterly impractical) to measure, for any given time period.
Living Things E.g. bacteria, before they were hypothesized and observable through a microscope.
Values Beyond Measurement E.g. everything outside the reach of our senses and for which no alternative measurement mechanisms are available (e.g. telephone, telescope, microscope, etc.).
Infinite Combinatorics Since there is a large number of atomic elements (building blocks), many of which no-one knows about, and an infinite number of combinations that these can create, it is impossible to know any and every way in which the world may organize itself.

Axioms of the Universe
Agents in the physical world, even those that are extremely intelligent and knowledgeable, depend on the universe and its operating principles for the very operation of their minds. Even if they were to figure out the actual and complete set of rules that govern the universe, they would have to step outside of the universe to verify that this was so. But if that were possible – if they could step outside of the universe to verify that these rules were the complete and correct ruleset governing the universe – what world would they step into? That would in essence be proof that these rules are not the complete set governing the universe, because there is another world that they can step into.


Signal & Noise

Modeling the World A fundamental method in engineering is to model dynamic systems as part “signal” and part “noise” – the former is what we have a good handle on, so we can turn it into a 'signal', and the latter is what we (currently) are unable to model, and which therefore appears random to us (hence 'noise').
Infinite Worlds as 'signal' & 'noise' In artificial general intelligence it can be useful to think of knowledge in the same way: Anything for which there exists good models (read: useful knowledge) we look at as 'signal' and anything else (which looks more or less random to us) is 'noise'.
Mind as 'Engineer' or 'Scientist' In this view the mind is the engineer, trying to get a better handle on the noise in the world by proposing better (read: more useful) models.
Model creation in an intelligent system is essentially induction, i.e. the creation of imagined explanations for how the world hangs together, resulting in the mind's experience of it.
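To make this concrete, here is a minimal sketch (illustrative only; the data and models are assumptions, not from the lecture) in which whatever a fitted model captures counts as 'signal' and its residual counts as 'noise':

```python
# Sketch: the same observations split into 'signal' and 'noise' by two models.
# A better model turns more of the world's regularity into signal.
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 200)
world = 2.0 * t + 3.0 * np.sin(t)              # the world's actual (unknown) process
observed = world + rng.normal(0, 0.5, t.size)  # what the agent can measure

# Model 1: linear trend only -- the sinusoid gets lumped into 'noise'.
coeffs = np.polyfit(t, observed, 1)
noise1 = observed - np.polyval(coeffs, t)

# Model 2: linear trend + sinusoid -- more of the world becomes 'signal'.
A = np.column_stack([t, np.sin(t), np.ones_like(t)])
params, *_ = np.linalg.lstsq(A, observed, rcond=None)
noise2 = observed - A @ params

print(f"residual std, linear model:      {noise1.std():.3f}")
print(f"residual std, linear+sine model: {noise2.std():.3f}")
```

The 'noise' left behind by the weaker model contains structure (the sinusoid) that the stronger model recovers as signal – the sense in which a mind reduces noise by proposing better models.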

Engineer or Scientist?
In this view, is a general intelligence more like an engineer or a scientist? A scientist produces theories of the world, while an engineer uses theories to make measurements and meet requirements in going about the world.
A general intelligence is both rolled into one: It must be able to create theories of the world while also measuring it, sometimes at the same time – we could say that learning and cognitive development is more like the process of the scientific enterprise and using your acquired (useful) knowledge is more like the process of engineering.


Methods For Dealing With Uncertainty

Model Creation Model creation from experience is the key method for dealing with uncertainty.
Models, combined with reasoning, can produce generalized information structures that can be used for several purposes, including prediction, explanation, planning, goal selection, classification, and many other cognitive activities.

Reasoning
Aka “rule creation” aka “generalization”.
'Induction' is another term for generalization, but it is not only reasoning through induction that matters – uncertainty handling also relies on deduction, abduction and analogy (all defeasible, non-axiomatic).
Rules About Rules Reasoning allows rules to be hierarchical – creating rules about rules (also called 'metarules'). This makes the organization of rules more practical.

Hierarchy
By organizing knowledge in a hierarchy, or better yet multi-dimensional hierarchies, a learner can sort through, prioritize, and select the appropriate level of detail for any situation.
An example of this is that the rule “the same object can only be in one place for any particular period” has a higher priority than the rule “my mom comes home from work around 4 pm”, should there be any doubt about her spatial position.
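A minimal sketch of how such prioritization could be realized in code (the representation and rule names are hypothetical, assumed for illustration):

```python
# Sketch: prioritized rules, where a more fundamental rule overrides a
# more specific habit-rule whenever their conclusions conflict.
from dataclasses import dataclass

@dataclass
class Rule:
    name: str
    priority: int       # higher number = more fundamental / more trusted
    conclusion: str

rules = [
    Rule("object-uniqueness", priority=10,
         conclusion="mom is in exactly one place at any particular time"),
    Rule("mom-home-around-4pm", priority=1,
         conclusion="mom is at home"),
]

def resolve(conflicting_rules):
    # Conflict resolution: prefer the highest-priority rule.
    return max(conflicting_rules, key=lambda r: r.priority)

# If evidence puts mom at work after 4 pm, the two rules conflict; the
# fundamental rule wins and the habit-rule's conclusion is doubted.
winner = resolve(rules)
print(winner.name, "->", winner.conclusion)
```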
Causal Models A key method for dealing with uncertainty is to create models/rules about causal relations.
Only one method for causal-relational model creation is known.
Causal Model Creation The only known method for creating causal models is combined forward-backward chaining (deduction and abduction, respectively), where the same model is tested for its usefulness in supporting both abduction and deduction. Any model that works in both directions, or works better in both than another model, is closer to being a good (useful) representation of causal relations.


Backward & Forward Chaining in Production Systems

Matching Rules are matched to conditions by matching – if a rule is found to match a pattern in a particular dataset, the rule fires. When a rule fires, its statements are executed.

Production System
Sometimes 'production system' is used as a synonym for 'reasoning system'. However, while both are rule-based, reasoning systems often come with requirements and limitations that production systems are (typically) not subject to. 'Production systems' form a larger set of systems than 'reasoning systems', yet the strictest sense of 'reasoning system' (e.g. first-order logic) is not part of the set of 'production systems'.

Forward Chaining
Uses matching to produce what might happen next, after a particular state is reached, starting with existing data, until an endpoint (typically a goal) is reached. The resulting chain of events can represent a predicted successful plan or simply a predicted chain of events.
Forward chaining starts with a particular premise, e.g. the here-and-now, and proceeds to trace the cause-effect chain, through pattern matching, until the end-point is reached.
In AI this is used to produce predictions.
An example is a chain of dominos: If the first domino falls, the second domino falls, which makes the third one fall, etc. The premise is the line of dominos, spaced less than the length of one domino apart, and the effect that a falling domino has on a free-standing domino that it falls on.
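A minimal sketch of forward chaining over the domino example (the rule representation here is a simplification assumed for illustration):

```python
# Sketch: rules are (premises, conclusion) pairs; a rule fires when all its
# premises hold, adding its conclusion, until nothing new can be derived.
rules = [
    ({"domino1_falls"}, "domino2_falls"),
    ({"domino2_falls"}, "domino3_falls"),
    ({"domino3_falls"}, "domino4_falls"),
]

def forward_chain(facts, rules):
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)   # the rule 'fires'
                changed = True
    return facts

# From the initial state, the whole chain of events is predicted.
print(forward_chain({"domino1_falls"}, rules))
```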

Backward Chaining
Starts with a given goal or state to be achieved, and proceeds through matching to produce what could possibly have been the state just prior to that particular state. A goal-driven reasoning method for inferring unknown truths from known conclusions (goal) by moving backward from a solution to determine the initial conditions and rules.
In AI this method can be used for producing plans.
An example could be producing an answer to the question “How can I make the last domino fall?”
BW+FW Chaining Combination Backward chaining is often applied in artificial intelligence (AI) and may be used along with its counterpart, forward chaining. 
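Continuing the sketch above (same hypothetical rule format), a minimal backward chainer works from the goal back to the initial conditions:

```python
# Sketch: start from the goal and match backwards, asking which rule could
# have produced it and whether that rule's premises are achievable in turn.
rules = [
    ({"domino1_falls"}, "domino2_falls"),
    ({"domino2_falls"}, "domino3_falls"),
    ({"domino3_falls"}, "domino4_falls"),
]

def backward_chain(goal, rules, facts):
    if goal in facts:                    # goal already holds
        return True
    for premises, conclusion in rules:
        if conclusion == goal:
            # Goal is achievable if every premise is achievable, recursively.
            if all(backward_chain(p, rules, facts) for p in premises):
                return True
    return False

# "How can I make the last domino fall?" -- achievable by pushing the first.
print(backward_chain("domino4_falls", rules, facts={"domino1_falls"}))  # True
```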


Guided Experimentation for New Knowledge Generation


Experimenting on the World
Knowledge-guided experimentation is the process of using one's current knowledge to create more knowledge. When learning about the world, random exploration is by definition the slowest and most ineffective knowledge creation method; in complex worlds it may even be completely useless due to the world's combinatorics. (If the ratio of complexity to lack of knowledge guidance is too high, no learning can take place.)
Strategic experimentation for knowledge generation involves conceiving actions that minimize energy and time while optimizing the exclusion of families of hypotheses about how the world works.
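A sketch of what such strategic selection might look like (the experiments, costs, and outcome sets below are entirely hypothetical): an experimenter can greedily pick the experiment that is guaranteed to exclude the most hypotheses per unit cost, rather than exploring at random.

```python
# Sketch: pick the experiment whose worst-case outcome still eliminates
# the largest family of live hypotheses per unit cost.
hypotheses = {"H1", "H2", "H3", "H4", "H5", "H6"}

# For each experiment: its cost, and which hypotheses each possible
# outcome would rule out (hypothetical values).
experiments = {
    "exp_A": {"cost": 1.0, "outcomes": [{"H1", "H2"}, {"H3", "H4", "H5"}]},
    "exp_B": {"cost": 2.0, "outcomes": [{"H1"}, {"H2"}]},
    "exp_C": {"cost": 1.0, "outcomes": [{"H5", "H6"}, {"H6"}]},
}

def value(exp, live):
    # Guaranteed eliminations: worst case over outcomes, per unit cost.
    worst = min(len(ruled_out & live) for ruled_out in exp["outcomes"])
    return worst / exp["cost"]

best = max(experiments, key=lambda name: value(experiments[name], hypotheses))
print("run first:", best)   # exp_A: excludes at least 2 hypotheses for cost 1
```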
Inspecting One's Own Knowledge Inspection of knowledge happens via reflection – the ability to apply learning mechanisms to the processes and content of one's own mind. Reflection enables a learner to set itself a goal, then inspect that goal, producing arguments for and against that goal's features (usefulness, justification, time- and energy-dependence, and so on…). In other words, reflection gives a mind a capacity for meta-knowledge.
Cumulative Learning Learning that is always on and improves knowledge incrementally over time.


Cumulative Learning

What it Is Learning where several separate learning goals are unified into a single holistic learning system: Multitask learning, lifelong learning, transfer learning and few-shot learning.
(Research on learning typically separates these and works on a single aspect at a time.)
Unifying All of These

Lifelong Learning
Means that an AI system keeps learning and integrating knowledge throughout its operational lifetime: learning is “always on”.
Whichever way this is measured, we expect at a minimum the 'learning cycle' – alternating learning and non-learning periods – to be free from designer tampering or intervention at runtime. Provided this, the smaller those periods become (relative to the shortest perception-action cycle, for instance), to the point of being considered virtually or completely continuous, the better the “learning always on” requirement is met.
Subsumed by cumulative learning because the continuous online learning is steady and ongoing all the time – why switch it off?

Online Learning
The ability to learn continuously, uninterrupted, and in real-time from experience as it comes, and without specifically iterating over it many times.
Subsumed by cumulative learning because new information, which comes in via experience, is integrated with prior knowledge at the time it is acquired, so a cumulative learner is always learning as it's doing other things.
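A minimal sketch of the idea (a toy running-average learner, not a full system): each observation is integrated into the current estimate the moment it arrives, with no stored dataset to iterate over.

```python
# Sketch: an estimate updated once per observation, as experience streams in.
class OnlineMean:
    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def update(self, x):
        # Incremental (Welford-style) update: new information is integrated
        # with prior knowledge at the time it is acquired.
        self.n += 1
        self.mean += (x - self.mean) / self.n

est = OnlineMean()
for observation in [2.0, 4.0, 6.0, 8.0]:   # experience arriving in real time
    est.update(observation)
print(est.mean)   # 5.0 -- same as a batch average, learned on the fly
```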

Multitask Learning
The ability to learn more than one task, either at once or in sequence.
The cumulative learner's ability to generalize, investigate, and reason will affect how well it implements this ability.
Subsumed by cumulative learning because knowledge is contextualized as it is acquired, meaning that the system has a place and a time for every tiny bit of information it absorbs.

Robust Knowledge Acquisition
Its antithesis is brittle learning, where new knowledge results in catastrophic perturbations of prior knowledge (and behavior).
Subsumed by cumulative learning because new information is integrated continuously online, which means the increments are frequent and small; inconsistencies in prior knowledge get exposed in the process, and because the learning is life-long, opportunities for fixing small inconsistencies are also frequent. New information is therefore highly unlikely to result in e.g. catastrophic forgetting.

Transfer Learning
The ability to build new knowledge on top of old in a way that the old knowledge facilitates learning the new. While interference/forgetting should not occur, knowledge should still be defeasible: the physical world is non-axiomatic so any knowledge could be proven incorrect in light of contradicting evidence.
Subsumed by cumulative learning because new information is integrated with old information, which may result in exposure of inconsistencies, missing data, etc., which is then dealt with as a natural part of the cumulative learning operations.

Few-Shot Learning
The ability to learn something from very few examples or very little data. Common variants include one-shot learning, where the learner only needs to be told (or experience) something once, and zero-shot learning, where the learner can infer it without needing to experience it or be told at all.
Subsumed by cumulative learning because prior knowledge is transferable to new information, meaning that (theoretically) only the delta between what has previously been learned and what the new information requires needs to be learned.
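A toy sketch of this delta-learning view (the embedding and data are assumed for illustration): with a previously learned representation in place, a new concept can be pinned down from a couple of examples per class.

```python
# Sketch: few-shot classification by reusing a fixed, previously learned
# feature map; only the class centroids -- the 'delta' -- are learned.
import numpy as np

def embed(x):
    # Stand-in for prior knowledge: a feature map learned earlier.
    return np.array([x, x ** 2])

def fit_centroids(few_examples):
    # few_examples: {class_label: [raw inputs]}, e.g. 2 examples per class.
    return {label: np.mean([embed(x) for x in xs], axis=0)
            for label, xs in few_examples.items()}

def classify(x, centroids):
    return min(centroids, key=lambda c: np.linalg.norm(embed(x) - centroids[c]))

centroids = fit_centroids({"small": [0.5, 1.0], "large": [4.0, 5.0]})
print(classify(1.2, centroids))   # 'small'
print(classify(4.5, centroids))   # 'large'
```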


Self-Explaining Systems

What It Is The ability of a controller to explain, after the fact or before, why it did something or intends to do it.
'Explainability' If an intelligence X can explain a phenomenon Y, Y is 'explainable' by X, through some process chosen by X.

'Self-Explanation' In contrast, if an intelligence X can explain itself, its own actions, knowledge, understanding, beliefs, and reasoning, it is capable of self-explanation. The latter is stronger and subsumes the former.

Why It Is Important
If a controller does something we don't want it to repeat - e.g. crash an airplane full of people (in simulation mode, hopefully!) - it needs to be able to explain why it did what it did. If it can't, it means it - and we - can never be sure of why it did what it did, whether it had any other choice, whether it's an evil machine that actually meant to do it, or how likely it is to do it again.

Human-Level AI
Even more importantly, to grow and learn and self-inspect the AI system must be able to sort out causal chains. If it can't it will not only be incapable of explaining to others why it is like it is, it will be incapable of explaining to itself why things are the way they are, and thus, it will be incapable of sorting out whether something it did is better for its own growth than something else. Explanation is the big black hole of ANNs: In principle ANNs are black boxes, and thus they are in principle unexplainable - whether to themselves or others.
One way to address this is by encapsulating knowledge as hierarchical models that are built up over time, and can be de-constructed at any time (like AERA does).
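An illustrative sketch of this idea (not AERA's actual mechanism; the scenario and class are hypothetical): if every conclusion records the rule and premises that produced it, the reasoning chain can be de-constructed on demand to answer "why?".

```python
# Sketch: conclusions carry their derivation trace, making them explainable.
class Conclusion:
    def __init__(self, statement, rule=None, premises=()):
        self.statement, self.rule, self.premises = statement, rule, premises

    def explain(self, depth=0):
        # Walk the derivation tree, printing each step's justification.
        pad = "  " * depth
        if self.rule is None:
            print(f"{pad}{self.statement}  [observed]")
        else:
            print(f"{pad}{self.statement}  [via rule: {self.rule}]")
            for p in self.premises:
                p.explain(depth + 1)

wet = Conclusion("runway is wet")
braking = Conclusion("braking distance doubled",
                     "wet-runway -> low-friction", (wet,))
abort = Conclusion("abort landing",
                   "insufficient-runway -> go-around", (braking,))
abort.explain()   # prints the full chain: decision, rule, evidence
```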





2023©K.R.Thórisson