DCS-T-713-MERS-2023 Main
Lecture Notes



Learning & Knowledge




Key Learning Terms

What it is Learning is a process that has the purpose of generating actionable information, a.k.a. knowledge.



Key Features
Inherits key features of any process:
- Purpose: To adapt, to respond in rational ways to problems / to achieve foreseen goals; this factor determines how the rest of the features in this list are measured.
- Speed: The speed of learning.
- Data: The data that the learning (and particular measured speed of learning) requires.
- Quality: How well something is learned.
- Retention: The robustness of what has been learned - how well it stays intact over time.
- Transfer: How general the learning is, how broadly what is learned can be employed for the purposes of adaptation or achievement of goals.
- Meta-Learning: A learner may improve its learning abilities, i.e. be capable of meta-learning.
- Progress Signal(s): A learner needs to know how its learning is going, and if there is improvement, how much.
Evaluation To know any of the above, some parameters have to be measured somehow; all of the above factors can be measured in many ways.
Major Caveat Since learning interacts with (is affected by) the task-environment and world in which it takes place, as well as with the nature of these in the learner's subsequent deployment, none of the above features can be assessed by looking only at the learner.
This is addressed by the Pedagogical Pentagon (see below).


Measurement, Data, Information, Knowledge

Measurement Sampling of a value of one or more variables over a particular temporal interval.
(Often simplified by considering it coming from a “point” in time.)

Data
Stored, committed-to measurement.
Anything that can be measured can be stored as data. Measurement takes time, and so does storing it as data. Data is therefore always old.
To be of any use it must contain how the measurement was made and when.
For instance, [FI399 17:00 KEF].

Information
Data that is stored in a particular way for a particular purpose. Contextualized data.
In some sense, all data is information, because the data must be stored in some way, and the particular way will be better suited for some purpose than others. However, we typically only speak of “information” if there is something to the data format beyond simply the value measured and the time of measurement.
For instance, the time of departure for your flight from Keflavik airport, FI399, is at 17:00 today.

Knowledge
Actionable information. Information that can be used to get stuff done.
A set of interlinked information that can be used to plan, produce action, and interpret new information.
“Multi-purpose” information (that can be applied in many ways to many situations) – requires specialized mechanisms for manipulating the information in a context-sensitive way (i.e. reasoning methods).
Representation For a measurement to be accessed after it has been made, it must be represented somehow. The way it is represented affects how it can be used. This is why representation is a key topic in AI.
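
To make the representation point concrete, here is a small hypothetical sketch (Python; the flight record and function are invented for illustration): the same measurement stored two ways, where only one representation directly supports a given purpose.

    # Hypothetical illustration: the same measurement in two representations.
    # A bare tuple preserves the values but not their roles; a keyed record
    # makes the data directly usable for a purpose.
    raw = ("FI399", "17:00", "KEF")   # data: values with only implicit context

    flight_info = {                   # information: contextualized data
        "flight": "FI399",
        "departure": "17:00",
        "airport": "KEF",
    }

    def minutes_until_departure(record, now_minutes):
        """Knowledge-like use: act on the information (plan when to leave)."""
        h, m = record["departure"].split(":")
        return int(h) * 60 + int(m) - now_minutes

    print(minutes_until_departure(flight_info, 15 * 60))  # -> 120

Both structures hold the same values; only the second makes the roles of those values explicit enough to act on without further interpretation.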


The Pedagogical Pentagon

What is Needed There exists no universal theory of learning – nor of teaching, training, task-environments, and evaluation.
This means that experimentation, exploration, and blind search are the only ways to answer questions about a learner's performance, curriculum design, training requirements, etc., and that we can never get more than partial, limited answers to such questions.
That Said… The Pedagogical Pentagon captures the five pillars of education: Learning, Teaching, Training, Environments, and Testing.
It is not a theory but a conceptual framework for capturing all key aspects of education.
The Pedagogical Pentagon captures the five main pillars of any learning/teaching situation. The relationships between its contents can be seen from various perspectives: (a) as information flow between processes; (b) as relations between systems; (c) as dependencies between (largely missing!) theories. REF

Tasks
Learning systems adjust their knowledge as a result of interactions with a task-environment. A task is defined by (possibly a variety of) objective functions, as well as (possibly) instructions, i.e. knowledge provided at the start of the task (e.g. as a “seed”), or continuously or intermittently throughout its duration. Since tasks can only be defined w.r.t. some environment, we often refer to the combination of a task and its environment as a single unit: the task-environment.

Teacher
The goal of the teacher is to help a learner learn. This is done by influencing the learner’s task-environment in such a way that progress towards the learning goals is facilitated. Teaching, as opposed to training, typically involves information about the What, Why & How:
- What to pay attention to.
- Relationships between observables (causal, part-whole, etc.).
- Sub-goals, negative goals and their relationships (strategy).
- Background-foreground separation.
Environment & Task The learner and the teacher each interact with their own view of the world (i.e. their own “environments”) which are typically different, but overlapping to some degree.
Training Viewed from a teacher's and intentional learner's point of view, “training” means the actions taken (repeatedly) over time with the goal of becoming better at some task, while avoiding learning erroneous skills/things and avoiding forgetting or unlearning desirable skills/things.

Test
Testing - or evaluation - is meant to obtain information about the structural, epistemic and emergent properties of learners, as they progress on a learning task. Testing can be done for different purposes: e.g. to ensure that a learner has good-enough performance on a range of tasks, to identify strengths and weaknesses for an AI designer to improve or an adversary to exploit, or to ensure that a learner has understood a certain concept so that we can trust it will use it correctly in the future.
Source The Pedagogical Pentagon: A Conceptual Framework for Artificial Pedagogy by Bieger et al.


Learning Controllers


A Learner
Adaptive/intelligent system/controller, embodied and situated in a task-environment, that continually receives inputs/observations (measurements) from its environment and sends outputs/actions back (signals to its manipulators).
Some of the learner’s inputs may be treated specially — e.g. as feedback or a reward signal, possibly provided by a teacher or a specially-rigged training task-environment. Since action can only be evaluated as “intelligent” in light of what it is trying to achieve - we model intelligent agents as imperfect optimizers of some (possibly unknown) real-valued objective function.
Note that this working definition fits experience-based learning.
Embodiment The interface between a learning controller and the task-environment.
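
This working definition can be sketched in code. Below is a minimal, hypothetical illustration (all names are invented here; Python): a controller in a perception-action loop that treats one input as special feedback and imperfectly optimizes an objective function it cannot inspect directly.

    # Minimal sketch of a learner as a situated controller (names hypothetical).
    import random

    class TaskEnvironment:
        """Toy world: feedback is higher the closer an action is to a hidden
        target value - the 'unknown objective function' of the text above."""
        def __init__(self):
            self.target = 0.7
        def step(self, action):
            observation = random.random()        # a measurement from the world
            reward = -abs(action - self.target)  # the specially-treated input
            return observation, reward

    class Learner:
        """Crude hill-climbing controller: keeps whichever action estimate
        has earned the best feedback so far."""
        def __init__(self):
            self.best_action, self.best_reward = 0.5, float("-inf")
        def act(self):
            return self.best_action + random.uniform(-0.1, 0.1)  # explore
        def learn(self, action, reward):
            if reward > self.best_reward:                         # adapt
                self.best_action, self.best_reward = action, reward

    env, agent = TaskEnvironment(), Learner()
    for _ in range(1000):                 # the perception-action loop
        a = agent.act()
        obs, r = env.step(a)
        agent.learn(a, r)
    print(round(agent.best_action, 2))    # ends up near the hidden 0.7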


Experience-Based Learning

What It Is Learning is the acquisition of knowledge for particular purposes. When this acquisition happens via interaction with an environment it is experience-based.
Why It Is Important Any environment which cannot be fully known a-priori requires experimentation of some sort, in the form of interaction with the world. This is what we call experience.

The Real World
The physical world we live in, often referred to as the “real world”, is highly complex, and rarely if ever do we have perfect models of how it behaves when we interact with it, whether it is to experiment with how it works or simply achieve some goal like buying bread.

Limited Time & Resources
An important limitation on any agent's ability to model the real world is the world's enormous state space, which vastly exceeds any known agent's memory capacity, even for relatively simple environments. Even if the models were sufficiently detailed, pre-computing everything beforehand is ruled out by memory limits. On top of that, even if memory did suffice for pre-computing everything necessary to go about our tasks, we would have to retrieve the pre-computed data in time when it's needed - and the larger the state space, the greater the demands this puts on retrieval times.
Why Experience-Based Learning is Relevant Under LTE (limited time and energy) in a plentiful task-environment it is impossible to know everything all at once, including causal relations. Therefore, most of the time an intelligent agent capable of some reasoning will be working with uncertain assumptions, where nothing is certain and some things are merely more probable than others.

Bottom Line
The physical world has infinite variety that cannot be catalogued beforehand.
As a result, the fundamental rules of the world are not known, and uncertainty is guaranteed.
This means that for any learner in the physical world the learning will be non-axiomatic.


Probability


What It Is
Probability is a concept that is relevant to a situation where information is missing, which means it is a concept relevant to knowledge.
A common conceptualization of probability is that it is a measure of the likelihood that an event will occur REF.
If it is not known whether event X will be (or has been) observed in situation Y, the probability of X is the percentage of times X would be observed if the same situation Y occurred an infinite number of times.

Why It Is Important
in AI
Probability enters into our knowledge of anything for which the knowledge is incomplete.
As in, everything that humans do every day in every real-world environment.
With incomplete knowledge it is in principle impossible to know what may happen. However, if we have very good models of some limited (small, simple) phenomenon, we can expect our predictions of what may happen to be pretty good, or at least practically useful. This is especially true for knowledge acquired through the scientific method, in which empirical evidence and human reason are systematically brought to bear on the validity of the models.
How To Compute Probabilities The most common method is Bayesian networks, which encode a concept of probability in which probability is interpreted as reasonable expectation representing a state of knowledge, or as quantification of a personal belief REF. This makes them useful for representing an (intelligent) agent's knowledge of some environment, task or phenomenon.

Bayes' Theorem

P(A|B) = P(B|A) · P(A) / P(B)

A, B := events
P(A|B) := probability of A given that B is true
P(B|A) := probability of B given that A is true
P(A), P(B) := the independent probabilities of A and B
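
A worked numeric instance of the theorem (the events and numbers below are made up purely for illustration):

    # Bayes' theorem on invented numbers: P(A|B) = P(B|A) * P(A) / P(B).
    # A = "flight FI399 is delayed", B = "weather at KEF is stormy".
    p_A = 0.1          # prior probability of a delay
    p_B_given_A = 0.8  # storms are common when there is a delay
    p_B = 0.2          # overall probability of a storm
    p_A_given_B = p_B_given_A * p_A / p_B
    print(p_A_given_B)  # 0.4: seeing a storm raises the delay belief 0.1 -> 0.4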
Judea Pearl Most fervent advocate (and self-proclaimed inventor) of Bayesian Networks in AI REF.

Conceptualization of Probability in This Course
It is useful, in the context of this course, to think about probability as 'that which is not fully known':
The World contains a mechanism, M, whose operation is not fully known: K(M) < n, where n ∈ [0,1] and n = 1 denotes full knowledge. The part m of M that is unknown implies that predictions, statements, and actions involving M, if repeated, will be reliable only part of the time. As such repetitions tend to infinity, the fraction of unreliable ones tends to m, the percentage of M that is unknown; m thus represents the probability that any single statement, prediction or action about M is unreliable.
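
This conceptualization can be simulated directly (a minimal sketch; the numbers are arbitrary):

    # A mechanism M behaves as our model predicts a fraction k of the time
    # (K(M) = k); the unknown part m = 1 - k appears as the long-run rate of
    # failed predictions, matching the paragraph above.
    import random

    k = 0.75                  # the fraction of M we have modeled correctly
    trials = 100_000
    failures = sum(random.random() > k for _ in range(trials))
    print(failures / trials)  # tends to m = 0.25 as trials grow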
Determinism The idea of Platonic cause-effect relations; a deterministic relationship between A and B means that there exists a universal guarantee for this relationship, for all eternity, of the inevitable and unbreakable kind.
“Adequate Determinism” A philosophical label for the kind of 'determinism' found in the physical world.
See the Adequate Determinism blurb on The Information Philosopher


Causation

What It Is A Platonic model for the relationship between two or more variables, whereby changes in one always lead to, and precede, changes in the other.


In More Detail
A causal relationship between variables A, B can be defined as a relationship such that changes in A always lead to a change in B, and where the timing relationship is such that the former always happens before the latter: ∀{A,B}: t(A) < t(B).
Example: A light switch is designed specifically to cause the light to turn on and off.
In a causal analysis based on abduction one may reason that, given that light switches don't tend to flip randomly, a light that was off but is now on may indicate that someone or something flipped the light switch. (The inverse - a light that was on but is now off - has a larger set of reasonable causes: in addition to someone turning it off, there could have been a power outage or a bulb burnout.)
Why It Is Important
in Science
Causation is the foundation of empirical science. Without knowledge about causal relations it is impossible to get anything systematically done.

Why It Is Important
in AI
The main purpose of intelligence is to figure out how to get new stuff done, given limited time and energy (LTE), i.e. to get stuff done cheaply but well.
To get stuff done means knowing how to produce effects.
In this case reliable methods for getting stuff done are worth more to an intelligence than unreliable ones.
A relationship that approximates a Platonic cause-effect is worth more than one that does not.

History
David Hume (1711-1776) is one of the most influential philosophers addressing the topic. From the Encyclopedia of Philosophy: “…advocate[s] … that there are no innate ideas and that all knowledge comes from experience, Hume is known for applying this standard rigorously to causation and necessity.” REF
This makes Hume an empiricist.

More Recent History
Causation was cast by the wayside in statistics for the past 120 years, the claim being that all we can say about the relationship between any variables is that they correlate. Needless to say, this has led to significant confusion as to what science can and cannot say about causal relationships, such as whether mobile phones cause cancer. Equally badly, the statistical stance has infected some scientific fields, leading them to view causation as “unscientific”.
Spurious Correlation Non-zero correlation due to complete coincidence.

Causation & Correlation
What is the relation between causation and correlation?
There is no (non-spurious) correlation without causation.
There is no causation without correlation.
However, correlation between two variables does not necessitate one of them to be the cause of the other: They can have a shared (possibly hidden) common cause.
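
The last point can be made concrete with a small simulation (hypothetical, for illustration only): a hidden common cause C produces a strong correlation between A and B even though neither causes the other.

    # Correlation without direct causation: C -> A and C -> B, but no A-B link.
    import random

    def correlation(xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sum((x - mx) ** 2 for x in xs) ** 0.5
        sy = sum((y - my) ** 2 for y in ys) ** 0.5
        return cov / (sx * sy)

    A, B = [], []
    for _ in range(10_000):
        C = random.gauss(0, 1)              # hidden common cause
        A.append(C + random.gauss(0, 0.3))  # A depends only on C
        B.append(C + random.gauss(0, 0.3))  # B depends only on C
    print(round(correlation(A, B), 2))      # ~0.9: correlated, yet no A->B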


Causation & AI

Correlation Supports Prediction Correlation is sufficient for simple prediction (if A and B correlate highly, then it does not matter if we see an A OR a B, we can predict that the other is likely on the scene).

Knowledge of Causation Supports Action
We may know that A and B correlate, but if we don't know whether B is a result of A or vice versa, and we want B to disappear, we don't know whether it will suffice to modify A.
Example: The position of the light switch and the state of the light bulb correlate. Only by knowing that the light switch controls the bulb can we go directly to the switch if we want the light to turn on.
Causal Models
Are Necessary To Guide Action
While correlation gives us an indication of causation, knowing the direction of the “causal arrow” is critically necessary for guiding action.
Luckily, finding out which way the arrows point in any large set of correlated variables is usually not too hard, via empirical experimentation.
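
This can be sketched loosely in the spirit of Pearl's do() operator (the model below is invented for illustration): in a world where the switch causes the light, intervening on the cause changes the effect, while intervening on the effect leaves the cause untouched.

    # Observation vs. intervention in a tiny causal model: switch -> light.
    import random

    def sample(do_switch=None, do_light=None):
        """One world state; the do_* arguments mimic Pearl's do() intervention."""
        switch = random.choice([0, 1]) if do_switch is None else do_switch
        light = switch if do_light is None else do_light  # the causal mechanism
        return switch, light

    print(sample())             # observation: switch and light always agree
    print(sample(do_switch=1))  # intervening on the cause: light follows -> (1, 1)
    print(sample(do_light=0))   # intervening on the effect: switch stays random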
Judea Pearl Most fervent advocate of causality in AI, and the inventor of the Do-Calculus.
Cf. Bayesianism and Causality, or, Why I am Only a Half-Bayesian.

State Of The Art
Recent work by Judea Pearl clearly demonstrates the fallaciousness of the statistical stance and fills some important gaps in our knowledge on this subject, which hopefully will rectify the situation in the coming years.
YouTube lecture by J. Pearl on causation.


Controlled Experiment

What is it? A fairly recent research method, historically speaking, for testing hypotheses / theories.
Why is it Important? The most reliable way humanity has found to create sharable knowledge.
Why is it Relevant Here? Like individual learning, it involves dealing with new phenomena and figuring them out.

Are the Methods Identical?
Very similar. Science is a systematized, organized approach to what individuals do when they learn. This involves:
- Exploring
- Hypothesizing
- Trying out
- Recording results
- Unifying and consolidating related information
- Creating coherent stories about what has been found (theorizing)
Bottom line The most powerful mechanism for generating reliable knowledge known to mankind has a lot in common with A(G)I.


Properties of the Learning Process

(may be set and measured by learner, designer, and/or the world itself)


Purpose
For closed problems (problems with clear goals), and for which the task-environment is known, the purpose of learning can often be readily specified.
In human education the purpose often gets conflated with the ways we test the learning (e.g. a child's learning of the alphabet is measured by its ability to recite the alphabet, rather than its ability to use it to search a dictionary efficiently).
In school the task-environment part of this equation often gets ignored.

Speed
Can be measured during a learning session by measuring state of knowledge at time t1 and again at time t2.
We should expect any (general) learner to exhibit various learning speeds at various times for various topics.
For human education, few methods have been developed to directly measure an individual's learning speed; it is mostly handled implicitly (and very approximately!) by grouping learners by age.
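
As a sketch of such a measurement (hypothetical numbers; Python): learning speed as the change in a measured knowledge state between times t1 and t2.

    # Learning speed as the rate of change of a measured knowledge state.
    def learning_speed(score_t1, score_t2, t1, t2):
        """Average knowledge gain per unit time between two measurements."""
        return (score_t2 - score_t1) / (t2 - t1)

    # e.g. a learner scores 0.40 after 2 hours of training and 0.70 after 5:
    print(learning_speed(0.40, 0.70, t1=2.0, t2=5.0))  # 0.1 per hour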

Data
There are many ways to classify data; too numerous to recount here. Suffice it to say that every problem and task brings with it a unique mixture of data; a general learner should be able to handle a broad spectrum of these.
For present ML systems we can e.g. say that DNNs can handle big continuous data and reinforcement learning is good for discrete-valued small data, but no good methods exist for diverse datasets with a mixture of symbolic, continuous, big and small data.

Quality
Quality of learning has at least two dimensions, reliability and applicability.
Reliability refers to the learned behavior/material's consistent performance under repeated application; applicability refers to its correct application in relevant circumstances.

Retention
A battery of tests administered at time t1 and then again at times t2, t3, and t4, would give an indication of a function describing the learner's retention of acquired material/skills/concepts.
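
One illustrative way to turn such a test battery into a retention function (a sketch under the simplifying assumption of exponential forgetting; all numbers are made up):

    # Fit R(t) = exp(-t / s) to four retention measurements by least-squares
    # regression of ln(score) on t (a line through the origin).
    import math

    times = [1.0, 7.0, 30.0, 90.0]      # days since learning (hypothetical)
    scores = [0.90, 0.75, 0.55, 0.35]   # fraction retained at each test

    num = sum(t * math.log(r) for t, r in zip(times, scores))
    den = sum(t * t for t in times)
    s = -den / num                      # fitted memory-strength parameter
    print(round(s, 1))                  # ~79 days
    print(round(math.exp(-180 / s), 2)) # predicted retention after 180 days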

Transfer
How well something that has been verified to have been learned at time t1 can be used by the learner subsequently (whether in identical situations, similar situations, or completely different situations).
Partly a function of Retention (Transfer would be zero if Retention is zero), as well as Quality and Reliability.
In human education, because the task-environment often gets ignored, transfer learning is seldom if ever evaluated.

Meta-Learning
The rate at which meta-learning happens could be measured by the frequency with which the learning changes its nature, either in terms of new classes of things, concepts or phenomena that the learner can now handle, as opposed to before, or with respect to order-of-magnitude changes (measured somehow) in knowledge acquisition along the other dimensions above (Speed, Data, Quality, Retention, and Transfer).

Progress Signal(s)
For artificial learners these are known and thus do not have to be measured. For natural learners the main method for assessing what kinds of progress signals are allowed and/or needed, or which ones work best, is experimentation (e.g. food pellets for dogs); some methods are already well explored and documented for certain species of animals (e.g. classical conditioning and operant conditioning).


Cumulative Learning Through Reasoning

What it Is Cumulative learning unifies several separate research tracks in a coherent form easily relatable to AGI requirements: multitask learning, lifelong learning, transfer learning and few-shot learning.

Multitask Learning
The ability to learn more than one task, either at once or in sequence.
The cumulative learner's ability to generalize, investigate, and reason will affect how well it implements this ability.
Subsumed by cumulative learning because knowledge is contextualized as it is acquired, meaning that the system has a place and a time for every tiny bit of information it absorbs.

Online Learning
The ability to learn continuously, uninterrupted, and in real-time from experience as it comes, and without specifically iterating over it many times.
Subsumed by cumulative learning because new information, which comes in via experience, is integrated with prior knowledge at the time it is acquired, so a cumulative learner is always learning as it's doing other things.

Lifelong Learning
Means that an AI system keeps learning and integrating knowledge throughout its operational lifetime: learning is “always on”.
Whichever way this is measured we expect at a minimum the 'learning cycle' – alternating learning and non-learning periods – to be free from designer tampering or intervention at runtime. Provided this, the smaller those periods become (relative to the shortest perception-action cycle, for instance), to the point of being considered virtually or completely continuous, the better the “learning always on” requirement is met.
Subsumed by cumulative learning because the continuous online learning is steady and ongoing all the time – why switch it off?

Robust Knowledge Acquisition
The antithesis of which is brittle learning, where new knowledge results in catastrophic perturbations of prior knowledge (and behavior).
Subsumed by cumulative learning because new information is integrated continuously, online, which means the increments are frequent and small; inconsistencies in the prior knowledge get exposed in the process, and because the learning is life-long, opportunities for fixing small inconsistencies are also frequent. This makes new information highly unlikely to result in e.g. catastrophic forgetting.

Transfer Learning
The ability to build new knowledge on top of old in a way that the old knowledge facilitates learning the new. While interference/forgetting should not occur, knowledge should still be defeasible: the physical world is non-axiomatic so any knowledge could be proven incorrect in light of contradicting evidence.
Subsumed by cumulative learning because new information is integrated with old information, which may result in exposure of inconsistencies, missing data, etc., which is then dealt with as a natural part of the cumulative learning operations.

Few-Shot Learning
The ability to learn something from very few examples or very little data. Common variants include one-shot learning, where the learner only needs to be told (or experience) something once, and zero-shot learning, where the learner can infer something without needing to experience it or be told at all.
Subsumed by cumulative learning because prior knowledge is transferable to new information, meaning that (theoretically) only the delta between what has previously been learned and what is required for the new information needs to be learned.





2023©K.R.Thórisson