Lecture Notes: Learning & Teaching

Learning: Means (The How)

What it is The acquisition of information in order to improve performance with respect to some Goal or set of Goals.
Learning from experience A method for learning. Also called “learning by doing”: An Agent A does action a to phenomenon p in context c and uses the result to improve its ability to act on Goals involving p. All higher-level Earth-bound intelligences learn from experience.
Learning by observation A method for learning. An Agent A learns how to achieve Goal G by receiving realtime information about some other Agent A' achieving Goal G by doing action a.
Learning from reasoning A method for learning. Using deduction, induction and abduction to simulate, generalize, and infer, respectively, new information from acquired information.
Most effectively used in combination with Learning from Experience.
Multi-objective learning Learning while aiming to achieve more than one Goal.
Transfer learning A method for learning faster. Applying already-acquired knowledge to a new or newish Problem.
System-wide ampliative learning What we could call a combination of all of the above.

Learning: Target Categories (The What)

What is Being Learned Categories:
- Tool (body)
- Task-environment (the task at hand)
- Domain-bound strategies
- Domain-independent learning
- Domain-independent learning strategies (“cognitive development”)
Each one subsumes the ones above.
Tool (body) A controller needs to be embodied to affect the world; learning what the body does, irrespective of the task-environment, domain, or other issues.
Task-environment The proverbial “task” that an agent has been assigned in a particular environment.
Domain-bound strategies Strategies related to specific issues in the task-domain; learning them may temporarily slow down learning of the task-environment.
Domain-independent learning Refers to the concept of “learning to learn” - learning that is transferable between domains.
Domain-independent learning strategies (“cognitive development”) Strategies that are so profound as to affect a learner's ability to learn particular domains. Otherwise known as “cognitive development”.

Learning: Data & Representation

Data Example Learning Method
Stimulus-Response Reinforcement learning, association.
Time-Series Linked-list reinforcement learning, Markov chains.
2D Data ANNs of some sort.
3D Structures Models of some sort.
Complex Systems Models of some sort.

Key Learning Terms

What it is Learning is a process.
Key Features Inherits key features of any process:
- Purpose: To adapt, to respond in rational ways to problems / to achieve foreseen goals; this factor determines how the rest of the features in this list are measured
- Speed: The speed of learning
- Data: The data that the learning (and particular measured speed of learning) requires
- Quality: How well something is learned
- Retention: The robustness of what has been learned - how well it stays intact over time
- Transfer: How general the learning is, how broadly what is learned can be employed for the purposes of adaptation or achievement of goals
- Meta-Learning: A learner may improve its learning abilities - i.e. capable of meta-learning.
- Progress Signal(s): A learner needs to know how its learning is going, and if there is improvement, how much.
Measurements To know any of the above some parameters have to be measured: All of the above factors can be measured in many ways.
Major Caveat Since learning interacts with (is affected by) the task-environment and world that the learning takes place in, as well as the nature of these in the learner's subsequent deployment, none of the above features can be assessed by looking only at the learner.
This is further addressed below in the Pedagogical Pentagon.

Features of Learning: Measurement Methods

Purpose For closed problems (problems with clear goals), and for which the task-environment is known, the purpose of learning can often be readily specified.
In human education the purpose often gets conflated with the ways we test the learning (e.g. a child's learning of the alphabet is measured by its ability to recite the alphabet, rather than its ability to use it to search a dictionary efficiently).
In school the task-environment part of this equation often gets ignored.
Speed Can be measured during a learning session by measuring state of knowledge at time t_1 and again at time t_2.
We should expect any (general) learner to exhibit various learning speeds at various times for various topics.
For human education, not many methods have been developed to measure directly an individual's learning; this is mostly done implicitly (and very approximately!) by grouping learners by age.
Data There are many ways to classify data; too numerous to recount here. Suffice it to say that every problem and task brings with it a unique mixture of data; a general learner should be able to handle a broad spectrum of these.
For present ML systems we can e.g. say that DNNs can handle big continuous data, reinforcement learning is good for discrete-valued small data, but no good methods exist for diverse datasets with a mixture of symbolic, continuous, big- and small- data.
Quality Quality of learning has at least two dimensions, reliability and applicability.
Reliability refers to the learned behavior/material's consistent performance under repeated application; applicability refers to its correct application in relevant circumstances.
Retention A battery of tests administered at time t_1 and then again at times t_2, t_3, and t_4, would give an indication of a function describing the learner's retention of acquired material/skills/concepts.
Transfer How well something that has been verified to have been learned at time t_1 can be used by the learner subsequently (whether in identical situations, similar situations, or completely different situations).
Partly a function of Retention (Transfer would be zero if Retention is zero), as well as Quality and Reliability.
In human education, because the task-environment often gets ignored, transfer learning is seldom if ever evaluated.
Meta-Learning The rate at which Meta-Learning happens could possibly be measured by the frequency with which the learning changes its nature, either in terms of new things, concepts or phenomena that the learner can now handle (as a class) as opposed to before, or in terms of order-of-magnitude changes (measured somehow) in knowledge acquisition on the other dimensions above (Speed, Data, Quality, Retention, and Transfer).
Progress Signal(s) For artificial learners these are known and thus do not have to be measured. For natural learners the main method for assessing what kinds of progress signals are allowed and/or needed, or which ones work best, is experimentation (e.g. food pellets for dogs); some methods are already well explored and documented for certain species of animals (e.g. classical conditioning and operant conditioning).
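
The Speed and Retention measurements above can be sketched in a few lines of Python. This is only an illustration under assumptions not found in the notes: that the state of knowledge can be summarized as a single test score, and that retention is expressed relative to the first test. The function names and the test scores are hypothetical.

```python
def learning_speed(score_t1, score_t2, t1, t2):
    """Average change in measured knowledge per unit time between t1 and t2."""
    if t2 <= t1:
        raise ValueError("t2 must be later than t1")
    return (score_t2 - score_t1) / (t2 - t1)


def retention_curve(test_scores):
    """Express each test score as a fraction of the score on the first test."""
    baseline = test_scores[0]
    if baseline == 0:
        raise ValueError("baseline score must be non-zero")
    return [score / baseline for score in test_scores]


# Hypothetical data: test items answered correctly (out of 100).
print(learning_speed(score_t1=25, score_t2=75, t1=0, t2=10))  # 5.0 items per time unit
print(retention_curve([80, 60, 40, 20]))  # [1.0, 0.75, 0.5, 0.25]
```

As the Major Caveat notes, such numbers only mean something relative to the task-environment in which the tests were administered.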

Terms Used for Describing Learning Styles

Reflex A reflex behavior is controlled by an (architecturally) fixed circuit (“arch”) that is not shaped by experience.
This defines the lower bound of natural learners' behavior: It is Not Learnable.
Reinforcement Learning Learning proceeds through discrete steps, where each step is an A-R pair, A being an action and R being a reward.

Observational Learning
Many animals have been observed to learn by observation (no pun intended). In some cases by observing members of the same species (called “conspecifics”) doing that particular thing, in other cases by watching events unfold.
Observational Learning
Aka Structural-. Learning is restricted to the morphology (structure) of the movements and/or actions being observed. May be sufficient when learning to dance, but certainly less useful when learning to conduct an orchestra.
Goal-Level Observational
Learning includes learning the purpose for which the observed actions are performed.
Life-Long Learning Colloquially: Learning throughout one's lifetime.
In AI: A particular focus of learning research targeting how systems can change their learning over long periods of time. “Duration” doesn't refer to a particular number of hours or years but rather indicates the expectations on the system being engineered that it learn over long periods of time, “long” relative to prior such machine learners, and “long” relative to the system's operational lifetime.
Online Learning Aka “continuous”-. Learning while doing (same or other) things.
Multi-task Learning Aka “multi-goal”-. The same system learning many tasks/things without forgetting what was learned before.
Transfer Learning The ability to benefit from something already learned when learning something new.
Single-Shot Learning Closely related to “few-shot” learning. The ability to learn something new from one example.
Cumulative Learning New things learned are integrated with things learned prior. The two are fused so as to create a more coherent, more easily-verifiable knowledge set.

Learning Paradigms

Learning From Input/Output Pairs
“Supervised Learning”
The ability to learn a mapping from inputs to outputs based on examples of input-output pairs. This requires having a way to perceive what the output to a particular input should have been.
No Feedback
“Unsupervised Learning”
The ability to learn patterns in the input even though no external feedback is given. Examples include clustering, anomaly detection and dimensionality reduction.
Learning from Rewards
“Reinforcement Learning”
The ability to learn from a series of (positive and negative) rewards. This is usually used to learn how to behave in multi-step control problems. It requires machinery to treat certain perceptions (i.e. the rewards) as “special” and something to be optimized for.

Learning from Teaching
Can be done in a wide variety of ways, each of which might impose its own requirements on the AI architecture. For instance, imitation learning – the ability to learn behaviors by observing another agent carry them out – requires a deep understanding of the perceived actions to be imitated, meaning the system must not only be able to observe those actions, but also recognize those actions, map them to its own perspective and body, and possibly infer their intent.

Reinforcement Learning

Input Data Discrete variables.
Output Data Discrete variables.
Max. # I/O Vars. $\mathrm{card}_{\max}(I \cup O) \le 8$, preferably
Min. Cycles Depends on the number of input and output variables.
Training On-task.
Training Style Turn-based; discrete time steps.
Training Signal Explicit, discrete, every turn (after each action).
Hyper-Parameters - Learning Rate
- Exploration vs. exploitation.

Max. Vars. limits what can be learned.
Discretization of I/O vars limits the types of vars. that can be handled.
Cannot handle n-modal distributions or conditional out-of-phase vars. unless known BILL (before it leaves the lab).
Task-environments must allow turn-based learning.
I/O vars., discretization, must be BILL.
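
To make the properties listed above concrete, here is a minimal sketch of a turn-based reinforcement learner with discrete inputs and outputs. The notes do not name a specific algorithm; tabular Q-learning is assumed here as a common representative, and the five-cell corridor environment, the reward of 1 at the goal, and all parameter values are hypothetical.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal
LEFT, RIGHT = 0, 1
ACTIONS = (LEFT, RIGHT)


def step(state, action):
    """One turn of the environment: returns (next_state, reward, done)."""
    if action == RIGHT:
        next_state = state + 1
    else:
        next_state = max(0, state - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def greedy_action(q, state, rng):
    """Pick the action with the highest value; break ties at random."""
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=50, learning_rate=0.5, discount=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:            # exploration
                action = rng.choice(ACTIONS)
            else:                                 # exploitation
                action = greedy_action(q, state, rng)
            next_state, reward, done = step(state, action)
            # Explicit training signal after every action.
            target = reward if done else reward + discount * max(q[next_state])
            q[state][action] += learning_rate * (target - q[state][action])
            state = next_state
    return q


q_table = train()
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)  # [1, 1, 1, 1]: move right in every non-goal cell
```

Note how the table size grows with the number of discrete states and actions, and how the discretization itself (five cells, two actions) had to be fixed by the designer before training.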

Artificial Neural Nets (ANNs)

Input Data Continuous and discrete variables.
Output Data Continuous and discrete variables.
Max. # I/O Vars. Very high (limited by CPU and BW).
Min. Cycles Typical numbers 4k-10k.
Depends on data complexity and number of layers in the ANN.
Training Off-task. Learning turned off when fully trained.
Training Style Training phase BILL; discrete training steps.

Training Signal
Supervised learning: Explicit “error signal propagation” after every turn, generated from pre-categorized examples and outcomes.
Unsupervised: explicit “error signal propagation” after every turn, auto-generated.
Hyper-Parameters - Learning Rate and many others
Strengths Handles complex data sets.
Scalability Unpredictable behavior under data drift AILL (after it leaves the lab).
Must be trained BILL (unpredictable learning AILL).
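
The training regime described above (off-task training from pre-categorized examples, an explicit error signal after every example, learning switched off once trained) can be sketched with the smallest possible stand-in for an ANN: a single threshold unit trained with the perceptron rule. This is an assumption made for brevity; real ANNs use many units and backpropagation. The AND dataset and all names are hypothetical.

```python
# Pre-categorized training examples for logical AND (hypothetical toy data).
EXAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]


def predict(weights, bias, inputs):
    """Forward pass of a single threshold unit."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0


def train(examples, learning_rate=1.0, max_epochs=100):
    """Training phase (BILL): adjust the weights after every example."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        errors = 0
        for inputs, target in examples:
            error = target - predict(weights, bias, inputs)  # explicit error signal
            if error != 0:
                errors += 1
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, inputs)]
                bias += learning_rate * error
        if errors == 0:  # fully trained: stop learning
            break
    return weights, bias


# After training the parameters are frozen; deployment only calls predict().
WEIGHTS, BIAS = train(EXAMPLES)
print([predict(WEIGHTS, BIAS, inputs) for inputs, _ in EXAMPLES])  # [0, 0, 0, 1]
```

Because nothing updates the weights after `train` returns, the unit cannot adapt if the data it meets in deployment drifts away from the training examples.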

Experience-Based Learning

What It Is Learning is the acquisition of knowledge for particular purposes. When this acquisition happens via interaction with an environment it is experience-based.
Why It Is Important Any environment which cannot be fully known a-priori requires experimentation of some sort, in the form of interaction with the world. This is what we call experience.
The Real World The physical world we live in, often referred to as the “real world”, is highly complex, and rarely if ever do we have perfect models of how it behaves when we interact with it, whether it is to experiment with how it works or simply achieve some goal like buying bread.
Limited Time & Resources An important limitation on any agent's ability to model the real world is its enormous state space, which vastly exceeds any known agent's memory capacity, even for relatively simple environments. Even if the models were sufficiently detailed, pre-computing everything beforehand is ruled out by limited memory. On top of that, even if memory sufficed for pre-computing everything necessary to go about our tasks, we would have to retrieve the pre-computed data in time for when it is needed; the larger the state space, the greater the demands this puts on retrieval times.


Probability

What It Is Probability is the measure of the likelihood that an event will occur REF.
Why It Is Important Probability enters into our knowledge of anything for which the knowledge is incomplete; that is, everything that humans do every day in every real-world environment. With incomplete knowledge it is in principle impossible to know what may happen. However, if we have very good models for some limited phenomenon, we can expect our prediction of what may happen to be pretty good. This is especially true for knowledge acquired through the scientific method, in which empirical evidence and human reason are systematically brought to bear on the validity of the models.
How To Do It The most common method is Bayesian networks, which encode a concept of probability in which probability is interpreted as reasonable expectation representing a state of knowledge, or as quantification of a personal belief REF. This makes them ideal for representing an (intelligent) agent's knowledge of some environment, task or phenomenon.
How It Works $P(a \mid b) = \frac{P(b \mid a)\,P(a)}{P(b)}$
Judea Pearl Most Fervent Advocate of Bayesian Networks in AI REF.
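
As a small worked illustration of the rule under "How It Works", the sketch below computes P(a|b) from a prior and two likelihoods, expanding P(b) with the law of total probability. The function name and the numbers are hypothetical and chosen only for illustration.

```python
def posterior(prior_a, likelihood_b_given_a, likelihood_b_given_not_a):
    """Return P(a|b) via Bayes' rule, with P(b) obtained by total probability."""
    evidence_b = (likelihood_b_given_a * prior_a
                  + likelihood_b_given_not_a * (1.0 - prior_a))
    return likelihood_b_given_a * prior_a / evidence_b


# Hypothetical numbers: a is rare (1%), b is observed 90% of the time
# when a holds and 5% of the time when it does not.
print(round(posterior(0.01, 0.9, 0.05), 3))  # 0.154
```

Even with strong evidence b, the low prior keeps the updated belief in a modest, which is the kind of state-of-knowledge bookkeeping an agent needs when its knowledge is incomplete.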


Causation

What It Is A causal variable can (informally) be defined as a variable whose relationship with another variable is such that when changed it will change the other variable.
Example: A light switch is designed specifically to cause the light to turn on and off.
In a causal analysis based on abduction one may reason that, given that light switches don't tend to flip randomly, a light that was off but is now on may indicate that someone or something flipped the light switch. (The inverse - a light that was on but is now off - has a larger set of reasonable causes: in addition to someone turning it off, a power outage or bulb burnout.)
Why It Is Important Causation is the foundation of empirical science. Without knowledge about causal relations it is impossible to get anything done.
History David Hume (1711-1776) is one of the most influential philosophers addressing the topic. From the Encyclopedia of Philosophy: ”…advocate[s] … that there are no innate ideas and that all knowledge comes from experience, Hume is known for applying this standard rigorously to causation and necessity.” REF
This makes Hume an empiricist.
More Recent History Causation has been cast by the wayside in statistics for the past 120 years, with statisticians saying instead that all we can claim about the relationship between any variables is that they correlate. Needless to say this has led to significant confusion as to what science can and cannot say about causal relationships, such as whether mobile phones cause cancer. Equally bad, the statistical stance has led some scientific fields to view causation as “unscientific”.
State Of The Art Recent work by Judea Pearl demonstrates clearly the fallaciousness of the statistical stance, and fixes some important gaps in our knowledge on this subject which hopefully will rectify the situation in the coming years. YouTube lecture by J. Pearl on causation.
Correlation Supports Prediction Correlation is sufficient for prediction: if A and B correlate highly, then it does not matter whether we see an A or a B; we can predict that the other is likely on the scene.
Causation Supports Action We may know that A and B correlate, but if we want B to disappear we don't know whether to do that by modifying A or B, because we don't know if B is a result of A or vice versa.
Example: The position of the light switch and the state of the lightbulb correlate. Only by knowing that the light switch controls the bulb can we go directly to the switch if we want the light to turn on.
Causal Models Are Necessary To Guide Action While correlation gives us an indication of causation, the direction of the “causal arrow” is critically necessary for guiding action.
Luckily, which way the arrows point in any large set of correlated variables is usually not too hard to find out through empirical experimentation.
Causation & Correlation What is the relation between causation and correlation?
There is no (non-spurious) correlation without causation.
There is no causation without correlation.
However, correlation between two variables does not necessitate one of them to be the cause of the other: They can have a shared (possibly hidden) common cause.
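
The light-switch example can be turned into a tiny simulation that contrasts observing with intervening. This is a sketch under stated assumptions: a hypothetical two-variable world in which the switch causes the bulb state, with the `do_switch` and `do_bulb` parameters standing in for an experimenter forcing a variable to a value.

```python
import random


def sample_world(rng, do_switch=None, do_bulb=None):
    """One sample from a hypothetical causal model: switch -> bulb."""
    switch = (rng.random() < 0.5) if do_switch is None else do_switch
    bulb = switch if do_bulb is None else do_bulb  # bulb follows switch unless forced
    return switch, bulb


rng = random.Random(0)

# Passive observation: switch and bulb always agree (perfect correlation).
observed = [sample_world(rng) for _ in range(1000)]
agreement = sum(s == b for s, b in observed) / len(observed)

# Intervening on the switch changes the bulb ...
forced_switch = [sample_world(rng, do_switch=True) for _ in range(1000)]
bulb_on_rate = sum(b for _, b in forced_switch) / len(forced_switch)

# ... but forcing the bulb on leaves the switch at its usual roughly 50/50 split.
forced_bulb = [sample_world(rng, do_bulb=True) for _ in range(1000)]
switch_on_rate = sum(s for s, _ in forced_bulb) / len(forced_bulb)

print(agreement, bulb_on_rate, switch_on_rate)
```

The observational data alone cannot distinguish the two directions; only the interventions reveal which variable to act on, which is the empirical experimentation referred to above.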

Cumulative Learning

What it Is Unifies several separate research tracks in a coherent form easily relatable to AGI requirements: Multitask learning, lifelong learning, transfer learning and few-shot learning.
Multitask Learning The ability to learn more than one task, either at once or in sequence.
The cumulative learner's ability to generalize, investigate, and reason will affect how well it implements this ability.
Subsumed by cumulative learning because knowledge is contextualized as it is acquired, meaning that the system has a place and a time for every tiny bit of information it absorbs.
Online Learning The ability to learn continuously, uninterrupted, and in real-time from experience as it comes, and without specifically iterating over it many times.
Subsumed by cumulative learning because new information, which comes in via experience, is integrated with prior knowledge at the time it is acquired, so a cumulative learner is always learning as it's doing other things.

Lifelong Learning
Means that an AI system keeps learning and integrating knowledge throughout its operational lifetime: learning is “always on”.
Whichever way this is measured we expect at a minimum the “learning cycle” – alternating learning and non-learning periods – to be free from designer tampering or intervention at runtime. Provided this, the smaller those periods become (relative to the shortest perception-action cycle, for instance), to the point of being considered virtually or completely continuous, the better the “learning always on” requirement is met.
Subsumed by cumulative learning because the continuous online learning is steady and ongoing all the time – why switch it off?
Robust Knowledge Acquisition The antithesis of which is brittle learning, where new knowledge results in catastrophic perturbations of prior knowledge (and behavior).
Subsumed by cumulative learning because new information is integrated continuously online, so the increments are frequent and small. Inconsistencies in prior knowledge get exposed in the process, and because the learning is life-long, opportunities for fixing small inconsistencies are also frequent. New information is therefore highly unlikely to result in e.g. catastrophic forgetting.
Transfer Learning The ability to build new knowledge on top of old in a way that the old knowledge facilitates learning the new. While interference/forgetting should not occur, knowledge should still be defeasible: the physical world is non-axiomatic so any knowledge could be proven incorrect in light of contradicting evidence.
Subsumed by cumulative learning because new information is integrated with old information, which may result in exposure of inconsistencies, missing data, etc., which is then dealt with as a natural part of the cumulative learning operations.
Few-Shot Learning The ability to learn something from very few examples or very little data. Common variants include one-shot learning, where the learner only needs to be told (or experience) something once, and zero-shot learning, where the learner has already inferred it without needing to experience or be told.
Subsumed by cumulative learning because prior knowledge is transferable to new information, meaning that (theoretically) only the delta between what has previously been learned and what is required for the new information needs to be learned.

Theories of "Learning Styles"

VARK Theory that (human) learners can be divided according to their “learning styles”:
- Visual
- Auditory
- Reading
- Kinesthetic
Original idea based on skimpy observational evidence in the early 90s source

Gardner's Multiple Intelligence Model
- Linguistic intelligence (“word smart”)
- Logical-mathematical intelligence (“number/reasoning smart”)
- Spatial intelligence (“picture smart”)
- Bodily-Kinesthetic intelligence (“body smart”)
- Musical intelligence (“music smart”)
- Interpersonal intelligence (“people smart”)
- Intrapersonal intelligence (“self smart”)
- Naturalist intelligence (“nature smart”)
The Good Emphasizes that there are multiple ways of learning.
The Bad Very human-centric, not very applicable to AI.
The Ugly Not rooted in a well-grounded theory of learning that references key features of learning like memorization, understanding, knowledge transfer, retention, learning speed.
Bottom Line Many people view learning styles theories as broadly accurate, but, in fact, scientific support for these theories is severely lacking.

The Pedagogical Pentagon: A Framework for (Artificial) Pedagogy

What is Needed
We need a universal theory of learning – and in fact of teaching, training, task-environments, and evaluation.
Anything short of having complete theories for all of these means that experimentation, exploration, and blind search are the only ways to answer questions about performance, curriculum design, training requirements, etc., from which we can never get more than partial, limited answers.
The Pedagogical Pentagon captures the five main pillars of any learning/teaching situation. The relationships between its contents can be seen from various perspectives: (a) As information flow between processes. (b) As relations between systems. (c) As dependencies between (largely missing!) theories. REF

The Learner
Intelligent systems continually receive inputs/observations from their environment and send outputs/actions back. Some of the system’s inputs may be treated specially — e.g. as feedback or a reward signal, possibly provided by a teacher. Since intelligent action can only be called “intelligent” if it is trying to achieve something - against which the level of intelligence can be evaluated - we model intelligent agents as imperfect optimizers of some (possibly unknown) real-valued objective function.

Learning systems adjust their knowledge as a result of interactions with a task-environment. Tasks are defined by (possibly a variety of) objective functions, as well as (possibly) instructions (i.e. knowledge provided at the start of the task, e.g. as a “seed”, or continuously or intermittently throughout its duration). Since tasks can only be defined w.r.t. some environment, we often refer to the combination of a task and its environment as a single unit: the task-environment.

The goal of the teacher is to influence the learner’s task-environments in such a way that progress towards the learning goal is facilitated. The teacher’s teaching task is to change the learner’s knowledge in some way (e.g. to make the learner understand something, or increase the learner’s skill on some metric).
Environment & Task The learner and the teacher each interact with their own view of the world (i.e. their own “environments”) which are typically different, but overlapping to some degree.

Viewed from a teacher’s and intentional learner’s point of view, “training” means the actions taken (repeatedly) over time with the goal of becoming better at some task, by avoiding learning erroneous skills/things and avoiding forgetting or unlearning desirable skills/things.

Testing - or evaluation - is meant to obtain information about the structural, epistemic and emergent properties of learners, as they progress on a learning task. Testing can be done for different purposes: e.g. to ensure that a learner has good-enough performance on a range of tasks, to identify strengths and weaknesses for an AI designer to improve or an adversary to exploit, or to ensure that a learner has understood a certain concept so that we can trust it will use it correctly in the future.
Source The Pedagogical Pentagon: A Conceptual Framework for Artificial Pedagogy by Bieger et al.

Artificial Pedagogy

What it is
The science about how to teach artificial learners.
- Focus on teaching rather than learning.
- Aimed at the full spectrum of learning systems.
- Emphasis towards AGI-aspiring systems.

Key Question
Given information about a learner, teaching goal and constraints
- What is the best way to teach?
- What teaching methods are there, and when are they applicable?
- How can we evaluate the learner and the teacher?
State of the Art No good scientific theory of teaching exists.

Why Artificial Pedagogy?
- Current machine teaching is ad hoc.
- Sophisticated teaching needed in complex domains.
- Sufficiently advanced learners now exist.
- Relevance will increase as AI field advances.

– minimal seed knowledge required
– precise

– natural
– adaptive
– on-the-fly
– can’t program everything

Teaching Methods
- Heuristic Rewarding
- Decomposition
- Simplification
- Situation Selection
- Teleoperation
- Demonstration
- Coaching
- Explanation
- Cooperation
- Socratic method

Artificial Pedagogy Tutoring Methods

Heuristic Rewards Giving the learner intermediate feedback about performance
Related: Reward shaping, Gamification, Heuristics in e.g. minimax game playing
RL example: Different reward for positive/negative step

Decomposition Breaking whole, complex tasks into smaller components
Related: Whole-task vs. part-task training, Curriculum learning, (Catastrophic interference), (Transfer learning), (Multitask learning).
RL example: Sliding puzzle at goal location on grid.
Situation Selection Selecting situations (or data) for the learner to focus on, e.g. simpler or more difficult situations.
Related: Boosting, ML application development, big data, active learning.
RL Example: Start (or stop) in problematic states.

Teleoperation Temporarily taking control of the learner’s actions so they can experience them.
Applications: Tennis/golf/chess, Robot ping-pong, artificial tutor.
RL Example: Force good or random moves.

Demonstration Showing the learner how to accomplish a task.
Requirements: Desire to imitate, ability to map tutor's actions onto own actions, generalization ability.
Related: Apprenticeship learning, inverse reinforcement learning, imitation learning.
RL Example: Nonexistent.

Coaching Giving the learner instructions of what action to take during the task.
Requirements: Ability to map language-based instruction onto actions, generalization ability.
Related: Supervised learning.
RL Example: Add input that specifies correct output.

Explanation Explaining to the learner how to approach certain situations before it starts (a new instance of) the task.
Requirements: Language capability, generalization ability.
Related: Imperative programming, analogies.
RL Example: Nonexistent.

Cooperation Doing a task together with the learner to facilitate other tutoring techniques.
Socratic Method Asking questions to encourage critical thinking and guide the learner towards its own conclusions.
Related: Shaping, chaining.
RL Example: Nonexistent.
NARS Example:
> <dog --> mammal>.
> <<$x --> mammal> --> <$x --> [breaths]>>.
> <{Spike} --> dog>.
> <{Spike} --> [breaths]>? (main question)
> <{Spike} --> mammal>? (helping question)



/var/www/ailab/WWW/wiki/data/pages/public/t-720-atai/atai-19/lecture_notes_w5.txt · Last modified: 2020/03/25 18:29 by thorisson