| |
public:t_720_atai:atai-18:lecture_notes_w7 [2018/10/05 08:34] – [Reinforcement Learning] thorisson | public:t_720_atai:atai-18:lecture_notes_w7 [2024/04/29 13:33] (current) – external edit 127.0.0.1 |
---|
| What it is | The acquisition of information in order to improve performance with respect to some Goal or set of Goals. | | | What it is | The acquisition of information in order to improve performance with respect to some Goal or set of Goals. | |
| Learning from experience | A method for learning. Also called "learning by doing": An Agent <m>A</m> does action <m>a</m> to phenomenon <m>p</m> in context <m>c</m> and uses the result to improve its ability to act on Goals involving <m>p</m>. All higher-level Earth-bound intelligences learn from experience. | | | Learning from experience | A method for learning. Also called "learning by doing": An Agent <m>A</m> does action <m>a</m> to phenomenon <m>p</m> in context <m>c</m> and uses the result to improve its ability to act on Goals involving <m>p</m>. All higher-level Earth-bound intelligences learn from experience. | |
| Learning by observation | A method for learning. An Agent <m>A</m> learns how to achieve Goal <m>G</m> by receiving realtime information about some other Agent <m>A'</m> achieving Goal <m>G</m> by doing action <m>a</m>. | | | Learning by observation | A method for learning. An Agent <m>A</m> learns how to achieve Goal <m>G</m> by receiving realtime information about some other Agent <m>A'</m> achieving Goal <m>G</m> by doing action <m>a</m>. | |
| Learning from reasoning | A method for learning. Using deduction, induction and abduction to simulate, generalize, and infer, respectively, new information from acquired information. \\ Most effectively used in combination with Learning from Experience. | | | Learning from reasoning | A method for learning. Using deduction, induction and abduction to simulate, generalize, and infer, respectively, new information from acquired information. \\ Most effectively used in combination with Learning from Experience. | |
| Multi-objective learning | Learning while aiming to achieve more than one Goal. | | | Multi-objective learning | Learning while aiming to achieve more than one Goal. | |
| Input Data | Discrete variables. | | | Input Data | Discrete variables. | |
| Output Data | Discrete variables. | | | Output Data | Discrete variables. | |
| Max. # Vars. | Preferably <m>card_max(I union O) <= 8 </m> | | | Max. # I/O Vars. | <m>card_max(I union O) <= 8 </m> , preferably | |
| Min. Cycles | Depends on the number of input and output variables. | | | Min. Cycles | Depends on the number of input and output variables. | |
| Training | On-task. | | | Training | On-task. | |
| Input Data | Continuous and discrete variables. | | | Input Data | Continuous and discrete variables. | |
| Output Data | Continuous and discrete variables. | | | Output Data | Continuous and discrete variables. | |
| Max. I/O Vars. Cardinality | Very high (limited by CPU and BW). | | | Max. # I/O Vars. | Very high (limited by CPU and BW). | |
| Min. Cycles | Typical numbers 4k-10k. \\ Depends on data complexity and number of layers in the ANN. | | | Min. Cycles | Typical numbers 4k-10k. \\ Depends on data complexity and number of layers in the ANN. | |
| Training | Off-task. Learning turned off when fully trained. | | | Training | Off-task. Learning turned off when fully trained. | |
\\ | \\ |
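The "learning from experience" row above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class `ExperienceLearner`, the toy `phenomenon` function, and the context and action names are hypothetical and do not come from the lecture notes. The agent does an action in a context, observes the result, and keeps a running average of outcomes per (context, action) pair, using discrete inputs and outputs and learning on-task.

```python
from collections import defaultdict


class ExperienceLearner:
    """Hypothetical Agent A: learns by doing, one (context, action) pair at a time."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.value = defaultdict(float)  # estimated outcome for each (context, action)
        self.count = defaultdict(int)    # how often each (context, action) was tried

    def choose(self, context):
        # Try every action at least once, then prefer the best estimate so far.
        for a in self.actions:
            if self.count[(context, a)] == 0:
                return a
        return max(self.actions, key=lambda a: self.value[(context, a)])

    def learn(self, context, action, outcome):
        # Incremental running average: no past experience needs to be stored.
        key = (context, action)
        self.count[key] += 1
        self.value[key] += (outcome - self.value[key]) / self.count[key]


def phenomenon(context, action):
    # Toy phenomenon p (an assumption): one action is useful in each context.
    useful = {("dark", "switch_on"), ("bright", "wait")}
    return 1.0 if (context, action) in useful else 0.0


agent = ExperienceLearner(["switch_on", "wait"])
for _ in range(10):
    for c in ("dark", "bright"):
        a = agent.choose(c)
        agent.learn(c, a, phenomenon(c, a))
```

After the loop, `agent.choose("dark")` and `agent.choose("bright")` return the action that produced the better outcome in each context. The sketch deliberately ignores delayed effects and noisy outcomes, which a realistic learner would have to handle.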
====Cumulative Learning==== | ====Cumulative Learning==== |
| What it Is | Unifies several separate research tracks in a coherent form easily relatable to AGI requirements. | | | What it Is | Unifies several separate research tracks in a coherent form easily relatable to AGI requirements: Multitask learning, lifelong learning, transfer learning and few-shot learning. | |
| Multitask Learning | The ability to learn more than one task, either at once or in sequence | | | Multitask Learning | The ability to learn more than one task, either at once or in sequence. \\ The cumulative learner's ability to generalize, investigate, and reason will affect how well it implements this ability. \\ //Subsumed by cumulative learning because knowledge is contextualized as it is acquired, meaning that the system has a place and a time for every tiny bit of information it absorbs.// | |
| Online Learning | The ability to learn continuously, uninterrupted, and in real-time from experience as it comes, and without specifically iterating over it many times. | | | Online Learning | The ability to learn continuously, uninterrupted, and in real-time from experience as it comes, and without specifically iterating over it many times. \\ //Subsumed by cumulative learning because new information, which comes in via experience, is //integrated// with prior knowledge at the time it is acquired, so a cumulative learner is //always learning// as it's doing other things.// | |
| \\ Lifelong Learning | Means that an AI system keeps learning and integrating knowledge throughout its operational lifetime: learning is "always on". \\ Whichever way this is measured we expect at a minimum the `learning cycle' -- alternating learning and non-learning periods -- to be free from designer tampering or intervention at runtime. Provided this, the smaller those periods become (relative to the shortest perception-action cycle, for instance), to the point of being considered virtually or completely continuous, the better the "learning always on" requirement is met. | | | \\ Lifelong Learning | Means that an AI system keeps learning and integrating knowledge throughout its operational lifetime: learning is "always on". \\ Whichever way this is measured we expect at a minimum the `learning cycle' -- alternating learning and non-learning periods -- to be free from designer tampering or intervention at runtime. Provided this, the smaller those periods become (relative to the shortest perception-action cycle, for instance), to the point of being considered virtually or completely continuous, the better the "learning always on" requirement is met. \\ //Subsumed by cumulative learning because the continuous online learning is steady and ongoing all the time -- why switch it off?// | |
| Transfer Learning | The ability to build new knowledge on top of old in a way that the old knowledge facilitates learning the new. While interference/forgetting should not occur, knowledge should still be defeasible: the physical world is non-axiomatic so **//any//** knowledge could be proven incorrect in light of contradicting evidence. | | | Robust Knowledge Acquisition | The antithesis of which is brittle learning, where new knowledge results in catastrophic perturbations of prior knowledge (and behavior). \\ //Subsumed by cumulative learning because new information is //integrated// continuously online, which means the increments are frequent and small, and inconsistencies in the prior knowledge get exposed in the process and opportunities for fixing small inconsistencies are also frequent because the learning is life-long; which means new information is highly unlikely to result in e.g. catastrophic forgetting.// | |
| Few-Shot Learning | The ability to learn something from very few examples or very little data. Common variants include one-shot learning, where the learner only needs to be told (or experience) something once, and zero-shot learning, where the learner has already inferred it without needing to experience or be told. | | | Transfer Learning | The ability to build new knowledge on top of old in a way that the old knowledge facilitates learning the new. While interference/forgetting should not occur, knowledge should still be defeasible: the physical world is non-axiomatic so **//any//** knowledge could be proven incorrect in light of contradicting evidence. \\ //Subsumed by cumulative learning because new information is //integrated// with old information, which may result in exposure of inconsistencies, missing data, etc., which is then dealt with as a natural part of the cumulative learning operations.// | |
| | Few-Shot Learning | The ability to learn something from very few examples or very little data. Common variants include one-shot learning, where the learner only needs to be told (or experience) something once, and zero-shot learning, where the learner has already inferred it without needing to experience or be told. \\ //Subsumed by cumulative learning because prior knowledge is transferrable to new information, meaning that (theoretically) only the delta between what has been priorly learned and what is required for the new information needs to be learned.// | |
| |
| |
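The online learning row above (learning from experience as it comes, without iterating over it many times) can be sketched as a single-pass update rule. This is only an illustration under stated assumptions: the class `OnlineLinearLearner`, the learning rate, and the toy data stream are hypothetical, and a least-mean-squares update stands in for whatever mechanism a real cumulative learner would use.

```python
class OnlineLinearLearner:
    """Hypothetical single-pass learner for the model y = w * x."""

    def __init__(self, learning_rate=0.1):
        self.w = 0.0
        self.learning_rate = learning_rate

    def observe(self, x, y):
        # Integrate one observation into prior knowledge, then discard it.
        error = y - self.w * x
        self.w += self.learning_rate * error * x
        return error


def experience_stream(n):
    # Toy stream (an assumption): the underlying relation is y = 2 * x.
    for i in range(n):
        x = 1.0 if i % 2 == 0 else 2.0
        yield x, 2.0 * x


learner = OnlineLinearLearner()
for x, y in experience_stream(40):
    learner.observe(x, y)  # each sample is seen exactly once
```

No dataset is stored and no epoch loop exists: the estimate `learner.w` moves toward 2.0 as the stream is consumed, and learning is never switched off.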
| \\ The Learner | Intelligent systems continually receive inputs/observations from their environment and send outputs/actions back. Some of the system’s inputs may be treated specially — e.g. as feedback or a reward signal, possibly provided by a teacher. Since intelligent action can only be called "intelligent" if it is trying to achieve something - against which the level of intelligence can be evaluated - we model intelligent agents as imperfect optimizers of some (possibly unknown) real-valued objective function. | | | \\ The Learner | Intelligent systems continually receive inputs/observations from their environment and send outputs/actions back. Some of the system’s inputs may be treated specially — e.g. as feedback or a reward signal, possibly provided by a teacher. Since intelligent action can only be called "intelligent" if it is trying to achieve something - against which the level of intelligence can be evaluated - we model intelligent agents as imperfect optimizers of some (possibly unknown) real-valued objective function. | |
| \\ Tasks | Learning systems adjust their knowledge as a result of interactions with a task-environment. Defined by (possibly a variety of) objective functions, as well as (possibly) instructions (i.e. knowledge provided at the start of the task, e.g. as a "seed", or continuously or intermittently throughout its duration). Since tasks can only be defined w.r.t. some environment, we often refer to the combination of a task and its environment as a single unit: the task-environment. | | | \\ Tasks | Learning systems adjust their knowledge as a result of interactions with a task-environment. Defined by (possibly a variety of) objective functions, as well as (possibly) instructions (i.e. knowledge provided at the start of the task, e.g. as a "seed", or continuously or intermittently throughout its duration). Since tasks can only be defined w.r.t. some environment, we often refer to the combination of a task and its environment as a single unit: the task-environment. | |
| \\ Teacher | The goal of the teacher is to influence the learner’s task-environments in such a way that progress towards the is facilitated. The teacher’s teaching task is to change the learner’s knowledge in some way (e.g. to make the learner understand something, or increase the learner’s skill on some metric). | | | \\ Teacher | The goal of the teacher is to influence the learner’s task-environments in such a way that progress towards the learning goal is facilitated. The teacher’s teaching task is to change the learner’s knowledge in some way (e.g. to make the learner understand something, or increase the learner’s skill on some metric). | |
| Environment & Task | The learner and the teacher each interact with their own view of the world (i.e. their own “environments”) which are typically different, but overlapping to some degree. | | | Environment & Task | The learner and the teacher each interact with their own view of the world (i.e. their own “environments”) which are typically different, but overlapping to some degree. | |
| \\ Training | Viewed from a teacher’s and intentional learner’s point of view, “training” means the actions taken (repeatedly) over time with the goal of becoming better at some task, by avoiding learning erroneous skills/things and avoiding forgetting or unlearning desirable skills/things. | | | \\ Training | Viewed from a teacher’s and intentional learner’s point of view, “training” means the actions taken (repeatedly) over time with the goal of becoming better at some task, by avoiding learning erroneous skills/things and avoiding forgetting or unlearning desirable skills/things. | |
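The description of the learner as an imperfect optimizer of a (possibly unknown) real-valued objective function can be sketched as follows. The names `objective` and `run_learner`, the step size, and the quadratic objective are assumptions chosen for illustration only. The learner never inspects the objective's form; it only receives a feedback value for each action it tries, and its random proposals make it an imperfect optimizer.

```python
import random


def objective(x):
    # Hypothetical objective of the task-environment; its form is hidden from the learner.
    return -(x - 3.0) ** 2


def run_learner(steps, seed=0):
    """Imperfect optimizer: random local proposals, keep only what scores better."""
    rng = random.Random(seed)
    x = 0.0
    best = objective(x)
    for _ in range(steps):
        candidate = x + rng.uniform(-0.5, 0.5)
        score = objective(candidate)  # feedback signal from the task-environment
        if score > best:
            x, best = candidate, score
    return x, best


x_final, score_final = run_learner(200)
```

Because a proposal is kept only when it scores better, `score_final` can never be worse than the starting score `objective(0.0)`. In this picture a teacher would act by changing the task-environment, for example by replacing `objective` with a sequence of easier intermediate objectives, not by editing the learner's state directly.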