\\
  
====Requirements for Evaluation: Features That Evaluators Should Be Able To Control====
|  Determinism  | Both full determinism and partial stochasticity (for realism regarding, e.g., noise and stochastic events) must be supported.   |
|  Ergodicity  | The reachability of (aspects of) states from others determines the degree to which the agent can undo things and get second chances.   |
|  Continuity  | For evaluation to be relevant to e.g. robotics, it is critical to allow continuous variables, to appropriately represent continuous spatial and temporal features. The degree to which continuity is approximated (discretization granularity) should be changeable for any variable.   |
|  Asynchronicity  | Any action in the task-environment, including sensors and controls, may operate on arbitrary time scales and interact at any time, letting an agent respond when it can.   |
|  Dynamism  | A static task-environment's state only changes in response to the AI's actions. The most simplistic ones are step-lock, where the agent makes one move and the environment responds with another (e.g. board games). More complex environments can be dynamic to various degrees in terms of speed and magnitude; changes may be caused by interactions between environmental factors, or simply by the passage of time.   |
|  Observability  | Task-environments can be partially observable to varying degrees, depending on the type, range, refresh rate, and precision of available sensors, affecting the difficulty and general nature of the task-environment.   |
|  Controllability  | The control that the agent can exercise over the environment to achieve its goals can be partial or full, depending on the capability, type, range, inherent latency, and precision of available actuators.   |
|  Multiple Parallel Causal Chains  | Any generally intelligent system in a complex environment is likely to be trying to meet multiple objectives, which can be co-dependent in various ways through any number of causal chains in the task-environment. Actions, observations, and tasks may occur sequentially or in parallel (at the same time). This is needed to implement real-world clock environments.   |
|  Periodicity  | Many structures and events in nature are repetitive to some extent, and therefore contain a (learnable) periodic cycle – e.g. the day-night cycle or blocks of identical houses.   |
|  Repeatability  | Both fully deterministic and partially stochastic environments must be fully repeatable, for traceable transparency.   |
|  REF  | [[http://alumni.media.mit.edu/~kris/ftp/AGIEvaluationFlexibleFramework-ThorissonEtAl2015.pdf|Thórisson, Bieger, Schiffel & Garrett]]   |
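Several of the features above (determinism, repeatability, observability, and discretization granularity) can be illustrated with a minimal toy task-environment. The Python sketch below is purely illustrative; the class and parameter names are hypothetical and not taken from the referenced paper:

```python
import random

class ToyTaskEnvironment:
    """Minimal illustrative task-environment with evaluator-controlled knobs.

    Hypothetical example for these lecture notes; not from the referenced paper.
    """

    def __init__(self, seed=None, noise=0.0, granularity=1.0):
        # Determinism/repeatability: a fixed seed makes runs fully repeatable;
        # seed=None yields (partially) stochastic runs.
        self.rng = random.Random(seed)
        self.noise = noise              # observability: sensor imprecision
        self.granularity = granularity  # continuity: discretization step size
        self.state = 0.0

    def step(self, action):
        # Dynamism: the state drifts with time (+0.1), not only with actions.
        self.state += action + 0.1
        return self.observe()

    def observe(self):
        # Observability: the sensor adds noise, then discretizes the true state.
        noisy = self.state + self.rng.uniform(-self.noise, self.noise)
        return round(noisy / self.granularity) * self.granularity

# With the same seed, two runs are fully repeatable despite the noise:
env_a = ToyTaskEnvironment(seed=42, noise=0.5, granularity=0.5)
env_b = ToyTaskEnvironment(seed=42, noise=0.5, granularity=0.5)
assert [env_a.step(1.0) for _ in range(5)] == [env_b.step(1.0) for _ in range(5)]
```

An evaluator can then sweep these constructor parameters to produce families of task-environments that differ along one controlled dimension at a time.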
  
\\
\\
====Requirements for Evaluation: Settings That Must Be Obtainable====
  
|  Complexity  | ENVIRONMENT IS COMPLEX WITH DIVERSE INTERACTING OBJECTS   |
|  Dynamicity  | ENVIRONMENT IS DYNAMIC   |
|  Regularity  | TASK-RELEVANT REGULARITIES EXIST AT MULTIPLE TIME SCALES   |
|  Task Diversity  | TASKS CAN BE COMPLEX, DIVERSE, AND NOVEL   |
|  Interactions  | AGENT/ENVIRONMENT/TASK INTERACTIONS ARE COMPLEX AND LIMITED   |
|  Computational limitations  | AGENT COMPUTATIONAL RESOURCES ARE LIMITED   |
|  Persistence  | AGENT EXISTENCE IS LONG-TERM AND CONTINUAL   |
|  REF  | [[http://www.atlantis-press.com/php/download_paper.php?id=1900|Laird et al.]]   |
  
\\
\\
  

====Example Frameworks for Evaluating AI Systems====
|  \\ \\ Merlin  | A significant problem facing researchers in reinforcement and multi-objective learning is the lack of good benchmarks. Merlin (for Multi-objective Environments for Reinforcement LearnINg) is a software tool and method for enabling the creation of random problem instances, including multi-objective learning problems, with specific structural properties. Merlin provides the ability to control task features in predictable ways, allowing researchers to build a more detailed understanding of which features of a problem interact with a given learning algorithm, improving or degrading its performance.  |  [[http://alumni.media.mit.edu/~kris/ftp/Tunable-generic-Garrett-etal-2014.pdf|Paper]] by Garrett et al.  |
|  \\ FRaMoTEC  | A framework that allows modular construction of physical task-environments for evaluating intelligent control systems. The proto-task theory on which the framework is built aims for a deeper understanding of tasks in general, with the future goal of providing a theoretical foundation for all resource-bounded real-world tasks. Tasks constructed in the framework can be rooted in physics, allowing us to analyze, through their execution, the performance of control systems in terms of expended time and energy.  |  [[http://alumni.media.mit.edu/~kris/ftp/EGPAI_2016_paper_8.pdf|Paper]] by Thorarensen et al.  |
|  AI Gym  | Gym is a toolkit developed by OpenAI for developing and comparing reinforcement learning algorithms. It supports teaching agents everything from walking to playing games like Pong or Pinball.  |  [[https://gym.openai.com|Link]] to website.  |
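Gym's core abstraction is a reset/step loop: the agent receives an observation, chooses an action, and gets back a new observation, a reward, and a done flag. The sketch below mimics that interface in plain Python with a trivial made-up counting task (no Gym dependency; the CountToTen class is hypothetical):

```python
class CountToTen:
    """Toy environment exposing a Gym-like reset/step interface.

    Hypothetical stand-in for illustration only; it does not use the
    actual OpenAI Gym library.
    """

    GOAL = 10

    def reset(self):
        # Start a new episode and return the initial observation.
        self.count = 0
        return self.count

    def step(self, action):
        # action is +1 or -1; the episode ends when the counter reaches GOAL.
        self.count += action
        obs = self.count
        done = self.count >= self.GOAL
        reward = 1.0 if done else 0.0
        return obs, reward, done, {}  # Gym-style (obs, reward, done, info)

# The standard agent-environment loop, as used with Gym environments:
env = CountToTen()
obs = env.reset()
done, total_reward = False, 0.0
while not done:
    action = +1  # a trivial always-increment "policy"
    obs, reward, done, info = env.step(action)
    total_reward += reward
assert obs == 10 and total_reward == 1.0
```

With the classic Gym API the loop is identical, except that the environment is created with gym.make() and actions are drawn from env.action_space.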
\\
\\

====State of the Art====
|  Summary  | Practically all proposals to date for evaluating intelligence leave out some major aspects of intelligence. Virtually no proposals exist for evaluating knowledge transfer, attentional capabilities, knowledge acquisition, knowledge capacity, knowledge retention, multi-goal learning, social intelligence, creativity, reasoning, cognitive growth, and meta-learning / integrated cognitive control -- all of which are quite likely vital to achieving general intelligence on par with that of humans.  |
|  What is needed  | A theory of intelligence that allows us to construct adequate, thorough, and comprehensive tests of intelligence and intelligent behavior.  |
|  What can be done  | In lieu of such a theory (which is still not forthcoming after over 100 years of psychology and 60 years of AI), we could use a multi-dimensional "Lego" kit for exploring various means of measuring intelligence and intelligent performance, so as to be able to evaluate the pros and cons of various approaches, methods, scales, etc. \\ A kit meeting part or all of the requirements listed above would go a long way toward bridging the gap, and might generate ideas that could speed up theoretical development.  |

\\
\\
\\
\\
2018(c)K.R.Thórisson \\
//EOF//