|  What it is   | A proposal for evaluating the intelligence of an agent.  |
|  Why it's relevant  | One of several novel methods proposed for this purpose; focuses on variety, novelty and exploration.  |
|  Method   | A robot is given a box of previously unseen toys. The toys vary in shape, appearance and construction materials. Some toys may be entirely unique, some may be identical, and yet others may share certain characteristics (such as shape or construction materials). The robot has an opportunity to first play and experiment with the toys, but is subsequently tested on its knowledge of them: It must predict the outcomes of new interactions with the toys, and the likely behavior of previously unseen toys made from similar materials or of similar shape or appearance. Furthermore, should the toy box be emptied onto the floor, it must also be able to generate an appropriate sequence of actions to return the toys to the box without damaging any toys (or itself).  |
|  Pros  | Includes perception and action explicitly. Specifically designed as a stepping stone towards general intelligence; a solution to the simplest instances should not require universal or human-like intelligence.    |
|  Cons  | Limited to a single instance in time. Somewhat too limited to dexterity guided by vision, missing out on reasoning, creativity, and many other factors.    |
\\
  
====Requirements for Evaluation: Features That Evaluators Should Be Able To Control====
|  Determinism  | Both full determinism and partial stochasticity (for realism regarding e.g. noise, stochastic events, etc.) must be supported.   |
|  Ergodicity  | The reachability of (aspects of) states from others determines the degree to which the agent can undo things and get second chances.  |
|  Continuity  | For evaluation to be relevant to e.g. robotics, it is critical to allow continuous variables, to appropriately represent continuous spatial and temporal features. The degree to which continuity is approximated (discretization granularity) should be changeable for any variable.  |
|  Asynchronicity  | Any action in the task-environment, including sensors and controls, may operate on arbitrary time scales and interact at any time, letting an agent respond when it can.  |
|  Dynamism  | A static task-environment’s state only changes in response to the AI’s actions. The most simplistic ones are step-lock, where the agent makes one move and the environment responds with another (e.g. board games). More complex environments can be dynamic to various degrees in terms of speed and magnitude; the dynamics may be caused by interactions between environmental factors, or simply by the passage of time.  |
|  Observability  | Task-environments can be partially observable to varying degrees, depending on the type, range, refresh rate, and precision of available sensors, affecting the difficulty and general nature of the task-environment.  |
|  Controllability  | The control that the agent can exercise over the environment to achieve its goals can be partial or full, depending on the capability, type, range, inherent latency, and precision of available actuators.  |
|  Multiple Parallel Causal Chains  | Any generally intelligent system in a complex environment is likely to be trying to meet multiple objectives, which can be co-dependent in various ways through any number of causal chains in the task-environment. Actions, observations, and tasks may occur sequentially or in parallel (at the same time). Needed to implement real-world clock environments.  |
|  Periodicity  | Many structures and events in nature are repetitive to some extent, and therefore contain a (learnable) periodic cycle – e.g. the day-night cycle or blocks of identical houses.   |
|  Repeatability  | Both fully deterministic and partially stochastic environments must be fully repeatable, for traceable transparency.  |
|  REF  | [[http://alumni.media.mit.edu/~kris/ftp/AGIEvaluationFlexibleFramework-ThorissonEtAl2015.pdf|Thorisson, Bieger, Schiffel & Garrett]]    |
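The controllable features above can be made concrete with a small sketch. The class below is a hypothetical, minimal task-environment (not from the cited framework; all names are illustrative) whose constructor exposes three of the listed knobs: a random seed (determinism and repeatability), a noise level (partial stochasticity), and a time-step size (discretization granularity for continuity).

```python
import random

class ToyTaskEnvironment:
    """Hypothetical task-environment exposing evaluator-controllable knobs."""

    def __init__(self, seed=0, noise=0.0, dt=1.0):
        self.rng = random.Random(seed)  # seeded RNG: determinism / repeatability
        self.noise = noise              # degree of stochasticity (0.0 = fully deterministic)
        self.dt = dt                    # discretization granularity of continuous time
        self.state = 0.0

    def step(self, action):
        # Simple dynamics: the action plus a seeded random disturbance,
        # integrated over one discrete time step.
        disturbance = self.rng.uniform(-self.noise, self.noise)
        self.state += (action + disturbance) * self.dt
        return self.state

def trajectory(seed, noise, dt, actions):
    """Run one episode and return the sequence of observed states."""
    env = ToyTaskEnvironment(seed=seed, noise=noise, dt=dt)
    return [env.step(a) for a in actions]
```

Because the noise is drawn from a seeded generator, even a stochastic run is exactly repeatable when re-run with the same seed — which is the "traceable transparency" requirement in the Repeatability row above.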
  
\\
\\
====Requirements for Evaluation: Settings That Must Be Obtainable====
  
|  Complexity  | Environment is complex, with diverse interacting objects  |
|  Dynamicity  | Environment is dynamic  |
|  Regularity  | Task-relevant regularities exist at multiple time scales  |
|  Task Diversity  | Tasks can be complex, diverse, and novel  |
|  Interactions  | Agent/environment/task interactions are complex and limited  |
|  Computational limitations  | Agent computational resources are limited  |
|  Persistence  | Agent existence is long-term and continual  |
|  REF   | [[http://www.atlantis-press.com/php/download_paper.php?id=1900|Laird et al.]]   |
  
\\
\\
====Example Frameworks for Evaluating AI Systems====
|  \\ \\ Merlin  | A significant problem facing researchers in reinforcement and multi-objective learning is the lack of good benchmarks. Merlin (for Multi-objective Environments for Reinforcement LearnINg) is a software tool and method for enabling the creation of random problem instances, including multi-objective learning problems, with specific structural properties. Merlin provides the ability to control task features in predictable ways, allowing researchers to build a more detailed understanding of which features of a problem interact with a given learning algorithm, improving or degrading its performance.    |  [[http://alumni.media.mit.edu/~kris/ftp/Tunable-generic-Garrett-etal-2014.pdf|Paper]] by Garrett et al.  |
|  \\ FRaMoTEC  | Framework that allows modular construction of physical task-environments for evaluating intelligent control systems. A proto-task theory on which the framework is built aims for a deeper understanding of tasks in general, with a future goal of providing a theoretical foundation for all resource-bounded real-world tasks. Tasks constructed in the framework can be rooted in physics, allowing us to analyze, through their execution, the performance of control systems in terms of expended time and energy.    |  [[http://alumni.media.mit.edu/~kris/ftp/EGPAI_2016_paper_8.pdf|Paper]] by Thorarensen et al.   |
|  AI Gym  | Gym is a toolkit developed by OpenAI for developing and comparing reinforcement learning algorithms. It supports teaching agents everything from walking to playing games like Pong or Pinball.    |  [[https://gym.openai.com|Link]] to Website.  |
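Toolkits like Gym standardize evaluation around a common reset/step episode loop. The sketch below mimics that interface with a self-contained stub — the `GuessEnv` class and its dynamics are invented for illustration and are not part of Gym or any of the frameworks above — so the generic evaluation loop can be seen in isolation.

```python
# Minimal sketch of the reset/step evaluation loop popularized by OpenAI Gym.
# GuessEnv is a made-up stand-in environment, not part of any real toolkit.

class GuessEnv:
    """Agent must move its position to a fixed target; the episode ends on arrival."""

    def __init__(self, target=3):
        self.target = target

    def reset(self):
        self.pos = 0
        return self.pos  # initial observation

    def step(self, action):
        self.pos += action                  # apply the agent's action
        done = self.pos == self.target      # episode termination condition
        reward = 1.0 if done else -0.1      # sparse goal reward, small step cost
        return self.pos, reward, done, {}   # observation, reward, done, info

def evaluate(env, policy, max_steps=100):
    """Run one episode and return the total reward -- the usual benchmark score."""
    obs, total = env.reset(), 0.0
    for _ in range(max_steps):
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
        if done:
            break
    return total
```

With this stub, a policy that always steps toward the target, `lambda obs: 1`, reaches the goal in three steps and scores 0.8 (the 1.0 goal reward minus two 0.1 step costs). Any environment exposing the same `reset()`/`step()` interface can be scored by the same `evaluate` function, which is the point of such toolkits.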
\\
\\
====State of the Art====
|  Summary   | Practically all proposals to date for evaluating intelligence leave out some important aspects of intelligence. Virtually no proposals exist for evaluating knowledge transfer, attentional capabilities, knowledge acquisition, knowledge capacity, knowledge retention, multi-goal learning, social intelligence, creativity, reasoning, cognitive growth, and meta-learning / integrated cognitive control -- all of which are quite likely vital to achieving general intelligence on par with humans.  |
|  What is needed  | A theory of intelligence that allows us to construct adequate, thorough, and comprehensive tests of intelligence and intelligent behavior.  |
|  What can be done  | In lieu of such a theory (which is still not forthcoming after over 100 years of psychology and 60 years of AI) we could use a multi-dimensional "Lego" kit for exploring various means of measuring intelligence and intelligent performance, so as to be able to evaluate the pros and cons of various approaches, methods, scales, etc. \\ A kit meeting part or all of the requirements listed above would go a long way towards bridging the gap, and might generate ideas that could speed up theoretical development.    |
\\
\\
\\
\\
2018(c)K.R.Thórisson \\
//EOF//