public:t_720_atai:atai-18:lecture_notes_evaluation [2018/09/25 15:25] – thorisson | public:t_720_atai:atai-18:lecture_notes_evaluation [2024/04/29 13:33] (current) – external edit 127.0.0.1 |
\\
====Example Frameworks for Evaluating AI Systems====
| \\ \\ Merlin | A significant problem facing researchers in reinforcement and multi-objective learning is the lack of good benchmarks. Merlin (for Multi-objective Environments for Reinforcement LearnINg) is a software tool and method for creating random problem instances, including multi-objective learning problems, with specific structural properties. Merlin provides the ability to control task features in predictable ways, allowing researchers to build a more detailed understanding of which features of a problem interact with a given learning algorithm, improving or degrading its performance. | [[http://alumni.media.mit.edu/~kris/ftp/Tunable-generic-Garrett-etal-2014.pdf|Paper]] by Garrett et al. |
| \\ FRaMoTEC | A framework that allows modular construction of physical task-environments for evaluating intelligent control systems. The proto-task theory on which the framework is built aims for a deeper understanding of tasks in general, with a future goal of providing a theoretical foundation for all resource-bounded real-world tasks. Tasks constructed in the framework can be rooted in physics, allowing us to analyze the performance of control systems during task execution in terms of expended time and energy. | [[http://alumni.media.mit.edu/~kris/ftp/EGPAI_2016_paper_8.pdf|Paper]] by Thorarensen et al. |
| AI Gym | Gym is a toolkit developed by OpenAI for developing and comparing reinforcement learning algorithms. It supports teaching agents everything from walking to playing games like Pong or Pinball. | [[https://gym.openai.com|Link]] to Website. |
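The idea behind Merlin described above, generating random problem instances whose structural properties are set by explicit parameters, can be illustrated with a minimal sketch. This is not Merlin's API or algorithm: the function names `make_random_mdp` and `random_policy_return` and the parameters `n_states`, `n_actions`, `branching`, and `seed` are hypothetical, chosen only to show how a seeded generator lets one task feature be varied while everything else is held fixed.

```python
import random


def make_random_mdp(n_states, n_actions, branching, seed=0):
    """Build a random MDP whose size and transition branching are set by parameters.

    Hypothetical illustration, not Merlin's actual interface.
    Returns (transitions, rewards):
      transitions[s][a] is a list of (next_state, probability) pairs,
      rewards[s][a] is a scalar reward in [0, 1).
    """
    if not 1 <= branching <= n_states:
        raise ValueError("branching must be between 1 and n_states")
    rng = random.Random(seed)  # seeded: same parameters give the same task
    transitions, rewards = [], []
    for _ in range(n_states):
        t_row, r_row = [], []
        for _ in range(n_actions):
            # Pick `branching` distinct successor states and normalize
            # random weights into a probability distribution over them.
            successors = rng.sample(range(n_states), branching)
            weights = [rng.random() + 1e-9 for _ in successors]
            total = sum(weights)
            t_row.append([(s2, w / total) for s2, w in zip(successors, weights)])
            r_row.append(rng.random())
        transitions.append(t_row)
        rewards.append(r_row)
    return transitions, rewards


def random_policy_return(transitions, rewards, steps, seed=0):
    """Average per-step reward of a uniformly random policy (a crude baseline)."""
    rng = random.Random(seed)
    state, total = 0, 0.0
    for _ in range(steps):
        action = rng.randrange(len(transitions[state]))
        total += rewards[state][action]
        next_states, probs = zip(*transitions[state][action])
        state = rng.choices(next_states, weights=probs)[0]
    return total / steps


if __name__ == "__main__":
    # Vary one structural feature (branching) while holding the rest fixed.
    for branching in (1, 3, 5):
        t, r = make_random_mdp(n_states=5, n_actions=2, branching=branching, seed=1)
        print(branching, random_policy_return(t, r, steps=1000, seed=2))
```

In this sketch a learning algorithm would take the place of the random policy; running it over a sweep of `branching` values is one simple way to see which task features help or hurt it.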
\\
====State of the Art====
| Summary | Practically all proposals to date for evaluating intelligence leave out major aspects of intelligence. Virtually no proposals exist for evaluating knowledge transfer, attentional capabilities, knowledge acquisition, knowledge capacity, knowledge retention, multi-goal learning, social intelligence, creativity, reasoning, cognitive growth, and meta-learning / integrated cognitive control, all of which are quite likely vital to achieving general intelligence on par with humans. |
| What is needed | A theory of intelligence that allows us to construct adequate, thorough, and comprehensive tests of intelligence and intelligent behavior. |
| What can be done | In lieu of such a theory (which has yet to emerge after over 100 years of psychology and 60 years of AI), we could use a multi-dimensional "Lego" kit for exploring various means of measuring intelligence and intelligent performance, so as to be able to evaluate the pros and cons of various approaches, methods, scales, etc. \\ Some sort of kit meeting part or all of the requirements listed above would go a long way toward bridging the gap, and could possibly generate ideas that speed up theoretical development. |
\\
2018(c)K.R.Thórisson \\
//EOF//