public:t-720-atai:atai-20:evaluation

Differences

This shows you the differences between two versions of the page.

public:t-720-atai:atai-20:evaluation [2020/10/29 09:52] – [The Challenge of Evaluating Intelligence] thorisson
public:t-720-atai:atai-20:evaluation [2024/04/29 13:33] (current) – external edit 127.0.0.1
Line 28: Line 28:
 ====What Are We Trying to Evaluate?==== ====What Are We Trying to Evaluate?====
 |  Proposed Definitions  | "Intelligence" as a concept must be broken into smaller parts. \\ "Adaptation" seems too broad. \\ "Behavior" is difficult to measure unless it's codified in domain-dependent methods (e.g. verbal, motor, ...).    |  Proposed Definitions  | "Intelligence" as a concept must be broken into smaller parts. \\ "Adaptation" seems too broad. \\ "Behavior" is difficult to measure unless it's codified in domain-dependent methods (e.g. verbal, motor, ...).   
-|  Alternatives  | What if we could avoid definitions? Competitions (e.g. games, robofootball, specific single-goal tasks) have been proposed in their place. \\ Turing proposed the 'imitation game' ("Turing Test") as a placeholder for a definitive definition (the Turing Test is most correctly seen as a working definition).   |  +|  \\ Alternatives  | What if we could avoid definitions? Competitions (e.g. games, robofootball, specific single-goal tasks) have been proposed in their place. \\ Turing proposed the 'imitation game' ("Turing Test") as a placeholder for a definitive definition (the Turing Test is most correctly seen as a working definition).   |  
-|  Shortcomings  | Mostly single-goal (the physical world, PW, requires multiple simultaneous goals). \\ Mostly easily measurable goals (PW often has ill-defined goals). \\ Mostly toy-like (no noise; PW has lots of noise). \\ Mostly limited-count variables (PW has an infinite number of variables).    | +|  \\ Shortcomings  | Mostly single-goal (the physical world, PW, requires multiple simultaneous goals). \\ Mostly easily measurable goals (PW often has ill-defined goals). \\ Mostly toy-like (no noise; PW has lots of noise). \\ Mostly limited-count variables (PW has an infinite number of variables).    | 
 |  Current Status  | Scientists are still working on how to properly measure learning and intelligence.  |  Current Status  | Scientists are still working on how to properly measure learning and intelligence. 
  
Line 54: Line 54:
 |  What it is  | A test for intelligence proposed by Alan Turing in 1950.   | |  What it is  | A test for intelligence proposed by Alan Turing in 1950.   |
 |  Why it's relevant  | Proposed as a way to get a pragmatic/working definition of the //concept of intelligence//. \\ The first proposal for how to evaluate an intelligent machine.  | |  Why it's relevant  | Proposed as a way to get a pragmatic/working definition of the //concept of intelligence//. \\ The first proposal for how to evaluate an intelligent machine.  |
-|  Method  | It is played with three people, a man (A), a woman (B), and an interrogator (C) who may be of either sex. The interrogator stays in a room apart from the other two. The object of the game for the interrogator is to determine which of the other two is the man and which is the woman. He knows them by labels X and Y, and at the end of the game he says either "X is A and Y is B" or "X is B and Y is A." We now ask the question, "What will happen when a machine takes the part of A in this game?"  |+|  \\ Method  | It is played with three people, a man (A), a woman (B), and an interrogator (C) who may be of either sex. The interrogator stays in a room apart from the other two. The object of the game for the interrogator is to determine which of the other two is the man and which is the woman. He knows them by labels X and Y, and at the end of the game he says either "X is A and Y is B" or "X is B and Y is A." We now ask the question, "What will happen when a machine takes the part of A in this game?"  |
 |  Pros  | It is difficult to imagine an honest, collaborative machine playing this game for several days or months could ever fool a human into thinking it was a grown human unless it really understood a great deal.   | |  Pros  | It is difficult to imagine an honest, collaborative machine playing this game for several days or months could ever fool a human into thinking it was a grown human unless it really understood a great deal.   |
-|  Cons  | Targets evaluation at a single point in time. Anchored in human language, social convention and dialogue. The Loebner Prize competition has been running for some decades, offering a large financial prize for the first machine to "pass the Turing Test". None of the competing machines has thus far offered any significant advances in the field of AI, and most certainly not to AGI.  //"It's important to note that Turing never meant for his test to be the official benchmark as to whether a machine or computer program can actually think like a human"// (- Mark Riedl) +|  \\ Cons  | Targets evaluation at a single point in time. Anchored in human language, social convention and dialogue. The Loebner Prize competition has been running for some decades, offering a large financial prize for the first machine to "pass the Turing Test". None of the competing machines has thus far offered any significant advances in the field of AI, and most certainly not to AGI.  //"It's important to note that Turing never meant for his test to be the official benchmark as to whether a machine or computer program can actually think like a human"// (- Mark Riedl) 
 |  \\ Implementations  | The Loebner Prize competition has been running for some decades, offering a large financial prize for the first machine to "pass the Turing Test". None of the competing machines has thus far offered any significant advances in the field of AI, and most certainly not to AGI.    |  |  \\ Implementations  | The Loebner Prize competition has been running for some decades, offering a large financial prize for the first machine to "pass the Turing Test". None of the competing machines has thus far offered any significant advances in the field of AI, and most certainly not to AGI.    | 
 |  Bottom Line  | //"It's important to note that Turing never meant for his test to be the official benchmark as to whether a machine or computer program can actually think like a human"// (- Mark Riedl) | |  Bottom Line  | //"It's important to note that Turing never meant for his test to be the official benchmark as to whether a machine or computer program can actually think like a human"// (- Mark Riedl) |
Line 142: Line 142:
  
 ====State of the Art==== ====State of the Art====
-|  Summary   | Practically all proposals to date for evaluating intelligence leave out some major aspects of intelligence. Virtually no proposals exist for evaluation of knowledge transfer, attentional capabilities, knowledge acquisition, knowledge capacity, knowledge retention, multi-goal learning, social intelligence, creativity, reasoning, cognitive growth, and meta-learning / integrated cognitive control -- all of which are quite likely vital to achieving general intelligence on par with humans.  |+|  \\ Summary   | Practically all proposals to date for evaluating intelligence leave out some major aspects of intelligence. Virtually no proposals exist for evaluation of knowledge transfer, attentional capabilities, knowledge acquisition, knowledge capacity, knowledge retention, multi-goal learning, social intelligence, creativity, reasoning, cognitive growth, and meta-learning / integrated cognitive control -- all of which are quite likely vital to achieving general intelligence on par with humans.  |
 |  What is needed  | A theory of intelligence that allows us to construct adequate, thorough, and comprehensive tests of intelligence and intelligent behavior.  | |  What is needed  | A theory of intelligence that allows us to construct adequate, thorough, and comprehensive tests of intelligence and intelligent behavior.  |
-|  What can be done  | In lieu of such a theory (which still is not forthcoming after over 100 years of psychology and 60 years of AI) we could use a multi-dimensional "Lego" kit for exploring various means of measuring intelligence and intelligent performance, so as to be able to evaluate the pros and cons of various approaches, methods, scales, etc. \\ Some sort of kit meeting part or all of the requirements listed above would go a long way to bridging the gap, and possibly generate some ideas that could speed up theoretical development.    |+|  \\ What can be done  | In lieu of such a theory (which still is not forthcoming after over 100 years of psychology and 60 years of AI) we could use a multi-dimensional "Lego" kit for exploring various means of measuring intelligence and intelligent performance, so as to be able to evaluate the pros and cons of various approaches, methods, scales, etc. \\ Some sort of kit meeting part or all of the requirements listed above would go a long way to bridging the gap, and possibly generate some ideas that could speed up theoretical development.    |
  
 \\ \\
/var/www/cadia.ru.is/wiki/data/attic/public/t-720-atai/atai-20/evaluation.1603965173.txt.gz · Last modified: 2024/04/29 13:32 (external edit)