T-713-MERS-2025 Main
Link to Lecture Notes
TRUST & EXPLANATIONS
Engineered Predictability
What It Is | The ability of an outsider to predict the behavior of a controller based on some information. |
Why It Is Important | Predicting the behavior of (semi-) autonomous machines is important if we want to ensure their safe operation, or be sure that they do what we want them to do. |
How To Do It | Predicting the future behavior of ANNs (of any kind) is easier if we switch off their learning after they have been trained, because no method exists for predicting where their development will lead if they continue to learn after they leave the lab. Predicting ANN behavior on novel input can be done statistically, but there is no way to be sure that novel input will not completely reverse their behavior. Few if any methods exist for giving ANNs the ability to judge the "novelty" of an input, which might to some extent help with this issue. Reinforcement learning sidesteps the problem by scaling only to a handful of variables with known maxima and minima. |
Correlation vs. Causation | In the physical world, predictability means reliability; reliability cannot be achieved without knowledge of, and reliance on, repeatable causal relationships. To achieve predictable behavior of engineered artifacts, correlation alone does not suffice, because it does not allow the engineer to foresee, and engineer out of the system, the potential undesirable categories of results (i.e. effects) of certain (novel) categories of inputs (i.e. causes). |
Predictable AI | There is no way to achieve predictable behavior of a machine that handles novel input without giving the machine itself the ability to judge the input and classify it as “novel”. This cannot be done with ANN-based systems because of how their information (I hesitate to call it “knowledge” - see definitions of these concepts in prior lecture notes) is represented. |
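The point above about judging input "novelty" can be illustrated with a minimal sketch. This is not a method from the lecture, and the names (`fit_stats`, `novelty_score`, the 3-standard-deviation threshold) are hypothetical; it only shows, under the assumption that training data is available as feature vectors, what a simple statistical novelty judgment might look like:

```python
# Illustrative sketch (not the lecture's method): flag an input as "novel"
# when it falls far outside the statistics of the training set.
import math

def fit_stats(samples):
    """Per-feature mean and standard deviation of the training samples."""
    n = len(samples)
    dims = len(samples[0])
    means = [sum(s[d] for s in samples) / n for d in range(dims)]
    stds = [math.sqrt(sum((s[d] - means[d]) ** 2 for s in samples) / n)
            for d in range(dims)]
    return means, stds

def novelty_score(x, means, stds):
    """Largest per-feature deviation from the training mean, in std. devs."""
    return max(abs(x[d] - means[d]) / (stds[d] or 1.0) for d in range(len(x)))

# Hypothetical training set of 2-D feature vectors.
train = [(0.9, 1.1), (1.0, 0.9), (1.1, 1.0), (1.0, 1.0)]
means, stds = fit_stats(train)

print(novelty_score((1.0, 1.05), means, stds) > 3.0)  # familiar input -> False
print(novelty_score((9.0, -4.0), means, stds) > 3.0)  # far-out input -> True
```

Note that this captures only distance from past data; as the text argues, such a score cannot guarantee that an input scoring "familiar" will not still produce reversed behavior.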
Engineered Reliability
What It Is | The ability of a machine to always return the same - or categorically similar - answer to the same - and/or categorically similar - input. |
Why It Is Important | Simple AI algorithms (e.g. reinforcement learning, auto-correlation, decision trees, etc.) are very good in this respect, delivering high reliability. Because they are simple, and their environment is oversimplified, their reliability can be engineered up front. Near-human-level AI – or adaptive autonomous machines (AAMs) – on the other hand, have the same limitations as humans and animals in this respect, i.e. reliability is a challenge, and no guarantees can be given. |
Predictability is Hard to Achieve | In a growing, developing system that is adapting and learning (3 or 4 levels of detail of dynamical relations!) predictability can only be achieved through abstraction: moving up to a coarser level of detail; e.g. "I cannot be sure what exactly I will eat for dinner (one level of detail), but I can be pretty sure that I will eat dinner (more coarse-grain level)". |
Producing Abstraction | Can be done through hierarchy, but the hierarchy must be dynamic - i.e. dynamically adjusted to its intended usage, as the circumstances call for - because the combinatorics of the world are too complex to store precomputed hierarchies for everything. |
Human-Level AI | To make AAMs reliable is important because without reliability they cannot be trusted, and hence would defeat most of the purpose for creating them in the first place. (One method for addressing this is through autonomous iterative micro-model generation and refinement.) |
Achieving Reliable AI | Requires predictability. Predictability requires sorting out causal relations (without these, neither we nor the system can ever be sure what leads to what, precluding reliability). |
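The dinner example above can be made concrete with a short sketch. Everything in it (the menu, the simulation) is a made-up illustration, assuming only that fine-grained outcomes vary randomly while the coarse-grained category is fixed:

```python
# Sketch of "predictability through abstraction": the exact dinner is
# unpredictable, but "some dinner occurs" is a reliable prediction.
import random

MENU = ["pasta", "soup", "fish", "salad"]  # hypothetical fine-grained outcomes

def dinner(rng):
    """One evening's outcome, drawn at the finest level of detail."""
    return rng.choice(MENU)

rng = random.Random(0)  # fixed seed for a reproducible simulation
outcomes = [dinner(rng) for _ in range(1000)]

# A fine-grained prediction ("it will be pasta") is right only a fraction
# of the time...
print(outcomes.count("pasta") / len(outcomes))
# ...while the coarse-grained prediction ("dinner will be eaten") always holds.
print(all(o in MENU for o in outcomes))  # True
```

The same move - trading detail for certainty - is what lets an observer make reliable statements about a learning system whose fine-grained behavior keeps changing.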
Engineered / Earned Trustworthiness
What It Is | The ability of a machine's owner to trust that the machine will do what it is supposed to do. |
Why It Is Important | Any machine created by humans is created for a purpose. The more reliably it does its job (and nothing else) the more trustworthy it is. Trusting simple machines like thermostats involves mostly durability, since they have very few open variables (unbound variables at time of manufacture). |
Human-Level AI | To make human-level AI trustworthy is very different from creating simple machines because so many variables are unbound at manufacture time. What does trustworthiness mean in this context? We can look at human trustworthiness: Numerous methods exist for ensuring trustworthiness (license to drive, air traffic controller training, certification programs, etc.). We can have the same certification programs for all humans because their principles of operation are shared at multiple levels of detail (biology, sociology, psychology). For an AI this is different because the variability in the makeup of the machines is enormous. This makes trustworthiness of AI robots a complex issue. |
To Achieve Trustworthiness | Requires reliability, and predictability at multiple levels of operation. Trustworthiness can be ascertained through special certification programs geared directly at the kind of robot/AI system in question (kind of like certifying a particular horse as safe for a particular circumstance and purpose, e.g. horseback riding kids). |
Trustworthiness Methods… | …for AI are in their infancy. |
IN-CLASS GROUP ASSIGNMENT
In groups of 2, answer the following questions:
Trust & Explanation
Can Trust Exist Without Explanation? | What are the fundamentals of Trust? What role does explanation play in creating trust? |
Trust & Ethics
Can Ethics Exist Without Trust? | What role does trust play in ethics? Can an untrustworthy agent operate ethically? Why / why not? |
2025©K.R.Thórisson
Last modified: 2025/08/26 08:42 by thorisson