|  Implementation  | - Use risk assessment and harm impact assessment before, during, and after deployment.\\ - Limit scope: Ensure capabilities are aligned with what is needed.\\ - Define and document legitimate aims explicitly.\\ - Ensure there is the ability to stop or scale back systems.\\ - Build in legal or regulatory constraints to enforce limits. |
  
===== AI - What are we talking about? =====

==== Historical Perspective ====

In 1950, Turing proposed the idea of building child machines: machines that can mimic the learning and reasoning behavior of human children. In 1951, Minsky built the first artificial neural net (ANN), called SNARC. In 1958, Rosenblatt created the Perceptron, which signaled the coming of sub-symbolic systems over half a century before contemporary ANNs. Biological neural networks inspired the design of these artificial networks.

In 1956, McCarthy and Minsky proposed ideas for building reasoning systems based on logic rules and symbolic representations. In the 1960s and 1970s, heavy emphasis was put on symbolic AI and rule-based systems, applied to tasks such as chess playing.

The 1970s and 1980s saw the AI winters: periods in which progress in AI research slowed considerably.

In the 1990s and 2000s, AI matured in the form of machine learning (ML) algorithms: data-driven methods, as opposed to rule-based symbolic systems.


==== Contemporary AI: Practical Artificial Neural Networks ====

In the 2010s, Artificial Neural Networks (ANNs) became practical, enabled by enormous amounts of data and computing power. ANNs led to extensive automation in domains such as:
  * Image recognition: inputs are image data.
  * Natural Language Processing (NLP): inputs are text and audio data.

In recent years, ANNs have been used to build large language models (LLMs), e.g., the GPT models.
  * Training data: texts from the internet.
  * How are ANNs trained? (Sketched in code below.)
    * Training: feed the network input data (examples) with known outputs; this tunes the weights (on the edges between nodes) until the network predicts well.
    * Backpropagation: when a wrong prediction is made, propagate the error backwards through the network and adjust the weights accordingly.
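
To make these two steps concrete, here is a minimal NumPy sketch of a tiny 2-4-1 network learning XOR with backpropagation; the task, layer sizes, learning rate, and step count are illustrative choices, not a recipe for training real-world ANNs:

<code python>
import numpy as np

rng = np.random.default_rng(0)

# Training examples: inputs paired with known outputs (the XOR truth table).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Weights (the "edges between nodes") and biases for a 2-4-1 network.
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for step in range(10000):
    # Forward pass: compute the network's current predictions.
    h = sigmoid(X @ W1 + b1)      # hidden-layer activations
    out = sigmoid(h @ W2 + b2)    # predicted outputs

    # Backpropagation: propagate the prediction error backwards and
    # adjust each weight in proportion to its share of the error.
    d_out = (out - y) * out * (1 - out)   # gradient at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)    # gradient at the hidden layer
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(np.round(out, 2))  # approaches [[0], [1], [1], [0]] once trained
</code>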


==== Ethical Issues of ANNs ====

|  Transparency  | ANNs designed for real-world application domains (e.g., medical diagnosis, credit scoring) have large numbers of inputs, outputs, hidden nodes, and layers, making them black-box predictors/classifiers. |
|  Justice and Fairness  | ANNs are highly sensitive to their training data. If the training data is biased, they amplify the biases through their architecture, making decisions/classifications/predictions that reflect discrimination (e.g., racial bias in recidivism prediction, gender bias in hiring tools). |
|  Safety and Security  | - ANNs cannot reliably deal with adversarial attacks, where inputs are crafted specifically to fool ANN-based AI systems (see the sketch below).\\ - ANNs can be used to create deepfakes, which can be put to unethical purposes. |
|  Responsibility  | The lack of transparency makes it difficult to understand how the system makes decisions. This allows some AI system designers to avoid the consequences of their systems' malfunctions. |
|  Privacy  | ANNs require large amounts of training data, giving ANN developers reason to collect and access significant amounts of people's private data. |
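
As an illustration of how such attacks work, here is a minimal sketch in the style of the fast gradient sign method (FGSM); the one-unit logistic "classifier", its weights, and the perturbation budget ''eps'' are illustrative stand-ins for a real ANN:

<code python>
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Stand-in for a trained model: a single logistic unit with fixed weights.
w = np.array([3.0, -3.0, 2.0])

x = np.array([0.6, 0.5, 0.3])   # a legitimate input
p = sigmoid(w @ x)              # ~0.71 -> classified as class 1

# Gradient of the loss -log(p) (true label 1) with respect to the input x.
grad_x = (p - 1.0) * w

# FGSM step: nudge every feature slightly in the direction that increases
# the loss; the change is small, but the classification flips.
eps = 0.2
x_adv = x + eps * np.sign(grad_x)

print(f"clean:       p(class 1) = {sigmoid(w @ x):.2f}")      # 0.71
print(f"adversarial: p(class 1) = {sigmoid(w @ x_adv):.2f}")  # 0.33
</code>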


==== Can ANNs become more ethical? ====
|  Explainable ANNs  | Using explainability techniques to help users and developers understand a decision. But: this works only to some extent (see the occlusion sketch below). |
|  Bias mitigation  | Collecting more training data, or using bias-detection algorithms (see the parity sketch below). However:\\ - Lack of training data in many domains.\\ - Privacy issues with collecting ever more data.\\ - Who writes the algorithms? How do we ensure that biases are actually mitigated? |
|  Safety improvement  | Adding human-in-the-loop components to retrain models. However:\\ - High human labor cost.\\ - Catastrophic forgetting. |
|  Responsibility  | Still hard to make out who is responsible for the decisions made. |
|  Privacy  | Anonymizing training data (challenging for large datasets). |
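
One way to make "to some extent" concrete is occlusion analysis, a simple explainability technique: knock out each input feature in turn and observe how much the prediction shifts. The tiny stand-in model and the feature names below are hypothetical:

<code python>
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def model(x):
    # Stand-in for a trained black-box predictor.
    w = np.array([3.0, -3.0, 2.0])
    return sigmoid(w @ x)

features = ["income", "age", "savings"]   # hypothetical feature names
x = np.array([0.6, 0.5, 0.3])
baseline = model(x)

for i, name in enumerate(features):
    x_occ = x.copy()
    x_occ[i] = 0.0                        # remove one feature's contribution
    delta = baseline - model(x_occ)
    print(f"{name}: prediction shifts by {delta:+.2f} when occluded")
</code>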
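
Similarly, a minimal sketch of one bias-detection check, the demographic parity difference: the gap in positive-prediction rates between two groups. The decisions and group labels below are synthetic:

<code python>
import numpy as np

preds = np.array([1, 0, 1, 1, 0, 1, 0, 0])   # model decisions (1 = approve)
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # a protected attribute

rate_a = preds[group == 0].mean()            # approval rate, group A
rate_b = preds[group == 1].mean()            # approval rate, group B
print(f"group A: {rate_a:.2f}, group B: {rate_b:.2f}")
print(f"demographic parity difference: {abs(rate_a - rate_b):.2f}")  # 0.50
</code>
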
==== Example ====

=== ChatGPT ===
  * Generative Pre-trained Transformer (GPT) is one of the largest LLM families.
  * GPT-4 was reportedly trained on 45 TB of (unfiltered) data, GPT-5 on as much as 280 TB.
  * GPT-4 has roughly 1.7 trillion parameters. GPT-5 is estimated to have up to 5-10 trillion parameters (though it could have fewer than GPT-4 due to a possibly multi-model architecture; this remains unclear).
  * Context windows of up to 400,000 tokens (see the token-counting sketch below).
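
What a "token" is, and what a context window actually caps, can be made concrete with a small sketch; it assumes the ''tiktoken'' library, whose ''cl100k_base'' encoding is the one used by GPT-3.5/GPT-4 models:

<code python>
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
text = "Artificial neural networks learn from examples."
tokens = enc.encode(text)

print(tokens)                  # a list of integer token IDs
print(len(tokens), "tokens")   # the context window caps this count,
                               # not the number of characters
</code>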

**Ethical issues**:

  * Transparency: Very limited; it is unclear how some responses are generated.
  * Justice and fairness: Use of ChatGPT may violate copyrights, etc.
  * Safety: Generating harmful content or misinformation, or being misused for unethical purposes.
  * Responsibility: Who is responsible?
  * Privacy: Massive amounts of data. Some training data can be extracted; see for example [[https://not-just-memorization.github.io/extracting-training-data-from-chatgpt.html|Extracting Training Data from ChatGPT]].