[[/public:t-713-mers:mers-23:main|T-713-MERS-2023 Main]] \\ [[/public:t-713-mers:mers-23:lecture_notes|Link to Lecture Notes]] \\
\\
=====A SHORT CRASH COURSE IN ANNs=====
\\
\\
====ANNs: Overview====
| ANNs | A special way to create classification functions over large amounts of data without explicitly specifying the mathematical operations. Instead, they are produced by a largely automated process called 'training' (based on an algorithm called "backpropagation" or "backprop"). The data sets can be, for example, images or text. (A minimal code sketch of this training process follows the tables below.) |
| Name | The name comes from their inspiration from natural neural networks, like those found in the central nervous system of many animal species. ANNs are certainly "networks" but they are not really "neural" in any meaningful sense, as they don't try to mimic what neurons in nature do. The term "artificial" refers to the fact that they are human-made. |
| History | The first ANNs were built by pioneers such as Marvin Minsky and Frank Rosenblatt (inventor of the "perceptron"), who took their inspiration from natural neural networks like those in the central nervous systems of many animals - though mimicking them only in a cartoonish way. |
| Current State | Very large ANNs of particular kinds, called Deep Neural Networks and Large Language Models, are used in a variety of industries to achieve the key property mentioned above, namely, to produce a (feed-forward) function based on large amounts of data, for a variety of classification tasks. |
\\
====ANNs: Data & Training====
| Input Data | Continuous and discrete variables. |
| Output Data | Continuous and discrete variables. |
| Max. # I/O Vars. | Very high (possibly limited by CPU and BW only). |
| Min. Training Cycles | Typical numbers 4k-10k. \\ Depends on data complexity and number of layers in the ANN. |
| Training | Off-task. Learning turned off when fully trained. |
| Training Style | Training phase BILL (before it leaves the lab); discrete training steps. |
| \\ Training Signal | Supervised learning: Explicit "error signal propagation" after every turn, generated from pre-categorized examples and outcomes. \\ Unsupervised learning: Explicit "error signal propagation" after every turn, auto-generated. |
| Hyper-Parameters | Learning rate, exploration/exploitation, and many others. |
| Strengths | Handles complex data sets. \\ |
| Scalability | Unpredictable behavior under data drift AILL (after it leaves the lab). \\ Must be trained BILL (unpredictable learning AILL). \\ |
\\
====ANNs: What They Can Do====
| | |
\\
====ANNs: Key Principles of Operation & Use====
| Representation | ANNs represent all their information as correlation. |
| Pattern Extraction | The reason ANNs are useful is that they don't need to be programmed line-by-line, by hand, by a coder -- the information they use to operate, once trained, is based on **examples** that are **automatically** fed into the system to produce the desired end result. |
| End Result | ...can be a highly sophisticated classification function that can be used in a variety of situations. |
| Application Areas | A trained ANN can be employed essentially anywhere that a sophisticated classification function is needed. |
| Requirements | Large amounts of data must be available to train an ANN for reliable operation of any kind, typically 2k to 5k data records (examples) of **each category** that the ANN is supposed to classify. |
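A minimal sketch of the training process described above, assuming a toy synthetic data set, one hidden layer of 8 sigmoid units, a squared-error signal and a fixed learning rate (all of these choices are illustrative assumptions, not prescriptions):
<code python>
# Minimal sketch of ANN training via backpropagation (illustrative assumptions:
# toy 2-D data, one hidden layer, sigmoid activations, fixed learning rate).
import numpy as np

rng = np.random.default_rng(0)

# Pre-categorized examples (supervised learning): 2-D points, category 1 if
# the point lies inside the unit circle, category 0 otherwise.
X = rng.uniform(-2.0, 2.0, size=(400, 2))
y = (np.sum(X**2, axis=1) < 1.0).astype(float).reshape(-1, 1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Network parameters: 2 inputs -> 8 hidden units -> 1 output.
W1 = rng.normal(scale=0.5, size=(2, 8)); b1 = np.zeros((1, 8))
W2 = rng.normal(scale=0.5, size=(8, 1)); b2 = np.zeros((1, 1))

lr = 0.5                                  # learning rate (a hyper-parameter)
for cycle in range(5000):                 # discrete training cycles, BILL
    # Feed-forward pass.
    h   = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Explicit error signal: difference between output and known categories.
    err = out - y

    # Backpropagation: push the error signal back through the layers.
    d_out = err * out * (1.0 - out)
    d_h   = (d_out @ W2.T) * h * (1.0 - h)

    # Gradient-descent weight updates.
    W2 -= lr * (h.T @ d_out) / len(X); b2 -= lr * d_out.mean(axis=0, keepdims=True)
    W1 -= lr * (X.T @ d_h)   / len(X); b1 -= lr * d_h.mean(axis=0, keepdims=True)

# After training the weights are frozen: the net is now just a fixed
# feed-forward classification function (no learning AILL). Inputs inside the
# region covered by the training data are interpolated; inputs far outside it
# require extrapolation and may be classified arbitrarily.
test = np.array([[0.1, 0.2],              # inside the circle
                 [1.5, 1.5],              # outside the circle, inside the data range
                 [5.0, 5.0]])             # far outside the training data range
pred = sigmoid(sigmoid(test @ W1 + b1) @ W2 + b2)
print((pred > 0.5).astype(int).ravel())
</code>
Note how the error signal is generated from pre-categorized examples and propagated backwards to adjust the weights during discrete training cycles; once training ends, the weights stay fixed and the net is used purely as a feed-forward classification function.
\\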
\\
====ANNs: Key Limitations====
| Big Data | ANNs require large amounts of training data. If Big Data doesn't exist for what they are intended to be applied to, they cannot be used. |
| Black-Box | ANNs are black boxes: once trained, the information they contain cannot be inspected or explained in human-understandable terms. |
| Interpolation vs. Extrapolation | If we think of the training data as painting a picture of a state space, with the most extreme data points representing the boundaries of this space, then after training, ANNs are very good at interpolating between points in this multi-dimensional space, if the data is well chosen. The output of the ANN can be relatively predictable and reliable. When a trained ANN receives cases that are //outside// of these boundaries, however, the reliability of the output drops rapidly: Sometimes it is good, sometimes it is extremely wrong. ANNs can thus be seen to be better at interpolation than extrapolation. |
| No On-Job Learning | ANNs are "baked in the lab" during training, after which they are "frozen" - they cannot learn autonomously AILL because the outcome of that process is undefined. |
| No Learning AILL | ANNs cannot learn after they leave the lab (AILL) without being put through the same training process as before, with a modified training data set. Technically, it is not the "same" ANN that comes out of such a process as the one that went in, because re-training means creating a new training data set, and the old ANN is not re-used in the training. So by definition this produces a **new** ANN. |
| Cause-Effect Knowledge | ANNs can only represent correlational relations between data; they cannot represent cause-effect relations explicitly. They cannot tell whether the tail wags the cat or the cat wags the tail. |
| Self-Policing | ANNs are black-box and have no ability to reflect on their own knowledge. Therefore they cannot be made to police their own knowledge - they cannot tell whether or when they are "hallucinating". |
| Trustworthiness | Due to the above, ANNs cannot be trusted to perform anything reliably unless a human checks their output. Therefore they cannot in principle be trusted with tasks where human life is at stake. |
| No Reasoning | ANNs cannot apply logic systematically to arguments, even though it may sometimes - even often - seem that way. The reason is twofold: (a) when they produce their output they do so by applying functions that are based on training data, and (b) they have no systematic way of accessing their own knowledge - it is black-box. Combined, these two features mean that the system cannot guarantee that, when a reasoning method would be appropriate, a reasoning method is applied. |
\\
====ANNs & Reasoning====
| Feed-Forward Function | The feed-forward nature of ANNs means that it is difficult to imbue them with any kind of "working memory" - i.e. memory of their interaction with the environment (e.g. a user). |
| Reasoning Requires Selection | Selecting relevant background assumptions is part of any successful reasoning process. Because ANNs don't really have a "short-term memory", their selection of relevant data - if we could call it that - is based on a-priori extracted patterns, rather than patterns that are relevant to the unique situation. |
| Reasoning Requires Recursion | When selecting background assumptions, for instance, sometimes reasoning must be applied to their selection, in a recursive step. ANNs cannot do this based on the needs of the situation, only if they have been trained in this very particular way on this particular data. |
\\
\\
\\
\\
\\
2023(c)K.R.Thórisson \\