====== T-725-MALV, Natural Language Processing Fall 2015 ======

===== Basic Info =====

  * **Instructors:** [[http://www.ru.is/kennarar/hrafn|Hrafn Loftsson]] and [[http://www.ru.is/kennarar/hannes|Hannes Högni Vilhjálmsson]]
  * **Teaching Assistant:** Örvar Kárason, orvark13@ru.is
  * **Contact:** Office in V207, 599 6227, {hrafn, hannes}@ru.is
  * **Discussion Classes:** Tuesdays 15:45 - 17:20 in M113
  * **Lab Classes:** Thursdays 8:30 - 10:05 in M102

===== Description =====

The goal of language technology (LT) is to develop systems which allow people to communicate with computers using natural languages. LT is an interdisciplinary field, requiring knowledge from subjects like linguistics, statistics, psychology, engineering and computer science. This course discusses fundamentals of natural language processing (NLP), which is one of the subfields of LT, and introduces research in the field, in part with regard to the Icelandic language. Students acquire understanding of the various stages of NLP, e.g. morphological analysis, part-of-speech tagging, syntactic analysis, semantic analysis, discourse and dialogue. In the course, students work on programming projects related to the aforementioned stages.

===== Learning Outcome =====

On completing the course, students should:

  * know the main methods of processing required for computers to analyse and understand texts in a human language
  * understand the strengths and weaknesses of current Natural Language Processing (NLP) technology
  * know the main models and algorithms used in NLP, such as in morphological analysis, part-of-speech tagging, parsing, semantic analysis, and discourse and dialogue analysis
  * know at least one programming language suitable for text processing
  * be able to write simple NLP applications and present their work both orally and in writing
  * be able to evaluate the performance/accuracy of NLP systems
  * be aware of current research in NLP

===== Course Assessment =====

The course assessment is as follows:

^Part of Course^Total Weight^
|Three individual projects/assignments; 3*10% | 30%|
|A final project (can be worked on in a group of two students) | 30%|
|Participation in class (discussions, Piazza, labs) | 10%|
|A final written exam | 30%|
^ Total 100% ^^

To provide a rich hands-on experience, students will build their own application (a final project) that relies on NLP over the course of the semester. A number of project proposals will be provided by the instructors, but students are also encouraged to come up with their own ideas. Three homework projects/assignments will also be distributed during the semester to reinforce some of the more theoretical material.

Everything that has to be turned in must arrive no later than 23:59 on the due date. Late submissions incur a 10% penalty for each additional day, including weekends and holidays. Projects are not accepted if handed in more than two days late. Students need to hand in at least 70% of the lab projects in order to take the final exam.

===== Assignments and Projects =====

^Assignment^Assigned^Due^Duration^Weight^
|{{:public:t-malv-15-3:assignmenti.pdf|Assignment I}} | Mon 7. Sep. | Wed 16. Sep. (Week 5) | 10 days | 10% |
|{{:public:t-malv-15-3:assignmentii.pdf|Assignment II}} |Mon 28. Sep. | Wed 7. Oct (Week 8) | 10 days | 10% |
|{{:public:t-malv-15-3:assignmentiii.pdf|Assignment III}} |Tue 20. Oct. | Wed 28. Oct (Week 11) | 9 days | 10% |
|{{:public:t-malv-15-3:final.pdf|Final project}} |Mon 28. Sep. | Fri 6. Nov (Week 12) | 6 weeks | 30% |
^ ^^^^Total 60% ^

==== Final Project Teams ====

^ Team ^ Members ^ Project ^
| 1 | Szymon Klepacz and Felix Weissl | 1.15 Question-Answering System |
| 2 | Steve Losh | Tweet Scraping and Sentiment Analysis |
| 3 | Jacopo de Berardinis, Carlo Castagnari, Giorgio Focina | 1.13 Intelligent Computer-Assisted Language Learning (Italian?) |
| 4 | Kristján Rúnarsson | 1.15 Intonation for Text-to-Speech |
| 5 | Tinna Frímann Jökulsdóttir | 1.13 Intelligent Computer-Assisted Language Learning (Icelandic) |
| 6 | Starkaður Barkarson | 1.13 Intelligent Computer-Assisted Language Learning (Icelandic) |
| 7 | Sigurður Jónsson | Knowledge Representation and Reasoning |
| 8 | Arnar Freyr Bjarnason | TBD |
| 9 | Ívar Örn Ragnarsson | Text Summarization |

===== Final Exam =====

There will be a final written exam counting 30% towards your grade.

===== Online Discussion Forum =====

Piazza - [[http://piazza.com/ru.is/fall2015/t725malv/home]]

===== Syllabus =====

^Week^ Theory (Tues) ^^^ Practice (Thurs) ^^^
^ ^Textbook^Slides^Video^Text^Video^Lab^
| 1 | Chapter 1 | {{:public:t-malv-15-3:AboutCourse.pdf|About course}}\\ {{:public:t-malv-10-3:introduction.pdf|Introduction}} | [[https://www.youtube.com/watch?v=nfoudtpBV68&list=PL6397E4B26D00A269|Introduction ]] | [[ https://docs.python.org/3.4/tutorial/ | Python Tutorial]]\\ [[ http://www.nltk.org/book/ch01.html|Language Processing]] | [[ https://www.youtube.com/watch?v=xSbNZrc-oAA | Install Python on Windows ]] | 1 |
| 2 | Chapter 2 | {{:public:t-malv-15-3:regexautomata.pdf|Regular Expressions and Automata}} | [[https://www.youtube.com/watch?v=hwDhO1GLb_4|Regular expressions ]] \\ [[ https://www.youtube.com/watch?v=GwsU2LPs85U | Convert Regular Expression to Finite-State Automaton ]]| [[ http://www.nltk.org/book/ch02.html|Accessing Text Corpora and Lexical Resources]] | | 2 |
| 3 | Chapter 3.1-3.5, 3.9 | {{:public:t-malv-15-3:wordstransducers.pdf | Words and Transducers}} \\ {{:public:t-malv-15-3:textprocessingtools.pdf | Various text processing tools}} | [[ https://www.youtube.com/watch?v=jBk24DI8kg0 | Tokenisation ]] \\ [[ https://www.youtube.com/watch?v=2s7f8mBwnko | Normalisation and Stemming]] \\ [[ https://www.youtube.com/watch?v=di0N3kXfGYg | Sentence segmentation]] | [[http://www.nltk.org/book/ch03.html | Processing Raw Text]] | [[ https://www.youtube.com/watch?v=FLZvOKSCkxY | Tokenizing words and Sentences ]] \\ [[ https://www.youtube.com/watch?v=w36-U-ccajM | Stop words ]] \\ [[ https://www.youtube.com/watch?v=yGKTphqxR9Q | Stemming ]] | 3 |
| 4 | Chapter 4.1 - 4.5.1 | {{:public:t-malv-15-3:ngrams.pdf | N-grams}} | [[https://www.youtube.com/watch?v=s3kKlUBa3b0 | Introduction to N-grams]] \\ [[https://www.youtube.com/watch?v=o-CvoOkVrnY | Estimating N-gram Probabilities]] \\ [[https://www.youtube.com/watch?v=OHyVNCvnsTo | Evaluation and Perplexity]] \\ [[https://www.youtube.com/watch?v=s5Yg6qac9ag | Generalisation and zeros]] \\ [[https://www.youtube.com/watch?v=d8nVJjlMOYo | Smoothing]] | [[http://www.nltk.org/howto/collocations.html | Collocations]] | | 4 |
| 5 | Chapter 5.1-5.8 | {{:public:t-malv-15-3:pos-tagging.pdf | POS tagging}} | [[https://www.youtube.com/watch?v=LivXkL2DO_w | An Intro to Parts of Speech and POS Tagging]] \\ [[ https://www.youtube.com/watch?v=RIYQD8zF2e0 | Some Methods and Results on Sequence Models for POS Tagging ]] \\ [[ https://www.youtube.com/watch?v=kHvoHUGUitQ | The Tagging Problem ]] \\ [[ https://www.youtube.com/watch?v=gLofvoZXaNI | Generative models ]] \\ [[ https://www.youtube.com/watch?v=98TARXun1xA | Hidden Markov Models (HMMs) ]] \\ [[ https://www.youtube.com/watch?v=djtnIQT1bYg | Parameter Estimation in HMMs ]] | [[http://www.nltk.org/book/ch05.html | Categorizing and Tagging Words]] | [[https://www.youtube.com/watch?v=6j6M2MtEqi8 | Part of Speech Tagging]] | 5 |
| 6 | Chapter 12.1-12.4, 12.7, 13.1-13.3 | {{:public:t-malv-15-3:formalgrammars.pdf | Formal grammars}} \\ {{:public:t-malv-15-3:parsing.pdf | Parsing}} | [[ https://www.youtube.com/watch?v=ntF5cFZq1PA | The parsing problem (1)]] [[ https://www.youtube.com/watch?v=l9dvl1aIO8M | The parsing problem (2)]] \\ [[ https://www.youtube.com/watch?v=9XKUcm8au4U | Context-free grammars (1) ]] \\ [[ https://www.youtube.com/watch?v=gtNfZZaA9EQ | Context-free grammars (2)]] \\ [[ https://www.youtube.com/watch?v=dVbpROvMWhs | Simple English Grammar (1)]] \\ [[ https://www.youtube.com/watch?v=9Cl-uuTHwkk | Simple English Grammar (2) ]] \\ [[ https://www.youtube.com/watch?v=h-J6i9qgAog | Examples of Ambiguity ]] \\ [[ https://www.youtube.com/watch?v=EVgwR9jlIaU | Two views of syntactic structure ]] | [[http://www.nltk.org/book/ch08.html | Analyzing Sentence Structure ]] | | 6 |
| 7 | Chapter 13.4-13.5, 14 | {{:public:t-malv-15-3:parsing.pdf | Parsing}} \\ {{:public:t-malv-15-3:partialparsing-chunking.pdf | Partial Parsing}} \\ {{:public:t-malv-15-3:statisticalparsing.pdf | Statistical Parsing}} | [[ https://www.youtube.com/watch?v=YaXpVT9Q_0o | An exponential number of attachments]] \\ [[https://www.youtube.com/watch?v=YQHj4w-sKwQ | CFGs and PCFGs]] \\ [[ https://www.youtube.com/watch?v=PLCpYgq2De8 | Lexicalization of PCFGs ]] \\ [[ https://www.youtube.com/watch?v=IOOfn5nmtT8 | The model of Charniak]] \\ [[ https://www.youtube.com/watch?v=E7U2E1uHsJY | PCFG Independence Assumptions ]] \\ [[ https://www.youtube.com/watch?v=mMXgbrts82M | Constituency Parser Evaluation ]] |[[http://www.nltk.org/book/ch08.html | Analyzing Sentence Structure ]] \\ [[http://www.nltk.org/book/ch07.html | Extracting Information from Text]] | [[ https://www.youtube.com/watch?v=imPpT2Qo2sk | Chunking ]] | 7 |
| 8 | Chapter 17.1-3\\ Chapter 18.1-2 | {{:public:t-malv-15-3:malv-chapter17-representingmeaning.pdf|The Representation of Meaning}}\\ {{:public:t-malv-15-3:malv-chapter18-computationalsemantics.pdf|Computational Semantics}} | [[https://www.youtube.com/watch?v=8QZWx_XAO1w|Semantics an Overview]]\\ [[https://www.youtube.com/watch?v=XLvv_5meRNM&list=PLRIMXVU7SGRJF8gxD70oZPBoFAYxGs4QL&index=5|Semantics and Pragmatics - Sentence Semantics]] | [[http://www.nltk.org/book/ch10.html|Analyzing the Meaning of Sentences]] | | 8 |
| 9 | Chapter 21.1-5| {{:public:t-malv-15-3:malv-chapter21-computationaldiscourse.pdf|Computational Discourse}}\\ {{:public:t-malv-15-3:malv-chapter21-computationaldiscoursereference.pdf|Reference}} | | | | 9 |
| 10 | Chapter 24.1\\ [[http://www.ru.is/~hannes/publications/HICSS2005.pdf|Vilhjálmsson 2005]]\\ [[http://www.ru.is/~hannes/publications/KBS2001.pdf|Cassell et al. 2001]] | {{:public:t-malv-15-3:malv-chapter24-conversation.pdf|Conversation}}\\ {{:public:t-malv-15-3:malv-nonverbalbehavior.pdf|Nonverbal Behavior}} | | | | 10 |
| 11 | Chapter 24.2 | {{:public:t-malv-15-3:malv-chapter24-dialog.pdf|Dialog Systems}} | | | | 11 |
| 12 | Chapter 9.1-6 | {{:public:t-malv-15-3:malv-chapter9-automaticspeechrecognition.pdf|Automatic Speech Recognition}} | | | | 12 No 12th lab assignment. \\ FRIDAY: Final Project Presentations! |
| 13 | | {{:public:t-malv-15-3:nlp-review.pdf|Material for final exam}} | | | | |
^^^^^^

===== Other material =====

^Topic^Title^Link^
|Lexical analysis / Regex matching| JFlex | [[http://jflex.de/]] |
|Lexical analysis / Regex matching| Basic tokenizer for Icelandic | [[http://www.ru.is/kennarar/hrafn/courses/nlp/IceBasic.zip]] |
|Regular expressions| grep tutorial | [[http://www.uccs.edu/~ahitchco/grep/]] |
|Regular expressions| sed tutorial | [[http://www.grymoire.com/Unix/Sed.html]] |
|Finite state transducers | HFST | [[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/hfst/ ]] |
|Finite state transducers | foma | [[https://code.google.com/p/foma/ ]] |