======T-538-MALV and T-725-MALV, Natural Language Processing Fall 2010======

===== Basic Info =====

  * **Instructors:** [[http://www.ru.is/kennarar/hrafn|Hrafn Loftsson]] and [[http://www.ru.is/kennarar/hannes|Hannes Högni Vilhjálmsson]]
  * **Contact:** Office at V2.07, 599 6227, {hrafn, hannes}@
  * **Classes When:** Mondays 13:10-14:45 and Thursdays 10:20-11:55
  * **Classes Where:** M1.05

===== Description =====

The goal of language technology (LT) is to develop systems that allow people to communicate with computers using natural languages. LT is an interdisciplinary field, requiring knowledge from subjects such as linguistics, statistics, psychology, engineering and computer science.

This course covers the fundamentals of natural language processing (NLP), one of the subfields of LT, and introduces research in the field. Students gain an understanding of the various stages of NLP, e.g. morphological analysis, part-of-speech tagging, syntactic analysis, semantic analysis, discourse and dialogue. Throughout the course, students work on programming projects related to these stages.
===== Learning Outcome =====

On completing the course, students should:

  * know the main methods of processing required for computers to analyse and understand texts in a human language
  * understand the strengths and weaknesses of current Natural Language Processing (NLP) technology
  * know the main models and algorithms used in NLP, e.g. for morphological analysis, part-of-speech tagging, parsing, semantic analysis, and discourse and dialogue analysis
  * know at least one programming language suitable for text processing
  * be able to write simple NLP applications and present their work both orally and in writing
  * be able to evaluate the performance/accuracy of NLP systems
  * be aware of current research in NLP

===== Course Assessment =====

The course assessment is as follows:

^Part of Course^Total Weight^
|Three individual projects/assignments; 3*10% | 30%|
|A final project (can be worked on in a group of two students) | 30%|
|Participation in class | 10%|
|A final written exam | 30%|
^ Total 100% ^^

To provide a rich hands-on experience, students will, over the course of the semester, build their own application that relies on NLP (the final project). The instructors will provide a number of project proposals, but students are also encouraged to come up with their own ideas. Three homework projects/assignments will also be distributed during the semester to reinforce some of the more theoretical material.

Everything that has to be turned in must arrive no later than 23:59 on the due date; otherwise a 10% penalty is incurred for each additional day, including weekends and holidays. Projects handed in more than two days late are not accepted.

General participation in class discussions counts towards a separate participation grade. In addition, M.Sc.-level students must prepare and give one presentation on an existing research paper in the field, which also counts towards their participation grade.
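The weighting and late-submission rules above can be illustrated with a short Python sketch. It is a hypothetical helper, not an official course tool, and it rests on assumptions that the policy text does not settle: scores are on a 0-100 scale, the 10% late penalty is read as 10 points deducted per late day, and the two-day cut-off (stated above for projects) is applied to every hand-in and treated as a score of zero. The names ''WEIGHTS'', ''late_adjusted'' and ''final_grade'' are illustrative only.

```python
import math
from typing import Dict, Optional

# Weights from the Course Assessment table, as fractions of the final grade.
WEIGHTS: Dict[str, float] = {
    "assignment_1": 0.10,
    "assignment_2": 0.10,
    "assignment_3": 0.10,
    "final_project": 0.30,
    "participation": 0.10,
    "final_exam": 0.30,
}

# Assumed reading of the late policy (hypothetical constants):
# 10 points off per day late, nothing accepted after more than two days.
PENALTY_PER_DAY = 10.0
MAX_DAYS_LATE = 2


def late_adjusted(score: float, days_late: int) -> float:
    """Return a 0-100 score after applying the assumed late penalty."""
    if days_late <= 0:
        return score
    if days_late > MAX_DAYS_LATE:
        return 0.0  # not accepted at all
    return max(0.0, score - PENALTY_PER_DAY * days_late)


def final_grade(scores: Dict[str, float],
                days_late: Optional[Dict[str, int]] = None) -> float:
    """Weighted final grade on a 0-100 scale (hypothetical helper)."""
    days_late = days_late or {}
    # Sanity check: the weights in the table add up to 100%.
    assert math.isclose(sum(WEIGHTS.values()), 1.0)
    return sum(
        weight * late_adjusted(scores[part], days_late.get(part, 0))
        for part, weight in WEIGHTS.items()
    )


if __name__ == "__main__":
    scores = {part: 80.0 for part in WEIGHTS}
    print(round(final_grade(scores), 2))
    print(round(final_grade(scores, {"assignment_2": 1}), 2))
```

Under this reading, a student with 80 in every part who hands in Assignment 2 one day late gets 70 for that assignment, which lowers the final grade from 80 to 79.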
===== Assignments and Projects =====

^Assignment^Assigned^Due^Duration^Weight^
|Assignment 1 |Mon 27. Sep. |Thu 07. Oct. |10 days |10% |
|Assignment 2 |Mon 11. Oct. |Thu 21. Oct. |10 days |10% |
|Assignment 3 |Thu 11. Nov. |Thu 18. Nov. |7 days |10% |
|Final project |Mon 18. Oct. |Mon 29. Nov. |6 weeks |30% |
^ ^^^^Total 60% ^

===== Final projects =====

^Students^Project^
|Arnþór and Kristinn | 1.14 Question-Answering System |
|Gunnar and Ragnar | 1.19 Generating an Image from a Natural Language Description |
|Angelo and Lorenzo | 1.16 A Simple Embodied Dialog System |
|Andreas and Sebastian | 1.7 An unknown word guesser |
|Danilo and Steffen | 1.14 Question-Answering System |
|Marcel and Simone | 1.4 N-gram based text categorization |
|Daníel and Eiríkur Fannar | 1.8 Automatic thesaurus extraction |
|Kristján and Skúli | 1.1 Grammar checking |
|Carmine and Luis | 1.18 Generate Visual Reference into an Existing 3D Scene |
|Arnór and Haukur | 1.3 Named Entity Recognition (NER) |
|Niccolo and Stefán | 1.12 Intelligent Computer-Assisted Language Learning (ICALL) |

===== Quizzes =====

^No^Date^Solution^
|1|Sep. 16 | |
|2|Sep. 23 | |
|3|Sep. 30 | |
|4|Oct. 10 | |
|5|Oct. 21 | |
|6|Nov. 01 | |
|7|Nov. 11 | |
|8|Nov. 22 | |
^^^

===== Final Exam =====

There will be a final written exam counting 30% towards your grade.

===== Online Discussion Forum =====

The course has an online discussion forum that we can use in any way we see fit. Note that the students have to register on this forum to post their replies (simply go to the address below to register).
^Host^Forum Name^Location^
|ProBoards|NLP2010|[[http://ruclasses.proboards.com/index.cgi?board=nlp2010]]|
^ ^^^

===== Lectures =====

^Week^Date^Topic^Textbook or supplementary material^Recordings^Who^
|1|thu 09/09| {{:public:t-malv-10-3:aboutthiscourse.pdf|About the course}} and {{:public:t-malv-10-3:introduction.pdf|Introduction}} | Chapter 1 | [[http://www.ru.is/faculty/hrafn/recordings/nlp/intro.zip]] | Hrafn |
|1|mon 13/09| {{:public:t-malv-10-3:corpora.pdf|Corpora}} and {{:public:t-malv-10-3:finitestate.pdf|Finite-state automata}} | Chapters 2.1-2.2 | | Hrafn |
|2|thu 16/09| {{:public:t-malv-10-3:regex.pdf|Regular expressions}} | Chapters 2.3-2.4 | | Hrafn |
|2|mon 20/09| {{:public:t-malv-10-3:perl.pdf|The programming language Perl}} | http://www.ebb.org/PickingUpPerl/pickingUpPerl.pdf | | Hrafn |
|3|thu 23/09| {{:public:t-malv-10-3:tokenisation.pdf|Tokenisation}} | Chapters 4.1-4.3; Chapter 2 in "Handbook of Natural Language Processing" | | Hrafn |
|3|mon 27/09| {{:public:t-malv-10-3:n-grams.pdf|Word counting and N-grams}} | Chapters 4.4-4.7 | | Hrafn |
|4|thu 30/09| {{:public:t-malv-10-3:morphology.pdf|Morphology}} | Chapter 5 | | Hrafn |
|4|mon 04/10| {{:public:t-malv-10-3:lexc.pdf|Lexicon Compiler}} | http://www.ling.helsinki.fi/kieliteknologia/tutkimus/hfst/ | | Hrafn |
|4|mon 04/10| {{:public:t-malv-10-3:presentation-kristinn.pdf|Statistical Identification of Language}} | | | **Kristinn** |
|4|mon 04/10| {{:public:t-malv-10-3:mixedtrigrams.pdf|A Mixed Trigrams Approach for Context Sensitive Spell Checking}} | | | **Hjalti** |
|5|thu 07/10| {{:public:t-malv-10-3:tagging_rules.pdf|POS tagging - with rules}} | Chapters 6.1-6.3 | | Hrafn |
|5|thu 07/10| {{:public:t-malv-10-3:taggingicelandictext.pdf|Tagging Icelandic text: A linguistic rule-based approach}} | | | **Sebastian** |
|5|mon 11/10| {{:public:t-malv-10-3:tagging_statistics.pdf|POS tagging - with statistics}} | Chapters 7.1, 7.2.1-7.2.2 | {{:public:t-malv-10-3:nlp-pos-tagging-stat.avi|avi}} {{:public:t-malv-10-3:nlp-pos-tagging-stat2.avi|avi(2)}} | Hrafn |
|6|thu 14/10| {{:public:t-malv-10-3:syntax.pdf|Syntax analysis}} | Chapters 9.1-9.4 and 9.7 in "Speech and Language Processing" | | Hrafn |
|6|thu 14/10| {{:public:t-malv-10-3:pos_tagging_for_german.pdf|POS Tagging for German: How Important is the Right Context?}} | | | **Steffen** |
|6|thu 14/10| {{:public:t-malv-10-3:fyrirlesturcombotagger.pdf|Tagging Icelandic text: An experiment with integrations and combinations of taggers}} | | | **Stefán** |
|6|mon 18/10| {{:public:t-malv-10-3:cfg_and_prolog.pdf|Context-free grammar and Prolog}} | Chapters 8.1-8.4 | | Hrafn |
|7|thu 21/10| {{:public:t-malv-10-3:partialparsing.pdf|Partial parsing}} | Chapters 9.1, 9.3-9.4, 9.6, 9.9 | | Hrafn |
|7|mon 25/10| Partial parsing | Chapters 9.1, 9.3-9.4, 9.6, 9.9 | | Hrafn |
|7|mon 25/10| {{:public:t-malv-10-3:daniel_iceparser.pdf|IceParser: An Incremental Finite-State Parser for Icelandic}} | | | **Daníel** |
|7|mon 25/10| {{:public:t-malv-10-3:presentation_voellger.pdf|An Open Source Tool for Partial Parsing and Morphosyntactic Disambiguation}} | | | **Andreas** |
|8|thu 28/10| {{:public:t-malv-10-3:parsingtechniques.pdf|Parsing techniques}} | Chapters 11.1-11.4, 11.5.0 | | Hrafn |
|8|mon 01/11| {{:public:t-malv-10-3:semantics.pdf|Semantics and predicate logic}} | Chapters 8.7, 12.1-12.9 | | Hrafn |
|9|thu 04/11| {{:public:t-malv-10-3:lexsemmalvinnslamjw.pdf|Lexical semantics}} | Chapters 13.1-13.5 | | Matthew Whelpton |
|9|mon 08/11| **Final projects: Status report/presentation (M1.05, 09:00-11:00)** | | | **All students** |
|9|mon 08/11| {{:public:t-malv-10-3:discourse.pdf|Discourse and reference resolution}} | Chapter 14 | | Hannes |
|10|thu 11/11| {{:public:t-malv-10-3:information.pdf|Information structure and newness of information}} | (Brown and Yule 1983) {{:public:t-malv-09-3:brown_yule_1983_chapter4.pdf|Sec. 4.1-4.2}} + ([[http://www.ling.upenn.edu/~ellen/givennew.pdf|Prince 1981]]) Sec. 1-3 | | Hannes |
|10|thu 11/11| {{:public:t-malv-10-3:presentation_paper_angelo.pdf|Building Effective Question and Answering Characters}} | | | **Angelo** |
|10|mon 15/11| {{:public:t-malv-10-3:discstructure.pdf|Discourse structure and discourse markers}} | Chapters 14.6, 14.8 + (Allen 1995) {{:public:t-malv-09-3:allen_1995_chapter16.pdf|16.1-16.3}} | | Hannes |
|10|mon 15/11| {{:public:t-malv-10-3:presentation_paper_carmine.pdf|Generating Dialogues Between Virtual Agents Automatically from Text}} | | | **Carmine** |
|11|thu 18/11| {{:public:t-malv-10-3:grounding.pdf|Dialog Systems, Speech Acts and Grounding}} | Chapter 15 | | Hannes |
|11|thu 18/11| {{:public:t-malv-10-3:presentation_paper_luis.pdf|Towards a model of face-to-face grounding}} | | | **Luis** |
|11|mon 22/11| {{:public:t-malv-10-3:nonverbal.pdf|The role of non-verbal behaviour in communication}} | ({{:public:t-malv-09-3:hicss2005.pdf|Vilhjálmsson 2005}}) | | Hannes |
|11|mon 22/11| {{:public:t-malv-10-3:presentation_paper_lorenzo.pdf|Augmenting Online Conversation through Automatic Discourse Tagging}} | | | **Lorenzo** |
|11|mon 22/11| {{:public:t-malv-10-3:presentation_paper_danilo.pdf|More Than Just a Pretty Face: Conversational Protocols and the Affordances of Embodiment}} | | | **Danilo** |
|12|thu 25/11| {{:public:t-malv-10-3:jongudnasonspeechanalysis.pdf|Speech analysis}}, speech synthesis and dialogue systems | Chapter 15 | | Jón Guðnason |
|12|mon 29/11| **Final projects: Demo, M1.09, 13:00-17:00** | | | **All students** |
|+|wed 08/12| {{:public:t-malv-10-3:presentation_paper_niccolo.pdf|Semantic and Discourse Information for Text-to-Speech Intonation}} | | | **Niccolo** |
|+|wed 08/12| {{:public:t-malv-10-3:review.pdf|Course Material Review}}\\ **M1.02, 10:20-11:55** | | | Hrafn, Hannes |
^^^^^^

===== A selection of papers =====

^Topic^Title^Link^
|N-grams | Statistical Identification of Language | [[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.48.1958]] |
|N-grams | N-Gram-Based Text Categorization | [[http://citeseer.ist.psu.edu/68861.html]] |
|N-grams | A Mixed Trigrams Approach for Context Sensitive Spell Checking | [[http://nlp.cs.uic.edu/PS-papers/spell-cicling07.pdf]] |
|Morphology | Applications of Finite-State Transducers in Natural Language Processing | [[http://www2.parc.com/istl/members/karttune/publications/ciaa-2000/fst-in-nlp.pdf]] |
|Morphology | Constructing Lexical Transducers | [[http://citeseer.ist.psu.edu/443780.html]] |
|Morphology | Guessing Morphological Classes of Unknown German Nouns | [[http://nats-www.informatik.uni-hamburg.de/~vhahn/Downloads/RANLP03.pdf]] |
|Morphology | Automatic Rule Induction for Unknown Word Guessing | [[http://portal.acm.org/citation.cfm?id=972708]] |
|Morphology | A Mixed Method Lemmatization Algorithm Using a Hierarchy of Linguistic Identities (HOLI) | [[http://www.springerlink.com/content/h530q7157285563u/]] |
|POS tagging | Tagging Icelandic text: an experiment with integrations and combinations of taggers | [[http://nlp.ru.is/publications.htm]] |
|POS tagging | Tagging Icelandic text: A linguistic rule-based approach | [[http://nlp.ru.is/publications.htm]] |
|POS tagging | TnT - A Statistical Part-of-Speech Tagger | [[http://citeseer.ist.psu.edu/brants00tnt.html]] |
|POS tagging | Comparing a Linguistic and a Stochastic Tagger | [[http://acl.ldc.upenn.edu/P/P97/P97-1032.pdf]] |
|POS tagging | A simple rule-based part of speech tagger | [[http://portal.acm.org/citation.cfm?id=974526]] |
|POS tagging | POS Tagging for German: How Important is the Right Context? | [[http://www.lrec-conf.org/proceedings/lrec2008/pdf/253_paper.pdf]] |
|Parsing | Treebank Grammars | [[http://www.nlp.org.cn/docs/docredirect.php?doc_id=25]] |
|Parsing | Exploring Evidence for Shallow Parsing | [[http://acl.ldc.upenn.edu/W/W01/W01-0706.pdf]] |
|Parsing | Text Chunking using Transformation-Based Learning | [[http://acl.ldc.upenn.edu/W/W95/W95-0107.pdf]] |
|Parsing | IceParser: An Incremental Finite-State Parser for Icelandic | [[http://nlp.ru.is/publications.htm]] |
|Parsing | An Open Source Tool for Partial Parsing and Morphosyntactic Disambiguation | [[http://www.mimuw.edu.pl/~olekz/ilnet/il/spejd/doc/spade.pdf]] |
|Parsing | Statistical Techniques for Natural Language Parsing | [[http://citeseer.ist.psu.edu/286958.html]] |
|Discourse and Dialogue | Augmenting Online Conversation through Automatic Discourse Tagging | [[http://www.ru.is/faculty/hannes/publications/HICSS2005.pdf]] |
|Discourse and Dialogue | Generating Dialogues Between Virtual Agents Automatically from Text | [[http://www.springerlink.com/index/p6265q6h81312001.pdf]] |
|Discourse and Dialogue | More Than Just a Pretty Face: Conversational Protocols and the Affordances of Embodiment | [[http://www.ru.is/faculty/hannes/publications/KBS2001.pdf]] |
|Discourse and Dialogue | Towards a model of face-to-face grounding | [[http://www.springerlink.com/index/p6265q6h81312001.pdf]] |
|Discourse and Dialogue | Building Effective Question and Answering Characters | [[http://www.aclweb.org/anthology-new/W/W06/W06-1303.pdf]] |
|Discourse and Dialogue | Semantic and Discourse Information for Text-to-Speech Intonation | [[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.11.4835]] |

===== Other material =====

^Title^
| {{:public:t-malv-10-3:icelandictagset.pdf|The Icelandic tagset}} |
| Example lexicon for HFST: [[http://www.ru.is/faculty/hrafn/data/ice-lexc.txt]] |
| {{:public:t-malv-10-3:bigram.zip|Bigram tagging example}} |
| Implementing Regular Expressions: [[http://swtch.com/~rsc/regexp/]] [[http://linuxgazette.net/issue27/mueller.html]] |
| The Penn Treebank tagset: [[http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html]] |
| Icelandic Grammar: [[http://en.wikipedia.org/wiki/Icelandic_grammar]] |
| The Database of Icelandic Inflections: [[http://bin.arnastofnun.is]] |
| Icelandic PoS tagging and parsing demo (IceNLP): [[http://nlp.cs.ru.is/icenlp.htm]] |
| Icelandic-English machine translation demo: [[http://nlp.cs.ru.is/is-en.htm]] |
| English PoS tagging Demo (Penn Treebank tagset): [[http://l2r.cs.uiuc.edu/~cogcomp/pos_demo.php]] |
| English Constraint Grammar Demo: [[http://www2.lingsoft.fi/cgi-bin/engcg]] |
| Constraint Grammar Development: [[http://visl.sdu.dk/constraint_grammar.html]] |
| English Partial Parsing Demo: [[http://l2r.cs.uiuc.edu/~cogcomp/demo.php?dkey=SP]] |
| Dialog Act Coding Schemes: [[http://www.dfki.de/mate/d11/chap4.html]] |
| DAMSL Dialog Act Markup Scheme: [[http://www.cs.rochester.edu/research/cisd/resources/damsl/RevisedManual/RevisedManual.html]] |
| Dialog Corpora Database: [[http://www-rcf.usc.edu/~billmann/diversity/DDivers-site.htm]] |
| The CSLU Spoken Language System Toolkit: [[http://www.cslu.ogi.edu/toolkit/]] |
| CADIA BML Realizer: [[http://cadia.ru.is/projects/bmlr/]] |