====== T-538-MALV and T-725-MALV, Natural Language Processing Fall 2008 ======

===== Basic Info =====

  * **Instructors:** [[http://www.ru.is/hrafn|Hrafn Loftsson]] and [[http://www.ru.is/kennarar/hannes|Hannes Högni Vilhjálmsson]]
  * **Contact:** Office at Kringlan 1, 599 6227, {hrafn, hannes}@
  * **Classes When:** Mondays 8:15-9:50 and Wednesdays 8:15-9:50
  * **Classes Where:** Kringlan 1, Room K-6

===== Description =====

The goal of language technology (LT) is to develop systems that allow people to communicate with computers using natural languages. LT is an interdisciplinary field, requiring knowledge from subjects like linguistics, statistics, psychology, engineering and computer science. This course discusses the fundamentals of natural language processing (NLP), which is one of the subfields of LT, and introduces research in the field with regard to the Icelandic language. Students acquire an understanding of the various stages of NLP, e.g. morphological analysis, part-of-speech tagging, syntactic analysis, semantic analysis, discourse and dialogue. In the course, students work on programming projects related to these stages.

===== Goals =====

The course objectives are that students:

  * Know the main methods used in the field of natural language processing
  * Are familiar with the main research areas in the field
  * Are able to implement a system which processes a natural language

===== Coursework Overview =====

To provide a rich hands-on experience, students will build their own application that relies on NLP over the course of the semester. While the choice of final application is completely in the hands of the students, important NLP components will be built in a series of programming projects that lead up to the final system demonstration. Three homework assignments will also be distributed during the semester to reinforce some of the more theoretical material.
Everything that has to be turned in, including the programming projects, must arrive no later than 23:59 on the due date. Late submissions incur a 10% penalty for each additional day, including weekends and holidays. General participation in discussions, online and in class, counts towards a special participation grade. In addition, M.Sc. level students will be asked to prepare and give one presentation on an existing research paper in the field, which also counts towards their participation grade.

===== Assignments and Projects =====

^Assignment^Code^Description^Assigned^Due^Duration^Weight^Discuss^
|Homework Assignment 1 |A1|Regular Expressions |W 3. Sep |W 10. Sep | 8 days | 5% | |
|Programming Project 1 |P1|Tokenizing text |W 10. Sep |W 24. Sep | 15 days | 8% - 10% | |
|Homework Assignment 2 |A2|Tagging |M 29. Sep |M 13. Oct | 15 days | 5% | |
|Programming Project 2 |P2|Tagging text |W 1. Oct |M 20. Oct | 14 days | 8% - 10% | |
|Programming Project 3 |P3|Parsing text |M 20. Oct |W 29. Oct | 10 days | 8% - 10% | |
|Homework Assignment 3 |A3|Discourse analysis |W 29. Oct |W 5. Nov | 8 days | 5% | |
|Programming Project 4 |P4|Discourse model |M 3. Nov |W 12. Nov | 10 days | 8% - 10% | |
|[[Programming Project 5]] |P5|Application |W 12. Nov |F 28. Nov | 18 days | 8% - 10% | |
^ Total 55% ^^^^^^^^

===== Final Exam =====

There will be a final written exam. An exam preparation document will be posted here closer to the exam date.

===== Discussion Questions =====

After every lecture, the presenter will post a discussion question on an online forum, and students will be asked to contribute to the discussion of that topic until the following lecture. The discussion takes place on an external forum page at the address below. Note that students have to register on this forum to post their replies (simply go to the address below to register).

^Host^Forum Name^Address^Discussion Questions^
|ProBoards|Málvinnsla|http://malv2008.proboards57.com/|[[http://malv2008.proboards57.com/index.cgi?board=questions|Read Questions]]|
^ ^^^^

===== Schedule =====

^Date^Topic^Material^Who^Due^
|mon 25/08| {{public:t-malv-08-3:introduction.pdf|Introduction}} | Chapter 1 | Both | |
|wed 27/08| {{public:t-malv-08-3:corpora.pdf|Corpora and finite-state automata}} | Chapters 2.1-2.2. | Hrafn | |
|mon 01/09| {{public:t-malv-08-3:regex.pdf|Regular expressions}} | Chapters 2.3-2.4. | Hrafn | |
|wed 03/09| {{public:t-malv-08-3:perl.pdf|The Perl programming language}} | [[http://www.ebb.org/PickingUpPerl/pickingUpPerl.pdf|PickingUpPerl]] | Hrafn | |
|mon 08/09| {{public:t-malv-08-3:tokenisation.pdf|Tokenisation}} | Chapters 4.1-4.3. Chapter 2 in "Handbook of Natural Language Processing" | Hrafn | |
|wed 10/09| {{public:t-malv-08-3:n-grams.pdf|Word counting and n-grams}} | Chapters 4.4-4.7. | Hrafn | A1 |
|mon 15/09| {{public:t-malv-08-3:morphology.pdf|Morphology}} | Chapter 5. | Hrafn | |
|wed 17/09| {{public:t-malv-08-3:lexc.pdf|Lexicon Compiler}} | | Hrafn | |
|mon 22/09| //No class on this date// ||||
|wed 24/09| {{public:t-malv-08-3:tagging_rules.pdf|POS tagging - with rules}} | Chapters 6.1-6.3. | Hrafn | |
|wed 24/09| {{public:t-malv-08-3:tagging_icelandic_text.pdf|Tagging Icelandic text: A linguistic rule-based approach}} | **Student lecture** | **Haukur** | |
|mon 29/09| {{public:t-malv-08-3:tagging_statistics.pdf|POS tagging - with statistics}} | Chapters 7.1, 7.2.1-7.2.2. | Hrafn | P1 |
|wed 01/10| {{public:t-malv-08-3:linguistic_stochastic_tagger.pdf|Comparing a Linguistic and a Stochastic Tagger}} | **Student lecture** | **Gunnar** | |
|wed 01/10| {{public:t-malv-08-3:syntax.pdf|Syntax analysis}} | Chapters 9.1-9.4, 9.7 in "Speech and Language Processing". | Hrafn | |
|mon 06/10| //Midterm break// ||||
|wed 08/10| //Midterm break// ||||
|mon 13/10| {{public:t-malv-08-3:cfg_and_prolog.pdf|Context-free grammar and Prolog}} | Chapters 8.1-8.4. | Hrafn | A2 |
|wed 15/10| {{:public:t-malv-08-3:partialparsing.pdf|Partial parsing}} | Chapters 9.1, 9.3-9.4, 9.6, 9.9. | Hrafn | |
|wed 15/10| {{:public:t-malv-08-3:matthewiceparser.pdf|IceParser: An Incremental Finite-State Parser for Icelandic}} | **Student lecture** | **Matthew** | |
|mon 20/10| Partial parsing | Chapters 9.1, 9.3-9.4, 9.6, 9.9. | Hrafn | P2 |
|mon 20/10| {{:public:t-malv-08-3:martha_shallow_parsing.pdf|Exploring Evidence for Shallow Parsing}} | **Student lecture** | **Martha** | |
|wed 22/10| {{:public:t-malv-08-3:parsingtechniques.pdf|Parsing techniques}} | Chapters 11.1-11.4, 11.5.0. | Hrafn | |
|mon 27/10| {{:public:t-malv-08-3:semantics.pdf|Semantics and predicate logic}} | Chapters 8.7, 12.1-12.9 | Hrafn | |
|wed 29/10| {{:public:t-malv-08-3:discourse.pdf|Discourse and reference resolution}} | Chapters 14.1-14.5, 14.7 (skip 14.7.4) + (Brown and Yule 1983) [[https://myschool.ru.is/myschool/?Page=Download&ID=17417&Act=3&File=brown%5Fyule%5F1983%5Fchapter1%2Epdf|Sec. 1.1, 1.3]] | Hannes | P3 |
|mon 03/11| {{:public:t-malv-08-3:information.pdf|Information structure and newness of information}} | (Brown and Yule 1983) [[https://myschool.ru.is/myschool/?Page=Download&ID=17417&Act=3&File=brown%5Fyule%5F1983%5Fchapter4%2Epdf|Sec. 4.1-4.2]] + ([[http://www.ling.upenn.edu/~ellen/givennew.pdf|Prince 1981]]) Sec. 1-3 | Hannes | |
|wed 05/11| {{:public:t-malv-08-3:discstructure.pdf|Discourse structure and discourse markers}} | Chapters 14.6, 14.8 + (Allen 1995) {{:public:t-malv-08-3:allen_1995_chapter16.pdf|16.1-16.3}} | Hannes | A3 |
|wed 05/11| T2D: Generating Dialogues Between Virtual Agents Automatically from Text | **Student lecture** | **Andri** | |
|mon 10/11| {{:public:t-malv-08-3:grounding.pdf|Adjacency pairs, speech acts and grounding in dialogue}} | Chapter 15. | Hannes | |
|wed 12/11| {{:public:t-malv-08-3:nonverbal.pdf|The role of nonverbal behaviour in communication.}} | ([[http://jls.sagepub.com/cgi/content/abstract/19/2/163|Bavelas and Chovil 2000]]) | Hannes | P4 |
|wed 12/11| {{:public:t-malv-08-3:brynjar-spark-presentation.pdf|Augmenting Online Conversation through Automatic Discourse Tagging}} | **Student lecture** | **Brynjar** | |
|mon 17/11| Embodied Conversational Agents Systems (e.g. {{:public:t-malv-08-3:beatcomplete.pdf|The BEAT Tool}}) | ([[http://www.ru.is/faculty/hannes/publications/IITSEC2004.pdf|Johnson et al. 2004]]), ([[http://www.ru.is/faculty/hannes/publications/siggraph2001.pdf|Cassell et al. 2001]]) | Hannes | |
|wed 19/11| Review of Discourse Assignment/Project, {{:public:t-malv-08-3:discourse_dialog_examtopics.pdf|Exam Topics}} | None (but read [[http://www.ru.is/faculty/hannes/publications/KBS2001.pdf|the paper Birna is presenting]]) | Hannes | |
|wed 19/11| {{:public:t-malv-08-3:birna_rea.pdf|More Than Just a Pretty Face: Conversational Protocols and the Affordances of Embodiment}} | **Student lecture** | **Birna** | |
|mon 24/11| {{:public:t-malv-08-3:review.pdf|Review}} and discussion about the final exam | | Hrafn | |
|fri 28/11| Final project demo | | All | P5 |
^ ^ ^ ^ ^ ^

===== A selection of papers =====

^Topic^Title^Link^
|N-grams | Statistical Identification of Language | [[http://citeseer.ist.psu.edu/dunning94statistical.html]] |
|N-grams | N-Gram-Based Text Categorization | [[http://citeseer.ist.psu.edu/68861.html]] |
|Morphology | Applications of Finite-State Transducers in Natural Language Processing | [[http://www.xrce.xerox.com/Publications/Attachments/2000-302/fst-in-nlp.pdf]] |
|Morphology | Constructing Lexical Transducers | [[http://citeseer.ist.psu.edu/443780.html]] |
|Morphology | Guessing Morphological Classes of Unknown German Nouns | [[http://nats-www.informatik.uni-hamburg.de/~vhahn/Downloads/RANLP03.pdf]] |
|Morphology | Automatic Rule Induction for Unknown Word Guessing | [[http://portal.acm.org/citation.cfm?id=972708]] |
|Morphology | A Mixed Method Lemmatization Algorithm Using a Hierarchy of Linguistic Identities (HOLI) | [[http://www.springerlink.com/content/h530q7157285563u/]] |
|POS tagging | Tagging Icelandic text: an experiment with integrations and combinations of taggers | [[http://nlp.ru.is/publications.htm]] |
|POS tagging | Tagging Icelandic text: A linguistic rule-based approach | [[http://nlp.ru.is/publications.htm]] |
|POS tagging | TnT - A Statistical Part-of-Speech Tagger | [[http://citeseer.ist.psu.edu/brants00tnt.html]] |
|POS tagging | Comparing a Linguistic and a Stochastic Tagger | [[http://acl.ldc.upenn.edu/P/P97/P97-1032.pdf]] |
|POS tagging | A simple rule-based part of speech tagger | [[http://portal.acm.org/citation.cfm?id=974526]] |
|Parsing | Exploring Evidence for Shallow Parsing | [[http://acl.ldc.upenn.edu/W/W01/W01-0706.pdf]] |
|Parsing | Text Chunking using Transformation-Based Learning | [[http://acl.ldc.upenn.edu/W/W95/W95-0107.pdf]] |
|Parsing | IceParser: An Incremental Finite-State Parser for Icelandic | [[http://nlp.ru.is/publications.htm]] |
|Parsing | Statistical Techniques for Natural Language Parsing | [[http://citeseer.ist.psu.edu/286958.html]] |
|Discourse and Dialogue | Augmenting Online Conversation through Automatic Discourse Tagging | [[http://www.ru.is/faculty/hannes/publications/HICSS2005.pdf]] |
|Discourse and Dialogue | Generating Dialogues Between Virtual Agents Automatically from Text | [[http://www.springerlink.com/index/p6265q6h81312001.pdf]] |
|Discourse and Dialogue | More Than Just a Pretty Face: Conversational Protocols and the Affordances of Embodiment | [[http://www.ru.is/faculty/hannes/publications/KBS2001.pdf]] |
|Discourse and Dialogue | Towards a model of face-to-face grounding | [[http://www.springerlink.com/index/p6265q6h81312001.pdf]] |
|Discourse and Dialogue | Building Effective Question and Answering Characters | [[http://www.aclweb.org/anthology-new/W/W06/W06-1303.pdf]] |
|Discourse and Dialogue | Semantic and Discourse Information for Text-to-Speech Intonation | [[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.11.4835]] |

===== Grading =====

^Part of Course^Total Weight^
|Programming Projects | 40%|
|Participation | 15%|
|Homework Assignments | 15%|
|Final Written Exam | 30%|
^ Total 100% ^^