
T-538-MALV and T-725-MALV, Natural Language Processing Fall 2008

Basic Info

Description

The goal of language technology (LT) is to develop systems which allow people to communicate with computers using natural languages. LT is an interdisciplinary field, requiring knowledge from subjects like linguistics, statistics, psychology, engineering and computer science. This course discusses fundamentals of natural language processing (NLP), which is one of the subfields of LT, and introduces research in the field with regard to the Icelandic language. Students acquire understanding of the various stages of NLP, e.g. morphological analysis, part-of-speech tagging, syntactic analysis, semantic analysis, discourse and dialogue. In the course, students work on programming projects related to the aforementioned stages.

Goals

The course objectives are that students:

Coursework Overview

To provide a rich hands-on experience, students will build their own application that relies on NLP over the course of the semester. While the choice of final application is completely in the hands of the students, important NLP components will be built in a series of programming projects that lead up to the final system demonstration. Three homework assignments will also be distributed during the semester to reinforce some of the more theoretical material.

Everything that has to be turned in, including the programming projects, must arrive no later than 23:59 on the due date. Late submissions incur a 10% penalty for each additional day, including weekends and holidays.

General participation in discussions, online and in class, counts towards a special participation grade. In addition, M.Sc. level students will be asked to prepare and give one presentation on an existing research paper in the field, which also counts towards their participation grade.

Assignments and Projects

Assignment | Code | Description | Assigned | Due | Duration | Weight
Homework Assignment 1 | A1 | Regular Expressions | W 3. Sep | W 10. Sep | 8 days | 5%
Programming Project 1 | P1 | Tokenizing text | W 10. Sep | W 24. Sep | 15 days | 8% - 10%
Homework Assignment 2 | A2 | Tagging | M 29. Sep | M 13. Oct | 15 days | 5%
Programming Project 2 | P2 | Tagging text | W 1. Oct | M 20. Oct | 14 days | 8% - 10%
Programming Project 3 | P3 | Parsing Text | M 20. Oct | W 29. Oct | 10 days | 8% - 10%
Homework Assignment 3 | A3 | Discourse analysis | W 29. Oct | W 5. Nov | 8 days | 5%
Programming Project 4 | P4 | Discourse model | M 3. Nov | W 12. Nov | 10 days | 8% - 10%
Programming Project 5 | P5 | Application | W 12. Nov | F 28. Nov | 18 days | 8% - 10%
Total | | | | | | 55%

Final Exam

There will be a final written exam. An exam preparation document will be posted here closer to the exam date.

Discussion Questions

After every lecture, the presenter will post a discussion question on an online forum and the students will be asked to contribute to the discussion of that topic until the following lecture. The discussion takes place on an external forum page at the following address. Note that the students have to register on this forum to post their replies (simply go to the address below to register).

Host | Forum Name | Address | Discussion Questions
ProBoards | Málvinnsla | http://malv2008.proboards57.com/ | Read Questions

Schedule

Date | Topic | Material | Who | Due
mon 25/08 | Introduction | Chapter 1 | Both |
wed 27/08 | Corpora and finite-state automata | Chapters 2.1-2.2 | Hrafn |
mon 01/09 | Regular expressions | Chapters 2.3-2.4 | Hrafn |
wed 03/09 | The Perl programming language | PickingUpPerl | Hrafn |
mon 08/09 | Tokenisation | Chapters 4.1-4.3; Chapter 2 in “Handbook of Natural Language Processing” | Hrafn |
wed 10/09 | Word counting and n-grams | Chapters 4.4-4.7 | Hrafn | A1
mon 15/09 | Morphology | Chapter 5 | Hrafn |
wed 17/09 | Lexicon Compiler | | Hrafn |
mon 22/09 | No class on this date | | |
wed 24/09 | POS tagging - with rules | Chapters 6.1-6.3 | Hrafn |
wed 24/09 | Tagging Icelandic text: A linguistic rule-based approach | Student lecture | Haukur |
mon 29/09 | POS tagging - with statistics | Chapters 7.1, 7.2.1-7.2.2 | Hrafn | P1
wed 01/10 | Comparing a Linguistic and a Stochastic Tagger | Student lecture | Gunnar |
wed 01/10 | Syntax analysis | Chapters 9.1-9.4, 9.7 in “Speech and Language Processing” | Hrafn |
mon 06/10 | Midterm break | | |
wed 08/10 | Midterm break | | |
mon 13/10 | Context-free grammar and Prolog | Chapters 8.1-8.4 | Hrafn | A2
wed 15/10 | Partial parsing | Chapters 9.1, 9.3-9.4, 9.6, 9.9 | Hrafn |
wed 15/10 | IceParser: An Incremental Finite-State Parser for Icelandic | Student lecture | Matthew |
mon 20/10 | Partial parsing | Chapters 9.1, 9.3-9.4, 9.6, 9.9 | Hrafn | P2
mon 20/10 | Exploring Evidence for Shallow Parsing | Student lecture | Martha |
wed 22/10 | Parsing techniques | Chapters 11.1-11.4, 11.5.0 | Hrafn |
mon 27/10 | Semantics and predicate logic | Chapters 8.7, 12.1-12.9 | Hrafn |
wed 29/10 | Discourse and reference resolution | Chapters 14.1-14.5, 14.7 (skip 14.7.4) + (Brown and Yule 1983) Sec. 1.1, 1.3 | Hannes | P3
mon 03/11 | Information structure and newness of information | (Brown and Yule 1983) Sec. 4.1-4.2 + (Prince 1981) Sec. 1-3 | Hannes |
wed 05/11 | Discourse structure and discourse markers | Chapters 14.6, 14.8 + (Allen 1995) 16.1-16.3 | Hannes | A3
wed 05/11 | T2D: Generating Dialogues Between Virtual Agents Automatically from Text | Student lecture | Andri |
mon 10/11 | Adjacency pairs, speech acts and grounding in dialogue | Chapter 15 | Hannes |
wed 12/11 | The role of nonverbal behaviour in communication | (Bavelas and Chovil 2000) | Hannes | P4
wed 12/11 | Augmenting Online Conversation through Automatic Discourse Tagging | Student lecture | Brynjar |
mon 17/11 | Embodied Conversational Agents Systems (e.g. The BEAT Tool) | (Johnson, et al. 2004), (Cassell et al. 2001) | Hannes |
wed 19/11 | Review of Discourse Assignment/Project, Exam Topics | None (but read the paper Birna is presenting) | Hannes |
wed 19/11 | More Than Just a Pretty Face: Conversational Protocols and the Affordances of Embodiment | Student lecture | Birna |
mon 24/11 | Review and discussion about the final exam | | Hrafn |
fri 28/11 | Final project demo | | all | P5

A selection of papers

Topic | Title | Link
N-grams | Statistical Identification of Language | http://citeseer.ist.psu.edu/dunning94statistical.html
N-grams | N-Gram-Based Text Categorization | http://citeseer.ist.psu.edu/68861.html
Morphology | Applications of Finite-State Transducers in Natural Language Processing | http://www.xrce.xerox.com/Publications/Attachments/2000-302/fst-in-nlp.pdf
Morphology | Constructing Lexical Transducers | http://citeseer.ist.psu.edu/443780.html
Morphology | Guessing Morphological Classes of Unknown German Nouns | http://nats-www.informatik.uni-hamburg.de/~vhahn/Downloads/RANLP03.pdf
Morphology | Automatic Rule Induction for Unknown Word Guessing | http://portal.acm.org/citation.cfm?id=972708
Morphology | A Mixed Method Lemmatization Algorithm Using a Hierarchy of Linguistic Identities (HOLI) | http://www.springerlink.com/content/h530q7157285563u/
POS tagging | Tagging Icelandic text: an experiment with integrations and combinations of taggers | http://nlp.ru.is/publications.htm
POS tagging | Tagging Icelandic text: A linguistic rule-based approach | http://nlp.ru.is/publications.htm
POS tagging | TnT - A Statistical Part-of-Speech Tagger | http://citeseer.ist.psu.edu/brants00tnt.html
POS tagging | Comparing a Linguistic and a Stochastic Tagger | http://acl.ldc.upenn.edu/P/P97/P97-1032.pdf
POS tagging | A simple rule-based part of speech tagger | http://portal.acm.org/citation.cfm?id=974526
Parsing | Exploring Evidence for Shallow Parsing | http://acl.ldc.upenn.edu/W/W01/W01-0706.pdf
Parsing | Text Chunking using Transformation-Based Learning | http://acl.ldc.upenn.edu/W/W95/W95-0107.pdf
Parsing | IceParser: An Incremental Finite-State Parser for Icelandic | http://nlp.ru.is/publications.htm
Parsing | Statistical Techniques for Natural Language Parsing | http://citeseer.ist.psu.edu/286958.html
Discourse and Dialogue | Augmenting Online Conversation through Automatic Discourse Tagging | http://www.ru.is/faculty/hannes/publications/HICSS2005.pdf
Discourse and Dialogue | Generating Dialogues Between Virtual Agents Automatically from Text | http://www.springerlink.com/index/p6265q6h81312001.pdf
Discourse and Dialogue | More Than Just a Pretty Face: Conversational Protocols and the Affordances of Embodiment | http://www.ru.is/faculty/hannes/publications/KBS2001.pdf
Discourse and Dialogue | Towards a model of face-to-face grounding | http://www.springerlink.com/index/p6265q6h81312001.pdf
Discourse and Dialogue | Building Effective Question and Answering Characters | http://www.aclweb.org/anthology-new/W/W06/W06-1303.pdf
Discourse and Dialogue | Semantic and Discourse Information for Text-to-Speech Intonation | http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.11.4835

Grading

Part of Course | Total Weight
Programming Project | 40%
Participation | 15%
Homework Assignments | 15%
Final Written Exam | 30%
Total | 100%
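
The weights in the grading table combine into a course grade as a weighted sum. The following is a minimal sketch of that arithmetic, not an official grading script: the 0-10 score scale, the function name `final_grade`, and the example scores are all assumptions made for illustration, and late penalties are not modelled.

```python
# Weights taken from the grading table; everything else is hypothetical.
WEIGHTS = {
    "programming_projects": 0.40,
    "participation": 0.15,
    "homework_assignments": 0.15,
    "final_written_exam": 0.30,
}


def final_grade(scores):
    """Return the weighted course grade.

    `scores` maps each part of the course to a score on one common scale
    (a 0-10 scale is assumed here; the syllabus does not state the scale).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[part] * scores[part] for part in WEIGHTS)


# Hypothetical scores for one student.
example = {
    "programming_projects": 8.0,
    "participation": 9.0,
    "homework_assignments": 7.0,
    "final_written_exam": 6.0,
}
print(round(final_grade(example), 2))
```

With these example scores the parts contribute 3.2, 1.35, 1.05 and 1.8 respectively, so the final exam alone moves the result almost as much as all five programming projects together.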