T-538-MALV and T-725-MALV, Natural Language Processing Fall 2010

Basic Info

Description

The goal of language technology (LT) is to develop systems which allow people to communicate with computers using natural languages. LT is an interdisciplinary field, requiring knowledge from subjects like linguistics, statistics, psychology, engineering and computer science. This course discusses the fundamentals of natural language processing (NLP), which is one of the subfields of LT, and introduces research in the field. Students acquire an understanding of the various stages of NLP, e.g. morphological analysis, part-of-speech tagging, syntactic analysis, semantic analysis, discourse and dialogue. In the course, students work on programming projects related to these stages.

Learning Outcome

On completing the course, students should:

Course Assessment

The course assessment is as follows:

Part of Course                                                  Total Weight
Three individual projects/assignments (3 x 10%)                 30%
A final project (can be worked on in a group of two students)   30%
Participation in class                                          10%
A final written exam                                            30%
Total                                                           100%
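
As a rough illustration of how the weights above combine, the following sketch computes a weighted course grade. The function name, the argument names and the 0-10 grading scale are assumptions made for the example; they are not part of the official course rules.

```python
# Weights in percent, taken from the assessment table above.
ASSIGNMENT_WEIGHT = 10      # each of the three individual assignments
FINAL_PROJECT_WEIGHT = 30
PARTICIPATION_WEIGHT = 10
EXAM_WEIGHT = 30


def course_grade(assignments, final_project, participation, exam):
    """Combine the part grades into one weighted grade.

    All grades are assumed to be on the same scale (e.g. 0-10);
    `assignments` is a list with the three assignment grades.
    """
    if len(assignments) != 3:
        raise ValueError("expected exactly three assignment grades")
    weighted = (
        ASSIGNMENT_WEIGHT * sum(assignments)
        + FINAL_PROJECT_WEIGHT * final_project
        + PARTICIPATION_WEIGHT * participation
        + EXAM_WEIGHT * exam
    )
    return weighted / 100


if __name__ == "__main__":
    print(course_grade([8, 9, 7], final_project=9, participation=10, exam=8))
```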

To provide a rich hands-on experience, students will, over the course of the semester, build their own application (a final project) that relies on NLP. A number of project proposals will be provided by the instructors, but students are also encouraged to come up with their own ideas. Three homework projects/assignments will also be distributed during the semester to reinforce some of the more theoretical material.

Everything that has to be turned in must arrive no later than 23:59 on the due date; late submissions incur a 10% penalty for each additional day, including weekends and holidays. Projects are not accepted if handed in more than two days late.
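
The late-submission rule can be sketched as a small calculation. This is only one possible reading: it assumes that every started 24-hour period after 23:59 on the due date counts as a full late day, and that the penalty is expressed in percent. The function name `late_penalty_percent` is made up for this example.

```python
from datetime import date, datetime, time, timedelta

PENALTY_PER_DAY = 10   # percent per late day, from the rule above
MAX_LATE_DAYS = 2      # projects more than two days late are not accepted


def late_penalty_percent(due, submitted):
    """Return the penalty in percent, or None if the work is not accepted.

    Assumption: each started 24-hour period after 23:59 on the due date
    counts as one full late day.
    """
    deadline = datetime.combine(due, time(23, 59))
    if submitted <= deadline:
        return 0
    late = submitted - deadline
    # Ceiling division on timedeltas: number of started late days.
    days_late = -((-late) // timedelta(days=1))
    if days_late > MAX_LATE_DAYS:
        return None
    return PENALTY_PER_DAY * days_late


if __name__ == "__main__":
    due = date(2010, 10, 7)  # due date of Assignment 1
    print(late_penalty_percent(due, datetime(2010, 10, 8, 10, 0)))
```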

General participation in discussions in class counts towards a special participation grade. In addition, M.Sc. level students need to prepare and give one presentation on an existing research paper in the field, which also counts towards their participation grade.

Assignments and Projects

Assignment      Assigned       Due            Duration   Weight
Assignment 1    Mon 27. Sep.   Thu 07. Oct.   10 days    10%
Assignment 2    Mon 11. Oct.   Thu 21. Oct.   10 days    10%
Assignment 3    Thu 11. Nov.   Thu 18. Nov.   7 days     10%
Final project   Mon 18. Oct.   Mon 29. Nov.   6 weeks    30%
Total                                                    60%

Final projects

Students                    Project
Arnþór and Kristinn         1.14 Question-Answering System
Gunnar and Ragnar           1.19 Generating an Image from a Natural Language Description
Angelo and Lorenzo          1.16 A Simple Embodied Dialog System
Andreas and Sebastian       1.7 An unknown word guesser
Danilo and Steffen          1.14 Question-Answering System
Marcel and Simone           1.4 N-gram based text categorization
Daníel and Eiríkur Fannar   1.8 Automatic thesaurus extraction
Kristján and Skúli          1.1 Grammar checking
Carmine and Luis            1.18 Generate Visual Reference into an Existing 3D Scene
Arnór and Haukur            1.3 Named Entity Recognition (NER)
Niccolo and Stefán          1.12 Intelligent Computer-Assisted Language Learning (ICALL)

Quizzes

No   Date      Solution
1    Sep. 16
2    Sep. 23
3    Sep. 30
4    Oct. 10
5    Oct. 21
6    Nov. 01
7    Nov. 11
8    Nov. 22

Final Exam

There will be a final written exam counting 30% towards your grade.

Online Discussion Forum

The course has an online discussion forum that we can use in any way we see fit. Note that students have to register on the forum before they can post (simply go to the address below to register).

Host        Forum Name   Location
ProBoards   NLP2010      http://ruclasses.proboards.com/index.cgi?board=nlp2010

Lectures

Week  Date  Topic  Textbook or supplementary material  Recordings  Who
1  thu 09/09  About the course and Introduction  Chapter 1  http://www.ru.is/faculty/hrafn/recordings/nlp/intro.zip  Hrafn
1  mon 13/09  Corpora and Finite-state automata  Chapters 2.1-2.2  Hrafn
2  thu 16/09  Regular expressions  Chapters 2.3-2.4  Hrafn
2  mon 20/09  The programming language Perl  http://www.ebb.org/PickingUpPerl/pickingUpPerl.pdf  Hrafn
3  thu 23/09  Tokenisation  Chapters 4.1-4.3. Chapter 2 in “Handbook of Natural Language Processing”  Hrafn
3  mon 27/09  Word counting and N-grams  Chapters 4.4-4.7  Hrafn
4  thu 30/09  Morphology  Chapter 5  Hrafn
4  mon 04/10  Lexicon Compiler  http://www.ling.helsinki.fi/kieliteknologia/tutkimus/hfst/  Hrafn
4  mon 04/10  Statistical Identification of a language  Kristinn
4  mon 04/10  A Mixed Trigrams Approach for Context Sensitive Spell Checking  Hjalti
5  thu 07/10  POS tagging - with rules  Chapters 6.1-6.3  Hrafn
5  thu 07/10  Tagging Icelandic text: A linguistic rule-based approach  Sebastian
5  mon 11/10  POS tagging - with statistics  Chapters 7.1, 7.2.1-7.2.2  avi avi(2)  Hrafn
6  thu 14/10  Syntax analysis  Chapters 9.1-9.4 and 9.7 in “Speech and Language Processing”  Hrafn
6  thu 14/10  POS Tagging for German: How Important is the Right Context?  Steffen
6  thu 14/10  Tagging Icelandic text: An experiment with integrations and combinations of taggers  Stefán
6  mon 18/10  Context-free grammar and Prolog  Chapters 8.1-8.4  Hrafn
7  thu 21/10  Partial parsing  Chapters 9.1, 9.3-9.4, 9.6, 9.9  Hrafn
7  mon 25/10  Partial parsing  Chapters 9.1, 9.3-9.4, 9.6, 9.9  Hrafn
7  mon 25/10  IceParser: An Incremental Finite-State Parser for Icelandic  Daníel
7  mon 25/10  An Open Source Tool for Partial Parsing and Morphosyntactic Disambiguation  Andreas
8  thu 28/10  Parsing techniques  Chapters 11.1-11.4, 11.5.0  Hrafn
8  mon 01/11  Semantics and predicate logic  Chapters 8.7, 12.1-12.9  Hrafn
9  thu 04/11  Lexical semantics  Chapters 13.1-13.5  Matthew Whelpton
9  mon 08/11  Final projects: Status report/presentation (M1.05, 09:00-11:00)  All students
9  mon 08/11  Discourse and reference resolution  Chapter 14  Hannes
10  thu 11/11  Information structure and newness of information  (Brown and Yule 1983) Sec. 4.1-4.2 + (Prince 1981) Sec. 1-3  Hannes
10  thu 11/11  Building Effective Question and Answering Characters  Angelo
10  mon 15/11  Discourse structure and discourse markers  Chapters 14.6, 14.8 + (Allen 1995) 16.1-16.3  Hannes
10  mon 15/11  Generating Dialogues Between Virtual Agents Automatically from Text  Carmine
11  thu 18/11  Dialog Systems, Speech Acts and Grounding  Chapter 15  Hannes
11  thu 18/11  Towards a model of face-to-face grounding  Luis
11  mon 22/11  The role of non-verbal behaviour in communication  (Vilhjálmsson 2005)  Hannes
11  mon 22/11  Augmenting Online Conversation through Automatic Discourse Tagging  Lorenzo
11  mon 22/11  More Than Just a Pretty Face: Conversational Protocols and the Affordances of Embodiment  Danilo
12  thu 25/11  Speech analysis, speech synthesis and dialogue systems  Chapter 15  Jón Guðnason
12  mon 29/11  Final projects: Demo (M1.09, 13:00-17:00)  All students
+  wed 8/12  Semantic and Discourse Information for Text-to-Speech Intonation  Niccolo
+  wed 8/12  Course Material Review (M1.02, 10:20-11:55)  Hrafn, Hannes

A selection of papers

Topic  Title  Link
N-grams  Statistical Identification of Language  http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.48.1958
N-grams  N-Gram-Based Text Categorization  http://citeseer.ist.psu.edu/68861.html
N-grams  A Mixed Trigrams Approach for Context Sensitive Spell Checking  http://nlp.cs.uic.edu/PS-papers/spell-cicling07.pdf
Morphology  Applications of Finite-State Transducers in Natural Language Processing  http://www2.parc.com/istl/members/karttune/publications/ciaa-2000/fst-in-nlp.pdf
Morphology  Constructing Lexical Transducers  http://citeseer.ist.psu.edu/443780.html
Morphology  Guessing Morphological Classes of Unknown German Nouns  http://nats-www.informatik.uni-hamburg.de/~vhahn/Downloads/RANLP03.pdf
Morphology  Automatic Rule Induction for Unknown Word Guessing  http://portal.acm.org/citation.cfm?id=972708
Morphology  A Mixed Method Lemmatization Algorithm Using a Hierarchy of Linguistic Identities (HOLI)  http://www.springerlink.com/content/h530q7157285563u/
POS tagging  Tagging Icelandic text: An experiment with integrations and combinations of taggers  http://nlp.ru.is/publications.htm
POS tagging  Tagging Icelandic text: A linguistic rule-based approach  http://nlp.ru.is/publications.htm
POS tagging  TnT - A Statistical Part-of-Speech Tagger  http://citeseer.ist.psu.edu/brants00tnt.html
POS tagging  Comparing a Linguistic and a Stochastic Tagger  http://acl.ldc.upenn.edu/P/P97/P97-1032.pdf
POS tagging  A simple rule-based part of speech tagger  http://portal.acm.org/citation.cfm?id=974526
POS tagging  POS Tagging for German: How Important is the Right Context?  http://www.lrec-conf.org/proceedings/lrec2008/pdf/253_paper.pdf
Parsing  Treebank Grammars  http://www.nlp.org.cn/docs/docredirect.php?doc_id=25
Parsing  Exploring Evidence for Shallow Parsing  http://acl.ldc.upenn.edu/W/W01/W01-0706.pdf
Parsing  Text Chunking using Transformation-Based Learning  http://acl.ldc.upenn.edu/W/W95/W95-0107.pdf
Parsing  IceParser: An Incremental Finite-State Parser for Icelandic  http://nlp.ru.is/publications.htm
Parsing  An Open Source Tool for Partial Parsing and Morphosyntactic Disambiguation  http://www.mimuw.edu.pl/~olekz/ilnet/il/spejd/doc/spade.pdf
Parsing  Statistical Techniques for Natural Language Parsing  http://citeseer.ist.psu.edu/286958.html
Discourse and Dialogue  Augmenting Online Conversation through Automatic Discourse Tagging  http://www.ru.is/faculty/hannes/publications/HICSS2005.pdf
Discourse and Dialogue  Generating Dialogues Between Virtual Agents Automatically from Text  http://www.springerlink.com/index/p6265q6h81312001.pdf
Discourse and Dialogue  More Than Just a Pretty Face: Conversational Protocols and the Affordances of Embodiment  http://www.ru.is/faculty/hannes/publications/KBS2001.pdf
Discourse and Dialogue  Towards a model of face-to-face grounding  http://www.springerlink.com/index/p6265q6h81312001.pdf
Discourse and Dialogue  Building Effective Question and Answering Characters  http://www.aclweb.org/anthology-new/W/W06/W06-1303.pdf
Discourse and Dialogue  Semantic and Discourse Information for Text-to-Speech Intonation  http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.11.4835

Other material

Title
The Icelandic tagset
Example lexicon for HFST: http://www.ru.is/faculty/hrafn/data/ice-lexc.txt
Bigram tagging example
Implementing Regular Expressions: http://swtch.com/~rsc/regexp/ http://linuxgazette.net/issue27/mueller.html
The Penn Treebank tagset: http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
Icelandic Grammar: http://en.wikipedia.org/wiki/Icelandic_grammar
The Database of Icelandic Inflections: http://bin.arnastofnun.is
Icelandic PoS tagging and parsing demo (IceNLP): http://nlp.cs.ru.is/icenlp.htm
Icelandic-English machine translation demo: http://nlp.cs.ru.is/is-en.htm
English PoS tagging Demo (Penn Treebank tagset): http://l2r.cs.uiuc.edu/~cogcomp/pos_demo.php
English Constraint Grammar Demo: http://www2.lingsoft.fi/cgi-bin/engcg
Constraint Grammar Development: http://visl.sdu.dk/constraint_grammar.html
English Partial Parsing Demo: http://l2r.cs.uiuc.edu/~cogcomp/demo.php?dkey=SP
Dialog Act Coding Schemes: http://www.dfki.de/mate/d11/chap4.html
DAMSL Dialog Act Markup Scheme: http://www.cs.rochester.edu/research/cisd/resources/damsl/RevisedManual/RevisedManual.html
Dialog Corpora Database: http://www-rcf.usc.edu/~billmann/diversity/DDivers-site.htm
The CSLU Spoken Language System Toolkit: http://www.cslu.ogi.edu/toolkit/
CADIA BML Realizer: http://cadia.ru.is/projects/bmlr/