
T-538-MALV and T-725-MALV, Natural Language Processing Fall 2007

Basic Info

Description

One of the most important goals of Human Language Technology, as a field within Computer Science, is to make it possible for people to interact with computer systems in their own natural manner of communication: language. Human Language Technology is an interdisciplinary field that draws on, for example, linguistics, statistics, psychology, engineering and classic computer science. This course focuses on the fundamentals of Natural Language Processing (NLP), an important subfield of Human Language Technology, and introduces state-of-the-art research in the field with particular regard to the Icelandic language. Students will gain an understanding of the various stages of language processing, for example morphological analysis, part-of-speech tagging, syntactic analysis, semantic analysis, discourse analysis and dialog control. Students will get a chance to explore these techniques through programming assignments and an independent project.

Goals

On completion of the course students should:

Coursework Overview

To provide a rich hands-on experience, students will build their own application that relies on NLP over the course of the semester. While the choice of final application is completely in the hands of the students, important NLP components will be built in a series of programming projects that lead up to the final system demonstration. Three homework assignments will also be distributed during the semester to reinforce some of the more theoretical material.

Everything that has to be turned in, including the programming projects, must arrive no later than 23:59 on the due date. Late submissions incur a 10% penalty for each additional day, including weekends and holidays.
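As an illustration of the late policy, the following minimal sketch computes a penalized score. The helper name `late_penalized_score` is hypothetical, and the sketch makes assumptions the course page does not state: every started day after the deadline costs 10% of the raw score, and the result never drops below zero.

```python
from datetime import datetime
import math


def late_penalized_score(raw_score, due_date, submitted_at, penalty_per_day=0.10):
    """Return raw_score after the late penalty (hypothetical helper).

    Assumptions, not confirmed by the course page:
    - the deadline runs through the end of minute 23:59 on the due date,
    - every started day after the deadline costs penalty_per_day of raw_score,
    - the penalized score never drops below zero.
    """
    deadline = due_date.replace(hour=23, minute=59, second=59, microsecond=0)
    if submitted_at <= deadline:
        return raw_score
    seconds_late = (submitted_at - deadline).total_seconds()
    days_late = math.ceil(seconds_late / 86400)  # a started day counts as a full day
    return max(0.0, raw_score * (1 - penalty_per_day * days_late))
```

Under these assumptions, a project due on 18 September and handed in the next morning would keep 90% of its raw score.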

Each student will be asked to prepare and give one presentation on an existing research paper in the field. This presentation, along with general participation in discussions online and in class, counts toward a participation grade.

Assignments and Projects

Assignment | Code | Description | Assigned | Due | Duration | Weight | Discuss
Homework Assignment 1 | A1 | Regular Expressions | T 28. Aug | T 4. Sep | 8 days | 5% |
Programming Project 1 | P1 | Tokenizing text | F 7. Sep | T 18. Sep | 12 days | 8% - 10% | Thread
Homework Assignment 2 | A2 | Tagging | T 18. Sep | T 25. Sep | 8 days | 5% |
Programming Project 2 | P2 | Tagging text | T 25. Sep | F 5. Oct | 11 days | 8% - 10% |
Programming Project 3 | P3 | Parsing Text | F 5. Oct | T 16. Oct | 12 days | 8% - 10% |
Programming Project 4 (data) | P4 | Discourse model | T 16. Oct | F 26. Oct | 11 days | 8% - 10% |
Homework Assignment 3 | A3 | Discourse analysis | F 26. Oct | F 2. Nov | 8 days | 5% |
Programming Project 5 | P5 | Application | F 26. Oct | F 9. Nov | 15 days | 8% - 10% |
Total | | | | | | 55% |

Final Exam

You will find information about the exam in this exam preparation document.

Discussion Questions

After every lecture, the presenter will post a discussion question on an online forum and the students will be asked to contribute to the discussion of that topic until the following lecture. The discussion takes place on an external forum page at the following address. Note that the students have to register on this forum to post their replies (simply go to the address below to register).

Host | Forum Name | Address | Discussion Questions
ProBoards | Málvinnsla | http://malv2007.proboards50.com/ | Read Questions

Schedule

Week | Date | Topic | Who | Due
1 | Tue 21. Aug | Introduction to NLP (Chapter 1) | both |
 | Fri 24. Aug | Corpora and finite state automata (Chapter 2) | hrafn |
2 | Tue 28. Aug | Regular expressions and Perl (Chapter 2) | hannes |
 | Fri 31. Aug | Tokenization | hrafn |
3 | Tue 4. Sep | Word counting and n-grams (Chapter 4) | hrafn | A1
 | Fri 7. Sep | Morphology (Chapter 5) | hrafn |
4 | Tue 11. Sep | Lexicon Compiler (Chapter 5) | hrafn |
 | Fri 14. Sep | POS-Tagging with rules (Chapter 6) | hrafn |
5 | Tue 18. Sep | POS-Tagging with stochastic techniques (Chapter 7) | hrafn | P1
 | Fri 21. Sep | Combinations of taggers | hrafn |
 | Fri 21. Sep | Statistical Identification of Language | birna |
6 | Tue 25. Sep | Syntax analysis (Chapter 9 in "Speech and Language Processing") | hrafn | A2
 | Fri 28. Sep | Context-free grammar and PROLOG (Chapter 8) | hrafn |
 | Fri 28. Sep | Comparing a Linguistic and a Stochastic Tagger | indriði |
7 | Tue 2. Oct | Partial parsing (Chapter 9) | hrafn |
 | Tue 2. Oct | Tagging Icelandic text: A linguistic rule-based approach | ida |
 | Fri 5. Oct | Partial parsing (Chapter 9) | hrafn | P2
 | Fri 5. Oct | A simple rule-based part of speech tagger | vignir |
8 | Tue 9. Oct | Parsing techniques (Chapter 11) | hrafn |
 | Fri 12. Oct | Semantics and predicate logic (Chapter 12) | hrafn |
9 | Tue 16. Oct | Modeling discourse and reference resolution (Chapter 14) | hannes | P3
 | Fri 19. Oct | No class (Ólympíuleikar HR) | |
10 | Tue 23. Oct | Information structure and newness of information (Chapter 14) | hannes |
 | Tue 23. Oct | "Augmenting Online Conversation through Automatic Discourse Tagging" (presentation) | bjarni |
 | Fri 26. Oct | Communicative intent, discourse structure and discourse markers (Chapter 14) | hannes | P4
11 | Tue 30. Oct | Adjacency pairs, speech acts and grounding in dialogue (Chapter 15) | hannes |
 | Tue 30. Oct | "Towards a Model of Face-to-Face Grounding" (presentation) | ægir |
 | Fri 2. Nov | The role of nonverbal behavior in communication | hannes | A3
 | Fri 2. Nov | "Discourse-Oriented Facial Displays in Conversation" (presentation) | sigrún |
12 | Tue 6. Nov | Dialogue systems and embodied conversational agents | hannes |
 | Fri 9. Nov | Final project presentations and demos | both | P5

Grading

Part of Course | Total Weight
Programming Projects | 40%
Participation | 15%
Homework Assignments | 15%
Final Written Exam | 30%
Total | 100%
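To make the weighting concrete, here is a small sketch of how a final grade could be combined from the four parts of the course. The weights come from the grading breakdown; the function name `final_grade`, the dictionary keys, and the assumption that every part is scored on the same scale are illustrative and not taken from the course page.

```python
# Weights from the grading breakdown, as fractions of the final grade.
WEIGHTS = {
    "programming_projects": 0.40,
    "participation": 0.15,
    "homework_assignments": 0.15,
    "final_written_exam": 0.30,
}


def final_grade(scores):
    """Combine per-part scores into a weighted total (hypothetical helper).

    Assumes each part is scored on the same scale (for example 0 to 10).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[part] * scores[part] for part in WEIGHTS)
```

A call such as `final_grade({"programming_projects": 8, "participation": 10, "homework_assignments": 9, "final_written_exam": 7})` returns the weighted sum of those four scores. The example scores are made up.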