One of the most important goals of Human Language Technology, as a field within Computer Science, is to make it possible for people to interact with computer systems in their own natural manner of communication: through language. Human Language Technology is an interdisciplinary field that draws on knowledge from, among others, linguistics, statistics, psychology, engineering and classic computer science. This course focuses on the fundamentals of Natural Language Processing (NLP), an important subfield of Human Language Technology, and introduces state-of-the-art research in the field with particular regard to the Icelandic language. Students will gain an understanding of the various stages of language processing, for example morphological analysis, part-of-speech tagging, syntactic analysis, semantic analysis, discourse analysis and dialog control. Students will get a chance to explore these techniques through programming assignments and an independent project.
On completion of the course students should:
To provide a rich hands-on experience, students will build their own application that relies on NLP over the course of the semester. While the choice of final application is completely in the hands of the students, important NLP components will be built in a series of programming projects that lead up to the final system demonstration. Three homework assignments will also be distributed during the semester to reinforce some of the more theoretical material.
Everything that has to be turned in, including the programming projects, must arrive no later than 23:59 on the due date. Late submissions incur a 10% penalty for each additional day, including weekends and holidays.
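As a rough illustration of the late rule, the following Python sketch computes a penalized score. The rule as stated leaves several details open, so the sketch makes assumptions that are not confirmed by the course: the 10% is taken from the earned score, every started day after the deadline counts as a full day, and the penalty is capped so the score never drops below zero. The function names and the example year are hypothetical.

```python
from datetime import date, datetime
import math

PENALTY_PERCENT_PER_DAY = 10  # from the late rule: 10% per additional day


def days_late(due_date: date, submitted: datetime) -> int:
    """Count started days past 23:59 on the due date (assumption: partial days round up)."""
    deadline = datetime(due_date.year, due_date.month, due_date.day, 23, 59)
    if submitted <= deadline:
        return 0
    seconds_late = (submitted - deadline).total_seconds()
    return math.ceil(seconds_late / 86400)


def penalized_score(raw_score: float, due_date: date, submitted: datetime) -> float:
    """Apply the per-day penalty to the earned score (assumption: capped at 100%)."""
    penalty = min(100, PENALTY_PERCENT_PER_DAY * days_late(due_date, submitted))
    return raw_score * (100 - penalty) / 100


# Example: a project due 18 Sep (year assumed) handed in on 21 Sep at 10:00
# is three started days late, so a raw score of 80 loses 30%.
print(penalized_score(80, date(2007, 9, 18), datetime(2007, 9, 21, 10, 0)))  # prints 56.0
```

Weekends and holidays need no special handling here, because the rule counts every calendar day.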
Each student will be asked to prepare and give one presentation on an existing research paper in the field. This presentation, along with general participation in discussions online and in class, counts toward a participation grade.
Assignment | Code | Description | Assigned | Due | Duration | Weight | Discuss |
---|---|---|---|---|---|---|---|
Homework Assignment 1 | A1 | Regular expressions | T 28. Aug | T 4. Sep | 8 days | 5% | |
Programming Project 1 | P1 | Tokenizing text | F 7. Sep | T 18. Sep | 12 days | 8% - 10% | Thread |
Homework Assignment 2 | A2 | Tagging | T 18. Sep | T 25. Sep | 8 days | 5% | |
Programming Project 2 | P2 | Tagging text | T 25. Sep | F 5. Oct | 11 days | 8% - 10% | |
Programming Project 3 | P3 | Parsing text | F 5. Oct | T 16. Oct | 12 days | 8% - 10% | |
Programming Project 4 | P4 | Discourse model | T 16. Oct | F 26. Oct | 11 days | 8% - 10% | |
Homework Assignment 3 | A3 | Discourse analysis | F 26. Oct | F 2. Nov | 8 days | 5% | |
Programming Project 5 | P5 | Application | F 26. Oct | F 9. Nov | 15 days | 8% - 10% | |
Total | | | | | | 55% | |
Information about the exam is available in the exam preparation document.
After every lecture, the presenter will post a discussion question on an online forum, and students are asked to contribute to the discussion of that topic until the following lecture. The discussion takes place on the external forum listed below. Students must register on the forum, at the same address, before they can post replies.
Host | Forum Name | Address | Discussion Questions |
---|---|---|---|
ProBoards | Málvinnsla | http://malv2007.proboards50.com/ | Read Questions |
Week | Date | Topic | Who | Due |
---|---|---|---|---|
1 | Tue 21. Aug | Introduction to NLP (Chapter 1) | both | |
1 | Fri 24. Aug | Corpora and finite state automata (Chapter 2) | hrafn | |
2 | Tue 28. Aug | Regular expressions and Perl (Chapter 2) | hannes | |
2 | Fri 31. Aug | Tokenization | hrafn | |
3 | Tue 4. Sep | Word counting and n-grams (Chapter 4) | hrafn | A1 |
3 | Fri 7. Sep | Morphology (Chapter 5) | hrafn | |
4 | Tue 11. Sep | Lexicon Compiler (Chapter 5) | hrafn | |
4 | Fri 14. Sep | POS-Tagging with rules (Chapter 6) | hrafn | |
5 | Tue 18. Sep | POS-Tagging with stochastic techniques (Chapter 7) | hrafn | P1 |
5 | Fri 21. Sep | Combinations of taggers | hrafn | |
5 | Fri 21. Sep | Statistical Identification of Language | birna | |
6 | Tue 25. Sep | Syntax analysis (Chapter 9 in "Speech and Language Processing") | hrafn | A2 |
6 | Fri 28. Sep | Context-free grammar and PROLOG (Chapter 8) | hrafn | |
6 | Fri 28. Sep | Comparing a Linguistic and a Stochastic Tagger | indriði | |
7 | Tue 2. Oct | Partial parsing (Chapter 9) | hrafn | |
7 | Tue 2. Oct | Tagging Icelandic text: A linguistic rule-based approach | ida | |
7 | Fri 5. Oct | Partial parsing (Chapter 9) | hrafn | P2 |
7 | Fri 5. Oct | A simple rule-based part of speech tagger | vignir | |
8 | Tue 9. Oct | Parsing techniques (Chapter 11) | hrafn | |
8 | Fri 12. Oct | Semantics and predicate logic (Chapter 12) | hrafn | |
9 | Tue 16. Oct | Modeling discourse and reference resolution (Chapter 14) | hannes | P3 |
9 | Fri 19. Oct | No class (Ólympíuleikar HR) | | |
10 | Tue 23. Oct | Information structure and newness of information (Chapter 14) | hannes | |
10 | Tue 23. Oct | "Augmenting Online Conversation through Automatic Discourse Tagging" (presentation) | bjarni | |
10 | Fri 26. Oct | Communicative intent, discourse structure and discourse markers (Chapter 14) | hannes | P4 |
11 | Tue 30. Oct | Adjacency pairs, speech acts and grounding in dialogue (Chapter 15) | hannes | |
11 | Tue 30. Oct | "Towards a Model of Face-to-Face Grounding" (presentation) | ægir | |
11 | Fri 2. Nov | The role of nonverbal behavior in communication | hannes | A3 |
11 | Fri 2. Nov | "Discourse-Oriented Facial Displays in Conversation" (presentation) | sigrún | |
12 | Tue 6. Nov | Dialogue systems and embodied conversational agents | hannes | |
12 | Fri 9. Nov | Final project presentations and demos | both | P5 |
Part of Course | Total Weight |
---|---|
Programming Projects | 40% |
Participation | 15% |
Homework Assignments | 15% |
Final Written Exam | 30% |
Total | 100% |