T-725-MALV, Natural Language Processing, Fall 2015

Basic Info

Description

The goal of language technology (LT) is to develop systems that allow people to communicate with computers using natural languages. LT is an interdisciplinary field that draws on subjects like linguistics, statistics, psychology, engineering, and computer science. This course covers the fundamentals of natural language processing (NLP), one of the subfields of LT, and introduces research in the field, in part with regard to the Icelandic language. Students acquire an understanding of the various stages of NLP, e.g. morphological analysis, part-of-speech tagging, syntactic analysis, semantic analysis, discourse, and dialogue, and work on programming projects related to these stages.

Learning Outcome

On completing the course, students should:

Course Assessment

The course assessment is as follows:

Part of Course | Total Weight
Three individual projects/assignments (3 × 10%) | 30%
A final project (can be worked on in a group of two students) | 30%
Participation in class (discussions, Piazza, labs) | 10%
A final written exam | 30%
Total | 100%
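
As a worked illustration of the weighting above, the following Python sketch combines component scores into a single course grade. The component names, the `final_grade` helper, and the 0-10 score scale are assumptions made for this example; only the weights come from the table.

```python
# Weights (in percent) taken from the assessment table.
# The component names are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "assignment_1": 10,
    "assignment_2": 10,
    "assignment_3": 10,
    "final_project": 30,
    "participation": 10,
    "final_exam": 30,
}


def final_grade(scores):
    """Combine per-component scores into one weighted grade.

    Assumption: every score is on the same scale (0-10 here), so the
    result is on that scale as well.
    """
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the components in WEIGHTS")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


example_scores = {
    "assignment_1": 8,
    "assignment_2": 9,
    "assignment_3": 7,
    "final_project": 8.5,
    "participation": 10,
    "final_exam": 7.5,
}
print(final_grade(example_scores))  # prints 8.2
```

Since the weights sum to 100, dividing the weighted sum by 100 keeps the result on the same scale as the inputs.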

To provide a rich hands-on experience, students will build their own application (a final project) that relies on NLP over the course of the semester. A number of project proposals will be provided by the instructors, but students are also encouraged to come up with their own ideas. Three homework projects/assignments will also be distributed during the semester to reinforce some of the more theoretical material.

Everything that has to be turned in must arrive no later than 23:59 on the due date. Late submissions incur a 10% penalty for each additional day, including weekends and holidays. Projects handed in more than two days late are not accepted.
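
The late policy can be sketched as a small Python function. This is only an illustration under stated assumptions: the `late_penalty` name is hypothetical, and the sketch assumes that every started 24-hour period after the deadline counts as one additional day. The example deadline is the Assignment I due date from the schedule.

```python
import math
from datetime import datetime

PENALTY_PER_DAY = 10  # percent per additional day, per the policy
MAX_DAYS_LATE = 2     # submissions later than this are not accepted


def late_penalty(deadline, submitted):
    """Return the penalty in percent, or None if the submission is rejected."""
    if submitted <= deadline:
        return 0
    # Assumption: every started 24-hour period after the deadline
    # counts as one additional day (weekends and holidays included).
    days_late = math.ceil((submitted - deadline).total_seconds() / 86400)
    if days_late > MAX_DAYS_LATE:
        return None
    return PENALTY_PER_DAY * days_late


deadline = datetime(2015, 9, 16, 23, 59)  # Assignment I, due Wed 16 Sep.
print(late_penalty(deadline, datetime(2015, 9, 17, 10, 0)))  # prints 10
print(late_penalty(deadline, datetime(2015, 9, 19, 0, 0)))   # prints None
```

Whether the 10% is deducted from the maximum score or from the score earned is not specified in the policy, so the function only reports the percentage.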

Students need to hand in at least 70% of the lab projects in order to take the final exam.

Assignments and Projects

Assignment | Assigned | Due | Duration | Weight
Assignment I | Mon 7. Sep. | Wed 16. Sep. (Week 5) | 10 days | 10%
Assignment II | Mon 28. Sep. | Wed 7. Oct. (Week 8) | 10 days | 10%
Assignment III | Tue 20. Oct. | Wed 28. Oct. (Week 11) | 9 days | 10%
Final project | Mon 28. Sep. | Fri 6. Nov. (Week 12) | 6 weeks | 30%
Total | | | | 60%

Final Project Teams

Team | Members | Project
1 | Szymon Klepacz and Felix Weissl | 1.15 Question-Answering System
2 | Steve Losh | Tweet Scraping and Sentiment Analysis
3 | Jacopo de Berardinis, Carlo Castagnari, Giorgio Focina | 1.13 Intelligent Computer-Assisted Language Learning (Italian?)
4 | Kristján Rúnarsson | 1.15 Intonation for Text-to-Speech
5 | Tinna Frímann Jökulsdóttir | 1.13 Intelligent Computer-Assisted Language Learning (Icelandic)
6 | Starkaður Barkarson | 1.13 Intelligent Computer-Assisted Language Learning (Icelandic)
7 | Sigurður Jónsson | Knowledge Representation and Reasoning
8 | Arnar Freyr Bjarnason | TBD
9 | Ívar Örn Ragnarsson | Text Summarization

Final Exam

There will be a final written exam counting 30% towards your grade.

Online Discussion Forum

Syllabus

Theory sessions (textbook, slides, video) are on Tuesdays; practice sessions (text, video, lab) are on Thursdays.

Week | Textbook | Slides, videos, and practice texts | Lab
1 | Chapter 1 | About course; Introduction; Introduction; Python Tutorial; Language Processing; Install Python on Windows | 1
2 | Chapter 2 | Regular Expressions and Automata; Regular expressions; Convert Regular Expression to Finite-State Automaton; Accessing Text Corpora and Lexical Resources | 2
3 | Chapter 3.1-3.5, 3.9 | Words and Transducers; Various text processing tools; Tokenisation; Normalisation and Stemming; Sentence segmentation; Processing Raw Text; Tokenizing words and Sentences; Stop words; Stemming | 3
4 | Chapter 4.1-4.5.1 | N-grams; Introduction to N-grams; Estimating N-gram Probabilities; Evaluation and Perplexity; Generalisation and zeros; Smoothing; Collocations | 4
5 | Chapter 5.1-5.8 | POS tagging; An Intro to Parts of Speech and POS Tagging; Some Methods and Results on Sequence Models for POS Tagging; The Tagging Problem; Generative models; Hidden Markov Models (HMMs); Parameter Estimation in HMMs; Categorizing and Tagging Words; Part of Speech Tagging | 5
6 | Chapter 12.1-12.4, 12.7, 13.1-13.3 | Formal grammars; Parsing; The parsing problem (1); The parsing problem (2); Context-free grammars (1); Context-free grammars (2); Simple English Grammar (1); Simple English Grammar (2); Examples of Ambiguity; Two views of syntactic structure; Analyzing Sentence Structure | 6
7 | Chapter 13.4-13.5, 14 | Parsing; Partial Parsing; Statistical Parsing; An exponential number of attachments; CFGs and PCFGs; Lexicalization of PCFGs; The model of Charniak; PCFG Independence Assumptions; Constituency Parser Evaluation; Analyzing Sentence Structure; Extracting Information from Text; Chunking | 7
8 | Chapter 17.1-3; Chapter 18.1-2 | The Representation of Meaning; Computational Semantics; Semantics an Overview; Semantics and Pragmatics - Sentence Semantics; Analyzing the Meaning of Sentences | 8
9 | Chapter 21.1-5 | Computational Discourse; Reference | 9
10 | Chapter 24.1; Vilhjálmsson 2005; Cassell et al. 2001 | Conversation; Nonverbal Behavior | 10
11 | Chapter 24.2 | Dialog Systems | 11
12 | Chapter 9.1-6 | Automatic Speech Recognition; FRIDAY: Final Project Presentations! | 12 (no 12th lab assignment)
13 | | Material for final exam |

Other material

Topic | Title | Link
Lexical analysis / Regex matching | JFlex | http://jflex.de/
Lexical analysis / Regex matching | Basic tokenizer for Icelandic | http://www.ru.is/kennarar/hrafn/courses/nlp/IceBasic.zip
Regular expressions | grep tutorial | http://www.uccs.edu/~ahitchco/grep/
Regular expressions | sed tutorial | http://www.grymoire.com/Unix/Sed.html
Finite state transducers | HFST | http://www.ling.helsinki.fi/kieliteknologia/tutkimus/hfst/
Finite state transducers | foma | https://code.google.com/p/foma/