public:t-malv-15-3:5

Differences

This shows you the differences between two versions of the page.

public:t-malv-15-3:5 [2015/09/17 08:10]
orvark [4. Most frequent tag of Hapax legomenon]
public:t-malv-15-3:5 [2015/09/17 08:18] (current)
orvark [4. Most frequent tag of Hapax legomenon]
Line 30: Line 30:
 See if making minor changes to the wording of the sentences is enough for the tagger to tag it correctly. For example adding ''that was'' after ''horse'' in the example above.
  
-===== 2. Training and Testing Data and Finding the Baseline  =====
+===== 2. Training and Testing Data and Finding the Baseline  =====
  
 <code python>
Line 61: Line 61:
 Instead of using the most common tag overall for the Default Tagger, some say that [[https://en.wikipedia.org/wiki/Hapax_legomenon|hapax legomena]] are a better model for unknown words. That is, words that occur only once in the corpus are likely to be representative of the words that never occur, the unseen words.
  
-See if you can write code to find the most common tag in the set of words that occur only once in the Brown corpus. You might prefer to use ''brown.tagged_words()'' here, and even ''brown.tagged_words(categories='news')'' during testing.
+See if you can write code to find the most common tags in the set of words that occur only once in the Brown corpus. You might prefer to use ''brown.tagged_words()'' here, and even ''brown.tagged_words(categories='news')'' during testing.
  
 What is the most common tag of //hapax legomenon//?
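One possible way to approach this is sketched below. It is only a minimal sketch, not the intended solution: the helper name ''hapax_tag_counts'', the choice to lowercase words before counting, and the toy word list are all assumptions made for illustration. A word counts as a hapax here if its (lowercased) form occurs exactly once, regardless of tag.

```python
from collections import Counter


def hapax_tag_counts(tagged_words, lowercase=True):
    """Count the tags of words that occur exactly once in tagged_words.

    tagged_words is any iterable of (word, tag) pairs, such as the
    result of brown.tagged_words(). Lowercasing is an assumption of
    this sketch; pass lowercase=False to treat 'The' and 'the' as
    different words.
    """
    pairs = [(w.lower() if lowercase else w, t) for w, t in tagged_words]
    # How often does each word form occur?
    word_freq = Counter(w for w, _ in pairs)
    # Keep only the tags of words seen exactly once.
    return Counter(t for w, t in pairs if word_freq[w] == 1)


# Toy data standing in for the corpus; the tags are illustrative only.
toy = [("The", "AT"), ("dog", "NN"), ("saw", "VBD"), ("the", "AT"),
       ("cat", "NN"), ("and", "CC"), ("the", "AT"), ("bird", "NN")]

print(hapax_tag_counts(toy).most_common(1))  # [('NN', 3)]
```

With NLTK and the Brown corpus installed (''nltk.download('brown')''), the same helper could be called as ''hapax_tag_counts(brown.tagged_words(categories='news')).most_common(20)'' to list the twenty most common tags among the hapaxes.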
  
 Looking at the twenty most common tags, how do you think the difference between the overall model and the hapax legomenon model will develop as the training corpus grows larger?
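To explore the last question empirically, one option is to measure how far apart the two tag distributions are on growing prefixes of the corpus. The sketch below is an assumption-laden illustration: the function names, the lowercasing of words, and the use of total variation distance as the measure of "difference" are choices made here, not part of the assignment.

```python
from collections import Counter


def tag_shares(tag_counts):
    """Turn raw tag counts into relative frequencies."""
    total = sum(tag_counts.values())
    return {tag: n / total for tag, n in tag_counts.items()} if total else {}


def overall_vs_hapax(tagged_words):
    """Return (overall tag shares, hapax tag shares) for (word, tag) pairs."""
    pairs = [(w.lower(), t) for w, t in tagged_words]
    word_freq = Counter(w for w, _ in pairs)
    overall = Counter(t for _, t in pairs)
    hapax = Counter(t for w, t in pairs if word_freq[w] == 1)
    return tag_shares(overall), tag_shares(hapax)


def compare_at_sizes(tagged_words, sizes):
    """For each prefix size, return (size, distance between the two models).

    The distance is the total variation distance: 0.0 means the two
    distributions are identical, 1.0 means they share no tags.
    """
    words = list(tagged_words)
    rows = []
    for size in sizes:
        overall, hapax = overall_vs_hapax(words[:size])
        tags = set(overall) | set(hapax)
        distance = 0.5 * sum(abs(overall.get(t, 0.0) - hapax.get(t, 0.0))
                             for t in tags)
        rows.append((size, distance))
    return rows


# Toy data standing in for the corpus; the tags are illustrative only.
toy = [("The", "AT"), ("dog", "NN"), ("saw", "VBD"), ("the", "AT"),
       ("cat", "NN"), ("and", "CC"), ("the", "AT"), ("bird", "NN")]

for size, distance in compare_at_sizes(toy, [2, 8]):
    print(size, round(distance, 3))
```

On the real corpus, ''compare_at_sizes(brown.tagged_words(), [10000, 100000, 1000000])'' would give one number per corpus size. Whether that number shrinks or grows is exactly what the question asks you to reason about.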
/var/www/ailab/WWW/wiki/data/attic/public/t-malv-15-3/5.1442477445.txt.gz · Last modified: 2015/09/17 08:10 by orvark