Learning Decision Trees

Material:

The zip archive contains three data sets and Java code to learn a decision tree from those data sets. Your task is to check how well the learning algorithm performs on the given data sets.

Tasks:

  1. Change the existing code, so that it prints the necessary data for a learning curve. That is, change the code such that it learns trees for increasing numbers of training examples and test each of the trees using the test set.
  2. Plot the learning curves for all three data sets (e.g., using a spreadsheet application of your choice) (monk-1, monk-2, monk-3) and interpret them.
  3. Are all the trees that are learned consistent with the training data (consistent = all examples are classified correctly)? If not, what could be the reason? (Hint: Consistency of the tree with the training set can be checked easily by adding a single line of code.)
  4. Look at the true functions for the three data sets in monk.names (Section 9). Design a good decision tree for the concept of monks-1 by hand.
  5. Compare this decision trees to the one that was learned by the algorithm using the whole training set.

Hand In:

  1. Learning curves for the three data sets.
  2. Interpretation of the learning curves.
  3. Answer to 3.
  4. Decision tree for 4.
  5. Interpretation of your findings for 5.