Lab 3

Try to complete as many of the problems as you can. Hand in your code files with what you have done in MySchool before midnight today (3 September).

The first and the last problems should be in a file named lab3.py, the others in myscript.py, mytokenize.py and rot13.py.

If you can't manage to complete a particular problem please hand in your incomplete code – comment it out if it produces an error.

1. String slicing and stepping

monty = 'Monty Python'

We can specify a “step” size for the slice. The following returns every second character within the slice: monty[6:11:2]. It also works in the reverse direction: monty[10:5:-2] Try these for yourself, then experiment with different step values.

What happens if you ask the interpreter to evaluate monty[::-1]? Explain why this is a reasonable result.

(Problem 4 and 5 in Chapter 3)

2. myscript.py: argv

from sys import argv
 
print(argv)

The list argv contains the name of the script plus any parameters used in the invocation.

$ python myscript.py One TWO three
['myscript.py', 'One', 'TWO', 'three']

If the number of parameters is fixed beforehand you can unpack them all into variables. Otherwise you use the index number to get a particular parameter.

first_param = argv[1]    #using the index
 
script_name, first, second, third = argv     #unpacking the argv list into four variables

Create a script named myscript.py that produces the following output when executed with the parameters indicated:

$ python myscript.py file1.txt file2.txt
Number of parameters: 2
Script name: myscript.py
First parameter: file1.txt
Second parameter: file2.txt

NOTE: The python installer for Windows does not seem to add python to the path by default. If you can't invoke python in the Command Prompt (cmd) the simplest solution might be to install python again (choose “Change Python”) and then make sure “Add python.exe to Path” is selected (last option undir “Customize Python”).

(Information on executing python scripts in Windows)

3. mytokenize.py: Read file

Create a script name mytokenize.py that reads a file contents, tokenizes them, removes stopwords and print out the remaining tokens, one per line.

from sys import argv
from nltk import word_tokenize
from nltk.corpus import stopwords
 
#Get file name from argv (see problem 3).
#Open file for reading.
#Read contents into a string.
#Tokenize the string.
#Remove stopwords (words in stopwords.words('english')).
#Print out the tokens, one per line.

You should be able to invoke the script using python mytokenize.py test.txt.

4. rot13.py: Read and Write file

Now create a script named rot13.py that reads the contents from one file, line by line and alters the lines with a simple algorithm before writing them to another file.

from sys import argv
from codecs import encode
 
#Get two file names from argv (see problem 3).
#Open file1 for reading.
#Open file2 for writing.
#  Loop; read one line from file1.
     line = encode(line, 'rot_13')
#    Write the line to file2.
#(Close file2)
#(Close file1)

You should be able to invoke the script using python rot13.py input.txt output.txt for example. Copy some text and put into a test file.

The function used to alter the lines is encode(string, 'rot_13') from codecs, see ROT13 on Wikipedia

>>> from codecs import encode
>>> print(encode('nun', 'rot_13'))
aha

5. List comprehension

Rewrite the following loop as a “list comprehension”:

>>> sent = ['The', 'dog', 'gave', 'John', 'the', 'newspaper']
>>> result = []
>>> for word in sent:
...     word_len = (word, len(word))
...     result.append(word_len)
>>> result
[('The', 3), ('dog', 3), ('gave', 4), ('John', 4), ('the', 3), ('newspaper', 9)]

(Problem 10 in Chapter 3).

List comprehensions enable the descriptive construction of lists in a very compact, yet easily readable way. See some examples on this page or google for others.

If you feel this problem is easy you should also try your hand at problems 31 and 41.

Possible Solutions

#1
>>> monty[::-1] == 'nohtyP ytnoM'
True
 
#2
from sys import argv
 
print('Number of parameters: ', len(argv)-1)
print('Script name: ', argv[0])
print('First parameter: ', argv[1])
print('Second parameter: ', argv[2])
 
#3
from sys import argv
from nltk import word_tokenize
from nltk.corpus import stopwords
 
with open(argv[1]) as infile:
    for w in word_tokenize(infile.read()):
        if w.lower() not in stopwords.words('english'):
            print(w)
 
#Since files are context managers, they can be used in a with-statement.
#The file will close when the code block is finished, even if an exception occurs
 
#4
from sys import argv
from codecs import encode
 
with open(argv[1]) as infile, open(argv[2], 'w') as outfile:
    for line in infile:
        outfile.write(encode(line, 'rot_13'))
 
#5
[(w, len(w)) for w in sent]

Center for Analysis and Design of Intelligent Agents

Table of Contents