Lab 3
Try to complete as many of the problems as you can. Hand in your code files, with whatever you have completed, in MySchool before midnight today (3 September).
The first and the last problems should be in a file named lab3.py, the others in myscript.py, mytokenize.py and rot13.py.
If you can't manage to complete a particular problem, please hand in your incomplete code; comment it out if it produces an error.
1. String slicing and stepping
monty = 'Monty Python'
We can specify a “step” size for the slice. The following returns every second character within the slice: monty[6:11:2]. It also works in the reverse direction: monty[10:5:-2].
Try these for yourself, then experiment with different step values.
What happens if you ask the interpreter to evaluate monty[::-1]? Explain why this is a reasonable result.
(Problem 4 and 5 in Chapter 3)
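If you want to check your answers, a session with the string above should look something like this (the step picks every second character between the given indices):
>>> monty = 'Monty Python'
>>> monty[6:11:2]
'Pto'
>>> monty[10:5:-2]
'otP'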
2. myscript.py: argv
from sys import argv
print(argv)
The list argv contains the name of the script plus any parameters used in the invocation.
$ python myscript.py One TWO three
['myscript.py', 'One', 'TWO', 'three']
If the number of parameters is fixed beforehand you can unpack them all into variables. Otherwise you use the index number to get a particular parameter.
first_param = argv[1]  # using the index
script_name, first, second, third = argv  # unpacking the argv list into four variables
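As a minimal sketch of both approaches (the file name unpack_demo.py is hypothetical, just for illustration):
# unpack_demo.py -- hypothetical example, not part of the hand-in
from sys import argv

if len(argv) == 4:  # script name plus exactly three parameters
    script_name, first, second, third = argv  # unpacking
    print(first, second, third)
else:
    print(argv[1:])  # indexing and slicing work for any parameter count
Note that unpacking raises a ValueError if the number of parameters does not match the number of variables.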
Create a script named myscript.py that produces the following output when executed with the parameters indicated:
$ python myscript.py file1.txt file2.txt
Number of parameters: 2
Script name: myscript.py
First parameter: file1.txt
Second parameter: file2.txt
NOTE: The Python installer for Windows does not seem to add python to the path by default. If you can't invoke python in the Command Prompt (cmd), the simplest solution might be to run the installer again (choose “Change Python”) and then make sure “Add python.exe to Path” is selected (the last option under “Customize Python”).
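To check whether python is reachable from the Command Prompt, you can ask Windows where it finds the command (the path shown here is just an example; yours will depend on the installation):
C:\> where python
C:\Python34\python.exe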
3. mytokenize.py: Read file
Create a script named mytokenize.py that reads a file's contents, tokenizes it, removes stopwords and prints out the remaining tokens, one per line.
from sys import argv
from nltk import word_tokenize
from nltk.corpus import stopwords

# Get file name from argv (see problem 2).
# Open file for reading.
# Read contents into a string.
# Tokenize the string.
# Remove stopwords (words in stopwords.words('english')).
# Print out the tokens, one per line.
You should be able to invoke the script using python mytokenize.py test.txt.
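If you are unsure what the two NLTK helpers return, a quick interpreter session can help (this assumes the relevant NLTK data, punkt and stopwords, has been downloaded with nltk.download()):
>>> from nltk import word_tokenize
>>> word_tokenize('The dog gave John the newspaper.')
['The', 'dog', 'gave', 'John', 'the', 'newspaper', '.']
>>> from nltk.corpus import stopwords
>>> 'the' in stopwords.words('english')
True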
4. rot13.py: Read and Write file
Now create a script named rot13.py that reads the contents from one file, line by line, and alters the lines with a simple algorithm before writing them to another file.
from sys import argv
from codecs import encode

# Get two file names from argv (see problem 3).
# Open file1 for reading.
# Open file2 for writing.
# Loop; read one line from file1.
line = encode(line, 'rot_13')
# Write the line to file2.
# (Close file2)
# (Close file1)
You should be able to invoke the script using python rot13.py input.txt output.txt, for example. Copy some text into a test file to try it out.
The function used to alter the lines is encode(string, 'rot_13') from codecs; see ROT13 on Wikipedia.
>>> from codecs import encode
>>> print(encode('nun', 'rot_13'))
aha
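Note that ROT13 is its own inverse, so encoding twice returns the original string; this gives you an easy way to test your script by running it a second time on its own output:
>>> encode(encode('nun', 'rot_13'), 'rot_13')
'nun'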
5. List comprehension
Rewrite the following loop as a “list comprehension”:
>>> sent = ['The', 'dog', 'gave', 'John', 'the', 'newspaper']
>>> result = []
>>> for word in sent:
...     word_len = (word, len(word))
...     result.append(word_len)
>>> result
[('The', 3), ('dog', 3), ('gave', 4), ('John', 4), ('the', 3), ('newspaper', 9)]
(Problem 10 in Chapter 3).
List comprehensions enable the descriptive construction of lists in a very compact, yet easily readable way. See some examples on this page or google for others.
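A couple of generic examples of the pattern (unrelated to this problem): the first maps an expression over a sequence, the second adds a filtering condition.
>>> [n * n for n in range(5)]
[0, 1, 4, 9, 16]
>>> [w.upper() for w in ['a', 'bb', 'ccc'] if len(w) > 1]
['BB', 'CCC']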
If you feel this problem is easy you should also try your hand at problems 31 and 41.
Possible Solutions
#1
>>> monty[::-1] == 'nohtyP ytnoM'
True

#2
from sys import argv

print('Number of parameters: ', len(argv)-1)
print('Script name: ', argv[0])
print('First parameter: ', argv[1])
print('Second parameter: ', argv[2])

#3
from sys import argv
from nltk import word_tokenize
from nltk.corpus import stopwords

with open(argv[1]) as infile:
    for w in word_tokenize(infile.read()):
        if w.lower() not in stopwords.words('english'):
            print(w)
# Since files are context managers, they can be used in a with-statement.
# The file will close when the code block is finished, even if an exception occurs.

#4
from sys import argv
from codecs import encode

with open(argv[1]) as infile, open(argv[2], 'w') as outfile:
    for line in infile:
        outfile.write(encode(line, 'rot_13'))

#5
[(w, len(w)) for w in sent]