public:t-malv-15-3:3

Differences

This shows you the differences between two versions of the page.

public:t-malv-15-3:3 [2015/09/03 09:18] – [3. tokenize.py: Read file] orvarkpublic:t-malv-15-3:3 [2024/04/29 13:33] (current) – external edit 127.0.0.1
Line 31: Line 31:
 <code> <code>
 $ python myscript.py One TWO three $ python myscript.py One TWO three
-['lab3-1.py', 'One', 'TWO', 'three']+['myscript.py', 'One', 'TWO', 'three']
 </code> </code>
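The output above shows that the first element of the argument list is the script name and the rest are the parameters. A minimal sketch of that split follows; `describe_args` is a hypothetical helper (not part of the lab), and the hand-built list stands in for what `sys.argv` would hold for this invocation.

```python
def describe_args(argv):
    """Split an argv-style list into (script name, list of parameters)."""
    script, params = argv[0], argv[1:]
    return script, params


# Simulates: python myscript.py One TWO three
# In a real script you would pass sys.argv instead of this literal list.
script, params = describe_args(['myscript.py', 'One', 'TWO', 'three'])

print(script)        # myscript.py
print(params)        # ['One', 'TWO', 'three']
print(len(params))   # 3
```

Slicing with `argv[1:]` never raises, even when no parameters are given, whereas indexing `argv[1]` directly does.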
  
Line 52: Line 52:
 </code> </code>
  
-**NOTE: The python installer for Windows does not seem to add python to the path by default. If you can't invoke python in the Command Prompt (cmd) the simples solution might be to install python again and make sure "Add python.exe to Path" is selected (last option in customize).**+**NOTE: The python installer for Windows does not seem to add python to the path by default. If you can't invoke python in the Command Prompt (cmd), the simplest solution might be to install python again (choose "Change Python") and then make sure "Add python.exe to Path" is selected (last option under "Customize Python").**
  
 {{:public:t-malv-15-3:python-path-install.png?direct&200|}} {{:public:t-malv-15-3:python-path-install.png?direct&200|}}
Line 60: Line 60:
 ===== 3. mytokenize.py: Read file ===== ===== 3. mytokenize.py: Read file =====
  
-**Create a script named ''tokenize.py'' that reads a file's contents, tokenizes them, removes stopwords and prints out the remaining tokens, one per line.**+**Create a script named ''mytokenize.py'' that reads a file's contents, tokenizes them, removes stopwords and prints out the remaining tokens, one per line.**
  
 <code python> <code python>
Line 69: Line 69:
 #Get file name from argv (see problem 3). #Get file name from argv (see problem 3).
 #Open file for reading. #Open file for reading.
-#Read contents into string.+#Read contents into string.
 #Tokenize the string. #Tokenize the string.
 #Remove stopwords (words in stopwords.words('english')). #Remove stopwords (words in stopwords.words('english')).
Line 75: Line 75:
 </code> </code>
  
-You should be able to invoke the script using ''python tokenize.py test.txt''.+You should be able to invoke the script using ''python mytokenize.py test.txt''.
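The tokenize-then-filter steps in the outline above can be tried without NLTK installed. The sketch below is only an illustration under stated assumptions: `simple_tokenize`, `remove_stopwords`, and `DEMO_STOPWORDS` are hypothetical stand-ins, where the lab itself expects an NLTK tokenizer and `stopwords.words('english')`.

```python
import re

# Tiny stand-in stopword set (an assumption for illustration only);
# the lab uses stopwords.words('english') from NLTK instead.
DEMO_STOPWORDS = {'the', 'a', 'is', 'on', 'and'}


def simple_tokenize(text):
    """Rough stand-in tokenizer: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def remove_stopwords(tokens, stopword_set):
    """Keep the tokens whose lowercase form is not in stopword_set."""
    return [t for t in tokens if t.lower() not in stopword_set]


text = "The cat is on the mat."
for token in remove_stopwords(simple_tokenize(text), DEMO_STOPWORDS):
    print(token)  # one remaining token per line
```

Lowercasing each token before the lookup matters because stopword lists are usually lowercase, so "The" would otherwise slip through.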
  
  
Line 128: Line 128:
  
 **If you feel this problem is easy you should also try your hand at problems 31 and 41.** **If you feel this problem is easy you should also try your hand at problems 31 and 41.**
 +
 +===== Possible Solutions =====
 +
 +<code python>
 +#1
 +>>> monty[::-1] == 'nohtyP ytnoM'
 +True
 +
 +#2
 +from sys import argv
 +
 +print('Number of parameters: ', len(argv)-1)
 +print('Script name: ', argv[0])
 +print('First parameter: ', argv[1])
 +print('Second parameter: ', argv[2])
 +
 +#3
 +from sys import argv
 +from nltk import word_tokenize
 +from nltk.corpus import stopwords
 +
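 +#Build the stopword set once instead of calling stopwords.words() for every token.
 +stop = set(stopwords.words('english'))
 +
 +with open(argv[1]) as infile:
 +    for w in word_tokenize(infile.read()):
 +        if w.lower() not in stop:
 +            print(w)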
 +
 +#Since files are context managers, they can be used in a with-statement.
 +#The file is closed when the code block finishes, even if an exception occurs.
 +
 +#4
 +from sys import argv
 +from codecs import encode
 +
 +with open(argv[1]) as infile, open(argv[2], 'w') as outfile:
 +    for line in infile:
 +        outfile.write(encode(line, 'rot_13'))
 +
 +#5
 +[(w, len(w)) for w in sent]
 +</code>
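Solutions 1, 4 and 5 can be checked interactively with sample data. The sketch below assumes `monty = 'Monty Python'` (inferred from the expected reversed string) and a made-up `sent` list; the lab defines both elsewhere, so treat these values as placeholders.

```python
from codecs import encode

# Assumed sample values, for illustration only.
monty = 'Monty Python'
sent = ['One', 'TWO', 'three']

# 1: a slice with step -1 reverses the string.
print(monty[::-1])                   # nohtyP ytnoM

# 5: list comprehension pairing each word with its length.
print([(w, len(w)) for w in sent])   # [('One', 3), ('TWO', 3), ('three', 5)]

# 4: ROT13 is its own inverse, so encoding twice restores the text.
secret = encode('Hello, World!', 'rot_13')
print(secret)                        # Uryyb, Jbeyq!
print(encode(secret, 'rot_13'))      # Hello, World!
```

Because ROT13 undoes itself, the script from solution 4 can decode a file it produced simply by running it again on its own output.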
/var/www/cadia.ru.is/wiki/data/attic/public/t-malv-15-3/3.1441271929.txt.gz · Last modified: 2024/04/29 13:32 (external edit)
