hello archers
i wrote this silly little script to add line numbers to a file...
#!/usr/bin/env python
import sys
if len(sys.argv) != 3:
  print "Correct syntax: addlinenums [input_file] [output_file]"
  exit()
try:
  buffer = open(sys.argv[1], 'rU').readlines()
except:
  print "Error reading input file - you sure it exists?"
  exit()  
try:
  outfile = open(sys.argv[2], 'w')
  for num, line in enumerate(buffer):
    print >> outfile, str(num+1) + "  " + line,
  outfile.close()
except:
  print "Error writing output file - you sure you have permission?"
exit()
...and i was wondering how to properly close the file i opened for the buffer. since i never explicitly instantiated a file object, does python close the file for me after i read the lines into the buffer?
(by the way, is there already some other general utility to add line numbers to a file that i don't know about? in other words, am i reinventing the wheel?)

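You can't. The file object is never bound to a name, so there is nothing to call close() on. Keep a reference to it instead: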
#!/usr/bin/env python
import sys
if len(sys.argv) != 3:
  print "Correct syntax: %s [input_file] [output_file]" % sys.argv[0]
  exit()
try:
  buffer = open(sys.argv[1], 'rU')
except IOError:
  print "Error reading input file - you sure it exists?"
  exit()  
try:
  outfile = open(sys.argv[2], 'w')
  for (num, line) in enumerate(buffer.readlines()):
    print >> outfile, str(num+1) + "  " + line,
  outfile.close()
except IOError:
  print "Error writing output file - you sure you have permission?"
Don't use try/except without naming the errors you want to catch: a bare except also swallows things you did not intend to handle, such as KeyboardInterrupt.
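A minimal sketch of the same advice in present-day Python 3 (the code in this thread is Python 2): the with statement closes both files even when an error occurs, and only OSError (which IOError is an alias of in Python 3) is caught. The names add_line_numbers and main are made up for illustration.

```python
def add_line_numbers(src_path, dst_path):
    """Copy src_path to dst_path, prefixing each line with its number."""
    # 'with' closes both files on exit, even if an exception is raised.
    with open(src_path) as src, open(dst_path, "w") as dst:
        for num, line in enumerate(src, start=1):
            dst.write("%d  %s" % (num, line))


def main(argv):
    """Return 0 on success, 1 on an I/O error, 2 on wrong usage."""
    if len(argv) != 3:
        print("Correct syntax: %s [input_file] [output_file]" % argv[0])
        return 2
    try:
        add_line_numbers(argv[1], argv[2])
    except OSError as err:
        # Only I/O problems are handled here; anything else still propagates.
        print("I/O error: %s" % err)
        return 1
    return 0
```

In a script this would be called as sys.exit(main(sys.argv)) after importing sys.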
btw:
$ cat -n file_from >> file_to
Last edited by Husio (2007-11-23 09:12:50)
...and i was wondering how to properly close the file i opened for the buffer. since i never explicitly instantiated a file object, does python close the file for me after i read the lines into the buffer?
You can't.
@ ironbug
I'm not sure I understand why 'outfile' is closed explicitly in your code, yet you ask about closing 'buffer'. In both cases, a file object was created with 'open()', so both can / should be closed in the same way. Strategically placing some print commands at the end of the script (see Mod1 code below) should show this:
- when 'buffer.close()' is commented out:
Open File Test 1 - buffer At Time Of Exit: <open file 'buffer.txt', mode 'rU' at 0xb7c68d58>
Open File Test 2 - outfile At Time Of Exit: <closed file 'outfile.txt', mode 'w' at 0xb7c6cd58>
- when 'buffer.close()' is uncommented:
Open File Test 1 - buffer At Time Of Exit: <closed file 'buffer.txt', mode 'rU' at 0xb7befd58>
Open File Test 2 - outfile At Time Of Exit: <closed file 'outfile.txt', mode 'w' at 0xb7bf3d58>
But, maybe I misunderstood the point of your question and Husio's subsequent answer.
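The same check can be made without reading the printed representation, because file objects carry a closed attribute. A small sketch in Python 3 syntax, using a throwaway temporary file (its name is arbitrary):

```python
import os
import tempfile

# Create a small scratch file to read from (the path is arbitrary).
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as scratch:
    scratch.write("a\nb\nc\n")

buffer = open(path)               # keep a reference to the file object
lines = buffer.readlines()        # reading does not close the file
state_after_read = buffer.closed
buffer.close()                    # explicit close
state_after_close = buffer.closed

print("after readlines():", state_after_read)
print("after close():", state_after_close)

os.remove(path)
```

With open(path).readlines() there is no name left to call close() on. CPython normally closes such a file once the object is garbage-collected, but that is an implementation detail rather than a guarantee.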
Also, your code will print unpadded line numbers like this:
1  a
2  b
3  c
4  d
...
10  e
...
100  f
...
1000  g
which gets messy. I added some code to modify that behavior if you are interested.
Mod 1 code is a simple modification: the pad width is hard-coded to four digits and has to be changed by hand (so not very elegant). With Mod 1 the output looks like this:
0001  a
0002  b
0003  c
0004  d
...
0010  e
...
0100  f
...
1000  g
Mod 1 code - less radical changes:
#!/usr/bin/env python
import sys
from itertools import count, izip                                   # new code
if len(sys.argv) != 3:
  print "Correct syntax: %s [input_file] [output_file]" % sys.argv[0]
  exit()
try:
  buffer = open(sys.argv[1], 'rU')
except IOError:
  print "Error reading input file - you sure it exists?"
  exit()  
try:
  outfile = open(sys.argv[2], 'w')
  #for num, line in enumerate(buffer.readlines()):                  # orig. code
  for (num, line) in izip(count(1), buffer.readlines()):            # new code
    #print >> outfile, str(num+1) + "  " + line,                    # orig. code
    outfile.write("%04d  %s" % (num, line))                         # new code
  outfile.close()
  #buffer.close()                                                    # new code
except IOError:
  print "Error writing output file - you sure you have permission?"
print "Open File Test 1 - buffer At Time Of Exit: %s" % buffer      # test 1
print "Open File Test 2 - outfile At Time Of Exit: %s" % outfile    # test 2
Mod 2 code is a more extensive modification that dynamically pads the line numbers with spaces depending on how many lines are being written to file. Notice that I had to create a list ('lines') of the lines read from the file object 'buffer' so that I could operate on the information more than once. This probably gets to the heart of your original question. Reading from the file object consumes it: once the lines have been read, a second read returns nothing (but the file is still open, thus it can be / should be closed). Mod 2 code will produce this:
1  a
2  b
3  c
4  d
or this:
 1  a
 2  b
 3  c
 4  d
...
10  e
or this (etc):
  1  a
  2  b
  3  c
  4  d
...
 10  e
...
100  f
Mod 2 code:
#!/usr/bin/env python
import sys
from itertools import count, izip                                   # new code
if len(sys.argv) != 3:
  print "Correct syntax: %s [input_file] [output_file]" % sys.argv[0]
  exit()
try:
  buffer = open(sys.argv[1], 'rU', 0)
except IOError:
  print "Error reading input file - you sure it exists?"
  exit()  
lines = buffer.readlines()                                          # new code
buffer.close()                                                      # new code
lenpad = len(str(len(lines)))                                       # new code
try:
  outfile = open(sys.argv[2], 'w')
  #for num, line in enumerate(buffer.readlines()):                  # orig. code
  for (num, line) in izip(count(1), lines):                         # new code
    #print >> outfile, str(num+1) + "  " + line,                    # orig. code
    outfile.write("%s  %s" % ((str(num).rjust(lenpad)), line))      # new code
  outfile.close()
except IOError:
  print "Error writing output file - you sure you have permission?"
And as Husio pointed out, a non-Python tool such as 'cat -n' may be an easier way to get similar results.
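The two ideas behind Mod 2 can be sketched in Python 3 as follows: a file object is used up as it is read, so a second readlines() returns an empty list, and the pad width has to be worked out from the complete list before anything is written. Here io.StringIO stands in for a real file, and number_lines is a hypothetical helper name.

```python
import io


def number_lines(lines):
    """Return lines prefixed with right-justified, space-padded numbers."""
    width = len(str(len(lines)))          # digits in the largest line number
    return ["%s  %s" % (str(num).rjust(width), line)
            for num, line in enumerate(lines, start=1)]


# io.StringIO behaves like an open text file for this purpose.
buffer = io.StringIO("".join("line %d\n" % i for i in range(1, 11)))

lines = buffer.readlines()                # first read: all ten lines
second_read = buffer.readlines()          # position is at end of file: nothing left
buffer.close()

numbered = number_lines(lines)
print("".join(numbered), end="")
```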
Last edited by MrWeatherbee (2007-11-23 14:45:13)
Notice that I had to create a list ('lines') of the lines read from the file object, 'buffer' so that I could operate on the information more than once. This probably gets to the heart of your original question. Working directly with the file object removes the lines from the buffer and it then becomes empty (but it is still open, thus it can be / should be closed).
that's exactly what i was asking. thank you very much.
i admit i was being lazy with the actual numbering in the script, but now that you've beefed up my original code i'll tuck your version away somewhere
also, i think it's really funny that you love itertools so much (i noticed you used it also in another recent python thread)... any reason?
also, i think it's really funny that you love itertools so much (i noticed you used it also in another recent python thread)... any reason?
"Love" is a strong (not to mention odd) emotion to have for something like a Python generator. 
For now (things may change as I learn more), I'll just call it a preference in cases where the count needs to begin at something other than zero (0). Simply put, using 'izip' combined with 'count' is pretty much 'enumerate' with an optional 'start' argument. That definitely comes in handy.
To beat a dead horse, using enumerate in cases where you need a non-zero starting point is like doing this:
cnt = 0
for i in li:
    print "%s %s" % (cnt + 1, i)
    cnt += 1
In the above example, why set 'cnt' to zero, and then immediately increment it by one in the 'print' statement? Why not just initialize 'cnt' to 1?
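For what it's worth, later Python versions (2.6 and up, including Python 3) give enumerate that optional start argument, so the two spellings can be compared side by side. A short sketch in Python 3, where the built-in zip plays the role of itertools.izip:

```python
from itertools import count

li = ["a", "b", "c"]

# itertools-style pairing, as in Mod 1 and Mod 2 (izip became zip in Python 3).
with_count = list(zip(count(1), li))

# enumerate with an explicit starting value.
with_enumerate = list(enumerate(li, start=1))

for cnt, item in with_enumerate:
    print("%s %s" % (cnt, item))
```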
As far as performance, they seem comparable to me. I cProfiled the Mod 1 script, first with the enumerate code and then with izip + count code (I also removed the itertools import when profiling the 'enumerate' code). The results of 5 trials, each run against a file containing 1,000,000 lines, and their averages were as follows:
Values are CPU seconds
Trial    enumerate    itertools
1          3.495        3.570
2          3.522        3.520
3          3.473        3.412
4          3.450        3.450
5          3.530        3.397
Avg.       3.494        3.470
Sample cProfile output for 'enumerate' code:
         1000010 function calls in 3.450 CPU seconds
   Ordered by: standard name
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    3.450    3.450 <string>:1(<module>)
        1    2.738    2.738    3.449    3.449 addlinenums_alpha.py:5(<module>)
        1    0.000    0.000    3.450    3.450 {execfile}
        1    0.000    0.000    0.000    0.000 {len}
        2    0.000    0.000    0.000    0.000 {method 'close' of 'file' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.138    0.138    0.138    0.138 {method 'readlines' of 'file' objects}
  1000000    0.569    0.000    0.569    0.000 {method 'write' of 'file' objects}
        2    0.004    0.002    0.004    0.002 {open}
Sample cProfile output for itertools code:
         1000010 function calls in 3.397 CPU seconds
   Ordered by: standard name
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    3.397    3.397 <string>:1(<module>)
        1    2.666    2.666    3.396    3.396 addlinenums_alpha.py:5(<module>)
        1    0.000    0.000    3.397    3.397 {execfile}
        1    0.000    0.000    0.000    0.000 {len}
        2    0.000    0.000    0.000    0.000 {method 'close' of 'file' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.141    0.141    0.141    0.141 {method 'readlines' of 'file' objects}
  1000000    0.575    0.000    0.575    0.000 {method 'write' of 'file' objects}
        2    0.014    0.007    0.014    0.007 {open}
And ... just to finish up this line of thought, here are the results of cProfiling the Mod 2 code. The trials were run against the code as I posted it originally and then with an optimization. The optimization took the method call:
str(num).rjust(lenpad)
and bound the method to a variable outside the loop:
rjust = str.rjust
so that the code inside the loop becomes:
rjust(str(num), lenpad)
Here is the cProfile before the optimization:
         2000012 function calls in 4.719 CPU seconds
   Ordered by: standard name
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    4.719    4.719 <string>:1(<module>)
        1    3.498    3.498    4.718    4.718 addlinenums.py:3(<module>)
        1    0.000    0.000    4.719    4.719 {execfile}
        3    0.000    0.000    0.000    0.000 {len}
        2    0.000    0.000    0.000    0.000 {method 'close' of 'file' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.144    0.144    0.144    0.144 {method 'readlines' of 'file' objects}
  1000000    0.520    0.000    0.520    0.000 {method 'rjust' of 'str' objects}
  1000000    0.552    0.000    0.552    0.000 {method 'write' of 'file' objects}
        2    0.005    0.003    0.005    0.003 {open}
and after the optimization:
         1000012 function calls in 4.330 CPU seconds
   Ordered by: standard name
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    4.330    4.330 <string>:1(<module>)
        1    3.616    3.616    4.329    4.329 addlinenums.py:3(<module>)
        1    0.000    0.000    4.330    4.330 {execfile}
        3    0.000    0.000    0.000    0.000 {len}
        2    0.000    0.000    0.000    0.000 {method 'close' of 'file' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.135    0.135    0.135    0.135 {method 'readlines' of 'file' objects}
  1000000    0.574    0.000    0.574    0.000 {method 'write' of 'file' objects}
        2    0.005    0.002    0.005    0.002 {open}
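The optimization above amounts to looking the method up once instead of on every iteration. A hedged sketch in Python 3 that checks both spellings produce the same text and times them with timeit. The function names and the sizes are arbitrary, and the timings will not match the Python 2 cProfile figures above; on current interpreters the difference may be small.

```python
import timeit

lenpad = 7
nums = range(1, 100001)


def pad_method_call():
    # The .rjust attribute is looked up on every iteration.
    return [str(num).rjust(lenpad) for num in nums]


def pad_bound_once():
    rjust = str.rjust                     # look the method up a single time
    return [rjust(str(num), lenpad) for num in nums]


same_output = pad_method_call() == pad_bound_once()
print("identical output:", same_output)
print("method call:", timeit.timeit(pad_method_call, number=5))
print("bound once:", timeit.timeit(pad_bound_once, number=5))
```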