I was writing a piece of C++ test code for one of my scripts that dumps the file size and the line count of a given file. To test the C++ code, I needed a set of files ranging from bytes to gigabytes in size. To create them I used the command below.
SIZE_FACTOR=M ; dd if=/dev/zero of=test bs=1$SIZE_FACTOR count=1
I ran my C++ program, called linereader, as:
./linereader test --no-print
and received this output:
LINE_COUNT : 1
FILE_SIZE_BYTES : 1048576
ROUNDED_SIZE : 1.04858Mb
Now when I run the command below:
wc -l test
I receive the following result:
0 test
Now, I know that the 'dd' command copied zeros from /dev/zero and wrote them to the file test, resulting in a 1 MiB (1048576-byte) file.
So there is data in the file. If nothing delimits it into multiple lines, or even one line, one could argue that 'wc' should not consider the file to contain any lines. But if there is data in the file and no delimiter, one could equally argue that the file is one line long. Put better, the file consists of a single 1 MiB line without a delimiter or end-of-file terminator. So for 'wc' to say that the file 'test' consists of zero lines would be incorrect.
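The two ways of counting can be put side by side. This is only a minimal Python sketch, not linereader's actual code, and the function names are made up for illustration: one function counts newline bytes only, the other also counts a trailing run of data that has no terminator.

```python
def count_newlines(data: bytes) -> int:
    """Count newline bytes only."""
    return data.count(b"\n")


def count_lines_with_tail(data: bytes) -> int:
    """Count newline-terminated lines, plus one for a non-empty unterminated tail."""
    count = data.count(b"\n")
    if data and not data.endswith(b"\n"):
        count += 1
    return count


zeros = b"\0" * 1048576  # same content as the dd-generated file
print(count_newlines(zeros))         # 0
print(count_lines_with_tail(zeros))  # 1
print(count_lines_with_tail(b""))    # 0
```

Under the second definition the zero-filled file is one line and the empty file is zero lines, which matches what linereader printed.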
Now, if I were to use the command below to generate an absolutely empty file:
touch Test
and run the following command, knowing that 'touch' creates an empty file.
wc -l Test
I would receive an accurate result, as below.
0 Test
And if I were to use 'linereader':
./linereader Test --no-print
I would also receive:
LINE_COUNT : 0
FILE_SIZE_BYTES : 0
ROUNDED_SIZE : 0b
That output makes sense, because the file contains nothing. But the file produced by 'dd' does contain data, so it would stand to reason that 'test' has one line, not the zero lines that 'wc -l' reported.
To take this further, I created another file called test, using the same 'dd' method, and split the file using 'split'.
split test -n 2
resulting in the files 'xaa' and 'xab'.
I then created a python3 script called 'tester.py' to read both files.
#! /usr/bin/python3
## I prefer to specify which python, so that when I use my scripts on distros that still link to python2.7, I won't have as many issues
## Print every line found in each half of the split file
for i in open("./xaa","r"):
    print(i)
for i in open("./xab","r"):
    print(i)
Running the code with:
python tester.py
Results in the displayed result:
[carl@sparknohss proff]$ python tester.py
[carl@sparknohss proff]$
What this shows is that there is indeed one line in each file, if a line is defined as data with or without a line terminator.
Again, let's take this one step further, and run the command displayed below.
python tester.py > tw
After doing so, run the command:
wc -l tw
You should now see the result:
2 tw
What happened is this: when the Python interpreter finished reading and printing the first file, print() placed a newline at the end of the data written to stdout, and then continued on to the next file. Effectively, this added a '\n' character to the end of each printed data set. So 'wc -l' now considers the file to have two lines, because 'wc -l' relies on the terminator to decide whether a line exists, even though the original data had no terminator.
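The effect of print() can be reproduced without the files. A minimal sketch, assuming each half of the split file holds 524288 NUL bytes and using io.StringIO as a stand-in for the opened files:

```python
import io

chunk = "\0" * 524288  # stand-in for the contents of xaa or xab

out = io.StringIO()  # stand-in for the redirected stdout (the file tw)
for name in ("xaa", "xab"):
    # With no newline in the data, iteration yields the whole content once.
    for line in io.StringIO(chunk):
        print(line, file=out)  # print() appends "\n" by default

written = out.getvalue()
print(written.count("\n"))  # 2
print(len(written))         # 1048578
```

The two extra bytes are the newlines added by print(), one per file.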
So, out of curiosity, as this is not really that important: would it not be better if 'wc' had an option to set the line-terminating character, so that other kinds of lines could be counted, or am I barking up the wrong tree (no pun intended)?
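To illustrate the kind of option I mean, here is a minimal Python sketch. The function name count_terminators and its parameters are made up for this example, and it is not how 'wc' is implemented; it just counts a caller-chosen terminator byte, reading the file in chunks.

```python
import os
import tempfile


def count_terminators(path, terminator=b"\n", chunk_size=1 << 20):
    """Count occurrences of a single-byte terminator in a file."""
    if len(terminator) != 1:
        raise ValueError("terminator must be exactly one byte")
    total = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total += chunk.count(terminator)
    return total


# Demo on a temporary zero-filled file, similar to the dd output.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * 1048576)
    demo_path = tmp.name

newline_count = count_terminators(demo_path)     # default terminator: "\n"
nul_count = count_terminators(demo_path, b"\0")  # NUL as the terminator
os.remove(demo_path)

print(newline_count)  # 0
print(nul_count)      # 1048576
```

With NUL as the terminator, the same file goes from zero "lines" to 1048576 of them, so the answer depends entirely on which byte is treated as the end of a line.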
Offline
wc doesn't count lines.
NAME
       wc - print newline, word, and byte counts for each file
...
       -l, --lines
              print the newline counts
`grep -c "." filename` can count lines.
"UNIX is simple and coherent" - Dennis Ritchie; "GNU's Not Unix" - Richard Stallman
Offline
Missed that one, and now I feel like an idiot. Thanks.
Offline
grep does not work either.
[carl@sparknohss linereader]$ cat test | grep -c "."
0
[carl@sparknohss linereader]$ grep -c . test
0
[carl@sparknohss linereader]$ grep -v . test
Binary file test matches
[carl@sparknohss linereader]$ grep -vc . test <- equivalent to `wc -c test`
1048576
[carl@sparknohss linereader]$ grep -cz '.' test
0
[carl@sparknohss linereader]$ grep -c '.*' test
1048576
From the grep man page:
       -v, --invert-match
              Invert the sense of matching, to select non-matching
              lines.
       -c, --count
              Suppress normal output; instead print a count of
              matching lines for each input file. With the -v,
              --invert-match option (see below), count non-matching
              lines.
       -z, --null-data
              Treat the input as a set of lines, each terminated by
              a zero byte (the ASCII NUL character) instead of a
              newline. Like the -Z or --null option, this option
              can be used with commands like sort -z to process
              arbitrary file names.
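My reading of the -z description is that every NUL ends a record, so this file becomes a long run of empty records, and a pattern like '.' needs at least one character to match. A rough Python model of that reading (an assumption about the behaviour, not grep's implementation):

```python
data = b"\0" * 1048576  # same content as the dd-generated file

# With NUL as the terminator, each record is the text before a NUL.
records = data.split(b"\0")
if records and records[-1] == b"":
    records.pop()  # drop the empty piece after the final terminator

# A record can only match '.' if it holds at least one character.
non_empty = [record for record in records if record]

print(len(records))    # 1048576
print(len(non_empty))  # 0
```

That would be consistent with `grep -cz '.' test` printing 0.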
Do you have any other suggestions?
Offline
I'm confused: what do you want to do? A file containing only null bytes is not a text file, so applying text file metrics to it makes no sense at all.
So we're clear here - it doesn't contain "0" as in the character, it contains \0 bytes, bytes with the value 0. How is that supposed to be a line?
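A quick way to see the difference, as a small Python sketch (the variable names are just for illustration):

```python
nul_byte = b"\0"   # what dd copies out of /dev/zero
zero_char = b"0"   # the printable digit zero
newline = b"\n"    # the byte that ends a line of text

print(nul_byte[0])            # 0
print(zero_char[0])           # 48
print(newline[0])             # 10
print(nul_byte == zero_char)  # False
```

None of the bytes in the file has the value 10, so nothing in it ever ends a line.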
Last edited by bullet (2016-08-31 05:10:17)
Offline