Let's assume there is a binary file on disk that contains data from the beginning of the file up to some point, and the rest of the file is empty, filled with zeros. The file has reserved some space on disk.
How can I find out the size of that data, excluding the zero part?
Last edited by xerxes_ (2024-05-15 14:49:03)
Offline
Read the file and count its size until the last non-zero value.
Offline
Is there any command line tool for that?
Offline
Take a look at sparse files: https://wiki.archlinux.org/title/Sparse_file
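One way to check whether a file is sparse is to compare its apparent size with the space actually allocated for it. The following is only a sketch: `file_usage` is a made-up helper name, and it assumes `st_blocks` is counted in 512-byte units, which is the case on Linux but is not guaranteed everywhere.

```c
#include <sys/stat.h>

/* Hypothetical helper: report apparent size and allocated bytes for path.
 * Returns 0 on success, -1 if stat() fails. */
int file_usage(const char *path, long long *apparent, long long *allocated)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    *apparent = (long long)st.st_size;
    /* Assumption: st_blocks is in 512-byte units, as on Linux. */
    *allocated = (long long)st.st_blocks * 512LL;
    return 0;
}
```

If the allocated figure is well below the apparent size, the file most likely has holes. If the two are about equal, the zeros are really stored on disk and have to be found by reading the file.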
Offline
If your file is not a sparse file, but is actually filled with trailing zeros:
#include <stdio.h>

/* Return the file size excluding trailing zero bytes, or -1 on I/O error. */
long get_file_size(FILE *fp)
{
    long pos;
    int c;

    if (fseek(fp, 0L, SEEK_END) != 0)
        return -1;
    pos = ftell(fp);
    if (pos < 0)
        return -1;

    /* Walk backwards one byte at a time until a non-zero byte is found. */
    while (pos > 0) {
        if (fseek(fp, pos - 1, SEEK_SET) != 0)
            return -1;
        c = fgetc(fp);
        if (c == EOF)
            return -1;
        if (c != 0x00)
            return pos;
        pos--;
    }
    return 0; /* empty file, or nothing but zeros */
}

int main(int argc, char *argv[])
{
    long size;
    FILE *fp;

    if (argc != 2) {
        fprintf(stderr, "Usage: %s <FILENAME>\n", argv[0]);
        return 1;
    }
    fp = fopen(argv[1], "rb"); /* binary mode */
    if (!fp) {
        fprintf(stderr, "Error: Can't open file\n");
        return 1;
    }
    size = get_file_size(fp);
    fclose(fp);
    if (size < 0) {
        fprintf(stderr, "Error: Can't read file\n");
        return 1;
    }
    printf("File size except trailing zeros: %ld bytes\n", size);
    return 0;
}
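Seeking and reading one byte at a time is simple, but it can get slow when the zero tail is large. A possible variant reads the file backwards in chunks instead. This is just a sketch: `data_size_chunked` and the 64 KiB `CHUNK` size are my own choices, not anything from the program above.

```c
#include <stdio.h>

#define CHUNK 65536L /* arbitrary chunk size chosen for this sketch */

/* Return the offset just past the last non-zero byte (0 if there is none),
 * or -1 on I/O error. Reads the file backwards, CHUNK bytes at a time. */
long data_size_chunked(FILE *fp)
{
    static unsigned char buf[CHUNK];
    long end;

    if (fseek(fp, 0L, SEEK_END) != 0)
        return -1;
    end = ftell(fp);
    if (end < 0)
        return -1;

    while (end > 0) {
        long start = end > CHUNK ? end - CHUNK : 0;
        size_t want = (size_t)(end - start);
        size_t i;

        if (fseek(fp, start, SEEK_SET) != 0)
            return -1;
        if (fread(buf, 1, want, fp) != want)
            return -1;
        /* Scan this chunk from its last byte towards its first. */
        for (i = want; i > 0; i--)
            if (buf[i - 1] != 0)
                return start + (long)i;
        end = start; /* chunk was all zeros, move one chunk back */
    }
    return 0;
}
```

It can be called on any stream opened with fopen(path, "rb"). The static buffer keeps the stack small but makes the function non-reentrant.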
Offline
Update:
@jaywk:
Thanks, your program is nice and works great in my case (where the file has data first and then nulls).
Or I can do:
bbe -b "/\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00/:10" -s -e "F d" -e "p h" -e "A \n" /dev/shm/i3-log-$(pidof i3) | head -n1
and make an alias for that.
Next I thought about a more general approach: finding all null bytes in the file, regardless of their position, and counting them to get the difference from the whole file size. At first I thought there was no ready-made command or program for this, but I found two:
grep -obUaP "\x00" binfile | wc -l
bbe -b "/\x00/:1" -s -e "F d" -e "p h" -e "A \n" binfile | wc -l
grep is slow on bigger files; bbe is more complicated, but fast.
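The same count can also be done with a few lines of C that read the file in blocks. This is a rough sketch only; `count_zero_bytes` is a name I made up, and the 64 KiB buffer size is arbitrary.

```c
#include <stdio.h>

/* Count bytes equal to 0x00 from the current stream position to EOF.
 * Returns the count, or -1 if a read error occurred. */
long long count_zero_bytes(FILE *fp)
{
    unsigned char buf[65536];
    long long zeros = 0;
    size_t n, i;

    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        for (i = 0; i < n; i++)
            if (buf[i] == 0)
                zeros++;
    return ferror(fp) ? -1 : zeros;
}
```

Subtracting the result from the total file size gives the number of non-zero bytes.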
Last edited by xerxes_ (2024-05-12 09:53:57)
Offline
Counting "null" bytes in a binary file is nonsensical. Many of those bytes are not "empty"; they represent actual data that just happens to have a zero value at that position.
What is this binfile? Do you know the format? If so, there's likely something specific for the format in question. You know enough, apparently, to know that the unused tail-end of the file has been zeroed out (rather than just being random data which would be just as likely in unused space).
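To illustrate the point about zero bytes being real data: a 32-bit integer with the value 1 is stored as one non-zero byte and three zero bytes. A small sketch (the helper name `zero_bytes_in_u32` is made up for this example):

```c
#include <stdint.h>
#include <string.h>

/* Count zero bytes in the in-memory representation of a 32-bit value. */
int zero_bytes_in_u32(uint32_t value)
{
    unsigned char raw[sizeof value];
    int zeros = 0;
    size_t i;

    memcpy(raw, &value, sizeof value);
    for (i = 0; i < sizeof raw; i++)
        if (raw[i] == 0)
            zeros++;
    return zeros;
}
```

Calling it with 1 gives 3 regardless of byte order, so a file full of small integers would look mostly "empty" to a null counter.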
Last edited by Trilby (2024-05-11 19:35:35)
"UNIX is simple and coherent" - Dennis Ritchie; "GNU's Not Unix" - Richard Stallman
Online
@Trilby:
See my updated post #6 about the binfile log file format.
Moreover, it's interesting to see, for example, how many nulls different types of compressed files, jpg files, iso images, etc. contain. I don't want to remove these nulls.
Offline