#1 2024-05-05 10:00:34

xerxes_
Member
Registered: 2018-04-29
Posts: 690

[SOLVED] Get real size of content of file not empty part of it

Let's assume there is a binary file on disk that contains some data from the beginning of the file up to some point, and the rest of the file is empty, i.e. filled with zeros. The file has some space reserved on disk.
How can I find out the size of that data without the zero part?

Last edited by xerxes_ (2024-05-15 14:49:03)

#2 2024-05-05 10:02:02

Allan
Pacman
From: Brisbane, AU
Registered: 2007-06-09
Posts: 11,411

Re: [SOLVED] Get real size of content of file not empty part of it

Read the file and count its size up to the last non-zero byte.
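A minimal sketch of that idea in C, assuming the file is opened in binary mode: it reads forward in chunks and remembers where the last non-zero byte was seen. The name `data_size_forward` and the 64 KiB chunk size are my own choices for illustration, not from any existing tool.

```c
#include <stdio.h>

/* Hypothetical helper: scans the stream forward in fixed-size chunks and
 * returns the size of the data up to and including the last non-zero byte
 * (0 for an empty or all-zero file), or -1 on a read error. */
long long data_size_forward(FILE *fp)
{
    unsigned char buf[65536];
    long long offset = 0;      /* file offset of buf[0] */
    long long data_end = 0;    /* one past the last non-zero byte seen so far */
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        /* Only the last non-zero byte in this chunk matters. */
        for (size_t i = n; i > 0; i--) {
            if (buf[i - 1] != 0) {
                data_end = offset + (long long)i;
                break;
            }
        }
        offset += (long long)n;
    }
    return ferror(fp) ? -1 : data_end;
}
```

Usage: `FILE *fp = fopen(path, "rb"); long long n = data_size_forward(fp);`. Since it never seeks, it also works on pipes such as `stdin`, at the cost of reading the whole file once.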

#3 2024-05-05 10:04:22

xerxes_
Member
Registered: 2018-04-29
Posts: 690

Re: [SOLVED] Get real size of content of file not empty part of it

Is there any command line tool for that?

#4 2024-05-05 10:53:48

AtomicFS
Member
Registered: 2024-05-05
Posts: 1

Re: [SOLVED] Get real size of content of file not empty part of it

Take a look at Sparse files https://wiki.archlinux.org/title/Sparse_file
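If the file really is sparse (the zero tail is an unallocated hole, not written zeros), Linux can report where the data regions end without reading them, using `lseek()` with `SEEK_DATA`/`SEEK_HOLE`. A rough sketch, assuming Linux with glibc (`_GNU_SOURCE`); `end_of_last_data` is a hypothetical name. Holes are only reported at filesystem-block granularity, zeros that were actually written count as data, and a filesystem without hole support treats the whole file as data, so the result is then just the file size.

```c
#define _GNU_SOURCE          /* SEEK_DATA / SEEK_HOLE on glibc */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns the offset where the last data region of the
 * open file ends, i.e. where a trailing hole starts (or the file size if
 * there is no trailing hole). Returns 0 if the file holds no data at all and
 * -1 on error. Note: this moves the file offset of fd. */
off_t end_of_last_data(int fd)
{
    off_t pos = 0, end = 0;

    for (;;) {
        /* Next offset at or after pos that holds data. */
        off_t data = lseek(fd, pos, SEEK_DATA);
        if (data < 0) {
            if (errno == ENXIO)   /* only a hole (or EOF) after pos */
                break;
            return -1;
        }
        /* Start of the hole following this data region; EOF counts as an
         * implicit hole, so this never goes past the file size. */
        off_t hole = lseek(fd, data, SEEK_HOLE);
        if (hole < 0)
            return -1;
        end = hole;
        pos = hole;
    }
    return end;
}
```

Comparing the result with `st_size` from `stat()` shows how much of the file is a trailing hole.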

#5 2024-05-06 21:02:54

jaywk
Member
Registered: 2020-12-14
Posts: 12

Re: [SOLVED] Get real size of content of file not empty part of it

If your file is not a sparse file but is actually filled with trailing zeros:

#include <stdio.h>

long get_file_size(FILE *fp);

//-------------------------------------------------------------
// get file size except trailing zeros
// returns 0 for an empty or all-zero file, -1 on error
long get_file_size(FILE *fp)
{
    long pos;
    int c;

    // start at the end: pos is the total file size in bytes
    if (fseek(fp, 0L, SEEK_END) != 0 || (pos = ftell(fp)) < 0)
        return -1;

    // walk backwards one byte at a time until a non-zero byte is found
    while (pos > 0) {
        if (fseek(fp, pos - 1, SEEK_SET) != 0)
            return -1;
        c = fgetc(fp);
        if (c == EOF)
            return -1;
        if (c != 0x00)
            return pos;     // last data byte is at offset pos - 1
        pos--;
    }

    return 0;               // empty file or only zeros
}//------------------------------------------------------------

int main(int argc, char* argv[])
{
    long size;
    FILE *fp;

    if (argc != 2) {
        fprintf(stderr, "Usage: %s <FILENAME>\n", argv[0]);
        return 1;
    }

    fp = fopen(argv[1], "rb");   // binary mode
    if (!fp) {
        fprintf(stderr, "Error: Can't open file\n");
        return 1;
    }

    size = get_file_size(fp);
    fclose(fp);
    if (size < 0) {
        fprintf(stderr, "Error: Can't read file\n");
        return 1;
    }
    printf("File size except trailing zeros: %ld bytes\n", size);

    return 0;
}
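Seeking once per byte can get slow when the zero tail is large. A sketch of a backward scan that reads 64 KiB blocks instead; `get_file_size_blocks` is a hypothetical name and the block size is arbitrary. It returns 0 for an empty or all-zero file and -1 on error, and like the code above it uses `long` offsets, so very large files on 32-bit systems would need `fseeko`/`ftello`.

```c
#include <stdio.h>

/* Hypothetical variant: reads the file backwards in 64 KiB blocks and returns
 * the size up to the last non-zero byte (0 for an empty or all-zero file),
 * or -1 on error. */
long get_file_size_blocks(FILE *fp)
{
    unsigned char buf[65536];
    long end;

    if (fseek(fp, 0L, SEEK_END) != 0 || (end = ftell(fp)) < 0)
        return -1;

    while (end > 0) {
        /* The block covers offsets [start, end). */
        long start = end > (long)sizeof buf ? end - (long)sizeof buf : 0;
        size_t len = (size_t)(end - start);

        if (fseek(fp, start, SEEK_SET) != 0 || fread(buf, 1, len, fp) != len)
            return -1;
        /* Scan this block from its last byte towards its first. */
        for (size_t i = len; i > 0; i--)
            if (buf[i - 1] != 0)
                return start + (long)i;
        end = start;
    }
    return 0;
}
```

Only the zero tail plus at most one block of data is read, so the cost depends on the length of the tail rather than on the size of the file.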

#6 2024-05-11 19:07:16

xerxes_
Member
Registered: 2018-04-29
Posts: 690

Re: [SOLVED] Get real size of content of file not empty part of it

Update:
@jaywk:
Thanks, your program works great in my case (data first in the file, then nulls) and is nice.
Or I can do:

bbe -b "/\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00/:10" -s -e "F d" -e "p h" -e "A \n" /dev/shm/i3-log-$(pidof i3) | head -n1

and make an alias for that.
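As far as I can tell, the bbe command above reports the offset of the first run of ten zero bytes. The same idea as a C sketch: `first_zero_run` is a hypothetical helper and `run` plays the role of the ten-byte pattern (it must be at least 1). Unlike the trailing-zero scan, it stops at the first long enough run, so data that itself contains such a run would be cut short.

```c
#include <stdio.h>

/* Hypothetical helper: returns the offset of the first run of at least `run`
 * consecutive zero bytes (run >= 1), the total number of bytes read if there
 * is no such run, or -1 on a read error. */
long long first_zero_run(FILE *fp, long long run)
{
    long long offset = 0;   /* offset of the byte just read */
    long long start = -1;   /* start of the current zero run, -1 if none */
    int c;

    while ((c = getc(fp)) != EOF) {
        if (c == 0) {
            if (start < 0)
                start = offset;
            if (offset - start + 1 >= run)
                return start;
        } else {
            start = -1;     /* a non-zero byte breaks the run */
        }
        offset++;
    }
    return ferror(fp) ? -1 : offset;
}
```

Usage: `first_zero_run(fp, 10)` on a file opened with `fopen(path, "rb")`.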

Next I thought about a more general approach: finding all null bytes in the file regardless of their position and counting them, to get the difference from the whole file size. At first I thought there was no ready-made command or program for this, but I found two:

grep -obUaP "\x00" binfile | wc -l
bbe -b "/\x00/:1" -s -e "F d" -e "p h" -e "A \n" binfile | wc -l

grep is slow for bigger files; bbe is more complicated to use, but fast.
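A sketch of the same count in C: it reads the file in chunks and also reports the total number of bytes, so the non-zero part is `total - zeros`. `count_zero_bytes` is a hypothetical name. Keep in mind that such a count also includes zero bytes that are part of the actual data.

```c
#include <stdio.h>

/* Hypothetical helper: counts the zero bytes in a stream, reading it in
 * 64 KiB chunks. Stores the total number of bytes read in *total (if total
 * is not NULL). Returns the zero-byte count, or -1 on a read error. */
long long count_zero_bytes(FILE *fp, long long *total)
{
    unsigned char buf[65536];
    long long zeros = 0, bytes = 0;
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        for (size_t i = 0; i < n; i++)
            zeros += (buf[i] == 0);
        bytes += (long long)n;
    }
    if (ferror(fp))
        return -1;
    if (total)
        *total = bytes;
    return zeros;
}
```

A single sequential pass like this avoids spawning one output line per match, which is what the `grep -o ... | wc -l` pipeline has to do.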

Last edited by xerxes_ (2024-05-12 09:53:57)

#7 2024-05-11 19:34:05

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 29,616

Re: [SOLVED] Get real size of content of file not empty part of it

Counting "null" bytes in a binary file is nonsensical.  Many of those bytes are not "empty" but representing actual data just with a zero value at that position.

What is this binfile?  Do you know the format?  If so, there's likely something specific for the format in question.  You know enough, apparently, to know that the unused tail-end of the file has been zeroed out (rather than just being random data which would be just as likely in unused space).

Last edited by Trilby (2024-05-11 19:35:35)


"UNIX is simple and coherent..." - Dennis Ritchie, "GNU's Not UNIX" -  Richard Stallman

#8 2024-05-12 10:11:02

xerxes_
Member
Registered: 2018-04-29
Posts: 690

Re: [SOLVED] Get real size of content of file not empty part of it

@Trilby:
See my updated post #6 for the binfile format (it's a log file).

Moreover, it's interesting, for example, how many nulls different types of compressed files, JPEG files, ISO images, etc. contain. I don't want to remove these nulls.
