Hello,
I have almost 400,000 images from a scientific experiment. All are in a binary format specific to the facility where the experiment took place, but they can be opened with Octave or ImageJ (both verified) and maybe with ImageMagick as well (I haven't managed that yet).
Each file has a header that starts with "{" and ends with "}".
An example:
{
HeaderID = EH:000001:000000:000000 ;
Image = 1 ;
ByteOrder = LowByteFirst ;
DataType = UnsignedShort ;
Dim_1 = 2048;
Dim_2 = 2048;
Size = 8388608;
count_time = Na ;
point_no = 0 ;
preset = Na ;
col_end = 2047;
col_beg = 0;
row_end = 2047;
row_beg = 0;
col_bin = 1;
row_bin = 1;
time = Wed Nov 23 21:41:05 2011;
time_of_day = 1322080865.991661;
dir = /data/visitor/HM1_diffTomo_zone2_LR2um_15x12;
suffix = .edf;
prefix = HM1_diffTomo_zone2_LR;
run = 1;
title = ESPIA FRELON Image 0001 [# 0];
time_of_frame = 0.271140;
}
^daedee_c^bbbidg`hfigg`cabddeede_ggida...
MORE DATA MORE DATA.........
The problem is that these files are very large and the format is not recognized by many "regular" image processing programs.
My goal is to make two files from each original: a text file containing only the header, and a lossless .tif image (converted from all the binary data, minus the header).
I'm looking for a script that could do this for many images. Any ideas would be very helpful.
Thanks,
L.
Offline
It is hard to tell you anything without knowing more about the files. Could you upload two examples (probably stripped of all relevant data or created as dummies)?
Offline
Would
sed -n 2,25p $filename > $filename.header
work? It simply prints the lines 2 through 25 to a file.
sed 1,26d $filename > $filename.tif
prints all but the first 26 lines to a file.
Offline
Thanks for the replies!
The problem with:
sed -n 2,25p $filename > $filename.header
is that the header is not always 24 lines long. In some files it is longer, and in some it is shorter. The only reliable way to separate the header from the rest of the file is that it begins with "{" and ends with "}".
As for:
sed 1,26d $filename > $filename.tif
Even ignoring that, it can't be done like this, because what remains is raw binary data, while a .tif image needs its own file structure (and is usually compressed). I was thinking of calling Octave or ImageMagick to convert the header-stripped file into a .tif, and then deleting the stripped file while keeping the original. I can't figure out how to do this, though; I'm very new to scripting.
I uploaded one file for example:
http://dl.dropbox.com/u/14434681/HM1_di … LR0001.edf
Awebb, I could upload two or more, but they all are basically the same, except the length of the header and the data itself, obviously.
Thank you again for your help!
Offline
You say that ImageJ will view the files. I don't use ImageJ, but if the one I've found is the same one, you could write a batch file that uses ImageJ to convert each file.
Here: http://rsbweb.nih.gov/ij/docs/guide/use … Section-16 and here: http://rsbweb.nih.gov/ij/docs/guide/use … tion-23.10 are particularly relevant.
awk should get the header information for you:
awk '!p;/^\}/{p=1}'
Last edited by Roken (2012-02-12 12:11:08)
Ryzen 5900X 12 core/24 thread - RTX 3090 FE 24 Gb, Asus B550-F Gaming MB, 128Gb Corsair DDR4, Cooler Master N300 chassis, 5 HD (2 NvME PCI, 4SSD) + 1 x optical.
Linux user #545703
/ is the root of all problems.
Offline
The problem is not only getting the info from the header, but also writing a new file, which is like the original, but without the header. (Because only then I can open it in ImageJ)
I found two useful sed commands that can do this:
First reads everything between FOO and BAR:
sed -n '/FOO/,/BAR/p' test.txt
And the second - writes a file with everything except what's between FOO and BAR:
sed '/FOO/,/BAR/d' input.txt > output.txt
The only problem that remains is: when I replace FOO with { and BAR with }, it finds many such pairs in the file.
How can I do it only for the first instance of FOO and the first instance of BAR?
Offline
I have written a sed script that deletes the contents of the first opening and closing {}'s.
Here is a test file
{
1
22
333
4444
}
55555
{
666666
}
7777777
88888888
sed -n ':start; s/{.*}//;t blank; N; s/\n//;t start; :blank;N;s/\n//; :rest;p;N;s/.*\n//;b rest' < test_file
Generates:
55555
{
666666
}
7777777
88888888
There must be a simpler way to achieve this...
Last edited by zorro (2012-02-12 21:46:47)
Offline
How can I do it only for the first instance of FOO and the first instance of BAR?
What about this (for, say, test_file):
tail --lines=+$(($(grep -nm1 '}' test_file | awk -F':' '{print $1}')+1)) test_file
The grep piped through awk returns the line number of the first "BAR".
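For the real image files, the same idea (find where the first closing brace ends and keep everything after it) can be sketched in Python on raw bytes, which avoids running line-oriented tools over binary data. This is only a sketch: the function name split_at_first_brace is made up for illustration, and it assumes the header's closing "}" is immediately followed by a newline, as in the example header.

```python
def split_at_first_brace(data: bytes) -> tuple:
    """Return (header, body), cutting right after the first '}' + newline."""
    end = data.find(b"}\n")
    if end == -1:
        raise ValueError("no closing '}' followed by a newline found")
    cut = end + 2  # keep the '}' and its newline in the header part
    return data[:cut], data[cut:]


# Same content as the test file posted earlier in the thread.
sample = b"{\n1\n22\n333\n4444\n}\n55555\n{\n666666\n}\n7777777\n88888888\n"
header, body = split_at_first_brace(sample)
print(body.decode(), end="")
```

Unlike the grep version, this looks for "}" at the end of a line rather than anywhere in a line, and later braces in the data are left alone because only the first match is used.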
Offline
Y'know, I saw this as something of a challenge, and I spent ages trying to work that out. I finally got a rather inelegant solution using wc and bc, but I was sure it should be doable with grep and awk. TY
Offline
Happy to help
Offline
The only problem that remains is: when I replace FOO with { and BAR with }, it finds many such pairs in the file.
How can I do it only for the first instance of FOO and the first instance of BAR?
Maybe you can try this?
sed -n '/^{/,/}/p' input
or this
sed '/^{/,/}/d' input > output
Ask, and it shall be given you.
Seek, and ye shall find.
Knock, and it shall be opened unto you.
Offline
Hi,
just do it with Python:
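#!/usr/bin/env python
import sys

def main(filename):
    # Read as bytes: everything after the header is binary image data.
    with open(filename, 'rb') as fp:
        # Split only at the first '}\n'; the same bytes may occur in the data.
        header, imgdata = fp.read().split(b'}\n', 1)
    with open('{0}.txt'.format(filename), 'wb') as fp:
        fp.write(header.strip(b'{\n '))
    # Note: this is the raw pixel data minus the header, not yet a real TIFF.
    with open('{0}.tif'.format(filename), 'wb') as fp:
        fp.write(imgdata)

if __name__ == '__main__':
    try:
        main(sys.argv[1])
    except IndexError:
        print('Usage: python {0} FILENAME'.format(sys.argv[0]))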
Whitie
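One caveat with writing the bytes after the header straight to a .tif file: the result is still headerless raw pixel data, so most viewers will not recognise it as a TIFF. Here is a minimal sketch of wrapping such data in an uncompressed 16-bit grayscale TIFF using only the standard library. The function name raw16_to_tiff is made up for illustration. It assumes DataType = UnsignedShort and ByteOrder = LowByteFirst, as in the example header. Resolution tags are left out, which most readers tolerate but a strict one may not.

```python
import struct


def raw16_to_tiff(pixels: bytes, width: int, height: int) -> bytes:
    """Wrap little-endian 16-bit grayscale pixels in a single-strip TIFF."""
    if len(pixels) != width * height * 2:
        raise ValueError("pixel data does not match width * height * 2 bytes")
    data_offset = 8                      # pixel data directly after the file header
    ifd_offset = data_offset + len(pixels)
    # (tag, type, count, value); type 3 = SHORT, type 4 = LONG; tags ascending
    entries = [
        (256, 4, 1, width),              # ImageWidth
        (257, 4, 1, height),             # ImageLength
        (258, 3, 1, 16),                 # BitsPerSample
        (259, 3, 1, 1),                  # Compression: none
        (262, 3, 1, 1),                  # PhotometricInterpretation: black is zero
        (273, 4, 1, data_offset),        # StripOffsets
        (277, 3, 1, 1),                  # SamplesPerPixel
        (278, 4, 1, height),             # RowsPerStrip: whole image in one strip
        (279, 4, 1, len(pixels)),        # StripByteCounts
    ]
    out = struct.pack("<2sHI", b"II", 42, ifd_offset) + pixels
    out += struct.pack("<H", len(entries))
    for tag, typ, count, value in entries:
        if typ == 3:
            # SHORT values sit in the first two bytes of the 4-byte value field
            out += struct.pack("<HHIHH", tag, typ, count, value, 0)
        else:
            out += struct.pack("<HHII", tag, typ, count, value)
    out += struct.pack("<I", 0)          # no further image directories
    return out


# Tiny 2x2 all-black image as a stand-in for real detector data.
tiff = raw16_to_tiff(bytes(2 * 2 * 2), 2, 2)
print(len(tiff))
```

For a real file, pixels would be the bytes after the header, and width and height the Dim_1 and Dim_2 values from it. The Size field in the example header (8388608) equals 2048 * 2048 * 2, which is the same length check the function performs.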
Offline
igndenok, what you suggested is not good, because there may be a line that starts with "{" somewhere in the file, not just the first line of the header.
Found a way to extract the header only using sed:
sed -n -e '/{/,/}/p' -e '/}/q' HM1_diffTomo_zone2_LR0001.edf > hout
And found a way to convert the image to tif, even if it does have a header:
convert -endian LSB -depth 16 -size 2048x2048+1024 gray:HM1_diffTomo_zone2_LR0001.edf -auto-level -compress zip image.tif
Here, 2048 is the x and y dimension, and 1024 is the length of the header in bytes.
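Since the dimension and the header length are the only parts of that command that change per file, one way to handle many images is to build the argument list from them. A small sketch, with a made-up helper name convert_command; it assumes square images and the same convert options as above.

```python
def convert_command(src, dim, header_len, dst):
    """Build the ImageMagick argument list for one square 16-bit image."""
    return [
        "convert", "-endian", "LSB", "-depth", "16",
        # WIDTHxHEIGHT+OFFSET: the offset skips the header bytes
        "-size", "{0}x{0}+{1}".format(dim, header_len),
        "gray:" + src,
        "-auto-level", "-compress", "zip",
        dst,
    ]


cmd = convert_command("HM1_diffTomo_zone2_LR0001.edf", 2048, 1024, "image.tif")
print(" ".join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing a list to subprocess.run avoids shell quoting problems with unusual file names.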
So now there is no need to extract the header from the original, only to know its length. But another problem arose: not all the pictures are 2048 pixels in x and y (although all are square).
So now, my goal is:
1) To copy the header to new file. (now solved)
2) Define a variable of its length in bytes. (du -b? But du gives me the length + name, and I just want the length...)
3) Define a variable of the dimensions from the header, meaning extract the number from the line "Dim_1 = 2048; ". (how?)
4) Convert to tif using these two variables. (now solved)
Thanks for any ideas!
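Steps 1 and 2 can also be done without writing a temporary header file. This rough sketch reads only as much of the file as needed to reach the first "}" followed by a newline, and the length of what it returns is the byte offset of the image data. The name read_header and the chunk size are assumptions for illustration, and the temporary file only stands in for a real .edf file.

```python
import os
import tempfile


def read_header(path, chunk_size=512):
    """Read from the start of the file up to and including the first '}' + newline."""
    buf = b""
    with open(path, "rb") as fp:
        while True:
            end = buf.find(b"}\n")
            if end != -1:
                return buf[:end + 2]
            chunk = fp.read(chunk_size)
            if not chunk:
                raise ValueError("no header terminator found in " + path)
            buf += chunk


# Stand-in file: a tiny fake header followed by 32 bytes of "image" data.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"{\nDim_1 = 4;\nDim_2 = 4;\n}\n" + bytes(32))

header = read_header(tmp.name)
head_length = len(header)  # byte offset where the image data begins
print(head_length)
os.remove(tmp.name)
```

Because only the start of each file is read, this stays cheap even across hundreds of thousands of large images.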
Last edited by srulop (2012-02-13 13:40:46)
Offline
You can get the dimensions with:
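XDIM=`grep -a -m1 Dim_1 "$FILE" | sed 's/.*= \([0-9]*\).*/\1/'`
YDIM=`grep -a -m1 Dim_2 "$FILE" | sed 's/.*= \([0-9]*\).*/\1/'`
Where $FILE is either the name of the original file or the extracted header saved to a file. The -a makes grep treat the binary original as text instead of only reporting that a binary file matches, and -m1 stops at the first match.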
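If more header fields than the two dimensions are needed, the "key = value ;" lines can be collected into a dictionary in one pass. A sketch in Python: parse_header is a made-up name, and the pattern assumes one field per line ending in a semicolon, as in the example header.

```python
import re


def parse_header(text):
    """Collect 'key = value ;' lines from an EDF-style header into a dict."""
    pattern = r"^[ \t]*(\w+)[ \t]*=[ \t]*(.*?)[ \t]*;[ \t]*$"
    fields = {}
    for match in re.finditer(pattern, text, re.MULTILINE):
        fields[match.group(1)] = match.group(2)
    return fields


# A few lines taken from the example header at the top of the thread.
header_text = """{
HeaderID = EH:000001:000000:000000 ;
ByteOrder = LowByteFirst ;
DataType = UnsignedShort ;
Dim_1 = 2048;
Dim_2 = 2048;
Size = 8388608;
}
"""
fields = parse_header(header_text)
xdim, ydim = int(fields["Dim_1"]), int(fields["Dim_2"])
print(xdim, ydim)
```

All values come back as strings, so numeric fields such as Dim_1 need an explicit int() as shown.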
Last edited by Roken (2012-02-13 14:36:16)
Offline
Of course! Thanks Roken!
And for the length in bytes:
headLength=`du -b "$header" | cut -f1`
Thank you very much guys, helped me a lot!
Offline
Using the test file from my previous post.
Create the header:
sed -n '/{/,/}/p; /}/q' < test_file > header
Extract the body:
comm -13 --nocheck-order header test_file > body
Offline