#1 2014-12-10 12:13:46

kopiersperre
Member
Registered: 2011-03-22
Posts: 48

Renaming PDFs by first line in pdftotext output

Dear Arch community,

I would like to rename PDFs with cryptic names to their titles, which appear on the first line of each file's text. This is what I have tried:

pdftotext sw-b-13-0094.pdf
head -n 1 sw-b-13-0094.txt > title
mv sw-b-13-0094.pdf $(title)

But I can't figure out how to do this exactly.

#2 2014-12-10 13:13:38

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 30,332

Re: Renaming PDFs by first line in pdftotext output

You're just about there. The problem is that your second line creates a file called title, but the third line then tries to run a command called title to get the new name, because $(...) is command substitution. You could add "cat" to the third line as follows. This should work, but it would not be my recommended approach:

pdftotext sw-b-13-0094.pdf
head -n 1 sw-b-13-0094.txt > title
mv sw-b-13-0094.pdf "$(cat title)"

Instead, it'd be much cleaner to just use a shell variable:

pdftotext sw-b-13-0094.pdf
title=$(head -n 1 sw-b-13-0094.txt)
mv sw-b-13-0094.pdf "$title"

But this can be further improved by not littering all these text files all over - instead use a pipeline rather than actually creating a txt file:

mv sw-b-13-0094.pdf "$(pdftotext sw-b-13-0094.pdf - | head -n 1).pdf"

Now, hopefully it should be clear how you can even replace the current pdf filename (sw-b-13...) with a parameter ($1) for a script or shell function - or have this loop through all pdf files in a directory.  If you want help with that too, let us know.
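
For reference, here is a rough sketch of what that parameterised version might look like. The function names rename_pdf_by_title and rename_all_pdfs are made up for illustration, and the empty-title check and the -n flag are extra precautions rather than anything required by the approach above:

```shell
# Hypothetical helper (not from the thread): rename one PDF after the
# first line of its extracted text.
rename_pdf_by_title() {
    local doc="$1"
    local title
    # "-" makes pdftotext write to standard output instead of a .txt file.
    title=$(pdftotext "$doc" - | head -n 1)
    # Refuse to rename when no title was found (the result would be ".pdf").
    if [ -z "$title" ]; then
        echo "no title found in $doc" >&2
        return 1
    fi
    # Keep the file in its own directory; -n avoids overwriting an existing
    # file, and -- guards against names that start with a dash.
    mv -n -- "$doc" "$(dirname -- "$doc")/$title.pdf"
}

# Hypothetical wrapper: process every PDF in a directory (default: current).
rename_all_pdfs() {
    local dir="${1:-.}"
    local doc
    for doc in "$dir"/*.pdf; do
        [ -e "$doc" ] || continue   # the glob matched nothing
        rename_pdf_by_title "$doc"
    done
}
```

Something like "rename_all_pdfs ." would then handle the current directory. A title containing a slash would still make mv fail, so that case needs separate handling.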

EDIT: be careful to ensure that the first line of pdftotext actually has something meaningful.  If all the pdfs were created in the same way, this might be known.  But it is common for some whitespace or formatting character to be the first line.
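
One possible way to guard against that, as a sketch only: skip ahead to the first line that contains a letter or digit and trim the surrounding whitespace. The function name first_text_line is hypothetical, and whether "first line with an alphanumeric character" is the right rule depends on how your PDFs were produced:

```shell
# Hypothetical filter: read text on standard input and print the first
# line that contains a letter or digit, with leading and trailing
# whitespace removed. Blank lines and lone form feeds are skipped.
first_text_line() {
    sed -n '/[[:alnum:]]/{s/^[[:space:]]*//;s/[[:space:]]*$//;p;q;}'
}

# Intended use in place of "head -n 1" (doc is assumed to hold a PDF path):
#   title=$(pdftotext "$doc" - | first_text_line)
```

It is still worth eyeballing the proposed names (for example by echoing the mv command first) before renaming anything for real.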


"UNIX is simple and coherent" - Dennis Ritchie; "GNU's Not Unix" - Richard Stallman

#3 2014-12-10 14:03:01

bulletmark
Member
From: Brisbane, Australia
Registered: 2013-10-22
Posts: 681

Re: Renaming PDFs by first line in pdftotext output

Personally, I think it is clearer to split that into 2 lines:

title=$(pdftotext "$doc" - | head -n 1)
mv "$doc" "$title".pdf

BTW, you need that - to get pdftotext to output to standard output.
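
Also note that titles nearly always contain spaces, so the variables should be double-quoted wherever they are expanded. A tiny illustration of the difference, using a throwaway count_args helper and a made-up title:

```shell
# Hypothetical helper: print how many arguments it received.
count_args() { echo "$#"; }

title='Annual Report 2013'   # example value, not from a real PDF

count_args $title     # unquoted: split on spaces, prints 3
count_args "$title"   # quoted: passed as one argument, prints 1
```

With an unquoted title, mv would see several separate arguments instead of one destination name.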

#4 2014-12-10 14:39:11

kopiersperre
Member
Registered: 2011-03-22
Posts: 48

Re: Renaming PDFs by first line in pdftotext output

Works great with a bash loop. Thanks a lot!

Sometimes the title is too long and spans a second line. Could you help me solve this too?

#5 2014-12-11 01:19:34

bulletmark
Member
From: Brisbane, Australia
Registered: 2013-10-22
Posts: 681

Re: Renaming PDFs by first line in pdftotext output

OP, you need to be a little clearer about your requirements. How are you going to decide when to read 2 lines instead of 1, etc.? Anyhow, to grab the first 2 lines, joined by a single space, as the title, one approach could be:

title=$(pdftotext "$doc" - | head -n 2 | paste -d' ' -s)
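
If the number of title lines varies, one heuristic (purely an assumption, since it depends on pdftotext putting a blank line between the title and the body text) is to join everything up to the first blank line. A sketch with a hypothetical title_paragraph function:

```shell
# Hypothetical filter: read text on standard input, skip leading blank
# lines, then join the following non-blank lines with single spaces and
# stop at the next blank line.
title_paragraph() {
    awk '
        NF      { t = t (t == "" ? "" : " ") $0; next }  # collect text lines
        t != "" { exit }                                 # blank line after text
        END     { print t }
    '
}

# Intended use (doc is assumed to hold a PDF path):
#   title=$(pdftotext "$doc" - | title_paragraph)
```

If the title runs straight into the body text without a blank line, this will grab too much, so it may be worth capping the length or checking the result by hand.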

Last edited by bulletmark (2014-12-11 01:20:00)
