I'm trying to open pdf, docx, xlsx and other files on the command line. I have lesspipe installed, but it fails with errors like these:
lesspipe.sh file.pdf
==> append : to filename to view the PDF source
usage: html2text [-h] [--default-image-alt DEFAULT_IMAGE_ALT] [--pad-tables] [--no-wrap-links] [--wrap-list-items] [--ignore-emphasis]
[--reference-links] [--ignore-links] [--protect-links] [--ignore-images] [--images-as-html] [--images-to-alt]
[--images-with-size] [-g] [-d] [-e] [-b BODY_WIDTH] [-i LIST_INDENT] [-s] [--escape-all] [--bypass-tables] [--ignore-tables]
[--single-line-break] [--unicode-snob] [--no-automatic-links] [--no-skip-internal-links] [--links-after-para] [--mark-code]
[--decode-errors DECODE_ERRORS] [--open-quote OPEN_QUOTE] [--close-quote CLOSE_QUOTE] [--version]
[filename] [encoding]
html2text: error: unrecognized arguments: -from_encoding
lesspipe.sh file.xlsx
==> append : to filename to view the raw word document
usage: html2text [-h] [--default-image-alt DEFAULT_IMAGE_ALT] [--pad-tables] [--no-wrap-links] [--wrap-list-items] [--ignore-emphasis]
[--reference-links] [--ignore-links] [--protect-links] [--ignore-images] [--images-as-html] [--images-to-alt]
[--images-with-size] [-g] [-d] [-e] [-b BODY_WIDTH] [-i LIST_INDENT] [-s] [--escape-all] [--bypass-tables] [--ignore-tables]
[--single-line-break] [--unicode-snob] [--no-automatic-links] [--no-skip-internal-links] [--links-after-para] [--mark-code]
[--decode-errors DECODE_ERRORS] [--open-quote OPEN_QUOTE] [--close-quote CLOSE_QUOTE] [--version]
[filename] [encoding]
html2text: error: unrecognized arguments: -from_encoding
The above is the most frequent case; for some files it simply prints nothing, so I haven't successfully opened a single non-text file so far. I do have libreoffice and pandoc installed, and I don't know what the problem could be.
Edit: This seems to be a bug in lesspipe; I was able to fix it by editing the script.
Last edited by Togop (2021-10-03 21:08:08)
Offline
I have not used lesspipe myself, but reading the README and man page suggests that lesspipe enables less to view other file formats, i.e. try `less file.pdf`.
Otherwise could you share the output of
echo $LESSOPEN
which lesspipe.sh
Have you sourced your shellrc / restarted your shell after changing the environment?
Also, always describe what you have done; otherwise the only replies you get are guesswork at best.
Edit: fixed link
Last edited by lmn (2021-10-03 12:52:09)
Offline
Well, less simply calls lesspipe.sh to decode the file, so I get the exact same error. It just gets printed in the less screen.
What I've done is install the lesspipe package, make sure I have the listed dependencies for the file formats I'm interested in (libreoffice-fresh, pandoc, pdftotext, pdftohtml), and try to read some files. I got the error instead of the file contents.
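For anyone who wants to double-check the same thing, here is a minimal sketch that reports which helper commands are not on the PATH. The function name `missing_tools` is made up for illustration, and the command names are only the ones mentioned in this post (the `libreoffice` command name is an assumption), not lesspipe's full dependency list.

```shell
#!/bin/sh
# Print every name from the argument list that is NOT found on PATH.
# (missing_tools is a hypothetical helper, not part of lesspipe.)
missing_tools() {
    for tool in "$@"; do
        # command -v returns non-zero when the command cannot be resolved
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
    return 0
}

# Command names taken from this post; adjust for the formats you care about.
missing_tools pandoc pdftotext pdftohtml libreoffice
```

If this prints nothing, all listed commands were found, so a missing dependency is probably not the cause.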
Offline
Togop wrote:
Edit: This seems to be a bug in lesspipe; I was able to fix it by editing the script.
Same problem here. Would you share what exactly (which file and which lines) needs editing?
Offline
I cannot speak for Togop, but I have tested lesspipe myself and can confirm this behavior.
The problematic lines seem to be in the `parsehtml` function in `/usr/bin/lesspipe.sh`. One example:
html2text -utf8 2>/dev/null || html2text -from_encoding utf-8
So the problem arises from the nonexistent `-utf8` option.
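This probably also explains why the error in the first post complains about `-from_encoding` rather than `-utf8`: the first attempt's error goes to /dev/null, the fallback runs, and only the fallback's error is printed. A small sketch of that behavior, using two stand-in functions (hypothetical, they only imitate a command that rejects its option):

```shell
#!/bin/sh
# Stand-ins (hypothetical) for two invocations that both reject their option.
first_attempt()  { echo "error: unrecognized arguments: -utf8" >&2; return 2; }
second_attempt() { echo "error: unrecognized arguments: -from_encoding" >&2; return 2; }

# Same shape as the lesspipe.sh line: the first error is discarded,
# the fallback runs, and only the fallback's error reaches the screen.
run_chain() {
    first_attempt 2>/dev/null || second_attempt
}

# The chain fails as a whole; "|| true" just keeps this demo's exit status clean.
run_chain || true
```

Running it shows only the `-from_encoding` message, matching the shape of the output reported above.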
There are several differing implementations for html2text.
[1] https://github.com/Alir3z4/html2text
[2] http://www.mbayer.de/html2text
There are some more.
[1] is the implementation packaged in the repos that provide `html2text` and [2] is another one explicitly mentioned in the README.
[2] does provide the `-from_encoding` option.
All of these are incompatible, as they have different flags/syntax, and lesspipe.sh is trying to accommodate at least 2 different ones (I can't say which).
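A rough way to tell which flavor is installed is to look at the usage text it prints. The sketch below classifies usage text read from stdin. The function name and the labels are made up for illustration; the only markers it relies on are `--unicode-snob` (visible in the usage output in the first post) and `-from_encoding` (the option [2] provides).

```shell
#!/bin/sh
# Classify html2text usage/help text read from stdin.
# (classify_html2text_usage and its labels are hypothetical.)
classify_html2text_usage() {
    usage=$(cat)
    case $usage in
        # Checked first: the Python tool's error output also mentions
        # -from_encoding when it rejects that option.
        *--unicode-snob*) echo "python-style" ;;
        *-from_encoding*) echo "from_encoding-style" ;;
        *) echo "unknown" ;;
    esac
}

# Example (not run here; assumes the installed html2text prints usage for -h):
#   html2text -h 2>&1 | classify_html2text_usage
```

On Arch, `pacman -Qo "$(command -v html2text)"` should also show which package owns the binary.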
When lesspipe does not detect `html2text` on the system it falls back on another way of parsing files. I managed to get pdfs working by uninstalling `python-html2text`.
We should clarify whether this is a packaging error, in the sense that this optional dependency does not provide this functionality, or an upstream bug.
Also there is already a Github issue for this.
PS:
I could fix it by essentially separating the dash from utf8, which makes it adhere to the syntax of the packaged `html2text`:
html2text - utf8 2>/dev/null || html2text -from_encoding utf-8
This was just for testing; please avoid meddling with packaged files.
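If you want to try the change without touching the packaged file, one option is to apply it to a private copy and point LESSOPEN at that copy. This is only a sketch: the function name `patch_private_copy` and the `~/.local/bin` location are my assumptions, and I have not checked that a relocated lesspipe.sh behaves identically in every case.

```shell
#!/bin/sh
# Write a patched copy of a lesspipe script to a private location.
# The source file is only read; nothing happens until the function is called.
# (patch_private_copy is a hypothetical helper.)
patch_private_copy() {
    src=$1
    dst=$2
    # Apply the one-line change from this post to the copy only.
    sed 's/html2text -utf8/html2text - utf8/' "$src" > "$dst" && chmod +x "$dst"
}

# Example (not run here; paths are assumptions):
#   mkdir -p "$HOME/.local/bin"
#   patch_private_copy /usr/bin/lesspipe.sh "$HOME/.local/bin/lesspipe.sh"
#   export LESSOPEN="|$HOME/.local/bin/lesspipe.sh %s"
```

A package update will not overwrite the copy, but it also will not pick up upstream fixes, so it is worth dropping once the bug is fixed upstream.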
Edit: added warning
Last edited by lmn (2021-10-04 13:44:11)
Offline
Thank you, lmn!
I could not uninstall 'python-html2text' cause it's needed by calibre. I actually do not use calibre but keep it cause sometimes I use its 'ebook-convert' option.
Anyway, I edited the '/usr/bin/lesspipe.sh' file with your suggestion and got it to work.
Offline