
#1 2020-10-01 00:13:32

porcelain1
Member
Registered: 2020-01-18
Posts: 101

The many ways of command-line options

Sorry for the wall of text, but I've been reading about this and I'd like to share a braindump of what I learned.

The first time I came across the idea that there are different ways to pass options to a command was when I looked up the man page for ps, which right away explains that it "accepts several kinds of options:"

       1   UNIX options, which may be grouped and must be preceded by a dash.
       2   BSD options, which may be grouped and must not be used with a dash.
       3   GNU long options, which are preceded by two dashes.

And then I began paying attention and no longer simply typed magic character sequences to get what I wanted done.

So there's also the X toolkit style according to ESR, which "confusingly, uses a single dash and keyword options". He also explains the UNIX and GNU styles and their origins, but now I'm left in doubt about BSD. Oh, and we have Microsoft software like robocopy and chkdsk that accepts options as either keywords or letters, but preceded by a slash and with everything written in caps. And there are the programs that accept a command (a keyword without preceding dashes) as the first argument, such as git, udisksctl, pip, apt and others. Last, there's the operator (in uppercase) + options style, as in pacman, gcc and groff.

I also saw divergences in how to pass an argument to an option. GCC's -O optimization option takes its argument immediately after the option with nothing in between (e.g. -O2), but I guess it is more commonplace to separate them with a space. I've also seen equal signs (=) between the option and the argument, and also colons (:).

Finally, the handling of contradictory options. I don't know if there are programs that abort when conflicting options are passed, but I think priority is generally given to the last option passed, as in the links Web browser when the -g and -no-g options (for graphical mode and text mode respectively) are passed at the same time. But then there is a program, whose name I can't recall, that used a plus sign (+) instead of a dash when an option was meant to enable something, while the dash would disable it or have the opposite effect. I think I've also seen man pages describing strict precedence and priority rules for conflicting options, regardless of their position.

I think there's more to talk about. While some programs read from stdin and write to stdout by default and expect the user to explicitly use -i or -o to read from and write to files, others expect a single dash (-) when stdin must be read. There's also how the position of options affects the execution of the program, as in ImageMagick, and how youtube-dl deprecated -t, --id and -A in favor of using an output template exclusively with the -o option.

Anyway, I'm trying to get the hang of programming, and I'm wondering about many things. What style should I choose? Should I support many styles, or is it okay to pick only one? Or, if there are any standards that need to be followed, which would they be? I guess it depends on use cases, right? I really like the operator + options way of passing arguments, it feels so intuitive.


Behemoth, wake up!


#2 2020-10-01 01:33:40

GaKu999
Member
From: US/Eastern
Registered: 2020-06-21
Posts: 696

Re: The many ways of command-line options

porcelain1 wrote:

What style should I choose?

For your software? The style that you prefer and that's easy to implement in your argument parser, or just the style your parser accepts.

porcelain1 wrote:

Should I support many styles, or is it okay to pick only one?

You are not forced to support every argument style out there; just declare the style in your help|usage|manpage and you're fine.
If you have a magic parser that accepts everything and doesn't give you a headache, then support many styles if you wish; otherwise pick the one you know how to implement.

porcelain1 wrote:

Or, if there are any standards that need to be followed, which would they be?

AFAIK, only that the options get parsed without bugs :P

porcelain1 wrote:

I guess it depends on use cases, right?

Yep


My repos | Some snippets

Heisenberg might have been here.


#3 2020-10-01 02:01:25

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 30,410
Website

Re: The many ways of command-line options

I second all of the above.  This is entirely up to you and what's useful to your program.  Does your program even have options?  If not, then there is no reason to concern yourself.  If it has options, do the options also have arguments?  If so, are they mandatory or optional, or does it vary per option?

In any case, many programming languages have tools / libraries to take much of this off your plate and handle it in (relatively) standardized ways.  For shell scripts there is getopt or getopts, in C there is a getopt function in unistd.h, in Python there is also a getopt implementation but argparse may be preferred.

That said, I do a fair bit of programming and have very rarely had need for any of these.  If your program accepts just one or two options or flags and one or two arguments, it just doesn't seem worth it to me.  The getopt-related tools are definitely useful for something like a shell interpreter or compiler that has a wide range of flags, (optional) arguments, and parameters.

You may also find programs that will document in a specific style, but really accept a much wider range of options.  Many of my programs, for example, would process only the first letter of a flag ignoring any leading hyphens, so all of the following would be treated identically:

command a
command -a
command --a
command --all
command all

But the man page might specify just one or two of these, e.g.:

DESCRIPTION
   -a, --all
      Do foo to all bars

Again, this is practical (and potentially useful) if the program in question has a limited range of options / arguments.  Redundancy can be useful in such cases.  Of course if you need the option / argument syntax to be able to express many more possibilities, then redundancy will get in the way, and you might need to differentiate --all, --and, and --any, or yet other flags that could start with 'a'.

So pick the approach that will work for your program (while also thinking ahead a bit for features you might want to add in the future).  Realize too that these are all just tradition.  Sticking within a tradition can be helpful to provide a familiar interface to experienced users, but still these styles are just traditions.  All executables are simply provided with an array including everything passed on the command line tokenized either by the shell or by the calling exec function.
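
A minimal sketch of the "first letter, ignoring leading hyphens" idea in POSIX sh. The helper name flag_letter is made up for illustration and is not taken from any real program:

```shell
#!/bin/sh
# Hypothetical helper: reduce any spelling of a flag to its first letter,
# ignoring leading hyphens, so that "a", "-a", "--a", "--all" and "all"
# are all treated as "a".
flag_letter() {
    arg=$1
    # Strip leading hyphens one at a time.
    while [ "${arg#-}" != "$arg" ]; do
        arg=${arg#-}
    done
    # Everything after the first character...
    rest=${arg#?}
    # ...is removed again, leaving just that first character.
    printf '%s\n' "${arg%"$rest"}"
}

# Each of the five spellings prints the same letter.
for spelling in a -a --a --all all; do
    flag_letter "$spelling"
done
```

A case statement on the result (a, h, v, ...) is then all the dispatching a small program needs. The obvious cost is that --all, --and and --any can no longer be told apart.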

Last edited by Trilby (2020-10-01 02:10:42)


"UNIX is simple and coherent" - Dennis Ritchie; "GNU's Not Unix" - Richard Stallman


#4 2020-10-01 21:12:43

porcelain1
Member
Registered: 2020-01-18
Posts: 101

Re: The many ways of command-line options

This truly clears up some of my doubts about passing options, since I didn't know the reasoning behind the different forms of passing options in different software in the first place. Actually, I think I could safely conclude that anything beyond personal taste and ease of implementation, as long as it's consistent, could be bikeshedding.

Trilby wrote:

You may also find programs that will document in a specific style, but really accept a much wider range of options.  Many of my programs, for example, would process only the first letter of a flag ignoring any leading hyphens, so all of the following would be treated identically:

command a
command -a
command --a
command --all
command all

This is very unexpected... I tried something similar with youtube-dl, and passing incomplete options like --audio-f (for --audio-format) or --form (for --format) actually works, but shortening further (e.g. --audio- mp3) makes it complain:

youtube-dl: error: ambiguous option: --audio- (--audio-format, --audio-quality?)
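
The matching rule itself can be sketched in a few lines of POSIX sh. This is only an illustration of unambiguous-prefix matching, not youtube-dl's actual code; expand_option and the LONG_OPTIONS list are made-up names, and the list holds just the three options mentioned above:

```shell
#!/bin/sh
# Hypothetical option table (only the options named in this post).
LONG_OPTIONS="--audio-format --audio-quality --format"

# Print the full option name for an abbreviation. Complain on stderr and
# return non-zero if it matches no option or more than one.
expand_option() {
    abbrev=$1
    matches=
    count=0
    for candidate in $LONG_OPTIONS; do
        case $candidate in
            "$abbrev")      # an exact match always wins
                printf '%s\n' "$candidate"
                return 0 ;;
            "$abbrev"*)     # candidate starts with the abbreviation
                matches="$matches $candidate"
                count=$((count + 1)) ;;
        esac
    done
    case $count in
        1) printf '%s\n' "${matches# }" ;;
        0) printf 'error: no such option: %s\n' "$abbrev" >&2
           return 1 ;;
        *) printf 'error: ambiguous option: %s (%s?)\n' "$abbrev" "${matches# }" >&2
           return 1 ;;
    esac
}
```

With this table, expand_option --form and expand_option --audio-f each resolve to a single option, while expand_option --audio- fails because two candidates share that prefix.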

I experimented with passing options surrounded by quotes, and as far as I saw, programs would correctly accept them even with quotes inside the option like rm -"R"f or pacman '-Sy'u.

Out of curiosity, I created a file named -Rf among others, and typing rm * actually removes everything, including directories, as if the file name had been parsed as an option, leaving the file named -Rf intact as if it weren't a file. Now I see the usefulness of -- to say explicitly when to stop reading options and begin treating everything literally (which I had to use when touch -Rf failed). I tried to create a file named something like /home/bbs/Random/test/important-files just to see if rm would interpret it as a location, but I couldn't create the file, and from searching online, apparently slashes are illegal in filenames haha

I guess this falls into the code injection category of attacks, like when a SQL command passed inside user input is executed, or the explanation behind always quoting variables inside shell scripts.


Behemoth, wake up!


#5 2020-10-01 21:26:10

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 30,410
Website

Re: The many ways of command-line options

porcelain1 wrote:

I experimented with passing options surrounded by quotes, and as far as I saw, programs would correctly accept them even with quotes inside the option like rm -"R"f or pacman '-Sy'u.

That has nothing to do with how 'rm' or 'pacman' process arguments.  The shell removes those quotes before passing the arguments to rm or pacman.  As seen in the following example, 'script' just prints the arguments it receives, one per line:

$ cat script
#!/bin/sh

printf "%s\n" $@

$ ./script -"R"f '-Sy'u
-Rf
-Syu

porcelain1 wrote:

Out of curiosity, I created a file named -Rf among others, and typing rm * actually removes everything

This too has nothing to do with how 'rm' handles arguments.  The * is expanded by the shell before it is passed to rm.  So assuming the following hypothetical content:

$ ls -1 -F
subdir1/
subdir2/
-Rf
file1
file2

When you run 'rm *' in that directory, the 'rm' program is passed the following argument array:

rm
-Rf
subdir1
subdir2
file1
file2

'rm' then interprets this as an -R and an -f flag from -Rf as well as 4 targets (subdir1/2 and file1/2, but not a file actually named '-Rf').  Again, this is purely due to shell expansion rules (and may even differ on different shells, though the present behaviors are defined by POSIX).  There is also nothing (globally or generally) special about "--" in an argument list.  The shell will expand and tokenize input just the same still.  Many programs however (particularly those using certain getopt implementations) will stop looking for arguments / options when they encounter an argument of just "--"; everything after that will be interpreted as a target.  But this assumes the program in question even makes a distinction between "arguments" and "targets" - this depends on the logic of the program.
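
To make that concrete, here is a rough sketch in POSIX sh using the getopts builtin. The function classify is a made-up stand-in for a program like rm; it only reports how it would read each argument and deletes nothing:

```shell
#!/bin/sh
# Hypothetical stand-in for a program like rm: report which arguments
# were taken as options and which as targets.
classify() {
    OPTIND=1                     # restart option parsing on every call
    while getopts Rf opt; do
        case $opt in
            R|f) printf 'option: -%s\n' "$opt" ;;
            *)   return 2 ;;     # unknown option
        esac
    done
    shift $((OPTIND - 1))        # drop the options (and a "--", if any)
    for target in "$@"; do
        printf 'target: %s\n' "$target"
    done
}

classify -Rf subdir1 file1         # roughly what "rm *" passed above
classify -- -Rf subdir1 file1      # "--" ends option parsing
classify ./-Rf ./subdir1 ./file1   # what "rm ./*" would pass instead
```

In the first call -Rf is consumed as two options; in the other two it is reported as a target, either because "--" ended option parsing or because the argument no longer starts with a hyphen.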

porcelain1 wrote:

I guess this falls into the code injection category of attacks, like when a SQL command passed inside user input is executed, or the explanation behind always quoting variables inside shell scripts.

Those are conceptually related issues, but not really the same.

Last edited by Trilby (2020-10-01 21:30:04)


"UNIX is simple and coherent" - Dennis Ritchie; "GNU's Not Unix" - Richard Stallman


#6 2020-10-19 18:29:03

porcelain1
Member
Registered: 2020-01-18
Posts: 101

Re: The many ways of command-line options

Thanks for the insights on shell expansion, Trilby. I've been progressing in my C knowledge and I've finally tried getopt() from unistd.h. I previously had the misunderstanding that the program "knew" what was an option, an argument or a file path, and I never realized that these semantics need to be deduced (hence parsed) by the program receiving the arguments, and how crazy the obligation would be to handle quotes, to think ahead about edge cases like filenames that look like options, or to decide whether to handle * as filename expansion or as part of an expression for grep or find.

From my experiments with getopt() from unistd.h, given this synopsis:

foo [-abc] [-d arg] [-e arg] [-f arg]

its behavior is to interpret both of these as the same:

$ foo -abcdef
$ foo -a -b -c -d ef
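
The same grouping rule can be checked without compiling anything, using the shell's getopts builtin, which takes the same kind of optstring. This is only a sketch; parse_foo is a made-up function standing in for the hypothetical foo:

```shell
#!/bin/sh
# Hypothetical parser for: foo [-abc] [-d arg] [-e arg] [-f arg]
parse_foo() {
    OPTIND=1                     # restart option parsing on every call
    while getopts abcd:e:f: opt; do
        case $opt in
            a|b|c) printf '%s\n' "flag -$opt" ;;
            d|e|f) printf '%s\n' "option -$opt = $OPTARG" ;;
            *)     return 2 ;;   # unknown option or missing argument
        esac
    done
}

parse_foo -abcdef
parse_foo -a -b -c -d ef
```

Both calls report the flags -a, -b, -c and then -d with the argument "ef": since -d takes an argument, the rest of the group is its value rather than further options.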

I saw a fuss online over the glibc implementation of getopt(), since it defaults to a non-standard behavior where non-options are permuted to the end of the argument array. This can be disabled with a non-standard '+' at the beginning of optstring (or by setting POSIXLY_CORRECT in the environment). I initially implemented argument parsing on my own, but in the end this did the trick afaik:

char optstring[] =
#ifdef __GLIBC__
    "+"
#endif
    "abcd:e:f:";

GaKu999 wrote:
porcelain1 wrote:

Or, if there are any standards that need to be followed, which would they be?

AFAIK, only that the options get parsed without bugs :P

Trilby wrote:

You may also find programs that will document in a specific style, but really accept a much wider range of options. [..]

I think this is the most important lesson. Whatever weird syntax is permitted, it must be fundamental to provide simple documentation that makes sense.

Meanwhile, I found some interesting things. I'm currently not into shell scripting, but BashFAQ/035 cites options in the form of a single dash followed by a keyword (-foo) as "Tcl-style".

I also stumbled upon a link to chapter 12, "Utility Conventions", of POSIX.1-2017, which "describes the argument syntax of the standard utilities". Apart from some widespread concepts, I found most interesting how it refers to the term flag as historical, and states that commands should support an option and its option-argument appearing together without any intervening space between them to "ensure continued operation of historical applications" -- I guess this syntax must be one of the oldest, then, merely recognized for backwards compatibility.

Last, I see how the use of Tcl-style or X toolkit style (-bar) kills opportunities for option grouping (-b -a -r), and I'd say the same for when the option-argument is passed without intervening spaces after its option: is -Tpdf equal to -T pdf or to -fdpT? The uppercase mitigates the confusion, nevertheless.


Behemoth, wake up!


#7 2020-10-20 00:13:49

eschwartz
Fellow
Registered: 2014-08-08
Posts: 4,097

Re: The many ways of command-line options

If -T requires a value, then anything following it *must* be a value and not be interpreted using grouping.

Last edited by eschwartz (2020-10-20 00:14:13)


Managing AUR repos The Right Way -- aurpublish (now a standalone tool)


#8 2021-06-01 02:54:02

cyberphiliac
Member
From: Golgafrincham
Registered: 2004-08-20
Posts: 9

Re: The many ways of command-line options

I know this thread is several months old, but I was browsing the forums and noticed that there are some useful things to add here that don't really merit a new thread.

I hope the necrobump gods are appeased by my offerings.

Trilby wrote:
porcelain1 wrote:

I experimented with passing options surrounded by quotes, and as far as I saw, programs would correctly accept them even with quotes inside the option like rm -"R"f or pacman '-Sy'u.

That has nothing to do with how 'rm' or 'pacman' process arguments.  The shell removes those quotes before passing the arguments to rm or pacman.  As seen in the following example, 'script' just prints the arguments it receives, one per line:

$ cat script
#!/bin/sh

printf "%s\n" $@

$ ./script -"R"f '-Sy'u
-Rf
-Syu

However, 'script' will print an argument it receives across multiple lines if that argument contains any IFS (Internal [or Input] Field Separator) characters. Typically, these are space, tab, and newline (i.e., IFS=$' \t\n').

For example:

$ ./script 'foo bar' yada
foo
bar
yada

$ ./script 'foo
> bar' yada
foo
bar
yada

You can more clearly see what is happening by adding brackets to the printf format string in the script.

For example:

$ cat ./script
#!/bin/sh

printf "[%s]\n" $@

$ ./script 'foo bar' yada
[foo]
[bar]
[yada]

$ ./script 'foo
> bar' yada
[foo]
[bar]
[yada]

To fix this, just double-quote the reference to the '@' special parameter.

For example:

$ cat ./script
#!/bin/sh

printf "[%s]\n" "$@"

$ ./script 'foo bar' yada
[foo bar]
[yada]

$ ./script 'foo
> bar' yada
[foo
bar]
[yada]

The output of the second command line in the previous example still doesn't satisfy the  "one per line" criterion (because of the embedded newline in the first argument), but at least it clearly indicates that the first two lines of output constitute a single argument.

If you use Bash (since version 4.4), you can genuinely achieve the "one per line" objective by applying the 'Q'  parameter transformation operator to the '@' parameter. This will output each argument on a single line (with no literal newlines at all, apart from the one used as the end-of-argument delimiter in the printf format string), and it will be in a form suitable for reuse as input.

For example:

$ cat ./script
#!/bin/bash

printf "%s\n" "${@@Q}"

$ ./script 'foo bar' yada
'foo bar'
'yada'

$ ./script 'foo
> bar' yada
$'foo\nbar'
'yada'

Last edited by cyberphiliac (2021-06-01 02:55:44)


#9 2022-07-10 22:12:40

porcelain1
Member
Registered: 2020-01-18
Posts: 101

Re: The many ways of command-line options

I read something interesting:

DOUGHERTY; O'REILLY, 1988, p. 74 wrote:

The UNIX hyphen command can be used to print out all of the hyphenation points in a file formatted with nroff or troff -a.

$ nroff options files | hyphen

or:

$ troff options -a files | hyphen

If your system doesn’t have the hyphen command, you can use grep instead:

$ nroff options files | grep '-$'

(The single quotation marks are important because they keep grep from interpreting the - as the beginning of an option.)

Earlier, it is stated:

DOUGHERTY; O'REILLY, 1988, p. 13 wrote:

The prompt that appears on your screen may be different from the one shown in the examples in this book. There are two widely used shells: the Bourne shell and the C shell. Traditionally, the Bourne shell uses a dollar sign ($) as a system prompt, and the C shell uses a percent sign (%). The two shells differ in the features they provide and in the syntax of their programming constructs. However, they are fundamentally very similar. In this book, we use the Bourne shell.

I wonder if Bash was actually not handling quotes and passing them as-is to grep, leaving it to getopt() to parse what we take for granted from shells nowadays.

I'd like to study Bash or Unix source code from back then and check for myself what was going on; for now I'll just leave it registered here, however.

______

DOUGHERTY, Dale; O'REILLY, Tim. 1988. UNIX Text Processing. 2nd ed. Available at: https://www.oreilly.com/openbook/utp/.


Behemoth, wake up!


#10 2022-07-11 19:28:42

rowdog
Member
From: East Texas
Registered: 2009-08-19
Posts: 118

Re: The many ways of command-line options

That book was written before bash was released. It's a reference to the original Bourne shell from Bell Labs. Wikipedia has a pretty good history and includes a link to some historical source code from 1979.

https://en.wikipedia.org/wiki/Bourne_shell


#11 2022-07-11 19:48:33

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 30,410
Website

Re: The many ways of command-line options

While I've never used shells that old, I have used various "descendants" of the Bourne shell on various *nixes, all of which handle quoting in similar ways due to shared ancestry.  I'd be quite surprised to find that /bin/sh from 1988 would actually pass quotes verbatim in an argument like that.  So surprised, in fact, that I'd sooner suspect that this was an error in the book and the intent was to have a backslash before the hyphen in the quotes (and perhaps this was lost in some stage of the text processing).
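
For what it's worth, on a current system the quotes alone don't help, since the shell strips them and grep still receives an argument starting with a hyphen. A sketch of one way around it, assuming a grep that supports the POSIX -e option; the function name lines_ending_in_hyphen is made up:

```shell
#!/bin/sh
# Hypothetical helper: print the lines of stdin that end in a hyphen.
# -e marks the next argument as the pattern, so its leading "-" is not
# read as the start of an option.
lines_ending_in_hyphen() {
    grep -e '-$'
}

# Only the first of these three lines ends in a hyphen.
printf '%s\n' 'hyphen-' 'ation' 'no-break here' | lines_ending_in_hyphen
```

Putting "--" before the pattern should work as well with a getopt-based grep, for the reasons discussed earlier in the thread.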

Last edited by Trilby (2022-07-11 19:50:36)


"UNIX is simple and coherent" - Dennis Ritchie; "GNU's Not Unix" - Richard Stallman
