[program_options] Proposal: self-contained, header-only port of Boost Program Options library

Boost - Dev mailing list
I am interested in creating a header-only implementation of the Boost
Program Options library that only depends on the C++ standard library.
Program Options uses several other Boost libraries, so I would have to
re-implement some of it using standard library constructs.

I have 2 questions for the community:
1. Would you use something like this if it were available?
2. Do you know of any implementation details of Program Options which might
make some part of this difficult or impossible?

To be clear, I do not intend for this to be merged into Boost in any form.

Rationale:
There is no portable command-line argument-parsing capability in the C++
standard library. There's getopt, but that's in unistd.h which is only
available on Unix-based systems. The only widely-used C++ command-line
parsing library I am aware of is Program Options, but that requires adding
a dependency on Boost to your project, which seems like overkill to me. I
would like to be able to simply add a project as a submodule in my Git repo
and #include it without even having to add anything to my build files. The
goal is to ensure that the library is as portable and easy to include as
possible, because it shouldn't be difficult to parse command-line options.

I appreciate any thoughts, comments, or criticisms!

-Vicram Rajagopalan

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [program_options] Proposal: self-contained, header-only port of Boost Program Options library

On Thu, 12 Sep 2019 at 08:04, Vicram Rajagopalan via Boost <
[hidden email]> wrote:

> I would like to be able to simply add a project as a submodule in my Git
> repo and #include it without even having to add anything to my build files.
>

Why not try the recently proposed Lyra (https://bfgroup.github.io/Lyra/),
released under the Boost Software License? Lyra is a fork of the dormant
Clara and is maintained by Rene Rivera, who is also a contributor to Boost
(and Conan, I believe).

degski
--
@realdegski
https://brave.com/google-gdpr-workaround/
"We value your privacy, click here!" Sod off! - degski
"Anyone who believes that exponential growth can go on forever in a finite
world is either a madman or an economist" - Kenneth E. Boulding
"Growth for the sake of growth is the ideology of the cancer cell" - Edward
P. Abbey

Re: [program_options] Proposal: self-contained, header-only port of Boost Program Options library

On 12.09.19 05:02, Vicram Rajagopalan via Boost wrote:
> 1. Would you use something like this if it were available?

I would not use it because I do not use Boost Program Options, and I do
not expect a straight port to solve the problems I have with Boost
Program Options.  These problems are:

1. Unicode support is based on wchar_t instead of UTF-8.  wchar_t has an
implementation-defined width, which makes it unsuitable for portable
Unicode code.  The correct way to handle Unicode in general is to use
narrow strings encoded as UTF-8.  The correct way to handle Unicode on
Unix systems is to accept narrow strings and to assume that they are in
UTF-8, regardless of locale.  The correct way to handle Unicode on
Windows is to accept wide strings and convert them to UTF-8 immediately
when received.

I could, of course, perform my own conversion to UTF-8 and pass the
result to Boost Program Options, but that approach seems brittle given
that Boost Program Options assumes that 8-bit strings are in the "local
8-bit encoding".

2. I have found that code that uses Boost Program Options is neither
easier to write nor more maintainable than code which parses command
line options manually.


--
Rainer Deyke ([hidden email])


Re: [program_options] Proposal: self-contained, header-only port of Boost Program Options library

On 2019-09-12 06:02, Vicram Rajagopalan via Boost wrote:
> I am interested in creating a header-only implementation of the Boost
> Program Options library that only depends on the C++ standard library.
> Program Options uses several other Boost libraries, so I would have to
> re-implement some of it using standard library constructs.
>
> I have 2 questions for the community:
> 1. Would you use something like this if it were available?

No.

> 2. Do you know of any implementation details of Program Options which might
> make some part of this difficult or impossible?

Even if you require C++11, there is a considerable amount of Boost used
in ProgramOptions:

https://pdimov.github.io/boostdep-report/develop/program_options.html

It is not impossible to reimplement all that or redesign the library to
not require some of the components, but that would be a considerable
amount of work.

> To be clear, I do not intend for this to be merged into Boost in any form.
>
> Rationale:
> There is no portable command-line argument-parsing capability in the C++
> standard library. There's getopt, but that's in unistd.h which is only
> available on Unix-based systems. The only widely-used C++ command-line
> parsing library I am aware of is Program Options, but that requires adding
> a dependency on Boost to your project, which seems like overkill to me. I
> would like to be able to simply add a project as a submodule in my Git
> repo and #include it without even having to add anything to my build
> files. The goal is to ensure that the library is as portable and easy to
> include as possible, because it shouldn't be difficult to parse
> command-line options.
>
> I appreciate any thoughts, comments, or criticisms!

Boost is almost an implicit dependency of any of my projects, I find
myself using it extensively, so the dependency on it is not a problem.
Adding yet another dependency might be problematic, especially given
that there is Boost.ProgramOptions already.

I understand there probably are projects that need nothing but
Boost.ProgramOptions, where a standalone version might be useful.
However, I do not believe reimplementing well-known components, like
boost::any or boost::function or type traits for example, is a good
approach. As I said, you can mitigate some of this by raising the
minimum C++ version you require, but I don't believe raising it to e.g.
C++17 would ease the library adoption.

Another point is that I'm not quite happy with the API
Boost.ProgramOptions provides. If there is a new library, I would
probably prefer a simpler API, possibly employing C++11 features, rather
than a straight reimplementation. The new library should offer something
new compared to the existing solutions.

[program_options] Proposal: self-contained, header-only port of Boost Program Options library

> Sent: Thursday, 12 September 2019 at 05:02
> From: "Vicram Rajagopalan via Boost" <[hidden email]>
>
> I am interested in creating a header-only implementation of the Boost
> Program Options library that only depends on the C++ standard library.
> Program Options uses several other Boost libraries, so I would have to
> re-implement some of it using standard library constructs.
>
> I have 2 questions for the community:
> 1. Would you use something like this if it were available?

Probably not. If Boost is a dependency anyway, I'd probably stick to the
Boost version, and if I don't use Boost, I'm perfectly happy with one of the
alternatives (cxxopts and Clara, to name two).
A modernized version of Boost.ProgramOptions that is part of Boost would
probably be more appealing to me.

> 2. Do you know of any implementation details of Program Options which might
> make some part of this difficult or impossible?

It depends on which C++ standard you are targeting. I tried this myself some
time ago (nothing production-ready, just a quick proof of concept). I can't
remember the details, but I think I used a lot of C++17 features, so if you
target an earlier standard you will probably have to internalize a lot of
other Boost facilities.

Best

Mike

Re: [program_options] Proposal: self-contained, header-only port of Boost Program Options library

degski wrote:

> Why not try the recently proposed https://bfgroup.github.io/Lyra/ ,
> released under the Boost Software License. Lyra is forked (from dormant
> Clara) and maintained by Rene Rivera, who is also a contributor to Boost
> (and Conan I believe).

Lyra looks pretty good.


Re: [program_options] Proposal: self-contained, header-only port of Boost Program Options library

On Thu, Sep 12, 2019 at 7:30 AM Peter Dimov via Boost <[hidden email]>
wrote:

> degski wrote:
>
> > Why not try the recently proposed https://bfgroup.github.io/Lyra/ ,
> > released under the Boost Software License. Lyra is forked (from dormant
> > Clara) and maintained by Rene Rivera, who is also a contributor to Boost
> > (and Conan I believe).
>
> Lyra looks pretty good.
>

I agree with a lot of the points raised above about the problematic nature
of Boost.ProgramOptions.  I also think Lyra looks interesting.

If you're interested in solving problems in this space, rather than doing
a straight port, here are some things I would find very helpful, not all of
which Lyra provides:

- An options-specifying API similar to Python's argparse library (
https://docs.python.org/2/library/argparse.html).  That covers all the
permutations I've ever needed, and then some.
- The ability to serialize the options, so that I can easily use "response
files" (files containing command line options or some serialized form of
them), and/or hand-editable config files.  I find YAML to be an attractive
format for saving such things. YMMV.
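
To make the "response file" idea concrete, here is a minimal sketch. It
assumes a very simple format (whitespace-separated options, blank lines
ignored, lines starting with `#` treated as comments) and does not handle
quoting. The name `read_response_file` is hypothetical and is not taken from
Lyra, argparse, or any other library mentioned in this thread.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads command-line options from a stream.
// Assumed format: whitespace-separated tokens; blank lines and lines whose
// first non-blank character is '#' are skipped. Quoting is NOT supported.
std::vector<std::string> read_response_file(std::istream& in) {
    std::vector<std::string> args;
    std::string line;
    while (std::getline(in, line)) {
        const std::size_t first = line.find_first_not_of(" \t\r");
        if (first == std::string::npos || line[first] == '#') {
            continue;  // blank line or comment
        }
        std::istringstream tokens(line);
        std::string token;
        while (tokens >> token) {
            args.push_back(token);
        }
    }
    return args;
}
```

The resulting vector could then be fed to whatever parser handles the real
`argv`, so that options from a file and options typed on the command line
go through the same code path.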

Zach

Re: [program_options] Proposal: self-contained, header-only port of Boost Program Options library

On Thu, Sep 12, 2019 at 2:57 AM Rainer Deyke via Boost
<[hidden email]> wrote:
> I would not use it because I do not use Boost Program Options, and I do
> not expect a straight port to solve the problems I have with Boost
> Program Options.

My initial thinking was that to increase likelihood of adoption, it would
be a good idea to provide an interface that people are familiar with, but
it's true that the API has a lot of issues. This proposal may be a
non-starter, which is fine; I'm glad to get constructive feedback.

> 1. Unicode support is based on wchar_t instead of utf8.  wchar_t has an
> implementation-defined width which makes it unsuitable for portable
> Unicode code.  The correct way to handle Unicode in general is to use
> narrow strings encoded as utf-8.  The correct way to handle Unicode on
> Unix systems is to accept narrow strings and to assume that they are in
> utf-8, regardless of locale.  The correct way to handle Unicode on
> Windows is to accept wide strings and convert them to utf-8 immediately
> when received.

I'm not too familiar with dealing with non-ASCII character encodings
in argv. Is it portable to assume that the input is UTF-8, regardless of
locale?

-Vicram Rajagopalan

Re: [program_options] Proposal: self-contained, header-only port of Boost Program Options library

On Thu, Sep 12, 2019 at 11:01 AM Zach Laine via Boost
<[hidden email]> wrote:
> I agree with a lot of the points raised above about the problematic nature
> of Boost.ProgramOptions.  I also think Lyra looks interesting.

Regarding Lyra, I took a look at the Github repo a few weeks ago, but
as far as I could tell, it hasn't gained much traction. That was just my
impression from the low number of stars, watches, issues, and pull
requests. Are there any particular reasons that y'all recommend Lyra
in particular?
Granted, development/maintenance does seem to be active, which
is a good sign in my book.

> If you're interested in solving problems in this space, rather than doing
> a straight port, here are some things I would find very helpful, not all of
> which Lyra provides:

I suppose what I'd really like to see is a de-facto standard; right now,
it doesn't seem that one exists. Given that Boost.ProgramOptions is
not a particularly good example to follow, the best use of my time
may be to contribute to a healthy project. cxxopts is one that caught
my eye, as it seems more well-known than most other similar
projects. Does anyone have any impressions of cxxopts (or others)?

> - The ability to serialize the options, so that I can easily use "response
> files" (files containing command line options or some serialized form of
> them), and/or hand-editable config files.

Judging from the documentation, the Gflags library (Google's command-
line flags library) supports something like this, which they call a "flagfile":
https://gflags.github.io/gflags/
I've never used Gflags so I can't speak to whether it's any good.

-Vicram Rajagopalan

Re: [program_options] Proposal: self-contained, header-only port of Boost Program Options library

On 17/09/2019 16:10, Vicram Rajagopalan wrote:
> I'm not too familiar with dealing with non-ASCII character encodings
> in argv. Is it portable to assume that the input is UTF-8, regardless of
> locale?

It is not.

I'm probably ignorant of several things in this area myself, but the
basic version is:

* On Windows, argv is converted to the current system codepage unless
you are using the wmain/wWinMain entrypoints to get wchar_t strings
instead.  (And you should never ever use the converted values, as they
will only sometimes work, due to being a lossy conversion.)  It will
never be UTF-8, but you can rely on it being UTF-16 (when using
wmain/wWinMain).

* On Unixes, argv contains whatever byte sequence the shell/caller put
there.  This might be the actual filename on disk (if they used tab
completion) or it might be something subtly different (if they typed it
themselves using some kind of IME), or even a binary blob.  In the first
two cases, while it is fairly *likely* to be UTF-8 (especially in modern
systems), it is not guaranteed to be -- the user could be running a
non-UTF-8 locale, or be accessing a filesystem created by someone who
was.  Ideally, treat them as an opaque blob that can only be passed to
open() etc and never manipulated as text.  (Obviously, this is
frequently impractical.)


So, on Windows, you must use the wchar_t as input, and while you *could*
convert this to UTF-8 for internal use you still have to convert it back
to UTF-16 to actually make use of it with the OS.  Which is fine if
you're doing a lot of string manipulation (including option parsing) but
seems a bit wasteful if you're only using it as an opaque filename
token.  (And if you forget to convert back to UTF-16, it may interpret
your UTF-8 string as a local-codepage-ANSI string, and hilarity ensues.)

Whereas on Linux you can often get away with assuming that it's UTF-8,
but some valid filenames will break encoder-savvy code, and any string
conversions might output a no-longer-valid filename.
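
One way to act on this is to check whether an argument is well-formed UTF-8
before treating it as text, and otherwise to keep passing it around as an
opaque byte string. The following is only a sketch of such a check; the
name `is_valid_utf8` is hypothetical, and nothing here comes from
Boost.ProgramOptions or the other libraries discussed.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `s` is well-formed UTF-8.
// Rejects stray continuation bytes, truncated sequences, overlong
// encodings, surrogate code points, and values above U+10FFFF.
bool is_valid_utf8(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        unsigned long cp = 0;
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else return false;              // continuation or invalid lead byte

        if (i + len > n) return false;  // sequence is cut off
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (len == 2 && cp < 0x80) return false;     // overlong
        if (len == 3 && cp < 0x800) return false;    // overlong
        if (len == 4 && cp < 0x10000) return false;  // overlong
        if (cp > 0x10FFFF) return false;             // outside Unicode
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate
        i += len;
    }
    return true;
}
```

A filename written in Latin-1, for example, fails this check, which signals
that the argument should be handed to open() unchanged and not decoded.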

Re: [program_options] Proposal: self-contained, header-only port of Boost Program Options library

abseil flags is good and well supported https://abseil.io/docs/cpp/guides/flags

J

Re: [program_options] Proposal: self-contained, header-only port of Boost Program Options library

On 17.09.19 08:32, Gavin Lambert via Boost wrote:
> * On Unixes, argv contains whatever byte sequence the shell/caller put
> there.  This might be the actual filename on disk (if they used tab
> completion) or it might be something subtly different (if they typed it
> themselves using some kind of IME), or even a binary blob.  In the first
> two cases, while it is fairly *likely* to be UTF-8 (especially in modern
> systems), it is not guaranteed to be -- the user could be running a
> non-UTF-8 locale, or be accessing a filesystem created by someone who
> was.

Or the user could be running a non-UTF-8 locale, but accessing a
filesystem created by somebody who was using UTF-8 - in which case any
filenames should be in UTF-8, even if the user's locale disagrees.

It is because of this last possibility that I recommend treating all
command-line arguments as UTF-8 on Unix systems, even if running a
non-UTF-8 locale, for all cases where treating them as binary blobs is
impractical.  Unix filenames are binary blobs, but the de-facto standard
for interpreting these binary blobs as text is to use UTF-8.  How can
two users, running two different locales, share a filesystem?  By using
UTF-8 for all filenames, regardless of locale.  How should a program
convert command-line arguments into UTF-8 filenames?  By assuming that
they are already in UTF-8, because performing any kind of conversion
will cause more problems than it will fix.


--
Rainer Deyke ([hidden email])


Re: [program_options] Proposal: self-contained, header-only port of Boost Program Options library

Rainer Deyke wrote:
> Or the user could be running a non-UTF-8 locale, but accessing a
> filesystem created by somebody who was using UTF-8 - in which case any
> filenames should be in UTF-8, even if the user's locale disagrees.
>
> It is because of this last possibility that I recommend treating all
> command-line arguments as UTF-8 on Unix systems, even if running a
> non-UTF-8 locale, for all cases where treating them as binary blobs is
> impractical.  Unix filenames are binary blobs, but the de-facto standard
> for interpreting these binary blobs as text is to use UTF-8. [...]

How does any of this affect the library? It just gives you whatever you
passed as `argv`, without needing to interpret it.

Windows is a different story.


Re: [program_options] Proposal: self-contained, header-only port of Boost Program Options library

On Tue, Sep 17, 2019 at 12:02 AM Vicram Rajagopalan via Boost <
[hidden email]> wrote:

> On Thu, Sep 12, 2019 at 11:01 AM Zach Laine via Boost
> <[hidden email]> wrote:
> > I agree with a lot of the points raised above about the problematic
> > nature of Boost.ProgramOptions.  I also think Lyra looks interesting.
>
> Regarding Lyra, I took a look at the Github repo a few weeks ago, but
> as far as I could tell, it hasn't gained much traction. That was just my
> impression from the low number of stars, watches, issues, and pull
> requests. Are there any particular reasons that y'all recommend Lyra
> in particular?
>

I only just heard about it in this thread.


> Granted, development/maintenance does seem to be active, which
> is a good sign in my book.
>
> > If you're interested in solving problems in this space, rather than
> > doing a straight port, here are some things I would find very helpful,
> > not all of which Lyra provides:
>
> I suppose what I'd really like to see is a de-facto standard; right now,
> it doesn't seem that one exists. Given that Boost.ProgramOptions is
> not a particularly good example to follow, the best use of my time
> may be to contribute to a healthy project. cxxopts is one that caught
> my eye, as it seems more well-known than most other similar
> projects. Does anyone have any impressions of cxxopts (or others)?
>

IMO, what made {fmt} (the basis for C++20's std::format) a success is that
it took an existing and popular API for string formatting (from Python) and
implemented it efficiently.  If you were to do the same thing with Python's
argparse, I think the result would be similar.  I say this because all the
libraries above, and probably others besides, are each taking a particular
point of view, API-wise, that has not necessarily caught on.  Perhaps one
of them will, I don't know.  I *do* know that the argparse API has been
stable for years, and covers every scenario for handling command line
arguments that I have seen.
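
As a rough illustration of what an argparse-flavoured interface might look
like in C++, here is a deliberately tiny sketch. The method names
`add_argument` and `parse_args` are borrowed from Python's argparse; the
class itself, `add_flag`, and the string-only value model are assumptions
made for this example and do not describe any existing library.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, minimal argparse-style parser. All values are strings.
class ArgumentParser {
public:
    // Registers an option that takes one value, e.g. "--jobs 4".
    void add_argument(const std::string& name, const std::string& default_value) {
        defaults_[name] = default_value;
        is_flag_[name] = false;
    }

    // Registers a boolean switch, e.g. "--verbose" (stored as "true"/"false").
    void add_flag(const std::string& name) {
        defaults_[name] = "false";
        is_flag_[name] = true;
    }

    // Parses the arguments (without the program name) and returns every
    // registered option with either its parsed or its default value.
    std::map<std::string, std::string>
    parse_args(const std::vector<std::string>& args) const {
        std::map<std::string, std::string> result = defaults_;
        for (std::size_t i = 0; i < args.size(); ++i) {
            const std::string& name = args[i];
            const std::map<std::string, bool>::const_iterator it = is_flag_.find(name);
            if (it == is_flag_.end()) {
                throw std::invalid_argument("unknown option: " + name);
            }
            if (it->second) {
                result[name] = "true";
            } else {
                if (i + 1 >= args.size()) {
                    throw std::invalid_argument("missing value for " + name);
                }
                result[name] = args[i + 1];
                ++i;  // skip the value we just consumed
            }
        }
        return result;
    }

private:
    std::map<std::string, std::string> defaults_;
    std::map<std::string, bool> is_flag_;
};
```

A caller would register options once, then pass
`std::vector<std::string>(argv + 1, argv + argc)` to `parse_args`. A real
port of argparse would also need typed values, positional arguments,
subcommands, and help generation, none of which this sketch attempts.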

Zach

Re: [program_options] Proposal: self-contained, header-only port of Boost Program Options library

On Tue, Sep 17, 2019 at 8:17 AM Peter Dimov via Boost <[hidden email]>
wrote:

> Rainer Deyke wrote:
> > Or the user could be running a non-UTF-8 locale, but accessing a
> > filesystem created by somebody who was using UTF-8 - in which case any
> > filenames should be in UTF-8, even if the user's locale disagrees.
> >
> > It is because of this last possibility that I recommend treating all
> > command-line arguments as UTF-8 on Unix systems, even if running a
> > non-UTF-8 locale, for all cases where treating them as binary blobs is
> > impractical.  Unix filenames are binary blobs, but the de-facto standard
> > for interpreting these binary blobs as text is to use UTF-8. [...]
>
> How does any of this affect the library? It just gives you whatever you
> passed as `argv`, without needing to interpret it.
>
> Windows is a different story.
>

Indeed, you can just use UTF-8 (as long as you document this!) for
everything except Windows.  With Windows, you need to provide a
wchar_t/UTF-16 overload for every char/UTF-8 overload in your lib.

If you want 100% correctness, you are not allowed to arbitrarily convert
the wchar_t strings.  In particular, you are not allowed to convert them to
UTF-8, because it is possible that one of them is a filename, and it is
possible to construct filenames on the Windows platform that are not
properly UTF-16-encoded.  This means that the UTF-16 -> UTF-8 conversion is
lossy, if you follow the Unicode guidelines for that conversion -- you
should produce a replacement character (U+FFFD) where you encounter the
broken UTF-16.

Though it is possible to create files with such broken UTF-16 names, they
almost never come up in practice.  So, if you don't care about this case
that prevents 100% correctness, just provide wchar_t overloads, implement
each one by converting to UTF-8 and calling your UTF-8 overload, and only
define the wchar_t overloads when building on Windows.
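
The lossy conversion described above can be sketched as follows. This is an
illustration only: `utf16_to_utf8` and `append_utf8` are hypothetical names,
and `std::u16string` stands in for a Windows `wchar_t` string so that the
sketch compiles on any platform. Unpaired surrogates are replaced with
U+FFFD, which is exactly the point at which a broken filename can no longer
be round-tripped.

```cpp
#include <cstddef>
#include <string>

// Appends the UTF-8 encoding of code point `cp` to `out`.
// Assumes `cp` is a valid scalar value (the caller guarantees this).
static void append_utf8(std::string& out, unsigned long cp) {
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Hypothetical helper: converts UTF-16 to UTF-8, substituting U+FFFD for
// any unpaired surrogate (so the conversion is lossy for broken input).
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned long cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {            // high surrogate
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;                                   // consumed the low half
            } else {
                cp = 0xFFFD;                           // unpaired high surrogate
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;                               // unpaired low surrogate
        }
        append_utf8(out, cp);
    }
    return out;
}
```

A `wchar_t` overload on Windows could copy its argument into a
`std::u16string` (or use the platform's own conversion routine), call a
function like this one, and forward the result to the UTF-8 overload.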

Zach
