buffer_size behavior in boost::iostreams::filtering_istream

Boost - Dev mailing list
I have some code that works as intended, but it requires setting a
buffer_size parameter to zero on a std::ifstream pushed onto a filtering
chain, and I'd like to understand why, to ensure I'm not introducing a bug
or a hack.

I have essentially the following code:
--------------------------------------------------------------------------------------------------------
std::ifstream m_jf("json_filename", std::ios_base::in |
std::ios_base::binary);
std::locale utf8_locale("en_US.UTF-8");
m_jf.imbue(utf8_locale);

boost::iostreams::filtering_istream m_inbuf;
m_inbuf.push(boost::iostreams::bzip2_decompressor());
m_inbuf.push(m_jf);

std::string m_line;
while (std::getline(m_inbuf, m_line)) {
  // Process the current line from the JSON file
}
--------------------------------------------------------------------------------------------------------

What I find is that the std::getline call will fail before the code has
reached the EOF. It will always fail at the same line in a given JSON file,
but it will fail on different lines in different JSON files. It's perfectly
reproducible.
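
One way to narrow this down is to check which state flags are set once the loop exits. The following is a minimal sketch using only the standard library; the names ReadEnd, ReadReport and read_all_lines are made up for illustration, and the same function could be handed m_inbuf. As far as I understand the standard, an exception thrown inside a stream buffer (for example by a filter) is normally swallowed by std::getline and reported as badbit, so a badbit result would point at the chain rather than at the file contents.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical classification of why a getline loop stopped.
enum class ReadEnd { CleanEof, FailedBeforeEof, StreamError };

// Hypothetical result type: number of lines read and the reason for stopping.
struct ReadReport {
    std::size_t lines;
    ReadEnd end;
};

// Reads `in` line by line, then inspects the state flags to see why
// std::getline stopped returning a usable stream.
ReadReport read_all_lines(std::istream& in) {
    std::size_t lines = 0;
    std::string line;
    while (std::getline(in, line)) {
        ++lines;
    }
    if (in.bad()) {
        // badbit: the underlying stream buffer reported an unrecoverable error.
        return {lines, ReadEnd::StreamError};
    }
    if (in.eof()) {
        // eofbit: the source was exhausted, which is the normal way to finish.
        return {lines, ReadEnd::CleanEof};
    }
    // failbit alone: extraction failed although the source was not exhausted.
    return {lines, ReadEnd::FailedBeforeEof};
}
```

If the result is StreamError, calling m_inbuf.exceptions(std::ios_base::badbit) before the loop should let the underlying exception propagate instead of being converted into a flag, which may reveal the actual error message.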

However, if I pass a buffer_size of 0 as the second argument to both push calls:
     m_inbuf.push(boost::iostreams::bzip2_decompressor(), 0);
     m_inbuf.push(m_jf, 0);
then the problem goes away.

My question is: why does setting the buffer_size parameter to zero solve
the issue? What does this do, exactly? I saw the suggestion to set the
buffer size this way from an old post in 2009, and it appears to work, but
I'd like a deeper understanding of what's happening under the hood. If the
buffer size is set to zero, what does the underlying implementation do, and
how might this influence whether std::getline fails before the EOF?
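
For reference, this is roughly where the buffer_size argument sits in a push call. It is only a sketch: count_lines and HAVE_BOOST_IOSTREAMS are names invented here, the bzip2_decompressor is left out so that nothing beyond the Boost headers is needed, and when those headers are not found the function simply reads the source stream directly. It is not expected to reproduce the failure; it just shows the call shape, with -1 being the default value of the parameter in the push signature as I understand the documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <istream>
#include <sstream>
#include <string>

// Use Boost.Iostreams only if its headers can be found (assumption: the
// compiler supports __has_include, as recent GCC, Clang and MSVC do).
#if defined(__has_include)
#  if __has_include(<boost/iostreams/filtering_stream.hpp>)
#    include <boost/iostreams/filtering_stream.hpp>
#    define HAVE_BOOST_IOSTREAMS 1
#  endif
#endif
#ifndef HAVE_BOOST_IOSTREAMS
#  define HAVE_BOOST_IOSTREAMS 0
#endif

// Counts the lines readable from `src`. With Boost available, `src` is pushed
// onto a filtering_istream with an explicit buffer_size, mirroring the second
// argument used in the post. Without Boost, `src` is read directly.
std::size_t count_lines(std::istream& src, std::streamsize buffer_size) {
#if HAVE_BOOST_IOSTREAMS
    boost::iostreams::filtering_istream chain;
    chain.push(src, buffer_size);  // buffer_size is the second parameter
    std::istream& in = chain;
#else
    (void)buffer_size;  // unused in the fallback path
    std::istream& in = src;
#endif
    std::size_t lines = 0;
    std::string line;
    while (std::getline(in, line)) {
        ++lines;
    }
    return lines;
}
```

Comparing count_lines(stream, 0) against count_lines(stream, -1) on the same input, and against a line count obtained independently, is one way to turn the observation into a regression check.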

Thanks very much,
Justin

--
Justin McManus, Ph.D.
Principal Scientist
Lead Computational Biologist and Statistical Geneticist
Kallyope, Inc.
430 East 29th Street, Suite 1050
New York, NY 10016

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: buffer_size behavior in boost::iostreams::filtering_istream

Boost - Dev mailing list
To follow up on my original post, I have two additional observations:
1.) I'm currently using boost version 1_65_1. In version 1_58_0, the code
always read to the EOF without an issue, even with the default buffering.
2.) I tried making the buffer size arbitrarily large (1e8), but this had
almost no effect at all on the behavior of the code. Since that buffer size
is guaranteed to be large enough to hold any line in the input files I'm
processing, it would seem that a limitation in the buffer size is not the
underlying problem.

On Sat, Sep 5, 2020 at 2:23 PM Justin McManus <[hidden email]> wrote:

> [...]

Re: buffer_size behavior in boost::iostreams::filtering_istream

Boost - Dev mailing list
Perhaps file an issue here?

https://github.com/boostorg/iostreams/issues


On Sat, 5 Sep 2020 at 20:58, Justin McManus via Boost <[hidden email]>
wrote:

> [...]


--
Richard Hodges
[hidden email]
office: +442032898513
home: +376841522
mobile: +376380212


Re: buffer_size behavior in boost::iostreams::filtering_istream

Boost - Dev mailing list
It would be courteous (and probably wise) to update to the latest Boost version before raising an issue.

Both 1.65 and 1.58 are way out of date.

Hundreds of bugs have been squashed since then!

Users of Boost should plan their development process to update their Boost version regularly so as
not to get too far behind like this.

Paul

> -----Original Message-----
> From: Boost <[hidden email]> On Behalf Of Richard Hodges via Boost
> Sent: 6 September 2020 08:55
> To: [hidden email] List <[hidden email]>
> Cc: Richard Hodges <[hidden email]>
> Subject: Re: [boost] buffer_size behavior in boost::iostreams::filtering_istream
>
> [...]

Re: buffer_size behavior in boost::iostreams::filtering_istream

Boost - Dev mailing list
Thanks very much for the replies on this post.

Following Paul's advice, I did some thorough testing with version 1.74.0
(the latest boost version, as of this writing). The new version of boost
seems to resolve the issue entirely. This latest version works without
error regardless of how I set the buffering parameters.
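
For anyone who wants to guard against building with an older release, here is a small sketch of decoding the integer that the BOOST_VERSION macro (from boost/version.hpp) expands to. The struct, function and constant names are invented for illustration. The 107400 threshold only reflects what was tested in this thread (1.65.1 failing, 1.74.0 working); releases in between were not tried, so the actual fix may have landed earlier.

```cpp
#include <cassert>

// Hypothetical holder for the three components of a Boost release number.
struct BoostVersion {
    int major_number;
    int minor_number;
    int patch_level;
};

// Decodes the layout documented for BOOST_VERSION:
// major * 100000 + minor * 100 + patch.
constexpr BoostVersion decode_boost_version(int v) {
    return {v / 100000, (v / 100) % 1000, v % 100};
}

// 1.74.0, the release reported as working in this thread (an observation,
// not a statement about where the fix was introduced).
constexpr int kVersionReportedWorking = 107400;

// True if `v` is at least the release that was reported to work.
constexpr bool at_least_reported_working(int v) {
    return v >= kVersionReportedWorking;
}
```

In a real build one would pass BOOST_VERSION itself, for example at_least_reported_working(BOOST_VERSION), and log a warning or fail the configuration step when it returns false.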

Regards,
Justin

On Sun, Sep 6, 2020 at 7:44 AM Paul A Bristow via Boost <
[hidden email]> wrote:

> [...]