[Openvpn-devel] Bug-fix: segfault in dco_get_peer_stats()

Message ID 20230327171236.51771-1-selva.nair@gmail.com
State Accepted
Headers show
Series [Openvpn-devel] Bug-fix: segfault in dco_get_peer_stats() | expand

Commit Message

Selva Nair March 27, 2023, 5:12 p.m. UTC
From: Selva Nair <selva.nair@gmail.com>

  We persist peer-stats when restarting, but an early restart
  before open_tun results in a segfault in dco_get_peer_stats().
  To reproduce, trigger a TLS handshake error due to lack of common
  protocols, for example.

  Fix by checking  that tuntap is defined before dereferencing it.

Signed-off-by: Selva Nair <selva.nair@gmail.com>
---
I'm not entirely sure this is the right place to fix this.
Or is it the caller at fault exercising  dco_get_peer_stats()
when tuntap is not set?

 src/openvpn/dco_linux.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Gert Doering March 27, 2023, 6:53 p.m. UTC | #1
Hi,

On Mon, Mar 27, 2023 at 01:12:36PM -0400, selva.nair@gmail.com wrote:
> From: Selva Nair <selva.nair@gmail.com>
> 
>   We persist peer-stats when restarting, but an early restart
>   before open_tun results in a segfault in dco_get_peer_stats().
>   To reproduce, trigger a TLS handshake error due to lack of common
>   protocols, for example.

This definitely looks like a good catch, and the "TLS handshake error"
is part of what is needed to trigger it - "just having a connection
failure" (TCP refused) is *not*, but "--tls-version-max 1.0" *in
connection with* --management + "bytecount" is...

2023-03-27 20:52:41 TCP connection established with [AF_INET6]2607:fc50:1001:5200::4:51194
2023-03-27 20:52:41 TCPv6_CLIENT link local: (not bound)
2023-03-27 20:52:41 TCPv6_CLIENT link remote: [AF_INET6]2607:fc50:1001:5200::4:51194
Segmentation fault

interesting combination of features.  Need to extend my test rig to
add management-driven tests, and also TLS failure tests (*using management*).

gert
Antonio Quartulli March 27, 2023, 8:30 p.m. UTC | #2
Hi,

On 27/03/2023 19:12, selva.nair@gmail.com wrote:
> From: Selva Nair <selva.nair@gmail.com>
> 
>    We persist peer-stats when restarting, but an early restart
>    before open_tun results in a segfault in dco_get_peer_stats().
>    To reproduce, trigger a TLS handshake error due to lack of common
>    protocols, for example.
> 
>    Fix by checking  that tuntap is defined before dereferencing it.
> 
> Signed-off-by: Selva Nair <selva.nair@gmail.com>

Nice catch! Thanks a lot!

> ---
> I'm not entirely sure this is the right place to fix this.
> Or is it the caller at fault exercising  dco_get_peer_stats()
> when tuntap is not set?

Indeed that was my assumption.

Does somebody have the stacktrace?
I wanted to check where this call is coming from exactly.

Imho it'd be reasonable if the caller could check if we have a device 
before invoking any DCO API.

Alla other DCO API just assume that everything was properly initialized 
before being invoked, therefore it'd be nice to keep the same assumption 
here.

(better would even be if this gets_stats function would not require to 
fiddle with a struct context at all....but that's material for master)

Cheers,

> 
>   src/openvpn/dco_linux.c | 5 +++++
>   1 file changed, 5 insertions(+)
> 
> diff --git a/src/openvpn/dco_linux.c b/src/openvpn/dco_linux.c
> index 317f9dc0..41540c0f 100644
> --- a/src/openvpn/dco_linux.c
> +++ b/src/openvpn/dco_linux.c
> @@ -975,6 +975,11 @@ dco_get_peer_stats(struct context *c)
>       uint32_t peer_id = c->c2.tls_multi->dco_peer_id;
>       msg(D_DCO_DEBUG, "%s: peer-id %d", __func__, peer_id);
>   
> +    if (!c->c1.tuntap)
> +    {
> +        return 0;
> +    }
> +
>       dco_context_t *dco = &c->c1.tuntap->dco;
>       struct nl_msg *nl_msg = ovpn_dco_nlmsg_create(dco, OVPN_CMD_GET_PEER);
>       struct nlattr *attr = nla_nest_start(nl_msg, OVPN_ATTR_GET_PEER);
Selva Nair March 27, 2023, 8:42 p.m. UTC | #3
On Mon, Mar 27, 2023 at 4:30 PM Antonio Quartulli <a@unstable.cc> wrote:

> Hi,
>
> On 27/03/2023 19:12, selva.nair@gmail.com wrote:
> > From: Selva Nair <selva.nair@gmail.com>
> >
> >    We persist peer-stats when restarting, but an early restart
> >    before open_tun results in a segfault in dco_get_peer_stats().
> >    To reproduce, trigger a TLS handshake error due to lack of common
> >    protocols, for example.
> >
> >    Fix by checking  that tuntap is defined before dereferencing it.
> >
> > Signed-off-by: Selva Nair <selva.nair@gmail.com>
>
> Nice catch! Thanks a lot!
>
> > ---
> > I'm not entirely sure this is the right place to fix this.
> > Or is it the caller at fault exercising  dco_get_peer_stats()
> > when tuntap is not set?
>
> Indeed that was my assumption.
>
> Does somebody have the stacktrace?
> I wanted to check where this call is coming from exactly.
>

I didn't save the stacktrace.. The call comes through

 persist_client_stats(c); (openvpn.c ~ line 100)

which gets called on exiting the main eventloop for restart. Without it we
used to lose
the statistics on restart.

Imho it'd be reasonable if the caller could check if we have a device
> before invoking any DCO API.
>

Sure, I would think that the caller of dco_get_peer_stats() should just
pass the dco handle to it
and get the stats back. It's debatable who should check whether
the dco handle is valid or not.

But this may not be the time and place for a refactor. As dco is young, I
think it can mature in 2.7..

Selva



>
> Cheers,
>
> >
> >   src/openvpn/dco_linux.c | 5 +++++
> >   1 file changed, 5 insertions(+)
> >
> > diff --git a/src/openvpn/dco_linux.c b/src/openvpn/dco_linux.c
> > index 317f9dc0..41540c0f 100644
> > --- a/src/openvpn/dco_linux.c
> > +++ b/src/openvpn/dco_linux.c
> > @@ -975,6 +975,11 @@ dco_get_peer_stats(struct context *c)
> >       uint32_t peer_id = c->c2.tls_multi->dco_peer_id;
> >       msg(D_DCO_DEBUG, "%s: peer-id %d", __func__, peer_id);
> >
> > +    if (!c->c1.tuntap)
> > +    {
> > +        return 0;
> > +    }
> > +
> >       dco_context_t *dco = &c->c1.tuntap->dco;
> >       struct nl_msg *nl_msg = ovpn_dco_nlmsg_create(dco,
> OVPN_CMD_GET_PEER);
> >       struct nlattr *attr = nla_nest_start(nl_msg, OVPN_ATTR_GET_PEER);
>
> --
> Antonio Quartulli
>
Selva Nair March 27, 2023, 8:50 p.m. UTC | #4
Hi,

I should have added that as written the fault is with get_dco_peer stats().
It takes a pointer to "struct context" and dereferences a couple of layers
below it. The caller has no idea what it wants to see set in c. So, until
that is changed on all platforms (right now windows and linux), too much to
ask the caller to check that a valid dco handle is available.

Selva

On Mon, Mar 27, 2023 at 4:42 PM Selva Nair <selva.nair@gmail.com> wrote:

>
>
> On Mon, Mar 27, 2023 at 4:30 PM Antonio Quartulli <a@unstable.cc> wrote:
>
>> Hi,
>>
>> On 27/03/2023 19:12, selva.nair@gmail.com wrote:
>> > From: Selva Nair <selva.nair@gmail.com>
>> >
>> >    We persist peer-stats when restarting, but an early restart
>> >    before open_tun results in a segfault in dco_get_peer_stats().
>> >    To reproduce, trigger a TLS handshake error due to lack of common
>> >    protocols, for example.
>> >
>> >    Fix by checking  that tuntap is defined before dereferencing it.
>> >
>> > Signed-off-by: Selva Nair <selva.nair@gmail.com>
>>
>> Nice catch! Thanks a lot!
>>
>> > ---
>> > I'm not entirely sure this is the right place to fix this.
>> > Or is it the caller at fault exercising  dco_get_peer_stats()
>> > when tuntap is not set?
>>
>> Indeed that was my assumption.
>>
>> Does somebody have the stacktrace?
>> I wanted to check where this call is coming from exactly.
>>
>
> I didn't save the stacktrace.. The call comes through
>
>  persist_client_stats(c); (openvpn.c ~ line 100)
>
> which gets called on exiting the main eventloop for restart. Without it we
> used to lose
> the statistics on restart.
>
> Imho it'd be reasonable if the caller could check if we have a device
>> before invoking any DCO API.
>>
>
> Sure, I would think that the caller of dco_get_peer_stats() should just
> pass the dco handle to it
> and get the stats back. It's debatable who should check whether
> the dco handle is valid or not.
>
> But this may not be the time and place for a refactor. As dco is young, I
> think it can mature in 2.7..
>
> Selva
>
>
>
>>
>> Cheers,
>>
>> >
>> >   src/openvpn/dco_linux.c | 5 +++++
>> >   1 file changed, 5 insertions(+)
>> >
>> > diff --git a/src/openvpn/dco_linux.c b/src/openvpn/dco_linux.c
>> > index 317f9dc0..41540c0f 100644
>> > --- a/src/openvpn/dco_linux.c
>> > +++ b/src/openvpn/dco_linux.c
>> > @@ -975,6 +975,11 @@ dco_get_peer_stats(struct context *c)
>> >       uint32_t peer_id = c->c2.tls_multi->dco_peer_id;
>> >       msg(D_DCO_DEBUG, "%s: peer-id %d", __func__, peer_id);
>> >
>> > +    if (!c->c1.tuntap)
>> > +    {
>> > +        return 0;
>> > +    }
>> > +
>> >       dco_context_t *dco = &c->c1.tuntap->dco;
>> >       struct nl_msg *nl_msg = ovpn_dco_nlmsg_create(dco,
>> OVPN_CMD_GET_PEER);
>> >       struct nlattr *attr = nla_nest_start(nl_msg, OVPN_ATTR_GET_PEER);
>>
>> --
>> Antonio Quartulli
>>
>
Antonio Quartulli March 27, 2023, 8:51 p.m. UTC | #5
Hi,

On 27/03/2023 22:42, Selva Nair wrote:
> But this may not be the time and place for a refactor. As dco is young, 
> I think it can mature in 2.7..
I agree on all fronts.
Let's get this patch in with a minimal fix that deals with what we have.

We can make this better in master.

Acked-by: Antonio Quartulli <a@unstable.cc>
Gert Doering March 28, 2023, 7:25 a.m. UTC | #6
Acked-by: Antonio Quartulli <a@unstable.cc>

Thanks for the good find.  Since I could reproduce the crash yesterday
(and I do need management for it) I can verify that it does no longer
crash with the patch.

(For whatever reason, --tls-version-max 1.0 did not suffice to trigger
the error this morning, but --tls-version-min 1.3 + management + bytecount
still did)

Your patch has been applied to the master branch.

commit 10c3f25a26bce480f80624c5ef4cb6774a31c305 (master)
commit d01b9d751d6a5b4dc3737b339ec1ca5c23705243 (release/2.6)
Author: Selva Nair
Date:   Mon Mar 27 13:12:36 2023 -0400

     Bug-fix: segfault in dco_get_peer_stats()

     Signed-off-by: Selva Nair <selva.nair@gmail.com>
     Acked-by: Antonio Quartulli <a@unstable.cc>
     Message-Id: <20230327171236.51771-1-selva.nair@gmail.com>
     URL: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg26530.html
     Signed-off-by: Gert Doering <gert@greenie.muc.de>


--
kind regards,

Gert Doering
Selva Nair March 28, 2023, 1:34 p.m. UTC | #7
On Tue, Mar 28, 2023 at 3:25 AM Gert Doering <gert@greenie.muc.de> wrote:

> Acked-by: Antonio Quartulli <a@unstable.cc>
>
> Thanks for the good find.  Since I could reproduce the crash yesterday
> (and I do need management for it) I can verify that it does no longer
> crash with the patch.
>
> (For whatever reason, --tls-version-max 1.0 did not suffice to trigger
> the error this morning, but --tls-version-min 1.3 + management + bytecount
> still did)
>

Since then I've found this can be triggered quite simply by adding
"--status /dev/stdout" or some file to the client options.

Which means this should also fix #301

Selva

Patch

diff --git a/src/openvpn/dco_linux.c b/src/openvpn/dco_linux.c
index 317f9dc0..41540c0f 100644
--- a/src/openvpn/dco_linux.c
+++ b/src/openvpn/dco_linux.c
@@ -975,6 +975,11 @@  dco_get_peer_stats(struct context *c)
     uint32_t peer_id = c->c2.tls_multi->dco_peer_id;
     msg(D_DCO_DEBUG, "%s: peer-id %d", __func__, peer_id);
 
+    if (!c->c1.tuntap)
+    {
+        return 0;
+    }
+
     dco_context_t *dco = &c->c1.tuntap->dco;
     struct nl_msg *nl_msg = ovpn_dco_nlmsg_create(dco, OVPN_CMD_GET_PEER);
     struct nlattr *attr = nla_nest_start(nl_msg, OVPN_ATTR_GET_PEER);