[Openvpn-devel] Allow reconnecting in p2p mode work under FreeBSD

Message ID 20221201110128.271064-1-arne@rfc2549.org
State Accepted
Headers show
Series [Openvpn-devel] Allow reconnecting in p2p mode work under FreeBSD | expand

Commit Message

Arne Schwabe Dec. 1, 2022, 11:01 a.m. UTC
This commit consists of two parts.
 - explicitly removing an existing peer in p2p mode
 - ignoring the ping timeout notification that is generated by the first part

Signed-off-by: Arne Schwabe <arne@rfc2549.org>
---
 src/openvpn/dco.c         |  9 +++++++++
 src/openvpn/dco_freebsd.c |  2 ++
 src/openvpn/forward.c     | 13 +++++++++++++
 3 files changed, 24 insertions(+)

Comments

Gert Doering Dec. 1, 2022, 3:18 p.m. UTC | #1
Acked-by: Gert Doering <gert@greenie.muc.de>

Indeed, that fixes the p2p dco reconnect problem we had with FreeBSD,
and with "verb 6" debugging one can nicely see what happens:

14:28:55 P2P mode NCP negotiation result: TLS_export=1, DATA_v2=1, peer-id 10167064, cipher=AES-256-GCM

reconnect, then

14:29:17 P2P mode NCP negotiation result: TLS_export=1, DATA_v2=1, peer-id 3502029, cipher=AES-256-GCM
14:29:17 dco_del_key: peer-id 10167064, slot 0
14:29:18 dco_del_peer: peer-id 10167064
14:29:18 dco_new_peer: peer-id 3502029, fd 7
14:29:18 process_incoming_dco: received message for mismatching peer-id 10167064, expected 3502029

(and we ignore this, not killing the new 3502029 peer)


My own pokings in kernel space confirmed what I assumed - we just add
peers, and they do not expire quickly.  So after the first reconnect,
without this patch, we have 2 peers in kernel with no vpn_ip address, 
so "lookup on nexthop" is not working, and that particular ovpn(4)
interface is dead until ifdown/ifup or all the peers expire.  I did
experiment with a kernel patch that will remove all existing peers on
install of a new p2p peer - and that worked, kernel side, but confused
OpenVPN for the reasons we have a new "check the peer id!" check in this
patch... so we need this patch anyway, obsoleting the need for a
kernel patch...

Tested on 
  - FreeBSD 14 / CURRENT DCO, client and server
  - Ubuntu 20.04, Linux DCO, client and server
  - Gentoo, Linux with no DCO, client and server

Your patch has been sho(u|o)ted into to the master branch.

commit 0f7c5dde1bbd23353467ebd549ae955a6a03746f
Author: Arne Schwabe
Date:   Thu Dec 1 12:01:28 2022 +0100

     Allow reconnecting in p2p mode work under FreeBSD

     Signed-off-by: Arne Schwabe <arne@rfc2549.org>
     Acked-by: Gert Doering <gert@greenie.muc.de>
     Message-Id: <20221201110128.271064-1-arne@rfc2549.org>
     URL: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg25602.html
     Signed-off-by: Gert Doering <gert@greenie.muc.de>

--
kind regards,

Gert Doering
Gert Doering Dec. 7, 2022, 7:56 a.m. UTC | #2
Hi,

bad news...

On Thu, Dec 01, 2022 at 12:01:28PM +0100, Arne Schwabe wrote:
> diff --git a/src/openvpn/forward.c b/src/openvpn/forward.c
> index 1b418b1bc..958bf0b56 100644
> --- a/src/openvpn/forward.c
> +++ b/src/openvpn/forward.c
> @@ -1174,9 +1174,22 @@ process_incoming_dco(struct context *c)
>  
>      dco_do_read(dco);
>  
> +    /* FreeBSD currently sends us removal notifcation with the old peer-id in
> +     * p2p mode with the ping timeout reason, so ignore that one to not shout
> +     * ourselves in the foot and removing the just established session */
> +    if (dco->dco_message_peer_id != c->c2.tls_multi->dco_peer_id)
> +    {
> +        msg(D_DCO_DEBUG, "%s: received message for mismatching peer-id %d, "
> +            "expected %d", __func__, dco->dco_message_peer_id,
> +            c->c2.tls_multi->dco_peer_id);
> +        return;
> +    }

This code works perfectly well for p2p TLS environments, but makes OpenVPN
explode in a scenario that my tests did not trigger properly

 - you have a p2p tun instance with --secret & DCO
 - you have a second p2p DCO instance on the same machine, with TLS
    - peers connect and disconnect to the p2p TLS instance

 - the *first* instance receives a DCO message, and explodes...

2022-12-07 08:34:50 us=63605 setsockopt(IPV6_V6ONLY=0)
2022-12-07 08:34:50 us=63826 UDPv6 link local (bound): [AF_INET6][undef]:51204
2022-12-07 08:34:50 us=64015 UDPv6 link remote: [AF_UNSPEC]
2022-12-07 08:35:00 us=977186 Attempting to send data packet while data channel offload is in use. Dropping packet

Program received signal SIGSEGV, Segmentation fault.
0x00005555555772b9 in process_incoming_dco (c=0x7fffffffd460) at forward.c:1180
1180        if (dco->dco_message_peer_id != c->c2.tls_multi->dco_peer_id)
(gdb) where
#0  0x00005555555772b9 in process_incoming_dco (c=0x7fffffffd460)
    at forward.c:1180
#1  process_io (c=0x7fffffffd460) at forward.c:2283
#2  0x000055555559db4c in tunnel_point_to_point (c=0x7fffffffd460)
    at openvpn.c:94
#3  openvpn_main (argc=2, argv=0x7fffffffe608) at openvpn.c:311
#4  0x00007ffff7cbf083 in __libc_start_main (main=0x55555555e500 <main>, 
    argc=2, argv=0x7fffffffe608, init=<optimized out>, fini=<optimized out>, 
    rtld_fini=<optimized out>, stack_end=0x7fffffffe5f8)
    at ../csu/libc-start.c:308
#5  0x000055555555e53e in _start () at socket.h:1206
(gdb) print *dco
$2 = {nl_sock = 0x555555642bd0, nl_cb = 0x55555563b310, status = 0, 
  ifmode = __OVPN_MODE_FIRST, ovpn_dco_id = 37, ovpn_dco_mcast_id = 5, 
  ifindex = 27659, dco_packet_in = {capacity = 65536, offset = 0, len = 0, 
    data = 0x555555643370 ""}, dco_message_type = 0, dco_message_peer_id = -1, 
  dco_del_peer_reason = 0}

(gdb) print c->c2.tls_multi
$3 = (struct tls_multi *) 0x0

^^^^ dat.

Bäm.


My tests did not alert me of this, as the sequence of things is

 - build new openvpn binary
 - stop all instances
 - start all instances with new binary
 - run client side tests against all these instances
 - be happy if client side tests succeed

the client side tests tested p2p udp (8) before p2p udp TLS (11), so
never noticed that after running (11), (8) would not work any longer...


One item of obvious confusion - yes, this is --secret and not TLS, and
yes, we use DCO...

gert
Gert Doering Dec. 7, 2022, 8:07 a.m. UTC | #3
Hi,

On Wed, Dec 07, 2022 at 08:56:04AM +0100, Gert Doering wrote:
> the client side tests tested p2p udp (8) before p2p udp TLS (11), so
> never noticed that after running (11), (8) would not work any longer...

More specifically, because I never tested p2p tun (8) on the DCO-enabled
servers ("why would I test this, it just slows down things, --secret won't
do DCO anyway").

But it seems, "p2p --secret" is broken hard in current code - it enables
DCO, and fails when the first client connects...

2022-12-07 09:01:32 us=920037 net_iface_new: add tun3 type ovpn-dco
2022-12-07 09:01:32 us=921045 DCO device tun3 opened
...
2022-12-07 09:02:02 us=527605 Attempting to send data packet while data channel offload is in use. Dropping packet
...

start client, send packet:

2022-12-07 09:02:38 us=285472 Peer Connection Initiated with [AF_INET6]::ffff:194.97.140.5:18999

Program received signal SIGSEGV, Segmentation fault.
0x000055555556beaf in addr_set_dco_installed (c=<optimized out>) at dco.c:447
447                 ks->remote_addr.dco_installed = true;

(gdb) print ks
$1 = (struct key_state *) 0x5f8

... so whatever "ks" is in here, it's not a valid pointer...


Starting the p2p --secret instance with --disable-dco makes it behave
normally.

Since, I think, the claim is "DCO needs TLS", we just need a patch that
disables DCO for p2p --secret mode...

gert

Patch

diff --git a/src/openvpn/dco.c b/src/openvpn/dco.c
index 03ac8438a..cbd834194 100644
--- a/src/openvpn/dco.c
+++ b/src/openvpn/dco.c
@@ -459,6 +459,15 @@  dco_p2p_add_new_peer(struct context *c)
 
     struct sockaddr *remoteaddr = &ls->info.lsa->actual.dest.addr.sa;
     struct tls_multi *multi = c->c2.tls_multi;
+#ifdef TARGET_FREEBSD
+    /* In Linux in P2P mode the kernel automatically removes an existing peer
+     * when adding a new peer. FreeBSD needs to explicitly be told to do that */
+    if (c->c2.tls_multi->dco_peer_id != -1)
+    {
+        dco_del_peer(&c->c1.tuntap->dco, c->c2.tls_multi->dco_peer_id);
+        c->c2.tls_multi->dco_peer_id = -1;
+    }
+#endif
     int ret = dco_new_peer(&c->c1.tuntap->dco, multi->peer_id,
                            c->c2.link_socket->sd, NULL, remoteaddr, NULL, NULL);
     if (ret < 0)
diff --git a/src/openvpn/dco_freebsd.c b/src/openvpn/dco_freebsd.c
index 4e03f52e9..a52ac8c1b 100644
--- a/src/openvpn/dco_freebsd.c
+++ b/src/openvpn/dco_freebsd.c
@@ -312,6 +312,8 @@  dco_del_peer(dco_context_t *dco, unsigned int peerid)
     nvlist_t *nvl;
     int ret;
 
+    msg(D_DCO_DEBUG, "%s: peer-id %d", __func__, peerid);
+
     nvl = nvlist_create(0);
     nvlist_add_number(nvl, "peerid", peerid);
 
diff --git a/src/openvpn/forward.c b/src/openvpn/forward.c
index 1b418b1bc..958bf0b56 100644
--- a/src/openvpn/forward.c
+++ b/src/openvpn/forward.c
@@ -1174,9 +1174,22 @@  process_incoming_dco(struct context *c)
 
     dco_do_read(dco);
 
+    /* FreeBSD currently sends us removal notifcation with the old peer-id in
+     * p2p mode with the ping timeout reason, so ignore that one to not shout
+     * ourselves in the foot and removing the just established session */
+    if (dco->dco_message_peer_id != c->c2.tls_multi->dco_peer_id)
+    {
+        msg(D_DCO_DEBUG, "%s: received message for mismatching peer-id %d, "
+            "expected %d", __func__, dco->dco_message_peer_id,
+            c->c2.tls_multi->dco_peer_id);
+        return;
+    }
+
     if ((dco->dco_message_type == OVPN_CMD_DEL_PEER)
         && (dco->dco_del_peer_reason == OVPN_DEL_PEER_REASON_EXPIRED))
     {
+        msg(D_DCO_DEBUG, "%s: received peer expired notification of for peer-id "
+            "%d", __func__, dco->dco_message_peer_id);
         trigger_ping_timeout_signal(c);
         return;
     }