Message ID | 20221201110128.271064-1-arne@rfc2549.org |
---|---|
State | Accepted |
Headers | show |
Series | [Openvpn-devel] Allow reconnecting in p2p mode work under FreeBSD | expand |
Acked-by: Gert Doering <gert@greenie.muc.de> Indeed, that fixes the p2p dco reconnect problem we had with FreeBSD, and with "verb 6" debugging one can nicely see what happens: 14:28:55 P2P mode NCP negotiation result: TLS_export=1, DATA_v2=1, peer-id 10167064, cipher=AES-256-GCM reconnect, then 14:29:17 P2P mode NCP negotiation result: TLS_export=1, DATA_v2=1, peer-id 3502029, cipher=AES-256-GCM 14:29:17 dco_del_key: peer-id 10167064, slot 0 14:29:18 dco_del_peer: peer-id 10167064 14:29:18 dco_new_peer: peer-id 3502029, fd 7 14:29:18 process_incoming_dco: received message for mismatching peer-id 10167064, expected 3502029 (and we ignore this, not killing the new 3502029 peer) My own pokings in kernel space confirmed what I assumed - we just add peers, and they do not expire quickly. So after the first reconnect, without this patch, we have 2 peers in kernel with no vpn_ip address, so "lookup on nexthop" is not working, and that particular ovpn(4) interface is dead until ifdown/ifup or all the peers expire. I did experiment with a kernel patch that will remove all existing peers on install of a new p2p peer - and that worked, kernel side, but confused OpenVPN for the reasons we have a new "check the peer id!" check in this patch... so we need this patch anyway, obsoleting the need for a kernel patch... Tested on - FreeBSD 14 / CURRENT DCO, client and server - Ubuntu 20.04, Linux DCO, client and server - Gentoo, Linux with no DCO, client and server Your patch has been sho(u|o)ted into to the master branch. commit 0f7c5dde1bbd23353467ebd549ae955a6a03746f Author: Arne Schwabe Date: Thu Dec 1 12:01:28 2022 +0100 Allow reconnecting in p2p mode work under FreeBSD Signed-off-by: Arne Schwabe <arne@rfc2549.org> Acked-by: Gert Doering <gert@greenie.muc.de> Message-Id: <20221201110128.271064-1-arne@rfc2549.org> URL: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg25602.html Signed-off-by: Gert Doering <gert@greenie.muc.de> -- kind regards, Gert Doering
Hi, bad news... On Thu, Dec 01, 2022 at 12:01:28PM +0100, Arne Schwabe wrote: > diff --git a/src/openvpn/forward.c b/src/openvpn/forward.c > index 1b418b1bc..958bf0b56 100644 > --- a/src/openvpn/forward.c > +++ b/src/openvpn/forward.c > @@ -1174,9 +1174,22 @@ process_incoming_dco(struct context *c) > > dco_do_read(dco); > > + /* FreeBSD currently sends us removal notifcation with the old peer-id in > + * p2p mode with the ping timeout reason, so ignore that one to not shout > + * ourselves in the foot and removing the just established session */ > + if (dco->dco_message_peer_id != c->c2.tls_multi->dco_peer_id) > + { > + msg(D_DCO_DEBUG, "%s: received message for mismatching peer-id %d, " > + "expected %d", __func__, dco->dco_message_peer_id, > + c->c2.tls_multi->dco_peer_id); > + return; > + } This code works perfectly well for p2p TLS environments, but makes OpenVPN explode in a scenario that my tests did not trigger properly - you have a p2p tun instance with --secret & DCO - you have a second p2p DCO instance on the same machine, with TLS - peers connect and disconnect to the p2p TLS instance - the *first* instance receives a DCO message, and explodes... 2022-12-07 08:34:50 us=63605 setsockopt(IPV6_V6ONLY=0) 2022-12-07 08:34:50 us=63826 UDPv6 link local (bound): [AF_INET6][undef]:51204 2022-12-07 08:34:50 us=64015 UDPv6 link remote: [AF_UNSPEC] 2022-12-07 08:35:00 us=977186 Attempting to send data packet while data channel offload is in use. Dropping packet Program received signal SIGSEGV, Segmentation fault. 0x00005555555772b9 in process_incoming_dco (c=0x7fffffffd460) at forward.c:1180 1180 if (dco->dco_message_peer_id != c->c2.tls_multi->dco_peer_id) (gdb) where #0 0x00005555555772b9 in process_incoming_dco (c=0x7fffffffd460) at forward.c:1180 #1 process_io (c=0x7fffffffd460) at forward.c:2283 #2 0x000055555559db4c in tunnel_point_to_point (c=0x7fffffffd460) at openvpn.c:94 #3 openvpn_main (argc=2, argv=0x7fffffffe608) at openvpn.c:311 #4 0x00007ffff7cbf083 in __libc_start_main (main=0x55555555e500 <main>, argc=2, argv=0x7fffffffe608, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe5f8) at ../csu/libc-start.c:308 #5 0x000055555555e53e in _start () at socket.h:1206 (gdb) print *dco $2 = {nl_sock = 0x555555642bd0, nl_cb = 0x55555563b310, status = 0, ifmode = __OVPN_MODE_FIRST, ovpn_dco_id = 37, ovpn_dco_mcast_id = 5, ifindex = 27659, dco_packet_in = {capacity = 65536, offset = 0, len = 0, data = 0x555555643370 ""}, dco_message_type = 0, dco_message_peer_id = -1, dco_del_peer_reason = 0} (gdb) print c->c2.tls_multi $3 = (struct tls_multi *) 0x0 ^^^^ dat. Bäm. My tests did not alert me of this, as the sequence of things is - build new openvpn binary - stop all instances - start all instances with new binary - run client side tests against all these instances - be happy if client side tests succeed the client side tests tested p2p udp (8) before p2p udp TLS (11), so never noticed that after running (11), (8) would not work any longer... One item of obvious confusion - yes, this is --secret and not TLS, and yes, we use DCO... gert
Hi, On Wed, Dec 07, 2022 at 08:56:04AM +0100, Gert Doering wrote: > the client side tests tested p2p udp (8) before p2p udp TLS (11), so > never noticed that after running (11), (8) would not work any longer... More specifically, because I never tested p2p tun (8) on the DCO-enabled servers ("why would I test this, it just slows down things, --secret won't do DCO anyway"). But it seems, "p2p --secret" is broken hard in current code - it enables DCO, and fails when the first client connects... 2022-12-07 09:01:32 us=920037 net_iface_new: add tun3 type ovpn-dco 2022-12-07 09:01:32 us=921045 DCO device tun3 opened ... 2022-12-07 09:02:02 us=527605 Attempting to send data packet while data channel offload is in use. Dropping packet ... start client, send packet: 2022-12-07 09:02:38 us=285472 Peer Connection Initiated with [AF_INET6]::ffff:194.97.140.5:18999 Program received signal SIGSEGV, Segmentation fault. 0x000055555556beaf in addr_set_dco_installed (c=<optimized out>) at dco.c:447 447 ks->remote_addr.dco_installed = true; (gdb) print ks $1 = (struct key_state *) 0x5f8 ... so whatever "ks" is in here, it's not a valid pointer... Starting the p2p --secret instance with --disable-dco makes it behave normally. Since, I think, the claim is "DCO needs TLS", we just need a patch that disables DCO for p2p --secret mode... gert
diff --git a/src/openvpn/dco.c b/src/openvpn/dco.c index 03ac8438a..cbd834194 100644 --- a/src/openvpn/dco.c +++ b/src/openvpn/dco.c @@ -459,6 +459,15 @@ dco_p2p_add_new_peer(struct context *c) struct sockaddr *remoteaddr = &ls->info.lsa->actual.dest.addr.sa; struct tls_multi *multi = c->c2.tls_multi; +#ifdef TARGET_FREEBSD + /* In Linux in P2P mode the kernel automatically removes an existing peer + * when adding a new peer. FreeBSD needs to explicitly be told to do that */ + if (c->c2.tls_multi->dco_peer_id != -1) + { + dco_del_peer(&c->c1.tuntap->dco, c->c2.tls_multi->dco_peer_id); + c->c2.tls_multi->dco_peer_id = -1; + } +#endif int ret = dco_new_peer(&c->c1.tuntap->dco, multi->peer_id, c->c2.link_socket->sd, NULL, remoteaddr, NULL, NULL); if (ret < 0) diff --git a/src/openvpn/dco_freebsd.c b/src/openvpn/dco_freebsd.c index 4e03f52e9..a52ac8c1b 100644 --- a/src/openvpn/dco_freebsd.c +++ b/src/openvpn/dco_freebsd.c @@ -312,6 +312,8 @@ dco_del_peer(dco_context_t *dco, unsigned int peerid) nvlist_t *nvl; int ret; + msg(D_DCO_DEBUG, "%s: peer-id %d", __func__, peerid); + nvl = nvlist_create(0); nvlist_add_number(nvl, "peerid", peerid); diff --git a/src/openvpn/forward.c b/src/openvpn/forward.c index 1b418b1bc..958bf0b56 100644 --- a/src/openvpn/forward.c +++ b/src/openvpn/forward.c @@ -1174,9 +1174,22 @@ process_incoming_dco(struct context *c) dco_do_read(dco); + /* FreeBSD currently sends us removal notifcation with the old peer-id in + * p2p mode with the ping timeout reason, so ignore that one to not shout + * ourselves in the foot and removing the just established session */ + if (dco->dco_message_peer_id != c->c2.tls_multi->dco_peer_id) + { + msg(D_DCO_DEBUG, "%s: received message for mismatching peer-id %d, " + "expected %d", __func__, dco->dco_message_peer_id, + c->c2.tls_multi->dco_peer_id); + return; + } + if ((dco->dco_message_type == OVPN_CMD_DEL_PEER) && (dco->dco_del_peer_reason == OVPN_DEL_PEER_REASON_EXPIRED)) { + msg(D_DCO_DEBUG, "%s: received peer expired notification of for peer-id " + "%d", __func__, dco->dco_message_peer_id); trigger_ping_timeout_signal(c); return; }
This commit consists of two parts. - explicitly removing an existing peer in p2p mode - ignoring the ping timeout notification that is generated by the first part Signed-off-by: Arne Schwabe <arne@rfc2549.org> --- src/openvpn/dco.c | 9 +++++++++ src/openvpn/dco_freebsd.c | 2 ++ src/openvpn/forward.c | 13 +++++++++++++ 3 files changed, 24 insertions(+)