Message ID | 20221224194253.3202231-5-arne@rfc2549.org |
---|---|
State | Superseded |
Headers | show |
Series | Various patches to improve DCO behaviour | expand |
Hi, On Sat, Dec 24, 2022 at 08:42:48PM +0100, Arne Schwabe wrote: > with dco sometimes we end up promoting a timeout event to write > event or read event. For the residual read, this problem is probably > not solvable without changing the kernel DCO API > > Signed-off-by: Arne Schwabe <arne@rfc2549.org> This one breaks TCP on a DCO-enabled server for me, though I can't see any obvious reason why. Basic p2mp tcp server, nothing fancy. With "3/7" (1413b38d0eacafb6, as in master) it works fine. With "4/7" (just this patch, on top of master) the "TCP gremlin test" looks good - that's what I started last night, and no crashes/BUGs - but an actual TCP client fails... Client log: 2022-12-25 09:41:32 VERIFY OK: depth=0, C=US, ST=California, L=Pleasanton, O=OpenVPN community project, CN=server, emailAddress=samuli@openvpn.net 2022-12-25 09:41:32 WARNING: 'link-mtu' is used inconsistently, local='link-mtu 1547', remote='link-mtu 1544' 2022-12-25 09:41:32 Control Channel: TLSv1.3, cipher TLSv1.3 TLS_AES_256_GCM_SHA384, peer certificate: 2048 bit RSA, signature: RSA-SHA1 2022-12-25 09:41:32 [server] Peer Connection Initiated with [AF_INET]195.30.8.84:51194 2022-12-25 09:41:33 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1) 2022-12-25 09:41:39 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1) 2022-12-25 09:41:43 TLS Error: local/remote TLS keys are out of sync: [AF_INET]195.30.8.84:51194 (received key id: 0, known key ids: [key#0 state=S_ACTIVE auth=KS_AUTH_TRUE id=0 sid=87065d72 c3889e05] [key#1 state=S_UNDEF auth=KS_AUTH_FALSE id=0 sid=00000000 00000000] [key#2 state=S_UNDEF auth=KS_AUTH_FALSE id=0 sid=00000000 00000000]) 2022-12-25 09:41:44 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1) 2022-12-25 09:41:50 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1) 2022-12-25 09:41:53 TLS Error: local/remote TLS keys are out of sync: [AF_INET]195.30.8.84:51194 (received key id: 0, known key ids: [key#0 state=S_ACTIVE auth=KS_AUTH_TRUE id=0 sid=87065d72 c3889e05] [key#1 state=S_UNDEF auth=KS_AUTH_FALSE id=0 sid=00000000 00000000] [key#2 state=S_UNDEF auth=KS_AUTH_FALSE id=0 sid=00000000 00000000]) 2022-12-25 09:41:55 SENT CONTROL [server]: 'PUSH_REQUEST' (status=1) server log 2022-12-25 09:41:32 us=809562 TCP connection established with [AF_INET6]::ffff:194.97.140.5:28517 2022-12-25 09:41:32 us=809600 TCPv6_SERVER link local: (not bound) 2022-12-25 09:41:32 us=809615 TCPv6_SERVER link remote: [AF_INET6]::ffff:194.97.140.5:28517 2022-12-25 09:41:32 us=809775 194.97.140.5:28517 TLS: Initial packet from [AF_INET6]::ffff:194.97.140.5:28517, sid=130a42ff 79d3e7e6 2022-12-25 09:41:32 us=822866 194.97.140.5:28517 VERIFY OK: depth=1, C=US, ST=California, L=Pleasanton, O=OpenVPN community project, CN=OpenVPN community project CA, emailAddress=samuli@openvpn.net 2022-12-25 09:41:32 us=823211 194.97.140.5:28517 VERIFY OK: depth=0, C=DE, ST=Bavaria, L=Munich, O=OpenVPN community project, OU=DCO F14 client, CN=freebsd-14-amd64, emailAddress=gert@greenie.net ... 2022-12-25 09:41:32 us=824045 194.97.140.5:28517 TLS: move_session: dest=TM_ACTIVE src=TM_INITIAL reinit_src=1 2022-12-25 09:41:32 us=824186 194.97.140.5:28517 TLS: tls_multi_process: initial untrusted session promoted to trusted 2022-12-25 09:41:32 us=863377 194.97.140.5:28517 Control Channel: TLSv1.3, cipher TLSv1.3 TLS_AES_256_GCM_SHA384, peer certificate: 2048 bit RSA, signature: RSA-SHA256 2022-12-25 09:41:32 us=863449 194.97.140.5:28517 [freebsd-14-amd64] Peer Connection Initiated with [AF_INET6]::ffff:194.97.140.5:28517 ... 2022-12-25 09:41:32 us=863895 freebsd-14-amd64/194.97.140.5:28517 Data Channel: using negotiated cipher 'AES-256-GCM' 2022-12-25 09:41:32 us=863935 freebsd-14-amd64/194.97.140.5:28517 Data Channel MTU parms [ mss_fix:1366 max_frag:0 tun_mtu:1500 tun_max_mtu:1600 headroom:136 payload:1768 tailroom:562 ET:0 ] 2022-12-25 09:41:32 us=864176 freebsd-14-amd64/194.97.140.5:28517 SENT CONTROL [freebsd-14-amd64]: 'PUSH_REPLY,route 10.220.0.0 255.255.0.0,route-ipv6 fd00:abcd:220::/48,tun-ipv6,route-gateway 10.220.1.1,topology subnet,ping 10,ping-restart 30,compress stub-v2,ifconfig-ipv6 fd00:abcd:220:1::1002/64 fd00:abcd:220:1::1,ifconfig 10.220.1.4 255.255.255.0,peer-id 0,cipher AES-256-GCM,key-derivation tls-ekm' (status=1) and then nothing more, until the client gives up and reconnects. With --verb 6, I see this dance... 2022-12-25 09:45:51 us=194027 freebsd-14-amd64/194.97.140.5:47015 Data Channel: using negotiated cipher 'AES-256-GCM' 2022-12-25 09:45:51 us=194058 freebsd-14-amd64/194.97.140.5:47015 Data Channel MTU parms [ mss_fix:1366 max_frag:0 tun_mtu:1500 tun_max_mtu:1600 headroom:136 payload:1768 tailroom:562 ET:0 ] 2022-12-25 09:45:51 us=194134 freebsd-14-amd64/194.97.140.5:47015 dco_install_key: peer_id=0 keyid=0, currently 0 keys installed 2022-12-25 09:45:51 us=194167 freebsd-14-amd64/194.97.140.5:47015 dco_new_key: slot 0, key-id 0, peer-id 0, cipher AES-256-GCM 2022-12-25 09:45:51 us=194317 freebsd-14-amd64/194.97.140.5:47015 SENT CONTROL [freebsd-14-amd64]: 'PUSH_REPLY,route 10.220.0.0 255.255.0.0,route-ipv6 fd00:abcd:220::/48,tun-ipv6,route-gateway 10.220.1.1,topology subnet,ping 10,ping-restart 30,compress stub-v2,ifconfig-ipv6 fd00:abcd:220:1::1002/64 fd00:abcd:220:1::1,ifconfig 10.220.1.4 255.255.255.0,peer-id 0,cipher AES-256-GCM,key-derivation tls-ekm' (status=1) 2022-12-25 09:45:51 us=194363 freebsd-14-amd64/194.97.140.5:47015 dco_update_keys: peer_id=0 2022-12-25 09:45:52 us=265607 freebsd-14-amd64/194.97.140.5:47015 dco_update_keys: peer_id=0 2022-12-25 09:45:52 us=455630 dco_do_read 2022-12-25 09:45:52 us=455674 ovpn-dco: received OVPN_PACKET_ATTR_PACKET, ifindex: 31370 peer-id: 0, len 49 2022-12-25 09:45:53 us=336677 freebsd-14-amd64/194.97.140.5:47015 dco_update_keys: peer_id=0 2022-12-25 09:45:54 us=408023 freebsd-14-amd64/194.97.140.5:47015 dco_update_keys: peer_id=0 2022-12-25 09:45:54 us=740566 dco_do_read 2022-12-25 09:45:54 us=740653 ovpn-dco: received OVPN_PACKET_ATTR_PACKET, ifindex: 31370 peer-id: 0, len 49 2022-12-25 09:45:55 us=478633 freebsd-14-amd64/194.97.140.5:47015 dco_update_keys: peer_id=0 2022-12-25 09:45:56 us=550011 freebsd-14-amd64/194.97.140.5:47015 dco_update_keys: peer_id=0 2022-12-25 09:45:57 us=620239 freebsd-14-amd64/194.97.140.5:47015 dco_update_keys: peer_id=0 2022-12-25 09:45:58 us=85776 dco_do_read 2022-12-25 09:45:58 us=85865 ovpn-dco: received OVPN_PACKET_ATTR_PACKET, ifindex: 31370 peer-id: 0, len 49 2022-12-25 09:45:58 us=86702 dco_do_read 2022-12-25 09:45:58 us=86754 ovpn-dco: received OVPN_PACKET_ATTR_PACKET, ifindex: 31370 peer-id: 0, len 49 ... but no incoming PUSH_REQUEST... This is fully reproducible, no special timing or anything required, just "TCP p2mp server with DCO, tcp client". With DCO disabled (same server config, --disable-dco) initialization progresses normally, and everything works.. gert
diff --git a/src/openvpn/forward.c b/src/openvpn/forward.c index 64c8ee6a0..17a14f0bd 100644 --- a/src/openvpn/forward.c +++ b/src/openvpn/forward.c @@ -1674,30 +1674,6 @@ process_ip_header(struct context *c, unsigned int flags, struct buffer *buf) } } -/* - * Linux DCO implementations pass the socket to the kernel and - * disallow usage of it from userland for TCP, so (control) packets - * sent and received by OpenVPN need to go through the DCO interface. - * - * Windows DCO needs control packets to be sent via the normal - * standard Overlapped I/O. - * - * FreeBSD DCO allows control packets to pass through the socket in both - * directions. - * - * Hide that complexity (...especially if more platforms show up - * in the future...) in a small inline function. - */ -static inline bool -should_use_dco_socket(struct link_socket *ls) -{ -#if defined(TARGET_LINUX) - return ls->dco_installed && proto_is_tcp(ls->info.proto); -#else - return false; -#endif -} - /* * Input: c->c2.to_link */ diff --git a/src/openvpn/forward.h b/src/openvpn/forward.h index bd2d96010..e50f235da 100644 --- a/src/openvpn/forward.h +++ b/src/openvpn/forward.h @@ -424,4 +424,34 @@ connection_established(struct context *c) } } + +/** + * @param ls the link_socket the decision should be made for + * @return if we should use the dco kernel api or normal socket APIs for + * write/send + * + * + * Linux DCO implementations pass the socket to the kernel and + * disallow usage of it from userland for TCP, so (control) packets + * sent and received by OpenVPN need to go through the DCO interface. + * + * Windows DCO needs control packets to be sent via the normal + * standard Overlapped I/O. + * + * FreeBSD DCO allows control packets to pass through the socket in both + * directions. + * + * Hide that complexity (...especially if more platforms show up + * in the future...) in a small inline function. + */ +static inline bool +should_use_dco_socket(struct link_socket *ls) +{ +#if defined(TARGET_LINUX) + return ls->dco_installed && proto_is_tcp(ls->info.proto); +#else + return false; +#endif +} + #endif /* FORWARD_H */ diff --git a/src/openvpn/mtcp.c b/src/openvpn/mtcp.c index ac06ddc64..263f4d994 100644 --- a/src/openvpn/mtcp.c +++ b/src/openvpn/mtcp.c @@ -407,7 +407,7 @@ multi_tcp_wait_lite(struct multi_context *m, struct multi_instance *mi, const in /* If we got a socket that has been handed over to the kernel * we must not call the normal socket function to figure out * if it is readable or writable */ - /* Assert that we only have the DCO exptected flags */ + /* Assert that we only have the DCO expected flags */ ASSERT(action & (TA_SOCKET_READ | TA_SOCKET_WRITE)); /* We are always ready! */ @@ -567,7 +567,7 @@ multi_tcp_post(struct multi_context *m, struct multi_instance *mi, const int act { flags |= MTP_TUN_OUT; } - if (LINK_OUT(c)) + if (LINK_OUT(c) && !should_use_dco_socket(c->c2.link_socket)) { flags |= MTP_LINK_OUT; } @@ -586,6 +586,16 @@ multi_tcp_post(struct multi_context *m, struct multi_instance *mi, const int act case MTP_NONE: if (mi && socket_read_residual(c->c2.link_socket)) { + if (should_use_dco_socket(c->c2.link_socket)) + { + struct gc_arena gc = gc_new(); + msg(M_INFO, "ovpn-dco installed socket with residual read " + "len=%d, mi=%s. This connection will probably" + " break.", BLEN(&c->c2.link_socket->stream_buf.residual), + multi_instance_string(mi, false, &gc)); + gc_free(&gc); + break; + } newaction = TA_SOCKET_READ_RESIDUAL; } else
with dco sometimes we end up promoting a timeout event to write event or read event. For the residual read, this problem is probably not solvable without changing the kernel DCO API Signed-off-by: Arne Schwabe <arne@rfc2549.org> --- src/openvpn/forward.c | 24 ------------------------ src/openvpn/forward.h | 30 ++++++++++++++++++++++++++++++ src/openvpn/mtcp.c | 14 ++++++++++++-- 3 files changed, 42 insertions(+), 26 deletions(-)