Message ID | 20200520183404.54822-1-code@jeremyevans.net |
---|---|
State | Accepted |
Headers | show |
Series | [Openvpn-devel] Switch assertion failure to returning false | expand |
Hi, On Wed, May 20, 2020 at 11:34:04AM -0700, Jeremy Evans wrote: > To give some background, we hit this assertion failure, with the > following log output: This should not happen, asserting out in "normal server use" is bad. (Neither should it ever reach that point without ks->authenticated being true) > Tue May 19 15:57:05 2020 username/73.135.141.11:1194 Assertion failed at /path/to/openvpn-2.4.7/src/openvpn/ssl.c:1944 (ks->authenticated) Is this with true 2.4.7, or is this something more recent? > @@ -1930,7 +1930,10 @@ tls_session_generate_data_channel_keys(struct tls_session *session) > &ks->session_id_remote : &session->session_id; > > - ASSERT(ks->authenticated); > + if (!ks->authenticated) { > + msg(D_TLS_ERRORS, "TLS Error: key_state not authenticated"); > + goto cleanup; > + } > > ks->crypto_options.flags = session->opt->crypto_flags; > if (!generate_key_expansion(&ks->crypto_options.key_ctx_bi, I'm not sure if that code is correct, though - it will erase key material (in cleanup) without actually having generated a session key. So "bad things might happen later". But I leave that to Steffan or Arne who understands these intricacies better than I do. (Just confirmed, the ASSERT() is still like this in master) Trying to understand the somewhat twisted maze how ks->authenticated is set to true/false leads me to more questions... - are plugins or auth scripts involved on your side? - can you reproduce this, or was this a once-in-a-lifetime crash? (reproduceable is better, of course :-) - as this could be a plugin or script authentication interfering with changes brought by NCP...) gert
On 05/20 09:33, Gert Doering wrote: > Hi, > > On Wed, May 20, 2020 at 11:34:04AM -0700, Jeremy Evans wrote: > > To give some background, we hit this assertion failure, with the > > following log output: > > This should not happen, asserting out in "normal server use" is bad. > > (Neither should it ever reach that point without ks->authenticated being > true) > > > Tue May 19 15:57:05 2020 username/73.135.141.11:1194 Assertion failed at /path/to/openvpn-2.4.7/src/openvpn/ssl.c:1944 (ks->authenticated) > > Is this with true 2.4.7, or is this something more recent? This failure was with 2.4.7, but the first time it happened we were running 2.4.4 (see below) > > @@ -1930,7 +1930,10 @@ tls_session_generate_data_channel_keys(struct tls_session *session) > > &ks->session_id_remote : &session->session_id; > > > > - ASSERT(ks->authenticated); > > + if (!ks->authenticated) { > > + msg(D_TLS_ERRORS, "TLS Error: key_state not authenticated"); > > + goto cleanup; > > + } > > > > ks->crypto_options.flags = session->opt->crypto_flags; > > if (!generate_key_expansion(&ks->crypto_options.key_ctx_bi, > > I'm not sure if that code is correct, though - it will erase key > material (in cleanup) without actually having generated a session > key. So "bad things might happen later". The same behavior would happen if generate_key_expansion fails a few lines below, so my assumption was it was safe to do so. However, that's just an assumption and not even an educated guess. > But I leave that to Steffan or Arne who understands these intricacies > better than I do. > > (Just confirmed, the ASSERT() is still like this in master) > > > Trying to understand the somewhat twisted maze how ks->authenticated > is set to true/false leads me to more questions... > > - are plugins or auth scripts involved on your side? > - can you reproduce this, or was this a once-in-a-lifetime crash? There is a program to implement multifactor authentication as mentioned in the config (`auth-user-pass-verify "ldap-auth/ldap-auth" via-env`) This is a statically compiled C program that exits 0 on successful multifactor authentication and 1 on unsuccessful authentication. > (reproduceable is better, of course :-) - as this could be a plugin > or script authentication interfering with changes brought by NCP...) It's reproducible in the sense that it has happened to us multiple times. This is actually the fourth time it has happened. Previous times were 2018-04-16 (2.4.4), 2019-09-25 (2.4.7), and 2020-05-13 (2.4.7). Bug #1270 says it also affects 2.4.8. I haven't tried 2.4.9 yet, but I plan to upgrade the server to use it next month. It's not reproducible in the sense that I know how to trigger it, though. Thanks, Jeremy
Hi, Thanks for the clear report, patch and follow-up. On 20-05-2020 23:31, Jeremy Evans wrote: > On 05/20 09:33, Gert Doering wrote: >> On Wed, May 20, 2020 at 11:34:04AM -0700, Jeremy Evans wrote: >>> To give some background, we hit this assertion failure, with the >>> following log output: >> >> This should not happen, asserting out in "normal server use" is bad. >> >> (Neither should it ever reach that point without ks->authenticated being >> true) Agreed. I think it makes sense to first fix the urgent part (ie, not kill the server), then figure out how de code ends up at this ASSERT. Jeremy, can you determine from your logs whether this always happenedwhen --auth-user-pass-verify returns zero or non-zero? Ie, should the connections that trigger this succeed or fail? >>> @@ -1930,7 +1930,10 @@ tls_session_generate_data_channel_keys(struct tls_session *session) >>> &ks->session_id_remote : &session->session_id; >>> >>> - ASSERT(ks->authenticated); >>> + if (!ks->authenticated) { >>> + msg(D_TLS_ERRORS, "TLS Error: key_state not authenticated"); >>> + goto cleanup; >>> + } >>> >>> ks->crypto_options.flags = session->opt->crypto_flags; >>> if (!generate_key_expansion(&ks->crypto_options.key_ctx_bi, >> >> I'm not sure if that code is correct, though - it will erase key >> material (in cleanup) without actually having generated a session >> key. So "bad things might happen later". > > The same behavior would happen if generate_key_expansion fails a few > lines below, so my assumption was it was safe to do so. However, that's > just an assumption and not even an educated guess. This change correctly turns "kill the server" into "reject the connection", which I agree is a good thing. Acked-by: Steffan Karger <steffan.karger@foxcrypto.com> -Steffan
On 05/27 03:08, Steffan Karger wrote: > Hi, > > Thanks for the clear report, patch and follow-up. > > On 20-05-2020 23:31, Jeremy Evans wrote: > > On 05/20 09:33, Gert Doering wrote: > >> On Wed, May 20, 2020 at 11:34:04AM -0700, Jeremy Evans wrote: > >>> To give some background, we hit this assertion failure, with the > >>> following log output: > >> > >> This should not happen, asserting out in "normal server use" is bad. > >> > >> (Neither should it ever reach that point without ks->authenticated being > >> true) > > Agreed. I think it makes sense to first fix the urgent part (ie, not > kill the server), then figure out how de code ends up at this ASSERT. > > Jeremy, can you determine from your logs whether this always > happenedwhen --auth-user-pass-verify returns zero or non-zero? Ie, > should the connections that trigger this succeed or fail? --auth-user-pass-verify returned non-zero in each crash. In that sense, the assert was good, because the alternative would be that an unauthenticated user may be able to get VPN access (assuming there wasn't some later check that caught the issue). Here are the relevant log sections from the 4 crashes: Tue May 19 15:57:03 2020 username/73.135.141.11:1194 WARNING: Failed running command (--auth-user-pass-verify): external program exited with error status: 1 Tue May 19 15:57:03 2020 username/73.135.141.11:1194 TLS Auth Error: Auth Username/Password verification failed for peer Tue May 19 15:57:03 2020 username/73.135.141.11:1194 Control Channel: TLSv1.2, cipher TLSv1/SSLv3 ECDHE-RSA-AES256-GCM-SHA384, 2048 bit RSA Tue May 19 15:57:04 2020 username1/73.112.148.11:1194 TLS: new session incoming connection from [AF_INET]73.112.148.11:1194 Tue May 19 15:57:04 2020 username2/73.190.61.11:1194 TLS: new session incoming connection from [AF_INET]73.190.61.11:1194 Tue May 19 15:57:04 2020 username3/73.190.134.11:1194 TLS: new session incoming connection from [AF_INET]73.190.134.11:1194 Tue May 19 15:57:04 2020 username4/73.120.76.11:1194 TLS: new session incoming connection from [AF_INET]73.120.76.11:1194 Tue May 19 15:57:05 2020 username5/73.16.123.11:1194 TLS: new session incoming connection from [AF_INET]73.16.123.11:1194 Tue May 19 15:57:05 2020 username/73.135.141.11:1194 PUSH: Received control message: 'PUSH_REQUEST' Tue May 19 15:57:05 2020 username/73.135.141.11:1194 SENT CONTROL [username]: 'PUSH_REPLY,redirect-gateway def1,comp-lzo,persist-key,persist-tun,route-gateway 10.28.47.1,topology subnet,ping 10,ping-restart 120,ifconfig 10.28.47.38 255.255.255.0,peer-id 89' (status=1) Tue May 19 15:57:05 2020 username/73.135.141.11:1194 Assertion failed at /path/to/openvpn-2.4.7/src/openvpn/ssl.c:1944 (ks->authenticated) Wed May 13 14:14:54 2020 username/24.106.43.11:1194 WARNING: Failed running command (--auth-user-pass-verify): external program exited with error status: 1 Wed May 13 14:14:54 2020 username/24.106.43.11:1194 TLS Auth Error: Auth Username/Password verification failed for peer Wed May 13 14:14:54 2020 username/24.106.43.11:1194 TLS: move_session: dest=TM_ACTIVE src=TM_UNTRUSTED reinit_src=1 Wed May 13 14:14:54 2020 username/24.106.43.11:1194 TLS: tls_multi_process: untrusted session promoted to semi-trusted Wed May 13 14:14:54 2020 username/24.106.43.11:1194 Control Channel: TLSv1.2, cipher TLSv1/SSLv3 ECDHE-RSA-AES256-GCM-SHA384, 2048 bit RSA Wed May 13 14:14:57 2020 username/24.106.43.11:1194 PUSH: Received control message: 'PUSH_REQUEST' Wed May 13 14:14:57 2020 username/24.106.43.11:1194 SENT CONTROL [username]: 'PUSH_REPLY,redirect-gateway def1,comp-lzo,persist-key,persist-tun,route-gateway 10.28.47.1,topology subnet,ping 10,ping-restart 120,ifconfig 10.28.47.38 255.255.255.0,peer-id 89' (status=1) Wed May 13 14:14:57 2020 username/24.106.43.11:1194 Assertion failed at /path/to/openvpn-2.4.7/src/openvpn/ssl.c:1944 (ks->authenticated) Wed Sep 25 12:14:56 2019 username/174.94.150.11:30407 WARNING: Failed running command (--auth-user-pass-verify): external program exited with error status: 1 Wed Sep 25 12:14:56 2019 username/174.94.150.11:30407 TLS Auth Error: Auth Username/Password verification failed for peer Wed Sep 25 12:14:56 2019 username/174.94.150.11:30407 TLS: move_session: dest=TM_ACTIVE src=TM_UNTRUSTED reinit_src=1 Wed Sep 25 12:14:56 2019 username/174.94.150.11:30407 TLS: tls_multi_process: untrusted session promoted to semi-trusted Wed Sep 25 12:14:56 2019 username/174.94.150.11:30407 Control Channel: TLSv1.2, cipher TLSv1/SSLv3 ECDHE-RSA-AES256-GCM-SHA384, 2048 bit RSA Wed Sep 25 12:14:57 2019 username/174.94.150.11:30407 PUSH: Received control message: 'PUSH_REQUEST' Wed Sep 25 12:14:57 2019 username/174.94.150.11:30407 SENT CONTROL [username]: 'PUSH_REPLY,redirect-gateway def1,comp-lzo,persist-key,persist-tun,route-gateway 10.28.47.1,topology subnet,ping 10,ping-restart 120,ifconfig 10.28.47.38 255.255.255.0,peer-id 89' (status=1) Wed Sep 25 12:14:57 2019 username/174.94.150.11:30407 Assertion failed at /path/to/openvpn-2.4.7/src/openvpn/ssl.c:1944 (ks->authenticated) Mon Apr 16 16:20:27 2018 username/73.112.174.11:2939 WARNING: Failed running command (--auth-user-pass-verify): external program exited with error status: 1 Mon Apr 16 16:20:27 2018 username/73.112.174.11:2939 TLS Auth Error: Auth Username/Password verification failed for peer Mon Apr 16 16:20:27 2018 username/73.112.174.11:2939 Control Channel: TLSv1.2, cipher TLSv1/SSLv3 ECDHE-RSA-AES256-GCM-SHA384, 2048 bit RSA Mon Apr 16 16:20:28 2018 username/73.112.174.11:2939 PUSH: Received control message: 'PUSH_REQUEST' Mon Apr 16 16:20:28 2018 username/73.112.174.11:2939 SENT CONTROL [username]: 'PUSH_REPLY,redirect-gateway def1,comp-lzo,persist-key,persist-tun,route-gateway 10.28.47.1,topology subnet,ping 10,ping-restart 120,ifconfig 10.28.47.38 255.255.255.0,peer-id 89' (status=1) Mon Apr 16 16:20:28 2018 username/73.112.174.11:2939 Assertion failed at /path/to/openvpn-2.4.4/src/openvpn/ssl.c:1929 (ks->authenticated) Thanks, Jeremy
Your patch has been applied to the master and release/2.4 branch (bugfix). Thanks. I do hope Steffan finds time to track down the root cause of this - from your log, it smells like a timing issue if multiple clients connect in quick succession and multiple instances of the plugin are called. Making "something" not finish initializing before the first client sends his PUSH_REQUEST. Or so. commit 984bd1e1601e4b9562dbc88b02a8db60b884286f (master) commit 098edbb1f5a2e1360fd6a4ae0642b63bec12e992 (release/2.4) Author: Jeremy Evans Date: Wed May 20 11:34:04 2020 -0700 Switch assertion failure to returning false Signed-off-by: Jeremy Evans <code@jeremyevans.net> Acked-by: Steffan Karger <steffan.karger@foxcrypto.com> Message-Id: <20200520183404.54822-1-code@jeremyevans.net> URL: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg19914.html Signed-off-by: Gert Doering <gert@greenie.muc.de> -- kind regards, Gert Doering
diff --git a/src/openvpn/ssl.c b/src/openvpn/ssl.c index c2e9a4f3..c8b3a959 100644 --- a/src/openvpn/ssl.c +++ b/src/openvpn/ssl.c @@ -1930,7 +1930,10 @@ tls_session_generate_data_channel_keys(struct tls_session *session) const struct session_id *server_sid = !session->opt->server ? &ks->session_id_remote : &session->session_id; - ASSERT(ks->authenticated); + if (!ks->authenticated) { + msg(D_TLS_ERRORS, "TLS Error: key_state not authenticated"); + goto cleanup; + } ks->crypto_options.flags = session->opt->crypto_flags; if (!generate_key_expansion(&ks->crypto_options.key_ctx_bi,
This assertion failure can be hit in production, which causes the openvpn server process to stop and all clients to be disconnected. Bug #1270 has been filed for this issue on Trac by another user who has experienced the issue, and this patch attempts to address it. Tracing callers, it appears that some callers check ks->authenticated before calling, but others do not. It may be possible to add the check for the callers that do not check, but this seems to be a simpler solution. To give some background, we hit this assertion failure, with the following log output: ``` Tue May 19 15:57:05 2020 username/73.135.141.11:1194 PUSH: Received control message: 'PUSH_REQUEST' Tue May 19 15:57:05 2020 username/73.135.141.11:1194 SENT CONTROL [username]: 'PUSH_REPLY,redirect-gateway def1,comp-lzo,persist-key,persist-tun,route-gateway 10.28.47.1,topology subnet,ping 10,ping-restart 120,ifconfig 10.28.47.38 255.255.255.0,peer-id 89' (status=1) Tue May 19 15:57:05 2020 username/73.135.141.11:1194 Assertion failed at /path/to/openvpn-2.4.7/src/openvpn/ssl.c:1944 (ks->authenticated) Tue May 19 15:57:05 2020 username/73.135.141.11:1194 Exiting due to fatal error Tue May 19 15:57:05 2020 username/73.135.141.11:1194 Closing TUN/TAP interface ``` using the following OpenVPN server configuration: ``` port 1194 proto udp dev-type tun ca ca.crt cert server.crt key server.key dh dh.pem topology subnet push "redirect-gateway def1" push "comp-lzo" push "persist-key" push "persist-tun" keepalive 10 120 comp-lzo user nobody group nobody persist-key persist-tun cd /home/openvpn/server chroot /var/empty daemon verb 3 crl-verify crl.pem tls-auth ta.key 0 cipher AES-256-CBC tls-version-min 1.2 tls-cipher ECDHE-RSA-AES256-GCM-SHA384 ncp-disable mute-replay-warnings script-security 3 auth-user-pass-verify "ldap-auth/ldap-auth" via-env auth-user-pass-optional ``` and the following command line options: ``` --config openvpn.conf --dev tun1 --local 206.131.72.52 \ --log-append openvpn.log --status openvpn-status.log \ --server 10.28.47.0 255.255.255.0 ``` The failed assertion is inside the function `tls_session_generate_data_channel_keys`, which is called 3 other places in `ssl.c.`: * `key_method_2_write`: checks for `ks->authenticated` before calling * `key_method_2_read`: appears to run in client mode but not in server mode * `tls_session_update_crypto_params`: runs in server mode and does not check before calling That leads me to believe the problem caller is `tls_session_update_crypto_params`. There.s three callers of `tls_session_update_crypto_params`:. * `incoming_push_message` (`push.c`): Probably this caller, since the server pushes configuration to clients, and the log shows the assertion failure right after the push reply. * `multi_process_file_closed` (`multi.c`): Not this caller. NCP is disabled in config, and async push was not enabled when compiling. * `do_deferred_options` (`init.c`): Not this caller. The server configuration doesn't pull. Changing the assertion to returning false appears to be the simplest fix. Another approach would be changing callers to check `ks->authenticated` before calling, either `tls_session_update_crypto_params` or `incoming_push_message`. Signed-off-by: Jeremy Evans <code@jeremyevans.net> --- src/openvpn/ssl.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)