Message ID | 20250505225549.19492-1-a@unstable.cc |
---|---|
State | Accepted |
Delegated to: | Antonio Quartulli |
Headers | show |
Series | [Openvpn-devel,ovpn-net-next,v2] ovpn: ensure sk is still valid during cleanup | expand |
Hi, On 06/05/2025 00:55, Antonio Quartulli wrote: > From: Antonio Quartulli <antonio@openvpn.net> > [...] > Signed-off-by: Antonio Quartulli <antonio@openvpn.net> This is v2 for "[PATCH ovpn-net-next 1/2] ovpn: don't access sk after release". > --- > drivers/net/ovpn/socket.c | 21 ++++++++++++--------- > 1 file changed, 12 insertions(+), 9 deletions(-) > > diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c > index a83cbab72591..66a2ecbc483b 100644 > --- a/drivers/net/ovpn/socket.c > +++ b/drivers/net/ovpn/socket.c > @@ -66,6 +66,7 @@ static bool ovpn_socket_put(struct ovpn_peer *peer, struct ovpn_socket *sock) > void ovpn_socket_release(struct ovpn_peer *peer) > { > struct ovpn_socket *sock; > + struct sock *sk; > bool released; > > might_sleep(); > @@ -75,13 +76,14 @@ void ovpn_socket_release(struct ovpn_peer *peer) > if (!sock) > return; > > - /* sanity check: we should not end up here if the socket > - * was already closed > + /* sock->sk may be released concurrently, therefore we > + * first attempt grabbing a reference. > + * if sock->sk is NULL it means it is already being > + * destroyed and we don't need any further cleanup > */ > - if (!sock->sock->sk) { > - DEBUG_NET_WARN_ON_ONCE(1); > + sk = sock->sock->sk; > + if (!sk || !refcount_inc_not_zero(&sk->sk_refcnt)) > return; > - } Rather than accessing sk without any guarantee of not having been nullified in the meantime, I think it is safer to check and possibly grab a reference. At this point we know for sure that sk must be valid and wasn't released yet. I hope it makes sense. Cheers,
Hi, On Tue, May 06, 2025 at 12:55:49AM +0200, Antonio Quartulli wrote: > From: Antonio Quartulli <antonio@openvpn.net> > > In case of UDP peer timeout, an openvpn client (userspace) > performs the following actions: > 1. receives the peer deletion notification (reason=timeout) > 2. closes the socket [..] > > Signed-off-by: Antonio Quartulli <antonio@openvpn.net> > --- > drivers/net/ovpn/socket.c | 21 ++++++++++++--------- > 1 file changed, 12 insertions(+), 9 deletions(-) > > diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c > index a83cbab72591..66a2ecbc483b 100644 On my ubuntu 20.04 ("backport of ovpn") running a few OpenVPN instances for test purposes, opening and closing DCO interface frequently, I could fairly reliably crash the system in ~1 hour. With this patch, the machine survived the full test parcours. I can not comment on the technical merits on the patch, just state "it fixes my problem", so Tested-By: Gert Doering <gert@greenie.muc.de> gert
Hi Sabrina, On 07/05/2025 13:30, Gert Doering wrote: > Tested-By: Gert Doering <gert@greenie.muc.de> > Gert has stress tested this change quite a bit and he couldn't reproduce any side issue. To me it looks like this is a reasonable solution to avoid clashes due to racing cleanups (ovpn_socket_release vs destroy/close). Please let me know if you have any comment, otherwise I'd like to send this to netdev for inclusion, since it fixes a frequent kernel panic. Regards,
On 06/05/2025 00:55, Antonio Quartulli wrote:
> Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
Merged to main, commit id 476148af4e28
So far we didn't spot any regression, so we'll move on with this fix for
the time being as we know it is preventing the crash for good.
We can improve it later if we feel this is not the best we can do.
Cheers!
diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c index a83cbab72591..66a2ecbc483b 100644 --- a/drivers/net/ovpn/socket.c +++ b/drivers/net/ovpn/socket.c @@ -66,6 +66,7 @@ static bool ovpn_socket_put(struct ovpn_peer *peer, struct ovpn_socket *sock) void ovpn_socket_release(struct ovpn_peer *peer) { struct ovpn_socket *sock; + struct sock *sk; bool released; might_sleep(); @@ -75,13 +76,14 @@ void ovpn_socket_release(struct ovpn_peer *peer) if (!sock) return; - /* sanity check: we should not end up here if the socket - * was already closed + /* sock->sk may be released concurrently, therefore we + * first attempt grabbing a reference. + * if sock->sk is NULL it means it is already being + * destroyed and we don't need any further cleanup */ - if (!sock->sock->sk) { - DEBUG_NET_WARN_ON_ONCE(1); + sk = sock->sock->sk; + if (!sk || !refcount_inc_not_zero(&sk->sk_refcnt)) return; - } /* Drop the reference while holding the sock lock to avoid * concurrent ovpn_socket_new call to mess up with a partially @@ -90,18 +92,18 @@ void ovpn_socket_release(struct ovpn_peer *peer) * Holding the lock ensures that a socket with refcnt 0 is fully * detached before it can be picked by a concurrent reader. */ - lock_sock(sock->sock->sk); + lock_sock(sk); released = ovpn_socket_put(peer, sock); - release_sock(sock->sock->sk); + release_sock(sk); /* align all readers with sk_user_data being NULL */ synchronize_rcu(); /* following cleanup should happen with lock released */ if (released) { - if (sock->sock->sk->sk_protocol == IPPROTO_UDP) { + if (sk->sk_protocol == IPPROTO_UDP) { netdev_put(sock->ovpn->dev, &sock->dev_tracker); - } else if (sock->sock->sk->sk_protocol == IPPROTO_TCP) { + } else if (sk->sk_protocol == IPPROTO_TCP) { /* wait for TCP jobs to terminate */ ovpn_tcp_socket_wait_finish(sock); ovpn_peer_put(sock->peer); @@ -111,6 +113,7 @@ void ovpn_socket_release(struct ovpn_peer *peer) */ kfree(sock); } + sock_put(sk); } static bool ovpn_socket_hold(struct ovpn_socket *sock)