| Message ID | 20260202132309.567382-1-ralf@mandelbit.com |
|---|---|
| State | New |
| Series | [Openvpn-devel,ovpn,net] ovpn: detach TCP socket before invoking close |
This patch is actually fixing the same issue reported here:

https://lore.kernel.org/netdev/176996279620.3109699.15382994681575380467@eldamar.lan/

Cheers,

On 02/02/2026 14:23, Ralf Lici wrote:
> Currently, when a connected peer expires (no packets received within the keepalive interval), we remove the peer and notify userspace of the deletion. We then insert the peer in the release list and proceed to detach and release the socket and the peer. This can be problematic with TCP because, as soon as we send the notification, openvpn will close the peer's socket and if ovpn_tcp_close is invoked before ovpn_tcp_socket_detach we incurr in a NULL pointer dereference when trying to access sk->sk_socket.
>
> Enforce correct ordering by calling ovpn_sock_release before invoking the original socket close callback. This avoids potential race conditions and guarantees that we completely detach from the socket once userspace issues the close command.
>
> Signed-off-by: Ralf Lici <ralf@mandelbit.com>
> ---
>  drivers/net/ovpn/tcp.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ovpn/tcp.c b/drivers/net/ovpn/tcp.c
> index 0d7f30360d87..13d2a8069695 100644
> --- a/drivers/net/ovpn/tcp.c
> +++ b/drivers/net/ovpn/tcp.c
> @@ -553,6 +553,7 @@ static void ovpn_tcp_close(struct sock *sk, long timeout)
>  	rcu_read_unlock();
>
>  	ovpn_peer_del(sock->peer, OVPN_DEL_PEER_REASON_TRANSPORT_DISCONNECT);
> +	ovpn_socket_release(peer);
>  	peer->tcp.sk_cb.prot->close(sk, timeout);
>  	ovpn_peer_put(peer);
>  }
On Mon, 2026-02-02 at 14:23 +0100, Ralf Lici wrote:
> Currently, when a connected peer expires (no packets received within the keepalive interval), we remove the peer and notify userspace of the deletion. We then insert the peer in the release list and proceed to detach and release the socket and the peer. This can be problematic with TCP because, as soon as we send the notification, openvpn will close the peer's socket and if ovpn_tcp_close is invoked before ovpn_tcp_socket_detach we incurr in a NULL pointer dereference when trying to access sk->sk_socket.
>
> Enforce correct ordering by calling ovpn_sock_release before invoking the original socket close callback. This avoids potential race conditions and guarantees that we completely detach from the socket once userspace issues the close command.
>
> Fixes: 11851cbd60ea ("ovpn: implement TCP transport")
> Signed-off-by: Ralf Lici <ralf@mandelbit.com>
> ---
>  drivers/net/ovpn/tcp.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ovpn/tcp.c b/drivers/net/ovpn/tcp.c
> index 0d7f30360d87..13d2a8069695 100644
> --- a/drivers/net/ovpn/tcp.c
> +++ b/drivers/net/ovpn/tcp.c
> @@ -553,6 +553,7 @@ static void ovpn_tcp_close(struct sock *sk, long timeout)
>  	rcu_read_unlock();
>
>  	ovpn_peer_del(sock->peer, OVPN_DEL_PEER_REASON_TRANSPORT_DISCONNECT);
> +	ovpn_socket_release(peer);
>  	peer->tcp.sk_cb.prot->close(sk, timeout);
>  	ovpn_peer_put(peer);
>  }
2026-02-02, 14:33:47 +0100, Antonio Quartulli wrote:
> This patch is actually fixing the same issue reported here:
>
> https://lore.kernel.org/netdev/176996279620.3109699.15382994681575380467@eldamar.lan/

So it would be nice to add a

    Link: <url>

tag just above the sign-off/fixes tag (to the netdev post or to the debian bugtracker), and there could also be a Reported-by tag (you may want to check with the reporter if they agree, but the debian bugtracker publicly lists the name and email so I don't see why not).

> Cheers,
>
> On 02/02/2026 14:23, Ralf Lici wrote:
> > Currently, when a connected peer expires (no packets received within the keepalive interval), we remove the peer and notify userspace of the deletion. We then insert the peer in the release list and proceed to detach and release the socket and the peer. This can be problematic with TCP because, as soon as we send the notification, openvpn will close the peer's socket and if ovpn_tcp_close is invoked before ovpn_tcp_socket_detach we incurr in a NULL pointer dereference when trying to access sk->sk_socket.
> >
> > Enforce correct ordering by calling ovpn_sock_release before invoking the original socket close callback. This avoids potential race conditions and guarantees that we completely detach from the socket once userspace issues the close command.

The comment for ovpn_sock_release says:

     * This function is expected to be invoked exactly once per peer

But it seems in this case we won't be obeying this assumption? Once from ovpn_tcp_close, and again from unlock_ovpn in the timeout handler?

IIRC this "invoked exactly once" was the condition that made

    rcu_replace_pointer(peer->sock, NULL, true);

valid, but what will prevent both sides from seeing peer->sock != NULL now?

(not really related, but maybe unlock_ovpn should be using llist_for_each_entry_safe since the current peer might go away before we grab the next list item?)
On 06/02/2026 19:43, Sabrina Dubroca wrote:
> 2026-02-02, 14:33:47 +0100, Antonio Quartulli wrote:
>> This patch is actually fixing the same issue reported here:
>>
>> https://lore.kernel.org/netdev/176996279620.3109699.15382994681575380467@eldamar.lan/
>
> So it would be nice to add a
>
>     Link: <url>
>
> tag just above the sign-off/fixes tag (to the netdev post or to the debian bugtracker), and there could also be a Reported-by tag (you may want to check with the reporter if they agree, but the debian bugtracker publicly lists the name and email so I don't see why not).

The bug was discovered[1] and the patch was created before we got the Debian report. Hence there is no mention of them.

[1] https://github.com/OpenVPN/ovpn-net-next/issues/29

But we can still add the Link tag at least, to make it clear that the ticket is related.

>> On 02/02/2026 14:23, Ralf Lici wrote:
>>> Currently, when a connected peer expires (no packets received within the keepalive interval), we remove the peer and notify userspace of the deletion. We then insert the peer in the release list and proceed to detach and release the socket and the peer. This can be problematic with TCP because, as soon as we send the notification, openvpn will close the peer's socket and if ovpn_tcp_close is invoked before ovpn_tcp_socket_detach we incurr in a NULL pointer dereference when trying to access sk->sk_socket.
>>>
>>> Enforce correct ordering by calling ovpn_sock_release before invoking the original socket close callback. This avoids potential race conditions and guarantees that we completely detach from the socket once userspace issues the close command.
>
> The comment for ovpn_sock_release says:
>
>      * This function is expected to be invoked exactly once per peer
>
> But it seems in this case we won't be obeying this assumption? Once from ovpn_tcp_close, and again from unlock_ovpn in the timeout handler?
>
> IIRC this "invoked exactly once" was the condition that made
>
>     rcu_replace_pointer(peer->sock, NULL, true);
>
> valid, but what will prevent both sides from seeing peer->sock != NULL now?

Mh indeed you are right.

The rcu_replace_pointer call must be protected now. We have to double check which lock we can safely hold without incurring in any conflict with the rest of the socket lifecycle.

Thanks for pointing this out.

> (not really related, but maybe unlock_ovpn should be using llist_for_each_entry_safe since the current peer might go away before we grab the next list item?)

hm it will go away at the next RCU cycle, when the RCU-scheduled call to ovpn_peer_release_rcu() will kick in. And indeed we are not protected against that, so yeah, it can theoretically go away.

Thanks for spotting this! Wanna send a patch? Otherwise we'll take care of it :)

Regards,
On 09/02/2026 11:21, Antonio Quartulli wrote:
>>> On 02/02/2026 14:23, Ralf Lici wrote:
>>>> Currently, when a connected peer expires (no packets received within the keepalive interval), we remove the peer and notify userspace of the deletion. We then insert the peer in the release list and proceed to detach and release the socket and the peer. This can be problematic with TCP because, as soon as we send the notification, openvpn will close the peer's socket and if ovpn_tcp_close is invoked before ovpn_tcp_socket_detach we incurr in a NULL pointer dereference when trying to access sk->sk_socket.
>>>>
>>>> Enforce correct ordering by calling ovpn_sock_release before invoking the original socket close callback. This avoids potential race conditions and guarantees that we completely detach from the socket once userspace issues the close command.
>>
>> The comment for ovpn_sock_release says:
>>
>>      * This function is expected to be invoked exactly once per peer
>>
>> But it seems in this case we won't be obeying this assumption? Once from ovpn_tcp_close, and again from unlock_ovpn in the timeout handler?
>>
>> IIRC this "invoked exactly once" was the condition that made
>>
>>     rcu_replace_pointer(peer->sock, NULL, true);
>>
>> valid, but what will prevent both sides from seeing peer->sock != NULL now?
>
> Mh indeed you are right.
>
> The rcu_replace_pointer call must be protected now. We have to double check which lock we can safely hold without incurring in any conflict with the rest of the socket lifecycle.
>
> Thanks for pointing this out.

What if we go the other way around: we have ovpn_tcp_close() check if the peer was already gone from the ovpn hash or not.

If it did, we know the peer is going through its own lifecycle and we skip the ovpn_peer_del() call.

This is what I have in mind (not tested yet! only compiled):

diff --git a/drivers/net/ovpn/tcp.c b/drivers/net/ovpn/tcp.c
index 1c02cc06f21b..83e7a5354266 100644
--- a/drivers/net/ovpn/tcp.c
+++ b/drivers/net/ovpn/tcp.c
@@ -543,8 +543,8 @@ int ovpn_tcp_socket_attach(struct ovpn_socket *ovpn_sock,
 
 static void ovpn_tcp_close(struct sock *sk, long timeout)
 {
+	struct ovpn_peer *peer, *tmp;
 	struct ovpn_socket *sock;
-	struct ovpn_peer *peer;
 
 	rcu_read_lock();
 	sock = rcu_dereference_sk_user_data(sk);
@@ -555,7 +555,26 @@ static void ovpn_tcp_close(struct sock *sk, long timeout)
 	peer = sock->peer;
 	rcu_read_unlock();
 
-	ovpn_peer_del(sock->peer, OVPN_DEL_PEER_REASON_TRANSPORT_DISCONNECT);
+	spin_lock_bh(&peer->ovpn->lock);
+	/* if the peer is already unhashed, it means it already
+	 * went through ovpn_peer_remove() due to the actual removal event.
+	 * This tcp_close call is just the socket being cleaned up by
+	 * userspace and we don't need to double process the deletion
+	 *
+	 * NOTE: we need to lock (instead of just rcu-read-lock) because
+	 * we have to synchronize with other calls to ovpn_peer_remove()
+	 * happening with ovpn->lock held.
+	 * This way we are guaranteed to call ovpn_peer_del() without
+	 * racing with another ovpn_peer_remove().
+	 */
+	tmp = ovpn_peer_get_by_id(peer->ovpn, peer->id);
+	if (tmp) {
+		ovpn_peer_del(sock->peer,
+			      OVPN_DEL_PEER_REASON_TRANSPORT_DISCONNECT);
+		ovpn_peer_put(tmp);
+	}
+	spin_unlock_bh(&peer->ovpn->lock);
+
 	peer->tcp.sk_cb.prot->close(sk, timeout);
 	ovpn_peer_put(peer);
 }

What do you think?
On 09/02/2026 16:32, Antonio Quartulli wrote:
> What if we go the other way around: we have ovpn_tcp_close() check if the peer was already gone from the ovpn hash or not.
>
> If it did, we know the peer is going through its own lifecycle and we skip the ovpn_peer_del() call.
>
> This is what I have in mind (not tested yet! only compiled):
>
> diff --git a/drivers/net/ovpn/tcp.c b/drivers/net/ovpn/tcp.c
> index 1c02cc06f21b..83e7a5354266 100644
> --- a/drivers/net/ovpn/tcp.c
> +++ b/drivers/net/ovpn/tcp.c
> @@ -543,8 +543,8 @@ int ovpn_tcp_socket_attach(struct ovpn_socket *ovpn_sock,
>  
>  static void ovpn_tcp_close(struct sock *sk, long timeout)
>  {
> +	struct ovpn_peer *peer, *tmp;
>  	struct ovpn_socket *sock;
> -	struct ovpn_peer *peer;
>  
>  	rcu_read_lock();
>  	sock = rcu_dereference_sk_user_data(sk);
> @@ -555,7 +555,26 @@ static void ovpn_tcp_close(struct sock *sk, long timeout)
>  	peer = sock->peer;
>  	rcu_read_unlock();
>  
> -	ovpn_peer_del(sock->peer, OVPN_DEL_PEER_REASON_TRANSPORT_DISCONNECT);
> +	spin_lock_bh(&peer->ovpn->lock);
> +	/* if the peer is already unhashed, it means it already
> +	 * went through ovpn_peer_remove() due to the actual removal event.
> +	 * This tcp_close call is just the socket being cleaned up by
> +	 * userspace and we don't need to double process the deletion
> +	 *
> +	 * NOTE: we need to lock (instead of just rcu-read-lock) because
> +	 * we have to synchronize with other calls to ovpn_peer_remove()
> +	 * happening with ovpn->lock held.
> +	 * This way we are guaranteed to call ovpn_peer_del() without
> +	 * racing with another ovpn_peer_remove().
> +	 */
> +	tmp = ovpn_peer_get_by_id(peer->ovpn, peer->id);
> +	if (tmp) {

or even better, we could check for !hlist_unhashed(&peer->hash_entry_id), to avoid hitting another peer that just got the same ID re-assigned (nearly impossible, but you know ..)

Cheers,

> +		ovpn_peer_del(sock->peer,
> +			      OVPN_DEL_PEER_REASON_TRANSPORT_DISCONNECT);
> +		ovpn_peer_put(tmp);
> +	}
> +	spin_unlock_bh(&peer->ovpn->lock);
> +
>  	peer->tcp.sk_cb.prot->close(sk, timeout);
>  	ovpn_peer_put(peer);
>  }
>
> What do you think?
2026-02-09, 16:36:55 +0100, Antonio Quartulli wrote:
> On 09/02/2026 16:32, Antonio Quartulli wrote:
>> What if we go the other way around: we have ovpn_tcp_close() check if the peer was already gone from the ovpn hash or not.
>>
>> If it did, we know the peer is going through its own lifecycle and we skip the ovpn_peer_del() call.

To make sure I understand: the bug we're trying to fix here is that we end up calling ovpn_tcp_close() too early, so by the time ovpn_tcp_socket_detach() is called, sk->sk_socket is already NULL so "sk->sk_socket->ops = ..." crashes the kernel. Ralf's patch makes sure ovpn_socket_release() -> ... -> ovpn_tcp_socket_detach() has been called by the time we're done with close(). Is that correct?

Then I'm confused by how this would solve the issue. We could still be done with ovpn_tcp_close() before ovpn_peer_keepalive_work() -> unlock_ovpn -> ovpn_socket_release has completed?

>> This is what I have in mind (not tested yet! only compiled):
>>
>> diff --git a/drivers/net/ovpn/tcp.c b/drivers/net/ovpn/tcp.c
>> index 1c02cc06f21b..83e7a5354266 100644
>> --- a/drivers/net/ovpn/tcp.c
>> +++ b/drivers/net/ovpn/tcp.c
>> @@ -543,8 +543,8 @@ int ovpn_tcp_socket_attach(struct ovpn_socket *ovpn_sock,
>>  
>>  static void ovpn_tcp_close(struct sock *sk, long timeout)
>>  {
>> +	struct ovpn_peer *peer, *tmp;
>>  	struct ovpn_socket *sock;
>> -	struct ovpn_peer *peer;
>>  
>>  	rcu_read_lock();
>>  	sock = rcu_dereference_sk_user_data(sk);
>> @@ -555,7 +555,26 @@ static void ovpn_tcp_close(struct sock *sk, long timeout)
>>  	peer = sock->peer;
>>  	rcu_read_unlock();
>>  
>> -	ovpn_peer_del(sock->peer, OVPN_DEL_PEER_REASON_TRANSPORT_DISCONNECT);
>> +	spin_lock_bh(&peer->ovpn->lock);
>> +	/* if the peer is already unhashed, it means it already
>> +	 * went through ovpn_peer_remove() due to the actual removal event.
>> +	 * This tcp_close call is just the socket being cleaned up by
>> +	 * userspace and we don't need to double process the deletion
>> +	 *
>> +	 * NOTE: we need to lock (instead of just rcu-read-lock) because
>> +	 * we have to synchronize with other calls to ovpn_peer_remove()
>> +	 * happening with ovpn->lock held.
>> +	 * This way we are guaranteed to call ovpn_peer_del() without
>> +	 * racing with another ovpn_peer_remove().
>> +	 */
>> +	tmp = ovpn_peer_get_by_id(peer->ovpn, peer->id);
>> +	if (tmp) {
>
> or even better, we could check for !hlist_unhashed(&peer->hash_entry_id), to

But hash_entry_id isn't used in P2P mode?

> avoid hitting another peer that just got the same ID re-assigned (nearly impossible, but you know ..)
>
> Cheers,
>
>> +		ovpn_peer_del(sock->peer,

ovpn_peer_del wants to acquire peer->ovpn->lock which we just took, you'd have to rework that function.

>> +			      OVPN_DEL_PEER_REASON_TRANSPORT_DISCONNECT);
>> +		ovpn_peer_put(tmp);
>> +	}
>> +	spin_unlock_bh(&peer->ovpn->lock);
>> +
>>  	peer->tcp.sk_cb.prot->close(sk, timeout);
>>  	ovpn_peer_put(peer);
>>  }
On 09/02/2026 17:34, Sabrina Dubroca wrote:
> 2026-02-09, 16:36:55 +0100, Antonio Quartulli wrote:
>> On 09/02/2026 16:32, Antonio Quartulli wrote:
>>> What if we go the other way around: we have ovpn_tcp_close() check if the peer was already gone from the ovpn hash or not.
>>>
>>> If it did, we know the peer is going through its own lifecycle and we skip the ovpn_peer_del() call.
>
> To make sure I understand: the bug we're trying to fix here is that we end up calling ovpn_tcp_close() too early, so by the time ovpn_tcp_socket_detach() is called, sk->sk_socket is already NULL so "sk->sk_socket->ops = ..." crashes the kernel. Ralf's patch makes sure ovpn_socket_release() -> ... -> ovpn_tcp_socket_detach() has been called by the time we're done with close(). Is that correct?
>
> Then I'm confused by how this would solve the issue. We could still be done with ovpn_tcp_close() before ovpn_peer_keepalive_work() -> unlock_ovpn -> ovpn_socket_release has completed?

Mh ok. At first I wrote down my entire theory, then I realized it was flawed and erased it again.

You're right, we still have tcp_close() being invoked before ovpn_socket_release(), which is what is causing the issue due to sk->sk_socket == NULL.

Do you think we could have a reliable way to check in ovpn_socket_release() if sk->sk_socket is NULL and, if so, just bail out? That'd be telling us: "the socket is already gone - no need to continue and crash".

Cheers,

>>> This is what I have in mind (not tested yet! only compiled):
>>>
>>> diff --git a/drivers/net/ovpn/tcp.c b/drivers/net/ovpn/tcp.c
>>> index 1c02cc06f21b..83e7a5354266 100644
>>> --- a/drivers/net/ovpn/tcp.c
>>> +++ b/drivers/net/ovpn/tcp.c
>>> @@ -543,8 +543,8 @@ int ovpn_tcp_socket_attach(struct ovpn_socket *ovpn_sock,
>>>  
>>>  static void ovpn_tcp_close(struct sock *sk, long timeout)
>>>  {
>>> +	struct ovpn_peer *peer, *tmp;
>>>  	struct ovpn_socket *sock;
>>> -	struct ovpn_peer *peer;
>>>  
>>>  	rcu_read_lock();
>>>  	sock = rcu_dereference_sk_user_data(sk);
>>> @@ -555,7 +555,26 @@ static void ovpn_tcp_close(struct sock *sk, long timeout)
>>>  	peer = sock->peer;
>>>  	rcu_read_unlock();
>>>  
>>> -	ovpn_peer_del(sock->peer, OVPN_DEL_PEER_REASON_TRANSPORT_DISCONNECT);
>>> +	spin_lock_bh(&peer->ovpn->lock);
>>> +	/* if the peer is already unhashed, it means it already
>>> +	 * went through ovpn_peer_remove() due to the actual removal event.
>>> +	 * This tcp_close call is just the socket being cleaned up by
>>> +	 * userspace and we don't need to double process the deletion
>>> +	 *
>>> +	 * NOTE: we need to lock (instead of just rcu-read-lock) because
>>> +	 * we have to synchronize with other calls to ovpn_peer_remove()
>>> +	 * happening with ovpn->lock held.
>>> +	 * This way we are guaranteed to call ovpn_peer_del() without
>>> +	 * racing with another ovpn_peer_remove().
>>> +	 */
>>> +	tmp = ovpn_peer_get_by_id(peer->ovpn, peer->id);
>>> +	if (tmp) {
>>
>> or even better, we could check for !hlist_unhashed(&peer->hash_entry_id), to
>
> But hash_entry_id isn't used in P2P mode?
>
>> avoid hitting another peer that just got the same ID re-assigned (nearly impossible, but you know ..)
>>
>> Cheers,
>>
>>> +		ovpn_peer_del(sock->peer,
>
> ovpn_peer_del wants to acquire peer->ovpn->lock which we just took, you'd have to rework that function.
>
>>> +			      OVPN_DEL_PEER_REASON_TRANSPORT_DISCONNECT);
>>> +		ovpn_peer_put(tmp);
>>> +	}
>>> +	spin_unlock_bh(&peer->ovpn->lock);
>>> +
>>>  	peer->tcp.sk_cb.prot->close(sk, timeout);
>>>  	ovpn_peer_put(peer);
>>>  }
On 10/02/2026 01:07, Antonio Quartulli wrote:
> On 09/02/2026 17:34, Sabrina Dubroca wrote:
>> 2026-02-09, 16:36:55 +0100, Antonio Quartulli wrote:
>>> On 09/02/2026 16:32, Antonio Quartulli wrote:
>>>> What if we go the other way around: we have ovpn_tcp_close() check if the peer was already gone from the ovpn hash or not.
>>>>
>>>> If it did, we know the peer is going through its own lifecycle and we skip the ovpn_peer_del() call.
>>
>> To make sure I understand: the bug we're trying to fix here is that we end up calling ovpn_tcp_close() too early, so by the time ovpn_tcp_socket_detach() is called, sk->sk_socket is already NULL so "sk->sk_socket->ops = ..." crashes the kernel. Ralf's patch makes sure ovpn_socket_release() -> ... -> ovpn_tcp_socket_detach() has been called by the time we're done with close(). Is that correct?
>>
>> Then I'm confused by how this would solve the issue. We could still be done with ovpn_tcp_close() before ovpn_peer_keepalive_work() -> unlock_ovpn -> ovpn_socket_release has completed?

As we have already stated, the crux of the issue is the moment tcp_close() sets sk->sk_socket to NULL.

This happens in:
1. tcp_close(sk, timeout);
2. __tcp_close(sk, timeout);
3. sock_orphan(sk);
4. sk_set_socket(sk, NULL);

When we later/concurrently invoke ovpn_tcp_socket_release() we dereference sk->sk_socket, which is now NULL.

sock_orphan() is actually pretty small:

static inline void sock_orphan(struct sock *sk)
{
	write_lock_bh(&sk->sk_callback_lock);
	sock_set_flag(sk, SOCK_DEAD);
	sk_set_socket(sk, NULL);
	sk->sk_wq = NULL;
	write_unlock_bh(&sk->sk_callback_lock);
}

and as we can see it holds and then releases sk_callback_lock.

I remember that we were originally using this lock when setting the callbacks, but we eventually dropped it (around v3 or v4 of the original patchset, but I couldn't find the reason). Maybe we thought there was no need to hold that lock?

Now, if we actually acquire this lock in ovpn_tcp_socket_detach(), check that sk_socket is non-NULL, restore the CBs and release it, wouldn't we solve our problem?

Cheers,
2026-02-10, 14:50:47 +0100, Antonio Quartulli wrote:
> On 10/02/2026 01:07, Antonio Quartulli wrote:
>> On 09/02/2026 17:34, Sabrina Dubroca wrote:
>>> 2026-02-09, 16:36:55 +0100, Antonio Quartulli wrote:
>>>> On 09/02/2026 16:32, Antonio Quartulli wrote:
>>>>> What if we go the other way around: we have ovpn_tcp_close() check if the peer was already gone from the ovpn hash or not.
>>>>>
>>>>> If it did, we know the peer is going through its own lifecycle and we skip the ovpn_peer_del() call.
>>>
>>> To make sure I understand: the bug we're trying to fix here is that we end up calling ovpn_tcp_close() too early, so by the time ovpn_tcp_socket_detach() is called, sk->sk_socket is already NULL so "sk->sk_socket->ops = ..." crashes the kernel. Ralf's patch makes sure ovpn_socket_release() -> ... -> ovpn_tcp_socket_detach() has been called by the time we're done with close(). Is that correct?
>>>
>>> Then I'm confused by how this would solve the issue. We could still be done with ovpn_tcp_close() before ovpn_peer_keepalive_work() -> unlock_ovpn -> ovpn_socket_release has completed?
>>
>> Mh ok. At first I wrote down my entire theory, then I realized it was flawed and erased it again.

I've done that as well at least once in this thread :)

> As we have already stated, the crux of the issue is the moment tcp_close() sets sk->sk_socket to NULL.
>
> This happens in:
> 1. tcp_close(sk, timeout);
> 2. __tcp_close(sk, timeout);
> 3. sock_orphan(sk);
> 4. sk_set_socket(sk, NULL);
>
> When we later/concurrently invoke ovpn_tcp_socket_release() we dereference sk->sk_socket, which is now NULL.
>
> sock_orphan() is actually pretty small:
>
> static inline void sock_orphan(struct sock *sk)
> {
> 	write_lock_bh(&sk->sk_callback_lock);
> 	sock_set_flag(sk, SOCK_DEAD);
> 	sk_set_socket(sk, NULL);
> 	sk->sk_wq = NULL;
> 	write_unlock_bh(&sk->sk_callback_lock);
> }
>
> and as we can see it holds and then releases sk_callback_lock.
>
> I remember that we were originally using this lock when setting the callbacks, but we eventually dropped it (around v3 or v4 of the original patchset, but I couldn't find the reason). Maybe we thought there was no need to hold that lock?

Possibly :) Or holding a spinlock at that stage was no longer possible due to some changes (maybe we'd have done sleeping ops under it), but either way, the locking probably changed another 5 times afterwards :)

> Now, if we actually acquire this lock in ovpn_tcp_socket_detach(), check that sk_socket is non-NULL, restore the CBs and release it, wouldn't we solve our problem?

I think it would, but maybe post this to netdev with Jakub/Paolo/etc in CC for confirmation (answers may be slow due to the merge window). Expanding the commit message to describe a bit better the order of events leading to the crash would be good too (maybe also summarize this thread, ie explain other idea(s) and why they don't work).

Otherwise we might have to come up with a scheme that delays ovpn_tcp_close() until the pending/in progress ovpn_socket_release() has completed. Let's hope not...

(I was looking into another idea, making ovpn_socket_release's rcu_replace_pointer atomic by using xchg instead, but this could still end up calling tcp_close before detach() is done if ovpn_tcp_close starts immediately after we've swapped the pointer)
diff --git a/drivers/net/ovpn/tcp.c b/drivers/net/ovpn/tcp.c
index 0d7f30360d87..13d2a8069695 100644
--- a/drivers/net/ovpn/tcp.c
+++ b/drivers/net/ovpn/tcp.c
@@ -553,6 +553,7 @@ static void ovpn_tcp_close(struct sock *sk, long timeout)
 	rcu_read_unlock();
 
 	ovpn_peer_del(sock->peer, OVPN_DEL_PEER_REASON_TRANSPORT_DISCONNECT);
+	ovpn_socket_release(peer);
 	peer->tcp.sk_cb.prot->close(sk, timeout);
 	ovpn_peer_put(peer);
 }
Currently, when a connected peer expires (no packets received within the keepalive interval), we remove the peer and notify userspace of the deletion. We then insert the peer in the release list and proceed to detach and release the socket and the peer. This can be problematic with TCP because, as soon as we send the notification, openvpn will close the peer's socket, and if ovpn_tcp_close is invoked before ovpn_tcp_socket_detach we incur a NULL pointer dereference when trying to access sk->sk_socket.

Enforce correct ordering by calling ovpn_socket_release before invoking the original socket close callback. This avoids potential race conditions and guarantees that we completely detach from the socket once userspace issues the close command.

Signed-off-by: Ralf Lici <ralf@mandelbit.com>
---
 drivers/net/ovpn/tcp.c | 1 +
 1 file changed, 1 insertion(+)