[Openvpn-devel,ovpn,net,1/9] ovpn: skip rehash for peers already removed from by_id

Message ID 20260526231850.2511369-1-a@unstable.cc
State New
Headers
Series [Openvpn-devel,ovpn,net,1/9] ovpn: skip rehash for peers already removed from by_id |

Commit Message

Antonio Quartulli May 26, 2026, 11:18 p.m. UTC
  From: Antonio Quartulli <antonio@openvpn.net>

ovpn_nl_peer_set_doit() resolves the target peer via
ovpn_peer_get_by_id() before taking ovpn->lock. In the window between
the lookup (which only takes a refcount) and the subsequent
spin_lock_bh(&ovpn->lock), a concurrent OVPN_CMD_PEER_DEL, keepalive
expiry, or socket teardown can take ovpn->lock first, run
ovpn_peer_remove() to unhash the peer from all four tables (by_id,
by_vpn_addr4/6, by_transp_addr) and release the lock. set_doit then
acquires ovpn->lock and calls ovpn_peer_hash_vpn_ip(), which
re-inserts the now-removed peer back into the rehashing tables.

The same race affects the float path: ovpn_peer_endpoints_update()
holds only a refcount and acquires ovpn->lock very late (after async
AEAD decrypt and a netlink notification), then rehashes the peer
in the by_transp_addr table.

The resurrected peer becomes reachable again from the RX lookup
(ovpn_peer_get_by_transp_addr) and the TX VPN-IP lookup, even though
userspace believes it is gone. Once the data-path refcount drops the
peer is freed via call_rcu while the hash entries embedded in it
remain linked, opening a UAF window.

Bail out of the rehash when hash_entry_id is unhashed, mirroring
the sentinel already used by ovpn_peer_remove() to detect the
already-removed state. The check is safe under ovpn->lock, which
serializes every mutation of hash_entry_id, and is a no-op for the
add path because ovpn_peer_add_mp() inserts hash_entry_id before
calling ovpn_peer_hash_vpn_ip().

Fixes: 1d36a36f6d53 ("ovpn: implement peer add/get/dump/delete via netlink")
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
---
 drivers/net/ovpn/peer.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)
  

Comments

Sabrina Dubroca May 28, 2026, 6:24 p.m. UTC | #1
2026-05-27, 01:18:42 +0200, Antonio Quartulli wrote:
> From: Antonio Quartulli <antonio@openvpn.net>
> 
> ovpn_nl_peer_set_doit() resolves the target peer via
> ovpn_peer_get_by_id() before taking ovpn->lock. In the window between
> the lookup (which only takes a refcount) and the subsequent
> spin_lock_bh(&ovpn->lock), a concurrent OVPN_CMD_PEER_DEL, keepalive
> expiry, or socket teardown can take ovpn->lock first, run
> ovpn_peer_remove() to unhash the peer from all four tables (by_id,
> by_vpn_addr4/6, by_transp_addr) and release the lock. set_doit then
> acquires ovpn->lock and calls ovpn_peer_hash_vpn_ip(), which
> re-inserts the now-removed peer back into the rehashing tables.
> 
> The same race affects the float path: ovpn_peer_endpoints_update()
> holds only a refcount and acquires ovpn->lock very late (after async
> AEAD decrypt and a netlink notification), then rehashes the peer
> in the by_transp_addr table.
> 
> The resurrected peer becomes reachable again from the RX lookup
> (ovpn_peer_get_by_transp_addr) and the TX VPN-IP lookup, even though
> userspace believes it is gone. Once the data-path refcount drops the
> peer is freed via call_rcu while the hash entries embedded in it
> remain linked, opening a UAF window.
> 
> Bail out of the rehash when hash_entry_id is unhashed, mirroring
> the sentinel already used by ovpn_peer_remove() to detect the
> already-removed state. The check is safe under ovpn->lock, which
> serializes every mutation of hash_entry_id, and is a no-op for the
> add path because ovpn_peer_add_mp() inserts hash_entry_id before
> calling ovpn_peer_hash_vpn_ip().
> 
> Fixes: 1d36a36f6d53 ("ovpn: implement peer add/get/dump/delete via netlink")
> Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
> ---
>  drivers/net/ovpn/peer.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)

This looks correct to me, just one small thing:

> diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c
> index a09d61296425..a472ffe3016b 100644
> --- a/drivers/net/ovpn/peer.c
> +++ b/drivers/net/ovpn/peer.c
> @@ -307,6 +307,16 @@ void ovpn_peer_endpoints_update(struct ovpn_peer *peer, struct sk_buff *skb)
>  			return;
>  		}
>  
> +		/* peer may have been concurrently removed between the caller's
> +		 * initial lookup and our acquisition of ovpn->lock; skip the
> +		 * rehash so we don't re-insert a removed peer
> +		 */
> +		if (unlikely(hlist_unhashed(&peer->hash_entry_id))) {
> +			spin_unlock_bh(&peer->lock);
> +			spin_unlock_bh(&peer->ovpn->lock);
> +			return;
> +		}

I'd add an "unlock2" label (or "unlock_both") to unlock both (maybe
just after the "hlist_nulls_add_head_rcu", before the existing
unlock). The unlock/return is a bit verbose and we have it 3 times
after this patch.
  
Antonio Quartulli May 28, 2026, 8:04 p.m. UTC | #2
On 28/05/2026 20:24, Sabrina Dubroca wrote:
>> diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c
>> index a09d61296425..a472ffe3016b 100644
>> --- a/drivers/net/ovpn/peer.c
>> +++ b/drivers/net/ovpn/peer.c
>> @@ -307,6 +307,16 @@ void ovpn_peer_endpoints_update(struct ovpn_peer *peer, struct sk_buff *skb)
>>   			return;
>>   		}
>>   
>> +		/* peer may have been concurrently removed between the caller's
>> +		 * initial lookup and our acquisition of ovpn->lock; skip the
>> +		 * rehash so we don't re-insert a removed peer
>> +		 */
>> +		if (unlikely(hlist_unhashed(&peer->hash_entry_id))) {
>> +			spin_unlock_bh(&peer->lock);
>> +			spin_unlock_bh(&peer->ovpn->lock);
>> +			return;
>> +		}
> 
> I'd add an "unlock2" label (or "unlock_both") to unlock both (maybe
> just after the "hlist_nulls_add_head_rcu", before the existing
> unlock). The unlock/return is a bit verbose and we have it 3 times
> after this patch.

Yap, make sense!
I'll respin the patch with this change.

Cheers,
  

Patch

diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c
index a09d61296425..a472ffe3016b 100644
--- a/drivers/net/ovpn/peer.c
+++ b/drivers/net/ovpn/peer.c
@@ -307,6 +307,16 @@  void ovpn_peer_endpoints_update(struct ovpn_peer *peer, struct sk_buff *skb)
 			return;
 		}
 
+		/* peer may have been concurrently removed between the caller's
+		 * initial lookup and our acquisition of ovpn->lock; skip the
+		 * rehash so we don't re-insert a removed peer
+		 */
+		if (unlikely(hlist_unhashed(&peer->hash_entry_id))) {
+			spin_unlock_bh(&peer->lock);
+			spin_unlock_bh(&peer->ovpn->lock);
+			return;
+		}
+
 		/* This function may be invoked concurrently, therefore another
 		 * float may have happened in parallel: perform rehashing
 		 * using the peer->bind->remote directly as key
@@ -905,6 +915,13 @@  void ovpn_peer_hash_vpn_ip(struct ovpn_peer *peer)
 	if (peer->ovpn->mode != OVPN_MODE_MP)
 		return;
 
+	/* peer may have been concurrently removed between the caller's
+	 * initial lookup and our acquisition of ovpn->lock; skip the
+	 * rehash so we don't re-insert a removed peer
+	 */
+	if (hlist_unhashed(&peer->hash_entry_id))
+		return;
+
 	if (peer->vpn_addrs.ipv4.s_addr != htonl(INADDR_ANY)) {
 		/* remove potential old hashing */
 		hlist_nulls_del_init_rcu(&peer->hash_entry_addr4);