[Openvpn-devel] multi.c: Allow floating to a new IP right after connection setup

Message ID 20250423175843.104553-1-walter.openvpn@wjd.nu
State New
Headers show
Series [Openvpn-devel] multi.c: Allow floating to a new IP right after connection setup | expand

Commit Message

walter.openvpn--- via Openvpn-devel April 23, 2025, 5:58 p.m. UTC
From: Walter Doekes <walter+github@wjd.nu>

When you're connected to a VPN which is used as the default gateway, a
connection to a second VPN will cause a tunnel-in-tunnel. If the
administrator of the second VPN wants to avoid that, by pushing its IP
as net_gateway, this means that the client's source IP switches right
after connect:

  the source IP switches from the first-VPN-exit-IP to the
  regular-ISP-exit-IP

In openvpn 2.5 and below, this worked fine. Since openvpn 2.6, this
triggers the "Disallow float to an address taken by another client"
code. The root cause for this change of behaviour is as of yet
unexplained.

This change allows one to switch to the new IP, if it is still in an
unconnected state. That makes the use-case mentioned above work again.

Github: closes OpenVPN/openvpn#704
---
 src/openvpn/multi.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

Comments

Arne Schwabe April 24, 2025, 8:26 a.m. UTC | #1
Am 23.04.25 um 19:58 schrieb walter.openvpn--- via Openvpn-devel:
> From: Walter Doekes <walter+github@wjd.nu>
> 
> When you're connected to a VPN which is used as the default gateway, a
> connection to a second VPN will cause a tunnel-in-tunnel. If the
> administrator of the second VPN wants to avoid that, by pushing its IP
> as net_gateway, this means that the client's source IP switches right
> after connect:
> 
>    the source IP switches from the first-VPN-exit-IP to the
>    regular-ISP-exit-IP
> 
> In openvpn 2.5 and below, this worked fine. Since openvpn 2.6, this
> triggers the "Disallow float to an address taken by another client"
> code. The root cause for this change of behaviour is as of yet
> unexplained.
> 
> This change allows one to switch to the new IP, if it is still in an
> unconnected state. That makes the use-case mentioned above work again.


I still do not really understanding what network setup you have to 
trigger this kind of bizzare behaviour of having a VPN connection.
It would really be helpful if you can describe that more. Since the 
current explainations do not really explain for me why there is already 
an existing half-established connection.

The scenario you are fixing seem like:

  * somehow a connection attempt is done from IP A
  * a connection is fully established for IP B
  * connection for IP B tries to move to IP A


You are also saying that there is a difference in OpenVPN 2.5 vs OPneVPN 
2.6. If that is indeed the case, a git bisect would be helpful to point 
to the commit that is breaking this.

 > msg(M_INFO, "peer %" PRIu32 " (%s) floating from %s to %s (m2 still 
setting up) state=%d/%d",

A user has no idea what m2 is. This message more confusing than helpful.


> +	/* if the new connection is fresh and the old one is already connected, this
> +	 * might be a legitimate move to a new IP by the original client;
> +	 * for example when the server IP is pushed as net_gateway to escape from
> +	 * a double VPN. */
> +	if (m1->multi_state == CAS_CONNECT_DONE && m2->multi_state == CAS_NOT_CONNECTED
> +		&& m1->locked_cert_hash_set && !m2->locked_cert_hash_set)

The code doesn't really do what the comment does. If you were trying to 
actually figure out if it is the same client you should rather compare 
the session ID of the tls state.

Also I think this code looks on the first glance to be broken in the way 
that an authorized client could just send a bunch of data packets with 
faked sender ip and then kill of other connections that are in the 
progress of connection.

Arne
Walter Doekes April 24, 2025, 9:08 a.m. UTC | #2
On 24-04-2025 10:26, Arne Schwabe wrote:
> I still do not really understanding what network setup you have to 
> trigger this kind of bizzare behaviour of having a VPN connection.
> It would really be helpful if you can describe that more. Since the 
> current explainations do not really explain for me why there is already 
> an existing half-established connection.
> 
> The scenario you are fixing seem like:
> 
>   * somehow a connection attempt is done from IP A
>   * a connection is fully established for IP B
>   * connection for IP B tries to move to IP A

If you take into account that 2.5 works just fine with this scenario, I 
don't see why you would call this bizarre.

But I'll break it down some more:

- I have internet, exit IP: 1.1.1.10.

- I have a VPN, CorporateVPN at 2.2.2.1. I use it as my default route. 
CorporateVPN has one exit IP: 2.2.2.10.

- There is a second VPN server, SecondVPN, at 3.3.3.1, which I'd like to 
connect to.


 From the perspective of SecondVPN, CorporateVPN is connecting to it:

- 2.2.2.10 -> 3.3.3.1 - openvpn UDP stuff, all works


But, in fact, this is tunneled traffic. There is also:

- 1.1.1.10 -> 2.2.2.1 - here traffic is doubly encrypted


So, I'd like the traffic to go directly instead, to lose the double 
encryption and the extra latency:

- 1.1.1.10 -> 3.3.3.1


To do that, I tell SecondVPN to push this route:

   push "route 3.3.3.1 255.255.255.255 net_gateway"


This causes my machine to create a direct route, bypassing the CorporateVPN.

- 1.1.1.10 -> 3.3.3.1 - now goes directly without the double tunnel


This seems to me like a legit use of net_gateway.

What happens is that SecondVPN sees traffic that previously came from 
2.2.2.10 now suddenly comes from 1.1.1.10. And this IP change is 
detected by openvpn as a "floating IP".

- In OpenVPN 2.5 this worked like a charm.

- In OpenVPN 2.6 it doesn't, as mentioned in the Github ticket.

I've reliably reproduced this several times, as there are a bunch of 
OpenVPN 2.5's that I've connected to that do not exhibit the problematic 
behaviour, while the OpenVPN 2.6's do (both 2.6.3 bookworm and 2.6.12 
noble).

Is this a sufficiently clear explanation?


> You are also saying that there is a difference in OpenVPN 2.5 vs OPneVPN 
> 2.6. If that is indeed the case, a git bisect would be helpful to point 
> to the commit that is breaking this.

This is a fair request. But bisecting will take me considerably more 
time, so this wasn't my first priority.

I did look into all the changes, and it indeed looked like the "floating 
IP detection" was untouched, while there are some connection setup 
speedup (shortcut?) changes. My suspicion is that the quicker connection 
setup has something to do with the changed behaviour.

In due time, I can get to bisecting. But if the information in this 
ticket and workaround is sufficient for someone else to say "A-HA", then 
that would be my preferred option.


>  > msg(M_INFO, "peer %" PRIu32 " (%s) floating from %s to %s (m2 still 
> setting up) state=%d/%d",
> 
> A user has no idea what m2 is. This message more confusing than helpful.

I'll look into rephrasing when making a new patch.


>> +    /* if the new connection is fresh and the old one is already 
>> connected, this
>> +     * might be a legitimate move to a new IP by the original client;
>> +     * for example when the server IP is pushed as net_gateway to 
>> escape from
>> +     * a double VPN. */
>> +    if (m1->multi_state == CAS_CONNECT_DONE && m2->multi_state == 
>> CAS_NOT_CONNECTED
>> +        && m1->locked_cert_hash_set && !m2->locked_cert_hash_set)
> 
> The code doesn't really do what the comment does. If you were trying to 
> actually figure out if it is the same client you should rather compare 
> the session ID of the tls state.
> 
> Also I think this code looks on the first glance to be broken in the way 
> that an authorized client could just send a bunch of data packets with 
> faked sender ip and then kill of other connections that are in the 
> progress of connection.

Given your experience with openvpn, I betting you're right on these points.

I don't know where the "tls state session ID" is and if it's there. I'm 
dealing with OpenVPN 2.5 clients (no older ones, I hope).

If you can point me in the right direction, I'd be grateful. This is my 
first venture into openvpn code.


Cheers,
Walter Doekes
OSSO B.V.

Patch

diff --git a/src/openvpn/multi.c b/src/openvpn/multi.c
index a2d3fd10..8a219ef2 100644
--- a/src/openvpn/multi.c
+++ b/src/openvpn/multi.c
@@ -3236,8 +3236,22 @@  multi_process_float(struct multi_context *m, struct multi_instance *mi,
         struct tls_multi *m1 = mi->context.c2.tls_multi;
         struct tls_multi *m2 = ex_mi->context.c2.tls_multi;
 
+	/* if the new connection is fresh and the old one is already connected, this
+	 * might be a legitimate move to a new IP by the original client;
+	 * for example when the server IP is pushed as net_gateway to escape from
+	 * a double VPN. */
+	if (m1->multi_state == CAS_CONNECT_DONE && m2->multi_state == CAS_NOT_CONNECTED
+		&& m1->locked_cert_hash_set && !m2->locked_cert_hash_set)
+        {
+            msg(M_INFO, "peer %" PRIu32 " (%s) floating from %s to %s (m2 still setting up) state=%d/%d",
+                m1->peer_id,
+                tls_common_name(m1, false),
+                mroute_addr_print(&mi->real, &gc),
+                print_link_socket_actual(&m->top.c2.from, &gc),
+                m1->multi_state, m2->multi_state);
+        }
         /* do not float if target address is taken by client with another cert */
-        if (!cert_hash_compare(m1->locked_cert_hash_set, m2->locked_cert_hash_set))
+        else if (!cert_hash_compare(m1->locked_cert_hash_set, m2->locked_cert_hash_set))
         {
             msg(D_MULTI_LOW, "Disallow float to an address taken by another client %s",
                 multi_instance_string(ex_mi, false, &gc));