[Openvpn-devel,v3] Fix best gateway selection over netlink

Message ID 20200908123625.23179-1-themiron@yandex-team.ru
State Accepted
Headers show
Series
  • [Openvpn-devel,v3] Fix best gateway selection over netlink
Related show

Commit Message

Vladislav Grishenko Sept. 8, 2020, 12:36 p.m.
Netlink route request with NLM_F_DUMP flag set means to return
all entries matching criteria passed in message content -
matching supplied family & dst address in our case.
So, gateway from the first ipv4 route was always used.

On kernels earlier than 2.6.38 default routes are the last ones,
so arbitrary host/net route w/o gateway is likely be returned as
first, causing gateway to be invalid or empty.
After refactoring in 2.6.38 kernel default routes are on top, so
the problem with older kernels was hidden.

Fix this behavior by selecting first 0.0.0.0/0 if dst was not set
or empty. For IPv6, no behavior is changed - request ::/128 route,
so just clarify the sizes via netlink route api.

Tested on 5.4.0, 4.1.51, 2.6.36 and 2.6.22 kernels.

Signed-off-by: Vladislav Grishenko <themiron@yandex-team.ru>
---
 doc/man-sections/advanced-options.rst |  7 +++--
 src/openvpn/networking_sitnl.c        | 44 ++++++++++++++++++++++++---
 2 files changed, 45 insertions(+), 6 deletions(-)

Comments

David Sommerseth Sept. 8, 2020, 1:08 p.m. | #1
On 08/09/2020 14:36, Vladislav Grishenko wrote:
> On kernels earlier than 2.6.38 default routes are the last ones,
> so arbitrary host/net route w/o gateway is likely be returned as
> first, causing gateway to be invalid or empty.
> After refactoring in 2.6.38 kernel default routes are on top, so
> the problem with older kernels was hidden.

I haven't paid too much attention here, but I don't think I've seen this point
being brought up.  But do we really care about such old kernels at all?

AFAIK, RHEL-6 (which goes EOL in November this year and which is not planned
to be supported in OpenVPN 2.5+) is the only distro carrying such an old
kernel release (2.6.32 baseline).  Even an internal OpenWRT 19.07 box of mine
(which should be upgraded, I know!) ships with 4.14.  Unless I'm completely
clueless (which is a possibility), 2.4 and 2.6 kernels are mostly interesting
for boards with 4MB flash memory.  And I would suspect such boards with that
little flash memory to belong to that past.  (And OpenVPN 2.4 is perfectly
fine too for some time forward anyhow, which should work just fine).

If this fix has other qualities, then it's fine to consider such a patch.  But
I don't see the need for this if it is primarily to enable support for ancient
kernel releases which are no longer supported by the upstream kernel community
(where 4.4 is the oldest one).

I would lean on what Antonio says here as well, as he kinda owns the sitnl
implementation and API.
Antonio Quartulli Sept. 8, 2020, 1:24 p.m. | #2
Hi,

On 08/09/2020 15:08, David Sommerseth wrote:
> On 08/09/2020 14:36, Vladislav Grishenko wrote:
>> On kernels earlier than 2.6.38 default routes are the last ones,
>> so arbitrary host/net route w/o gateway is likely be returned as
>> first, causing gateway to be invalid or empty.
>> After refactoring in 2.6.38 kernel default routes are on top, so
>> the problem with older kernels was hidden.
> 
> I haven't paid too much attention here, but I don't think I've seen this point
> being brought up.  But do we really care about such old kernels at all?
> 
> AFAIK, RHEL-6 (which goes EOL in November this year and which is not planned
> to be supported in OpenVPN 2.5+) is the only distro carrying such an old
> kernel release (2.6.32 baseline).  Even an internal OpenWRT 19.07 box of mine
> (which should be upgraded, I know!) ships with 4.14.  Unless I'm completely
> clueless (which is a possibility), 2.4 and 2.6 kernels are mostly interesting
> for boards with 4MB flash memory.  And I would suspect such boards with that
> little flash memory to belong to that past.  (And OpenVPN 2.4 is perfectly
> fine too for some time forward anyhow, which should work just fine).

Well, this is an actual flaw in the sitnl code.
It now works, but it may break at some point (it is apparently relying
on a self-made assumption which may not hold true in the future).

For this reason sitnl *needs* fixing.

On the other hand, helping out people that are stuck on old platforms to
upgrade to modern (aka more secure) OpenVPN versions is also not a bad idea.

Since the fix is relatively small, I'd consider it for inclusion.

I'll try to review it ASAP so that it can be considered for going into
the next beta (2.5 compiles with sitnl by default - this is how this
issue was exposed)


Regards,
Antonio Quartulli Sept. 10, 2020, 9:02 a.m. | #3
Hi,

On 08/09/2020 14:36, Vladislav Grishenko wrote:
> Netlink route request with NLM_F_DUMP flag set means to return
> all entries matching criteria passed in message content -
> matching supplied family & dst address in our case.
> So, gateway from the first ipv4 route was always used.
> 
> On kernels earlier than 2.6.38 default routes are the last ones,
> so arbitrary host/net route w/o gateway is likely be returned as
> first, causing gateway to be invalid or empty.
> After refactoring in 2.6.38 kernel default routes are on top, so
> the problem with older kernels was hidden.
> 
> Fix this behavior by selecting first 0.0.0.0/0 if dst was not set
> or empty. For IPv6, no behavior is changed - request ::/128 route,
> so just clarify the sizes via netlink route api.
> 
> Tested on 5.4.0, 4.1.51, 2.6.36 and 2.6.22 kernels.
> 
> Signed-off-by: Vladislav Grishenko <themiron@yandex-team.ru>

Thanks for taking care of this issue and for digging into the sitnl code.

The change is really contained and and easy to review.
Tested a bit and it works as expected.

Acked-by: Antonio Quartulli <a@unstable.cc>
Gert Doering Sept. 10, 2020, 9:23 a.m. | #4
Thanks a lot.

Your patch has been applied to the master and release/2.5 branch.

(I have lightly tested this on Linux 4.19.9 and stared a bit
at the code.  Antonio is the master of this code and if he says
it's good, it is :-) )

commit 505d5ad8fadcdc56bae07f4b95c05acd93a47c24 (master)
commit 7a9fefb6e2ea28967a09956eeb041b687ca0e335 (release/2.5)
Author: Vladislav Grishenko
Date:   Tue Sep 8 17:36:25 2020 +0500

     Fix best gateway selection over netlink

     Signed-off-by: Vladislav Grishenko <themiron@yandex-team.ru>
     Acked-by: Antonio Quartulli <a@unstable.cc>
     Message-Id: <20200908123625.23179-1-themiron@yandex-team.ru>
     URL: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg20900.html
     Signed-off-by: Gert Doering <gert@greenie.muc.de>


--
kind regards,

Gert Doering
Vladislav Grishenko Sept. 11, 2020, 9:02 a.m. | #5
Hi, Antonio
Thank you for review

--
Best Regards, Vladislav Grishenko

> -----Original Message-----
> From: Antonio Quartulli <a@unstable.cc>
> Sent: Thursday, September 10, 2020 2:02 PM
> To: Vladislav Grishenko <themiron@yandex-team.ru>; openvpn-
> devel@lists.sourceforge.net
> Subject: Re: [Openvpn-devel] [PATCH v3] Fix best gateway selection over netlink
> 
> Hi,
> 
> On 08/09/2020 14:36, Vladislav Grishenko wrote:
> > Netlink route request with NLM_F_DUMP flag set means to return all
> > entries matching criteria passed in message content - matching
> > supplied family & dst address in our case.
> > So, gateway from the first ipv4 route was always used.
> >
> > On kernels earlier than 2.6.38 default routes are the last ones, so
> > arbitrary host/net route w/o gateway is likely be returned as first,
> > causing gateway to be invalid or empty.
> > After refactoring in 2.6.38 kernel default routes are on top, so the
> > problem with older kernels was hidden.
> >
> > Fix this behavior by selecting first 0.0.0.0/0 if dst was not set or
> > empty. For IPv6, no behavior is changed - request ::/128 route, so
> > just clarify the sizes via netlink route api.
> >
> > Tested on 5.4.0, 4.1.51, 2.6.36 and 2.6.22 kernels.
> >
> > Signed-off-by: Vladislav Grishenko <themiron@yandex-team.ru>
> 
> Thanks for taking care of this issue and for digging into the sitnl code.
> 
> The change is really contained and and easy to review.
> Tested a bit and it works as expected.
> 
> Acked-by: Antonio Quartulli <a@unstable.cc>
> 
> --
> Antonio Quartulli
>
Vladislav Grishenko Sept. 11, 2020, 9:03 a.m. | #6
Hi Gert,

Great, many thanks

--
Best Regards, Vladislav Grishenko

> -----Original Message-----
> From: Gert Doering <gert@greenie.muc.de>
> Sent: Thursday, September 10, 2020 2:23 PM
> To: Vladislav Grishenko <themiron@yandex-team.ru>
> Cc: openvpn-devel@lists.sourceforge.net
> Subject: [PATCH applied] Re: Fix best gateway selection over netlink
> 
> Thanks a lot.
> 
> Your patch has been applied to the master and release/2.5 branch.
> 
> (I have lightly tested this on Linux 4.19.9 and stared a bit at the code.
Antonio is
> the master of this code and if he says it's good, it is :-) )
> 
> commit 505d5ad8fadcdc56bae07f4b95c05acd93a47c24 (master) commit
> 7a9fefb6e2ea28967a09956eeb041b687ca0e335 (release/2.5)
> Author: Vladislav Grishenko
> Date:   Tue Sep 8 17:36:25 2020 +0500
> 
>      Fix best gateway selection over netlink
> 
>      Signed-off-by: Vladislav Grishenko <themiron@yandex-team.ru>
>      Acked-by: Antonio Quartulli <a@unstable.cc>
>      Message-Id: <20200908123625.23179-1-themiron@yandex-team.ru>
>      URL: https://www.mail-archive.com/openvpn-
> devel@lists.sourceforge.net/msg20900.html
>      Signed-off-by: Gert Doering <gert@greenie.muc.de>
> 
> 
> --
> kind regards,
> 
> Gert Doering

Patch

diff --git a/doc/man-sections/advanced-options.rst b/doc/man-sections/advanced-options.rst
index 9b96e406..bedc8841 100644
--- a/doc/man-sections/advanced-options.rst
+++ b/doc/man-sections/advanced-options.rst
@@ -11,8 +11,11 @@  Standalone Debug Options
      --show-gateway
      --show-gateway IPv6-target
 
-  If an IPv6 target address is passed as argument, the IPv6 route for this
-  host is reported.
+  For IPv6 this queries the route towards ::/128, or the specified IPv6
+  target address if passed as argument.
+  For IPv4 on Linux, Windows, MacOS and BSD it looks for a 0.0.0.0/0 route.
+  If there are more specific routes, the result will not always be matching
+  the route of the IPv4 packets to the VPN gateway.
 
 
 Advanced Expert Options
diff --git a/src/openvpn/networking_sitnl.c b/src/openvpn/networking_sitnl.c
index 713a213a..2bc70a50 100644
--- a/src/openvpn/networking_sitnl.c
+++ b/src/openvpn/networking_sitnl.c
@@ -345,6 +345,13 @@  sitnl_send(struct nlmsghdr *payload, pid_t peer, unsigned int groups,
  *               continue;
  *           }
  */
+
+            if (h->nlmsg_type == NLMSG_DONE)
+            {
+                ret = 0;
+                goto out;
+            }
+
             if (h->nlmsg_type == NLMSG_ERROR)
             {
                 err = (struct nlmsgerr *)NLMSG_DATA(h);
@@ -360,7 +367,11 @@  sitnl_send(struct nlmsghdr *payload, pid_t peer, unsigned int groups,
                         ret = 0;
                         if (cb)
                         {
-                            ret = cb(h, arg_cb);
+                            int r = cb(h, arg_cb);
+                            if (r <= 0)
+                            {
+                                ret = r;
+                            }
                         }
                     }
                     else
@@ -375,8 +386,12 @@  sitnl_send(struct nlmsghdr *payload, pid_t peer, unsigned int groups,
 
             if (cb)
             {
-                ret = cb(h, arg_cb);
-                goto out;
+                int r = cb(h, arg_cb);
+                if (r <= 0)
+                {
+                    ret = r;
+                    goto out;
+                }
             }
             else
             {
@@ -410,6 +425,7 @@  typedef struct {
     int addr_size;
     inet_address_t gw;
     char iface[IFNAMSIZ];
+    bool default_only;
 } route_res_t;
 
 static int
@@ -421,6 +437,12 @@  sitnl_route_save(struct nlmsghdr *n, void *arg)
     int len = n->nlmsg_len - NLMSG_LENGTH(sizeof(*r));
     unsigned int ifindex = 0;
 
+    /* filter-out non-zero dst prefixes */
+    if (res->default_only && r->rtm_dst_len != 0)
+    {
+        return 1;
+    }
+
     while (RTA_OK(rta, len))
     {
         switch (rta->rta_type)
@@ -477,11 +499,25 @@  sitnl_route_best_gw(sa_family_t af_family, const inet_address_t *dst,
     {
         case AF_INET:
             res.addr_size = sizeof(in_addr_t);
-            req.n.nlmsg_flags |= NLM_F_DUMP;
+            /*
+             * kernel can't return 0.0.0.0/8 host route, dump all
+             * the routes and filter for 0.0.0.0/0 in cb()
+             */
+            if (!dst || !dst->ipv4)
+            {
+                req.n.nlmsg_flags |= NLM_F_DUMP;
+                res.default_only = true;
+            }
+            else
+            {
+                req.r.rtm_dst_len = 32;
+            }
             break;
 
         case AF_INET6:
             res.addr_size = sizeof(struct in6_addr);
+            /* kernel can return ::/128 host route */
+            req.r.rtm_dst_len = 128;
             break;
 
         default: