[Openvpn-devel,RFC] Backport request: 7791f535 (#919 fix) to release/2.6 — DCO peer-table desync
Commit Message
Hello,
Full disclosure: I used Claude for a lot of this work including the text
below. That said, I was very thorough in all the testing I've done here and
been careful to verify the bugs and was careful to read through the details
below. But I don't want to pretend like there weren't robots involved...
7791f535 ("dco: process messages immediately after read", fixes #919) is in
master and release/2.7, but not in release/2.6. 2.6.x DCO server deployments
are still exposed to the underlying bug, and it bites hard at scale. I'd
like
to ask whether you'd consider a release/2.6 backport -- and, since it
can't be
a straight cherry-pick, what form you'd prefer. I have a validated
adaptation
I can submit via Gerrit (attached here for reference).
(Context: Debian stable / trixie ships 2.6.14 + the out-of-tree ovpn-dco
module, so this affects the current Debian stable DCO server users.)
IMPACT ON 2.6.x
---------------
On a busy DCO server, when many peers are deleted in a short window --
e.g. a
transient that makes a large number of peers time out near-simultaneously --
libnl delivers multiple netlink messages per nl_recvmsgs(), but the 2.6 DCO
read path stores each notification into single-slot dco_context_t fields and
only processes the last, silently dropping the rest (exactly as #919
describes). Userspace then never reaps those client instances ->
n_clients is
never decremented -> the instances leak. Eventually the server hits
--max-clients and refuses new/reconnecting clients until openvpn is
restarted.
Reproduced on stock 2.6.14:
- 1024 DCO clients connected; server under CPU load (I used
"stress-ng --cpu 64" to simulate a reconnection storm); then block the
server's inbound VPN packets at the firewall to simulate a network
outage, which makes all peers hit their --keepalive timeout at close to
the same time.
- The kernel expires and deletes all the peers, but openvpn drops some of
the DEL_PEER notifications and leaks the corresponding client instances
(never reaped, so n_clients stays high). The number leaked is
nondeterministic -- it depends on how many notifications get coalesced
into a single nl_recvmsgs() read, so it varies run-to-run and with load
(it is not a fixed count; as one example, a run left 767 of 1024
instances stale).
- Those leaked instances still occupy --max-clients slots, so once enough
accumulate, reconnecting clients are rejected with "MULTI: new incoming
connection would exceed maximum number of clients", and the only
recovery
is restarting openvpn. (Both the management interface and the
status file
report the leaked count -- they read the same multi_context client
list.)
A CLEAN CHERRY-PICK IS NOT POSSIBLE
-----------------------------------
I verified 7791f535 and its prerequisites do not apply to release/2.6:
- a699681b ("dco: only pass struct context to init function" --
introduces
dco->c) and 7f5a6dea ("multi: store multi_context address inside top
instance" -- introduces c->multi): 7791f535 relies on both
(dco->c->mode
/ dco->c->multi), and neither is in release/2.6. a699681b alone
conflicts
in 5 files against v2.6.14 (the ovpn_dco_init() signature differs --
release/2.6 predates the dev_node parameter).
- 7791f535 itself: conflicts in 7 files -- release/2.6 has no multi_io.c,
uses the pre-rename netlink API (OVPN_CMD_DEL_PEER vs
PEER_DEL_NTF), and a
different multi_process_incoming_dco() signature.
So the intervening 2.7 DCO / new-module work sits between 2.6 and these
commits; any 2.6 backport requires a genuine adaptation.
THE ADAPTATION I HAVE (VALIDATED)
---------------------------------
The attached patch corresponds most closely to 7791f535 -- that is the
commit
to diff it against. The two prerequisites (a699681b, 7f5a6dea) are not
carried
as such; their effect (dco->c, c->multi) is inlined as the c/multi
back-pointers described below. The patch mirrors 7791f535's logic
against the
old ovpn-dco API:
- Move per-message processing into the libnl parse callback
(ovpn_handle_msg), so each notification is acted on before the next
overwrites the single-slot fields -- the core of 7791f535.
- Port the __is_locked guard around nl_recvmsgs() (checked in the
GET_PEER
stats senders) to avoid the re-entrant stats request during parsing.
- Because 2.6 lacks dco->c (a699681b) and c->multi (7f5a6dea), add
c/multi
back-pointers to dco_context_t, set at the three event-loop call sites
(mtcp.c, mudp.c, forward.c), and dispatch server-vs-client on
dco->multi.
Validated on a load rig (2.6.14 + the buggy out-of-tree module rebuilt
against
6.12): the desync is gone -- repeated 1024-peer deletion storms under
stress-ng drain to 0 with no stale instances, reconnect to 1024 with zero
max-clients rejections, no regressions, survives reboot.
A note on the attached patch's format: it is a Debian quilt patch (with a
DEP-3 header), because I developed and validated it directly against the
current Debian openvpn 2.6.14 source package -- i.e. it applies on top
of that
package's existing patch series, which is what 2.6.x users actually run.
That
is purely how it was produced/tested; I'm happy to reshape it as a plain
git format-patch against release/2.6 for Gerrit.
QUESTIONS
---------
1. Would you accept a release/2.6 backport of #919's fix?
2. If so, which approach do you prefer:
(a) also backport the prerequisites (a699681b for dco->c and
7f5a6dea for
c->multi) so the result is closer to upstream, or
(b) a minimal self-contained adaptation (like mine) that adds just the
c/multi back-pointers locally?
3. I'm happy to submit it through Gerrit in whatever shape you'd want and to
iterate on review -- I'm aware it's a hand-port and would value
DCO-maintainer eyes on the back-pointer lifetime / dispatch / lock
placement.
The patch is attached; full validation details (logs etc.) available on
request.
REFERENCES (commit diffs, for comparison against the attached patch)
--------------------------------------------------------------------
#919 (the bug):
https://github.com/OpenVPN/openvpn/issues/919
7791f535 -- dco: process messages immediately after read (THE commit this
patch corresponds to; diff the attached patch against this one):
https://github.com/OpenVPN/openvpn/commit/7791f5358a5574d4ef1bd27e2d52300c9d98bd72
(Gerrit: https://gerrit.openvpn.net/c/openvpn/+/1403 )
a699681b -- dco: only pass struct context to init function (prerequisite;
introduces dco->c, inlined as a back-pointer rather than carried):
https://github.com/OpenVPN/openvpn/commit/a699681bb86c6e9a2c9f205543f60400208aea4b
7f5a6dea -- multi: store multi_context address inside top instance
(prerequisite; introduces c->multi, inlined as a back-pointer):
https://github.com/OpenVPN/openvpn/commit/7f5a6deae33a338a23d7e8ff8526db8fdddf4bc2
Thanks,
Thomas Nyberg
Description: dco: process messages immediately after read
Backport of upstream 7791f5358a5574d4ef1bd27e2d52300c9d98bd72. libnl delivers
multiple netlink messages per nl_recvmsgs() call, but the DCO read path stored
each into single-slot dco_context_t fields and processed only the last,
silently dropping the rest. Under a peer-deletion storm this desyncs openvpn
client table from the kernel (ghost instances -> max-clients lockout). Process
each message inside the parse callback, with a lock to avoid re-entrant
GET_PEER stats requests during parsing. Adapted to 2.6.x (adds c/multi
backpointers to dco_context_t, which 2.7 has natively).
Origin: backport, https://github.com/OpenVPN/openvpn/commit/7791f5358a5574d4ef1bd27e2d52300c9d98bd72
Bug: https://github.com/OpenVPN/openvpn/issues/919
Forwarded: not-needed
---
===================================================================
@@ -134,7 +134,7 @@ void close_tun_dco(struct tuntap *tt, op
* @param dco the DCO context
* @return 0 on success or a negative error code otherwise
*/
-int dco_do_read(dco_context_t *dco);
+int dco_read_and_process(dco_context_t *dco);
/**
* Install a DCO in the main event loop
@@ -301,7 +301,7 @@ close_tun_dco(struct tuntap *tt, openvpn
}
static inline int
-dco_do_read(dco_context_t *dco)
+dco_read_and_process(dco_context_t *dco)
{
ASSERT(false);
return 0;
===================================================================
@@ -40,6 +40,7 @@
#include "ssl.h"
#include "fdmisc.h"
#include "multi.h"
+#include "forward.h"
#include "ssl_verify.h"
#include "ovpn_dco_linux.h"
@@ -51,6 +52,15 @@
#include <netlink/genl/ctrl.h>
+/* When parsing multiple DEL_PEER notifications, openvpn tries to request stats
+ * for each DEL_PEER message. This triggers a GET_PEER request-reply while we
+ * are still parsing the rest of the initial notifications, which can lead to
+ * NLE_BUSY or even NLE_NOMEM. This basic lock ensures we don't bite our own
+ * tail by issuing a dco_get_peer_stats while still busy receiving and parsing
+ * other messages. (backport of upstream 7791f535)
+ */
+static bool __is_locked = false;
+
/* libnl < 3.5.0 does not set the NLA_F_NESTED on its own, therefore we
* have to explicitly do it to prevent the kernel from failing upon
* parsing of the message
@@ -131,7 +141,9 @@ nla_put_failure:
static int
ovpn_nl_recvmsgs(dco_context_t *dco, const char *prefix)
{
+ __is_locked = true;
int ret = nl_recvmsgs(dco->nl_sock, dco->nl_cb);
+ __is_locked = false;
switch (ret)
{
@@ -813,11 +825,22 @@ ovpn_handle_msg(struct nl_msg *msg, void
return NL_SKIP;
}
+ /* process the message immediately, before the next one parsed in the same
+ * nl_recvmsgs() batch overwrites the single-slot dco fields */
+ if (dco->multi)
+ {
+ multi_process_incoming_dco(dco);
+ }
+ else
+ {
+ process_incoming_dco(dco);
+ }
+
return NL_OK;
}
int
-dco_do_read(dco_context_t *dco)
+dco_read_and_process(dco_context_t *dco)
{
msg(D_DCO_DEBUG, __func__);
nl_cb_set(dco->nl_cb, NL_CB_VALID, NL_CB_CUSTOM, ovpn_handle_msg, dco);
@@ -925,6 +948,12 @@ dco_get_peer_stats_multi(dco_context_t *
{
msg(D_DCO_DEBUG, "%s", __func__);
+ if (__is_locked)
+ {
+ msg(D_DCO_DEBUG, "%s: cannot request peer stats while parsing other messages", __func__);
+ return 0;
+ }
+
struct nl_msg *nl_msg = ovpn_dco_nlmsg_create(dco, OVPN_CMD_GET_PEER);
nlmsg_hdr(nl_msg)->nlmsg_flags |= NLM_F_DUMP;
@@ -979,6 +1008,12 @@ dco_parse_peer(struct nl_msg *msg, void
int
dco_get_peer_stats(struct context *c)
{
+ if (__is_locked)
+ {
+ msg(D_DCO_DEBUG, "%s: cannot request peer stats while parsing other messages", __func__);
+ return 0;
+ }
+
uint32_t peer_id = c->c2.tls_multi->dco_peer_id;
msg(D_DCO_DEBUG, "%s: peer-id %d", __func__, peer_id);
===================================================================
@@ -36,6 +36,8 @@
typedef enum ovpn_key_slot dco_key_slot_t;
typedef enum ovpn_cipher_alg dco_cipher_t;
+struct context;
+struct multi_context;
typedef struct
{
@@ -50,6 +52,11 @@ typedef struct
unsigned int ifindex;
+ /* backpointers set before each read so the parse callback can process
+ * a message immediately (backport of upstream 7791f535) */
+ struct context *c;
+ struct multi_context *multi;
+
int dco_message_type;
int dco_message_peer_id;
int dco_del_peer_reason;
===================================================================
@@ -1234,13 +1234,11 @@ process_incoming_link(struct context *c)
perf_pop();
}
-static void
-process_incoming_dco(struct context *c)
+void
+process_incoming_dco(dco_context_t *dco)
{
#if defined(ENABLE_DCO) && (defined(TARGET_LINUX) || defined(TARGET_FREEBSD))
- dco_context_t *dco = &c->c1.tuntap->dco;
-
- dco_do_read(dco);
+ struct context *c = dco->c;
/* FreeBSD currently sends us removal notifcation with the old peer-id in
* p2p mode with the ping timeout reason, so ignore that one to not shoot
@@ -2309,7 +2307,9 @@ process_io(struct context *c)
{
if (!IS_SIG(c))
{
- process_incoming_dco(c);
+ c->c1.tuntap->dco.c = c;
+ c->c1.tuntap->dco.multi = NULL;
+ dco_read_and_process(&c->c1.tuntap->dco);
}
}
}
===================================================================
@@ -74,6 +74,13 @@ void pre_select(struct context *c);
void process_io(struct context *c);
+/**
+ * Process an incoming DCO message (from kernel space).
+ *
+ * @param dco - Pointer to the structure representing the DCO context.
+ */
+void process_incoming_dco(dco_context_t *dco);
+
/**********************************************************************/
/**
* Process a data channel packet that will be sent through a VPN tunnel.
===================================================================
@@ -747,7 +747,9 @@ multi_tcp_process_io(struct multi_contex
/* incoming data on DCO? */
else if (e->arg == MTCP_DCO)
{
- multi_process_incoming_dco(m);
+ m->top.c1.tuntap->dco.multi = m;
+ m->top.c1.tuntap->dco.c = &m->top;
+ dco_read_and_process(&m->top.c1.tuntap->dco);
}
#endif
/* signal received? */
===================================================================
@@ -409,11 +409,9 @@ multi_process_io_udp(struct multi_contex
{
if (!IS_SIG(&m->top))
{
- bool ret = true;
- while (ret)
- {
- ret = multi_process_incoming_dco(m);
- }
+ m->top.c1.tuntap->dco.multi = m;
+ m->top.c1.tuntap->dco.c = &m->top;
+ dco_read_and_process(&m->top.c1.tuntap->dco);
}
}
#endif
===================================================================
@@ -3275,14 +3275,12 @@ process_incoming_del_peer(struct multi_c
multi_signal_instance(m, mi, SIGTERM);
}
-bool
-multi_process_incoming_dco(struct multi_context *m)
+void
+multi_process_incoming_dco(dco_context_t *dco)
{
- dco_context_t *dco = &m->top.c1.tuntap->dco;
+ ASSERT(dco->multi);
- struct multi_instance *mi = NULL;
-
- int ret = dco_do_read(&m->top.c1.tuntap->dco);
+ struct multi_context *m = dco->multi;
int peer_id = dco->dco_message_peer_id;
@@ -3291,12 +3289,12 @@ multi_process_incoming_dco(struct multi_
*/
if (peer_id < 0)
{
- return ret > 0;
+ return;
}
if ((peer_id < m->max_clients) && (m->instances[peer_id]))
{
- mi = m->instances[peer_id];
+ struct multi_instance *mi = m->instances[peer_id];
if (dco->dco_message_type == OVPN_CMD_DEL_PEER)
{
process_incoming_del_peer(m, mi, dco);
@@ -3325,13 +3323,6 @@ multi_process_incoming_dco(struct multi_
"type %d, del_peer_reason %d", peer_id, dco->dco_message_type,
dco->dco_del_peer_reason);
}
-
- dco->dco_message_type = 0;
- dco->dco_message_peer_id = -1;
- dco->dco_del_peer_reason = -1;
- dco->dco_read_bytes = 0;
- dco->dco_write_bytes = 0;
- return ret > 0;
}
#endif /* if defined(ENABLE_DCO) && defined(TARGET_LINUX) */
===================================================================
@@ -323,7 +323,7 @@ bool multi_process_post(struct multi_con
* - True, if the message was received correctly.
* - False, if there was an error while reading the message.
*/
-bool multi_process_incoming_dco(struct multi_context *m);
+void multi_process_incoming_dco(dco_context_t *dco);
/**************************************************************************/
/**