Message ID | 20241116051705.53061-1-shujifurukawa1213@gmail.com |
---|---|
State | New |
Headers | show |
Series | [Openvpn-devel] Improve shuffling algorithm of connection list | expand |
Hi, On 16/11/2024 06:17, Hurukawa2121 wrote: > --- > Improve shuffling algorithm of connection list > > This patch implements the Fisher-Yates shuffle algorithm to ensure that all permutations of the connection target list are generated with equal probability, eliminating biases present in the previous shuffling method. In the Fisher-Yates algorithm, there's only one way to obtain each permutation through a series of element swaps, so all permutations occur with equal probability in theory. > > Signed-off-by: Hurukawa2121 <shujifurukawa1213@gmail.com> We'd need a real name here, if possible, as this is a true signature telling us that you accepted the project license. > > src/openvpn/init.c | 13 ++++++++++--- > 1 file changed, 10 insertions(+), 3 deletions(-) > > diff --git a/src/openvpn/init.c b/src/openvpn/init.c > index 9371024e..c4fb5cd7 100644 > --- a/src/openvpn/init.c > +++ b/src/openvpn/init.c > @@ -467,7 +467,14 @@ ce_management_query_remote(struct context *c) > #endif /* ENABLE_MANAGEMENT */ > > /* > - * Initialize and possibly randomize connection list. > + * Initialize and randomize the connection list. > + * > + * Applies the Fisher-Yates shuffle algorithm to ensure all permutations are equally probable, > + * thereby eliminating shuffling bias in the previous method. > + * > + * The algorithm randomly selects an element from the unshuffled portion and places it at position i. > + * There's only one way to obtain each permutation through these swaps. > + * This guarantees that each permutation occurs with equal probability in theory. > */ > static void > init_connection_list(struct context *c) > @@ -478,9 +485,9 @@ init_connection_list(struct context *c) > if (c->options.remote_random) > { > int i; > - for (i = 0; i < l->len; ++i) > + for (i = l->len - 1; i > 0; --i) Is the the change above truly needed to achieve what you described in the commit message? It's just changing the direction we iterate over the array, but it should not make a real difference, no? Regards, > { > - const int j = get_random() % l->len; > + const int j = get_random() % (i + 1); > if (i != j) > { > struct connection_entry *tmp;
Thanks for your reply! Please keep the mailing list in CC while replying. See below: On 18/11/2024 11:40, 古川修慈 wrote: > > We'd need a real name here, if possible, as this is a true signature > > telling us that you accepted the project license. > Thanks for your feedback. > My real name is Shuji Furukawa. > Do I have to resend the patch? Technically yes. > > > Is the the change above truly needed to achieve what you described in > the commit message? > > It's just changing the direction we iterate over the array, but it > should not make a real difference, no? > You can run this simple simulation <http://tpcg.io/5SPPET> to understand > how effective this patch is. Thanks for the link. As mentioned in my earlier comment below, I was trying to figure out if reverting the looping direction could make any difference. Not sure if you have seen that question? In any case, I just made an attempt by myself, and indeed keeping the original direction still yields the same effectiveness: http://tpcg.io/_L5Q0WX Regards,
don't forget to keep the mailing list in CC. Your reply went to me only :) If you click "Reply All" you will retain all recipients. On 18/11/2024 13:18, 古川修慈 wrote: > Thanks for your reply :) > > > > > Technically yes. > I understand. I'll resend later. > >> >> Thanks for the link. > > As mentioned in my earlier comment below, I was trying to figure out if > > reverting the looping direction could make any difference. > > > > Not sure if you have seen that question? > > I'm sorry for misunderstanding your question earlier. > > It’s not just about changing the direction of the iteration over the array. > I also updated the following lines of code: > > > > - const int j = get_random() % l->len; > > + const int j = get_random() % (i + 1); > This difference is crucial. Absolutely! My comment was only for the first line. > > > > > In any case, I just made an attempt by myself, and indeed keeping the > > original direction still yields the same effectiveness: > > > > http://tpcg.io/_L5Q0WX <http://tpcg.io/_L5Q0WX> > Yes, your observation seems correct. > This article <https://stackoverflow.com/questions/68064254/correctness- > of-fisher-yates-shuffle-executed-backward> summarizes the difference > between fisher_yates_shuffling and fisher_yates_shuffling2. > Thanks for the pointer. In any case you can also read here that what I proposed is just a variation of the original algorithm: https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle#The_%22inside-out%22_algorithm But the original version was proposed the way you wrote it. For our specific case they are equivalent. > In my opinion, fisher_yates_shuffling is easier to understand than > fisher_yates_shuffling2 because it simply ensures that we randomly > select an element only from the unshuffled portion. > On the other hand, iterating from the beginning makes it harder to > understand why it produces a uniform permutation. > That’s why I recommend iterating in the reverse direction. > I don't really have a strong opinion, but if the Fisher Yates algorithm is described this way, I think it makes sense to follow that. Feel free to resend your patch (possibly adding a "v2" in the subject so we understand that this is a new patch, i.e. "[PATCH v2]") Regards,
diff --git a/src/openvpn/init.c b/src/openvpn/init.c index 9371024e..c4fb5cd7 100644 --- a/src/openvpn/init.c +++ b/src/openvpn/init.c @@ -467,7 +467,14 @@ ce_management_query_remote(struct context *c) #endif /* ENABLE_MANAGEMENT */ /* - * Initialize and possibly randomize connection list. + * Initialize and randomize the connection list. + * + * Applies the Fisher-Yates shuffle algorithm to ensure all permutations are equally probable, + * thereby eliminating shuffling bias in the previous method. + * + * The algorithm randomly selects an element from the unshuffled portion and places it at position i. + * There's only one way to obtain each permutation through these swaps. + * This guarantees that each permutation occurs with equal probability in theory. */ static void init_connection_list(struct context *c) @@ -478,9 +485,9 @@ init_connection_list(struct context *c) if (c->options.remote_random) { int i; - for (i = 0; i < l->len; ++i) + for (i = l->len - 1; i > 0; --i) { - const int j = get_random() % l->len; + const int j = get_random() % (i + 1); if (i != j) { struct connection_entry *tmp;
--- Improve shuffling algorithm of connection list This patch implements the Fisher-Yates shuffle algorithm to ensure that all permutations of the connection target list are generated with equal probability, eliminating biases present in the previous shuffling method. In the Fisher-Yates algorithm, there's only one way to obtain each permutation through a series of element swaps, so all permutations occur with equal probability in theory. Signed-off-by: Hurukawa2121 <shujifurukawa1213@gmail.com> src/openvpn/init.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-)