[Openvpn-devel] Improve shuffling algorithm of connection list

Message ID 20241116051705.53061-1-shujifurukawa1213@gmail.com
State New
Headers show
Series [Openvpn-devel] Improve shuffling algorithm of connection list | expand

Commit Message

Hurukawa2121 Nov. 16, 2024, 5:17 a.m. UTC
---
 Improve shuffling algorithm of connection list

 This patch implements the Fisher-Yates shuffle algorithm to ensure that all permutations of the connection target list are generated with equal probability, eliminating biases present in the previous shuffling method. In the Fisher-Yates algorithm, there's only one way to obtain each permutation through a series of element swaps, so all permutations occur with equal probability in theory. 
 
 Signed-off-by: Hurukawa2121 <shujifurukawa1213@gmail.com>

 src/openvpn/init.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

Comments

Antonio Quartulli Nov. 18, 2024, 9:33 a.m. UTC | #1
Hi,

On 16/11/2024 06:17, Hurukawa2121 wrote:
> ---
>   Improve shuffling algorithm of connection list
> 
>   This patch implements the Fisher-Yates shuffle algorithm to ensure that all permutations of the connection target list are generated with equal probability, eliminating biases present in the previous shuffling method. In the Fisher-Yates algorithm, there's only one way to obtain each permutation through a series of element swaps, so all permutations occur with equal probability in theory.
>   
>   Signed-off-by: Hurukawa2121 <shujifurukawa1213@gmail.com>

We'd need a real name here, if possible, as this is a true signature 
telling us that you accepted the project license.

> 
>   src/openvpn/init.c | 13 ++++++++++---
>   1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/src/openvpn/init.c b/src/openvpn/init.c
> index 9371024e..c4fb5cd7 100644
> --- a/src/openvpn/init.c
> +++ b/src/openvpn/init.c
> @@ -467,7 +467,14 @@ ce_management_query_remote(struct context *c)
>   #endif /* ENABLE_MANAGEMENT */
>   
>   /*
> - * Initialize and possibly randomize connection list.
> + * Initialize and randomize the connection list.
> + *
> + * Applies the Fisher-Yates shuffle algorithm to ensure all permutations are equally probable,
> + * thereby eliminating shuffling bias in the previous method.
> + *
> + * The algorithm randomly selects an element from the unshuffled portion and places it at position i.
> + * There's only one way to obtain each permutation through these swaps.
> + * This guarantees that each permutation occurs with equal probability in theory.
>    */
>   static void
>   init_connection_list(struct context *c)
> @@ -478,9 +485,9 @@ init_connection_list(struct context *c)
>       if (c->options.remote_random)
>       {
>           int i;
> -        for (i = 0; i < l->len; ++i)
> +        for (i = l->len - 1; i > 0; --i)

Is the the change above truly needed to achieve what you described in 
the commit message?
It's just changing the direction we iterate over the array, but it 
should not make a real difference, no?

Regards,

>           {
> -            const int j = get_random() % l->len;
> +            const int j = get_random() % (i + 1);
>               if (i != j)
>               {
>                   struct connection_entry *tmp;
Antonio Quartulli Nov. 18, 2024, 10:51 a.m. UTC | #2
Thanks for your reply!
Please keep the mailing list in CC while replying.

See below:

On 18/11/2024 11:40, 古川修慈 wrote:
>  > We'd need a real name here, if possible, as this is a true signature
>  > telling us that you accepted the project license.
> Thanks for your feedback.
> My real name is Shuji Furukawa.
> Do I have to resend the patch?

Technically yes.

> 
>  > Is the the change above truly needed to achieve what you described in 
> the commit message?
>  > It's just changing the direction we iterate over the array, but it 
> should not make a real difference, no?
> You can run this simple simulation <http://tpcg.io/5SPPET> to understand 
> how effective this patch is.

Thanks for the link.
As mentioned in my earlier comment below, I was trying to figure out if 
reverting the looping direction could make any difference.

Not sure if you have seen that question?

In any case, I just made an attempt by myself, and indeed keeping the 
original direction still yields the same effectiveness:

http://tpcg.io/_L5Q0WX

Regards,
Antonio Quartulli Nov. 18, 2024, 1:17 p.m. UTC | #3
don't forget to keep the mailing list in CC.
Your reply went to me only :)

If you click "Reply All" you will retain all recipients.

On 18/11/2024 13:18, 古川修慈 wrote:
> Thanks for your reply :)
> 
>  >
>  > Technically yes.
> I understand. I'll resend later.
> 
>> 
>> Thanks for the link.
>  > As mentioned in my earlier comment below, I was trying to figure out if
>  > reverting the looping direction could make any difference.
>  >
>  > Not sure if you have seen that question?
> 
> I'm sorry for misunderstanding your question earlier.
> 
> It’s not just about changing the direction of the iteration over the array.
> I also updated the following lines of code:
>  >
>  > -            const int j = get_random() % l->len;
>  > +            const int j = get_random() % (i + 1);
> This difference is crucial.

Absolutely!
My comment was only for the first line.

> 
>  >
>  > In any case, I just made an attempt by myself, and indeed keeping the
>  > original direction still yields the same effectiveness:
>  >
>  > http://tpcg.io/_L5Q0WX <http://tpcg.io/_L5Q0WX>
> Yes, your observation seems correct.
> This article <https://stackoverflow.com/questions/68064254/correctness- 
> of-fisher-yates-shuffle-executed-backward> summarizes the difference 
> between fisher_yates_shuffling and fisher_yates_shuffling2.
> 

Thanks for the pointer.
In any case you can also read here that what I proposed is just a 
variation of the original algorithm:

https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle#The_%22inside-out%22_algorithm

But the original version was proposed the way you wrote it.
For our specific case they are equivalent.

> In my opinion, fisher_yates_shuffling is easier to understand than 
> fisher_yates_shuffling2 because it simply ensures that we randomly 
> select an element only from the unshuffled portion.
> On the other hand, iterating from the beginning makes it harder to 
> understand why it produces a uniform permutation.
> That’s why I recommend iterating in the reverse direction.
> 

I don't really have a strong opinion, but if the Fisher Yates algorithm 
is described this way, I think it makes sense to follow that.

Feel free to resend your patch (possibly adding a "v2" in the subject so 
we understand that this is a new patch, i.e. "[PATCH v2]")

Regards,

Patch

diff --git a/src/openvpn/init.c b/src/openvpn/init.c
index 9371024e..c4fb5cd7 100644
--- a/src/openvpn/init.c
+++ b/src/openvpn/init.c
@@ -467,7 +467,14 @@  ce_management_query_remote(struct context *c)
 #endif /* ENABLE_MANAGEMENT */
 
 /*
- * Initialize and possibly randomize connection list.
+ * Initialize and randomize the connection list.
+ * 
+ * Applies the Fisher-Yates shuffle algorithm to ensure all permutations are equally probable,
+ * thereby eliminating shuffling bias in the previous method.
+ * 
+ * The algorithm randomly selects an element from the unshuffled portion and places it at position i.
+ * There's only one way to obtain each permutation through these swaps.
+ * This guarantees that each permutation occurs with equal probability in theory.
  */
 static void
 init_connection_list(struct context *c)
@@ -478,9 +485,9 @@  init_connection_list(struct context *c)
     if (c->options.remote_random)
     {
         int i;
-        for (i = 0; i < l->len; ++i)
+        for (i = l->len - 1; i > 0; --i)
         {
-            const int j = get_random() % l->len;
+            const int j = get_random() % (i + 1);
             if (i != j)
             {
                 struct connection_entry *tmp;