RangeVoting.org - answer to puzzle 99 - correctly generating random pairwise matrices

Puzzle #99: Efficient generation of "random election" pair-margins matrices

Puzzle:
Many election methods for N-candidate elections with rank-order ballots, as their first step, compute the "pairwise table" giving the "margin" by which A defeats B (i.e. the number of voters who prefer A>B minus the number who prefer B>A, for every A,B).

Suppose all V votes are independent random orderings (all N! orderings equiprobable) with V very large.

In the V→∞ limit, what kind of probability density do these matrices arise from?
Find a way to generate the N×N pairwise matrix resulting from such votes, without actually generating and processing all the votes. I found an interesting O(N⁵)-time, O(N²)-space algorithm. However, soon after, I realized that there was a simpler algorithm running in only O(N²)-time and O(N)-space.

Answer

a. The upper-triangle of the matrix is an (N-1)N/2-dimensional zero-centered Gaussian distribution. (Note: not N² dimensional, because the diagonal of the matrix is all-zero and the lower triangle is determined by antisymmetry from the upper triangle.) The "Gaussian" is a result of the central limit theorem in the V→∞ limit in this (N-1)N/2-dimensional space.

The entries in a given row or column of the matrix are correlated. Specifically, if V_AB is a ±1-valued coin-toss random deviate indicating for some voter indicating whether she prefers A over B or the reverse, then its mean is E(V_AB)=0 and its variance is var(V_AB)=1. Because once we know a voter prefers A>B, the probability is 2/3 she prefers A>C [consider the three possibilities A>B>C, A>C>B, C>A>B], we have that correl(V_AB,V_AC)=1/3=2/3-1/3. Therefore (now applying the central limit theorem to these votes after rescaling and translating everything appropropriately) in the limit of a huge number of voters we get that the mean of each off-diagonal matrix element's Gaussian is 0, its variance is 1, and the correlation between any two different Gaussians in the same row (or in the same column) of the upper triangle is 1/3.

b. The naive slow algorithm, running in O(N!·N²) steps, would be to generate a (standard Gaussian random) vote-count for each of the N! different possible kinds of vote, then add the resulting N! matrices (each N×N) together. That generates the correct (N-1)N/2-dimensional Gaussian distribution.

There is an interesting faster algorithm. If N=1+p, where p is prime, then instead of using N! vote types it suffices to use (N-2)(N-1)N. (And if N≠1+p that is ok; simply increase N to the next suitable value by adding extra fake candidates. Because of standard number-theoretical results about "prime gaps" and polynomial time primality testing, this won't increase the runtime bound by more than a constant factor; nor will it alter the correctness of the statistical distribution.)

The (N-2)(N-1)N permutations are precisely the members of the triply-transitive permutation group SL₂(p) got as follows. Let the N items being permuted be {∞,0,1,2,...,p-1}. The permutations are then precisely the degree-(1,1) rational bijections

x → (ax+b)/(cx+d) (with all arithmetic mod-p) where ad-bc=1.

E.g. see George Mackiw: The Linear Group SL(2,3) as a Source of Examples, The Mathematical Gazette 81,490 (March 1997) 64-67.

Why does the subgroup SL₂(p) generate the (same) correct (N-1)N/2-dimensional Gaussian distribution as the full N!-size group? Because all its covariances are the same. Specifically:

(Where A,B,C,D unequal) If a vote says A>B, then the probability it says C>D is (correctly) exactly ½. Indeed, once we have decided where the vote-permutation f maps A and maps B, then we can demand that f(C)-f(D) is any specified (finite nonzero) value and solve the resulting (simultaneous linear) equations for f; there always will be a unique solution and since our group includes cyclic shifts mod p (that is, x→x+1) this means C>D and D>C are equiprobable when both are finite. Further, we can instead specify f(A), f(B) and demand f(C)=∞, or we can specify f(A), f(B) and demand f(D)=∞; either way we get a unique solution and equiprobability of D>/<C.
(Where A,B,C unequal) If a vote says A>B, then the probability it says A>C>B (or A>B>C, or C>A>B) is also exactly correct because due to triple-transitivity we can simply solve for the unique f with f(A), f(B), f(C) specified.

But there is a much simpler and faster algorithm still: Let X_ab (for 1≤a≤b≤N) be (N+1)N/2 independent standard normal random variables, then Margin_ab=X_aa-X_bb-X_ab for a≠b. Thus: Direct an edge in the defeat-tournament from a to b if X_aa-X_bb>X_ab and from b to a otherwise. This is an incredibly simple O(N²)-time algorithm and the space usage may even be reduced to O(N) by only handling one row of X at a time (but storing the diagonal of the X matrix always). (The correctness of the algorithm is again because it produces the correct pairwise correlations – namely 1/3 – that is straightforward to check.)

INCREDIBLY SIMPLE ALGORITHM TO GENERATE NxN Margin Matrix (and defeat matrix):
for a from 1 to N { diag[a] = RandNormal(); MarginMat[a][a] = 0.0; DefeatMat[a][a] = 0; }
for a from 1 to N-1 {
   for b from a+1 to N {
      x = RandNormal();
      y = diag[a] - diag[b] - x;
      MarginMat[a][b] =  y;
      MarginMat[b][a] = -y;
      if(y>0.0){ DefeatMat[a][b] =  1;  DefeatMat[b][a] = -1; }
      else{      DefeatMat[a][b] = -1;  DefeatMat[b][a] =  1; }
   }
}
NOTE the margin matrix will be correct up to a scaling factor.

Return to puzzles

Return to main page