Answer to puzzle #117 – Bayes-Laplace-Dirichlet law & "soft quorum"

Puzzle
Suppose there is some binary quantity B (i.e. yes=1 or no=0, for example "do you think there should be a $5/gallon gasoline tax?"). You ask a random sample of S≥0 people for their values of B, and the result is that Y say "yes" while N say "no" where Y+N=S.

  1. Given this data: what is the Bayesian estimate of the probability P that a random person says "yes"?
  2. And what is the variance in this estimate?
  3. How can a similar formula be used to make range voting have a "soft quorum"?

Answer a [Th.Bayes (1702-1761), P.S.Laplace (1749-1827) & J.P.G.L.Dirichlet (1805-1859)]:

If we assume P has a "prior" distribution uniform on the real interval [0,1], then the Bayesian estimate of the expectation value of P (conditioned on the Y yesses, N noes data) is

$$\mathrm{Expect}_{\text{uniform prior}}(P \mid \text{data}) = \frac{\int_0^1 u \cdot u^{Y} (1-u)^{N} \, du}{\int_0^1 u^{Y} (1-u)^{N} \, du}$$

Since both integrals are Euler Beta integrals, they can be evaluated immediately via Euler's formula

$$\int_0^1 u^{A-1} (1-u)^{B-1} \, du = \frac{\Gamma(A)\,\Gamma(B)}{\Gamma(A+B)}.$$

Bayes recognized that the answer was the above ratio of integrals, but I doubt he was aware of Euler's formula [due to Leonhard Euler (1707-1783)]; if he was not, he could not actually evaluate the integrals. Dirichlet, however, was aware of it and thus reached the final result, which (after algebraic simplification) is

$$\mathrm{Expect}_{\text{uniform prior}}(P \mid \text{data}) = \frac{Y+1}{Y+N+2} = \frac{Y+1}{S+2}.$$
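As a quick check on the algebra, here is a minimal Python sketch. The function names are our own and hypothetical. It evaluates both integrals exactly with Euler's formula, using Γ(k) = (k-1)! for positive integers k, and confirms that their ratio collapses to (Y+1)/(S+2).

```python
from fractions import Fraction
from math import factorial


def beta_integral(a: int, b: int) -> Fraction:
    """Euler's formula for positive integers a, b:
    integral over 0<u<1 of u^(a-1) (1-u)^(b-1) du = Gamma(a)Gamma(b)/Gamma(a+b),
    with Gamma(k) = (k-1)! for positive integers k."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def posterior_mean_uniform(yes: int, no: int) -> Fraction:
    """Ratio of the two integrals in the uniform-prior formula, evaluated exactly."""
    numerator = beta_integral(yes + 2, no + 1)    # integrand u * u^Y * (1-u)^N
    denominator = beta_integral(yes + 1, no + 1)  # integrand u^Y * (1-u)^N
    return numerator / denominator


# The ratio of integrals equals (Y+1)/(Y+N+2) for every data set we try.
for y, n in [(0, 0), (1, 0), (3, 7), (40, 60)]:
    assert posterior_mean_uniform(y, n) == Fraction(y + 1, y + n + 2)
    print(f"Y={y}, N={n}: E[P | data] = {posterior_mean_uniform(y, n)}")
```

Exact rational arithmetic (`fractions.Fraction`) is used so the comparison with (Y+1)/(S+2) is an equality test rather than a floating-point approximation.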

Note that this is not quite the same as the naive formula P=Y/S, although the two agree in the limit where S is very large. One indication that the Bayes-Laplace-Dirichlet formula is superior to the naive one is how it handles the no-data case Y=N=S=0: the naive formula gives the meaningless 0/0, while the Bayes-Laplace-Dirichlet formula gives the sensible 1/2.

You may also enjoy the case Y=S=1, N=0 where Laplace says Expect(P)=2/3 as opposed to the naive 1. Obviously, it does not make sense, given a single datapoint "yes," to conclude that every other human being is also going to answer yes so that our best estimate of humanity's response is "1.000." We feel from our prior knowledge about human behavior that some people will probably say "no." Getting a single "yes" datapoint is not enough to cause us to throw all our prior knowledge about human behavior into the garbage. The Bayes-Laplace-Dirichlet formula is a way to smoothly, and in a principled way, reduce the relative amount of prior knowledge we incorporate into our estimate, as more real data becomes available.
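A small side-by-side comparison makes the point concrete. This is a sketch with hypothetical function names. The naive estimate is reported as `None` when S=0, since 0/0 is undefined.

```python
from fractions import Fraction
from typing import Optional


def naive_estimate(yes: int, no: int) -> Optional[Fraction]:
    """Naive Y/S; returns None when S = 0 because 0/0 is undefined."""
    total = yes + no
    return None if total == 0 else Fraction(yes, total)


def laplace_estimate(yes: int, no: int) -> Fraction:
    """Bayes-Laplace-Dirichlet (uniform prior) estimate (Y+1)/(S+2)."""
    return Fraction(yes + 1, yes + no + 2)


for y, n in [(0, 0), (1, 0), (10, 0), (600, 400)]:
    print(f"Y={y:3d} N={n:3d}  naive={naive_estimate(y, n)}  Laplace={laplace_estimate(y, n)}")
```

With Y=10 and N=0 the naive formula still claims certainty (1), while Laplace gives 11/12. With S=1000 the two estimates (3/5 and 601/1002) differ by less than 0.001.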

Incidentally, if instead of using a uniform prior, we had employed a Beta(α,β) distribution [which has mean α/(α+β)] as our prior, then we would get this more general formula:

$$\mathrm{Expect}_{\mathrm{Beta}(\alpha,\beta)\ \text{prior}}(P \mid \text{data}) = \frac{Y+\alpha}{Y+N+\alpha+\beta}$$

and the old formula merely arises as the special case α=β=1. Call this the "generalized" Bayes-Laplace-Dirichlet formula.

Note that the generalized Bayes-Laplace-Dirichlet formula is the same as the naive P=Y/S formula except that an extra α "yesses" and β "noes" are artificially adjoined to the set of S real votes.
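The "artificial votes" reading can be checked directly. This is a minimal sketch with hypothetical function names. It is restricted to integer α and β so that the pseudo-votes can literally be appended to a ballot list and exact fractions can be used. The closed-form formula itself also makes sense for any positive real α and β.

```python
from fractions import Fraction


def beta_prior_estimate(yes: int, no: int, alpha: int, beta: int) -> Fraction:
    """Generalized formula (Y + alpha) / (Y + N + alpha + beta)."""
    return Fraction(yes + alpha, yes + no + alpha + beta)


def pseudo_vote_estimate(yes: int, no: int, alpha: int, beta: int) -> Fraction:
    """The same number computed literally: adjoin alpha artificial 'yes' (1) votes and
    beta artificial 'no' (0) votes to the real ones, then take the naive fraction of 1s."""
    votes = [1] * yes + [0] * no + [1] * alpha + [0] * beta
    return Fraction(sum(votes), len(votes))


# A prior with mean 3/10, worth 10 artificial votes (alpha=3, beta=7).
assert beta_prior_estimate(5, 5, 3, 7) == pseudo_vote_estimate(5, 5, 3, 7) == Fraction(8, 20)
# alpha = beta = 1 recovers the uniform-prior formula (Y+1)/(S+2).
assert beta_prior_estimate(1, 0, 1, 1) == Fraction(2, 3)
```

Larger α+β means a prior that is "worth" more artificial votes, so more real data is needed before the estimate moves far from α/(α+β).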

Answer b: the variance

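We can similarly compute the variance (the standard deviation is its square root). With a Beta(α,β) prior, the posterior distribution of P is Beta(Y+α, N+β), whose variance is

$$\mathrm{Variance}_{\mathrm{Beta}(\alpha,\beta)\ \text{prior}}(P \mid \text{data}) = \frac{(Y+\alpha)\,(N+\beta)}{(Y+N+\alpha+\beta)^2\,(Y+N+\alpha+\beta+1)}.$$

In the large-S limit this becomes just Variance → YN/S³, which is the familiar p(1-p)/S with p = Y/S.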
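The following minimal sketch computes the mean and variance of the Beta(Y+α, N+β) posterior. The function name is hypothetical, and the defaults α=β=1 give the uniform prior. For the no-data case it reproduces 1/12, the variance of the uniform distribution itself.

```python
from fractions import Fraction
from math import sqrt


def posterior_mean_and_variance(yes: int, no: int, alpha: int = 1, beta: int = 1):
    """Mean and variance of the Beta(Y+alpha, N+beta) posterior for P.
    Defaults alpha = beta = 1 give the uniform prior."""
    a = Fraction(yes + alpha)  # first posterior Beta parameter
    b = Fraction(no + beta)    # second posterior Beta parameter
    total = a + b
    mean = a / total
    variance = a * b / (total ** 2 * (total + 1))
    return mean, variance


for y, n in [(0, 0), (1, 0), (600, 400)]:
    mean, var = posterior_mean_and_variance(y, n)
    print(f"Y={y} N={n}: mean={float(mean):.4f}  std dev={sqrt(var):.4f}")
```

For Y=600 and N=400 the standard deviation comes out at about 0.015, very close to the large-S approximation sqrt(YN/S³).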

Answer c: application to "quorum" for range voting (this was explained to us by Andy McKenzie on 27 June 2008):

The Internet Movie Database (IMDb) uses a formula of generalized Bayes-Laplace-Dirichlet (B.L.D.) form to rate movies by range voting. (Specifically, in the "approval voting" case their formula reduces to the Bayes-Laplace-Dirichlet law, but with a constant number of artificial yes and no votes introduced before any real votes are solicited.) The IMDb formula is:

Candidate's "Output Rating"   =   (RV + CQ)/(V + Q)

where
R = The average (mean) score of this candidate as rated by the V voters
V = Total number of voters
Q = Constant "quorum" number of voters
C = Some constant score somewhere in the score range (IMDb uses the mean score of all IMDb movies, 6.7 on a 0-10 scale at the time of writing.)

Then (if we were running a range voting election using the IMDb system) the candidate with the greatest output rating would win. This is just like ordinary range voting, except that an extra Q "artificial votes" (all with artificial-vote mean-rating C) are inserted for each candidate before the V real voters speak.
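This minimal sketch checks the "artificial votes" description and shows the formula's effect. The candidates, their ballots, and the choice Q=25 are hypothetical; C=6.7 is the value quoted above. Note that RV is simply the sum of the real scores.

```python
from math import isclose
from statistics import mean


def output_rating(scores: list[float], C: float, Q: int) -> float:
    """IMDb-style output rating (R*V + C*Q) / (V + Q)."""
    V = len(scores)
    if V + Q == 0:
        raise ValueError("need at least one real or artificial vote")
    RV = sum(scores)  # R*V: mean score times number of voters = sum of scores
    return (RV + C * Q) / (V + Q)


def padded_average(scores: list[float], C: float, Q: int) -> float:
    """Plain average after inserting Q artificial ballots that all score C."""
    return mean(scores + [C] * Q)


# Hypothetical data: A is rated by 3 fans, B by 200 voters.
a_scores = [10, 10, 9]
b_scores = [8] * 200
C, Q = 6.7, 25  # C as quoted above; Q = 25 is an arbitrary illustrative choice

for scores in (a_scores, b_scores):
    assert isclose(output_rating(scores, C, Q), padded_average(scores, C, Q))

print(mean(a_scores) > mean(b_scores))                                # True: plain average favors A
print(output_rating(b_scores, C, Q) > output_rating(a_scores, C, Q))  # True: output rating favors B
```

A's three enthusiastic ratings are diluted by the 25 artificial 6.7s (output rating about 7.02), while B's 200 ratings dominate its padding (about 7.86).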

Special cases:

  1. If the scores are approval-style (0 or 1), Q=2, and C=midrange (1/2), then this is just our original Bayes-Laplace-Dirichlet uniform-prior formula.
  2. If C=0 and Q→∞ then this reduces to sum-based (not average-based) range voting: candidate with highest summed-score wins.
  3. If Q=0 then this reduces to average-based range voting.
  4. So with finite positive C and Q the IMDb scheme is a compromise between average- and sum-based range voting.
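Each special case can be checked numerically with exact arithmetic. This is a sketch in which the function and the ballots are hypothetical. The "Q→∞" limit is stood in for by a large finite Q=10⁶, which suffices for this particular example but is not a proof of the limit.

```python
from fractions import Fraction


def output_rating(scores, C, Q):
    """(R*V + C*Q) / (V + Q) in exact arithmetic; R*V is just the sum of the scores."""
    return (Fraction(sum(scores)) + Fraction(C) * Q) / (len(scores) + Q)


# Case 1: approval ballots (1 = yes, 0 = no), Q = 2, C = midrange 1/2 -> (Y+1)/(V+2).
approvals = [1, 1, 0, 1, 0]  # Y = 3, N = 2
assert output_rating(approvals, Fraction(1, 2), 2) == Fraction(3 + 1, 5 + 2)

# Case 3: Q = 0 -> plain average score.
scores = [7, 9, 4]
assert output_rating(scores, 5, 0) == Fraction(20, 3)

# Case 2: C = 0 and a huge Q -> ranking follows the summed score.
x = [6] * 100  # lower average (6), higher sum (600)
y = [9] * 50   # higher average (9), lower sum (450)
assert output_rating(y, 0, 0) > output_rating(x, 0, 0)          # average-based: y wins
assert output_rating(x, 0, 10**6) > output_rating(y, 0, 10**6)  # near sum-based limit: x wins
```

All assertions pass silently. Candidate x is ahead on total score and y on average score, and the value of Q decides which of the two effects dominates.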

In the special case C=0 (which, if the allowed-score-range is from 0 to some positive value, maximally disfavors candidates that few voters rate) the formula would simplify to

Candidate's "Output Rating"   =   RV/(V + Q).

Then a good choice (for election purposes) for Q might be one-fourth of the maximum number of voters who genuinely rate any candidate?
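Here is a minimal sketch of the C=0 formula combined with the tentative "Q = one-fourth of the most-rated candidate's raters" rule. The function name, the `quorum_fraction` parameter, and all ballots are hypothetical. It illustrates the "obscure candidate rated 10 by a few friends" scenario discussed below.

```python
def soft_quorum_ratings(ballots: dict[str, list[float]],
                        quorum_fraction: float = 0.25) -> dict[str, float]:
    """C = 0 soft-quorum ratings RV / (V + Q), where Q is quorum_fraction times the
    largest number of voters who rated any single candidate."""
    max_raters = max(len(scores) for scores in ballots.values())
    if max_raters == 0:
        raise ValueError("no candidate received any ratings")
    Q = quorum_fraction * max_raters
    return {name: sum(scores) / (len(scores) + Q) for name, scores in ballots.items()}


# Hypothetical 0-10 ballots: an obscure candidate rated 10 by five friends,
# and two well-known candidates each rated by 10,000 voters.
ballots = {
    "obscure": [10] * 5,
    "A": [6] * 10_000,
    "B": [5] * 10_000,
}
ratings = soft_quorum_ratings(ballots)  # here Q = 2,500
print(max(ratings, key=ratings.get))    # A (4.8, versus 4.0 for B and about 0.02 for "obscure")
print(max(ballots, key=lambda c: sum(ballots[c]) / len(ballots[c])))  # obscure, by plain average
```

A candidate nobody rated simply gets 0 here (RV=0 while V+Q>0), which is what the C=0 choice intends.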

Advantages of the simplified B.L.D. formula for use with range voting for "quorum" purposes

  1. The formula is simply explained as follows: "use ordinary range voting – highest average rating wins – except we give Q artificial 'zero' ratings to each candidate before the real voting begins." Also, even if C isn't zero, you can still explain it as "artificially adding Q ballots which rate all candidates at C."
  2. If there are few votes, the B.L.D. formula tries to use the data most effectively to deduce the best statistical estimate of the "true" mean score for a candidate.
  3. The B.L.D. formula can also be used to "downgrade" little-known candidates who got rated by few voters.
    This prevents the nightmare scenario where Hitler gets elected by just himself and a few friends, while 99.99% of the voters do not rate him because they have never heard of him. The idea of "quorums" is that you need to be rated by at least Q voters to win. Lesser-known candidates could theoretically benefit from the bias that the few people who have heard of them tend to favor them abnormally strongly, although in practice they usually suffer much more from the bias that a substantial fraction of the people who have never heard of them automatically give them 0s rather than NO OPINION scores as a "safety measure."
    The appropriate value of Q to remove the former type of bias is Q ≈ the typical number of fanatical supporters that anybody running for that kind of seat can secretly muster. The appropriate value of C to reduce the latter type of bias is C ≈ the average rating of all candidates.
  4. Depending on which parameter values are inserted into the formula, it can accomplish either or both of purposes 2 and 3.
  5. Our formula does not exhibit a sudden "hard brick wall" cutoff in which those rated by fewer voters than the quorum can never win. [Incidentally, honeybees employ a hard-quorum type range-voting scheme.] Such sudden cutoffs could be tempting targets for those trying to "game the system" or those trying to criticize the voting system. Instead, with B.L.D. the downgrading is "continuous" and the quorum is "soft." A candidate rated by few voters might still be able to win if his opponents have low-enough ratings.
  6. If all candidates are rated by the same number of voters, then our formula becomes equivalent to ordinary range voting – candidate with greatest average rating wins.
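The contrast between advantages 5 and 6 can be sketched in a few lines. Both functions and all ballots are hypothetical. The hard-quorum rule shown (exclude anyone rated by fewer than Q voters, then take the highest average) is our own illustration of a "brick wall" scheme, not a description of any particular real system.

```python
from typing import Optional


def soft_quorum_winner(ballots: dict[str, list[float]], Q: float) -> str:
    """Soft quorum with C = 0 (Q > 0): highest RV / (V + Q) wins; nobody is excluded."""
    return max(ballots, key=lambda c: sum(ballots[c]) / (len(ballots[c]) + Q))


def hard_quorum_winner(ballots: dict[str, list[float]], Q: int) -> Optional[str]:
    """'Brick wall' quorum, for contrast: candidates rated by fewer than Q voters are
    excluded outright, then the highest average wins (None if nobody qualifies)."""
    eligible = {c: s for c, s in ballots.items() if len(s) >= max(Q, 1)}
    if not eligible:
        return None
    return max(eligible, key=lambda c: sum(eligible[c]) / len(eligible[c]))


# Hypothetical 0-10 ballots, Q = 100.
ballots = {
    "niche": [9] * 80,   # only 80 raters, average 9
    "big1": [3] * 1000,  # widely rated, average 3
    "big2": [4] * 1000,  # widely rated, average 4
}
print(soft_quorum_winner(ballots, 100))  # niche: 720/180 = 4.0 beats big2's 4000/1100 (about 3.64)
print(hard_quorum_winner(ballots, 100))  # big2: niche is below the 100-rater wall

# Advantage 6: with equal numbers of raters, the soft quorum ranks exactly like plain averages.
equal = {"p": [7, 8, 9], "q": [10, 2, 3], "r": [6, 6, 6]}
soft_rank = sorted(equal, key=lambda c: sum(equal[c]) / (3 + 100))
avg_rank = sorted(equal, key=lambda c: sum(equal[c]) / 3)
assert soft_rank == avg_rank
```

Under the soft quorum, "niche" wins only because its widely known opponents are rated poorly. With a hard cutoff, 80 raters is simply not enough, however high their ratings.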

On the other hand, a disadvantage of the IMDb scheme is that it adds two new "dials" (C and Q) that can be turned. Widely known candidates would have an incentive to argue for making Q as large as possible (or for making C zero) in order to hurt candidates who are less widely known.

Ivan Ryan and Andy McKenzie helped W.D.Smith to create this page.

