By Warren D. Smith, January 2007

Let P_{k} be the (known) population of state k, and let S_{k} be
its number of congressmen, where the total number
S=∑_{k}S_{k}
of congressmen is fixed by US law to be
S=435.

The question is: what should the S_{k} be? The difficulty is that they have to
be *integers*.

The new method we propose is very simple. Choose a number Q>0, traditionally called "the divisor," so that the total number of seats S will come out correct, and let

where d is a constant that had been chosen ahead of time. There are two ways to choose d. First, we can choose it to yield the best retrospective performance (least "bias") over past US history. Second, we can instead choose d based on a purely theoretical analysis, which yields the following formula:

With the USA's K=50/435=10/87=0.11494252873...
we get d=0.495211255149063832...
For application to "party list" PR voting systems, instead K=#parties/#seats,
and with typical values #parties=20 and #seats=300 we would get d=0.49722.
In the limit K→0^{+} we would get d=1/2 (Webster),
and in the opposite limit K→1 we would get d=1-ln(e-1)≈0.458675.

First of all, the classic apportionment methods of Webster, Adams, and Jefferson
are merely special cases of our new method, with d=½, d=0+, and d=1- respectively.
If we choose the *optimal* value of d, then we must get a method at least as good as these
(by whatever performance measure we are optimizing).
Because Balinski & Young's book concluded Webster (and in some ways Jefferson) were
the best methods among the ones they knew (and computer simulations
by Dan Bishop and myself
also came to that same conclusion), the new method is at least as good as all previous
to the extent you buy Balinski & Young.

Second, because the new method is a "divisor method" it enjoys all the properties
that, B&Y proved, divisor methods (and *only* divisor methods) always enjoy:
"House monotonicity," "population-pair monotonicity," and freedom from the "new state paradox."

The Hamilton/Vinton method fails all three kinds of monotonicity. The Balinski-Young method fails two of them but obeys "House monotonicity" and both it and Hamilton/Vinton obey the "quota criterion" that the number of seats a state is awarded differs by at most 1 from its ideal value. [All divisor methods fail the quota criterion in suitable situations; and indeed the quota and population-pair-monotonicity criteria are logically incompatible.]

Third, computer simulations by Dan Bishop's computer program (as well as an upcoming more powerful program of my own) found superior new-record behavior for the new method (using the theoretical model's d) over all previous divisor methods both in terms of several bias measures, and in terms of making "quota violations" get rarer. (Note: this computer finding was not merely a trivial and expected consequence of our derivation below, because the bias measures the computer evaluated were not the same as the one nulled out by our derivation, and also because the computer examined other probabilistic models.)

Fourth, both our computer simulations and US history
agree that Webster is slightly biased
in favor of the large-population states,
i.e. it tends to give them slightly more than their
ideal (but noninteger) fair share of congressmen.
(And Adams is heavily biased in favor of small states,
and Jefferson is heavily biased in favor of large ones.)
The new method with d=0.495 perturbs Webster in the right direction to get rid of that bias,
or alternatively our theoretical model that outputs d=0.495 may be regarded as having
*predicted* Webster's bias.

As our starting point, we shall assume the state populations are
independent identically distributed *exponential* random variables,
i.e. with density

That is: the probability density proportional to exp(-xK)dx is postulated to hold for state populations measured in units of one ideal seat's worth of population. In that case, the expectation value of x, which is 1/K, must equal #seats/#states, i.e. the correct average number of seats a state gets.

In other words:

- The average population of a state (measured in units of 1 ideal seat) is #seats/#states.
- This is set equal to the expected value of the population probability density.

Now suppose we wish to choose a threshhold y, n<y<n+1, so that,
if these states have quota>y, their seat-count is rounded up to n+1,
otherwise down to n; and *we wish to choose y so that the
net expected additive effect of the rounding is zero*, i.e. so that
the rounding is *unbiased*.
The formula that determines y is then

It is very interesting and pleasant that this formula for y-n is *independent* of n.
This new apportionment method is (by its derivation) unbiased in
our probabilistic model not only for all states en masse in the sense that neither
large nor small states have any systematic advantage, but also, more strongly, is unbiased
for *each* state-subset
in each quota-interval [n, n+1] in the sense
the expected additive adjustment of the number of seats caused by rounding is zero.
In the K→0+ limit it reduces to the Webster formula y=n+½,
and in the K→∞ limit it reduces to the Adams formula y=n+0;
for intermediate K it provides intermediate formulae.

Our simple apportionment proposal does not provide a **1-seat minimum**;
it will award small-enough states zero seats.
If you want to have a 1-seat minimum, then the *first* rounding point y (between 0 and 1)
needs to be replaced by y=0. The *other* rounding points y then are
still given by the usual formula y=n+d, for n=1,2,3,... but now with a different value of d.
Again you can choose d to optimize historical performance, or in the same theoretical model
we have the exact formula

which when K=50/435 yields d≈0.557504703. [Note this formula only is valid when 0<K<1.] This formula was derived by solving for y to zero

The resulting apportionment method
then is only unbiased
for all states en masse (albeit if K>1 then unbias is unachievable)
and *not* for each state-subset
in each quota-interval [n, n+1] individually.

It is also possible to start from other underlying simplistic probabilistic models besides
i.i.d. exponential state-population densities. I shall argue soon that these other models
make less sense, but disregarding that, one can proceed anyway and see what formulas result.
Also, we can work toward different goals than zeroing the expected additive
adjustment to the quota (caused by rounding-to-integer);
we instead could, e.g, zero the expected additive adjustment to the *log*
of the quota. We have worked out several examples, some of which will be given here.
They all yield rather more complicated formulas, which unfortunately always depend on n:

**1.**
If we aim to null out the expected additive adjustment in a state's
log(#seats) by choice of y, then solving

Here the "exponential integrals" Ei_{m}(z) for m=1,2,3,...
are defined in the halfplane Re(z)>0 by

**2.**
If we aim to null out the expected additive adjustment in a state's seat count, *divided* by
the state's population, then we ask for y so that

**3.**
If we aim to null out the expected additive adjustment in a state's population/seats
ratio (i.e. in its "district size"), then we ask for y so that

**4.**
If we aim to null out the expected additive adjustment in a state's population/seats
ratio (i.e. in its "district size") *times* its population, then we ask for y so that

**5.**
If we aim to null out the expected additive adjustment in a state's
log(#seats) *times its population* by choice of y, then

**6.**
If we aim to null out the expected additive adjustment in a state's seat count, *divided* by
the *square root* of the state's population, then we ask for y so that

+erf(√K √B)/2 -erf(√K √A)/2 +AK erf(√K √A) -BK erf(√K √B) =0.

At the present moment it seems to me that alternative method #2 is the most "morally right" in terms of its underlying theoretical model/goals. However, that is merely my subjective impression. Here is my reasoning:

That method aims to null out the "bias" which (for that method) is defined as the expected |additive change| in the number of seats a state has (caused by the round to integer) divided by state population. Why is this "morally right"? Well, it seems that the unfairness, for an individual person in state k, of having X times as many congressmen than he ideally should, is

Now we have to weight this unfairness per person, by the number of people in the state,
which means we multiply by P_{k}
thus cancelling out the denominator
to get |X-1|.

But that is exactly what the derivation of the method nulled out, because |X-1| is proportional to the |additive change| in the number of seats, divided by state population.

But Mike Ossipoff is (justifiably) dubious of my "quantified morality" above and currently thinks
the *original* method is the morally best of those here. He argues against me:

You said it was about unfairness per person, which justified dividing by population. But unfairness sometimes isn't something that has to be divided among its recipients. When we're unfair to a state, all of its residents get all of the unfairness, and they don't have to divide it up... Equal representation expectation per person – shouldn't that be the goal for all of us? The original method can be derived from that goal.

It is also possible to take a compromise path "midway between" the Ossipoff and Smith
views (i.e. if Ossipoff divides by 1 and Smith by x, then the compromise is
to divide by √x), and the result of that would be alternative method #6.
More generally one could consider division by an arbitrary power x^{μ} of x,
whereupon
the resulting integrals would be expressible in terms of incomplete gamma functions.

There are many reasons for that, well known to mathematicians and statisticians and associated with such names as Dirichlet, Poisson, and Shannon. We shall explore them incompletely.

First, there is the observation of Dirichlet. Suppose you want to split up a
country's total population P into N state-populations P_{k}, k=1,...,N
(N fixed), so that ∑_{k=1..N}P_{k}=P
and P_{k}≥0.
This set is an (N-1)-dimensional "simplex" in N-dimensional space. The goal of choosing
a point randomly from this set (and *uniformly* so that
equal N-1-dimensional areas are equally likely)
can be attained by choosing each state population to be an i.i.d. exponential random deviate,
and then normalizing the N state populations so that their sum comes out right
(to the desired value P).
In this process, *only* exponential random variates work. Every other 1-dimensional
density function does not work.

Second, there is the observation of Poisson.
Suppose you wish to drop N points randomly and uniformly
in the interval [0,N/K] of the real line, where N→∞.
This is a model of "density-K random points."
Then what is the probability density for the *gap* between two consecutive points?
The answer is that it is exponential: Kexp(-xK) dx.
Indeed, it is not even necessary to take the limit N→∞; we can make a very
similar statement about each *finite* N.
By that we mean that the following algorithm
will generate N
random *uniform in [0,1]*
numbers X_{1..N}, *in sorted order* without actually performing a sort:

s = 0; for k=1 to N do s = s + ExponentialRandomDeviate; X_{k}= s; end loop; s = s + ExponentialRandomDeviate; for k=1 to N do X_{k}= X_{k}/ s; end loop;

Our point is, that the gaps between the consecutive X_{k} (including the initial
gap from 0 to X_{1} and the final one from X_{N} to 1) generated
in this way are i.i.d. exponential random deviates, aside from the rescaling at the end.

We also can get exponentials from a discrete model. Suppose we consider all the possible ways of placing P indistinguishable balls into N distinguishable boxes. How many are there? The answer is (P+N-1)!/[(N-1)!P!]. Now if all these configurations are considered equally likely, then in a limit where P→∞ and N→∞ with the average number P/N of balls per box held fixed, what is the probability distribution for the number of balls in one particular box? The answer is it tends to a (now discrete) exponential distribution on the non-negative integers, with mean P/N.

In both of the cases above, what is going on is we are splitting a total population (modeled
as "indistinguishable balls" or "the real interval [0,N/K]" into N "state" populations
subject only to the constraints that they be nonnegative and have the correct sum,
and we are finding the state populations (in a limit) are exponentially distributed.
Because in the USA P=3×10^{8} is large, and N=50 is fairly large,
the asymptotic limits are fairly well satisfied in practice, albeit the assumption the state
boundaries are drawn to split up the population "randomly" is of course nonsense (but
if you are going to make a simplistic probabilistic model, this is the one).

Third, and finally, there is Claude Shannon. The
probability density function on the positive real axis with given mean 1/K
and *maximum entropy* is the exponential density Kexp(-KX)dx (and similarly
the discrete exponential has maximum entropy on the nonnegative integers with specified mean).
Maximum entropy is a formal and quantitative version of Occam's razor. That is,
this is the "simplest" assumption about a non-negative random variable with a mean,
i.e. the assumption which "assumes the least."

Can we now devise some pairwise optimality theorem or global optimality theorem that these methods obey? (See /Apportion.html for such theorems concerning the classic apportionment methods as well as an introduction to them generally.)

The following should be useful references both for terminology and for previous results and for related history (mostly US):

M.L. Balinski & H. Peyton Young: Fair Representation: Meeting the Ideal of One Person, One Vote (2nd edition), Brookings Institution Press 2001. Young also wrote a more accessible essay summarizing his thoughts for the layman.

Lawrence R. Ernst: Apportionment methods for the House of Representatives and Court Challenges, Management Science 40,10 (Oct. 1994) 1207-1227.

Warren D. Smith: Introduction to Apportionment methods web page.

The sorted random points without sorting algorithm is from:

J.Bentley & J.Saxe: Generating Sorted Lists of Random Numbers, ACM Trans. Math'l. Software 6,3 (1980) 359-364.

For standard results on probability distributions see the 5-volume set by N.L.Johnson & S.Kotz (now with N. Balakrishnan and Adrienne W. Kemp in recent editions) J.Wiley & Sons; for combinatorics R.P.Stanley: Enumerative Combinatorics, (Brooks Wadsworth, Cole), and for information theory and maximum entropy theory, T.M. Cover & J.A. Thomas: Elements of Information Theory, Wiley 1991.

I would also like to **acknowledge** Dan Bishop and Michael Ossipoff.
Bishop programmed a computer apportionment simulator, and Ossipoff came up with the idea of
trying to seek an unbiased apportionment method, although his execution of
that idea was poor due to incorrect mathematical manipulations and/or confusing expositions
(and Ossipoff also did not appreciate
the desirability of the exponential density as an underlying model).
The present work can be viewed as simply carrying out Ossipoff's idea, but now with correct
execution and a better underlying model.