## Apportionment and rounding schemes

By Warren D. Smith Jan 2007

The same difficulty arises in two different guises in democracy:

1. How many congressmen should each US state have (given that the state populations are known)?
2. In "party list" PR voting systems, how many seats should each party get (given that the number of votes for each party is known)?
3. And it also arises in pure math in the area of simultaneous diophantine approximation.

In both cases, the problem is that you cannot have 3.7 congressmen – it is necessary to round to integers. Actually, by giving congressmen unequal vote-weights in the legislature, "3.7 congressmen" would be possible in some sense. But we shall ignore that.

Let Pk be the (known) population of state k, and let Sk be its number of congressmen, where the total number S=∑kSk of congressmen is fixed by US law to be S=435.

The question is: what should the Sk be?

### Hamilton/Vinton method

Let P=∑k Pk be the total population. Start with Sk = ⌊Pk · S/P⌋ (rounding every ratio down). Then if the total number of seats is too small, start awarding extra seats to the states pre-sorted in decreasing order by rounded-off fractional parts, until the correct total number of seats is achieved. Hamilton's method has a very appealing "quota property": Each state gets either its "lower quota" ⌊Pk · S/P⌋ or its "upper quota" ⌈Pk · S/P⌉ worth of seats.

The USA abandoned the Hamilton/Vinton method after the "Alabama paradox" arose. That is: With fixed population proportions, Hamilton's method can cause a state to lose a seat as a consequence of an increase in the total number of seats. I.e. it violates "house size monotonicity." While that may not matter now that the US House has its size fixed at 435 seats... Hamilton also violates population-pair monotonicity, i.e. it can cause state A to lose seats relative to state B even though A's population expands and B's contracts! Here are some examples of that:

The following may be the simplest example where Hamilton disobeys house monotonicity ("Alabama paradox"). If we increase the total number of seats from 4 to 5, state C loses a seat.
State Population #seats(4) #seats(5)
A 5 2 3
B 3 1 2
C 1 1 0

The Alabama paradox was discovered in 1880, when it was found that increasing the total number of seats would decrease Alabama's number from 8 to 7. It is expected to occur about once every 8 apportionments in a model involving exponentially distributed state populations.

Hamilton disobeys population-pair monotonicity. After people move around, state C gains 5.1% population (largest % gain) while state B is the only decreaser; but C loses a seat (only seat-loser) while B keeps the same number of seats.
State Population(I) #seats(I) Population(II) #seats(II) Pop.Change%
A 554 5 566 6 +2.2%
B 290 3 270 3 -6.9%
C 156 2 164 1 +5.1%
total 1000 10 1000 10 0%

Balinski & Young [Fair Representation: Meeting the Ideal of One Person, One Vote (2nd edition), Brookings Institution Press 2001] pointed out that, in the year 1900, Hamilton/Vinton gave Virginia 10 seats and Maine 3, whereas in 1901, Virginia would have gotten 9 seats and Maine 4, despite the fact that the Virginia/Maine population ratio was 2.67 in 1900 and 2.68 in 1901. [1901 populations based on linear interpolation between 1900 and 1910].

### Desirable-sounding Properties which an apportionment method may or may not satisfy

House monotonicity: if population of whole country goes up (preserving proportions) no state can lose seats. [No longer matters much for US congress since total #seats now fixed at 435.]

Population-pair monotonicity: If population of state A increases but state B decreases, then A should not lose seats while B stays the same or gains seats. More generally, if A's percentage population change exceeds B's, then A should not lose seats while B stays the same or gains seats.

Quota: Always should give a state either A or A+1 seats (where the ideal ratio lies between).

Theorem (Balinski & Young): All "divisor methods" (and, essentially, only divisor methods) are both House and population-pair monotone; but they all disobey quota. (However, these disobeyals can be rare. Webster, Dean, and the Huntington-Hill method have never violated quota on any US states dataset so far, and according to some vaguely-described computer experiments by B&Y would be expected to do so only with probability 0.00061, 0.00154, and 0.000286 respectively, which in the case of Webster would be once every 1600 apportionments, i.e. once every 16000 years. My own computer experiments agree Webster is by far the least likely of the 5 classic divisor methods to suffer a quota violation, and also agrees Hill and Dean are in second and third places respectively, but estimates the quota-violation probability for Webster as 0.0016, which far exceeds B&Y's estimate but still is 6300 years between violations. Meanwhile Hamilton exhibits a monotonicity paradox about once every 18 apportionments in B&Y's same model. B&Y also showed that divisor methods avoid paradoxical results when new states are added.) Webster also satisfies quota if there are ≤3 states (but the others among the classic five divisor methods only do so if there are ≤2 states).

Meanwhile, Hamilton satisfies quota but disobeys both monotonicity properties. That leads to the question of whether an apportionment method exists that satisfies all three properties. The answer is "no" – the last two properties are incompatible – but "yes" if we only ask for the first and third properties (new method found by Balinski & Young in 1974).

Here's Balinski & Young's 1974 apportionment method which is both house-monotone and obeys quota: Let Pk denote the population, and Sk the number of seats assigned so far, to state k. We initially assign all states 0 seats: Sk=0. We now repeatedly award a new seat to whichever state maximizes Pk/(1+Sk) subject to the constraint that adding this new seat not violate the "upper quota" condition, i.e. seats are not awarded that cause Sk≤⌈Pk∑Sk/∑Pk⌉ to be violated. We just keep awarding new seats in this way, never violating quota (which B&Y proved trivially can always be done), until the correct total number ∑Sk results. Although Balinski & Young initially recommended this method, ultimately they rejected their own invention because it fails population-pair monotonicity. Also G.D.Birkhoff in 1976, and Cyril Carter in 1982, both claimed based on empirical numerical experiments, that this method was "biased" in favor of large-population states. (Birkhoff and Carter's claims were not very convincing because the number of their experiments were very small.) Jonathan W. Still [SIAM J. Applied Math. 37,2 (1979) 401-418] then found an infinite class – in fact the entire class – of house-monotonic and quota-obeying apportionment methods (Balinski & Young's method is just one member of Still's class). All of Still's methods are of the following form:

2. Find the "eligibility set" E (which, Still proved, is always nonempty) of states for which (a) you can give that state another seat without violating quota, and (b) a tricky "lookahead condition" explained by Still assures that if you do so, it will not force a quota violation later (throughout the infinite future of seat awards).
3. Pick a member of E (in any way at all, including random, or to maximize some arbitrary "quality function") and award that state an additional seat (increment Sk for that k).
4. Go back to step 2 until the total number of seats reaches the desired value S.
Still eventually recommended a house-monotone version of Hamilton's method in which the state you award an extra seat to in step 3, is the one that currently is the largest additive amount below its upper-quota.

Why is it impossible for any method to satisfy both quota and population-pair monotonicity? Consider the following 18-seat apportionment problem in which both Webster and Huntington-Hill give the same seat-counts – but both violate quota (violation shown in red) – actually every "divisor method" violates quota in this scenario:

State Population Quota #seats(Webster&Hill, use divisor=8.43)
A 13 1.625 2
B 13 1.625 2
C 13 1.625 2
D 105 13.125 12
Total 144 18 18

Now consider any method which tries to satisfy quota by means of A, B, or C giving up a seat to D. We claim that whichever one you pick (since they are all the same), you can use it to construct a population-pair paradox: say you picked A. Now move the people around so as to decrease A's and D's populations and increase B's and C's (with A having the largest decrease, percentagewise). Then the obvious apportionment will yield a population "paradox" since D would give up a seat to A despite A's faster population decrease. To exhibit that in detail:

State Population Quota #seats(Any quota-obeying method)
A 130 1.625 1
B 130 1.625 2
C 130 1.625 2
D 1050 13.125 13
Total 1440 18 18

and then after 60 people move (decreasing D's population by 5.0% and A's by 5.4%):

State Population Quota #seats(Hamilton, Hill, Webster, Dean, Adams)
A 123 1.5375 2
B 160 2 2
C 160 2 2
D 997 12.4625 12
Total 1440 18 18

...which is a population-pair paradox in the sense that D lost a seat to A despite A's largest-percentage population decrease. This does not actually prove that every quota-obeying method must exhibit such a paradox – for that see theorem 6.1 of B&Y's book – but it at least makes it plausible.

Splitting & combining incentives: Jefferson's method (which is equivalent to d'Hondt's method) "encourages mergers" because by merging into one, two states can gain seats but never lose them.
Theorem (see Balinski & Young thms 9.1 & 9.2) Jefferson is indeed the unique population-pair-monotonic apportionment method that offers such a combining-incentive (and as a free bonus, Jefferson also always obeys the "lower quota" condition).
(B&Y proposition 6.4): Adams is exactly the opposite: with Adams, states can gain seats by splitting, but never lose seats (except once the number of states exceed the number of seats, Adams becomes undefined), and it is the unique population-pair monotone method that thus-offers splitting-incentive (and as a free bonus, Adams also always obeys the "upper quota" condition).

Bias: Vaguely speaking, if a method "tends to favor small-population states" (or large-population ones) it is "biased." There are, however, many different possible formulas for assessing such "bias" and "favortism."

Theorem: Hamilton, Webster, Dean, and Hill all are "limit unbiased." That is, if the state populations are selected independently at random from some continuous and bounded probability density, the number of states is held fixed, and the total #seats is taken to infinity, then (almost surely) the bias (measured as a correlation coefficient) will tend to 0. But Jefferson and Adams are not limit unbiased; they favor large- and small-population states, respectively.

Near the quota: A method is "near the quota" if no transfer of a seat from one state to another can bring both states nearer to their exact ratios. (Birkhoff 1976 instead called this "binary consistency.") Balinski & Young (their theorem 6.2) proved Webster to be the unique population-pair-monotone method that is always "near the quota." Further, they proved this regardless of whether "nearer" is measured either in terms of additive or relative difference. But Ernst (his theorem 4.3) countered with an analogous theorem indicating that Huntington-Hill was "near the ideal" in the sense that for it, no transfer of a seat from one state to another can diminish certain other measures of departure from ideality. Hamilton/Vinton also is "near the quota," but (Birkhoff 1976 pointed out) Balinski & Young's quota-obeying method is not.

Uniformity: Balinski & Young [Proc. Nat'l. Acad. Sci. USA 73 (1980) 684-686] defined an apportionment method to be "uniform" if when we restrict to a state-subset, the apportionment within this sub"country" still is a valid apportionment under the same apportionment rule. They showed every method which is both uniform and population-pair monotone, must be a "divisor method." They also showed no method exists that obeys both uniformity and quota.

### The 5 classic "divisor methods"

A "divisor method" gives a state Sk = r(Pk/Q) seats where r is the appropriate rounding function for that method, and Q is a "divisor," i.e. a number chosen to make the total number ∑k Sk of seats come out right. (Here r is a monotonic-increasing integer-valued function of the real numbers.)

ADAMS' METHOD: r(x) = ⌈x⌉    [most favorable to small-pop states]

DEAN'S METHOD: r(x) = if x > hmean then ⌈x⌉ else ⌊x⌋ where hmean=2/(1/⌈x⌉ + 1/⌊x⌋)=2⌈x⌉⌊x⌋/(⌈x⌉+⌊x⌋)

HUNTINGTON-HILL'S METHOD: r(x) = if x > gmean then ⌈x⌉ else ⌊x⌋ where gmean=√(⌈x⌉⌊x⌋)

WEBSTER'S METHOD: r(x) = ⌊x+½⌋; equivalently r(x) = if x > amean then ⌈x⌉ else ⌊x⌋ where amean=(⌈x⌉+⌊x⌋)/2

JEFFERSON'S METHOD: r(x) = ⌊x⌋    [most favorable to large-pop states]

Note Hill, Dean, and Adams "naturally" enforce a one-seat minimum. E.V.Huntington in 1928 [The Apportionment of Representatives in Congress, Transactions Amer. Math'l. Soc. 30,1 (Jan 1928) 85-110 (pdf); this method had however also been considered earlier by Sainte-Laguë in 1910] showed that these "classic five" were the only divisor methods in which no transfer of a seat from one state to another can decrease some "measure of inequity" where this measure was required by Huntington to be an absolute value of the difference of two rational functions of the two state's populations and seat counts, with degree≤1 numerator or denominator in each, and such that equal seat/population ratio would zero this difference. However, Huntington did not show (contrary to misleading claims in B&Y's book) this for more-general "measures of inequity" such as more-complicated rational functions or non-rational functions. Therefore I deny the idea that these classic five methods are somehow uniquely favored and all other divisor methods should be excluded from consideration.

Although it is not a "divisor method," Hamilton/Vinton can be thought of in a similar way to those methods if we use an additive rather than multiplicative, "divisor." That is, you find a real value Δ such that if it is added to all the ideal (non-integer) seat-counts, then they are rounded off to nearest integers, you get the correct total number of seats.

Unfortunately, the divisor methods can fail – that is, a divisor that leads to the right total number of seats can simply fail to exist. Hamilton's method (if rephrased as a pseudo-divisor method as above) also can fail. For a simple example, consider 5 seats to be divided among 2 states with exactly equal populations. Any apportionment will necessarily be unfair (e.g. 3 and 2) and the divisor methods simply cannot do that. But with some random tiebreaking in such circumstances, we are ok; and in practice exact population ties are extremely rare and have never occurred, so this has not been an issue. [In our computer simulations no such failure ever occurred.]

### Connection to d'Hondt, Droop, and Sainte-Laguë methods

J.Q.Adams, T.Jefferson, A.Hamilton, and D.Webster were famous statesmen in early US history and devised the methods named after them. Edward V. Huntington, Joseph Hill, and James Dean were American mathematicians. The Europeans later encountered the same mathematical problem but in guise II, and their mathematicians and politicians invented their own solutions. Some of the European solutions were equivalent to the American ones:

Jefferson = d'Hondt (idea also known to Henry Droop): After all the votes have been tallied, successive quotients are calculated for each list. The formula for the quotient is V/(s+1), where:

• V is the total number of votes that party received; and
• s is the number of seats that party has been allocated so far (initially 0 for all parties)
Whichever party has the highest quotient gets the next seat allocated, and their quotient is recalculated given their new seat total, i.e. its s is incremented. The process is repeated until all seats have been allocated.

Webster = Sainte-Laguë = "method of odd numbers": like the d'Hondt method, but with a different denominator in the quotient formula, namely 2s+1 in place of s+1. (Incidentally, Webster could instead be defined using s+½ as the denominator. Rescaling a denominator sequence by any positive constant scale factor, does not change the system.)

Other denominator sequences have been suggested:

Estonia (non-proportional) d'Hondt=Jefferson(s+1) Sainte-Laguë=Webster=Odd numbers(2s+1) (used in Latvia) Modified Sainte-Laguë (Norway, Sweden, Nepal) Danish Imperiali (used in Ecuador) 1 20.9≈1.866 30.9≈2.688 40.9≈3.482 50.9≈4.257 60.9≈5.016 1 2 3 4 5 6 1 3 5 7 9 11 1.4 3 5 7 9 11 1 4 7 10 13 16 2 3 4 5 6 7 0+ 1 2 3 4 5 0+ √2≈1.414 √6≈2.449 √12≈3.464 √20≈4.472 √30≈5.477 0+ 4/3≈1.333 12/5≈2.400 24/7≈3.429 40/9≈4.444 60/11≈5.455 1 2 4 8 16 32 d d+1 d+2 d+3 d+4 d+5

Of these, Imperiali is very favoring to large-population states (strictly more than Jefferson). Modified Sainte-Laguë favors large states and disfavors small ones slightly versus ordinary Sainte-Laguë; it differs from it only for states (or parties, in the party-list usage) having ≤1 congressman. Danish favors small states. The Estonia and Macau methods actually are nonproportional and favor large and small states respectively. Every divisor method can be rewritten as an algorithm of d'Hondt type; you simply make the d'Hondtian denominator sequence be the same as (or any positive constant multiple of) the sequence of "rounding thresholds" for the divisor method.

### Pairwise optimality properties

Let Sk be the number of seats for state k and Pk be its population.

For all j,k, Adams minimizes |Sj - Sk Pj/Pk|.

Dean minimizes |Pj/Sj - Pk/Sk|

Huntington-Hill minimizes |Sj Pk/(Sk Pj) - 1|

Webster minimizes |Sj/Pj - Sk/Pk|

Jefferson minimizes |Sj Pk/Pj - Sk|

### Global optimality properties

Among all possible ways to assign non-negative integers to the Sk summing to the desired value S,

1. For any fixed power q≥1, Hamilton/Vinton minimizes ∑k |Sk - Pk S/P|q, i.e. the "Lq distance."
2. Webster minimizes ∑kPk(Sk/Pk-S/P)2 which corresponds to the idea that individual people have "as equal shares (of congress) as possible," in the sense of minimizing the variance of the seat-share, according to André Sainte-Laguë 1910 and F.W.Owens in 1921 (who both argued this independently) [Owens: On the Apportionment of Representatives, Quarterly Pub. Amer. Statist. Assoc. 17,136 (Dec 1921) 958-968, this theorem about "method W" minimizing the "total injustice" is stated without proof on his page 964].
Equivalently, given that S=∑kSk, P=∑kPk, and the Pk are regarded as fixed, Webster minimizes ∑kPk(Sk/Pk)2=∑k(Sk)2/Pk.
3. Huntington-Hill minimizes ∑kSk(Pk/Sk-P/S)2 which corresponds to the idea that district populations should be as equal as possible (in the sense of minimizing their variance), according to Huntington 1928, also stated by Owens on his p.965. Comparing this objective function versus Webster's, it seems to me that Huntington-Hill's is "morally wrong." That is, according to Huntington-Hill's, leaving a single person unrepresented is "infinitely bad." In reality it is only finitely bad. That suggests to me that Huntington-Hill's minimization is morally distorted. (The same criticism may be made of Adams below.)
4. Jefferson minimizes maxkSk/Pk, i.e. minimizes the largest individual share of congress. (Balinski&Young). Actually, it is stronger than this; it is not just the single max that matters; if we remove the maxes, then the next max also gets minimized by Jefferson's apportionment of the remaining seats, and so on recursively.
5. Jefferson also minimizes ∑k(Sk/Pk-S/P)+ where (x)+≡max(x,0).
6. Adams minimizes maxkPk/Sk, i.e. minimizes the largest district-population. (Balinski&Young). Actually, it again is stronger than this in the same way.

Lemma 7.1 (p.117), lemma 6.1 (p.94) and theorem 6.1 (p.95) of the book Evaluation and Optimization of Electoral Systems, by Pietro Grilli di Cortona, Cecilia Manzi, Aline Pennisi, Federica Ricca, and Bruno Simeone [SIAM Monographs on Discrete Mathematics and Its Applications, Philadelphia, 1999] are powerful tools for devising and proving more global optimality theorems. Assuming they are correct and I've used them correctly (which is not certain so please regard the following as preliminary), we can prove the following new results:

1. Let H(n)=∑1≤m≤n(1/m) be the "Harmonic numbers." Then Dean maximizes ∑k[2H(Sk)-1/Sk]Pk.
2. Also Dean minimizes ∑kSk[Sk - Qk - √(1+Qk²)] where Qk=PkS/P is state k's ideal (albeit perhaps noninteger) seat count. Equivalently (up to an additive and multiplicative constant that have no real effect), Dean minimizes
k[2Sk - √(1+Qk²) - Qk
3. The new divisor method based on the rounding function r(x)=⌈x-d⌉ where d is an arbitrary constant with 0≤d≤1 (the case d=½ is Webster, d=0+ is Adams, and d=1- is Jefferson), minimizes
k(Sk - Qk + d - ½)2/Qk.

In 2016 Toby Pereira conjectured something more complicated than, but equivalent to, the following – and I confirmed it in the nonrigorous sense that my computer failed to find a counterexample – the following new optimality property of Jefferson=d'Hondt:

Jefferson minimizes    ∑kPk[(Sk+½)/Pk]2=∑k(Sk+½)2/Pk.

Then I realized this conjecture is implied by the statement #3 just above it, in the case d=1-. Indeed, given that S=∑kSk, P=∑kPk, and the Pk are regarded as fixed, more generally it implies that

k(Sk - Qk + d - ½)2/Pk    and    ∑k(Sk+d-½)2/Pk

both are globally minimized by the divisor method arising from the rounding function r(x)=⌈x-d⌉ where d is an arbitrary constant with 0≤d≤1.

Balinski & Young in their 1982 book Fair representation, and Sainte-Laguë 1910, both provided proofs of the Owens 1921 theorem about Webster. When I looked them up I immediately saw that their arguments generalize immediately to prove this general-d theorem. The resulting proofs have the same validity (or lack thereof) as their original proofs.

### Geometric interpretations

Geometric interpretation of Webster: The vector V of ideal (albeit not necessarily integer) seat-counts for the N states is an N-vector constrained to lie on the hyperplane with sum=#seats. We wish to replace it with a lattice point, i.e. all-integer N-vector, which also lies on this hyperplane. Webster chooses the hyperplane lattice point nearest to the line that passes through both V and the origin 0.

Geometric interpretation of Hamilton/Vinton: It simply finds the closest hyperplane lattice point to V (actually not only in the usual L2 distance, but also in any Lq distance for any q≥1).

Here's another method (the "Hilbert method" to put a name on it): find the nearest hyperplane lattice point, but "nearest" in the Hilbert projective metric rather than the usual Euclidean metric. This metric's distance function is

HPdist(X, Y) = log maxj,k (Xj Yk) / (Xk Yj).

giving the Hilbert distance between two N-vectors X and Y with all-positive entries. In other words, for all j,k, Hilbert minimizes |log (Pj Sk) / (Pk Sj)|.

New claim: the Hilbert and Huntington-Hill methods are equivalent. (Incidentally, it is better than this, in the sense that it is not only the max that matters; if we remove that max, then the next max also gets minimized by Huntington-Hill's apportionment of the remaining seats, and so on recursively.)

### Truly Unbiased Methods

Based on computer numerical experiments, Webster seems to be, on average over a large number of different randomized situations, the "least biased" of the above divisor methods... and we have a new method which can argue to be even less biased... but all of the methods discussed above, including Webster and the new one, are "biased" in some situations. That is, we can always set up an artificial scenario in which the "small states" always get more than their fair share of seats and the "big states" always get less (or the reverse) with any of these methods.

However, it is possible to devise a truly unbiased method by using randomized rounding.

Randomized rounding refers to the idea of rounding a number like 3.7 upward to 4 with probability 0.7 and downward to 3 with probability 0.3. (More generally you round a number with fractional part x, 0<x<1 upward with probability x.) Note that this preserves the expected value (here 3.7) of the original number, i.e, in that sense is exactly unbiased. We can choose the quantity we wish to be unbiased, round it in (an appropriate version of) this manner for every state, and then if the right total number of seats results, we quit (otherwise we do it again with new randomness).

This idea will truly be unbiased – but unfortunately it is randomized, not deterministic.

Another idea, suggested to me by Raphfrk, is to make corrections over time. That is, we keep track for each state of its historical "bias budget." If that state has historically had too many seats, we round it down next time; if it has historically had too few seats, we round it up next time. There are various ways to do that, but the point is that averaged over time (and assuming the state-population changes are not insanely wild) such a technique guarantees exact unbiasedness.