CLRS Appendix C: Comparing sums of Bernoulli trials
Here is an interesting exercise on probability distributions.
Exercise C.4-9
Let \(X\) be the random variable for the total number of successes in a set \(A\) of \(n\) Bernoulli trials, where the \(i\)th trial has a probability \(p_i\) of success, and let \(X'\) be the random variable for the total number of successes in a second set \(A'\) of \(n\) Bernoulli trials, where the \(i\)th trial has a probability \(p_i' \geq p_i\) of success. Prove that for \(0\leq k\leq n\),
\(P(X'\geq k) \geq P(X\geq k)\).
(Hint: Show how to obtain the Bernoulli trials in \(A'\) by an experiment involving the trials of \(A\), and use the result of Exercise C.3-7.)
I found this exercise interesting as the result is intuitive, but it’s tricky to find a way to “compare” the distributions, as per the hint.
This is one clever solution I found online, which seems obvious in hindsight:
Consider comparing each Bernoulli trial to picking a random value \(S\) on the unit interval. If \(S \leq p_i\), we can call that a success in set \(A\). If \(S \leq p_i'\), we can call that a success in set \(A'\). If \(S \leq p_i\), then \(S \leq p_i'\). Then clearly, whenever there is a success for set \(A\), there must be a success for set \(A'\). So \(X' \geq X\), and \(P(X' \geq k) \geq P(X \geq k)\).