In everyday English, people say two things are “independent” to mean they “have nothing to do with one another.” The word means the same thing in statistics. But how do statisticians formalize this idea?
Recall that 19% (= Pr(C)) of EMBAs own cats, and 48.5%
(= Pr(D)) of EMBAs own dogs. If cat ownership
and dog ownership are independent of each other, then we would expect 19% of
dog owners to also own cats (or 48.5% of cat owners to also own dogs). In other words, the frequency of one random
event (cat ownership) is unaffected by knowledge of the other (dog
ownership). This implies the percentage
of people who own both dogs and cats (= Pr(C∩D)) should be .19 × .485 (= Pr(C) × Pr(D)) = .092 = 9.2%. In other words,
independence in statistics means a particular mathematical condition must
hold. This notion is formalized in the
following definition.
Definition. We say two events C and D are independent if and only if Pr(C∩D) = Pr(C) × Pr(D).
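To see the condition in code, here is a minimal sketch in Python; the function name are_independent and its tolerance argument are our own choices for illustration, not anything from the text. It simply compares the observed joint probability to the product of the marginals, allowing a small tolerance because probabilities stored as floating-point numbers rarely match exactly.

    import math

    def are_independent(p_c, p_d, p_c_and_d, tol=1e-9):
        """Check the independence condition Pr(C∩D) = Pr(C) × Pr(D).

        p_c, p_d   -- the marginal probabilities Pr(C) and Pr(D)
        p_c_and_d  -- the joint probability Pr(C∩D)
        tol        -- absolute tolerance for the floating-point comparison
        """
        return math.isclose(p_c_and_d, p_c * p_d, abs_tol=tol)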
The definition gives not a formula but a condition that we must check if we want to claim two events are independent. If the condition is true, then the events are
independent. If not, they are
dependent. For instance, are dog ownership and cat ownership independent in our example? Let’s check.
Pr(C∩D) = 7.5%, and Pr(C) × Pr(D) = .19 × .485 = 9.2%. These percentages are not equal, so dog
ownership and cat ownership are not
independent.
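Plugging the example’s numbers into the sketch above reproduces this conclusion:

    # Cat/dog example: Pr(C) = .19, Pr(D) = .485, Pr(C∩D) = .075
    print(are_independent(0.19, 0.485, 0.075))    # False: .075 ≠ .092
    # Had the joint probability equaled the product, the check would pass:
    print(are_independent(0.19, 0.485, 0.09215))  # True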