Basic Probability

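Probability represents how likely an event is to occur. The probability of an outcome can be estimated as the number of trials in which that outcome occurs divided by the total number of trials: \[ P = \frac{\#\ \text{Outcomes}}{\#\ \text{Trials}}\]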

Imagine you didn’t know how many times the number 6 appeared on a standard die. How could you figure it out?
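One way to figure it out is to roll the die many times and apply the formula above. Here is a minimal sketch in R, assuming a fair die simulated with `sample()`. The seed and the number of rolls are arbitrary choices, not part of the original example.

```r
# Simulate rolling a fair six-sided die many times.
# Assumption: the die is fair, so each face is equally likely.
set.seed(42)                                   # makes the simulated rolls reproducible
n_rolls <- 10000
rolls <- sample(1:6, n_rolls, replace = TRUE)  # replace = TRUE lets faces repeat

# Estimated probability = number of rolls that came up 6 / number of trials
p_six_hat <- sum(rolls == 6) / n_rolls
p_six_hat   # should land close to 1/6 (about 0.167)
```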


To measure a probability exactly, we would have to observe an infinite number of trials, which is obviously impossible.

Instead, we can only observe a finite sample of trials.
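To see why the size of the sample matters, we can compute the estimate of P(6) at several sample sizes. This sketch again assumes a fair die simulated with `sample()`; the seed and the sample sizes are arbitrary. Small samples can miss the true value of 1/6 by a wide margin, while large samples tend to land close to it.

```r
# How the estimate of P(6) changes with the number of trials.
# Assumption: a fair die, simulated with sample().
set.seed(1)
trial_sizes <- c(10, 100, 1000, 100000)

estimates <- sapply(trial_sizes, function(n) {
  rolls <- sample(1:6, n, replace = TRUE)
  mean(rolls == 6)  # proportion of rolls that came up 6
})

# Compare each estimate with the true value, 1/6
data.frame(trials = trial_sizes,
           estimate = estimates,
           error = abs(estimates - 1/6))
```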

We try to make this sample as representative as we can, but sometimes logistics and chance conspire to make a sample of trials unrepresentative.

If the sample of trials is unrepresentative, then our estimate of probability may be badly wrong.
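A small simulation can show how an unrepresentative sample skews an estimate. The population below is entirely hypothetical: 1000 body masses drawn from a normal distribution, plus an assumed sampling scheme in which heavier individuals are more likely to be selected (implemented with the `prob` argument of `sample()`). The weights `mass^4` are an arbitrary choice made only to create a visible bias.

```r
# Hypothetical population of body masses (kg); not real data.
set.seed(7)
mass <- rnorm(1000, mean = 10, sd = 2)

# Representative sample: every individual is equally likely to be chosen
fair_sample <- sample(mass, 50)

# Unrepresentative sample: selection probability rises with body mass.
# sample() rescales prob internally, so the weights need not sum to 1.
biased_sample <- sample(mass, 50, prob = mass^4)

# The biased sample overestimates the population mean
c(population = mean(mass),
  fair = mean(fair_sample),
  biased = mean(biased_sample))
```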

Imagine you are interested in studying the average body mass of primate species. You study museum specimens curated at the Smithsonian to address your research question. How might the sample be unrepresentative, and how might that skew your understanding of each species' average body mass?

Sampling in R

R has built-in functions that make it easy to sample from vectors.

We can use the sample() function to draw a sample of 10 individuals from a population of 1,000.

Notice that if we do this twice, the two samples are almost never identical.

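# A population of 1000 individuals, labelled 1 to 1000
individuals <- 1:1000
# Draw 10 individuals at random (without replacement, the default)
sample(individuals, 10)
##  [1] 844 565 447 401 148 407 973 169  65 734
# A second draw gives a different set of 10
sample(individuals, 10)
##  [1] 196 253 872 602 425 684 153 118 886 342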
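If you need the same "random" sample every time (for example, so that someone else can reproduce your result), R's `set.seed()` fixes the starting state of the random number generator. A short sketch; the seed value 123 is an arbitrary choice.

```r
individuals <- 1:1000

set.seed(123)
first <- sample(individuals, 10)

set.seed(123)              # reset the generator to the same starting state
second <- sample(individuals, 10)

identical(first, second)   # TRUE: same seed, same sample
```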

Quantifying probability

Outcome = the result of a single trial

Sample space = the universe of possible outcomes.

The outcomes should be exhaustive (together they cover the whole sample space) and mutually exclusive (no two of them can happen at once).

When these conditions are met, then the sum of probabilities for all outcomes in the sample space is 1.

For example, when quantifying risk probabilities, organisms are coded as either 1) alive or 2) dead. These outcomes meet both requirements above, so every organism is in exactly one of these states and P(alive) + P(dead) = 1.
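A quick check of this rule in R. The die probabilities follow from assuming a fair die; the survival probability `p_alive` is a hypothetical value chosen only for illustration.

```r
# Sample space for one roll of a fair die: six mutually exclusive,
# exhaustive outcomes, each with probability 1/6
die_probs <- rep(1/6, 6)
sum(die_probs)       # 1

# Two-outcome example: if an organism is alive with probability p_alive,
# it must be dead with probability 1 - p_alive (p_alive is hypothetical)
p_alive <- 0.8
p_dead  <- 1 - p_alive
p_alive + p_dead     # 1
```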

Complex events = composites of simple events (Logical OR)

sum the probabilities of simple events (this requires the simple events to be mutually exclusive)

e.g. probability of drawing a King from a deck of cards is the sum of the probability of drawing each of the 4 distinct kings

\[1/52 + 1/52 + 1/52 + 1/52 = 4/52 = 1/13 \]
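We can reproduce this calculation by building a 52-card deck in R, either by summing the four mutually exclusive probabilities or by counting the kings directly. The deck representation (a data frame built with `expand.grid()`) is just one convenient choice.

```r
# Build a standard 52-card deck as a data frame (one row per card)
ranks <- c("A", 2:10, "J", "Q", "K")
suits <- c("Clubs", "Diamonds", "Hearts", "Spades")
deck  <- expand.grid(rank = ranks, suit = suits, stringsAsFactors = FALSE)
nrow(deck)                # 52

# P(King) as a sum over the four mutually exclusive kings
p_each_king <- rep(1/52, 4)
sum(p_each_king)          # 4/52 = 1/13

# Same answer by counting outcomes: proportion of cards that are kings
mean(deck$rank == "K")    # 0.07692308
```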

Shared events = multiple simultaneous occurrences of simple events (Logical AND)

multiply the probabilities of simple events

this requires the events to be independent of one another

e.g. probability of drawing the king of hearts is the probability of drawing a king multiplied by the probability of drawing a heart

\[ 1/13 \times 1/4 = 1/52 \]
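Because rank and suit are independent in a standard deck, the multiplication rule gives the same answer as counting the single card that is both a king and a heart. A sketch using the same data-frame representation of the deck as an illustrative choice:

```r
# Build the 52-card deck (one row per card)
ranks <- c("A", 2:10, "J", "Q", "K")
suits <- c("Clubs", "Diamonds", "Hearts", "Spades")
deck  <- expand.grid(rank = ranks, suit = suits, stringsAsFactors = FALSE)

p_king  <- mean(deck$rank == "K")        # 1/13
p_heart <- mean(deck$suit == "Hearts")   # 1/4

# Multiplication rule (valid because rank and suit are independent)
p_king * p_heart                                  # 1/52

# Direct count: only one card is both a king and a heart
mean(deck$rank == "K" & deck$suit == "Hearts")    # 1/52
```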

Shared events: challenge

If you drew two cards from a standard deck, what is the probability of drawing the 2 of Clubs and the Jack of Spades together?
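One way to check your answer, assuming the two cards are drawn without replacement and their order does not matter: every unordered pair of distinct cards is equally likely, so we only need to count the possible two-card hands. Note that the two draws are not independent here (the second card cannot repeat the first), so the sequential version multiplies by 1/51 rather than 1/52.

```r
# Number of distinct two-card hands from a 52-card deck
n_hands <- choose(52, 2)   # 1326
1 / n_hands                # probability of one specific hand

# Sequential reasoning: the first card must be one of the two target
# cards (2/52), and the second must then be the other one (1/51)
(2 / 52) * (1 / 51)
```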

Does that feel intuitively correct to you? This outcome is only improbable when viewed as one specific outcome. Every possible two-card hand is just as improbable, yet the probabilities of all the outcomes sum to 1, so some highly improbable outcome is guaranteed to happen every time you draw!

Conditional Probability

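Sometimes the probability of an outcome depends on previous outcomes. To calculate the probability of A given B, we divide the probability that A and B both occur by the probability of B:

\[P(A|B) = \frac{P(A \cap B)}{P(B)}\]

If A and B are independent, then \(P(A \cap B) = P(A) \times P(B)\), and the formula reduces to \(P(A|B) = P(A)\): knowing B tells us nothing about A.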
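As a worked example, consider the probability that a card is a king given that it is a face card (J, Q or K). Using the definition P(A|B) = P(A and B) / P(B), this is (4/52) / (12/52) = 1/3, much higher than the unconditional 1/13. A sketch that computes both by counting, using the same illustrative deck representation:

```r
ranks <- c("A", 2:10, "J", "Q", "K")
suits <- c("Clubs", "Diamonds", "Hearts", "Spades")
deck  <- expand.grid(rank = ranks, suit = suits, stringsAsFactors = FALSE)

is_king <- deck$rank == "K"
is_face <- deck$rank %in% c("J", "Q", "K")

p_face          <- mean(is_face)             # 12/52
p_king_and_face <- mean(is_king & is_face)   # 4/52 (every king is a face card)

# P(King | face card) = P(King and face card) / P(face card)
p_king_and_face / p_face                     # 1/3

# Unconditional P(King), for comparison
mean(is_king)                                # 1/13
```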

Conditional probability is fundamental

It forms the basis of both frequentist and Bayesian statistics; we will talk much more about Bayesian statistics next week.