Ω = sample space
P = probability measure
P(A) >= 0 for every event A

P(Ω) = 1

P(A ∪ B) = P(A) + P(B)  (if A, B disjoint)

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)  (if A, B not disjoint)

P(A) = 1 − P(Aᶜ)

P(∅) = 0

Properties of unions/intersections
A ∪ B = B ∪ A  (same for ∩)

A ∪ (B ∪ C) = (A ∪ B) ∪ C = A ∪ B ∪ C  (same for ∩)

A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)

A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)

Properties of complements
(A ∩ B)ᶜ = Aᶜ ∪ Bᶜ

(A ∪ B)ᶜ = Aᶜ ∩ Bᶜ

A = (A ∩ Bᶜ) ∪ (A ∩ B)

Events are disjoint when A ∩ B = Ø
  No outcomes are shared
  At most one of the events can occur
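The set identities above can be checked directly with Python sets. This is a sketch of mine, not from the notes; the sample space and the particular sets A and B are arbitrary examples.

```python
# Verify De Morgan's laws and the complement identities on a
# tiny sample space Ω = {0, ..., 9}; A and B are arbitrary.
omega = set(range(10))            # sample space Ω
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

def comp(S):                      # complement relative to Ω
    return omega - S

# De Morgan's laws
assert comp(A & B) == comp(A) | comp(B)
assert comp(A | B) == comp(A) & comp(B)

# A splits into the part outside B and the part inside B
assert A == (A & comp(B)) | (A & B)

# Distributive law: A ∩ (B ∪ Bᶜ) = (A ∩ B) ∪ (A ∩ Bᶜ)
assert A & (B | comp(B)) == (A & B) | (A & comp(B))
print("all set identities hold")
```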

Conditional Probability
P(A | B) = P(A ∩ B) / P(B)

P(A | A) = 1

P(Ω | B) = 1

P(A ∪ B | C) = P(A | C) + P(B | C)  (if A, B disjoint)

P(A | B) = 1 − P(Aᶜ | B)

Bayes' rule: P(A | B) = P(B | A) ∗ P(A) / P(B)

Law of total probability: P(A) = P(A | Bᶜ) ∗ P(Bᶜ) + P(A | B) ∗ P(B)
  (given 2 events A, B with P(B) > 0)

A = (A ∩ B1) ∪ (A ∩ B2) ∪ ... ∪ (A ∩ Bn)  (if B1, B2, ..., Bn form a partition)
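Bayes' rule and the law of total probability in action — a sketch with made-up numbers (a hypothetical diagnostic test; all the rates below are illustrative, not from the notes):

```python
# Hypothetical test: D = "has disease", + = "tests positive".
p_D = 0.01                 # P(D): prior probability of disease
p_pos_given_D = 0.95       # P(+ | D): sensitivity
p_pos_given_Dc = 0.05      # P(+ | Dᶜ): false-positive rate

# Law of total probability: P(+) = P(+|D)P(D) + P(+|Dᶜ)P(Dᶜ)
p_pos = p_pos_given_D * p_D + p_pos_given_Dc * (1 - p_D)

# Bayes' rule: P(D | +) = P(+|D)P(D) / P(+)
p_D_given_pos = p_pos_given_D * p_D / p_pos
print(f"P(+) = {p_pos:.4f}, P(D | +) = {p_D_given_pos:.4f}")
```

Even with a sensitive test, P(D | +) stays small here because the prior P(D) is small — the point of conditioning on the evidence rather than guessing.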

Independence
"Knowing that A has occurred doesn't make B any more or less likely to occur"

P(A ∩ B) = P(A)P(B) implies independence

P(A | B) = P(A)  (if A, B independent)

Ø is independent of itself

If A, B are disjoint, then A, B are not independent  (if 0 < P(A), P(B) < 1)

Pairwise independent if events are independent in groups of 2
  If A, B, C are P.I. then...
    P(A ∩ B) = P(A)P(B)
    P(B ∩ C) = P(B)P(C)
    P(A ∩ C) = P(A)P(C)

Jointly independent if any number of events are independent
  Jointly implies pairwise, not vice versa
  If A, B, C are J.I. then...
    P(A ∩ B ∩ C) = P(A)P(B)P(C)
    as well as all P.I. formulas
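The classic example of pairwise-but-not-joint independence is two fair coin flips with A = "first is heads", B = "second is heads", C = "the flips agree". This sketch (standard example, not from the notes) verifies it by enumeration:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=2))      # Ω for two fair flips

def prob(event):                              # uniform measure on Ω
    return Fraction(sum(1 for w in outcomes if event(w)), len(outcomes))

A = lambda w: w[0] == "H"                     # first flip heads
B = lambda w: w[1] == "H"                     # second flip heads
C = lambda w: w[0] == w[1]                    # the two flips agree

for E, F in [(A, B), (B, C), (A, C)]:         # pairwise: P(E ∩ F) = P(E)P(F)
    assert prob(lambda w: E(w) and F(w)) == prob(E) * prob(F)

# ...but not jointly: P(A ∩ B ∩ C) = 1/4 ≠ 1/8 = P(A)P(B)P(C)
assert prob(lambda w: A(w) and B(w) and C(w)) != prob(A) * prob(B) * prob(C)
print("pairwise independent, not jointly independent")
```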

Counting
Equally likely outcomes (P is same for all, finite sample space Ω)
Then, P(E) = (# outcomes in E) / (# outcomes total in Ω)

Types of Counting
Ordered
  Order matters
  Ex: SSN, phone #
Unordered
  Order doesn't matter
  Ex: Poker hands
With replacement
  Ex: Rolling a 2 in the first dice roll doesn't prevent another 2 from being rolled
Without replacement
  Ex: Drawing the ace of spades prevents it from being drawn again

Counting problems
Ordered with replacement
  "How many ways can we assign k states to N objects?" —> k^N
  Ex: How many distinct SSNs are there? —> 10^9
  Ex: How many sequences of N coin flips are possible? —> 2^N

Ordered without replacement
  "How many ways can we order k distinct objects from a pool of N?"
  Ex: How many ways can k students out of N line up to exit the room? —> N!/(N − k)!, or k!(NCk)

Unordered without replacement
  "How many ways can k unordered objects be drawn from a pool of N?"
  Ex: How many 5 card poker hands are possible (52 card deck)? —> (52C5) = 52!/(5!47!)
  Note: (Ordered w/o replacement) = (Unordered w/o replacement) ∗ (# of ways of ordering k objects)

Unordered with replacement
  "How many ways can k unordered objects be drawn from a pool of N, allowing repeats?" —> ((N + k − 1)Ck)  ("stars and bars")
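The counting formulas map directly onto the standard library. A quick sketch (the N, k values are arbitrary; the SSN and poker numbers match the examples above):

```python
import math

N, k = 10, 3   # arbitrary pool size and draw size for illustration

# Ordered with replacement: 10 digit-states for each of 9 SSN slots
assert 10 ** 9 == 1_000_000_000

# Ordered without replacement: N!/(N-k)!
assert math.perm(N, k) == math.factorial(N) // math.factorial(N - k)

# Unordered without replacement: N choose k
assert math.comb(52, 5) == 2_598_960     # 5-card poker hands

# Note from above: ordered w/o replacement = (N choose k) * k!
assert math.perm(N, k) == math.comb(N, k) * math.factorial(k)
print("counting identities check out")
```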

__________TEST 2 ___________
PMF
P(X = k)

Valid if all values add up to 1 and P(X = k) >= 0 for all k

Joint PMF
P(X = x, Y = y)  (comma means ∩)

Marginal PMF
P(X = x) = Σ_y P(X = x, Y = y)  (sum the joint PMF over all y; same idea for Y)

There can be (X, Y) and (X^, Y^) such that the joint PMFs are different but the marginal PMFs are the same

Conditional PMF
Given event A:

P(X = x | A) = P({X = x} ∩ A) / P(A)




Given random variables X, Y:

P_X|Y(x | y) = P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y)
  (Joint PMF / Marginal PMF)

P(X = x, Y = y) = P(X = x | Y = y) ∗ P(Y = y)

P(X = x, Y = y) = P(Y = y | X = x) ∗ P(X = x)

P(X = x | Y = y) ∗ P(Y = y) = P(Y = y | X = x) ∗ P(X = x)

Bayes' rule for PMFs:
P(X = x | Y = y) = P(Y = y | X = x) ∗ P(X = x) / P(Y = y)
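A small worked joint PMF makes the marginal/conditional machinery concrete. The table below is made up for illustration:

```python
from fractions import Fraction

joint = {                      # P(X = x, Y = y), values illustrative
    (0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4),
    (1, 0): Fraction(1, 8), (1, 1): Fraction(3, 8),
}
assert sum(joint.values()) == 1          # valid PMF

def p_X(x):                              # marginal: sum joint over y
    return sum(p for (xx, _), p in joint.items() if xx == x)

def p_Y(y):                              # marginal: sum joint over x
    return sum(p for (_, yy), p in joint.items() if yy == y)

def p_X_given_Y(x, y):                   # conditional = joint / marginal
    return joint[(x, y)] / p_Y(y)

# Bayes' rule for PMFs: P(X=x | Y=y) = P(Y=y | X=x) P(X=x) / P(Y=y)
x, y = 1, 0
lhs = p_X_given_Y(x, y)
rhs = (joint[(x, y)] / p_X(x)) * p_X(x) / p_Y(y)
assert lhs == rhs
print("P(X=1 | Y=0) =", lhs)
```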

Expected value

"k times probability of X = k for all possible values of k"


E[f(X)] != f(E[X])

Given constant a, b
E[a] = a
○
E[aX] = aE[X]
○
E[aX + b] = aE[X] + b
○

Joint Expectation

E[X + Y] = E[X] + E[Y]
  Left side requires the joint PMF, right side only requires the marginal PMFs

"Indicator" random variables
  X_i = 1 if *criteria 1*
  X_i = 0 if *criteria 2*
  E[X_i] = P(X_i = 1)
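Indicators plus linearity of expectation in one sketch — the expected number of heads in 3 fair flips, computed two ways (example of mine, not from the notes):

```python
import itertools

outcomes = list(itertools.product([0, 1], repeat=3))  # 1 = heads
p = 1 / len(outcomes)                                 # uniform: (1/2)^3

# Directly: E[X] = Σ_k k P(X = k), with X = total number of heads
E_direct = sum(sum(w) * p for w in outcomes)

# Via indicators: X = I1 + I2 + I3 with E[Ii] = P(flip i heads) = 1/2,
# so E[X] = E[I1] + E[I2] + E[I3] by linearity (no joint PMF needed)
E_indicators = sum(0.5 for _ in range(3))

assert abs(E_direct - E_indicators) < 1e-12           # both equal 1.5
print("E[# heads] =", E_direct)
```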

Conditional expectation
Given event A:
  E[X | A] = Σ_x x ∗ P(X = x | A)

Given random variables X, Y:
  E[X | Y = y] = Σ_x x ∗ P(X = x | Y = y)

Law of total expectation:
  E[X] = Σ_y E[X | Y = y] ∗ P(Y = y)
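The law of total expectation can be checked exactly on a tiny joint PMF (the table is made up for illustration):

```python
from fractions import Fraction

joint = {                       # P(X = x, Y = y), values illustrative
    (1, 0): Fraction(1, 6), (2, 0): Fraction(1, 3),
    (1, 1): Fraction(1, 4), (2, 1): Fraction(1, 4),
}

def p_Y(y):                     # marginal PMF of Y
    return sum(p for (_, yy), p in joint.items() if yy == y)

def E_X_given_Y(y):             # E[X | Y=y] = Σ_x x P(X=x | Y=y)
    return sum(x * p / p_Y(y) for (x, yy), p in joint.items() if yy == y)

E_X = sum(x * p for (x, _), p in joint.items())         # direct E[X]
E_total = sum(E_X_given_Y(y) * p_Y(y) for y in (0, 1))  # tower rule
assert E_X == E_total
print("E[X] =", E_X)
```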


Variance
(units)²

Var(X) = E[(X − E[X])²]

Var(X) = E[X²] − (E[X])²
  E[X] is a constant
  "Expected/mean squared error"

Var(X) >= 0
  Var(X) = 0 when X is degenerate (all probability is on one point —> ex. P(X = 0) = 1)

sd(X) = sqrt(Var(X))

Var(−X) = Var(X)

Given constants a, b:
  Var(a) = 0
  Var(aX + b) = a² Var(X)

Given Y = aX + b for constants a, b:
  Var(Y) = a² Var(X)
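A quick numeric check of Var(aX + b) = a² Var(X) on a fair die (sketch of mine; `statistics.pvariance` computes the population variance over equally likely outcomes):

```python
import statistics

X = [1, 2, 3, 4, 5, 6]                 # fair die outcomes
a, b = 3, 7                            # arbitrary constants
Y = [a * x + b for x in X]             # Y = aX + b

var_X = statistics.pvariance(X)        # population variance = 35/12
var_Y = statistics.pvariance(Y)
assert abs(var_Y - a**2 * var_X) < 1e-9   # shift b drops out, scale a squares
print("Var(X) =", var_X, " Var(3X + 7) =", var_Y)
```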

Covariance
(units)²

Cov(X, Y) = E[(X − E[X])(Y − E[Y])]

Cov(X, Y) = E[XY] − E[X]E[Y]
  Note: E[X], E[Y] are constants
  Cov(X, Y) can be + or −

Cov(X, X) = Var(X)

Cov(X, Y) = Cov(Y, X)

Given constants a, b, c, d:
  Cov(aX + b, cY + d) = (ac)Cov(X, Y)
  Cov(X + Y, Z) = Cov(X, Z) + Cov(Y, Z)

Given Y = aX + b for constants a, b:
  Cov(X, Y) = a ∗ Var(X)
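The bilinearity rule Cov(aX + b, cY + d) = ac·Cov(X, Y) can be verified exactly on a small finite distribution (the points and constants below are arbitrary examples):

```python
pts = [(0, 0), (0, 1), (1, 1), (2, 3)]      # equally likely (X, Y) pairs

def cov(xs, ys):                            # Cov = E[XY] − E[X]E[Y]
    n = len(xs)
    Ex, Ey = sum(xs) / n, sum(ys) / n
    Exy = sum(x * y for x, y in zip(xs, ys)) / n
    return Exy - Ex * Ey

xs = [x for x, _ in pts]
ys = [y for _, y in pts]
a, b, c, d = 2, 5, -3, 1                    # arbitrary constants

lhs = cov([a * x + b for x in xs], [c * y + d for y in ys])
rhs = a * c * cov(xs, ys)                   # shifts b, d drop out
assert abs(lhs - rhs) < 1e-12
print("Cov(X, Y) =", cov(xs, ys))
```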

Correlation
ρ(X, Y) = Cov(X, Y) / (sd(X) ∗ sd(Y))
  −1 <= ρ(X, Y) <= 1
  ρ(X, Y) > 0 —> Positive correlation
  ρ(X, Y) < 0 —> Negative correlation

Given Y = aX + b for constants a, b:
  ρ(X, Y) = 1 if a > 0
  ρ(X, Y) = −1 if a < 0

Independence
Random variables X and Y are independent if:
  P(X = x, Y = y) = P(X = x) ∗ P(Y = y) for all possible x, y

Given X, Y are independent:
  P(X = x | Y = y) = P(X = x)
  P(X ∈ A, Y ∈ B) = P(X ∈ A) ∗ P(Y ∈ B)
    Ex: P(X < 3, Y > 2) = P(X < 3) ∗ P(Y > 2)
    Doesn't work when X, Y are mixed —> P(X + Y > 4) or P(X > Y)
  E[f(X) ∗ g(Y)] = E[f(X)] ∗ E[g(Y)]
  E[XY] = E[X] ∗ E[Y]
  Var(X + Y) = Var(X) + Var(Y)
  Cov(X, Y) = 0
    Thus, X, Y are "uncorrelated"
For many RVs X1, X2, ..., Xn:
  Independent if P(X1 = x1, ..., Xn = xn) = P(X1 = x1) ∗ ⋯ ∗ P(Xn = xn)
    for all possible x1, x2, ..., xn
  Then:
    E[X1 ∗ X2 ∗ ⋯ ∗ Xn] = E[X1] ∗ E[X2] ∗ ⋯ ∗ E[Xn]
    Var(X1 + X2 + ⋯ + Xn) = Var(X1) + Var(X2) + ⋯ + Var(Xn)
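For two independent fair dice, both product and sum rules can be checked exactly by enumeration (sketch of mine):

```python
from fractions import Fraction
from itertools import product

die = range(1, 7)
p = Fraction(1, 36)                       # independence ⇒ uniform on pairs

E_X = Fraction(sum(die), 6)               # E[X] = 7/2
E_XY = sum(x * y * p for x, y in product(die, die))
assert E_XY == E_X * E_X                  # E[XY] = E[X]E[Y]

E_X2 = Fraction(sum(x * x for x in die), 6)
var_X = E_X2 - E_X**2                     # Var(X) = 35/12
E_S = sum((x + y) * p for x, y in product(die, die))
E_S2 = sum((x + y) ** 2 * p for x, y in product(die, die))
assert E_S2 - E_S**2 == 2 * var_X         # Var(X + Y) = Var(X) + Var(Y)
print("Var(X) =", var_X)
```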
Bernoulli: X ~ Ber(p)
X takes value 0 or 1
  X = 1 —> Success
  X = 0 —> Failure

p = probability of success, 0 <= p <= 1

P(X = 1) = p
P(X = 0) = 1 − p

E[X] = p

Var(X) = p(1 − p)

Geometric: X ~ Geom(p)
"The number of independent trials until 1 success (inclusive)"

p = probability of success, 0 < p < 1

P(X = k) = p(1 − p)^(k−1), k = 1, 2, 3, ...

E[X] = 1/p

Var[X] = (1 − p)/p²

Binomial: X ~ Bin(n, p)
"Number of successes out of n independent trials each with success probability p"

p = probability of success, 0 <= p <= 1

n = number of trials, n = 1, 2, 3, ...

P(X = k) = (nCk) ∗ p^k (1 − p)^(n−k), k = 0, 1, 2, ..., n

E[X] = np

Var(X) = np(1 − p)

Given independent:
  X ~ Bin(n, p)
  Y ~ Bin(m, p)

Then for Z = X + Y:
  Z ~ Bin(n + m, p)
Binomial = sum of independent Bernoullis
  X1, X2, ..., Xn are independent, Xi ~ Ber(p)
    Xi = 1 if the i-th trial succeeds
    Xi = 0 otherwise
  Then:
    X = X1 + X2 + ⋯ + Xn counts the number of successes
    E[X] = E[X1] + ⋯ + E[Xn] = np gives the mean
    Var(X) = Var(X1) + ⋯ + Var(Xn) = np(1 − p) thanks to independence
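The "sum of Bernoullis" view can be checked by brute force: enumerate every outcome vector of n independent Ber(p) trials and compare with the closed-form PMF (the n and p values are arbitrary examples):

```python
from fractions import Fraction
from math import comb
from itertools import product

n, p = 4, Fraction(1, 3)

def binom_pmf(k):                       # (nCk) p^k (1-p)^(n-k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Enumerate all Bernoulli outcome vectors and sum their probabilities
pmf_from_sum = {k: Fraction(0) for k in range(n + 1)}
for bits in product([0, 1], repeat=n):
    prob = Fraction(1)
    for b in bits:                      # independence ⇒ multiply
        prob *= p if b == 1 else (1 - p)
    pmf_from_sum[sum(bits)] += prob     # X = X1 + ... + Xn

assert all(pmf_from_sum[k] == binom_pmf(k) for k in range(n + 1))
print("E[X] =", sum(k * binom_pmf(k) for k in range(n + 1)))  # np = 4/3
```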

Poisson: X ~ Poisson(λ)
"Number of arrivals at rate λ within a unit of time"

λ = average rate per unit time, λ > 0

P(X = k) = e^(−λ) λ^k / k!, k = 0, 1, 2, ...

Limiting case of binomial with many trials (n very large), each of which is very unlikely/rare
  Xn ~ Bin(n, λ/n) ≈ Poisson(λ) for each integer n > λ (approximately; improves as n grows)
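The Bin(n, λ/n) → Poisson(λ) approximation can be seen numerically (the rate λ and count k below are arbitrary examples):

```python
from math import comb, exp, factorial

lam, k = 2.0, 3                          # arbitrary rate and count

def poisson_pmf(k):                      # e^(−λ) λ^k / k!
    return exp(-lam) * lam**k / factorial(k)

def binom_pmf(n, k):                     # Bin(n, λ/n) pmf at k
    q = lam / n
    return comb(n, k) * q**k * (1 - q)**(n - k)

err_small = abs(binom_pmf(10, k) - poisson_pmf(k))
err_large = abs(binom_pmf(10_000, k) - poisson_pmf(k))
assert err_large < err_small             # approximation improves with n
print(f"Poisson {poisson_pmf(k):.5f}  Bin(10)  {binom_pmf(10, k):.5f}  "
      f"Bin(10000) {binom_pmf(10_000, k):.5f}")
```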

E[X] = λ
This justifies calling λ the "expected arrival rate"
○

Var[X] = λ

Given independent X ~ Poisson(λ), Y ~ Poisson(μ):
  Z = X + Y ~ Poisson(λ + μ)
  P(Z = n) = e^(−(λ+μ)) (λ + μ)^n / n!

Cheat Sheet
Tuesday, October 4, 2022
6:50 PM