Probability & Statistics

Probability

Set Theory

A set is a collection of elements
- Elements are members of a set
$s \in S$ means "the element $s$ is a member of the set $S$"
The empty set $\emptyset$ contains no elements
- It is empty
$S = \{1, 3, 5, 7, 9\}$
- $S$ is a set consisting of those integers
$S = \{n : n \text{ is a prime number and } n \leq 12\}$
- $S = \{2, 3, 5, 7, 11\}$
$S = \{x : x^{2} = 4 \text{ and } x \text{ is odd}\}$
- $S = \emptyset$
$A \subset S$
- $A$ is a subset of $S$
- $a \in A$ implies $a \in S$
$\emptyset \subset S$ for all sets $S$
$A = B$ if and only if $A \subset B$ and $B \subset A$
$A \cup B$ is the union of $A$ and $B$
- Set of elements belonging to $A$ or $B$
$A \cap B$ is the intersection of $A$ and $B$
- Set of elements belonging to $A$ and $B$
Disjoint sets have no common elements
- $A \cap B = \emptyset$
$A ∖ B$ is the difference of $A$ and $B$
- Set of elements belonging to $A$ but not $B$
$A^{c}$ is the complement of $A$
- Set of elements not belonging to $A$
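
The set operations above map directly onto Python's built-in `set` type. The following is a minimal sketch; the particular sets `S`, `A` and `B` are chosen here purely for illustration and do not come from the notes.

```python
# Illustrative sets (assumed for this sketch, not taken from the notes).
S = set(range(1, 11))          # universal set: integers 1..10
A = {1, 3, 5, 7, 9}
B = {2, 3, 5, 7}

union = A | B                  # elements in A or B
intersection = A & B           # elements in A and B
difference = A - B             # elements in A but not B
complement = S - A             # elements of S not in A

print(union)
print(intersection)
print(difference)
print(complement)
print(A <= S)                  # subset test: A is a subset of S
print(set() <= A)              # the empty set is a subset of every set
```

The complement is only meaningful relative to a universal set, which is why it is computed as `S - A`.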

Random Processes & Probability

The probability of event $A$ occurring is denoted $P (A)$ . This is the relative frequency of event $A \subset S$ occurring in a random process within sample space $S$.

$S$
- Certain or sure event, guaranteed 100% to happen
$\emptyset$
- Impossible event, won't happen
$a \in S$
- Elementary event, a single possible outcome of the process
$A \cup B$
- Event that occurs if $A$ or $B$ occurs
$A \cap B$
- Event that occurs if $A$ and $B$ occur
$A^{c} = S ∖ A$
- Event that occurs if $A$ does not occur
$A \cap B = \emptyset$
- Events $A$ and $B$ are mutually exclusive

Example

Toss a coin 3 times and observe the sequence of heads and tails.

Sample space $S = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}$
Event that $\geq 2$ heads occur in succession $A = \{HHH, HHT, THH\}$
Event that 3 heads or 3 tails occur $B = \{HHH, TTT\}$
$A \cup B = \{HHH, HHT, THH, TTT\}$
$A \cap B = \{HHH\}$
$A^{c} = \{HTH, HTT, THT, TTH, TTT\}$
$A^{c} \cap B = \{TTT\}$
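
The three-toss example can be checked by enumerating the sample space. This is a small sketch using only the standard library; the variable names are chosen for illustration.

```python
from itertools import product

# Sample space: every sequence of 3 tosses.
S = {"".join(tosses) for tosses in product("HT", repeat=3)}

# A: at least two heads in succession. B: three heads or three tails.
A = {s for s in S if "HH" in s}
B = {"HHH", "TTT"}
A_c = S - A                    # complement of A within S

print(sorted(A | B))
print(sorted(A & B))
print(sorted(A_c))
print(sorted(A_c & B))
```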

Another Example

Sample space $S = \{17, 18, 19, 20, 21, 22\}$ . Each number is an individual event.

Events	Frequency	Relative Frequency
17	3	3/35
18	4	4/35
19	9	9/35
20	11	11/35
21	6	6/35
22	2	2/35

Axioms & Laws of Probability

$0 \leq P (A) \leq 1$ for all $A \subset S$
- Probabilities are always between 0 and 1 inclusive
$P (S) = 1$
- Probability of the certain event is 1
If $A \cap B = \emptyset$ then $P (A \cup B) = P (A) + P (B)$
- If two events are disjoint, then the probability of either occurring is equal to the sum of their two probabilities
$P (\emptyset) = 0$
- The probability of the impossible event is zero
$P (A^{c}) = 1 - P (A)$
- The probability that A does not occur is one minus the probability that A occurs
If $A \subset B$ , then $P (A) \leq P (B)$
- The probability of A will always be less than or equal to the probability of B when A is a subset of B
$P (A ∖ B) = P (A) - P (A \cap B)$
- The probability of A but not B is equal to the probability of A minus the probability of A and B
$P (A \cup B) = P (A) + P (B) - P (A \cap B)$
- Probability of A or B is equal to probability of A plus the probability of B minus the probability of A and B
- This is important

Example

In a batch of 50 ball bearings:

15 have surface damage ( $A$ )
- $P (A) = 0.3$
12 have dents ( $B$ )
- $P (B) = 0.24$
6 have both defects ( $A \cap B$ )
- $P (A \cap B) = 0.12$

The probability a single ball bearing has surface damage or dents: $P (A \cup B) = P (A) + P (B) - P (A \cap B) = 0.3 + 0.24 - 0.12 = 0.42$

The probability a single ball bearing has surface damage but no dents: $P (A \cap B^{c}) = P (A ∖ B) = P (A) - P (A \cap B) = 0.3 - 0.12 = 0.18$
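
The ball bearing arithmetic can be reproduced with exact fractions. This is a sketch only; the variable names are illustrative.

```python
from fractions import Fraction

batch = 50
p_a = Fraction(15, batch)      # surface damage
p_b = Fraction(12, batch)      # dents
p_ab = Fraction(6, batch)      # both defects

# Addition law: P(A or B) = P(A) + P(B) - P(A and B)
p_a_or_b = p_a + p_b - p_ab

# Difference law: P(A but not B) = P(A) - P(A and B)
p_a_not_b = p_a - p_ab

print(float(p_a_or_b))
print(float(p_a_not_b))
```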

Conditional Probability & Bayes' Theorem

A conditional probability $P (A ∣ B)$ is the probability of event $A$ occurring, given that the event $B$ has occurred.

$P (A ∣ B) = \frac{P ( A \cap B )}{P ( B )}$

Bayes' theorem:

$P (A ∣ B) = \frac{P ( B ∣ A ) P ( A )}{P ( B )}$

Laws of conditional probability (the first is the law of total probability):

$P (B) = P (B ∣ A) P (A) + P (B ∣ A^{c}) P (A^{c})$
$P (A \cup B ∣ C) = P (A ∣ C) + P (B ∣ C) - P (A \cap B ∣ C)$

Example

In a semiconductor manufacturing process:

$A$ is the event that chips are contaminated
- $P (A) = 0.2$
$F$ is the event that the product containing the chip fails
- $P (F ∣ A) = 0.1$ and $P (F ∣ A^{c}) = 0.005$

Determining the rate of failure: $P (F) = P (F ∣ A) P (A) + P (F ∣ A^{c}) P (A^{c}) = 0.1 \times 0.2 + 0.005 \times 0.8 = 0.024$
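
As a sketch, the same figures can be pushed through the law of total probability and then Bayes' theorem. The second quantity, $P (A ∣ F)$ , is not computed in the notes; it is added here only to illustrate the theorem stated earlier.

```python
p_a = 0.2                      # P(A): chip is contaminated
p_f_given_a = 0.1              # P(F | A)
p_f_given_not_a = 0.005        # P(F | A^c)

# Law of total probability.
p_f = p_f_given_a * p_a + p_f_given_not_a * (1 - p_a)

# Bayes' theorem: probability a failed product had a contaminated chip.
p_a_given_f = p_f_given_a * p_a / p_f

print(round(p_f, 4))
print(round(p_a_given_f, 4))
```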

Independent Events

Two events are independent when the probability of one occurring does not depend on the occurrence of the other. Events $A$ and $B$ are independent if and only if $P (A \cap B) = P (A) P (B)$

Example

Using the coin flip example again with a sample space $S$ and 3 events $A, B, C$

$S = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}$
- $P (S) = 1$
$A = \{HHH, HHT, HTH, HTT\}$
- $P (A) = 0.5$
$B = \{HHH, HHT, THH, THT\}$
- $P (B) = 0.5$
$C = \{HHT, THH\}$
- $P (C) = 0.25$

A and C are independent events:

$A \cap C = \{HHT\}$
$P (A \cap C) = 0.125 = 0.5 \times 0.25 = P (A) P (C)$

B and C are not independent events:

$B \cap C = \{HHT, THH\}$
$P (B \cap C) = 0.25 \neq 0.125 = 0.5 \times 0.25 = P (B) P (C)$
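
The product test for independence can be sketched by counting outcomes, assuming all 8 sequences are equally likely (a fair coin). The helper names `prob` and `independent` are hypothetical.

```python
from fractions import Fraction
from itertools import product

S = {"".join(tosses) for tosses in product("HT", repeat=3)}

A = {s for s in S if s[0] == "H"}      # first toss is heads
B = {s for s in S if s[1] == "H"}      # second toss is heads
C = {"HHT", "THH"}

def prob(event):
    """Probability of an event when every outcome in S is equally likely."""
    return Fraction(len(event), len(S))

def independent(x, y):
    """True when P(x and y) equals P(x) * P(y)."""
    return prob(x & y) == prob(x) * prob(y)

print(prob(A & C), prob(A) * prob(C), independent(A, C))
print(prob(B & C), prob(B) * prob(C), independent(B, C))
```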

Discrete Random Variables

For a random process with a discrete sample space $S$ , a discrete random variable $X$ is a function that assigns a real number to each outcome $s \in S$ .

$X$ attaches a numerical value to each outcome of the random process.
The probability that $X$ takes the value $a$ is denoted $P (X = a)$

Consider a weighted coin where $P (H) = 0.75$ and $P (T) = 0.25$ . Tossing the coin twice gives a sample space $S = \{TT, TH, HT, HH\}$ , which makes the number of heads a random variable $X (s) \in \{0, 1, 2\}$ . Since successive coin tosses are independent events:

$P (TT) = 0.0625$
$P (T H) = 0.1875$
$P (H T) = 0.1875$
$P (HH) = 0.5625$

Events are also mutually exclusive, so:

$f (0) = P (TT) = 0.0625$
$f (1) = P (T H) + P (H T) = 0.375$
$f (2) = P (HH) = 0.5625$

This gives a probability distribution function $f (x) = P (X = x)$ of:

$x$	$f (x)$
$0$	$0.0625$
$1$	$0.375$
$2$	$0.5625$

Cumulative Distribution Functions

The cumulative probability function gives a "running probability" $F_{X} (x_{i}) = P (X \leq x_{i}) = \sum_{j = 1}^{i} f (x_{j})$

if $x_{i} \leq x_{j}$ then $F_{X} (x_{i}) \leq F_{X} (x_{j})$
$F_{X} (x_{1}) = f (x_{1})$
$F_{X} (x_{n}) = 1$

Using coin example again:

$x$	$F_{X} (x)$
$0$	$0.0625$
$1$	$0.4375$
$2$	$1$
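
Both tables for the weighted coin can be rebuilt by enumerating the two tosses and then taking a running sum. This is a minimal sketch with illustrative variable names, using exact fractions to avoid rounding.

```python
from fractions import Fraction
from itertools import accumulate, product

p_toss = {"H": Fraction(3, 4), "T": Fraction(1, 4)}

# f(x): add up the probability of every outcome with x heads.
pmf = {0: Fraction(0), 1: Fraction(0), 2: Fraction(0)}
for first, second in product("HT", repeat=2):
    heads = (first + second).count("H")
    pmf[heads] += p_toss[first] * p_toss[second]   # tosses are independent

# F_X(x): running sum of f over the sorted values of x.
values = sorted(pmf)
cdf = dict(zip(values, accumulate(pmf[x] for x in values)))

for x in values:
    print(x, float(pmf[x]), float(cdf[x]))
```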

Expectation & Variance

Expectation is the probability-weighted average of the values $X$ can take, i.e. the long-run mean over many repetitions (not necessarily the most likely value)
- The mean of $X$

$E (X) = \sum_{i = 1}^{n} x_{i} f (x_{i}) = μ_{X}$

Variance is a measure of the spread of the distribution about its mean

$\mathrm{Var} (X) = \sum_{i = 1}^{n} (x_{i} - μ_{X})^{2} f (x_{i}) = E (X^{2}) - (E (X))^{2} = σ_{X}^{2}$

Standard deviation $σ_{X} = \sqrt{\mathrm{Var} (X)}$

Using the weighted coin example once more:

$E (X) = 0 \times 0.0625 + 1 \times 0.375 + 2 \times 0.5625 = 1.5$
$E (X^{2}) = 0^{2} \times 0.0625 + 1^{2} \times 0.375 + 2^{2} \times 0.5625 = 2.625$
$\mathrm{Var} (X) = E (X^{2}) - (E (X))^{2} = 2.625 - 1.5^{2} = 0.375$
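
The same moments can be computed from the probability table. This sketch also prints the standard deviation, which the worked example does not evaluate.

```python
from math import sqrt

# f(x) for the number of heads in two tosses of the weighted coin.
pmf = {0: 0.0625, 1: 0.375, 2: 0.5625}

mean = sum(x * p for x, p in pmf.items())              # E(X)
mean_sq = sum(x**2 * p for x, p in pmf.items())        # E(X^2)
variance = mean_sq - mean**2                           # E(X^2) - E(X)^2
std_dev = sqrt(variance)

print(mean, mean_sq, variance, round(std_dev, 4))
```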

Standardised Random Variable

The standardised random variable is a normalised version of the discrete random variable, obtained by the following transformation: $X^{*} = \frac{X - μ _{X}}{σ _{X}}$

$E (X^{*}) = 0$
$\mathrm{Var} (X^{*}) = 1$

Binomial Distribution

The binomial distribution models random processes consisting of repeated independent events
Each event has only 2 outcomes, success or failure
- $P (\text{success}) = p$
- $P (\text{failure}) = q = 1 - p$

The probability of $k$ successes in $n$ events:

$b (k; n; p) = \binom{n}{k} p^{k} q^{n - k}, k = 0, 1, 2, ..., n$

Probability of no success $= q^{n}$
Probability of $\geq 1$ successes is $1 - q^{n}$

Expectation & Variance

$μ = n p$ $σ^{2} = n pq$

Example

A fair coin is tossed 6 times. $p = q = 0.5$

Probability of exactly 2 heads out of 6 $b (2; 6; 0.5) = \binom{6}{2} \times 0.5^{2} \times 0.5^{4} = \frac{15}{64}$

Probability of $\geq 1$ heads $1 - q^{6} = 1 - 0.5^{6} = \frac{63}{64}$

Probability of $\geq 4$ heads

$b (4; 6; 0.5) + b (5; 6; 0.5) + b (6; 6; 0.5) = \binom{6}{4} (\frac{1}{2})^{4} (\frac{1}{2})^{2} + \binom{6}{5} (\frac{1}{2})^{5} (\frac{1}{2})^{1} + (\frac{1}{2})^{6} = \frac{15 + 6 + 1}{64} = \frac{11}{32}$

Expected value $E (X)$ $μ = n p = 6 \times 0.5 = 3$

Variance $σ^{2} = n pq = 6 \times 0.5 \times 0.5 = 1.5$
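
The fair coin figures can be verified with a short sketch of the binomial formula. The function name `binom_pmf` is illustrative, and `math.comb` requires Python 3.8 or later.

```python
from math import comb

def binom_pmf(k, n, p):
    """b(k; n; p): probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 6, 0.5

exactly_two = binom_pmf(2, n, p)
at_least_one = 1 - (1 - p)**n
at_least_four = sum(binom_pmf(k, n, p) for k in range(4, n + 1))
mean = n * p
variance = n * p * (1 - p)

print(exactly_two, at_least_one, at_least_four, mean, variance)
```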

Poisson Distribution

Models a random process consisting of repeated occurrence of a single event within a fixed interval. The probability of $k$ occurrences is given by $p (k; λ) = \frac{λ ^{k}}{k !} e^{- λ}, k = 0, 1, 2, ...$

The Poisson distribution can be used to approximate the binomial distribution with $λ = n p$ . This is only valid for large $n$ and small $p$

Expectation & Variance

$μ = σ^{2} = λ$

Example

The occurrence of typos on a page is modelled by a Poisson distribution with $λ = 0.5$ .

The probability of 2 errors: $p (2; 0.5) = \frac{0.5^{2}}{2 !} e^{- 0.5} = 0.076$
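
A sketch of the Poisson formula, together with a comparison against a binomial with the same mean. The values $n = 1000$ and $p = 0.0005$ are assumptions chosen only to show the "large $n$ , small $p$" approximation and are not part of the typo example.

```python
from math import comb, exp, factorial

def poisson_pmf(k, lam):
    """p(k; lambda): probability of exactly k occurrences."""
    return lam**k * exp(-lam) / factorial(k)

def binom_pmf(k, n, p):
    """b(k; n; p): probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

two_typos = poisson_pmf(2, 0.5)

# Hypothetical binomial with n * p = 0.5, for comparison only.
two_typos_binomial = binom_pmf(2, 1000, 0.0005)

print(round(two_typos, 4))
print(round(two_typos_binomial, 4))
```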

Continuous Random Variables

Continuous random variables map events from a sample space to an interval. Probabilities are written $P (a \leq X \leq b)$ , where $X$ is the random variable. $X$ is defined with a continuous function, the probability density function.

The function must be non-negative
- $f (x) \geq 0$
The total area under the curve of the function must be 1
- $\int_{- \infty}^{\infty} f (x) d x = 1$
$P (a \leq X \leq b) = \int_{a}^{b} f (x) d x$

Example

$f (x) = \begin{cases} a (x - x^{2}) & 0 \leq x \leq 1 \\ 0 & \text{otherwise} \end{cases}$

Require that $\int_{- \infty}^{\infty} f (x) d x = 1$ , so have to find $a$ : $\int_{- \infty}^{\infty} f (x) d x = \int_{0}^{1} a (x - x^{2}) d x = a [\frac{x ^{2}}{2} - \frac{x ^{3}}{3}]_{0}^{1} = \frac{a}{6} \Rightarrow a = 6$

Calculating some probabilities: $P (0 \leq X \leq 0.5) = \int_{0}^{0.5} f (x) d x = \int_{0}^{0.5} 6 (x - x^{2}) d x = 6 [\frac{x ^{2}}{2} - \frac{x ^{3}}{3}]_{0}^{0.5} = 0.5$ $P (0.25 \leq X \leq 0.75) = \int_{0.25}^{0.75} f (x) d x = \int_{0.25}^{0.75} 6 (x - x^{2}) d x = 6 [\frac{x ^{2}}{2} - \frac{x ^{3}}{3}]_{0.25}^{0.75} = \frac{11}{16}$
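
The integrals above can be checked numerically. The sketch below uses a simple midpoint rule; the helper `integrate` and the step count are assumptions made for illustration, not a method from the notes.

```python
def f(x):
    """Probability density from the example, with a = 6."""
    return 6 * (x - x * x) if 0 <= x <= 1 else 0.0

def integrate(func, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

total_area = integrate(f, 0, 1)          # should be close to 1
p_lower_half = integrate(f, 0, 0.5)      # P(0 <= X <= 0.5)
p_middle = integrate(f, 0.25, 0.75)      # P(0.25 <= X <= 0.75)

print(round(total_area, 6), round(p_lower_half, 6), round(p_middle, 6))
```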

Cumulative Distribution Function

The cumulative distribution function $F_{X}$ up to the point $a$ is given as $F_{X} (a) = \int_{- \infty}^{a} f (x) d x$

if $a \leq b$ , then $F_{X} (a) \leq F_{X} (b)$
$lim_{x \to - \infty} F_{X} (x) = 0$
$lim_{x \to \infty} F_{X} (x) = 1$
$\frac{d}{d x} F_{X} (x) = f (x)$
- Derivative of the cumulative distribution function is the probability density function

Using previous example, let $F_{X} (x) = \int_{- \infty}^{x} f (t) d t$ . For $x < 0$ $F_{X} (x) = 0$

For $0 \leq x \leq 1$ $F_{X} (x) = \int_{0}^{x} 6 (t - t^{2}) d t = 6 [\frac{t ^{2}}{2} - \frac{t ^{3}}{3}]_{0}^{x} = 3 x^{2} - 2 x^{3}$

For $x > 1$ $F_{X} (x) = \int_{0}^{1} 6 (t - t^{2}) d t = 6 [\frac{t ^{2}}{2} - \frac{t ^{3}}{3}]_{0}^{1} = 1$

Expectation & Variance

Where $X$ is a continuous random variable:

$E (X) = \int_{- \infty}^{\infty} x f (x) d x = μ$
$\mathrm{Var} (X) = \int_{- \infty}^{\infty} (x - μ)^{2} f (x) d x = σ_{X}^{2} = E (X^{2}) - μ^{2}$

Uniform Distribution

A continuous distribution with p.d.f:

$f (x) = \begin{cases} \frac{1}{b - a} & a \leq x \leq b \\ 0 & \text{otherwise} \end{cases}$

Expectation and variance:

$μ = \frac{a + b}{2}$ $σ^{2} = \frac{( b - a ) ^{2}}{12}$

Cumulative distribution function:

$F_{X} (x) = \begin{cases} 0 & - \infty < x < a \\ \frac{x - a}{b - a} & a \leq x \leq b \\ 1 & b < x < \infty \end{cases}$

Exponential Distribution

A continuous distribution with p.d.f:

$f (x) = \begin{cases} 0 & - \infty < x < 0 \\ v e^{- vx} & 0 \leq x < \infty \end{cases}$

Expectation and variance:

$μ = \frac{1}{v}$ $σ^{2} = \frac{1}{v ^{2}}$

Cumulative distribution function:

$F_{X} (x) = \begin{cases} 0 & - \infty < x < 0 \\ 1 - e^{- vx} & 0 \leq x < \infty \end{cases}$

Recall that a discrete random process $X$ where a single event occurs $k$ times in a fixed interval is modelled by a Poisson distribution $p (k; λ)$
- $E (X) = λ$
Consider a situation where the event occurs at a constant mean rate $v$ per unit time
Let $λ = v t$ , then $P (0) = e^{- v t}$ and probability of $\geq 1$ events occurring is $1 - e^{- v t}$
Suppose the continuous random variable $Y$ is the time between occurrences of successive events
If there is a period of time $t$ with no events, then $Y > t$ and $P (Y > t) = e^{- v t}$
If $\geq 1$ events occur then $Y \leq t$ and $P (Y \leq t) = 1 - e^{- v t}$

If the number of events per interval of time is Poisson distributed, then the length of time between events is exponentially distributed

Example

Calls arrive randomly at the telephone exchange at a mean rate of 2 calls per minute. The number of calls per minute $X$ is a d.r.v. which can be modelled by a Poisson distribution with $λ = 2$ . The probability of 1 call in any given minute is:

$P (X = 1) = \frac{λ e^{- λ}}{1 !} = 2 e^{- 2} = 0.27$

The time between consecutive calls $Y$ is a c.r.v. modelled by an exponential distribution with $v = \frac{λ}{t} = \frac{2}{1} = 2$ . The probability of at least 1 ( $\geq 1$ ) minute between calls is: $p (1 \leq Y \leq \infty) = \int_{1}^{\infty} v e^{- v t} d t = \int_{1}^{\infty} 2 e^{- 2 t} d t = [- e^{- 2 t}]_{1}^{\infty} = 0.135$
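
Both results from the telephone exchange example can be reproduced in a few lines. This is a sketch; the function names are illustrative.

```python
from math import exp, factorial

rate = 2.0                     # mean calls per minute (lambda = v * t, t = 1)

def poisson_pmf(k, lam):
    """Probability of exactly k calls in an interval with mean lam."""
    return lam**k * exp(-lam) / factorial(k)

def exp_survival(t, v):
    """P(Y > t): no call arrives for a time t, at rate v."""
    return exp(-v * t)

p_one_call = poisson_pmf(1, rate)        # P(X = 1)
p_gap_over_1 = exp_survival(1.0, rate)   # P(Y >= 1 minute)

print(round(p_one_call, 3), round(p_gap_over_1, 3))
```

Note that `exp_survival(t, v)` equals `poisson_pmf(0, v * t)`, which is the link between the two distributions described above.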

Normal Distribution

A distribution with probability density function:

$f (x) = \frac{1}{σ \sqrt{2 π}} e^{- \frac{( x - μ ) ^{2}}{2 σ ^{2}}}$

Expectation $E (X) = μ$ and variance $\mathrm{Var} (X) = σ^{2}$ . Normal distribution is denoted $N (μ, σ^{2})$ and is defined by its mean and variance.

Standardised Normal Distribution

$X$ is a random variable with distribution $N (μ, σ^{2})$ . The standardised random variable $U$ is distributed $N (0, 1)$ and can be obtained with the transform: $U = \frac{X - μ}{σ}$ and has p.d.f. $f (u) = \frac{1}{\sqrt{2 π}} e^{- \frac{u ^{2}}{2}}$

$P (X \leq b) = P (U \leq β)$ where $β = \frac{b - μ}{σ}$ . Values for the standard normal distribution are tabulated in the data book.

Example

The length of bolts $x$ from a production process are distributed normally with $μ = 2.5$ and $σ^{2} = 0.01$ .

$u = \frac{x - μ}{σ} = \frac{x - 2.5}{0.1}$ The probability the length of a bolt is between 2.6 and 2.7 cm (values obtained from table lookups): $P (2.6 \leq X \leq 2.7) = P (\frac{2.6 - 2.5}{0.1} \leq U \leq \frac{2.7 - 2.5}{0.1}) = P (1 \leq U \leq 2)$ $= P (0 \leq U \leq 2) - P (0 \leq U \leq 1) = 0.4772 - 0.3413 = 0.1359$
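
In place of a table lookup, the bolt probability can be sketched with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 2.5, 0.1           # sigma is the square root of the variance 0.01
bolts = NormalDist(mu=mu, sigma=sigma)

# Direct calculation on X.
p_direct = bolts.cdf(2.7) - bolts.cdf(2.6)

# Same calculation after standardising to U ~ N(0, 1).
standard = NormalDist()
u_low = (2.6 - mu) / sigma
u_high = (2.7 - mu) / sigma
p_standard = standard.cdf(u_high) - standard.cdf(u_low)

print(round(p_direct, 4), round(p_standard, 4))
```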

Confidence Intervals

A confidence interval is the interval in which we would expect to find an estimate of a parameter, at a specified probability level. For example, the interval covering 95% of the population of $N (μ, σ^{2})$ is $μ \pm 1.96 σ$ .

For a random variable $X$ with distribution $N (67.5, 2. 5^{2})$ , the standard variate $u = \frac{x - 67.5}{2.5}$ . For confidence interval at 95% probability:

$Q (u) = \frac{0.95}{2} = 0.475$

Using table lookups, $u = \pm 1.96$ , and: $x = μ \pm 1.96 σ = 67.5 \pm 1.96 \times 2.5 = 67.5 \pm 4.9$

For confidence interval at 99.9% probability:

$Q (u) = \frac{0.999}{2} = 0.4995$

Table lookups again, $u = \pm 3.3$ , and: $x = μ \pm 3.3 σ = 67.5 \pm 3.3 \times 2.5 = 67.5 \pm 8.25$
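
The table lookups can be sketched with the inverse c.d.f. of the standard normal, available as `statistics.NormalDist.inv_cdf`. The helper `central_interval` is a hypothetical name. A central interval at level $c$ leaves $(1 - c) / 2$ in each tail, so the upper bound sits at cumulative probability $0.5 + c / 2$ .

```python
from statistics import NormalDist

def central_interval(mu, sigma, level):
    """Return (low, high, z) for the central interval at the given level."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mu - z * sigma, mu + z * sigma, z

low95, high95, z95 = central_interval(67.5, 2.5, 0.95)
low999, high999, z999 = central_interval(67.5, 2.5, 0.999)

print(round(z95, 2), round(low95, 2), round(high95, 2))
print(round(z999, 2), round(low999, 2), round(high999, 2))
```

The computed multipliers are slightly more precise than the rounded table values used above, so the interval endpoints may differ in the second decimal place.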

Normal Approximation to Binomial Distribution

The normal distribution gives a close approximation to the binomial distribution, provided:

$n$ is large
neither $p$ nor $q$ are close to zero
$μ = n p$ and $σ^{2} = n pq$

For example, take a random process consisting of 64 spins of a fair coin, so $n = 64$ and $p = q = 0.5$ . The probability of 40 heads is: $P (40) = \binom{64}{40} \times 0.5^{64} = 0.01359$ with $μ = n p = 32$ and $σ = \sqrt{n pq} = 4$

For a normal approximation, must use the interval around 40 (normal is continuous, binomial is discrete) $[39.5, 40.5]$ :

$P (39.5 \leq X \leq 40.5) = P (\frac{39.5 - 32}{4} \leq U \leq \frac{40.5 - 32}{4}) = P (1.875 \leq U \leq 2.125) = 0.4832 - 0.4696 = 0.0136$
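
The exact binomial probability and its normal approximation can be compared directly. This is a sketch using the standard library, with illustrative variable names.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, k = 64, 0.5, 40

# Exact binomial probability of exactly k heads.
exact = comb(n, k) * p**k * (1 - p)**(n - k)

# Normal approximation with the continuity correction [k - 0.5, k + 0.5].
mu = n * p
sigma = sqrt(n * p * (1 - p))
normal = NormalDist(mu=mu, sigma=sigma)
approx = normal.cdf(k + 0.5) - normal.cdf(k - 0.5)

print(round(exact, 5), round(approx, 5))
```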

Normal Approximation to Poisson Distribution

The normal distribution gives a close approximation to the Poisson distribution, provided:

$λ$ is large
$μ = σ^{2} = λ$

For example, say a radioactive decay emits a mean of 69 particles per second. A standard normal approximation to this is:

$u = \frac{x - μ}{σ} = \frac{x - 69}{\sqrt{69}}$

The probability of emitting $\leq 60$ particles in a second is therefore: $P (0 \leq X \leq 60) = P (\frac{0 - 69}{\sqrt{69}} \leq U \leq \frac{60.5 - 69}{\sqrt{69}}) = 0.5 - 0.3473 = 0.1527$
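
As a rough check, the approximation can be compared with the exact Poisson sum. This sketch uses only the standard library. The lower tail below zero is negligible here, so the normal c.d.f. at 60.5 is used on its own.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

lam = 69.0                     # mean particles per second

def poisson_pmf(k, lam):
    """Probability of exactly k emissions in one second."""
    return lam**k * exp(-lam) / factorial(k)

# Exact: sum the Poisson probabilities for 0..60 particles.
exact = sum(poisson_pmf(k, lam) for k in range(61))

# Normal approximation with continuity correction at 60.5.
approx = NormalDist(mu=lam, sigma=sqrt(lam)).cdf(60.5)

print(round(exact, 3), round(approx, 3))
```

The computed approximation comes out at about 0.153, which agrees with the table-based figure to within the rounding of the lookup.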
