# Probability & Statistics

## Probability

### Set Theory

- A set is a collection of elements
- Elements are members of a set

- $s∈S$ means "the element $s$ is a member of the set $S$"
- The empty set $∅$ contains no elements

- $S={1,3,5,7,9}$
- $S$ is a set consisting of those integers

- $S={n : n \text{ is a prime number and } n≤12}$
- $S={2,3,5,7,11}$

- $S={x : x^{2}=4 \text{ and } x \text{ is odd}}$
- $S=∅$

- $A⊂S$
- $A$ is a subset of $S$
- $a∈A$ implies $a∈S$

- $∅⊂S$ for all sets $S$
- $A=B$ if and only if $A⊂B$ and $B⊂A$
- $A∪B$ is the union of $A$ and $B$
- Set of elements belonging to $A$ *or* $B$

- $A∩B$ is the intersection of $A$ and $B$
- Set of elements belonging to $A$ *and* $B$

- Disjoint sets have no common elements
- $A∩B=∅$

- $A∖B$ is the difference of $A$ and $B$
- Set of elements belonging to $A$ but *not* $B$

- $A^{c}$ is the complement of $A$
- Set of elements *not* belonging to $A$
### Random Processes & Probability

The probability of event $A$ occurring is denoted $P(A)$. This is the relative frequency of event $A⊂S$ occurring in a random process with sample space $S$.

- $S$
- Certain or sure event, guaranteed 100% to happen

- $∅$
- Impossible event, won't happen

- $a∈S$
- Elementary event, consisting of a single possible outcome

- $A∪B$
- Event that occurs if *$A$ or $B$* occurs

- $A∩B$
- Event that occurs if *$A$ and $B$* occur

- $A^{c}=S∖A$
- Event that occurs if $A$ *does not* occur

- $A∩B=∅$
- Events $A$ and $B$ are *mutually exclusive*

#### Example

Toss a coin 3 times and observe the sequence of heads and tails.

- Sample space $S={HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}$
- Event that $≥2$ heads occur in succession $A={HHH,HHT,THH}$
- Event that 3 heads or 3 tails occur $B={HHH,TTT}$
- $A∪B={HHH,HHT,THH,TTT}$
- $A∩B={HHH}$
- $A^{c}={HTH,HTT,THT,TTH,TTT}$
- $A^{c}∩B={TTT}$
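
As a quick check, the sample space and events in this example can be enumerated in Python (the enumeration approach is a sketch, not part of the notes):

```python
# Enumerate the sample space for 3 coin tosses and verify the set operations.
from itertools import product

S = {"".join(t) for t in product("HT", repeat=3)}
A = {s for s in S if "HH" in s}   # at least 2 heads in succession
B = {"HHH", "TTT"}                # 3 heads or 3 tails

print(sorted(A | B))      # union: A or B occurs
print(A & B)              # intersection: A and B occur
print(sorted(S - A))      # complement of A
print((S - A) & B)        # A^c ∩ B
```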

#### Another Example

Sample space $S={17,18,19,20,21,22}$. Each number is an elementary event; the relative frequencies below come from 35 observations.

Events | Frequency | Relative Frequency |
---|---|---|
17 | 3 | 3/35 |
18 | 4 | 4/35 |
19 | 9 | 9/35 |
20 | 11 | 11/35 |
21 | 6 | 6/35 |
22 | 2 | 2/35 |

### Axioms & Laws of Probability

- $0≤P(A)≤1$ for all $A⊂S$
- Probabilities are always between 0 and 1 inclusive

- $P(S)=1$
- Probability of the certain event is 1

- If $A∩B=∅$ then $P(A∪B)=P(A)+P(B)$
- If two events are disjoint, then the probability of either occurring is equal to the sum of their two probabilities

- $P(∅)=0$
- The probability of the impossible event is zero

- $P(A^{c})=1−P(A)$
- The probability that $A$ does not occur is one minus the probability that it does

- If $A⊂B$, then $P(A)≤P(B)$
- The probability of A will always be less than or equal to the probability of B when A is a subset of B

- $P(A∖B)=P(A)−P(A∩B)$
- The probability of $A$ but not $B$ is equal to the probability of $A$ minus the probability of $A$ and $B$

- $P(A∪B)=P(A)+P(B)−P(A∩B)$
- Probability of $A$ or $B$ is equal to the probability of $A$ plus the probability of $B$, minus the probability of $A$ and $B$
- This addition law is important

#### Example

In a batch of 50 ball bearings:

- 15 have surface damage ($A$)
- $P(A)=0.3$

- 12 have dents ($B$)
- $P(B)=0.24$

- 6 have both defects ($A∩B$)
- $P(A∩B)=0.12$

The probability a single ball bearing has surface damage or dents: $P(A∪B)=P(A)+P(B)−P(A∩B)=0.3+0.24−0.12=0.42$

The probability a single ball bearing has surface damage but no dents: $P(A∩B^{c})=P(A∖B)=P(A)−P(A∩B)=0.3−0.12=0.18$
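
The addition law can be checked numerically for this example; a minimal Python sketch:

```python
# Ball bearing counts from the example: 50 total, 15 damaged, 12 dented, 6 both.
n = 50
p_A = 15 / n     # P(surface damage)
p_B = 12 / n     # P(dents)
p_AB = 6 / n     # P(both defects)

p_union = p_A + p_B - p_AB   # P(A or B) by the addition law
p_A_not_B = p_A - p_AB       # P(A but not B)
print(p_union, p_A_not_B)
```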

### Conditional Probability & Bayes' Theorem

A conditional probability $P(A∣B)$ is the probability of event $A$ occurring, *given* that the event $B$ has occurred.

$P(A∣B)=\frac{P(A∩B)}{P(B)}$

Bayes' theorem:

$P(A∣B)=\frac{P(B∣A)P(A)}{P(B)}$

Useful laws of conditional probability:

- $P(B)=P(B∣A)P(A)+P(B∣A^{c})P(A^{c})$ (the law of total probability)
- $P(A∪B∣C)=P(A∣C)+P(B∣C)−P(A∩B∣C)$

#### Example

In a semiconductor manufacturing process:

- $A$ is the event that chips are contaminated
- $P(A)=0.2$

- $F$ is the event that the product containing the chip fails
- $P(F∣A)=0.1$ and $P(F∣A^{c})=0.005$

Determining the rate of failure: $P(F)=P(F∣A)P(A)+P(F∣A^{c})P(A^{c})=0.1×0.2+0.005×0.8=0.024$
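
The law of total probability can be evaluated directly, and Bayes' theorem then answers the reverse question (the chance a failed product used a contaminated chip); the variable names below are illustrative:

```python
# Chip contamination example: total probability and Bayes' theorem.
p_A = 0.2              # P(chip contaminated)
p_F_given_A = 0.1      # P(failure | contaminated)
p_F_given_not_A = 0.005

# Law of total probability: condition on whether the chip was contaminated.
p_F = p_F_given_A * p_A + p_F_given_not_A * (1 - p_A)

# Bayes' theorem: probability a failed product used a contaminated chip.
p_A_given_F = p_F_given_A * p_A / p_F

print(p_F)
print(p_A_given_F)
```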

### Independent Events

Two events are independent when the probability of one occurring does not depend on the occurrence of the other. Events $A$ and $B$ are independent if and only if $P(A∩B)=P(A)P(B)$

#### Example

Using the coin flip example again with a sample space $S$ and 3 events $A,B,C$

- $S={HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}$
- $P(S)=1$

- $A={HHH, HHT, HTH, HTT}$
- $P(A)=0.5$

- $B={HHH, HHT, THH, THT}$
- $P(B)=0.5$

- $C={HHT, THH}$
- $P(C)=0.25$

A and C are independent events:

- $A∩C={HHT}$
- $P(A∩C)=0.125=0.5×0.25=P(A)P(C)$

B and C are not independent events:

- $B∩C={HHT,THH}$
- $P(B∩C)=0.25≠0.125=P(B)P(C)$
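
Independence can be confirmed by enumerating the equally likely outcomes (a sketch; the predicates for $A$ and $B$ are read off the listed sets):

```python
# Check independence of the coin-toss events by enumeration.
from itertools import product

S = {"".join(t) for t in product("HT", repeat=3)}
P = lambda E: len(E) / len(S)        # equally likely outcomes

A = {s for s in S if s[0] == "H"}    # first toss is heads
B = {s for s in S if s[1] == "H"}    # second toss is heads
C = {"HHT", "THH"}

print(P(A & C) == P(A) * P(C))   # A and C independent
print(P(B & C) == P(B) * P(C))   # B and C not independent
```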

## Discrete Random Variables

For a random process with a *discrete* sample space $S$, a discrete random variable $X$ is a function that assigns a real number to each outcome $s∈S$.

- $X$ is a numerical measure of the outcome of the random process
- The probability that $X$ takes the value $a$ is denoted $P(X=a)$

Consider a weighted coin where $P(H)=0.75$ and $P(T)=0.25$. Tossing the coin twice gives a sample space $S={TT, TH, HT, HH}$, which makes the number of heads a random variable with $X(s)∈{0,1,2}$. Since successive coin tosses are independent events:

- $P(TT)=0.0625$
- $P(TH)=0.1875$
- $P(HT)=0.1875$
- $P(HH)=0.5625$

Events are also mutually exclusive, so:

- $f(0)=P(TT)=0.0625$
- $f(1)=P(TH)+P(HT)=0.375$
- $f(2)=P(HH)=0.5625$

This gives a probability distribution function $f(x)=P(X=x)$ of:

$x$ | $f(x)$ |
---|---|
$0$ | $0.0625$ |
$1$ | $0.375$ |
$2$ | $0.5625$ |
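
The same p.m.f. can be built programmatically; a short sketch assuming the weighting above:

```python
# Build the p.m.f. of X = number of heads in two tosses of a weighted coin.
from itertools import product

p = {"H": 0.75, "T": 0.25}
f = {0: 0.0, 1: 0.0, 2: 0.0}
for outcome in product("HT", repeat=2):
    prob = p[outcome[0]] * p[outcome[1]]   # tosses are independent
    f[outcome.count("H")] += prob          # outcomes are mutually exclusive

print(f)   # {0: 0.0625, 1: 0.375, 2: 0.5625}
```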

### Cumulative Distribution Functions

The cumulative distribution function gives a "running probability": $F_{X}(x_{i})=P(X≤x_{i})=\sum_{j=1}^{i}f(x_{j})$

- if $x_{i}≤x_{j}$ then $F_{X}(x_{i})≤F_{X}(x_{j})$
- $F_{X}(x_{1})=f(x_{1})$
- $F_{X}(x_{n})=1$

Using coin example again:

$x$ | $F_{X}(x)$ |
---|---|
$0$ | $0.0625$ |
$1$ | $0.4375$ |
$2$ | $1$ |

### Expectation & Variance

- Expectation is the long-run average value of $X$ (not necessarily a possible outcome)
- The mean of $X$

$E(X)=\sum_{i=1}^{n}x_{i}f(x_{i})=μ_{X}$

- Variance is a measure of the spread of the data

$Var(X)=\sum_{i=1}^{n}(x_{i}−μ_{X})^{2}f(x_{i})=E(X^{2})−(E(X))^{2}=σ_{X}^{2}$

- Standard deviation $σ_{X}=\sqrt{Var(X)}$

Using the weighted coin example once more:

$E(X)=0×0.0625+1×0.375+2×0.5625=1.5$

$E(X^{2})=0^{2}×0.0625+1^{2}×0.375+2^{2}×0.5625=2.625$

$Var(X)=E(X^{2})−(E(X))^{2}=2.625−1.5^{2}=0.375$
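
These values follow directly from the p.m.f. table; a quick numerical check:

```python
# Expectation and variance of the weighted-coin distribution from its p.m.f.
f = {0: 0.0625, 1: 0.375, 2: 0.5625}

mean = sum(x * px for x, px in f.items())        # E(X)
mean_sq = sum(x**2 * px for x, px in f.items())  # E(X^2)
var = mean_sq - mean**2                          # Var(X) = E(X^2) - E(X)^2

print(mean)   # 1.5
print(var)    # 0.375
```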

### Standardised Random Variable

The standardised random variable is a normalised version of the discrete random variable, obtained by the following transformation: $X^{*}=\frac{X−μ_{X}}{σ_{X}}$

- $E(X^{*})=0$
- $Var(X^{*})=1$

## Binomial Distribution

- The binomial distribution models random processes consisting of repeated *independent* events
- Each event has only 2 outcomes, success or failure
- $P(success)=p$
- $P(failure)=q=1−p$

The probability of $k$ successes in $n$ events:

$b(k;n;p)=\binom{n}{k}p^{k}q^{n−k},\quad k=0,1,2,...,n$

- Probability of no successes $=q^{n}$
- Probability of $≥1$ successes $=1−q^{n}$

### Expectation & Variance

$μ=np$ and $σ^{2}=npq$

### Example

A fair coin is tossed 6 times. $p=q=0.5$

Probability of exactly 2 heads out of 6: $b(2;6;0.5)=\binom{6}{2}×0.5^{2}×0.5^{4}=\frac{15}{64}$

Probability of $≥1$ heads: $1−q^{6}=1−0.5^{6}=\frac{63}{64}$

Probability of $≥4$ heads:

$b(4;6;0.5)+b(5;6;0.5)+b(6;6;0.5)=\binom{6}{4}\left(\frac{1}{2}\right)^{4}\left(\frac{1}{2}\right)^{2}+\binom{6}{5}\left(\frac{1}{2}\right)^{5}\left(\frac{1}{2}\right)^{1}+\left(\frac{1}{2}\right)^{6}=\frac{11}{32}$

Expected value: $E(X)=μ=np=6×0.5=3$

Variance: $σ^{2}=npq=6×0.5×0.5=1.5$
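
These binomial probabilities can be reproduced with `math.comb`; the helper name `binom_pmf` is mine:

```python
# Binomial probabilities for 6 tosses of a fair coin.
from math import comb

def binom_pmf(k, n, p):
    """b(k; n; p) = C(n,k) * p^k * (1-p)^(n-k)"""
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(binom_pmf(2, 6, 0.5))                            # 15/64
print(1 - 0.5**6)                                      # P(>=1 head) = 63/64
print(sum(binom_pmf(k, 6, 0.5) for k in range(4, 7)))  # P(>=4 heads) = 11/32
```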

## Poisson Distribution

Models a random process consisting of repeated occurrence of a single event within a fixed interval. The probability of $k$ occurrences is given by $p(k;λ)=\frac{λ^{k}}{k!}e^{−λ},\quad k=0,1,2,...$

The Poisson distribution can be used to approximate the binomial distribution with $λ=np$. This is only valid for large $n$ and small $p$.

### Expectation & Variance

$μ=σ^{2}=λ$

### Example

The occurrence of typos on a page is modelled by a Poisson distribution with $λ=0.5$.

The probability of 2 errors: $p(2;0.5)=\frac{0.5^{2}}{2!}e^{−0.5}=0.076$
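
A one-line check of the Poisson p.m.f. (the helper name is illustrative):

```python
# Poisson probability of k occurrences with rate lam.
from math import exp, factorial

def poisson_pmf(k, lam):
    """p(k; lambda) = lambda^k / k! * e^(-lambda)"""
    return lam**k / factorial(k) * exp(-lam)

print(round(poisson_pmf(2, 0.5), 3))   # 0.076
```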

## Continuous Random Variables

Continuous random variables map events from a sample space to an interval. Probabilities are written $P(a≤X≤b)$, where $X$ is the random variable. $X$ is defined with a continuous function, the probability density function.

- The function must be positive
- $f(x)≥0$

- The total area under the curve of the function must be 1
- $∫_{−∞}^{∞}f(x)dx=1$

- $P(a≤X≤b)=∫_{a}^{b}f(x)dx$

### Example

$f(x)=\begin{cases}a(x−x^{2}) & 0≤x≤1 \\ 0 & \text{otherwise}\end{cases}$

Require that $∫_{−∞}^{∞}f(x)dx=1$, so have to find $a$: $∫_{−∞}^{∞}f(x)dx=∫_{0}^{1}a(x−x^{2})dx=a\left[\frac{x^{2}}{2}−\frac{x^{3}}{3}\right]_{0}^{1}=\frac{a}{6}⇒a=6$

Calculating some probabilities:

$P(0≤X≤0.5)=∫_{0}^{0.5}6(x−x^{2})dx=6\left[\frac{x^{2}}{2}−\frac{x^{3}}{3}\right]_{0}^{0.5}=0.5$

$P(0.25≤X≤0.75)=∫_{0.25}^{0.75}6(x−x^{2})dx=6\left[\frac{x^{2}}{2}−\frac{x^{3}}{3}\right]_{0.25}^{0.75}=\frac{11}{16}$
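
These integrals can be sanity-checked numerically, e.g. with a simple midpoint rule (the `integrate` helper is a sketch, not a library function):

```python
# Numerically check that f(x) = 6(x - x^2) on [0, 1] integrates to 1
# and reproduces the probabilities above.
def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

f = lambda x: 6 * (x - x**2)
print(integrate(f, 0, 1))         # ~1
print(integrate(f, 0, 0.5))       # ~0.5
print(integrate(f, 0.25, 0.75))   # ~11/16
```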

### Cumulative Distribution Function

The cumulative distribution function $F_{X}$ up to the point $a$ is given as $F_{X}(a)=∫_{−∞}^{a}f(x)dx$

- if $a≤b$, then $F_{X}(a)≤F_{X}(b)$
- $\lim_{x→−∞}F_{X}(x)=0$
- $\lim_{x→∞}F_{X}(x)=1$
- $\frac{d}{dx}F_{X}(x)=f(x)$
- The derivative of the cumulative distribution function is the probability density function

Using the previous example, let $F_{X}(x)=∫_{−∞}^{x}f(t)dt$. For $x<0$: $F_{X}(x)=0$

For $0≤x≤1$: $F_{X}(x)=∫_{0}^{x}6(t−t^{2})dt=6\left[\frac{t^{2}}{2}−\frac{t^{3}}{3}\right]_{0}^{x}=3x^{2}−2x^{3}$

For $x>1$: $F_{X}(x)=∫_{0}^{1}6(t−t^{2})dt=6\left[\frac{t^{2}}{2}−\frac{t^{3}}{3}\right]_{0}^{1}=1$

### Expectation & Variance

Where $X$ is a continuous random variable:

$E(X)=∫_{−∞}^{∞}xf(x)dx=μ$

$Var(X)=∫_{−∞}^{∞}(x−μ)^{2}f(x)dx=σ_{X}^{2}=E(X^{2})−μ^{2}$

## Uniform Distribution

A continuous distribution with p.d.f:

$f(x)=\begin{cases}\frac{1}{b−a} & a≤x≤b \\ 0 & \text{otherwise}\end{cases}$

Expectation and variance:

$μ=\frac{a+b}{2}$ and $σ^{2}=\frac{(b−a)^{2}}{12}$

Cumulative distribution function:

$F_{X}(x)=\begin{cases}0 & −∞<x<a \\ \frac{x−a}{b−a} & a≤x≤b \\ 1 & b<x<∞\end{cases}$

## Exponential Distribution

A continuous distribution with p.d.f:

$f(x)=\begin{cases}0 & −∞<x<0 \\ ve^{−vx} & 0≤x<∞\end{cases}$

Expectation and variance:

$μ=\frac{1}{v}$ and $σ^{2}=\frac{1}{v^{2}}$

Cumulative distribution function:

$F_{X}(x)=\begin{cases}0 & −∞<x<0 \\ 1−e^{−vx} & 0≤x<∞\end{cases}$

- Recall that a discrete random process $X$ where a single event occurs $k$ times in a fixed interval is modelled by a Poisson distribution $p(k;λ)$
- $E(X)=λ$

- Consider a situation where the event occurs at a constant mean rate $v$ per unit time
- Let $λ=vt$, then $P(0)=e^{−vt}$ and the probability of $≥1$ events occurring is $1−e^{−vt}$
- Suppose the *continuous* random variable $Y$ is the time between occurrences of successive events
- If there is a period of time $t$ with no events, then $Y>t$ and $P(Y>t)=e^{−vt}$
- If $≥1$ events occur then $Y≤t$ and $P(Y≤t)=1−e^{−vt}$

**If the number of events per interval of time is Poisson distributed, then the length of time between events is exponentially distributed**

### Example

Calls arrive randomly at the telephone exchange at a mean rate of 2 calls per minute. The number of calls per minute $X$ is a d.r.v. which can be modelled by a Poisson distribution with $λ=2$. The probability of 1 call in any given minute is:

$P(X=1)=\frac{λe^{−λ}}{1!}=2e^{−2}=0.27$

The time between consecutive calls $Y$ is a c.r.v. modelled by an exponential distribution with $v=\frac{λ}{t}=\frac{2}{1}=2$. The probability of at least 1 ($≥1$) minute between calls is: $P(1≤Y<∞)=∫_{1}^{∞}ve^{−vt}dt=∫_{1}^{∞}2e^{−2t}dt=\left[−e^{−2t}\right]_{1}^{∞}=e^{−2}=0.135$
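
A quick numerical check of the exponential tail probability:

```python
# P(Y > t) = e^(-vt) for the time between calls, with v = 2 calls per minute.
from math import exp

v = 2.0
t = 1.0
p_gap = exp(-v * t)      # probability of at least 1 minute between calls
print(round(p_gap, 3))   # 0.135
```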

## Normal Distribution

A distribution with probability density function:

$f(x)=\frac{1}{σ\sqrt{2π}}e^{−\frac{(x−μ)^{2}}{2σ^{2}}}$

Expectation $E(X)=μ$ and variance $Var(X)=σ^{2}$. The normal distribution is denoted $N(μ,σ^{2})$ and is defined by its mean and variance.

### Standardised Normal Distribution

$X$ is a random variable with distribution $N(μ,σ^{2})$. The standardised random variable $U$ is distributed $N(0,1)$, can be obtained with the transform $U=\frac{X−μ}{σ}$, and has p.d.f. $f(u)=\frac{1}{\sqrt{2π}}e^{−\frac{u^{2}}{2}}$

$P(X≤b)=P(U≤β)$ where $β=\frac{b−μ}{σ}$. Values for the standard normal distribution are tabulated in the data book.

### Example

The lengths of bolts $x$ (in cm) from a production process are distributed normally with $μ=2.5$ and $σ^{2}=0.01$, so $σ=0.1$.

$u=\frac{x−μ}{σ}=\frac{x−2.5}{0.1}$

The probability the length of a bolt is between 2.6 and 2.7 cm (values obtained from table lookups):

$P(2.6≤X≤2.7)=P\left(\frac{2.6−2.5}{0.1}≤U≤\frac{2.7−2.5}{0.1}\right)=P(1≤U≤2)$

$=P(0≤U≤2)−P(0≤U≤1)=0.4772−0.3413=0.1359$
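
Instead of table lookups, the standard normal c.d.f. can be computed from the error function in the Python standard library (`phi` is my helper name):

```python
# Standard normal probabilities via the error function (no tables needed).
from math import erf, sqrt

def phi(z):
    """Standard normal c.d.f. Phi(z) = P(U <= z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 2.5, 0.1
p = phi((2.7 - mu) / sigma) - phi((2.6 - mu) / sigma)   # P(2.6 <= X <= 2.7)
print(round(p, 4))   # 0.1359
```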

### Confidence Intervals

A confidence interval is the interval in which we would expect to find an estimate of a parameter, at a specified probability level. For example, the interval covering 95% of the population of $N(μ,σ^{2})$ is $μ±1.96σ$.

For a random variable $X$ with distribution $N(67.5,2.5^{2})$, the standard variate $u=\frac{x−67.5}{2.5}$. For a confidence interval at 95% probability:

$Q(u)=\frac{0.95}{2}=0.475$

Using table lookups, $u=±1.96$, and: $x=μ±1.96σ=67.5±1.96×2.5=67.5±4.9$

For confidence interval at 99.9% probability:

$Q(u)=\frac{0.999}{2}=0.4995$

Table lookups again, $u=±3.3$, and: $x=μ±3.3σ=67.5±3.3×2.5=67.5±8.25$
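
The table-lookup critical values can be reproduced with `statistics.NormalDist` (the computed values differ slightly from the rounded table entries, e.g. 3.29 vs 3.3):

```python
# Two-sided critical values u for 95% and 99.9% confidence intervals.
from statistics import NormalDist

mu, sigma = 67.5, 2.5
for level in (0.95, 0.999):
    u = NormalDist().inv_cdf((1 + level) / 2)   # upper-tail critical value
    print(f"{level}: u = {u:.2f}, interval = {mu} +/- {u * sigma:.2f}")
```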

### Normal Approximation to Binomial Distribution

The normal distribution gives a close approximation to the binomial distribution, provided:

- $n$ is large
- neither $p$ nor $q$ is close to zero
- $μ=np$ and $σ_{2}=npq$

For example, take a random process consisting of 64 spins of a fair coin: $n=64$ and $p=q=0.5$. The probability of exactly 40 heads is $P(40)=\binom{64}{40}×0.5^{64}=0.01359$, with $μ=np=32$ and $σ=\sqrt{npq}=4$.

For a normal approximation, must use the interval around 40 (normal is continuous, binomial is discrete) $[39.5,40.5]$:

$P(39.5≤X≤40.5)=P\left(\frac{39.5−32}{4}≤U≤\frac{40.5−32}{4}\right)=0.4832−0.4696=0.0136$
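
The exact binomial value and the continuity-corrected normal approximation can be compared directly (the `phi` helper is mine):

```python
# Compare the exact binomial probability of 40 heads in 64 tosses
# with its continuity-corrected normal approximation.
from math import comb, erf, sqrt

def phi(z):
    """Standard normal c.d.f."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 64, 0.5
exact = comb(n, 40) * p**n                       # P(40) under the binomial
mu, sigma = n * p, sqrt(n * p * (1 - p))
# Continuity correction: the discrete value 40 maps to [39.5, 40.5].
approx = phi((40.5 - mu) / sigma) - phi((39.5 - mu) / sigma)
print(exact, approx)
```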

### Normal Approximation to Poisson Distribution

The normal distribution gives a close approximation to the Poisson distribution, provided:

- $λ$ is large
- $μ=σ^{2}=λ$

For example, say a radioactive decay emits a mean of 69 particles per second. A standard normal approximation to this is:

$u=\frac{x−μ}{σ}=\frac{x−69}{\sqrt{69}}$

The probability of emitting $≤60$ particles in a second is therefore: $P(0≤X≤60)=P\left(\frac{0−69}{\sqrt{69}}≤U≤\frac{60.5−69}{\sqrt{69}}\right)=0.5−0.3473=0.1527$