In [1]:
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
%matplotlib inline

from IPython.display import YouTubeVideo

Lecture 20 Part 2

Expectation: Definition and Additivity

Note that we're defining expectation only for discrete random variables. For continuous random variables you have to replace the sum by an appropriate integral, which we won't need in this class. But all the properties of expectation listed in the slides hold for both kinds of random variables.

In [2]:
YouTubeVideo("wBBWFYz9248")
Out[2]:

Now go over Slides 20-26.

Important Examples

Please work out every line in Slides 23 and 26, and note how the "balance point" interpretation reduces the need for calculation.

Transformations (Slide 24)

Linear Transformation rules are used frequently because:

  • $-X$ is a linear transformation of $X$
  • If $Y_1, Y_2, \ldots, Y_n$ are random variables then their mean $\bar{Y}_n$ is a linear transformation of their sum $S_n$, because $\bar{Y}_n = \frac{1}{n}S_n$. This will be applied to the sample mean and also to Mean Squared Error, Average Loss, etc.
  • The typical conversions of units of measurement are linear transformations, e.g. $cm = 2.54 \cdot inches$ or ${}^{\circ}F = \frac{9}{5}{}^{\circ}C + 32$ etc.
  • The most important special case of unit conversion is the conversion to standard units, which we'll do after we have defined the $\mathbb{SD}$.

Non-linear transformations don't play well with expectation:

  • $\mathbb{E}(\log(X))$ is typically different from $\log(\mathbb{E}(X))$. In general $\mathbb{E}(g(X)) \neq g(\mathbb{E}(X))$ if $g$ is non-linear.
  • The most important non-linear transformation is the square. We'll soon see that $\mathbb{E}(X^2) \ge (\mathbb{E}(X))^2$.
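
A quick sanity check of both points, using a small made-up distribution (the values and probabilities below are just an example, not from the slides): expectation commutes with a linear transformation, but $\mathbb{E}(X^2)$ is not $(\mathbb{E}(X))^2$.

In [ ]:
# Hypothetical example distribution for X
vals = np.array([-1, 0, 2, 5])
p = np.array([0.1, 0.4, 0.3, 0.2])

ev = np.sum(vals * p)                 # E(X)

a, b = 3, -2
print(np.sum((a*vals + b) * p))       # E(aX + b)
print(a*ev + b)                       # a E(X) + b: the same number

print(np.sum(vals**2 * p))            # E(X^2)
print(ev**2)                          # (E(X))^2: a different (smaller) number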

Questions

When in doubt, try to use additivity.

Question 1: Which of the following is $\mathbb{E}(\mathbb{E}(X))$?

  • $X$
  • $\mathbb{E}(X)$
  • Not possible to say
Answer 1 $\mathbb{E}(X)$, because $\mathbb{E}(X)$ is a constant and the expectation of a constant is that constant.

Question 2: Suppose you know that $\mathbb{E}(X) = 2$ and $\mathbb{E}(X^2) = 13$. If possible, find $\mathbb{E}(X - 5)$.

Answer 2 $-3$, since $\mathbb{E}(X - 5) = \mathbb{E}(X) - 5 = 2 - 5 = -3$.

Question 3: Suppose you know that $\mathbb{E}(X) = 2$ and $\mathbb{E}(X^2) = 13$. If possible, find $\mathbb{E}(\vert X - 5 \vert)$.

Answer 3 Not possible: $\mathbb{E}(X)$ and $\mathbb{E}(X^2)$ don't determine $\mathbb{E}(\vert X - 5 \vert)$. See the numerical check after Answer 4.

Question 4: Suppose you know that $\mathbb{E}(X) = 2$ and $\mathbb{E}(X^2) = 13$. If possible, find $\mathbb{E}((X - 5)^2)$.

Answer 4 $18$: expand the square and use linearity: $\mathbb{E}((X-5)^2) = \mathbb{E}(X^2) - 10\mathbb{E}(X) + 25 = 13 - 20 + 25 = 18$.

Notice how squared loss is easier to work with mathematically than absolute loss.
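
Here is an illustrative check of why Question 3 has no answer (the two distributions below are made up for this purpose): both have $\mathbb{E}(X) = 2$ and $\mathbb{E}(X^2) = 13$, so both give $\mathbb{E}((X-5)^2) = 18$, but they give different values of $\mathbb{E}(\vert X - 5 \vert)$.

In [ ]:
# Two made-up distributions with the same E(X) = 2 and E(X^2) = 13

vals_1, probs_1 = np.array([-1, 5]), np.array([0.5, 0.5])
vals_2, probs_2 = np.array([1, 11]), np.array([0.9, 0.1])

for vals, probs in [(vals_1, probs_1), (vals_2, probs_2)]:
    print(np.sum(vals * probs),                # E(X) = 2 for both
          np.sum(vals**2 * probs),             # E(X^2) = 13 for both
          np.sum((vals - 5)**2 * probs),       # E((X-5)^2) = 18 for both
          np.sum(np.abs(vals - 5) * probs))    # E(|X-5|): 3.0 vs 4.2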

Variance and SD

To quantify how far a random variable can be from its mean, it's natural to start with the deviation from mean:

$$ D ~ = ~ X - \mathbb{E}(X) $$

You should show that $\mathbb{E}(D) = 0$.

The positive deviations exactly cancel out the negative deviations. So to get a sense of how big $D$ is, we have to ignore its sign somehow. That is why we look instead at the mean squared deviation $\mathbb{E}(D^2)$, which is called the variance of $X$.

  • Slide 28 is the random variable analog of the corresponding data definitions in Data 8.
  • The interpretations in Slide 29 correspond to the analogous discussion in Data 8. That discussion is worth reading.

It's worth writing out this obvious fact: variance is non-negative. Details:

  • $\mathbb{V}ar(X) = 0$ if and only if $X$ is a constant. For every other random variable, $\mathbb{V}ar(X) > 0$.
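
Here is a minimal numerical sketch of both facts, using an example distribution (the same one plotted in the cells below): the deviation $D$ has expectation $0$, and $\mathbb{E}(D^2)$ is non-negative.

In [ ]:
# Example distribution
vals = np.array([1, 2, 3])
probs = np.array([0.2, 0.5, 0.3])

ev = np.sum(vals * probs)         # E(X)
dev = vals - ev                   # possible values of D = X - E(X)

print(np.sum(dev * probs))        # E(D) = 0 (up to rounding)
print(np.sum(dev**2 * probs))     # Var(X) = E(D^2), non-negative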

Question

Question 5: $X$ has the uniform distribution on the values $-1$ and $1$. $Y$ has the uniform distribution on the values $-1$, $0$, and $1$. Which is bigger: $\mathbb{SD}(X)$ or $\mathbb{SD}(Y)$? Answer without calculation. [Try drawing (by hand) overlaid histograms of the two distributions.]

Answer 5 $\mathbb{SD}(X)$
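
If you want to confirm the picture with a calculation (optional; the helper and the values below are just for this check), note that $Y$ puts probability $\frac{1}{3}$ on $0$, which is exactly at the mean, so its spread is smaller.

In [ ]:
def sd(vals, probs):
    # SD as the square root of the mean squared deviation from the mean
    ev = np.sum(vals * probs)
    return np.sqrt(np.sum((vals - ev)**2 * probs))

print(sd(np.array([-1, 1]), np.array([1/2, 1/2])))           # SD(X) = 1.0
print(sd(np.array([-1, 0, 1]), np.array([1/3, 1/3, 1/3])))   # SD(Y) is about 0.816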

Alternative Calculation (Slide 30)

This is sometimes called the computational formula for $\mathbb{V}ar(X)$, but it can actually be pretty bad in terms of numerical accuracy if $X$ has large positive or negative values. The term is a holdover from the days when computation meant cranking out calculations by hand. It can also reduce the amount of algebra needed.

Before looking at the derivation, let's use the result, which is

$$ \mathbb{V}ar(X) ~ = ~ \mathbb{E}(X^2) - (\mathbb{E}(X))^2 $$

Consequences

  • Since variance is non-negative, $\mathbb{E}(X^2) \ge (\mathbb{E}(X))^2$. Equality holds if and only if $X$ is a constant.
  • If you know the expectation and variance, you can figure out the expected square: $\mathbb{E}(X^2) = \mathbb{V}ar(X) + (\mathbb{E}(X))^2$
  • This is particularly helpful if $\mathbb{E}(X) = 0$, because then $\mathbb{E}(X^2) = \mathbb{V}ar(X)$. This will be used for random errors that have mean 0.
  • Variance of an Indicator: If $I$ has the Bernoulli $(p)$ distribution, then $I^2 = I$ because 0 and 1 are equal to their squares. So $$ \mathbb{V}ar(I) ~ = ~ \mathbb{E}(I^2) - (\mathbb{E}(I))^2 ~ = ~ \mathbb{E}(I) - (\mathbb{E}(I))^2 ~ = ~ p - p^2 ~ = ~ p(1-p) $$

Note that result! It's going to get used.
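
Here is a quick numerical confirmation of the computational formula and the indicator result, for an arbitrary choice of $p$:

In [ ]:
# Indicator with the Bernoulli (p) distribution; p = 0.3 is an arbitrary choice
p = 0.3
vals_I, probs_I = np.array([0, 1]), np.array([1 - p, p])

ev_I = np.sum(vals_I * probs_I)                           # E(I) = p
var_by_definition = np.sum((vals_I - ev_I)**2 * probs_I)  # E((I - E(I))^2)
var_by_formula = np.sum(vals_I**2 * probs_I) - ev_I**2    # E(I^2) - (E(I))^2

print(var_by_definition, var_by_formula, p * (1 - p))     # all three are 0.21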

Questions

Question 6: Suppose $\mathbb{E}(X) = 10$ and $\mathbb{SD}(X) = 2$. Match the numbers below with $\mathbb{E}(X^2)$, $(\mathbb{E}(X))^2$, and $\mathbb{V}ar(X)$. You can use numbers more than once, and some will be left over.

  • $2, 4, 12, 14, 100, 102, 104$
Answer 6 $\mathbb{E}(X^2) = 104$, $(\mathbb{E}(X))^2 = 100$, $\mathbb{V}ar(X) = 4$

Question 7: Suppose you know that $\mathbb{E}(X^2) = 13$ and $\mathbb{V}ar(X) = 9$. Which of the following could $\mathbb{E}(X)$ be? Pick all the options that you think will work.

  • $-\sqrt{13}, -3, -2, 2, 3, \sqrt{13}$, none of the previous
Answer 7 $-2, 2$, since $(\mathbb{E}(X))^2 = \mathbb{E}(X^2) - \mathbb{V}ar(X) = 13 - 9 = 4$, so $\mathbb{E}(X) = \pm 2$.
In [3]:
YouTubeVideo("poYb0w7LhY8")
Out[3]:

Linear Transformation (Slide 31)

Yes, those again. This sequence of figures should explain why $\mathbb{SD}(aX+b) = \vert a \vert \mathbb{SD}(X)$.

Important consequence: the coefficient gets squared in the variance: $\mathbb{V}ar(aX) = a^2\mathbb{V}ar(X)$.

In [4]:
# Probability distribution of X

vals_X = np.arange(1, 4)
probs = np.array([0.2, 0.5, 0.3])
In [5]:
# Distribution of X

#bins_X = np.arange(0.5, 3.6, 1)
bins_X = np.arange(-9.5, 9.6)
def plot_dist_X():
    plt.hist(vals_X, bins=bins_X, weights=probs, ec='w')
    plt.xticks(range(-10, 11, 2))
    plt.xlim(-10, 10)
    
plot_dist_X()
In [6]:
# Distribution of X+4
# SD doesn't change

plot_dist_X()
shift_b = 4
plt.hist(vals_X+shift_b, bins=bins_X, weights=probs, ec='w');
In [7]:
# Distribution of 3X
# SD gets multiplied by 3

plot_dist_X()
scale_a = 3
plt.hist(scale_a*vals_X, bins=bins_X, weights=probs, ec='w');
In [8]:
# Distribution of -3X
# SD gets multiplied by 3

plot_dist_X()
scale_a = -3
plt.hist(scale_a*vals_X, bins=bins_X, weights=probs, ec='w');
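
A numerical companion to the figures above, reusing vals_X and probs from the earlier cells: adding a constant leaves the SD unchanged, while scaling by $a$ multiplies it by $\vert a \vert$.

In [ ]:
def sd(vals, probs):
    # SD as the square root of the mean squared deviation from the mean
    ev = np.sum(vals * probs)
    return np.sqrt(np.sum((vals - ev)**2 * probs))

print(sd(vals_X, probs))          # SD(X)
print(sd(vals_X + 4, probs))      # SD(X + 4): same as SD(X)
print(sd(3 * vals_X, probs))      # SD(3X) = 3 SD(X)
print(sd(-3 * vals_X, probs))     # SD(-3X) = 3 SD(X) as well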

Questions

Question 8: Suppose $\mathbb{V}ar(X) = \sigma^2$. Match the following with $\mathbb{SD}(X)$, $\mathbb{SD}(-X)$, and $\mathbb{V}ar(-X)$. You can use values more than once, and some will be left over.

$-\sigma^2$, $-\sigma$, $\sigma$, $\sigma^2$

Answer 8 $\mathbb{SD}(X) = \sigma = \mathbb{SD}(-X)$, $\mathbb{V}ar(-X) = \sigma^2$

Question 9: On a 10-question test where each question is graded as either Right or Wrong, a student guesses randomly according to some wild and weird scheme. Let $R$ be the number of questions the student gets Right and $W$ the number Wrong. Suppose $\mathbb{E}(R) = 3$ and $\mathbb{SD}(R) = 2$. Find $\mathbb{E}(W)$ and $\mathbb{SD}(W)$.

Answer 9 $7$ and $2$: since $W = 10 - R$, $\mathbb{E}(W) = 10 - \mathbb{E}(R) = 7$ and $\mathbb{SD}(W) = \vert -1 \vert \cdot \mathbb{SD}(R) = 2$.

Standard Units (Slide 33)

Work out every line of Slide 33, please. The results combine several of the earlier facts about variance and standard deviation, and are used frequently.
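
As a quick check of those results (the distribution below is just an example), converting to standard units $Z = \frac{X - \mathbb{E}(X)}{\mathbb{SD}(X)}$ gives $\mathbb{E}(Z) = 0$ and $\mathbb{SD}(Z) = 1$, by the linear transformation rules.

In [ ]:
# Example distribution
vals = np.array([1, 2, 3])
probs = np.array([0.2, 0.5, 0.3])

ev = np.sum(vals * probs)
sd_X = np.sqrt(np.sum((vals - ev)**2 * probs))

z_vals = (vals - ev) / sd_X                # X measured in standard units

print(np.sum(z_vals * probs))              # E(Z) = 0 (up to rounding)
print(np.sqrt(np.sum(z_vals**2 * probs)))  # SD(Z) = sqrt(E(Z^2)) = 1, since E(Z) = 0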