{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Plotly plotting support\n",
"import plotly.plotly as py\n",
"\n",
"# import cufflinks as cf\n",
"# cf.go_offline() # required to use plotly offline (no account required).\n",
"\n",
"import plotly.graph_objs as go\n",
"import plotly.figure_factory as ff\n",
"\n",
"# Make the notebook deterministic \n",
"np.random.seed(42)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Basic Modeling"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For this exercise we are going to use synthetic data to illustrate the basic ideas of model design. Notice here that we are generating data from a linear model with Gaussian noise."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"n = 100 # Number of records \n",
"\n",
"noise = 0.5 # Noise in observations (we wouldn't know this in real life)\n",
"m = 1.5 # The true slope (we wouldn't know this in real life) \n",
"b = 3.0 # The true intercept (we wouldn't know this in real life)\n",
"\n",
"# Make the data --------------------------\n",
"X = np.random.rand(n) * 4. - 2.\n",
"# The Y values are created using the secret model \n",
"# (We wouldn't have this in real-life either)\n",
"\n",
"Y = m * X + b + np.random.randn(n) * noise"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Visualize the data ---------------------\n",
"raw_data = go.Scatter(name = \"Data\", x = X, y = Y, mode = 'markers')\n",
"py.iplot([raw_data], filename=\"L19-p1-01\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Starting Simple\n",
"\n",
"\n",
"* We know how the data was created but let's imagine we did not.\n",
"* What is the simplest model for this data? \n",
"\n",
"Recall that we are trying to \"model\" the relationship between $x$ and $y$. Let's focus on the class of **parametric** models that are functional relationships defined by a set of **parameters** $\\theta$:\n",
"\n",
"$${\\large\n",
"f_\\theta: x \\rightarrow y\n",
"}\n",
"$$\n",
"\n",
"\n",
"\n",
"--- \n",
"\n",
"
"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The constant function:\n",
"\n",
"Here is a very simple model:\n",
"\n",
"$$ {\\large\n",
"f_\\theta(x) = \\theta\n",
"}\n",
"$$\n",
"\n",
"Notice that this model is defined by a single number $\\theta$. How should we determine $\\theta$? \n",
"\n",
"**What would be a good guess for $\\theta$**?\n",
"\n",
"1. 0\n",
"1. $\\bar{x} = \\frac{1}{n}\\sum_{i=1}^n x_i$\n",
"1. $\\bar{y} = \\frac{1}{n}\\sum_{i=1}^n x_i$\n",
"\n",
"\n",
"\n",
"
\n",
"\n",
"---\n",
"\n",
"**Proposal:** \n",
"\n",
"$${\\large\n",
"\\theta = \\bar{y} = \\frac{1}{n} \\sum_{i=1}^n y_i\n",
"}$$\n",
"\n",
"Why is this a good guess?\n",
"\n",
"We can define our model in Python:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Define our simple constant model:\n",
"def f_const(theta, x):\n",
" return theta \n",
"\n",
"# Guess a value for theta\n",
"theta_guess = np.mean(Y)\n",
"\n",
"# Make predictions for all of our training data\n",
"yhat = [f_const(theta_guess, x) for x in X]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Visualizing the Fit\n",
"\n",
"For this and the next lecture I am experimenting with Plotly. It allows me to build more interactive web-based graphics. (Let me know what you think.)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Visualize the data ---------------------\n",
"theta_guess_line = go.Scatter(\n",
" name=\"First Guess\", \n",
" x = X, y = yhat, \n",
" mode = 'lines', line = dict(color = 'red')\n",
")\n",
"py.iplot([raw_data, theta_guess_line], filename=\"L19-p1-02\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Is this a good approximation of the data? \n",
"\n",
"---\n",
"\n",
"
\n",
"\n",
"## Look at the Residuals"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Definethe residual lines segments, a separate line for each training point\n",
"residual_lines = [\n",
" go.Scatter(x=[x,x], y=[y,yhat],\n",
" mode='lines', showlegend=False, line=dict(color='black', width = 0.5))\n",
" for (x, y, yhat) in zip(X, Y, yhat)\n",
"]\n",
"# Combine the plot elements\n",
"py.iplot([raw_data, theta_guess_line] + residual_lines, filename=\"L19_p1_p02.1\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can compute the residuals by computing the difference between the predicted value and the true value:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"residuals = yhat - Y"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Plot.ly plotting code\n",
"py.iplot(go.Figure(\n",
" data = [dict(x=X, y=residuals, mode='markers')],\n",
" layout = dict(title=\"Residual Plot\", xaxis=dict(title=\"X\"), yaxis=dict(title=\"Residual\"))\n",
"), filename=\"L19_p1_p02.2\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What do we see in the above plot?\n",
"\n",
"----\n",
"
\n",
"\n",
"The residual has a clear linear dependence on $X$. This means that there is probably more that we could model. Let's come back to that.\n",
"\n",
"### The Residual Distribution\n",
"\n",
"The residual distribution seems to be pretty uniform and not necessarily concentrated around 0. However, it seems pretty symmetric. "
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"py.iplot(ff.create_distplot([residuals], group_labels=['Residuals']), filename=\"L19_p1_p02.3\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The residuals analysis suggests:\n",
"1. There is still linear structure that we can model. \n",
"1. We don't appear to have any strong bias in our predictor\n",
"\n",
"\n",
"We still haven't answered the key question:\n",
"\n",
"## How do we quantify how well a model fits the data? \n",
"\n",
"--- \n",
"\n",
"
"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Loss Functions and the Squared Loss\n",
"\n",
"To answer these questions we need to define a **loss function** to measure how well the model approximates the data. A very common loss function is the **squared loss** which measures the squared difference between what the model predicts $f_\\theta(x)$ and the observed response $y$. \n",
"\n",
"$${\\large\n",
"L(y, f_\\theta(x)) = \\left( y - f_\\theta(x) \\right)^2\n",
"}$$\n",
"\n",
"The **squared loss** is the foundation of the **least squares** method which is widely used in function approximation and modeling and can be applied to a wide range of models (not just linear models).\n",
"\n",
"\n",
"If we have many data points (and hopefully we do) then we define the loss over the entire dataset as:\n",
"\n",
"$${\\large\n",
"L_D(\\theta) = \\frac{1}{n} \\sum_{i=1}^n L(y_i, f_\\theta(x_i))\n",
"}$$\n",
"\n",
"Notice that in the above notation we have defined the loss $L_D(\\theta)$ as a function of only the parameters $\\theta$ and are assuming the model form $f$ and the data $D$ are fixed. If we consider the squared loss we get:\n",
"\n",
"$${\\large\n",
"L_D(\\theta) = \\frac{1}{n} \\sum_{i=1}^n \\left( y_i - f_\\theta(x_i) \\right)^2\n",
"}$$\n",
"\n",
"\n",
"--- \n",
"\n",
"
\n",
"We can plot the **loss function** for our simple model and our best guess"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Single point loss function\n",
"def sq_loss(y, pred):\n",
" return (y - pred)**2"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Loss over the entire dataset for a given thena\n",
"def loss(f, theta):\n",
" return np.mean([sq_loss(y, f(theta, x)) for (x,y) in zip(X,Y)])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can compute the loss for a bunch of different $\\theta$ values."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Generate a range of possible theta values (models)\n",
"theta_values = np.linspace(0, 5, 100)\n",
"\n",
"# Compute the loss of each\n",
"loss_values = [loss(f_const, theta) for theta in theta_values]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can then plot the loss on our data as function of $\\theta$"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"## Plotting code ---------\n",
"loss_curve = go.Scatter(name=\"Loss Function\", x = theta_values, y = loss_values, mode = 'lines')\n",
"best_guess = go.Scatter(name=\"Best Guess\", x = [theta_guess], y = [loss(f_const, theta_guess)], \n",
" mode = 'markers',\n",
" marker=dict(color=\"red\", size=20))\n",
"layout = go.Layout(xaxis=dict(title=r\"$\\theta$\"), yaxis=dict(title=r\"Loss Value\"))\n",
"fig = go.Figure(data = [loss_curve, best_guess ], layout = layout)\n",
"py.iplot(fig, filename=\"L19-p1-03\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Observation:** *The loss function is a smooth quadratic curve.*\n",
"\n",
"This is not an artifact of our data. In general, when we combine the squared loss with a linear model (more on this later) we will get nice quadratic loss curves (potentially in many dimensions).\n",
"\n",
"---\n",
"\n",
"
\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"## Minimize the Loss with Calculus\n",
"\n",
"\n",
"\n",
"\n",
"We guessed that best $\\theta$ value for the model $f_\\theta(x) = \\theta$ would be $\\theta = \\frac{1}{n} \\sum_{i=1}^n y_n$ (the red circle in the above figure). This was a good guess!! \n",
"\n",
"However, a more general procedure would be to use calculus to minimize the loss function. \n",
"\n",
"\n",
"\n",
"\n",
"**Beginners Convex Optimization Recipe:**\n",
"1. Take derivative of the loss function \n",
"1. Solve for paramter values by setting derivative equal to zero\n",
"\n",
"---\n",
"
"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Working through the calculations for our very simple constant function:\n",
"\n",
"\\begin{align*} \n",
"\\large\\frac{\\partial}{\\partial \\theta} L_D(\\theta) \n",
"& \\large{ = \\frac{1}{n} \\sum_{i=1}^n \\frac{\\partial}{\\partial \\theta} \\left( y_i - f_\\theta(x_i) \\right)^2 }\\\\\n",
"& \\large{= \\frac{1}{n} \\sum_{i=1}^n 2 \\left( y_i - f_\\theta(x_i) \\right) \\frac{\\partial}{\\partial \\theta} f_\\theta(x_i) }\\\\\n",
"& \\large{ = \\frac{1}{n} \\sum_{i=1}^n 2 \\left( y_i - \\theta \\right)}\n",
"\\end{align*}\n",
"\n",
"where the last line comes from:\n",
"\n",
"$$\\large\n",
"f_\\theta(x) = \\theta \\quad \\Rightarrow \\quad \\frac{\\partial}{\\partial \\theta} f_\\theta(x_i) = 1\n",
"$$\n",
"\n",
"Setting the derivative equal to zero we get:\n",
"\n",
"\n",
"\\begin{align*}\n",
"\\large 0 & \\large{= \\frac{1}{n} \\sum_{i=1}^n 2 \\left( y_i - \\theta \\right) }\\\\\n",
"\\large 0 &\\large{ = \\frac{2}{n} \\sum_{i=1}^n y_i - \\theta \\frac{2 n}{n} }\\\\\n",
"\\large{\\theta \\frac{2 n}{n}} & \\large{ =\\frac{2}{n} \\sum_{i=1}^n y_i } \\\\\n",
"\\large \\theta & = \\large \\frac{1}{n} \\sum_{i=1}^n y_i = \\bar{y}\n",
"\\end{align*}\n",
"\n",
"\n",
"Therefore the loss minimizing assignment to $\\theta$ is the average value of $y$."
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"# Modeling the Line\n",
"\n",
"You may recall from introductory math classes the equation of a line:\n",
"\n",
"$$\\large\n",
"y = m \\, x + b\n",
"$$\n",
"\n",
"a line is one of the simplest models we might use to represent the relationship between an input $x$ and an output $y$. We could write this as the model:\n",
"\n",
"$$\\large\n",
"f_{(m,b)}(x) = m \\, x + b\n",
"$$\n",
"\n",
"This is a **parametric model** parameterized by $m$ and $b$. Adopting our earlier notation we define $\\theta = (m, b)$ and rewrite our improved model:\n",
"\n",
"$$\\large\n",
"f_\\theta(x) = \\theta_0 x + \\theta_1\n",
"$$\n",
"\n",
"Now that we have a new model how should we select our parameters? We would like to minimize our loss but what does it look like. \n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Our new model\n",
"def f_line(theta, x):\n",
" return theta[0] * x + theta[1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Generate all combinations of $\\theta_0$ and $\\theta_1$ on a 20 by 20 grid."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# To visualize the loss I need to generate many theta0, theta1 pairs. \n",
"# This is done using meshgrid\n",
"(theta0,theta1) = np.meshgrid(np.linspace(0,3,20), np.linspace(1,5,20))\n",
"theta_values = np.vstack((theta0.flatten(), theta1.flatten())).T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Compute the loss for each $\\theta$"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Evaluating the loss for all theta values\n",
"loss_values = [loss(f_line, theta) for theta in theta_values]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Plot the $\\theta$ values"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Here I make a 3d surface plot. X, Y, and Z must all be _matrices_ \n",
"# corresponding to the x, y, z values for each point on the x,y plane \n",
"loss_surface = go.Surface(\n",
" x = theta0, y = theta1,\n",
" z = np.reshape(loss_values, theta1.shape)\n",
")\n",
"\n",
"# Axis labels\n",
"layout = go.Layout(\n",
" scene=go.Scene(\n",
" xaxis=go.XAxis(title='theta0'),\n",
" yaxis=go.YAxis(title='theta1'),\n",
" zaxis=go.ZAxis(title='loss'),\n",
" aspectratio=dict(x=1.,y=1., z=1.)\n",
" )\n",
")\n",
"fig = go.Figure(data = [loss_surface], layout = layout)\n",
"py.iplot(fig, filename=\"L19-p1-04\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Spin the above plot and guess what might be optimal values for $\\theta$ (and admire D3 WebGL enabled visualizations). \n",
"\n",
"**Notice:** *The plot is quadratic like the earlier 1-dimensional loss function.*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## Minimizing the Loss Function\n",
"\n",
"Let's now try to derive the minimizing values for our parameters:\n",
"\n",
"\\begin{align*}\n",
"\\large \\frac{\\partial}{\\partial \\theta_0} L_D(\\theta) \n",
"&\\large = \\frac{1}{n} \\sum_{i=1}^n \\frac{\\partial}{\\partial \\theta_0} \\left( y_i - f_\\theta(x_i) \\right)^2 \\\\\n",
"&\\large = \\frac{1}{n} \\sum_{i=1}^n 2 \\left( y_i - f_\\theta(x_i) \\right) \\frac{\\partial}{\\partial \\theta_0} f_\\theta(x_i) \\\\\n",
"&\\large = \\frac{1}{n} \\sum_{i=1}^n 2 \\left( y_i - \\left(\\theta_0 x_i + \\theta_1 \\right) \\right) x_i\n",
"\\end{align*}\n",
"\n",
"where the last line comes from \n",
"\n",
"$$ \\large\n",
"f_\\theta(x) = \\theta_0 x + \\theta_1 \\quad \\Rightarrow \\quad \\frac{\\partial}{\\partial \\theta_0} f_\\theta(x) = x\n",
"$$ \n",
"\n",
"Taking the derivative with respect to $\\theta_1$\n",
"\n",
"\\begin{align*}\n",
"\\large \\frac{\\partial}{\\partial \\theta_1} L_D(\\theta) \n",
"&\\large = \\frac{1}{n} \\sum_{i=1}^n \\frac{\\partial}{\\partial \\theta_1} \\left( y_i - f_\\theta(x_i) \\right)^2 \\\\\n",
"&\\large = \\frac{1}{n} \\sum_{i=1}^n 2 \\left( y_i - f_\\theta(x_i) \\right) \\frac{\\partial}{\\partial \\theta_1} f_\\theta(x_i) \\\\\n",
"&\\large = \\frac{1}{n} \\sum_{i=1}^n 2 \\left( y_i - \\left(\\theta_0 x_i + \\theta_1\\right) \\right)\n",
"\\end{align*}\n",
"\n",
"where the last line comes from \n",
"\n",
"$$ \\large\n",
"f_\\theta(x) = \\theta_0 x + \\theta_1 \\quad \\Rightarrow \\quad \\frac{\\partial}{\\partial \\theta_1} f_\\theta(x) = 1\n",
"$$ \n",
"\n",
"\n",
"\n",
"\n",
"Setting both derivatives equal to zero we get the following system of equations:\n",
"\n",
"\\begin{align*}\n",
"\\large 0 & \\large = \\frac{1}{n} \\sum_{i=1}^n 2 \\left( y_i - \\theta_0 x_i - \\theta_1 \\right) x_i \\\\\n",
"\\large 0 & \\large = \\frac{1}{n} \\sum_{i=1}^n 2 \\left( y_i - \\theta_0 x_i - \\theta_1 \\right)\n",
"\\end{align*}\n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can do some algebra to make things a bit clearer:\n",
"\n",
"\n",
"\\begin{align*}\n",
"\\large \\sum_{i=1}^n x_i y_i & \\large = \\theta_0 \\sum_{i=1}^n x_i x_i + \\theta_1 \\sum_{i=1}^n x_i \\\\\n",
"\\large \\sum_{i=1}^n y_i & \\large = \\theta_0 \\sum_{i=1}^n x_i + \\theta_1 n\n",
"\\end{align*}\n",
"\n",
"To simplify the math we can define constants (statistics computed from the data directly):\n",
"\n",
"\\begin{align*}\n",
"\\large C_1 & = \\large \\sum_{i=1}^n x_i y_i \\quad & \\large C_2 &= \\large \\sum_{i=1}^n x_i x_i \\\\\n",
"\\large C_3 & = \\large \\sum_{i=1}^n x_i \\quad & \\large C_4 & \\large = \\sum_{i=1}^n y_i \n",
"\\end{align*}\n",
"\n",
"Substituting the constants it becomes clear that we have a classic system of linear equations:\n",
"\n",
"\\begin{align*}\n",
"\\large C_1 & = \\large \\theta_0 C_2 + \\theta_1 C_3 \\\\\n",
"\\large C_4 & = \\large \\theta_0 C_3 + \\theta_1 n\n",
"\\end{align*}\n",
"\n",
"Which we can solve analytically. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## _Bonus:_ Solving Equations Symbolically\n",
"\n",
"However, because algebra is tedious and we are data scientists who use technology to tackle problems (and make our lives easier) let's make a symbolic algebra library do the tricky work for us:"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYMAAAAyBAMAAABMq6pFAAAAMFBMVEX///8AAAAAAAAAAAAAAAAA\nAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAv3aB7AAAAD3RSTlMAEImZRO/dMlQiu6vN\nZnZmcXX2AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAHAklEQVRoBdVab4hUVRQ/b2Z2Zsd1nUFRsH87\nIgWh6Kr5xawdBOtjFlIUhgomSJZDUvRB2AmxrKAdCoL+0CpBHxJCCQLZrPlUQUsu9aE/X5wSv1Tk\napKa2XbOvffcd+57d2bevmY/eGHf+93zO+d37nnvzbv3XQXAtnCYjjdee7vOY849XGN4Y50HHyub\nAY+1vQm3HZuYer19WZKWuH1EnJFxEkvP4NeJU8WKsUinN3dqY/Fv6S7xgscBDuJfmyZpidu4e80y\nTmLH+dwU9K+f0ibHqXhNGwcuOP5hJ/gA71P2amhwkaQldr0692ScxCrqNRN7E13rkYbqRZxGaspa\naqpT/DCfasttiRPaImmJ2/n77DJOYuXLJZzGKwmbdXjEabSlzKXjmo0dR+poKgzH7MYgaYnb+fvs\nMk5i5WtKyF6h3n4dHnEa0qMbalNC8S+KChp0zK1464/PCIRN0hKHHt2RjJNYR5oSxndQd6myRZ3M\n2NuVkPlXRQEcwF/EvI9gSBVjbABMZ86esBjgiOWTANagFIxJTzdTwuqW7i5/v2ydOFGXEvKXdej+\nTwCezzdhvKb75sj0PbABGEN/u1eDE2o7HEcpGJOebqaEfTpvpjHQtE6cqEsJfWo4AcCLAOWhKpwh\n4Vt2U/sSEdNrkWAMC5OVEHxBKntrNg5TMCY9bCsnJ5+YnPwa0fYy9cv5RuGqdeJEXUrIT1PoS6oE\nGAP4lLphszTdBXb9OVkJrGLjsASLY3eBvPtr+WrxUuhkEnUpQc8Iw7qEjQDX8YaIZungETN5DEPh\nhdmVYDWwBItRTzfzIKkZIYsmfErZiRN1KaF4EcMyeBdRHx5CgYxWNkdL/94CxplgdiVwHKWwGPV0\n4zdSBbvL8C8fT9SlBBjDeX0NhpL+Jcg0v0csGtN04w3+cZYlcJy6SlJPpTEl9F/HZQ5eSViOf5FE\nbgm3TonhKRhsmLiLAJaQ2wHFU3VltQemYbwOGgctfwlxaRZhDbrRjElPNVMCLDr2cRUNg/hMRxM5\nJeSOHlRxngPpt2345I5z8f3r1v3JWAR0kGavMIXQ4xKM0ya4n91tIqeEZ8ulMntEzocifbe7B86F\ngUXfb6GDNEuJFKGenpDZpfDV4aOM8cnWiZwSnoJSPfSQ6I19kaWFJAGe+eVkaPjpWj3sMGovzR4y\nhaPHDnjum5mZDrsmkVnfjVeQGdwCJc8zEMakR3MoDaUjalxnqnjKD8NoXXV7fphDaZivb8xuGvQD\nVThf5dE/yaAn5zmUBnrh4uR1lY6fT0xshwWrKoSDbXTsWVPStBbF1mNpHHcDVW+vkvZegH/gW3iV\ncI+bkqa16Fy0ga249DyhlC9C4QrsAvlinUnXxGtDj1lJq/nX1pBOORKlEx1qwZI1JBxcgeyW4DL0\n1W2aXgElrRdavZKUOs9VANbX0BJcglIL1+P5KUnPBtM3lbcp6R6UkNmgLnYkRwYfJMiqj7OtcAcU\nLtBiMF1T31T+UJLuQQl3Rj9YVLbVdANAvVTvLT4K9FXUQkOa14b6piI1bENNdeIDSXMJaaSNzh4Y\nbbCkPef0972aDxadLQP+FkpVYlPMC+qbyijndxqgTyTNJaSRNmK7YLxqYHiaN62wWmAo5L6R2NHZ\n0WRjeGZafPm/HLIWhWtRa7KAJazBBUyP0aVwm9nGM9tJyJ31zgvnxI6mK6B6TIufUcXjJtaiMZYl\nYoQ2MP10nHdWqkRn1np+9HJHM64BlqZvKt2CFqPwLNeioVUjKxElXDozHOdjJcRd0HKabt9mL0VG\nptU3lfYabOvsJ1jCz9oM33n4RCU4O5pxEUuLb6q4V0eLlfB7MZ2r5Goxj0QljO+guKUq2LOzyrT7\nTaW8kx5Ygvw3nXzvt5YbyPSBw6/Ef86JSpA7mp6dVabdbyp3EF16LEHz+/CHrUCtnMMYprfPzIRG\nRolKkDuanp1VQ7NkmrPIUKjvhaJaLYRCHTMkKkHuaHp2Vg0dZpw9EhkCXFANTqMEb7oi7JghUQn7\naExmR9Ozs2poWHV3/EGlyARNZuifhoGmG9MxQ6ISRhqoaHY0YWNsZ9XQ2ansETdz8p7MkG3iLoR7\nMTpmSFSCWn4swwHR7BvfWTV0tqLuf/JxC0+Zoa8CY+WKIAE6ZkhUgtzR9OysWjr9XbASOL/jSm7l\nYM0pwdK+DKYEs53kxImO2NH07awaGha7iYVAVygy3Afwzg+RgE4ZzPqurxmJ8Xa7zL4LvvFGzcaY\nJsNoRWUYiMwl3rRdZ1/fbfYqtTOmynBe33v+XwDttJW96+xbVP863FGjM5kmQ8D/9+J8q7N4AnZx\nraj21RK4pnTxZljyoFEbvOa+iVMkyTcWsFqK6CQhvgz92+zAbz6eRKSTT2HVikYn/v9zvgzvTpHu\nfxlEwG3H8Zq9AAAAAElFTkSuQmCC\n",
"text/latex": [
"$$\\left \\{ \\theta_{0} : \\frac{C_{1} n - C_{3} C_{4}}{C_{2} n - C_{3}^{2}}, \\quad \\theta_{1} : \\frac{- C_{1} C_{3} + C_{2} C_{4}}{C_{2} n - C_{3}^{2}}\\right \\}$$"
],
"text/plain": [
"⎧ C₁⋅n - C₃⋅C₄ -C₁⋅C₃ + C₂⋅C₄⎫\n",
"⎪θ₀: ────────────, θ₁: ──────────────⎪\n",
"⎨ 2 2 ⎬\n",
"⎪ C₂⋅n - C₃ C₂⋅n - C₃ ⎪\n",
"⎩ ⎭"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import sympy\n",
"\n",
"# Define the variables (symbols) that we will use in our equation\n",
"theta0 = sympy.Symbol(\"theta0\")\n",
"theta1 = sympy.Symbol(\"theta1\")\n",
"c1 = sympy.Symbol(\"C1\")\n",
"c2 = sympy.Symbol(\"C2\")\n",
"c3 = sympy.Symbol(\"C3\")\n",
"c4 = sympy.Symbol(\"C4\")\n",
"n = sympy.Symbol(\"n\")\n",
"\n",
"# Solve the system of equations (eq = 0) for theta0 and theta1\n",
"theta_hats_symb = sympy.solve(\n",
" [\n",
" theta0 * c2 + theta1 * c3 - c1, # = 0\n",
" theta0 * c3 + theta1 * n - c4 # = 0\n",
" ], \n",
" [theta0, theta1])\n",
"\n",
"\n",
"# Print the answer (so pretty)\n",
"sympy.init_printing()\n",
"theta_hats_symb"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Okay, so the above equation was pretty easy. However, as we move to more complex models we will quickly find that both computing the derivatives and even solving them will require sophisticated analytic and numeric machinery. \n",
"\n",
"**The combination of symbolic differentiation and numerical optimization have been at the heart of recent advances in AI.**\n",
"\n",
"--- \n",
"\n",
"
"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now compute the actual numerical values for the algebraic expressions above. Recall:\n",
"\n",
"\\begin{align*}\n",
"\\large C_1 & = \\large \\sum_{i=1}^n x_i y_i \\quad & \\large C_2 &= \\large \\sum_{i=1}^n x_i x_i \\\\\n",
"\\large C_3 & = \\large \\sum_{i=1}^n x_i \\quad & \\large C_4 & \\large = \\sum_{i=1}^n y_i \n",
"\\end{align*}"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ 1.44252835 2.99260477]\n"
]
}
],
"source": [
"# Compute the actual numerical values for all the constants\n",
"subs = {\n",
" c1: np.sum(X * Y),\n",
" c2: np.sum(X * X),\n",
" c3: np.sum(X),\n",
" c4: np.sum(Y),\n",
" n: len(X)\n",
"}\n",
"\n",
"# For each theta value substitute the numerical values for the constans \n",
"# and evaluate the expression\n",
"theta_hat = np.array([float(theta_hats_symb[theta0].evalf(subs=subs)), \n",
" float(theta_hats_symb[theta1].evalf(subs=subs))])\n",
"print(theta_hat)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We have just computed the \"optimal\" parameters for our original model. Returning to our earlier plot lets place the solution on the loss suface:"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"theta_hat_point = go.Scatter3d(\n",
" x = [theta_hat[0]],\n",
" y = [theta_hat[1]],\n",
" z = [loss(f_line, theta_hat)],\n",
" mode = 'markers', \n",
" marker = dict(color='red')\n",
")\n",
"loss_surface.opacity = 0.9\n",
"fig = go.Figure(data = [loss_surface, theta_hat_point], layout = layout)\n",
"py.iplot(fig, filename=\"L19-p1-05\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also plot the line defined by these parameters:"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Define my model:\n",
"yhat = [f_line(theta_hat, x) for x in X]\n",
"\n",
"# Visualize the data ---------------------\n",
"theta_line = go.Scatter(name=\"Linear Model\", x = X, y = yhat, mode = 'lines',\n",
" line = dict(color = 'red')\n",
")\n",
"theta_guess_line.line.color = \"pink\"\n",
"residual_lines = [\n",
" go.Scatter(x=[x,x], y=[y,yhat],\n",
" mode='lines', showlegend=False, line=dict(color='black', width = 0.5))\n",
" for (x, y, yhat) in zip(X, Y, yhat)\n",
"]\n",
"py.iplot([raw_data, theta_guess_line, theta_line] + residual_lines, filename=\"L19-p1-06\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Is this a better fit?\n",
"\n",
"---\n",
"
\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"residuals = yhat - Y\n",
"# Plot.ly plotting code\n",
"py.iplot(go.Figure(\n",
" data = [dict(x=X, y=residuals, mode='markers')],\n",
" layout = dict(title=\"Residual Plot\", xaxis=dict(title=\"X\"), yaxis=dict(title=\"Residual\"))\n",
"), filename=\"L19_p1_p07\")"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"py.iplot(ff.create_distplot([residuals], group_labels=['Residuals'],bin_size=0.2), \n",
" filename=\"L19_p1_p08\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"---\n",
"\n",
"# Quick Review\n",
"\n",
"\n",
"1. We introduced **parametric functions** as a basic modeling primitive\n",
"1. We introduced the **loss function** and in particular the **squared** loss to determine how well our model approximates our data.\n",
"1. We then used calculus to determine the model parameters that **minimize the loss function**\n",
"\n",
"We will now generalize these ideas to the **parametric** function of a hyperplane. \n",
"\n",
"
\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"# Linear Model Family\n",
"\n",
"The **linear model** is a generalization of our earlier two dimensional $y = m x + b$ model:\n",
"\n",
"$$ \\large\n",
"f_\\theta(x) = \\sum_{j=1}^p \\theta_j x_j\n",
"$$\n",
"\n",
"**Note:**\n",
"1. This is the model for a **single data point** $x$\n",
"1. The data point is **$p$-dimensional**\n",
"1. The subscript $j$ is indexing each of the $p$ dimensions\n",
"\n",
"To simplify the presentation we will use the following vector notation:\n",
"\n",
"$$ \\large\n",
"f_\\theta(x) = x^T \\theta\n",
"$$\n",
"\n",
"You can see this in the following figure:\n",
"\n",
"\n",
"\n",
"As we will see, shortly, this is a **very expressive parametric model**. \n",
"\n",
"In previous lectures we derived the **Least Squares** parameter values for this model. Let's derive them once more."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Matrix Notation\n",
"\n",
"Before we proceed we need to define some new notation to describe the entire dataset. We introduce the design (covariate) matrix $X$ and the response matrix (vector) $Y$ which encode the data:\n",
"\n",
"\n",
"\n",
"**Notes:**\n",
"1. The **rows** of $X$ correspond to records (e.g., users in our database)\n",
"1. The **columns** of $X$ correspond to **features** (e.g., the age, income, height).\n",
"1. **CS and Stats Terminology Issue (p=d):** In statistics we use $p$ to denote the number of columns in $X$ which corresponds to the number of *parameters* in the corresponding linear model. In computer science we use $d$ instead of $p$ to refer to the number of columns. The $d$ is in reference to the *dimensionality* of each record. You can see the differences in focus on the model and the data. If you get confused just flip the paper (or your computer) upside down and it will make more sense.\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# The Least Squares Loss Objective\n",
"\n",
"We can write the loss using the matrix notation:\n",
"\n",
"\\begin{align}\n",
"\\large L_D(\\theta) \n",
"& \\large = \n",
"\\frac{1}{n}\\sum_{i=1}^n \\left(Y_i - f_\\theta(X_i)\\right)^2 \\\\\n",
"& \\large = \n",
"\\frac{1}{n}\\sum_{i=1}^n \\left(Y_i - X_i \\theta \\right)^2 \\\\\n",
"& \\large = \n",
"\\frac{1}{n}\\left(Y - X \\theta \\right)^T \\left(Y - X \\theta \\right) \n",
"\\end{align}\n",
"\n",
"Note that the last line $X_i \\theta$ is the dot product of the \n",
"\n",
"\n",
"\n",
"We can further simply the last expression by expanding the product:\n",
"\n",
"\\begin{align} \\large\n",
"L_D(\\theta) \n",
"& \\large = \n",
"\\frac{1}{n}\\left(Y - X \\theta \\right)^T \\left(Y - X \\theta \\right) \\\\\n",
"& \\large = \n",
"\\frac{1}{n}\\left(Y^T \\left(Y - X \\theta \\right) - \\left(X \\theta\\right)^T \\left(Y - X \\theta \\right) \\right) \\\\\n",
"& \\large = \n",
"\\frac{1}{n}\\left( \n",
" Y^T Y - Y^T X \\theta - \\left(X \\theta \\right)^T Y + \\left(X \\theta \\right)^T X \\theta \n",
"\\right) \\\\\n",
"& \\large =\n",
"\\frac{1}{n} \\left( \n",
" Y^T Y - 2 Y^T X \\theta + \\theta^T X^T X \\theta \n",
"\\right)\n",
"\\end{align}\n",
"\n",
"**Note:** Because $\\left(X \\theta \\right)^T Y$ is a scalar it is equal to its transpose $Y^T X \\theta$ \n",
"and therefore $- Y^T X \\theta - \\left(X \\theta \\right)^T Y = -2 Y^T X \\theta$.\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"
\n",
"\n",
"# Computing the Loss Minimizing $\\theta$ \n",
"Recall our goal is to compute:\n",
"\n",
"\\begin{align} \\large\n",
"\\hat{\\theta} = \\arg \\min_\\theta L(\\theta)\n",
"\\end{align}\n",
"\n",
"\n",
"Which we can compute by taking the **gradient** of the loss and setting it equal to zero.\n",
"\n",
"\n",
"\\begin{align} \\large\n",
"\\nabla_\\theta L(\\theta) \n",
"& \\large =\n",
"\\frac{1}{n} \\left( \n",
" \\nabla_\\theta Y^T Y - \\nabla_\\theta 2 Y^T X \\theta + \\nabla_\\theta \\theta^T X^T X \\theta \n",
"\\right) \\\\\n",
"& \\large =\n",
"\\frac{1}{n} \\left( \n",
" 0 - 2 X^T Y + 2 X^T X \\theta \n",
"\\right) \n",
"\\end{align}\n",
"\n",
"The above gradient derivation uses the following identities:\n",
"1. $\\large \\nabla_\\theta \\left( A \\theta \\right) = A^T$\n",
"1. $\\large \\nabla_\\theta \\left( \\theta^T A \\theta \\right) = A\\theta + A^T \\theta$ and $\\large A = X^T X$ is symmetric\n",
"\n",
"Setting the gradient equal to zero we get the famous **Normal Equations**:\n",
"\n",
"$$\\large\n",
"X^T X \\theta = X^T Y\n",
"$$\n",
"\n",
"\n",
"\n",
"$$\\large\n",
" \\theta = \\left(X^T X\\right)^{-1} X^T Y\n",
"$$\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Connection to the MLE\n",
"\n",
"\n",
"We can write a probabilistic model of the data:\n",
"\n",
"$$ \\large\n",
"Y = x^T \\theta + \\epsilon_\\text{noise}\n",
"$$\n",
"\n",
"where we assume:\n",
"\n",
"$$ \\large\n",
"\\epsilon_\\text{noise} \\sim \\textbf{Normal}\\left(0, \\sigma_\\text{noise} \\right)\n",
"$$\n",
"\n",
"which implies:\n",
"\n",
"\\begin{align} \\large\n",
"Y \\sim \\textbf{Normal}\\left(x^T \\theta, \\sigma_\\text{noise} \\right)\n",
"\\end{align}\n",
"\n",
"If we assume the data are **i**ndependent and **i**dentically **d**istributed (iid) according to this model then the Maximum Likelihood parameter $\\theta$ can be written as:\n",
"\n",
"\\begin{align}\\large\n",
"\\hat{\\theta}_\\text{MLE} \n",
"&\\large = \\arg \\max_\\theta \\log \n",
"\\left( \\prod_{i=1}^n N\\left(y_i; \\,\\, x_i^T \\theta, \\, \\sigma_\\text{noise} \\right) \\right) \\\\\n",
"&\\large = \\arg \\min_\\theta -\\log \n",
"\\left( \\prod_{i=1}^n N\\left(y_i; \\,\\, x_i^T \\theta, \\, \\sigma_\\text{noise} \\right) \\right) \\\\\n",
"&\\large =\\arg \\min_\\theta \n",
"\\left( \\sum_{i=1}^n -\\log N\\left(y_i; \\,\\, x_i^T \\theta, \\, \\sigma_\\text{noise} \\right) \\right) \\\\\n",
"&\\large =\\arg \\min_\\theta \n",
"\\left( \\sum_{i=1}^n -\\log \\left( \\frac{1}{\\sqrt{2\\sigma_\\text{noise}^2\\pi}} \n",
"\\exp \\left(- \\frac{\\left(y_i - x_i^T \\theta\\right)^2}{2\\sigma_\\text{noise}^2} \\right) \\right) \\right) \\\\\n",
"&\\large =\\arg \\min_\\theta \n",
" \\sum_{i=1}^n \n",
" \\frac{\\left(y_i - x_i^T \\theta\\right)^2}{2\\sigma_\\text{noise}^2} \\\\\n",
"&\\large =\\arg \\min_\\theta \n",
"\\sum_{i=1}^n \\left(y_i - x_i^T \\theta\\right)^2\\\\\n",
"&\\large = \\hat{\\theta}_\\text{LS}\n",
"\\end{align}\n",
"\n",
"**Notice:** \n",
"1. did not need to assume $f_\\theta(x) = x^T\\theta$\n",
"1. we did need to assume **additive iid Gaussian noise** (reasonable?)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"
\n",
"\n",
"\n",
"# The Normal Equation\n",
"\n",
"The normal equations define the least squares optimal parameter value $\\hat{\\theta}$ for the linear regression model:\n",
"\n",
"$$\\large\n",
" \\hat{\\theta} = \\left(X^T X\\right)^{-1} X^T Y\n",
"$$\n",
"\n",
"and have the pictorial interpretation of the matrices:\n",
"\n",
"\n",
"\n",
"It is worth noting that for common settings where $n >> p$ (i.e., there are many records and only a few dimensions) the matrix $X^T X$ and $X^T Y$ are small relative to the data. That is they summarize the data.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Working with the Normal Equation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we have the normal equation let's apply it to compute the optimal $\\theta$ for our toy dataset. "
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1.1904575072\n"
]
}
],
"source": [
"# Because X is 1 dimensional ... we need to use scalar inversion\n",
"theta_hat = (X.T @ X)**-1 * (X.T @ Y)\n",
"print(theta_hat)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Plotting our least-squares regression line:"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"yhat = X * theta_hat\n",
"\n",
"# Visualize the data ---------------------\n",
"normal_equation_line = go.Scatter(name=\"Normal Equation\", x = X, y = yhat, mode = 'lines',\n",
" line = dict(color = 'red')\n",
")\n",
"py.iplot([raw_data, normal_equation_line], filename=\"L19_p1_p09\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Stuck on the origin! Why?**\n",
"\n",
"---\n",
"
\n",
"\n",
"\n",
"# The Bias (Intercept) Term\n",
"Remember our general linear model: \n",
"\n",
"$$ \\large\n",
"f_\\theta(x) = \\sum_{j=1}^p \\theta_j x_j\n",
"$$\n",
"\n",
"doesn't have the **\"bias\"** term $b$ we had in our equation for a line:\n",
"\n",
"$$ \\large\n",
"y = m x + b\n",
"$$ \n",
"\n",
"\n",
"**Can we manipulate the _data_ to add a bias term to our model?**\n",
"\n",
"
\n",
"\n",
"---\n",
"\n",
"We begin our journey into **feature engineering** by adding an extra **constant** (typically 1) to each record:\n",
"\n",
"\n",
"\n",
"This extra **feature** will allow our general linear model to represent a **bias** term.\n",
"\n",
"We typically denote feature transformation $\\phi(X)$ (**fea**ture transformation?). For notational convenience I will occasionally write $\\Phi = \\phi(X)$.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([[-0.50183952, 1. ],\n",
" [ 1.80285723, 1. ],\n",
" [ 0.92797577, 1. ],\n",
" [ 0.39463394, 1. ],\n",
" [-1.37592544, 1. ]])"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def phi(x):\n",
" return np.array([x, 1.])\n",
"\n",
"Phi = np.array([phi(x) for x in X]) # more efficient: Phi = np.vstack((X, np.ones(n))).T\n",
"Phi[:5,]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using the normal equations again:"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ 1.44252835 2.99260477]\n"
]
}
],
"source": [
"theta_hat = np.linalg.inv(Phi.T @ Phi) @ Phi.T @ Y\n",
"print(theta_hat)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## A better way to solve\n",
"\n",
"A more numerically stable and efficient way to compute $A^{-1} b$ is to use `np.linalg.solve` which computes the solution to $A \\theta = b$ rather than first computing inverse of $A$ and then multiplying by $b$."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ 1.44252835 2.99260477]\n"
]
}
],
"source": [
"theta_hat = np.linalg.solve(Phi.T @ Phi, Phi.T @ Y)\n",
"print(theta_hat)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Plotting the fit once more. Notice this time `yhat` (i.e., $\\hat{y}$) is computed using `Phi` (i.e., $\\Phi$) instead of `X`."
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"yhat = Phi @ theta_hat # Phi instead of X\n",
"\n",
"# Visualize the data ---------------------\n",
"normal_equation_line2 = go.Scatter(name=\"Normal Equation\", x = X, y = yhat, mode = 'lines',\n",
" line = dict(color = 'red')\n",
")\n",
"py.iplot([raw_data, normal_equation_line2], filename=\"L19_p1_p10\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Success! **\n",
"\n",
"---\n",
"\n",
"## Moving to a Higher Dimensional Plane\n",
"\n",
"Something interesting just happened. The linear model went from **one** dimension to **two** dimensions. We added a **feature** (i.e., a dimension) to each record. In a sense we took points on the `(x,y)` plane above and moved them into 3-dimensions: `(x, 1, y)`.\n",
"\n",
"We can actually plot the data in 3 dimensions and see the corresponding plane of best fit:"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"raw3d = go.Scatter3d(\n",
" name = \"Raw Data\",\n",
" x = Phi[:,0],\n",
" y = Phi[:,1],\n",
" z = Y,\n",
" mode = 'markers',\n",
" marker = dict(size=3)\n",
")\n",
"\n",
"(u,v) = np.meshgrid(np.linspace(-3,3,20), np.linspace(-3,3,20))\n",
"coords = np.vstack((u.flatten(),v.flatten())).T\n",
"ycoords = coords @ theta_hat\n",
"\n",
"fit_plane = go.Surface(\n",
" name = \"Fitting Hyperplane\",\n",
" x = np.reshape(coords[:,0], (20,20)),\n",
" y = np.reshape(coords[:,1], (20,20)),\n",
" z = np.reshape(ycoords, (20,20)),\n",
" opacity = 0.8,\n",
" cauto = False,\n",
" showscale = False,\n",
" colorscale = [[0, 'rgb(255,0,0)'], [1, 'rgb(255,0,0)']]\n",
")\n",
"\n",
"origin = go.Scatter3d(\n",
" name = \"Origin (0,0,0)\",\n",
" x = [0],\n",
" y = [0],\n",
" z = [0],\n",
" mode = 'markers', \n",
" marker = dict(color='black')\n",
")\n",
"\n",
"data_plane = go.Mesh3d(\n",
" name = \"2D Plot\",\n",
" x = [-3,3,3,-3],\n",
" y = [1,1,1,1],\n",
" z = [-10,-10,10,10],\n",
" i = [0,1],\n",
" j = [1,2],\n",
" k = [3,3],\n",
" opacity = 0.5,\n",
" color = 'gray'\n",
")\n",
"\n",
"layout = go.Layout(\n",
" scene=go.Scene(\n",
" xaxis=go.XAxis(title='X'),\n",
" yaxis=go.YAxis(title='Bias'),\n",
" zaxis=go.ZAxis(title='Y'),\n",
" camera=dict(eye=dict(x=0, y=-1, z=0))\n",
" )\n",
")\n",
"\n",
"loss_surface.opacity = 0.9\n",
"fig = go.Figure(data = [raw3d, fit_plane, origin, data_plane], layout = layout)\n",
"py.iplot(fig, filename=\"L19_p1_p11\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Spin the above plot around, because it is pretty and also you will learn something.\n",
"\n",
"**Notice:** \n",
"1. The **red** plane is the plane corresponding to the model which passes through the points.\n",
"1. The **red** plane still passes through the origin (the big black circle).\n",
"1. The **gray** plane is the plane corresponding to the 2D plot in which the data originally lived. \n",
"1. The intersection of the **red** and **gray** planes is the line of best fit with the bias term added!\n",
"\n",
"You will notice that our data lives on the plane 3 dimensions and that the plane (in red) passes through the origin. \n",
"\n",
"This is a key theme with linear regression. That is by transforming the input we can often make complex data live on a plane that passes through the origin."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## A Note on the Bias Term"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The **bias** term can be crucial for modeling data. Most software packages will have an option to add a bias term to the model. This is often implemented by transforming the input data though one has to be careful about adding how the bias term is treated in certain model tuning procedures (e.g., regularization which we will cover shortly)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Linear Regression Software"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In practice it is generally better to use existing software packages for linear regression. In Python, [scikit-learn](http://scikit-learn.org/stable/modules/linear_model.html#ordinary-least-squares) is the standard package for regression. \n",
"\n",
"![scikit-learn logo](http://scikit-learn.org/stable/_static/scikit-learn-logo-small.png)\n",
"\n",
"Here we will take a very brief tour of how to use scikit-learn for regression. Over the next few weeks we will use scikit-learn for a range of different task."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Fit a Model"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Slope: [ 1.44252835]\n",
"Intercept: 2.99260477199\n"
]
}
],
"source": [
"from sklearn import linear_model\n",
"reg = linear_model.LinearRegression(fit_intercept=True)\n",
"\n",
"reg.fit(np.array([X]).T, Y)\n",
"print(\"Slope:\", reg.coef_)\n",
"print(\"Intercept:\", reg.intercept_)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Make Predictions"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"yhat = reg.predict(np.array([X]).T)\n",
"py.iplot([raw_data, dict(x=X, y=yhat, name=\"Scikit Fit\")], filename=\"L19_p1_p12\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next we will explore the art and science of Featue Engineering."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:ds100]",
"language": "python",
"name": "conda-env-ds100-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}