Syllabus
This syllabus is still under development and is subject to change.
Week  Lecture  Date  Topic  Lab  Discussion  Homework 

1  1  8/23/18 
Course Overview, Data Design and Sources of Bias [slides]
In this lecture we provide an overview of what data science is at its root and the components that make data science a large field with endless possibilities. Fundamentally, (data) science is the study of using data to learn about the world and solve problems. However, how and what data is collected can have a profound impact on what we can learn and the problems we can solve. Along the way, we will also touch on what it means to be a data scientist by examining recent surveys of data scientists. We will begin to explore various mechanisms for data collection and their implications on our ability to generalize. In particular, we will discuss differences between censuses, surveys, controlled experiments, and observational studies and will also highlight the power of simple randomization and the fallacies of data at scale. Welcome to Data 100!




2  2  8/28/18 
Data Manipulation with Pandas I [slides]
While data comes in many forms, most data analyses are performed on tabular data. Mastering the skills of constructing, cleaning, joining, aggregating, and manipulating tabular data is essential to data science. In this lecture we will introduce Pandas, the open-source Python data manipulation and analysis library widely used by data scientists. As we introduce useful Pandas operations and paradigms, we will also bring to light new concepts including indices, column operations (and their effect on system performance), grouping operations, and basic data visualization tools built to accompany Pandas.
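A minimal sketch of the kinds of operations this lecture covers (the table and column names here are hypothetical, not the lecture's actual dataset):

```python
import pandas as pd

# A small hypothetical table of restaurant inspection scores.
df = pd.DataFrame({
    "name": ["Cafe A", "Cafe A", "Diner B", "Diner B", "Diner B"],
    "year": [2017, 2018, 2017, 2018, 2018],
    "score": [90, 94, 78, 85, 88],
})

# Boolean filtering selects a subset of rows.
recent = df[df["year"] == 2018]

# Grouping and aggregation: the split-apply-combine paradigm.
mean_scores = df.groupby("name")["score"].mean()
print(mean_scores)
```

The `groupby` call is the workhorse of tabular analysis: it splits the table by a key, applies an aggregate to each group, and combines the results into a new indexed series.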



3  8/30/18 
Data Manipulation with Pandas II [slides]
Continued discussion of material in the previous lecture.


3  4  9/4/18 
Data Cleaning & EDA [slides]
Whether collected by you or obtained from someone else, raw data is seldom ready for immediate analysis. Data cleaning is an important skill every data scientist should master and it starts with understanding key aspects of the data. Through exploratory data analysis we can often discover important anomalies, identify limitations in the collection process, and better inform subsequent goal-oriented analysis. In this lecture we will discuss how to identify and correct common data anomalies and analyze their implications on future analysis. We will also discuss key properties of data including structure, granularity, faithfulness, temporality, and scope; these properties can inform how we prepare, analyze, and visualize data.
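As a small illustration of the kinds of anomalies the lecture discusses, here is a hypothetical raw table with a duplicated row, a missing value, and a sentinel code standing in for missing data:

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with common anomalies.
raw = pd.DataFrame({
    "city": ["Berkeley", "Berkeley", "Oakland", "Albany"],
    "population": [121000, 121000, 433000, -999],  # -999 used as a missing-data code
    "area": [10.5, 10.5, 78.0, np.nan],
})

# Remove exact duplicate rows, then recode the sentinel value as missing.
clean = raw.drop_duplicates()
clean = clean.replace({"population": {-999: np.nan}})

# A first EDA step: count missing values per column.
print(clean.isna().sum())
```

Spotting sentinel codes like `-999` requires looking at the data's distribution first; blindly averaging the raw column would silently bias the result downward.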


Lab2 

5  9/6/18 
EDA and Visualization [slides]
In this lecture we will continue our discussion of EDA and important features we should be identifying when given a dataset. Along the way, we will start to work through a real-world exercise in EDA using public crime data for the city of Berkeley. Through this, we will also introduce tools for data visualization using Pandas, Seaborn, and Matplotlib.


4  6  9/11/18 
Visualization and Data Transformations [slides]
A large fraction of the human brain is devoted to visual perception. As a consequence, visualization is a critical tool in both exploratory data analysis and the communication of complex relationships in data. However, making informative and clear visualizations of complex concepts can be challenging. In this lecture we explore good and bad visualizations and describe how to choose visualizations for various kinds of data and goals. We will also go into detail on how to identify issues with certain visualizations and ways to fix these issues to properly convey the message you are trying to show.

However, in some cases, directly visualizing data can be uninformative. Some examples of these cases include plots with curvilinear relationships, large numbers of similar observations hiding core trends in the data, and visualizing data with a large number of variables. In this lecture we discuss data transformations, smoothing, and dimensionality reduction to address the challenges in creating informative visualizations. The Tukey-Mosteller Bulge Diagram will come in handy when talking about transformations and is a great tool for identifying when data needs to be transformed. With these additional analytics we can often reveal important and informative patterns in data.
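A minimal numeric sketch of why transformations help: with synthetic data following an exponential relationship, the raw correlation between x and y understates the association, while a log transform of y (the direction the Tukey-Mosteller bulge diagram suggests for this shape) makes the relationship exactly linear:

```python
import numpy as np

# Hypothetical data with a curvilinear (exponential) relationship.
x = np.arange(1, 51, dtype=float)
y = np.exp(0.1 * x)

# Correlation measures linear association, so the curved raw data
# scores lower than the log-transformed version.
r_raw = np.corrcoef(x, y)[0, 1]
r_log = np.corrcoef(x, np.log(y))[0, 1]
print(r_raw, r_log)
```

Since log(y) = 0.1x here, the transformed correlation is exactly 1, while the raw correlation is noticeably smaller.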



7  9/13/18 
Working with Text [slides]
Whether in documents, tweets, or records in a table, text data is ubiquitous and presents a unique set of challenges for data scientists. How do you extract key phrases from text? What are meaningful aggregate summaries of text? How do you visualize textual data? In this lecture we will introduce a set of techniques (e.g., bag-of-words) to transform text into numerical data for subsequent tabular analysis. We will also introduce regular expressions as a mechanism for cleaning and transforming text data.
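A bag-of-words representation in a few lines, using only the standard library (the documents here are made up for illustration):

```python
import re
from collections import Counter

docs = [
    "Data science is the study of data.",
    "Text data presents unique challenges!",
]

def bag_of_words(text):
    # Lowercase, strip punctuation with a regular expression,
    # then count word occurrences.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

counts = [bag_of_words(d) for d in docs]
print(counts[0]["data"])  # -> 2
```

The resulting word counts discard word order but turn each document into numerical data suitable for the tabular techniques covered earlier in the course.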



5  8  9/18/18 
Modeling and Estimation [slides]
How do we pick a number to represent a dataset? A key step in data science is developing models that capture the essential signal in data while providing insight into the phenomena that govern the data and enable effective prediction. In this lecture we address the fundamental question of choosing a number and more generally a model that reflects the data. We will introduce the concept of loss functions and begin to develop basic models. We will explore how calculus can be used to analytically minimize loss functions.
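As a tiny worked example of the analytic approach: for the average squared loss, setting the derivative to zero shows the minimizing constant is the mean. (The data values below are arbitrary.)

```python
# Average squared loss for a single-number summary theta of a dataset.
def mse_loss(theta, data):
    return sum((y - theta) ** 2 for y in data) / len(data)

data = [2.0, 3.0, 7.0]

# Calculus: d/dtheta (1/n) sum (y - theta)^2 = -(2/n) sum (y - theta) = 0
# implies theta equals the mean of the data.
mean = sum(data) / len(data)
print(mean)  # 4.0

# Sanity check: nudging theta away from the mean increases the loss.
assert mse_loss(mean, data) <= mse_loss(mean + 0.1, data)
```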



9  9/20/18 
Modeling and Estimation II [slides]
In this lecture we will continue our development of models within the framework of loss minimization. In particular, we will explore how to numerically minimize loss functions. We will also introduce multidimensional models and define the notion of the gradient of a function. To minimize functions, we will introduce the widely used gradient descent algorithm.
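A minimal gradient descent sketch on the one-dimensional squared loss (the data, learning rate, and step count are illustrative choices, not the lecture's exact parameters):

```python
# Gradient of the average squared loss L(theta) = (1/n) sum (y - theta)^2,
# namely dL/dtheta = -(2/n) sum (y - theta).
def gradient(theta, data):
    n = len(data)
    return -2 / n * sum(y - theta for y in data)

def gradient_descent(data, theta=0.0, lr=0.1, steps=100):
    # Repeatedly step opposite the gradient to decrease the loss.
    for _ in range(steps):
        theta = theta - lr * gradient(theta, data)
    return theta

data = [2.0, 3.0, 7.0]
theta_hat = gradient_descent(data)
print(theta_hat)  # converges to the mean of the data, 4.0
```

Because the loss is convex, the iterates contract toward the analytic minimizer (the mean) at a rate set by the learning rate.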


6  10  9/25/18 
Generalization and Empirical Risk Minimization [slides]
So far, we have focused on how we can estimate a descriptive statistic or more generally the parameters of a model that reflects our data. What does this say about the population? How can we generalize beyond what we observe? In this lecture we recast our loss minimization approach in the context of empirical risk minimization. In the process we will review basic probability concepts including expectation, bias, and variance.




11  9/27/18 
Linear Regression and Feature Engineering [slides]
Linear regression is at the foundation of most machine learning and statistical methods. We have already introduced linear models in an informal way; in this lecture we formalize the setup of a linear model as a parametric description of a dataset whose parameters can be estimated computationally. We study the normal equations from the perspective of optimization and discuss some of the computational issues around solving the normal equations. We will then transition to the task of feature engineering and describe a range of techniques for transforming data to enable linear models to fit complex relationships.
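A minimal sketch of solving the normal equations with NumPy, on noiseless synthetic data so the recovered parameters can be checked exactly:

```python
import numpy as np

# Hypothetical data generated from y = 1 + 2x (no noise, for a clean check).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x

# Design matrix with an intercept column — a simple feature engineering step.
X = np.column_stack([np.ones_like(x), x])

# Normal equations: (X^T X) theta = X^T y, solved without forming an explicit
# inverse. (In practice np.linalg.lstsq is preferred for numerical stability,
# one of the computational issues the lecture discusses.)
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # approximately [1.0, 2.0]
```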


7  12  10/2/18 
Bias-Variance Tradeoff and Regularization [slides]
There is a fundamental tension in predictive modeling between our ability to fit the data and to generalize to the world. In this lecture we characterize this tension through the tradeoff between bias and variance. We will derive the bias and variance decomposition of the least squares objective. We then discuss how to manage this tradeoff by augmenting our objective with a regularization penalty.
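A small sketch of what the regularization penalty does to the fitted parameters, using the closed-form ridge solution on synthetic data (the coefficients and noise level here are made up):

```python
import numpy as np

# Ridge regression in closed form: theta = (X^T X + lam I)^{-1} X^T y.
# Growing the penalty lam shrinks the coefficients toward zero,
# trading increased bias for reduced variance.
def ridge(X, y, lam):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

small = ridge(X, y, lam=0.01)
large = ridge(X, y, lam=100.0)
print(np.linalg.norm(small), np.linalg.norm(large))  # norm shrinks as lam grows
```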


13  10/4/18 
Cross-Validation and Regularization [slides]
In this lecture we will recap our discussion of linear regression by reviewing how to use the scikit-learn regression package. We will then explore the challenges of overfitting and review how regularization can be used to address overfitting. We will introduce cross-validation as a mechanism to estimate the test error and to select the regularization parameters.
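A hand-rolled k-fold cross-validation sketch for comparing ridge penalties; the lecture uses scikit-learn for this, but the same logic is spelled out here in plain NumPy so each step is visible (the data is synthetic):

```python
import numpy as np

# Closed-form ridge fit: theta = (X^T X + lam I)^{-1} X^T y.
def ridge(X, y, lam):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def cv_error(X, y, lam, k=5):
    # k-fold cross-validation: fit on k-1 folds, score on the held-out fold,
    # and average the held-out errors to estimate the test error.
    n = len(y)
    folds = np.array_split(np.arange(n), k)
    errors = []
    for fold in folds:
        train = np.setdiff1d(np.arange(n), fold)
        theta = ridge(X[train], y[train], lam)
        resid = y[fold] - X[fold] @ theta
        errors.append(np.mean(resid ** 2))
    return np.mean(errors)

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 4))
y = X @ np.array([1.0, 0.0, -1.0, 2.0]) + rng.normal(scale=0.5, size=100)

errs = {lam: cv_error(X, y, lam) for lam in [0.01, 1.0, 100.0]}
print(errs)  # pick the lam with the smallest estimated test error
```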


8  14  10/9/18 
Ethics [slides]
Data science is being used in a growing number of settings to make decisions that impact people's lives. In this lecture we will discuss just a few of the many ethical and legal considerations in the application of data science to real-world problems. Our guest speaker Joshua Kroll is a computer scientist and researcher interested in the governance of automated decision-making systems, especially those built with machine learning. He is currently a Postdoctoral Research Scholar at UC Berkeley’s School of Information, working with Deirdre Mulligan. Before that he received a PhD in Computer Science in the Security Group at Princeton University. His dissertation on Accountable Algorithms was advised by Edward W. Felten and supported by the Center for Information Technology Policy, where he studied topics in security, privacy, and technology’s impact on policy decisions. Joshua was the program chair of this year’s edition of the successful workshop series “Fairness, Accountability, and Transparency in Machine Learning (FAT/ML)”.




15  10/11/18 
Midterm Review Part 1 [slides]
This lecture will review key topics from the course that will be covered on the midterm.


9  16  10/16/18 
Midterm Review Part 2 [slides]
The midterm will take place on 10/17 from 8 to 10 PM.

Midterm Review (Lab8) 
Midterm OH 

17  10/18/18 
Classification and Logistic Regression I [slides]
We consider the case in which our response is categorical; in particular, we focus on the simple case in which the response has two categories. We begin by using least squares to fit the binary response to categorical explanatory variables and find that the predictions are proportions. Next, we consider a more complex model (the linear probability model) that is linear in quantitative explanatory variables, and we uncover the limitations of this model. We motivate an alternative model, the logistic, by examining a local linear fit and matching its shape. We also draw connections between the logistic and log odds. Lastly, we introduce an alternative loss function (the Kullback-Leibler divergence) that is more appropriate for working with probabilities. We derive a representation of the KL divergence for binary response variables.
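A minimal sketch of the two building blocks named above: the logistic (sigmoid) function, and the cross-entropy loss, whose minimization over binary labels is equivalent to minimizing the KL divergence between the empirical and predicted distributions:

```python
import math

# The logistic (sigmoid) function maps a linear score to a probability.
def sigmoid(t):
    return 1 / (1 + math.exp(-t))

# Cross-entropy loss for a binary label y in {0, 1} and predicted probability p.
def cross_entropy(y, p):
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

print(sigmoid(0.0))            # 0.5 — a score of zero gives even odds
print(cross_entropy(1, 0.9))   # small loss: confident and correct
print(cross_entropy(1, 0.1))   # large loss: confident and wrong
```

Note how the loss penalizes confident wrong predictions far more heavily than hesitant ones, which is exactly the behavior squared loss fails to deliver for probabilities.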


10  18  10/23/18 
Classification and Logistic Regression II [slides]
Continued discussion of material in the previous lecture.

Project 1 OH 

19  10/25/18 
Probability Theory, Monte Carlo, and Bootstrapping [slides]
We saw previously that we can study parameter estimators using theoretical and computational approaches. In this lecture we will delve deeper into the bootstrap to study the behavior of the empirical 75th percentile as an estimator for its population counterpart. We will derive the empirical quantile through the optimization of a loss function, show that the population parameter minimizes the expected loss, bootstrap the sampling distribution of the empirical 75th percentile, and use the bootstrapped distribution to provide interval estimates for the population parameter.
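A standard-library sketch of bootstrapping the empirical 75th percentile (the data, sample size, and quantile convention below are illustrative choices, not the lecture's exact setup):

```python
import random

# Hypothetical sample from a population we want to learn about.
random.seed(100)
sample = [random.gauss(50, 10) for _ in range(200)]

def percentile_75(xs):
    # One simple convention for the empirical 75th percentile.
    xs = sorted(xs)
    return xs[int(0.75 * len(xs))]

# Bootstrap: resample the sample with replacement, recompute the statistic,
# and collect the results to approximate its sampling distribution.
boot_stats = []
for _ in range(1000):
    resample = random.choices(sample, k=len(sample))
    boot_stats.append(percentile_75(resample))

boot_stats.sort()
lo, hi = boot_stats[25], boot_stats[974]  # ~95% bootstrap percentile interval
print(lo, hi)
```

The interval endpoints are just empirical quantiles of the bootstrapped statistics, which is what makes the method usable even when no closed-form sampling distribution exists.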



11  20  10/30/18 
Hypothesis Testing I [slides]
A key step in inference is often answering a question about the world. We will consider two such questions to varying degrees of detail. 1) Is there enough evidence to bring someone to trial? 2) Do female TAs get lower teaching evaluations than male TAs? We use hypothesis testing to answer these questions. In particular, we examine a collection of nonparametric hypothesis tests. These powerful procedures build on the basic idea of random simulation to help quantify the rarity of a particular phenomenon. In the process of using these procedures we will also touch on the challenges of false discovery and multiple testing.
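A simulation-based permutation test in the spirit of the nonparametric procedures described above (the evaluation scores below are fabricated for illustration only):

```python
import random

# Hypothetical evaluation scores for two groups.
random.seed(0)
group_a = [4.2, 4.5, 4.8, 4.1, 4.6, 4.4]
group_b = [4.0, 3.9, 4.3, 4.1, 3.8, 4.0]

def mean(xs):
    return sum(xs) / len(xs)

observed = mean(group_a) - mean(group_b)

# Under the null hypothesis the group labels are arbitrary, so shuffling
# them simulates the distribution of the difference in means by chance.
pooled = group_a + group_b
trials = 10000
count = 0
for _ in range(trials):
    random.shuffle(pooled)
    diff = mean(pooled[:6]) - mean(pooled[6:])
    if diff >= observed:
        count += 1

p_value = count / trials  # fraction of shuffles at least as extreme
print(observed, p_value)
```

A small p-value says the observed gap would rarely arise from random labeling alone; it does not by itself explain why the gap exists.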



21  11/1/18 
Numerical Issues, Condition Numbers, and Higher Dimensions
This is a new lecture for this semester.
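As a one-glance illustration of the condition number idea: it measures how much a matrix amplifies relative error, so nearly collinear columns (common in poorly designed feature matrices) make least squares numerically unstable. The matrices below are toy examples:

```python
import numpy as np

well = np.array([[1.0, 0.0],
                 [0.0, 1.0]])     # orthonormal columns: perfectly conditioned
ill = np.array([[1.0, 1.0],
                [1.0, 1.0001]])   # nearly dependent columns

print(np.linalg.cond(well))  # 1.0, the best possible value
print(np.linalg.cond(ill))   # very large: small input errors are amplified
```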


12  22  11/6/18 
SQL [slides]
Much of the important data in the world is stored in relational database management systems. In this lecture we will introduce the key concepts in relational databases including the relational data model, basic schema design, and data independence. We will then begin to dig into the SQL language for accessing and manipulating relational data.
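A minimal end-to-end example using Python's built-in sqlite3 module (the lecture may use a different SQL engine; the table and rows here are hypothetical):

```python
import sqlite3

# An in-memory relational database: schema, inserts, then a query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, major TEXT, gpa REAL)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [("Ada", "Data Science", 3.9), ("Ben", "History", 3.5),
     ("Cam", "Data Science", 3.7)],
)

# Grouping and aggregation in SQL, the relational analogue of pandas groupby.
rows = conn.execute(
    "SELECT major, AVG(gpa) FROM students GROUP BY major ORDER BY major"
).fetchall()
print(rows)
```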



23  11/8/18 
Advanced SQL [slides]
In this lecture we review more advanced SQL queries including joins and common table expressions, and we discuss how we can combine computation in a database with Python.
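A sketch combining both topics above, again with the built-in sqlite3 module: a common table expression computes per-student averages, and a join attaches names from a second table (the schema and rows are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (sid INTEGER, name TEXT);
    CREATE TABLE grades (sid INTEGER, course TEXT, grade REAL);
    INSERT INTO students VALUES (1, 'Ada'), (2, 'Ben');
    INSERT INTO grades VALUES (1, 'Data100', 4.0), (1, 'CS61A', 3.7),
                              (2, 'Data100', 3.3);
""")

# The CTE (WITH clause) names an intermediate result; the outer query
# then joins it against the students table on the shared key.
rows = conn.execute("""
    WITH avgs AS (
        SELECT sid, AVG(grade) AS avg_grade FROM grades GROUP BY sid
    )
    SELECT s.name, a.avg_grade
    FROM students s JOIN avgs a ON s.sid = a.sid
    ORDER BY s.name
""").fetchall()
print(rows)
```

CTEs keep multi-step queries readable by replacing deeply nested subqueries with named intermediate tables.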



13  24  11/13/18 
Big Data [slides]
Data management at the level of big organizations can be confusing and often relies on many different technologies. In this lecture we will provide an overview of organizational data management and introduce some of the key technologies used to store large amounts of data. We will introduce various data representation techniques for database design, and we will discuss the tradeoffs between different methods of enterprise data management.


Lab11 

25  11/15/18 
Distributed Computing [slides]
Distributed computing is the process in which multiple computers work together to accomplish a computational task. In this lecture we will discuss various distributed computing methods that we can use to work with data at scale. In particular, we will introduce programming with Spark, a parallel execution engine for big data processing.
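Spark itself is not assumed installed here, so the sketch below mimics in plain Python the map-then-reduce pattern that Spark's RDD API distributes across machines, using the classic word-count example:

```python
from functools import reduce

lines = ["big data", "distributed computing", "big ideas"]

# "Map" step: each line is turned into partial word counts. In Spark,
# these partial results would be computed in parallel on different workers.
def count_words(line):
    counts = {}
    for word in line.split():
        counts[word] = counts.get(word, 0) + 1
    return counts

# "Reduce" step: merge partial counts pairwise into a single result.
def merge(a, b):
    out = dict(a)
    for word, n in b.items():
        out[word] = out.get(word, 0) + n
    return out

word_counts = reduce(merge, map(count_words, lines))
print(word_counts)  # {'big': 2, 'data': 1, 'distributed': 1, 'computing': 1, 'ideas': 1}
```

The key design point is that `merge` is associative, so partial results can be combined in any grouping — exactly the property that lets a cluster aggregate results without coordinating the order of operations.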


14  26  11/20/18 
A/B Testing [slides]
It is now commonplace for organizations with websites or mobile apps to run randomized controlled experiments, or “A/B tests” as they’re often called in industry. Such experiments provide a reliable way to determine which product changes lead to the most successful user interactions. In this lecture we will discuss why randomized experiments are so important, talk about some of the key design choices that go into A/B tests, and get a brief introduction to sequential monitoring of experimental results.

Project 2 OH 
Break 

27  11/22/18 
Thanksgiving Break
Enjoy your break!

15  28  11/27/18 
Data Commons [slides]
There will be a guest lecturer on this day.


Lab12 
HW5 Due 
29  11/29/18 
Conclusion [slides]
This is the last lecture.

Proj2A Due 

16  30  12/4/18 
RRR Week
This review lecture goes over material in the second half of the course.

Proj2B Due 

31  12/6/18 
RRR Week
This review lecture goes over problems in the Spring 2018 Final.

HW6 Due, Grad Project Due 

17  32  12/11/18 


33  12/13/18 
Final Exam (11:30am-2:30pm)
