Principles and Techniques of Data Science

UC Berkeley, Summer 2020

  • All announcements are on Piazza. Make sure you are enrolled and active there.
  • The Syllabus contains a detailed explanation of how each course component will work this summer, given that the course is being taught entirely online.
  • The scheduling of all weekly events is in the Calendar.
  • Zoom links for live events: @11 on Piazza.


Week 1

Jun 22

Lecture 1 Course Overview (slides) (video) (code)

Ch. 1

Discussion 1 Prerequisite Review (video) (solutions)

Homework 1 Prerequisites (due Jun. 24)

Survey 1 Week 1 Survey (due Jun. 24)

Jun 23

Lecture 2 Data Sampling and Probability

Ch. 2

Lab 1 Prerequisite Coding (due Jun. 23)

Jun 24

Lecture 3 Random Variables

Ch. 12.1-12.2

Discussion 2 Random Variables (video) (solutions)

Jun 25

Lecture 4 SQL

Ch. 9

Homework 2 Trump Sampling (due Jun. 28)

Lab 2 SQL (due Jun. 25)

Jun 26

Live Session 1 Random Variables, SQL (video) (notes)

Week 2

Jun 29

Lecture 5 Pandas I

Ch. 3

Discussion 3 SQL (video) (solutions)

Project 1 Food Safety (due Jul. 6)

Survey 2 Week 2 Survey (due Jul. 1)

Jun 30

Lecture 6 Pandas II

Ch. 3

Lab 3 Pandas I (due Jun. 30)

Jul 1

Lecture 7 Data Cleaning and EDA

Ch. 4.1, Ch. 5

Discussion 4 Pandas II (video) (solutions)

Jul 2

Lecture 8 Regular Expressions

Ch. 8

Lab 4 Data Cleaning and EDA (due Jul. 2)

Live Session 2 Pandas Demo (video) (code) (code HTML)

Jul 3

N/A (Holiday)

Week 3

Jul 6

Lecture 9 Visualization I

Ch. 6.1-6.3

Discussion 5 Regex (video) (solutions)

Homework 3 Bike Sharing (due Jul. 12)

Survey 3 Week 3 Survey (due Jul. 8)

Jul 7

Lecture 10 Visualization II

Ch. 6.4-6.6

Lab 5 Transformations and KDEs (due Jul. 7)

Jul 8

Lecture 11 Modeling

Ch. 10

Discussion 6 Visualizations and Transformations (video) (notebook) (solutions)

Homework 4 Trump

Jul 9

Exam Midterm 1 (7-8:30PM)

Lab 6 Modeling and Loss Functions (due Jul. 12)

Jul 10

Live Session 3 Lecture Recap (12-1PM)

Week 4

Jul 13

Lecture 12 Simple Linear Regression

Ch. 13.1-13.3

Discussion 7 Correlation

Jul 14

Lecture 13 Ordinary Least Squares

Ch. 13.4

Lab 7 Regression

Jul 15

Lecture 14 Feature Engineering

Ch. 14

Discussion 8 Geometric Least Squares & One Hot Encoding

Jul 16

Lecture 15 Bias-Variance Tradeoff

Ch. 12.3, Ch. 15.1-15.2

Homework 5 Regression

Lab 8 Feature Engineering

Jul 17

Live Session 4 Lecture Recap (12-1PM)

Week 5

Jul 20

Lecture 16 Regularization & Cross-Validation

Ch. 16, Ch. 15.3

Discussion 9 Bias Variance & Cross Validation

Homework 6 Housing

Jul 21

Lecture 17 Gradient Descent

Ch. 11

Lab 9 Cross Validation

Jul 22

Lecture 18 Logistic Regression I

Ch. 17.1-17.3

Discussion 10 Gradient Descent & Logistic Regression

Jul 23

Lecture 19 Logistic Regression II and Classification

Ch. 17.4-17.7

Homework 7 Gradient Descent & Logistic Regression

Lab 10 Logistic Regression

Jul 24

Live Session 5 Lecture Recap (12-1PM)

Week 6

Jul 27

Exam Midterm 2 (7-8:30PM)

Discussion 11 Cross Entropy Loss and Classification

Jul 28

Lecture 20 Inference for Modeling

Ch. 18.1, Ch. 18.3

Lab 11 Bootstrap the model parameters

Jul 29

Lecture 21 Decision Trees

Discussion 12 Decision Trees & Random Forests

Project 2 Spam/Ham

Jul 30

Lecture 22 Dimensionality Reduction & PCA

Lab 12 Decision Trees

Jul 31

Live Session 6 Lecture Recap (12-1PM)

Week 7

Aug 3

Lecture 23 PCA

Discussion 13 PCA

Aug 4

Lecture 24 Clustering

Lab 13 Clustering

Aug 5

Lecture 25 Guest Lecture

Discussion 14 Clustering

Homework 8 PCA

Aug 6

Lecture 26 Conclusion (live)

Week 8

Aug 10

Lecture Review

Aug 11

Lecture Review

Aug 12

Exam Final Part 1 (7-8:30PM)

Aug 13

Exam Final Part 2 (7-8:30PM)