A new website is available for the Fall 2017 offering of this class.

Combining data, computation, and inferential thinking, data science is redefining how people and organizations solve challenging problems and understand the world. This intermediate-level class bridges Data 8 and upper-division computer science and statistics courses, as well as methods courses in other fields.

In this class, we will explore the data science lifecycle, including question formulation, data collection and cleaning, exploratory data analysis and visualization, statistical inference and prediction, and decision-making. This class focuses on quantitative critical thinking and the key principles and techniques needed to carry out this cycle. These include languages for transforming, querying, and analyzing data; algorithms for machine learning methods, including regression, classification, and clustering; principles behind creating informative data visualizations; statistical concepts of measurement error and prediction; and techniques for scalable data processing.

Important Information:

  • When: Lectures Tuesdays and Thursdays from 12:30 to 2:00
  • Where: Soda Hall Room 306 (Hewlett Packard Auditorium)
  • What: See the tentative lecture schedule
  • News: We will post updates about the class on Piazza

If you are on the wait list, please complete the Background Survey; we will use it to help admit students into the class. Please sign up for Piazza to follow updates on the wait list.

Office Hours, Section, and Lab Schedule

Goals

  • Prepare students for advanced Berkeley courses in data management (CS186), machine learning (CS189), and statistics (Stat-154) by providing the necessary foundation and context

  • Enable students to start careers as data scientists by providing experience in working with real-world data, tools, and techniques

  • Empower students to apply computational and inferential thinking to tackle real-world problems

Prerequisites

While we are working to make this class widely accessible, in the initial (beta) version of the class we plan to require the following (or equivalent):

  1. Foundations of Data Science: Data8 covers much of the material in DS100, but at an introductory level. It provides basic exposure to Python programming and working with tabular data, as well as visualization, statistics, and machine learning.

  2. Computing: Structure and Interpretation of Computer Programs (CS61a) or Computational Structures in Data Science (CS88). These courses provide additional background in Python programming (e.g., for loops, lambdas, debugging, and complexity), which will enable DS100 to focus more on the concepts of data science and less on the details of programming in Python.

  3. Math: Linear Algebra (Math 54 or EE 16a): We will need some basic concepts, such as linear operators and eigenvectors, as well as derivatives and integrals, to enable statistical inference and derive new prediction algorithms. This requirement may be satisfied concurrently with DS100.

Instructors

Joseph E. Gonzalez
Joseph Hellerstein
Deborah Nolan
Bin Yu