Syllabus

About Data 100

Combining data, computation, and inferential thinking, data science is redefining how people and organizations solve challenging problems and understand their world. This intermediate-level class bridges between Data8 and upper division computer science and statistics courses as well as methods courses in other fields. In this class, we explore key areas of data science including question formulation, data collection and cleaning, visualization, statistical inference, predictive modeling, and decision making. Through a strong emphasis on data-centric computing, quantitative critical thinking, and exploratory data analysis, this class covers key principles and techniques of data science. These include languages for transforming, querying, and analyzing data; algorithms for machine learning methods including regression, classification, and clustering; principles behind creating informative data visualizations; statistical concepts of measurement error and prediction; and techniques for scalable data processing.


Goals

  • Prepare students for advanced Berkeley courses in data-management, machine learning, and statistics, by providing the necessary foundation and context
  • Enable students to start careers as data scientists by providing experience working with real-world data, tools, and techniques
  • Empower students to apply computational and inferential thinking to address real-world problems


Prerequisites

While we are working to make this class widely accessible, we currently require the following prerequisites (or their equivalents). We are not enforcing prerequisites during enrollment. However, all of the prerequisites will be used starting very early on in the class, and it is your responsibility to know the material they cover:

  • Foundations of Data Science: Data8 covers much of the material in Data 100 but at an introductory level. Data8 provides basic exposure to Python programming and working with tabular data as well as visualization, statistics, and machine learning.
  • Computing: The Structure and Interpretation of Computer Programs (CS 61A) or Computational Structures in Data Science (CS 88). These courses provide additional background in Python programming (e.g., for loops, lambdas, debugging, and complexity) that will enable Data 100 to focus more on the concepts in Data Science and less on the details of programming in Python.
  • Math: Linear Algebra (Math 54, EE 16a, or Stat89a). We will need some basic concepts like linear operators, eigenvectors, derivatives, and integrals to enable statistical inference and derive new prediction algorithms. This prerequisite may be satisfied concurrently with Data 100.


Online Format

This fall, Data 100 will be run entirely online. This section details how each component of the course will operate. Here is a high-level view of a typical week in the course:

Monday | Tuesday | Wednesday | Thursday | Friday
Office Hours | Office Hours | Office Hours | Office Hours | Office Hours
Live lab | Lecture released | Discussion | Lecture released | Homework released
Lab due, Quick Check due | - | - | Homework due | Lab released

Note that these deadlines are subject to change.

  • To see when any live events are scheduled, check the Calendar.
  • To see when lectures, discussions, and assignments are released (and due), check the Home Page.

Lecture

  • There are 2 lectures per week.
  • Lectures will be entirely pre-recorded, in a format that is optimized for online learning (short 5-10 minute videos with conceptual problems in between). Lecture videos will be released on the mornings of Tuesday and Thursday.
    • Some of these will be from previous semesters, and some will be recorded this fall by the instructors.
    • Lecture videos will be posted on YouTube. Each “lecture” will be an html page linked on the course website, containing videos and links to slides and code.
    • There are “Quick Check” conceptual questions between consecutive lecture videos, linked on the lecture webpage. See below for more details.
    • Each lecture will also have a Piazza thread for students to ask questions.

Note: Alongside each lecture are textbook readings. Textbook readings are purely supplementary, and may contain material that is not in scope (and may also not be comprehensive).


Quick Checks

Quick Checks, as mentioned above, are short conceptual questions embedded into each lecture, in the form of Google Forms. These are meant for you to check your understanding of the concepts that were just introduced. Since there are roughly 26 lectures, there are roughly 26 Quick Checks, each of which consists of 4-7 Google Forms.

Quick Checks are graded on completion: your score on them does not matter, as long as you complete them. For each lecture, you must submit to Gradescope a code that you will receive after completing one of the Google Forms for that lecture. These are due the Monday after the lecture is released. (Though we will assign grades using Gradescope, we will also collect emails on the Google Forms themselves.)


Homeworks and Projects

Homeworks are week-long assignments that are designed to help students develop an in-depth understanding of both the theoretical and practical aspects of ideas presented in lecture. Projects are two-week-long assignments that integrate these ideas with real-world datasets.

  • In a typical week, homework is released on Friday and is due the following Thursday at 11:59PM.
  • Near the midterm, or during weeks in which a project is assigned, you will have more than one week to work on the current assignment.
  • One or two homeworks will be on-paper written assignments; the rest will be Jupyter notebooks.
  • Homeworks have both visible and hidden autograder tests. The visible tests are mainly sanity checks, e.g. a probability is <= 1, and are visible to students while they do the assignment. The hidden tests generally check for correctness, and are invisible to students while they are doing the assignment.
  • The primary forms of support students will have for homeworks and projects are office hours and Piazza.
  • Homeworks must be completed individually.
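
To illustrate the difference between visible and hidden tests described above, here is a minimal sketch. It is not the course's actual autograder: the function names `visible_check` and `hidden_check`, the reference value 0.25, and the tolerance are all hypothetical and chosen only for illustration.

```python
import math

def visible_check(p):
    """Sanity check (visible): the answer is a number that could be a probability."""
    return isinstance(p, (int, float)) and 0 <= p <= 1

def hidden_check(p, expected=0.25, tol=1e-6):
    """Correctness check (hidden): the answer matches a reference value.

    The reference value 0.25 is a made-up example, not from any assignment.
    """
    return math.isclose(p, expected, abs_tol=tol)

# A hypothetical student answer that is a valid probability but not the right one.
student_answer = 0.5
passes_visible = visible_check(student_answer)  # the sanity check is satisfied
passes_hidden = hidden_check(student_answer)    # the correctness check is not
```

Under these assumptions, an answer can pass every visible test and still lose credit on the hidden tests, so passing the visible tests should not be read as confirmation that a solution is correct.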


Labs

Labs are shorter programming assignments designed to give students familiarity with new ideas.

  • In a typical week, lab is released on Friday and is due the following Monday.
  • All lab autograder tests are visible.
  • To help with lab, we will host live lab sections on Monday at various times, in which GSIs will walk through the assignment via Zoom. See the Calendar for when these are scheduled.
  • Students can also get help with labs at office hours and on Piazza.


Discussions

Discussion sections give students a chance to discuss conceptual ideas and solve problems with other students, with the help of a GSI. (This is somewhat harder because the course is being offered completely remotely.) Each discussion consists of a worksheet.

  • In a typical week, we will release the discussion worksheet on Wednesday morning.
  • There are two “pathways” we envision students taking when it comes to consuming discussion content.
    1. Watching a pre-recorded discussion video.
      • Each discussion worksheet will be accompanied by a GSI-created video walkthrough, released at the same time. Students should watch this video soon after it is released.
      • With any lingering questions, students should come to office hours.
    2. Coming to a live Zoom discussion section.
      • We will be holding live discussion sections at several times on Wednesdays. In the first few weeks, students will be able to attend whichever section they desire, but we will eventually require you to sign up for a particular section if you want to keep attending (this is to keep sections small and personal).


Office Hours

  • We plan on hosting roughly 10 hours of office hours each weekday. These hours are listed on the Calendar.
  • OH will serve as a one-stop shop for students to get help with assignments.
  • Office Hours can be accessed via oh.ds100.org, where students add themselves to the “queue” and specify the assignment they need help on. Once it’s their turn, they will be provided with a Zoom link to join, in order to get help from staff.
  • The instructors will also be hosting conceptual office hours. These will be reflected on the Calendar.
  • We are also holding “lost office hours” once a week. These are designed to accommodate students who are behind on material and would like help catching up. These are meant for conceptual questions only, not for assignment help. These will also be reflected on the Calendar.


Exams

There will be one midterm exam, on October 15th (7-9PM PDT), and a final exam on December 15th (7-10PM PDT).

Alternate exams will only be given to students with a documented conflict, or to those who are in timezones very far from PDT, or to those who have extenuating circumstances.


Policies

Undergraduate Grading Scheme (for students enrolled in Data C100):

Category | Weight | Details
Homeworks | 30% | 9, with 2 drops
Labs | 10% | Roughly 13, with 3 drops
Projects | 15% | 7.5% each (2, with 0 drops)
Quick Checks | 5% | -
Midterm Exam | 15% | -
Final | 25% | -

Graduate Grading Scheme (for students enrolled in Data C200):

Category | Weight | Details
Homeworks | 30% | 9, with 2 drops
Projects | 15% | 7.5% each (2, with 0 drops)
Final Project | 15% | -
Midterm Exam | 15% | -
Final | 25% | -

Note that a ninth homework and second homework drop were announced partway through the semester.
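
As a rough illustration of how the undergraduate weights might combine into an overall score, here is a minimal sketch. It is not the staff's grading code. It assumes that a "drop" removes the lowest score in a category, that assignments within a category count equally, that Quick Checks have no drops, and that every score is a fraction between 0 and 1. The function names are hypothetical.

```python
def average_with_drops(scores, drops):
    """Average the scores after discarding the `drops` lowest ones."""
    kept = sorted(scores)[drops:]
    if not kept:
        raise ValueError("no scores left after applying drops")
    return sum(kept) / len(kept)

def undergrad_course_score(homeworks, labs, projects, quick_checks, midterm, final):
    """Weighted overall score (0 to 1) under the undergraduate scheme.

    Assumes equal weighting within each category and lowest-score drops.
    """
    return (
        0.30 * average_with_drops(homeworks, 2)       # 9 homeworks, 2 drops
        + 0.10 * average_with_drops(labs, 3)          # roughly 13 labs, 3 drops
        + 0.15 * average_with_drops(projects, 0)      # 2 projects, 7.5% each
        + 0.05 * average_with_drops(quick_checks, 0)  # completion-based
        + 0.15 * midterm
        + 0.25 * final
    )
```

For example, under these assumptions a student who misses two homeworks but earns full credit on everything else would still have a full homework average, because the two zeros are the scores that get dropped.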


Late Policy

All assignments are due at 11:59 pm on the due date specified on the syllabus. Gradescope is where all assignments are submitted. Extensions are only provided to students with DSP accommodations, or in the case of exceptional circumstances.

  • Homeworks and labs will not be accepted late.
    • Gradescope may allow you to make late submissions, but you will later be given a 0.
  • Projects are marked down by 10% per day, up to two days. After two days, project submissions will not be accepted.
    • Submission times are rounded up to the next day. That is, 2 minutes late = 1 day late.
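
The project late policy above can be sketched as follows. This is an illustrative reading, not the staff's implementation: the function name `project_credit_percent` is hypothetical, and the sketch assumes the 10% per day is deducted from the full score and that a submission exactly two days late is still accepted.

```python
import math
from datetime import datetime, timedelta

def project_credit_percent(due: datetime, submitted: datetime) -> int:
    """Percent of earned credit kept for a project submission.

    Lateness is rounded up to whole days, so 2 minutes late counts as 1 day.
    """
    if submitted <= due:
        return 100
    # Fractional days late, rounded up to the next whole day.
    days_late = math.ceil((submitted - due) / timedelta(days=1))
    if days_late > 2:
        return 0  # more than two days late: not accepted
    return 100 - 10 * days_late
```

Under these assumptions, a project submitted 2 minutes after the deadline keeps 90% of its credit, and one submitted 25 hours late keeps 80%.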


Collaboration Policy and Academic Dishonesty

Assignments

Data science is a collaborative activity. While you may talk with others about the homework, we ask that you write your solutions individually in your own words. If you do discuss the assignments with others please include their names at the top of your notebook. Keep in mind that content from assignments will likely be covered on both the midterm and final.

If we suspect that you have submitted plagiarized work, we will call you in for a meeting. If we then determine that plagiarism has occurred, we reserve the right to give you a negative full score (-100%) or lower on the assignments in question, along with reporting your offense to the Center for Student Conduct.

Rather than copying someone else’s work, ask for help. You are not alone in this course! The entire staff is here to help you succeed. If you invest the time to learn the material and complete the assignments, you won’t need to copy any answers. (taken from 61A)

We also ask that you do not post your assignment solutions publicly.

Exams

Cheating on exams is a serious offense. We have methods of detecting cheating on exams – so don’t do it! Students caught cheating on any exam will fail this course. We will be following the EECS departmental policy on Academic Honesty, so be sure you are familiar with it.


We want you to succeed!

If you are feeling overwhelmed, visit our office hours and talk with us. We know college can be stressful – and especially so during the COVID-19 pandemic – and we want to help you succeed.