Resources

Exam Resources

Semester            Midterm 1                 Midterm 2         Final
Summer 2019         Exam (Solutions) [Video]                    Exam (Solutions)
Spring 2019         Exam (Solutions)          Exam (Solutions)  Exam (Solutions)
Fall 2018           Exam (Solutions)                            Exam (Solutions)
Spring 2018         Exam (Solutions)                            Exam (Solutions) [Video]
Fall 2017           Exam (Solutions) [Video]                    Exam (Solutions)
Fall 2017 Practice  Exam (Solutions)                            Exam (Solutions)
Spring 2017                                                     Exam (Solutions)

Other Resources

We will post all lecture materials on the course syllabus. They will also be listed in the following publicly visible GitHub repo.

Here is a collection of resources that will help you learn more about various concepts and skills covered in the class. Learning by reading is a key part of being a well-rounded data scientist. We will not assign mandatory reading but instead encourage you to look at these and other materials. If you find something helpful, post it on Piazza, and consider contributing it to the course website.

You can send us changes to the course website by forking the course website GitHub repository and sending a pull request. You will then become part of the history of Data 100 at Berkeley.

Local Setup

Click here to read our guide on how to set up our development environment locally (as an alternative to using DataHub).

Probability Practice

We’ve compiled a few practice probability problems that we believe may help in understanding the ideas covered in the course. They can be found here, along with their solutions.

Web References

As a data scientist, you will often need to search for information on various libraries and tools. In this class we will be using the Bash command line and several key Python libraries. Here are tutorials and documentation pages for them:

  • The Bash Command Line:

    • Linux and Bash: Intro to Linux, Cloud Computing (which you can skip for the purposes of this class), and the Bash command line. You can skip all portions that don’t pertain to using the command line.
    • Bash Part 2: Part 2 of the intro to command line.
  • Python:
    • Python Tutorial: Teach yourself Python. This is a pretty comprehensive tutorial.
    • Python + Numpy Tutorial: This tutorial provides a great overview of much of the functionality we will be using in DS100.
    • Python 101: A notebook demonstrating a lot of Python functionality with some (minimal) explanation.
  • Plotting:

    • matplotlib.pyplot tutorial: This short tutorial provides an overview of the basic plotting utilities we will be using.
    • seaborn: The Seaborn library has some nice additional visualization functions that we may use occasionally.
  • Pandas:

    • The Pandas Cookbook: This provides a nice overview of some of the basic Pandas functions. However, it is slightly out of date.
    • Learn Pandas: A set of lessons providing an overview of the Pandas library.
    • Python for Data Science: Another set of notebooks demonstrating Pandas functionality.

Books

Because data science is a relatively new and rapidly evolving discipline, there is no single ideal textbook for this subject. Instead, we plan to use readings from a collection of books, all of which are free. We have also listed a few optional books that will provide additional context for those who are interested.