Basic Course Info
- Instructor: Samuel S. Watson
- Lecture Time: MWF, 13:00 to 13:50, in Friedman 201
- Join the Prismia course.
- Asynchronous students: use the course Zoom link.
Course Documentation
Course TA Staff
- HTA: Meera Kurup
- TA: Lukas Kania
- TA: Bangxi Xiao
- TA: Yue Wang
- TA: Ian Acosta
- TA: Zichang Gao
- TA: Shashidhar Pai
- TA: Hang Zhou
- TA: Rugved Mavidipalli
Course Overview
This course provides an introduction to computer science and programming for data science. Students will be able to…
- Import and manipulate data in a variety of formats
- Discuss how data is managed within organizations
- Describe how computers work at a basic level and reason about the implications of these hardware details for how we build software
- Take advantage of productivity-enhancing features of development environments (VS Code and Jupyter)
- Perform basic operations using the command line (Bash)
- Version control their software (Git)
- Solve programming exercises (Python)
- Create data visualizations using dashboarding software (Superset)
- Describe the relational data model and devise a SQL schema appropriate to a given use case
- Set up a SQL database and write SQL queries to perform basic data manipulation tasks (PostgreSQL with Supabase)
- Discuss the advantages and disadvantages of NoSQL databases, and set up and use a NoSQL database (MongoDB)
- Solve exercises on data structures and algorithms (including abstract data types, asymptotic notation, sorting and binary search, graph algorithms, and database algorithms)
- Describe the paradigmatic use cases for graph databases (Neo4j) and streaming databases (Kafka), and perform basic tasks using those databases
- Build systems which can perform computations in parallel across multiple nodes (PySpark)
- Get data from the web via scraping or interacting with REST APIs
- Deploy a dashboard-style website which draws from a data source and updates live