Syllabus

Time: Spring 2026, Thursdays 3:30-5:10pm

Office Hours: Mondays 11:00am - 1:00pm, URBN 350D

Credits: 3

Modality: In Person, URBN 220

Instructor: Dr. Liming Wang (lmwang@pdx.edu)

Course Website: https://usp510.github.io/

This course introduces urban data science, an interdisciplinary approach to understanding, managing, and designing the city using data-driven theories and methods. Urban data science builds on the science and technologies of information processing, information systems, computer science, and statistics to develop applications to cities.

In this project-based class, students have an opportunity to develop applications that combine technical skills and domain knowledge and use information processing, analysis, and presentation to support problem solving in cities. It will introduce students to basic coding, data processing and analysis, visualization and mapping. Students will also learn to work effectively with large language models (LLMs) and AI agents as tools to accelerate data science workflows — from writing and debugging code to exploring data and generating visualizations.

There are no prerequisites, but it requires some tolerance for experimentation, self-directed trial and error, and an interest in learning to write computer code and work with AI tools.

Synopsis and Objectives

This course is designed to provide students with a toolkit of technical skills for quantitative problem solving. Through project-based hands-on learning, the course aims to achieve these objectives for students:

An introduction to the fundamentals of computer code to automate tasks;
Learning to use LLMs and AI agents as tools for coding, data analysis, and problem solving;
Familiarity with workflow and project management best practices working with data;
Developing skills of accessing, cleaning, visualizing, and analyzing urban data;
Learning to combine quantitative technical skills and domain knowledge to support problem solving in cities

Textbook and Readings

There is no specific textbook for the class. The course will draw on materials from a wide range of sources and will provide students with book excerpts, technical reports, and journal papers as appropriate to supplement lecture notes. The following textbooks are recommended as general references:

Yu and Barter, 2024, Veridical Data Science: The Practice of Responsible Data Analysis and Decision Making, MIT Press. (Available online at https://vdsbook.com/)
Downey, 2024, Think Python, 3rd Edition. (Available online at https://allendowney.github.io/ThinkPython/)
Turrell, Python for Data Science. (Available online at https://aeturrell.github.io/python4DS/)
McKinney, 2022, Python for Data Analysis, 3rd Edition, O’Reilly. (Available online at https://wesmckinney.com/book/)
Rey, Arribas-Bel, and Wolf, 2023, Geographic Data Science with Python, CRC Press. (Available online at https://geographicdata.science/book/)

Grade

Component	USP 410	USP 510
DataCamp exercises (4 x 5pts)	20%	20%
Data science show & tell (2 x 5pts)	10%	10%
Assignments	30%	20%
Project presentation	10%	10%
Project report	30%	40%
Total	100%	100%

DataCamp exercises: Each DataCamp course is approximately 2-4 hours and you will have two weeks to complete each one.

Data science show & tell: Students will take turns sharing examples of good and bad data science projects/products at the beginning of each class (~5 minutes per presentation). Each student will present twice during the quarter. Sign up for your preferred weeks and submit your entry in the shared Google Doc following the template provided. Submissions are due by the end of Monday before your scheduled class.

Class project: The final product can be in the form of a project report, an infographic, or a dashboard, generated using Python and/or Quarto. Follow the best practices in creating infographics/dashboard & report. Submit your final product in appropriate (html/pdf/png) format and the accompanying Quarto document (& Python script if any). Your project presentation will be no more than 20 minutes in length with 5 minutes for Q&A.

USP 410 (undergraduate): Projects should demonstrate competency in the core skills covered in class — data cleaning, visualization, and presentation of findings using at least one dataset relevant to an urban issue.
USP 510 (graduate): Projects are expected to go beyond the core skills and demonstrate a higher level of analytical rigor. This includes integrating multiple data sources, applying spatial analysis or API-sourced data, and providing deeper interpretation of results that connects findings to relevant urban policy or planning contexts.

Assignments: Both assignments use Oregon Department of Transportation (ODOT) crash data. Use of AI tools such as coding agents (e.g., Claude Code, GitHub Copilot, Cursor) is encouraged and recommended — these assignments are an opportunity to practice the AI-assisted data science workflow introduced in class.

Assignment 1 — Exploring ODOT Crash Data: Use ODOT’s crash data to investigate questions of your choosing. Example questions include: Does the spring Daylight Saving Time change increase crashes? Do pedestrian fatalities exhibit a nighttime pattern similar to what was reported by the New York Times in 2023? You are free to explore other questions that interest you. Your submission should include a Quarto or Jupyter notebook with clear visualizations and a written narrative explaining your findings.
Assignment 2 — Interactive Crash Map: ODOT provides a basic fatal crash map viewer, but it is a general-purpose data browser. Your task is to create an interactive map (using folium, plotly, streamlit, or a similar library) that goes beyond data browsing — it should have a specific point of view, question, or audience. Example directions include: Which Portland corridors (e.g., 82nd Ave, Powell Blvd) are the most dangerous for pedestrians, and has that changed over time? Are crash hotspots concentrated in lower-income communities or communities of color? Where are the most dangerous intersections near schools, and what does a “safe routes” map look like for parents? How do crash patterns shift across the hours of the day — can you animate a 24-hour cycle to reveal when and where risk peaks? You are free to pursue other questions. Deploy or export your map as a standalone HTML file.

AI Policy

This course teaches you to work with AI tools as part of the data science workflow. However, different assignments have different goals, and the AI policy reflects that:

DataCamp exercises — AI tools are not permitted. These exercises build foundational skills. Using ChatGPT, Copilot, or other AI assistants to complete them undermines the learning process. You need to develop the mental models that make you an effective user of AI tools later.

Show & tell — AI may be used for research only. You may use AI tools to help discover examples of data science projects, but your written explanation and in-class discussion should reflect your own understanding and judgment.

Assignments — AI use is encouraged and recommended. You may (and are encouraged to) use AI coding agents and assistants to help with data processing, analysis, visualization, and writing code. Include a brief note describing which AI tools you used and for what tasks. You must understand and be able to explain any code or analysis in your submission.

Final project — AI use is encouraged, with disclosure. Using AI tools for your project mirrors real-world data science practice. The following requirements apply:

Include an AI Use Statement as an appendix to your final product. Describe which AI tools you used (e.g., ChatGPT, Claude, Copilot, Cursor), what tasks you used them for (e.g., writing code, debugging, generating visualizations, drafting text), and how you verified the outputs.
You must be able to explain every part of your project during your presentation and Q&A. If you cannot explain code or analysis that appears in your submission, it will not receive credit.
AI-generated analyses must be validated for correctness — consistent with the Veridical Data Science framework’s emphasis on truthful, reproducible results.

Topics and Schedule (Tentative)

Week	Date	Topic	Readings
W1	04/02	Slides · Overview, Computer Setup, Introduction to Python	Yu & Barter, Chapter 2: The Data Science Life Cycle; Downey, Chapter 1
W2	04/09	Slides · Learning and working with LLMs and AI agents	Karpathy, How I Use LLMs; Evkaya & de Carvalho, Using ChatGPT for Data Science Analyses (HDSR, 2026)
W3	04/16	All about data: Data import/export, cleaning & processing	Yu & Barter, Chapter 4: Data Preparation; McKinney, Chapters 6-8
W4	04/23	Workflow & project management	Yu & Barter, Chapter 3: Setting Up Your Data Science Project; Turrell, Workflow chapters
W5	04/30	Exploring and visualizing data	Yu & Barter, Chapter 5: Exploratory Data Analysis; Turrell, Visualize chapter
W6	05/07	Reproducible research/work; Quarto & Jupyter Notebooks	Turrell, Quarto for Python; jupyter
W7	05/14	Working with spatial data and maps	Rey et al., Chapters 1-4
W8	05/21	Accessing public data from the web and via APIs	censusdis documentation; Web Scraping with BeautifulSoup
W9	05/28	Developing infographics and dashboard	Ultimate Infographic Design Guide; Streamlit documentation
W10	06/04	Project workshop
W11	06/11	Project presentation

DataCamp Schedule

#	Course	Hours	Assigned	Due
DC1	Introduction to Python	4h	W1 (04/02)	W3 (04/16)
DC2	Data Manipulation with pandas	4h	W3 (04/16)	W5 (04/30)
DC3	Intro to Data Visualization with Seaborn	4h	W5 (04/30)	W7 (05/14)
DC4	Working with Geospatial Data in Python	4h	W7 (05/14)	W9 (05/28)

Assignment Schedule

#	Assignment	Assigned	Due
A1	Exploring ODOT Crash Data	W2 (04/09)	W4 (04/23)
A2	Interactive Crash Map	W6 (05/07)	W8 (05/21)

Project Milestones

Milestone	Due
Project idea	W3 (04/16)
Project proposal (1 page)	W6 (05/07)
Progress update	W8 (05/21)
Project presentation	W11 (06/11)
Final project submission	W11 06/12

Resources

DataCamp: Students will be able to take DataCamp courses free of charge courtesy of the DataCamp Classroom program.

Key Python Libraries

Purpose	Library	Notes
Data manipulation	`pandas`, `numpy`	Core data wrangling (replaces dplyr/tidyr)
Visualization	`matplotlib`, `seaborn`, `plotly`	seaborn for statistical plots; plotly for interactive
Reproducible documents	Quarto, Jupyter Notebooks	Quarto renders .qmd and .ipynb to HTML/PDF/Word
Web scraping	`requests`, `beautifulsoup4`	HTTP requests and HTML parsing
Census data	`censusdis`, `pygris`	Census API access and TIGER/Line shapefiles
Spatial analysis	`geopandas`, `folium`, `contextily`	Vector data, interactive maps, basemaps
Dashboards	`streamlit`	Pure Python dashboards with free cloud hosting