CLASS INFORMATION

Monday and Wednesday, 2:15 - 3:35, HB Crouse Kittredge

INSTRUCTOR

Prof. Simon Weschle

Email: swweschl@syr.edu, Phone: 315-443-8678

Office Hours: Wednesday, 12:00 - 1:30, Zoom (see Syllabus for details)

COURSE DESCRIPTION

Data and data analysis are increasingly important not only for political science research but also in public discourse and the workplace. In this class, you will learn how to conduct data analysis yourself. We'll cover topics such as finding data, data cleaning and manipulation, data visualization, and data analysis. Along the way, we'll learn basic statistical functions and plots in the powerful (and free) statistical program R. Throughout, the class takes an applied approach: students will develop their own research project and conduct their own data analyses.

TEXTBOOK

Kosuke Imai (2017): Quantitative Social Science: An Introduction. Princeton University Press.

I will refer to the book as QSS. You can rent or buy the book from Amazon, Princeton University Press, or your favorite book retailer as a hardcover, paperback, or e-book. The book should soon also be available for short-term checkout from the Library Course Reserves. I will post copies of the assigned chapters for the first week or two so that everyone has a chance to get access to the book.

ASSIGNMENTS AND GRADING

- Class Participation (15%): To succeed in this course, you have to attend class on a regular basis, come prepared by having worked through the assigned reading, and actively participate and ask questions.
- Class Programming Review Exercises (10%): There will be short weekly review exercises that cover the basic R material we have learned in class. Each exercise is graded pass/fail, where a pass is worth 1 point and a fail is worth 0.
- Problem Sets (30%): There will be 5 to 6 problem sets in which you are asked to use what you have learned in class to analyze different kinds of data. The answers to these problem sets should be typed. They are graded on a scale from 1 to 5, and late submissions will be penalized by 1 point for every 24 hours past the due date. Any extension requests must be made to me personally and as soon as possible.
- Data Analysis Memos (15%): Your main task in this class will be to write a paper with your own data analysis on a question that is of interest to you. To help you along the way, you will submit memos on the individual steps throughout the semester. The memos will cover: your research question and potential confounders, your data, data cleaning, descriptive statistics, bivariate relations, and (first) regression results. The memos should be short (2-3 pages) and typed in their entirety. They are graded on a scale from 1 to 5, and late submissions will be penalized by 1 point for every 24 hours past the due date. Any extension requests must be made to me personally and as soon as possible. I will provide feedback on every memo to help you improve your final paper.
- Data Analysis Paper (30%): Your final paper should set out your research question, explain the data and statistical methods you use to investigate it, and describe what, based on your data analysis, the answer is. There is no minimum or maximum paper length. It should be as long as needed, but as short as possible. The papers are due at the beginning of the finals period (May 18).

SYLLABUS

For more detailed information on class policies and all of the fine print, please see the Syllabus.

CLASS TOPICS

Below is a list of topics that the class will cover. The exact week-to-week schedule will be developed and updated throughout the semester to reflect student interest and the pace at which we are progressing.

- Getting Started with R
- Causality, Single Variables
- Finding and Cleaning Data
- Bivariate Relationships
- Multiple Regression
- Prediction, Spatial Data, Network Data, Text as Data (We will choose some of those topics based on student interest)
- Website Scraping (Guest Lecturer: Sebastian Karcher)
- Data Analysis Paper Workshop

CLASS SCHEDULE

Below is a continuously updated class schedule. It contains information on what topics we are covering as well as on the readings and assignments. Please check this site EVERY WEEK.

Week 1: Getting Started with R

- Monday (2/8): Try to install R and RStudio before the first class using The Not So Scary Guide to R: Get Started
- Wednesday (2/10): QSS Ch. 1.3 (Blackboard)
- Data: UNpop.csv, UNpop.RData, turnout.csv
- Slides: Class 1, Class 2
- Code: Tutorial Setup, Class 1, Class 2

Week 2: Causality and Single Variables

- Monday (2/15): Work through QSS Ch. 1.3 again
- Wednesday (2/17): QSS Ch. 2.1-2.2 (Blackboard)
- Data: resume.csv
- Slides: Class 3, Class 4
- Code: Class 3, Class 4
- Review Exercise 0 (due 2/22, but try to do by 2/17, submit on Blackboard): learnr::run_tutorial("00-intro", package = "qsslearnr").
- Problem Set 1 (due 2/26, submit on Blackboard). You'll need Kenya.csv, Sweden.csv, and World.csv.

Week 3: Causality and Single Variables

- Monday (2/22): QSS Ch. 2.3-2.5 (Blackboard)
- Wednesday (2/24): QSS Ch. 2.6, 3.1-3.3 (Blackboard)
- Data: minwage.csv
- Slides: Class 5, Class 6
- Code: Class 5, Class 6
- Review Exercise 1 (due 2/26, submit on Blackboard): learnr::run_tutorial("01-causality1", package = "qsslearnr").
- Review Exercise 2 (due 3/1, submit on Blackboard): learnr::run_tutorial("02-causality2", package = "qsslearnr").
- Data Analysis Memo 1 (due 3/5, submit on Blackboard).

Week 4: Finding and Cleaning Data

- Monday (3/1): Have a look at NYUAD's guide to Political Science datasets here, here, here, and here; as well as at Erik Gahner Larsen's Dataset of Political Datasets
- Wednesday (3/3): Weinberg, Harel, and Abramowitz, Ch. 4 (Blackboard)
- Data: qog.csv
- Slides: Class 7, Class 8
- Code: Class 7, Class 8
- Review Exercise 3 (due 3/3, submit on Blackboard): learnr::run_tutorial("03-measurement1", package = "qsslearnr").
- Problem Set 2 (due 3/12, submit on Blackboard). You'll need leaders.csv.

Week 5: Bivariate Relationships

- Monday (3/8): QSS Ch. 3.5-3.6
- Wednesday (3/10): Weinberg, Harel, and Abramowitz, Excerpts of Ch. 5 (Blackboard)
- Data: congress.csv, polarization_gini.csv, unvoting.csv
- Slides: Class 9, Class 10
- Code: Class 9, Class 10
- Review Exercise 4 (due 3/15, submit on Blackboard): learnr::run_tutorial("04-measurement2", package = "qsslearnr").
- Data Analysis Memo 2 (due 3/19, submit on Blackboard).

Week 6: Bivariate Relationships

- Monday (3/15): Review after class: QSS pp. 139-145, 149-155
- Wednesday (3/17): Weinberg, Harel, and Abramowitz, Excerpts of Ch. 6 (Blackboard)
- Data: face.csv, pres08.csv, pres12.csv, women.csv
- Slides: Class 11, Class 12
- Code: Class 11, Class 12, p-value Simulation
- Review Exercise 5 (due 3/22, submit on Blackboard): learnr::run_tutorial("05-prediction1", package = "qsslearnr"). Important: Skip "Conceptual Questions" and "Coding loops", and go straight to "Linear Regression".
- Problem Set 3 (due 3/26, submit on Blackboard). You'll need intrade08.csv, pres08.csv, and polls08.csv.

Week 7: Multiple Regression

- Monday (3/22): QSS Ch. 4.3.1 and 4.3.2
- Wednesday (3/24): QSS Ch. 4.3.3
- Data: immig.csv, qogdata_reduced.csv (Codebook)
- Slides: Class 13, Class 14
- Code: Class 13, Class 14
- Data Analysis Memo 3 (due 4/2, submit on Blackboard).

Week 8: Multiple Regression

- Monday (3/29): Read QSS Ch. 4.3.3 again
- Wednesday (3/31): The Beginner's Guide to Logistic Regression
- Data: social.csv, cces19.csv
- Slides: Class 15, Class 16
- Code: Class 15, Class 16
- Review Exercise 6 (due 3/31, submit on Blackboard): learnr::run_tutorial("06-prediction2", package = "qsslearnr"). Correct answer for the first conceptual question is "by predicting counterfactual outcomes".
- Problem Set 4 (due 4/9, submit on Blackboard). You'll need electric-company.csv.

Week 9: Extensions to Multiple Regression, Prediction

- Monday (4/5): No reading. Instead, carefully review the code from the last two weeks.
- Wednesday (4/7): QSS Ch. 4.1.
- Data: cces19_week9.csv, polls08_week9.csv, pres08_week9.csv
- Slides: Class 17, Class 18
- Code: Class 17, Class 18
- Data Analysis Memo 4 (due 4/16, submit on Blackboard).

Week 10: Prediction, Spatial Data

- Monday (4/12): QSS Ch. 4.1.
- Wednesday (4/14): QSS Ch. 5.3.1-5.3.4.
- Data: pollsUS08.csv
- Slides: Class 19, Class 20
- Code: Class 19, Class 20
- Review Exercise 7 (due 4/16, submit on Blackboard): learnr::run_tutorial("05-prediction1", package = "qsslearnr"). Important: Only do the "Coding loops" part.
- Problem Set 5 (due 4/23, submit on Blackboard). You'll need intrade08.csv, pres08.csv, and polls08_ps5.csv.

Week 11: Spatial Data

- Monday (4/19): QSS Ch. 5.3.1-5.3.4.
- Wednesday (4/21): No class (Wellness Day).
- Data: walmart.csv
- Slides: Class 21
- Code: Class 21
- Data Analysis Memo 5 (due 4/30, submit on Blackboard).

Week 12: Text as Data

- Monday (4/26): QSS Ch. 5.1.1-5.1.4.
- Wednesday (4/28): QSS Ch. 5.1.1-5.1.4.
- Data: federalist.zip
- Slides: Class 22, Class 23
- Code: Class 22, Class 23
- Problem Set 6 (due 5/7, submit on Blackboard). You'll need elections.csv, dtm.Rdata, and papers.csv.

Week 13: Webscraping

- Guest Lecturer: Sebastian Karcher
- Step-by-Step, Code

Week 14: Review and Final Paper Workshop

- Monday (5/10): Be prepared to talk about your project for 2-3 minutes: What question are you answering, what data do you use, how do you analyze it, and what have you found so far?
- Slides: Class 24