Browser automation and web scraping

This scraper, built with Python and Playwright, automatically completes the Hims medical intake form for weight loss treatment and records which medication the website recommends based on your entries. It can run repeatedly with different patient profiles to gather data on which treatments Hims recommends for different types of people.
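
The overall shape of the automation looks something like the sketch below. It is not the repository's actual code: the form URL, the field selectors, and the patient profile are hypothetical placeholders standing in for the real intake flow.

```python
from playwright.sync_api import sync_playwright

patient = {"age": "42", "height_in": "67", "weight_lb": "210"}  # example inputs

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.com/weight-loss-intake")  # placeholder URL

    # Fill each intake question with the current patient's values
    page.fill("input[name='age']", patient["age"])           # placeholder selector
    page.fill("input[name='height']", patient["height_in"])  # placeholder selector
    page.fill("input[name='weight']", patient["weight_lb"])  # placeholder selector
    page.click("button[type='submit']")

    # Wait for the recommendation screen and record what was suggested
    page.wait_for_selector(".recommendation")                # placeholder selector
    recommendation = page.inner_text(".recommendation")
    print(patient, "->", recommendation)

    browser.close()
```

Looping this over a list of patient profiles and writing each recommendation to a CSV is what turns the scraper into a dataset of who gets recommended what.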

GitHub repository

Mapping and spatial analysis

This map shows crimes that occurred near selected subsidized housing developments in Toledo, Ohio. I used QGIS to create half-mile buffer zones around specific addresses and spatially join them with crime data from the Toledo Police Department, creating a new dataset that distinguishes between crimes that happened inside and outside the buffer zones. I then imported the map layers into Mapbox and styled them.
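
The buffer-and-join step was done in QGIS, but the same logic can be sketched in a few lines of GeoPandas. The file names, column names, and coordinate system below are hypothetical stand-ins for the Toledo datasets.

```python
import geopandas as gpd

housing = gpd.read_file("housing_developments.geojson")  # hypothetical file
crimes = gpd.read_file("toledo_crimes.geojson")          # hypothetical file

# Project to a planar CRS (Ohio North state plane, US feet) so buffer
# distances are measured in feet rather than degrees
housing = housing.to_crs(epsg=3734)
crimes = crimes.to_crs(epsg=3734)

# Half-mile buffers around each development (5,280 ft / 2 = 2,640 ft)
buffers = housing.copy()
buffers["geometry"] = housing.geometry.buffer(2640)

# Spatial join: which crimes fall inside any buffer?
inside = gpd.sjoin(crimes, buffers, how="inner", predicate="within")
crimes["in_buffer"] = crimes.index.isin(inside.index)
```

The resulting `in_buffer` flag is what lets the map style crimes inside and outside the half-mile zones differently.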

Full map

Extracting data from PDFs

I took a batch of messy PDF files containing accident investigation reports for offshore oil and gas facilities, parsed them with pdfplumber, and transformed the information into an easy-to-analyze database.
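
A minimal sketch of the extraction step is below. The file name and the assumption that report fields appear as "Label: value" lines are placeholders, not the project's actual parsing rules.

```python
import re
import pdfplumber
import pandas as pd

records = []
with pdfplumber.open("accident_report.pdf") as pdf:  # hypothetical file
    for page in pdf.pages:
        text = page.extract_text() or ""
        # Pull "Label: value" pairs out of the page text (placeholder pattern)
        fields = dict(re.findall(r"^([A-Za-z ]+):\s*(.+)$", text, flags=re.M))
        if fields:
            records.append(fields)

# One row per report, one column per extracted field
pd.DataFrame(records).to_csv("accident_reports.csv", index=False)
```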

GitHub repository

Undocumented APIs

This project used undocumented APIs on taskrabbit.com and care.com to gather data about the hourly rates charged by different gig workers. I built programs that used custom inputs (e.g., the type of gig work or the city) to call the APIs, clean the data, and organize it into an easily usable database.
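
The general pattern is sketched below. The endpoint URL, query parameters, and response shape are hypothetical, not the sites' actual undocumented routes, which were found by watching network traffic in the browser.

```python
import requests
import pandas as pd

def fetch_rates(category: str, city: str) -> pd.DataFrame:
    """Call a (hypothetical) search endpoint and return hourly rates as a table."""
    resp = requests.get(
        "https://example.com/api/v1/search",            # placeholder endpoint
        params={"category": category, "location": city},
        headers={"User-Agent": "Mozilla/5.0"},           # look like a browser request
        timeout=30,
    )
    resp.raise_for_status()
    workers = resp.json().get("results", [])             # placeholder response shape
    return pd.DataFrame(
        [{"name": w.get("name"), "hourly_rate": w.get("hourly_rate")} for w in workers]
    )

rates = fetch_rates("furniture assembly", "Toledo, OH")
```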

GitHub repository

Building and analyzing datasets

I extracted a demographically equal random sample from the PERSUADE 2.0 dataset, which is used to train AI systems to grade student writing, and ran the essays through Grammarly and Writable, two popular grading engines. Using Pandas and Vega-Altair, I analyzed the resulting grades and demonstrated that the two AI models often disagreed with human graders and each other, giving very different scores to the same essays.
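
The comparison step can be sketched as follows. The file and column names are hypothetical; the real analysis worked from the PERSUADE 2.0 fields and the two engines' score exports.

```python
import pandas as pd
import altair as alt

# Hypothetical table: one row per essay with columns human, grammarly, writable
scores = pd.read_csv("essay_scores.csv")

# How far does each AI grader land from the human score on the same essay?
scores["grammarly_diff"] = scores["grammarly"] - scores["human"]
scores["writable_diff"] = scores["writable"] - scores["human"]

chart = (
    alt.Chart(scores)
    .mark_circle(opacity=0.4)
    .encode(x="grammarly_diff:Q", y="writable_diff:Q")
    .properties(title="Disagreement with human graders, per essay")
)
chart.save("disagreement.html")
```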

Project page