Early Warning System for Student Outcomes
End-to-end ML classification framework covering 958 California public high schools. It evaluates 7 models using PR-AUC to account for class imbalance. The Random Forest model (PR-AUC 0.775) is deployed in a live Streamlit app. Manuscript accepted for publication in the 2025 USD Capstone Chronicles.
Workplace Health Policy Optimization
Cloud-based predictive analytics pipeline that estimates the return on investment (ROI) of workplace health policies through their effects on productivity and absenteeism. It uses public datasets from the CDC, BLS, and County Health Rankings.
Washington Traffic Data Pipeline
End-to-end real-time data pipeline ingesting traffic, weather, and incident data from WSDOT REST APIs into a cloud-hosted MySQL database on Azure, with automated ETL scheduling and a live Tableau dashboard.
School Sentiment NLP
NLP-based sentiment and topic analysis comparing Reddit discussions of a high-performing and a low-performing school district (Palo Alto vs. Oklahoma City) to surface differences in community perception.
Heart Disease Prediction (Multi-cohort)
Comparative predictive modeling of coronary heart disease risk using an expanded set of clinical features drawn from multiple international patient cohorts. It evaluates how well the models generalize across populations.
jcds - Python Library for Reproducible EDA
Open-source Python library for reproducible exploratory data analysis (EDA) workflows. Features versioned releases, a full pytest test suite, CI/CD via GitHub Actions, and published MkDocs documentation. Installable via pip.