Jun Clemente

Projects

A selection of applied data science, machine learning, and data engineering work. Most projects include code, notebooks, and documentation on GitHub.

Early Warning System for Student Outcomes

End-to-end ML classification framework covering 958 California public high schools, evaluating 7 models under class imbalance using PR-AUC. The Random Forest model (PR-AUC 0.775) is deployed in a live Streamlit app. Manuscript accepted for publication in the 2025 USD Capstone Chronicles.

Workplace Health Policy Optimization

Cloud-based predictive analytics pipeline estimating the ROI of workplace health policies through their effects on productivity and absenteeism, using public CDC, BLS, and County Health Rankings datasets.

Washington Traffic Data Pipeline

End-to-end real-time data pipeline ingesting traffic, weather, and incident data from WSDOT REST APIs into a cloud-hosted MySQL database on Azure, with automated ETL scheduling and a live Tableau dashboard.

School Sentiment NLP

NLP-based sentiment and topic analysis comparing Reddit discussions of high- and low-performing school districts (Palo Alto vs. Oklahoma City) to surface community perception patterns.

Heart Disease Prediction (Multi-cohort)

Comparative predictive modeling of coronary heart disease risk using expanded clinical features across multiple international patient cohorts, evaluating model generalizability across populations.

jcds - Python Library for Reproducible EDA

Open-source Python library for reproducible EDA workflows. Features versioned releases, a full pytest test suite, CI/CD via GitHub Actions, and published MkDocs documentation. Installable via pip.

Flask CRUD Web Application (AWS Deployment)

Full-stack web application built with Flask and PostgreSQL, featuring user authentication, role-based authorization, and RESTful APIs. Deployed to a Linux server on AWS Lightsail.

Interactive Neighborhood Map Application

Single-page JavaScript application using the Google Maps API and third-party data sources to deliver an interactive, map-based user experience.