Jun Clemente

Projects

A selection of applied data science, machine learning, and data engineering work. Most projects include code, notebooks, and documentation on GitHub.

Early Warning System for Student Outcomes

Predictive modeling using California public education data to identify at-risk students and support early intervention decisions.

  • Python
  • scikit-learn
  • XGBoost
  • Public Data
View Project →

Workplace Health Policy Optimization

Cloud-based predictive analytics to forecast absenteeism and evaluate the ROI of workplace health policies.

  • Python
  • XGBoost
  • AWS SageMaker
View Project →

Washington Traffic Data Pipeline

Data ingestion, cleaning, and feature engineering pipeline built on publicly available transportation datasets.

  • Python
  • Data Pipelines
  • Public Data
View Project →

School Sentiment NLP

NLP-based sentiment and topic analysis comparing online discussions of high- and low-performing school districts.

  • Python
  • NLP
  • Topic Modeling
View Project →

Heart Disease Prediction (Multi-cohort)

Comparative modeling of coronary heart disease using expanded clinical features across multiple international patient cohorts.

  • Python
  • Modeling
  • Clinical Data
View Project →

jcds - Exploratory Data Analysis Toolkit

A reusable Python library for fast, structured exploratory data analysis with consistent reporting patterns.

  • Python
  • Exploratory Data Analysis
  • Python Library
View Project →