Sankeerti Haniyur



Click on titles to view full write-ups or GitHub repos

  • Created end-to-end prototype web application to detect and provide feedback on a user’s yoga poses for safe, convenient yoga practice

  • Built using Flask and hosted on an Amazon EC2 instance with GPU for deep learning

  • Model classified adjustments of a yoga pose from user’s video stream and achieved accuracy of 0.919

AI-Powered Yoga App

Python, Computer Vision, Flask, Bootstrap, AWS EC2, Deep Learning, GPU

  • Implemented the Isolation Forest paper by constructing an ensemble of binary trees, where each tree was built by splitting on randomly chosen features and split values

  • Anomalies were more susceptible to isolation, so they had a shorter average path length than normal points

  • Extended the algorithm by introducing max node weights in each tree to improve speed and handle noisy data
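
The isolation idea described above can be illustrated with a minimal sketch. This is not the project's code: the function names, the toy data, and the depth cap are assumptions made for illustration, and the sketch leaves out both the leaf-size adjustment term from the paper and the max node weights extension.

```python
import random

def build_tree(points, rng, depth=0, max_depth=8):
    """Recursively split points on a random feature at a random value."""
    if len(points) <= 1 or depth >= max_depth:
        return {"size": len(points)}
    feature = rng.randrange(len(points[0]))
    lo = min(p[feature] for p in points)
    hi = max(p[feature] for p in points)
    if lo == hi:
        # Simplification: stop when the chosen feature cannot separate the points.
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[feature] < split]
    right = [p for p in points if p[feature] >= split]
    return {
        "feature": feature,
        "split": split,
        "left": build_tree(left, rng, depth + 1, max_depth),
        "right": build_tree(right, rng, depth + 1, max_depth),
    }

def path_length(tree, point):
    """Count the edges traversed before the point reaches a leaf."""
    depth = 0
    while "feature" in tree:
        side = "left" if point[tree["feature"]] < tree["split"] else "right"
        tree = tree[side]
        depth += 1
    return depth

def average_path_length(trees, point):
    return sum(path_length(t, point) for t in trees) / len(trees)

# Toy data (an assumption): a tight 8x8 grid of normal points plus one far-away point.
rng = random.Random(0)
inliers = [(x / 10, y / 10) for x in range(8) for y in range(8)]
outlier = (10.0, 10.0)
data = inliers + [outlier]

trees = [build_tree(data, rng) for _ in range(100)]
outlier_depth = average_path_length(trees, outlier)
inlier_depth = average_path_length(trees, (0.3, 0.3))
print(outlier_depth, inlier_depth)
```

On this toy data the far-away point is usually separated by the first split, so its average path length comes out well below that of a point in the middle of the grid.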

  • Used proprietary mobile app data provided by Leanplum for an in-class Kaggle competition to predict whether users would make a purchase 7 days and 14 days in the future

  • Created custom recency, frequency, & monetization features to capture user behavior

  • Trained XGBoost machine learning model which achieved AUC score of 0.96 on test data set

Mobile App Purchases

Python, SparkSQL, Feature Engineering, Machine Learning, XGBoost

(No link due to data privacy concerns)

  • Processed data, trained, and evaluated a machine learning model on AWS EMR using Apache Spark

  • Analyzed the impact of EMR cluster size on performance & hosted our data in a MongoDB cluster

  • Trained RandomForest model which achieved an F1 score of 0.857 on test data set

Grocery Reorder Prediction

AWS EMR, MongoDB, Spark, Machine Learning, RandomForest

  • Predicted Kobe Bryant's shot success using machine learning models

  • Scraped additional feature data from NBA website and compared models via cross-validation results

  • Random Forest and Gradient Boosting models achieved log loss of 0.6103 and 0.6064 respectively

Shot Success Prediction

Python, Feature Engineering, Machine Learning, RandomForest, Gradient Boosting

Financial Data Analysis

Python, Data Analysis, Time Series, Correlation

  • Explored cross-correlation between time series of relative internet search term frequency and subsequent (lagged) stock losses of companies

  • Used APIs, made custom web data requests, calculated derived indicators, and conducted statistical analysis in Python
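
A lagged cross-correlation of the kind described above can be sketched as follows. The series below are synthetic and the function names are hypothetical; the sketch only shows how a search-interest series at time t might be compared with stock returns at time t + lag, and it is not the analysis code itself.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def lagged_correlation(search, returns, lag):
    """Correlate search[t] with returns[t + lag], for lag >= 0."""
    if lag == 0:
        return pearson(search, returns)
    return pearson(search[:-lag], returns[lag:])

# Synthetic example (an assumption): returns mirror search interest two steps later.
search = [1, 3, 2, 5, 4, 6, 2, 7, 3, 8]
returns = [0, 0] + [-s for s in search[:-2]]

correlations = {lag: lagged_correlation(search, returns, lag) for lag in range(4)}
best_lag = min(correlations, key=correlations.get)
print(best_lag, correlations[best_lag])
```

Here the most negative correlation appears at the lag that was built into the synthetic data, which is the pattern such an analysis looks for between search spikes and later losses.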

Forecasting CO2 Levels 

R, Time Series, Data Visualization

  • Predicted 2005 carbon dioxide levels from previous time-series data and compared results to true data

  • Removed trend and seasonal components from the series and fit a linear regression to predict future values

  • Explained prediction results through graphs and gave a clear account of the prediction process, as if presenting to a real-life client

  • Identified anomalies, cleaned data, performed regressions, and processed text using pattern matching and regular expressions

Analysis of Vehicle Data

R, Data Cleaning, Regular Expressions