US Stock Risk and Return Prediction

Applying Risk Factors to Machine Learning Models to Predict Medium-term Return Rate and Volatility

Stock market data have been studied extensively to understand trends in securities’ returns and risk. Factor models are the canonical and most widely used models for asset pricing and for selecting securities for portfolios. In this project, we aim to utilize various factors and...
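As a rough illustration of what a factor model estimates, the sketch below fits a single-factor model, r = alpha + beta * f + noise, by ordinary least squares. It is not the project's method: the function name `fit_single_factor` and the return series are made up for this example, and the project itself refers to multiple factors and machine learning models.

```python
def fit_single_factor(asset_returns, factor_returns):
    """Estimate alpha and beta of r = alpha + beta * f by least squares.

    Hypothetical helper for illustration only.
    """
    n = len(asset_returns)
    if n != len(factor_returns) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")

    mean_r = sum(asset_returns) / n
    mean_f = sum(factor_returns) / n

    # Sums of cross-deviations and squared factor deviations; their ratio is
    # the same as cov(r, f) / var(f) because the 1/n terms cancel.
    cross = sum((r - mean_r) * (f - mean_f)
                for r, f in zip(asset_returns, factor_returns))
    spread = sum((f - mean_f) ** 2 for f in factor_returns)
    if spread == 0:
        raise ValueError("factor returns are constant; beta is undefined")

    beta = cross / spread
    alpha = mean_r - beta * mean_f
    return alpha, beta


# Made-up data: the asset is built as 0.001 + 1.5 * factor, with no noise.
factor = [0.01, -0.02, 0.03, 0.00]
asset = [0.001 + 1.5 * f for f in factor]
alpha, beta = fit_single_factor(asset, factor)
```

Here beta measures the asset's exposure to the factor and alpha is the return the factor does not explain.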

An Achromatic Approach of Compressing CNN Filters by Clustering Pattern-Specific Receptive Fields

Abstract: CNN transfer learning has been widely used in computer vision tasks such as image classification and pattern detection. Such models are based on specific architectures that have solved similar problems, and pretrained weights are loaded for faster training as well as better performance. However, a...

Mini Project: Human-generated and Machine-generated Language Classification

A Basic Sequence Classification Problem using LSTM

Sequence classification is a basic problem in natural language processing. This mini project illustrates the basic methods for sequence classification using an LSTM model. Such an algorithm can be used to detect spam comments or reviews on the internet.
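To make the idea concrete, here is a minimal sketch of how an LSTM turns a sequence into a single class probability. It is not the project's code. It assumes a scalar input and a scalar hidden state to stay short, with each input value standing in for a token embedding, and the weights are arbitrary untrained numbers. The names `lstm_step` and `classify` are hypothetical. A real model would use a library such as Keras or PyTorch with vector-valued states and learned weights.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h, c, w):
    """One LSTM time step with scalar input x, hidden state h, cell state c."""
    i = sigmoid(w["wi"] * x + w["ui"] * h + w["bi"])    # input gate
    f = sigmoid(w["wf"] * x + w["uf"] * h + w["bf"])    # forget gate
    o = sigmoid(w["wo"] * x + w["uo"] * h + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h + w["bg"])  # candidate cell value
    c_new = f * c + i * g          # keep part of the old memory, add new
    h_new = o * math.tanh(c_new)   # expose a gated view of the memory
    return h_new, c_new


def classify(sequence, w, w_out, b_out):
    """Run the LSTM over the sequence and map the last hidden state
    to a probability for the positive class (e.g. "machine-generated")."""
    h, c = 0.0, 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c, w)
    return sigmoid(w_out * h + b_out)


# Arbitrary, untrained weights chosen only so the example runs.
weights = {
    "wi": 0.5, "ui": 0.1, "bi": 0.0,
    "wf": 0.4, "uf": 0.2, "bf": 1.0,
    "wo": 0.3, "uo": 0.1, "bo": 0.0,
    "wg": 0.8, "ug": 0.3, "bg": 0.0,
}
probability = classify([0.5, -1.0, 2.0], weights, w_out=1.0, b_out=0.0)
```

The key design point is that only the final hidden state feeds the classifier, so the gates must carry any earlier evidence forward through the cell state.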

Additive Manufacturing Melt Pool Physics Prediction Using Physical Simulation Data

A Brown Datathon First Prize Winning Project

Additive Manufacturing (AM), widely known as 3D printing, normally relies on physical simulation processes based on numerical PDEs and their thermal mathematical models. However, microstructure simulations can sometimes be difficult to scale to the macro level for part-level prediction. In this project, we used machine learning algorithms, specifically...

NBA Data Visualization and Virtual Match Simulation

Using a Dash User Interface to Visualize Player/Team Performance Based on 2019-2020 NBA Player Data

In this data science project, an entire data engineering pipeline was built from scratch: obtaining raw data from the website, storing it in a database (MongoDB), retrieving and processing data from the database, and visualizing data based on users’ queries. Specifically, the data we are...

Breast Cancer Cell Classification

Prediction Accuracy and Sensitivity Analysis on Different Models for Breast Cancer Wisconsin Dataset

Forecasting breast cancer can significantly increase patients’ survival rate, and classifying sample tumor cells as malignant or benign is one of the best and most direct ways to make accurate predictions. The Breast Cancer Wisconsin dataset from the UCI Machine Learning Repository was chosen as...
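For readers unfamiliar with the two metrics named in the subtitle, the sketch below computes accuracy and sensitivity from true and predicted labels. It assumes "sensitivity" means recall on the malignant class, which the excerpt does not confirm. The function name `accuracy_and_sensitivity` and the labels are made up for illustration and are not taken from the dataset.

```python
def accuracy_and_sensitivity(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity) for binary labels.

    Sensitivity is TP / (TP + FN): the share of truly positive
    (here, malignant) samples that the model catches.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")

    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)

    if tp + fn == 0:
        raise ValueError("no positive samples; sensitivity is undefined")

    return correct / len(y_true), tp / (tp + fn)


# Made-up labels: 1 = malignant, 0 = benign.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
acc, sens = accuracy_and_sensitivity(y_true, y_pred)
```

In a screening setting, sensitivity often matters more than raw accuracy because a missed malignant case (a false negative) is costlier than a false alarm.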