"All the models are wrong, but some are useful"
-- George Box

About Me

Hi, this is Liz!

I'm a Data Scientist with previous experience in the consulting and banking industries. I am passionate about wrestling with complex data and applying machine learning to tell stories and solve business problems.

I will soon graduate with a Master of Science in Data Science (MSDS) from the University of San Francisco, where I have developed strong programming and statistics skills for tackling business problems involving big data.

As a Data Scientist at Wiser Solution, my primary role is to help my client boost revenue and optimize pricing strategy with predictive modeling techniques and machine learning algorithms.

Below, I'd like to share some of the projects I have completed so far that I found interesting.

Click "more" for details and source code on GitHub.

Featured Projects

Spark ML Infrastructure Design

A data pipeline that automates Ford GoBike data extraction from AWS S3 to MongoDB and connects to Spark to model bike demand.

(ETL, data pipeline, AWS, S3, MongoDB, SparkSQL, SparkML, PySpark)

SF Food Truck Map - Bites

A complete web product that shows live food truck locations and business information in SF.

(AWS, ETL, Flask, MongoDB, Google Analytics)

PlayerUnknown's Battlegrounds Gaming Strategy Visualization

A fully interactive visualization website that demonstrates gamer strategies in PUBG.

(D3, Plotly, Python, HTML)


Machine Learning and Statistics

Spam Classification

A predictive model that classifies whether an email is spam.

(Python, boosted trees, NumPy, XGBClassifier)

Movie Recommendation

A collaborative filtering system that predicts the rating a viewer would potentially give a movie.

(matrix factorization, stochastic gradient descent optimization, NumPy)

Handwritten Digit Recognition

A vanilla neural network that classifies digits from images, trained on the MNIST dataset.

(PyTorch, AWS, neural network tuning)

Canadian Bankruptcy Rate Prediction With Time Series

A time series forecast of the Canadian bankruptcy rate using macroeconomic indicators.

(R, Holt-Winters, SARIMA, VARX)

Iowa House Price Prediction With Linear Regression

A regression analysis and business report on house price prediction in Iowa.

(R, OLS, Lasso, Ridge, Elastic Net)

US Domestic Flight Delay Prediction

A random forest model that predicts a given flight's delay rate.

(Python, feature engineering, model interpretation)

Natural Language Processing

Movie Review Sentiment

A sentiment prediction model that determines whether an IMDb movie review conveys positive or negative sentiment.

(NLTK, Naive Bayes, Word Embedding)

Twitter Sentiment Analysis

A digested Twitter list page with feeds color-coded by sentiment, shown alongside an average sentiment score.

(Tweepy, vaderSentiment, Jinja, Flask)

BBC Article Recommendation

An interactive website that recommends articles similar to the one you choose.

(word2vec, Stanford GloVe, AWS, Python)

Stay Connected