Tools: Python (pandas, numpy, seaborn, matplotlib, folium, json, math, searchengine, sklearn), Tableau. Link to full project
Challenge: A car company has retooled to become an RV manufacturer, launching a new kind of camper under an original brand. They need descriptive insights and a predictive model for behavioral trends among people who already own and use RVs.
Cleaned and transformed the dataset of all 2022 reservations from recreation.gov to retrieve information on 1.7 million RV users. Using Python libraries, calculated new variables such as actual distance traveled, tested for stationarity and autocorrelation, and ran supervised and unsupervised machine learning (linear regression and k-means clustering) to describe distinct user subsets and overall usage trends. Used highly interactive visualizations to let stakeholders explore the data at several levels of granularity, which in turn shaped the subsetting and threshold changes needed to improve the models.
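A derived variable such as distance traveled can be approximated from coordinates alone. The following is a minimal sketch, assuming each reservation carries a latitude/longitude pair for the visitor's origin and for the campground; the function name `haversine_km` and that data layout are illustrative assumptions, and the project itself may have computed distance differently.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees.

    Hypothetical helper: this is straight-line distance over the Earth's
    surface, not road distance.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```

Great-circle distance understates actual driving distance, so it is best read as a lower bound when comparing user subsets.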
Challenge: Determine when and where to send medical staff within the US in order to plan for the annual influenza (flu) season. The project required integrating two different datasets, from the US Census and the CDC. Formulated and tested the research hypothesis and conducted exploratory data analysis on cleansed datasets, guided by the strong correlation between age and flu mortality. Created primarily spatial and temporal visualizations to communicate differing strategies for understanding the distribution of vulnerable US residents and their varied health outcomes.
Challenge: A fictional bank wants to integrate a data mining algorithm into its operations to identify which customers are most likely to stop using its services. A maximum of four risk factors were to be identified and ranked, and the model presented for evaluation as part of the CRISP-DM methodology. Cleaned the dataset and addressed the PI it contained, then conducted exploratory data analysis to identify meaningful differences between customers staying with and leaving the bank. Ranked risk factors and designed a decision tree model for classification of customers, based on age, membership status, gender, and country of residence.
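One common way a decision tree ranks candidate risk factors is by how much each one reduces Gini impurity when used as a split. The sketch below illustrates that idea in plain Python; the function names, the `exited` target column, and the list-of-dicts row layout are all assumptions made for illustration, and this is not the model or tooling used in the project.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_gain(rows, feature, target="exited"):
    """Impurity reduction from splitting rows on each distinct feature value.

    `rows` is a list of dicts; `target` is a hypothetical churn-flag column.
    """
    parent = gini([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    weighted = sum(len(g) / len(rows) * gini(g) for g in groups.values())
    return parent - weighted


def rank_features(rows, features, target="exited"):
    """Return features ordered from most to least impurity reduction."""
    return sorted(features, key=lambda f: gini_gain(rows, f, target), reverse=True)
```

A continuous factor such as age would need to be binned into groups before being passed to `gini_gain`, since the sketch splits on every distinct value.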
Challenge: Descriptive analysis to best position fictional video game company GameCo as it invests in the development and launch of a new game for the coming year. The company needs to understand the global landscape of game sales, particularly emerging markets where there is strategic opportunity to establish a new game ahead of competitors. Cleaned, sorted and filtered the dataset to answer key business questions. Used pivot tables and calculated fields to research sales figures and comparative genre popularity in three regions: Japan, North America, and Europe. Created visualizations to communicate the current and historic top regions and genres, as well as to illustrate exciting developments: rapid-growth regions, and a global consolidation of top genres that has partially supplanted more distinctive regional preferences.
Tools: Python (Jupyter, pandas, numpy, matplotlib, seaborn). Link to full project
Challenge: Instacart wants to use insights from initial data and exploratory analysis to develop better segmentation of customers, and the marketing strategies to best reach them. The goal is to increase sales through a greater understanding of any meaningfully distinct customer behaviors, such as late-night ordering or regional dietary tendencies. Using Python libraries, cleaned, transformed and integrated five datasets of department, product, customer and order information into a dataframe of more than 30 million rows. Derived new variables and conceptualized numerous demographic subsets, testing them with statistical analysis. Used visualizations and a reporting narrative to answer key business questions about customer behavior and demographics.
Challenge: "Rockbuster," a fictional movie rental company, needs insights to inform the launch strategy for its new online streaming service. Descriptive SQL analysis to answer specific business questions, formatted as a concise presentation covering the performance of the video library and customer base of the existing DVD-based rental service. Set up a relational database environment using PostgreSQL, and extracted an ERD for analysis to create a data dictionary and identify keys. Conducted CRUD operations, and filtered and aggregated data. Cleaned the data and computed summary statistics for EDA. Used table joins, subqueries and common table expressions to offer a highly nuanced understanding of global customer distribution and spending by country and city, as well as to identify the top 5 lifetime value customers worldwide. Created spatial and multivariate statistical visualizations.
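A query for the top lifetime value customers can be expressed as a common table expression joined back to the customer table. The sketch below runs such a query through Python's built-in `sqlite3` module rather than PostgreSQL so that it is self-contained; the table names `customer` and `payment`, their columns, and the helper `top_customers` are assumptions for illustration, not the project's actual schema or query.

```python
import sqlite3


def top_customers(conn, limit=5):
    """Return the highest-spending customers as (id, first, last, total) rows.

    Assumes hypothetical tables customer(customer_id, first_name, last_name)
    and payment(customer_id, amount).
    """
    query = """
        WITH customer_totals AS (
            -- Lifetime spend per customer
            SELECT customer_id, SUM(amount) AS total_paid
            FROM payment
            GROUP BY customer_id
        )
        SELECT c.customer_id, c.first_name, c.last_name, t.total_paid
        FROM customer_totals AS t
        JOIN customer AS c ON c.customer_id = t.customer_id
        ORDER BY t.total_paid DESC, c.customer_id
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()
```

Aggregating in the CTE before joining keeps the join to one row per customer, and the secondary sort on `customer_id` makes ties deterministic.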