Hip Hop’s 2023 Heavyweights
With over 15 million listeners, Spotify’s RapCaviar has been called “the most influential playlist in music.” For the last year, I’ve saved a daily snapshot of the playlist using the Spotify API to empirically determine the biggest rappers in hip hop.
Tags: Python, R, visualization
GitHub Actions for Data Analysts
Often, web scraping is most valuable when performed on a scheduled basis, to incorporate new or refreshed values into the dataset. This post walks through the implementation of a simple GitHub action, which scrapes the headline mortgage rates posted on Freddie Mac’s home page daily.
Tags: R, web scraping
Meetinghouses: A Proxy for Growth in the LDS Church
When the LDS church was organized in a small town in New York in 1830, there were only six members. Today, the Mormon Church has grown to over 16 million members, with congregations in 160 countries. This project is an attempt to more precisely measure church growth by tracking changes in the number and distribution of meetinghouses and wards over time.
Tags: R, web scraping
Minivan Wars: Visualizing Prices in the Used Car Market
With the recent birth of our second child, it was time to face a harsh reality: the impending necessity of a minivan. Looking to get acquainted with prices in the used minivan market, I scraped 20 years’ worth of monthly average price data from CarGurus for five minivan models.
Tags: R, web scraping, visualization
Exploring the Marvel Cinematic Universe in Tableau
While re-watching parts of the MCU franchise, I compiled a dataset measuring things like budget, box office sales, and Rotten Tomatoes rating for the 23 movies. Using this data, I created an interactive visual in Tableau.
Tags: Tableau
How to send yourself a notification when your code is done running
I have a couple of Python scripts scheduled to run daily. Sometimes, the jobs fail and the code doesn’t run. Since I’d like to know when that happens, I add a few lines of code to send myself a text when something goes wrong.
Tags: Python
5 simple ways to improve your digital security
When it comes to online security and privacy, it doesn’t hurt to be a bit paranoid. Even if you follow best practices, it’s possible that loved ones are not doing the same. Here are five actionable steps to help family members get serious about their digital footprint.
Tags: security, privacy
How to scrape IMDb and analyze your favorite TV shows like a true nerd
Curious to see how well-liked some of my favorite shows were in their time, I scrapped 818 episode ratings and descriptions from IMDb.com to plot popularity across seasons and loosely quantify character importance.
Tags: R, web scraping, visualization
The Rise of Rap: A Genre Popularity Analysis
Today it feels like rap is bigger and more mainstream than ever. Looking for more than anecdotal evidence of the rise of rap as a genre in the mainstream music landscape, I developed a data-driven methodology to measure the high-level trend in music genre popularity over time.
Tags: Python, R, API, web scraping
Building a Scripture Search Tool with R Shiny
Looking for a way to quickly understand what the scriptures say on a given topic, I developed a simple Shiny app using R as a scripture study tool.
Tags: R, Shiny
Web Scraping: NBA Salaries
ESPN publishes NBA player salary data going back 20 years. Rather than manually copying this data, which is spread across hundreds of web pages, we can write a script to compile it automatically.
Tags: R, web scraping
Which programming language should I learn first?
Aspiring programmers and data scientists often ask, “Which programming language should I learn first?” In this post, we’ll take a data-driven approach using question tag data from Stack Overflow to prove that Python is probably the best place to start.
Tags: SQL, R, visualization
How to build a simple mobile app in 30 minutes or less
Combining the power of Google Sheets, Glide, and RapidAPI, I built a simple mobile app to help level up my vocabulary-building efforts and prepare for the GRE. The explosion of no-code/low-code platforms (i.e. Glide) means anyone can be a builder.
Tags: App, API
Trends in Vault Banking Rankings
Vault.com surveys professionals to rank the top employers in industries like law, consulting, and banking. In this post, I visualize how the Vault Top 50 Banking rankings have changed over time, from 2011 to 2020.
Tags: R, visualization
Visualizing Rap Communities with Python & Spotify’s API
Finding new music you like can be tough. In my experience,there’s no single discovery mechanism that delivers consistently. In this post, I leverage Spotify’s “similar artists” API to build interactive network charts, visualizing how artists are linked together, as measured by the similarity of their fans.
Tags: Python, API, visualization
Building a Birthday Text Bot using Twilio
A good way to show family and friends you care is remembering their birthday. Thankfully, you can automate that! While outsourcing birthday check-in duties does feel a bit impersonal, this post is a tutorial for building a birthday text bot using Twilio.
Tags: Python, API, automation
Building a Lyrics Profanity Analyzer with R Shiny
A few weeks ago, a family member asked me to make them a Spotify playlist with recent rap hits. To avoid including anything excessively profane, I’d pull up the song lyrics on genius.com and manually search for potentially offensive words. Looking to streamline this process, I built a simple tool that quickly measures profanity in any song.
Tags: R, Shiny, web scraping
Studying Trends in World Religion using R
Using a data set from the Pew Research Center, this post is about unpacking trends in world religion, a good exercise in brainstorming ways to slice a seemingly simple data set in pursuit of insights.
Tags: R, visualization
A Simple Crypto Alert Bot in Python
Considering the volatility of the cryptocurrency markets, can a crypto enthusiast be smart about when to buy, in pursuit of a “bargain”, and do so in a systematic, automated fashion? This post outlines the process of building a crypto alert system using Python, which sends a push notification via Slack when a cryptocurrency (BTC, XRP, ETC) appears “cheap” relative to historical prices.
Tags: Python, API, cryptocurrency, bot
Measuring Commute Times with IFTTT
As a recent transplant to New York City, I do my fair share about complaining about the daily commute in and out of the city, which consumes at least an hour of time each way. Looking to add some quantitative weight to my whining, I automatically logged when I left home and work each day using IFTTT and Google Sheets. After collecting 2 months of data, I visualized the distribution of commute times to and from work.
Tags: R, IFTTT, visualization, NYC
Analyzing 800+ Shark Tank Pitches
Even though it’s been around for years, I just recently discovered Shark Tank. I usually wonder if there’s a method to the deal-making madness, especially when a pitch that resonates with me falls flat on the sharks. In this post I explore what kind of pitches have the highest changed of being offered a deal.
Tags: R, NLP, API, web scraping
Jamie’s Shareholder Letters: Text Analysis
With the recent release of Warren Buffet’s much anticipated annual shareholder letter, I decided to show J.P. Morgan Chase chief Jamie Dimon some love by performing a text analysis on a sample of his annual shareholder letters. In this post I’ll analyze Jamie’s thoughts on the firm, the economy, and politics using tidytext principles, including sentiment, term frequency-inverse document frequency, and bigram network visualization.
Tags: R, NLP, visualization
Choosing the Right Hospital: Analysis in R
With our baby’s due date quickly approaching, my wife and I needed to find a hospital for delivery. Hoping to contribute something meaningful to the decision, I found data published by the state of New York with hospital-level labor and delivery metrics. By visualizing measures like the percentage of cesarean delivers, I narrowed the list of hospitals without our county to choose from. I guess data-driven decision making can help new data navigate parenthood too?
Tags: R, visualization
Visualizing NYC Housing Trends
The NYC housing market is wild. Where else in the US can you pay so much and get so little in return? In this post I’ll use the gganimate package in R to visualize the ebb and flow of rental housing availability in NYC, using publicly available data from StreetEasy, “NYC’s leading real estate marketplace”. If the law of supply and demand holds, this should inform ideal times for apartment hunting.
Tags: R, Visualization, gganimate, NYC
Mapping Scarsdale Real Estate Data
This year my wife and I moved to New York for the start of a new job. Initially overwhelmed by the scope and pace of the NYC housing market, we were given the very generous and unexpected opportunity by a family friend to live in a house north of the city in Westchester County. Built in the early 1930s, the historic home is situated in central Scarsdale, an affluent suburban town known for high-achieving schools and extravagant real estate. Using Python, the Google Maps geocoding API, and rich property data made available by the Village of Scarsdale, this post contains visualizations of real estate metrics like year built, assessed value, and sales date.
Tags: Python, API, geocoding, visualization
Scraping Stack Overflow Salaries
The Stack Overflow salary calculator takes inputs like role, location, and education and outputs salary predictions at the 25th, 50th, and 75th percentile. To extract Data Scientist salary data (or extrapolated data) from the tool, I wrote a Python script using Selenium to loop through 350+ different combinations of location, education and experience.
Tags: Python, Selenium, web scraping
Visualizing Pocket Articles with R
Every day I see dozens of things online I don’t have time to read or view in the moment. With Pocket I save news articles, blog posts, talks, or tutorials for later viewing. Over the last 2 years I’ve saved just shy of 2,000 links, encompassing a variety of content. n this post I extract insights from these links in R, using link domain and topic frequency to assess my interests.
Tags: R, text mining, visualization
Extracting Transactions from the Venmo API
Public by default, your Venmo transactions are surprisingly accessible to anyone with an internet connection. It’s public API provides a real-time snapshot view of transactions processed through the system, including usernames and payment subjects. In this post, I how to pull data from the API in an effort to expose the kind of information being openly shared.
Tags: R, API
Visualizing Baby Name Popularity Trends
From the earliest days of our marriage, my wife and I talked about baby names. Your name is a core part of your identity, so choosing the right name for your child feels like a weighty affair. Like always, I turned to data to assist with the decision process. Using a dataset provided by the Social Security Administration, I created functions with R to visualize and compare the popularity of names over time.
Tags: R, functions, visualization
Web Photo Archiving with R
My wife and two of her sisters ran cross-country and track in high school. I recently learned that their team website, which hosts thousands of event photos from the past 10 years, is being shut down. Wanting to save my mother-in-law from the unimaginably tedious task of manually downloading each image, I wrote a script in R to automate the process.
Tags: R, web scraping, automation
Visualizing Chase’s Presence in Utah
In this post, I use Python to create a Chase branch coverage map for my home state of Utah, scrapping branch and ATM information from Chase.com and obtaining geographic coordinates using the Google Maps geocoding API.
Tags: Python, API, web scraping, geocoding
Analyzing Drake’s Catalog (Spotify’s API)
I’ve been a Drake fan since 2009 when I first heard “Best I Ever Had” from So Far Gone. Over the last decade, I’ve watched Drake transform into a global rap and pop superstar. This weekend I saw Drake live in Brooklyn as part of the Aubrey & the Three Migos tour. What better way to celebrate than by analyzing his catalog using Spotify’s API? I look at things like who Drake likes to collaborate with and if his songs have become more “danceable” over time.
Tags: Python, API, Tableau, visualization
Insights via Google Sheets Query
The Google Sheets query function brings some of the power of SQL to spreadsheets. In this post, I’ll walk through three examples of the query function to explore a CrunchBase dataset of startup companies. The CrunchBase dataset contains information about 49,000+ startups including the startup name, website, market, status, funding, and location. We’ll explore the number of startups by state, the number of California and New York startups over time, as well as total funding by market.
Tags: Google Sheets, SQL, visualization
The Hunt for Housing in NYC
This summer my wife and I relocated to New York City in preparation for the start of my new job. Housing in Manhattan and the surrounding boroughs is notoriously expensive, so I decided to pursue a data-driven approach to our apartment search. I wrote a Python script to scrape 9,000+ apartment listings on Craigslist for zip codes in the five boroughs: Manhattan, Bronx, Brooklyn, Queens, and Staten Island. I then visualized the median rent by zip code in Tableau.
Tags: Python, web scraping, visualization
Complete Selenium Web Scraping Example
I recently listed a couple of items for sale on a Craigslist-like site called KSL Classifieds. It’s a rich marketplace to buy and sell almost anything. I instinctively started thinking about how to collect information about listings in this marketplace in a systematic way. In this post I provide a step-by-step walk through on how to leverage the Selenium package in Python to do web scraping.
Tags: Python, web scraping
Interactive Investment Tool with R Shiny
R Shiny is a fantastic framework to quickly develop and launch interactive data applications. I recently wrote some investing advice and was looking for a way to illustrate two case studies. Building on an RStudio template, I created a tool to visualize the return of an investment over time, allowing the user to modify each parameter and observe its effect:
Analyzing iPhone Usage Data in R
I’m constantly thinking about how to capture and analyze data from day-to-day life. Moment is an iPhone app that tracks screen time and phone pickups. Under the advanced settings, the app offers data export (via JSON file) for nerds like me. In this post I’ll perform a basic basic analysis of my usage data using R.
Tags: R, visualization
Credit Card Advice
David Robinson, Chief Data Scientist at DataCamp, once tweeted: “When you’ve written the same code 3 times, write a function. When you’ve given the same advice 3 times, write a blog post.” I’ve recently given advice to a few family members about selecting a credit card so, in the spirit of David’s tweet, I’ve compiled some tips and information about credit cards.
Fundamentals of Investing
The following phrase encapsulates my investing philosophy and serves as a reminder of four fundamentals of investing: I will pursue low-cost, tax-efficient investments that form a diversified, long-term portfolio.
6 Tips for Economics Majors at BYU
With graduation on the horizon, it’s natural to wax reflective. I’ve spent some time compiling some unsolicited advice for new Economics majors at Brigham Young University. These are tidbits of knowledge and insight I’ve gained slowly by trial-and-error and experience over the past four years, compactly compiled for your educational enhancement.
5 Essential iPhone Apps
I’m always looking for apps that enhance my life and productivity. A family member recently purchased a new iPhone and asked for my top app recommendations. Here are my top five: LastPass, Pocket, Moment, Mint, and Google Keep.
Basic Interview Tips
There are few things more nerve-racking than an interview. Here I’ve compiled some helpful tips to help set yourself apart.