Web Scraping: NBA Salaries

Inspired in part by Python’s Beautiful Soup, the R package rvest makes it delightfully easy to scrape data from the web. As part of the Tidyverse collection of packages, rvest fits nicely within the broader data workflow: In this post I’ll walk through an example of using rvest to compile a dataset of NBA player … Continue reading Web Scraping: NBA Salaries

Complete Python Selenium Web Scraping Example

Introduction I recently listed a couple of items for sale on a Craigslist-like site called KSL Classifieds. It’s a rich marketplace to buy and sell almost anything. This is what a listing looks like: I instinctively started thinking about how to collect information about listings in this marketplace in a systematic way. Why might this … Continue reading Complete Python Selenium Web Scraping Example

Scraping Stack Overflow Salaries with Python

I recently discovered a salary calculator on Stack Overflow. The tool takes inputs like role, location, and education and outputs salary predictions at the 25th, 50th, and 75th percentile. Based on the results of the annual developer survey, the calculator seems like an interesting way to study the marginal impact of experience and education on … Continue reading Scraping Stack Overflow Salaries with Python

GitHub Actions for Data Analysts

Web scraping is a useful tool for data practitioners, to state the obvious. Often, scraping is most valuable when performed on a scheduled basis, to incorporate new or refreshed values into the dataset. In the past, I’ve paid a (small) monthly fee to PythonAnywhere to run scraping jobs. However, there’s a better, free alternative offered … Continue reading GitHub Actions for Data Analysts

Minivan Wars: Visualizing Prices in the Used Car Market

With the recent birth of our second child, it was time to face a harsh reality: the impending necessity of a minivan. After trying to cope by dreaming up a list of alternative “family” cars, the truth set in: with young kids, features like sliding doors, captain’s chairs, and ample storage space can’t be beat. … Continue reading Minivan Wars: Visualizing Prices in the Used Car Market

Lessons from the Tank: Analyzing 800+ Shark Tank Pitches

Even though it’s been around for years, I just recently discovered Shark Tank, the show where hopeful entrepreneurs pitch business ideas to a panel of wealthy investors, or “sharks”. I usually wonder if there’s a method to the deal-making madness, especially when a pitch that resonates with me falls flat with the sharks. In this … Continue reading Lessons from the Tank: Analyzing 800+ Shark Tank Pitches

Hip Hop’s 2023 Heavyweights

With over 15 million listeners, Spotify’s RapCaviar has been called “the most influential playlist in music.” For the last year, I’ve saved a daily snapshot of the playlist using the Spotify API to empirically determine the biggest rappers in hip hop.

Tags: Python, R, visualization

The Rise of Rap: A Genre Popularity Analysis

Today it feels like rap is bigger and more mainstream than ever. A casual scan of the charts reveals that many of today’s biggest music icons are rappers. How long has it been this way? I remember a time when pop legends like Katy Perry, Lady Gaga, and Rihanna ruled the charts. Looking for more … Continue reading The Rise of Rap: A Genre Popularity Analysis

Using the Google Maps API to Visualize Chase’s Presence in Utah

I’ve been a happy Chase customer since 2010. I’ve appreciated the investment in their mobile platform and was excited about the recent You Invest announcement, allowing customers to trade 100 stocks and ETFs a year for free. With 5,100+ branches and 16,000+ ATMs nationwide, Chase has a strong national footprint. In this post, I use Python … Continue reading Using the Google Maps API to Visualize Chase’s Presence in Utah
