Hip Hop’s 2023 Heavyweights

With over 15 million listeners, Spotify’s RapCaviar has been called “the most influential playlist in music.” RapCaviar is curated by Spotify’s editorial team and updated daily to represent the latest and greatest hip-hop and rap tracks.

For the last year, I’ve saved a daily snapshot of the playlist using the Spotify API to empirically determine the biggest rappers in hip hop today. In this post, we’ll use hard data to approximate influence, hustle, and longevity for rap’s biggest names during 2023.

Methodology

To collect the data, I scheduled a Python script to run daily to (1) hit Spotify’s API to collect the RapCaviar track list and (2) save the resulting data frame as a .csv file to an S3 bucket. After pulling down and combining all daily files from S3 using an R script, the tidied dataset contains 11 fields:

Field Name | Sample Value
Playlist Id | 37i9dQZF1DX0XUsuxWHRQd
Playlist Name | RapCaviar
Track Playlist Position | 2
Track Name | Mad Max
Track Id | 2i2qDe3dnTl6maUE31FO7c
Track Release Date | 2022-12-16
Track Added At | 2022-12-30
Artist Track Position | 1
Artist Name | Lil Durk
Artist Id | 3hcs9uc56yIGFCSy9leWe7
Date | 2023-01-02

The value for “artist track position” helps distinguish owners from features. For example, both Lil Durk and Future appear on the track Mad Max. Since it’s Lil Durk’s song and Future is the feature, two rows exist in the dataset, with “artist track position” set to 1 (Lil Durk) and 2 (Future).

After the cleaning and deduplication process, the dataset contains 469 total tracks with 271 distinct artists represented across 351 distinct playlist snapshots between January 1 and December 27, 2023.
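
For reference, here’s roughly what the daily collection job looked like, condensed into a sketch: the bucket name is a placeholder, spotipy credentials are assumed to come from environment variables, and only a subset of the 11 fields is shown.

```python
import datetime

import boto3
import pandas as pd
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials())  # reads SPOTIPY_* env vars

PLAYLIST_ID = "37i9dQZF1DX0XUsuxWHRQd"  # RapCaviar
today = datetime.date.today().isoformat()

# One row per (track, artist) pair in today's snapshot
rows = []
for position, item in enumerate(sp.playlist_items(PLAYLIST_ID)["items"], start=1):
    track = item["track"]
    for artist_position, artist in enumerate(track["artists"], start=1):
        rows.append({
            "playlist_id": PLAYLIST_ID,
            "track_playlist_position": position,
            "track_name": track["name"],
            "track_id": track["id"],
            "track_added_at": item["added_at"],
            "artist_track_position": artist_position,
            "artist_name": artist["name"],
            "artist_id": artist["id"],
            "date": today,
        })

filename = f"rapcaviar_{today}.csv"
pd.DataFrame(rows).to_csv(filename, index=False)
boto3.client("s3").upload_file(filename, "my-bucket", f"rapcaviar/{filename}")
```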

Metrics

Influence

Let’s start with influence: what percent of available days was a given artist represented on the playlist? For example, if an artist appeared in 50 of the 351 possible daily snapshots, their “influence” score would be 14.2%.
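
In pandas terms, the metric is a quick groupby; here’s a sketch, assuming the combined snapshots are loaded as df with snake_case versions of the fields above:

```python
import pandas as pd

df = pd.read_csv("rapcaviar_2023.csv")  # combined daily snapshots (file name illustrative)

total_days = df["date"].nunique()  # 351 snapshots in 2023

influence = (
    df.groupby("artist_name")["date"]
      .nunique()                      # days on which the artist appeared
      .rename("days_represented")
      .reset_index()
      .sort_values("days_represented", ascending=False)
)
influence["percent_of_available"] = influence["days_represented"] / total_days
```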

Here are the top ten rappers ranked by this influence metric, for 2023:

Name | Days Represented | Percent of Available
Drake | 351 | 100%
Future | 351 | 100%
Gucci Mane | 351 | 100%
Travis Scott | 351 | 100%
21 Savage | 344 | 98%
Kodak Black | 330 | 94%
Yeat | 328 | 93%
Latto | 311 | 89%
Lil Uzi Vert | 309 | 88%
Quavo | 302 | 86%

Impressively, four artists wielded enough influence to maintain a presence on the playlist every day of the year: Drake, Future, Gucci Mane, and Travis Scott. Here’s a visual representation of their dominant year:

Each colored line represents a unique track. With the y-axis reversed, the chart shows how new tracks enter the playlist positioned near the top and then descend over time. The biggest surprise to me is Gucci Mane, who managed to maintain his presence on the playlist via 14 distinct tracks released throughout the year:

The hustle shown here reminds me of my favorite Lil Wayne clip of all time.

Notably, 21 Savage was only a week short of full coverage, coming in at 98%.

Looking at the distribution of influence scores for all artists appearing at least once during the year, 38 (14%) were present in the RapCaviar playlist more than half of the year:

Density

It’s one thing for an artist to have one of their tracks represented on RapCaviar, but the heavyweights often have several at once. “Density” is calculated as a distinct count of tracks by artist and day.
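
Continuing with the same df, density is a groupby over artist and day:

```python
# Distinct tracks per artist per day; the max of this per artist is their peak density.
density = (
    df.groupby(["artist_name", "date"])["track_id"]
      .nunique()
      .rename("density")
      .reset_index()
      .sort_values("density", ascending=False)
)
```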

The highest density score for 2023 was 6, a score achieved by just four rappers:

Density | Artist | Dates
6 tracks | 21 Savage | Jun 23 – Jul 13 (21 days)
6 tracks | Lil Wayne | Nov 10 – 16 (7 days)
6 tracks | Drake | Oct 7 – 12 (6 days)
6 tracks | Travis Scott | Aug 3 (1 day)

Most impressive is 21 Savage’s dominant 21-day, 6-track run over the summer, preceded by a 20-day, 5-track run. Notably, during the 6-track spree, all six were features or joint tracks:

  1. Pull Up (feat. 21 Savage)
  2. Wit Da Racks (feat. 21 Savage, Travis Scott & Yak Gotti)
  3. Peaches & Eggplants (feat. 21 Savage)
  4. 06 Gucci (feat. DaBaby & 21 Savage)
  5. War Bout It (feat. 21 Savage)
  6. Spin Bout U (with Drake)

Contributing to more than 10% of the playlist’s track count simultaneously is truly impressive (RapCaviar usually has 50 tracks total); rap’s heavyweights are dense.

Longevity

Finally, let’s consider longevity, meaning how long an artist’s tracks remain on the playlist. Here are the top ten songs by lifespan on the RapCaviar track list during ’23:

Track | Artist | Days | First Day | Last Day
f*kumean | Gunna | 179 | Jun 19 | Dec 14
Turn Yo Clic Up* | Quavo | 167 | Jul 14 | Dec 27
Search & Rescue | Drake | 161 | Apr 7 | Sep 15
500lbs* | Lil Tecca | 159 | Jul 21 | Dec 27
I KNOW ?* | Travis Scott | 153 | Jul 28 | Dec 27
Paint The Town Red* | Doja Cat | 146 | Aug 4 | Dec 27
MELTDOWN | Travis Scott | 143 | Aug 1 | Dec 21
Private Landing | Don Toliver | 136 | Feb 24 | Jul 13
Superhero | Metro Boomin | 135 | Jan 2 | May 25
All My Life | Lil Durk | 133 | May 12 | Sep 21

Importantly, four of the tracks in the top ten were still active as of the final snapshot (marked with an asterisk above), so there’s a decent chance Turn Yo Clic Up could outlive f*kumean. Speaking of which, Gunna’s first top-ten solo single managed to spend almost six months on RapCaviar, complete with a position surge in mid-August:

Zooming in, here’s the position history for all of those top ten tracks:

Most of the time, a track will debut on the playlist and then fade out over time, sinking deeper in the set list before falling off. Good examples are All My Life, Private Landing, and Search & Rescue. Hits like 500lbs and Paint The Town Red are more anomalous, with momentum building within the playlist over time.

To close this metric out, let’s look at the top ten rappers with the highest average longevity per track, for those artists with three or more distinct tracks ever appearing on the playlist during the year:

Name | Median Longevity | Average Longevity | Track Count
Gunna | 116 | 103 | 4
Metro Boomin | 92 | 96 | 6
Ice Spice | 96 | 72 | 4
Latto | 75 | 74 | 4
Moneybagg Yo | 75 | 69 | 6
Lil Uzi Vert | 68 | 64 | 8
Don Toliver | 41 | 64 | 5
Toosii | 79 | 63 | 3
Key Glock | 61 | 62 | 4
Sexyy Red | 63 | 62 | 4
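
For reference, here’s a sketch of how this summary might be computed, continuing with the same df (the published table’s exact ordering may differ):

```python
# Per-track lifespan (distinct snapshot days), then per-artist median and mean
# for artists with three or more distinct tracks.
lifespan = (
    df.groupby(["artist_name", "track_name"])["date"]
      .nunique()
      .rename("days_on_playlist")
      .reset_index()
)

summary = lifespan.groupby("artist_name").agg(
    median_longevity=("days_on_playlist", "median"),
    average_longevity=("days_on_playlist", "mean"),
    track_count=("track_name", "count"),
)
summary = summary[summary["track_count"] >= 3].sort_values(
    "average_longevity", ascending=False
)
```
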
Conclusion

The influence and density metrics point toward the same heavyweights: 21 Savage, Drake, and Travis Scott. This is intuitive since the two metrics are correlated. The longevity metric shines the spotlight on a different subset of rappers, like Gunna, Metro Boomin, and Ice Spice.

Either way, it was a great year for rap. Thanks for reading!

How to send yourself a notification when your code is done running

I have a couple of Python scripts scheduled to run daily and hourly using PythonAnywhere. These scripts help automate tasks for me, like tracking cryptocurrency prices or sending texts to friends on their birthday.

Sometimes, the jobs fail and the code doesn’t run. Since I’d like to know when that happens, I add a few lines of code to send myself a text when something goes wrong.

To get started, create a Twilio account. You’ll need three things from your Twilio account to make the code work: an account sid, auth token, and active Twilio phone number.

Locate your account sid and auth token from the Twilio dashboard home page.

Copy the code snippet below, switching out the account sid, auth token, and sender and recipient phone numbers. Note that the phone numbers should be formatted like this: +18299321023. This is the machine-readable version of (829) 932-1023.
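
A minimal version of that snippet, using Twilio’s Python helper library (all credentials and numbers below are placeholders):

```python
from twilio.rest import Client

# Placeholders: substitute your own values from the Twilio dashboard
account_sid = "ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
auth_token = "your_auth_token"

client = Client(account_sid, auth_token)
message = client.messages.create(
    body="Heads up: your scheduled script failed!",
    from_="+18299321023",  # your Twilio number
    to="+15555550123",     # your cell number
)
print(message.sid)
```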

In practice, you can set up your automated scripts with a structure like this:
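
(A sketch; run_daily_job stands in for your own task, and the credentials are assumed to be defined as above.)

```python
from twilio.rest import Client

def send_text(body):
    # Credentials and numbers as in the snippet above
    client = Client(account_sid, auth_token)
    client.messages.create(body=body, from_=twilio_number, to=my_number)

try:
    run_daily_job()  # your actual task goes here
except Exception as e:
    send_text(f"Daily job failed: {e}")
```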

Pretty easy, right? Twilio’s API makes it simple to get SMS alerts in just a few lines of code.

The Rise of Rap: A Genre Popularity Analysis

Today it feels like rap is bigger and more mainstream than ever. A casual scan of the charts reveals that many of today’s biggest music icons are rappers. How long has it been this way? I remember a time when pop legends like Katy Perry, Lady Gaga, and Rihanna ruled the charts.

Looking for more than anecdotal evidence of the rise of rap as a genre in the mainstream music landscape, I developed a data-driven methodology to measure the high-level trend in music genre popularity over time.

Using Billboard’s Hot 100 Artist data, and mapping each artist to a genre using the Spotify API, I calculated what percent of the artists were represented within each genre over time, from 2006 to the present.

Here’s the trend:

Here’s another view with the same data, as a line chart:

It seems like the data supports my observation that rap has gone mainstream, with the percentage of rap artists in the Billboard Hot 100 growing steadily since 2014, and surpassing pop artists in 2018.

According to Rolling Stone, much of rap’s growth can be attributed to its traction on streaming services, with 92% of the genre’s total consumption coming from streaming channels. The timeline fits, with Apple Music launching in 2015 and Spotify hitting 40 million subscribers in 2016.

While pop and country have maintained a relatively stable level of popularity, rock appears to be trending down, with rock artists composing less than 5% of the Billboard Hot 100 artist list in 2019.

What does the future hold? As the lines between genres continue to blur, with artists like Post Malone and Lil Nas X cutting across pop, rap, country, and even rock, it stops making sense to box artists into a single genre. In the age of the playlist, it’s easier than ever to rebel against the very idea of genre.

Walkthrough

The first step of the project was scraping the historical list of Hot 100 Artists from Billboard. Using the tidyverse and rvest packages in R, I quickly looped over the 13 years of available data:

Below is a preview of the first few rows of the resulting dataset:

Next, using a Python script and the Spotify API, I looped through each of the artists from the Billboard Hot 100 dataset and collected a list of their corresponding sub-genres. For example, Spotify associates Sean Paul with the sub-genres of dance pop, dance hall, and pop rap.
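
A sketch of that loop (the artists list is assumed to come from the scraped Billboard table, and spotipy credentials from environment variables):

```python
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials())

artist_genres = {}
for name in artists:  # artists: list of Billboard Hot 100 artist names
    results = sp.search(q=name, type="artist", limit=1)
    items = results["artists"]["items"]
    if items:
        artist_genres[name] = items[0]["genres"]  # e.g. ['dance pop', 'pop rap']
```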

Here’s a preview of what the resulting data looks like:

The next step took some thought. I needed a way to map back each of the thousands of sub-genres labeled by Spotify into a few core genres, like pop, rap, country, and rock. Tapping into the work of the Every Noise project, which attempts to create “an algorithmically-generated, readability-adjusted scatter-plot of the musical genre-space”, I developed logic to assign a single genre to each of the artists in the original Billboard Hot 100 artist table:
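
(A simplified, illustrative version of the idea; the real rules leaned on Every Noise’s groupings, while this keyword version just matches sub-genres against core genres in priority order.)

```python
CORE_GENRES = ["rap", "pop", "country", "rock"]

def assign_genre(sub_genres):
    """Map a list of Spotify sub-genres to a single core genre."""
    for core in CORE_GENRES:
        if any(core in sg for sg in sub_genres):
            return core
    return "other"

assign_genre(["dance pop", "dancehall", "pop rap"])  # -> 'rap' ("pop rap" matches)
```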

Using this logic, each artist was assigned to a single genre bucket:

The last step was to merge the billboard artist and artist genre tables and calculate the genre percentage breakdown over time, from 2006 to 2019.

This R code produces the chart below that visualizes relative genre popularity over time:

You can find the GitHub repo for this project here.

Visualizing Rap Communities with Python & Spotify’s API

Finding new music you like can be tough. In my experience, there’s no single discovery mechanism that delivers consistently. I usually rely on a mix of sources: websites like Pitchfork or Genius, subreddits like popheads or hiphopheads, and curated playlists like Get Turnt or Hot Rhythmic. Lately, I’ve found new favorites through a Spotify feature called “Fans Also Like”.

FANS ALSO LIKE – A Spotify music discovery feature

Listed on each artist page, the “Fans Also Like” section is an algorithmically populated discovery feature built using a metric called “artist similarity”. This metric is based on shared fans, meaning the more fans two artists have in common, the higher their similarity score.

“Artist similarity is probably the second-most important piece of data we extract from listening patterns—after popularity. It’s the data behind radio, genres, and Discover pages.”

Glenn McDonald, Spotify’s data alchemist (source)

The cool thing is that Spotify exposes this discovery algorithm via API. After authenticating and supplying an artist id, the API will return a list of 20 similar artists. Obviously, this is a huge win for music data nerds everywhere.

In this post, I’ll leverage Spotify’s “similar artists” API to build interactive network charts, visualizing how artists are linked together, as measured by the similarity of their fans.

Walkthrough

To access the Spotify API, you’ll need a Spotify account (free or premium), and a registered application. To make things easy, I used the spotipy library in Python, which supports all of the features of the Spotify Web API.

Next, leaning on the spotipy library to do the heavy lifting, I can retrieve the artist and “similar artist” data with two lines, passing the artist id to the artist and artist_related_artists functions.
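
Something like this, using Drake’s artist id as the example:

```python
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials())

drake_id = "3TVXtAsR1Inumwj472S9r4"  # Drake's Spotify artist id
artist = sp.artist(drake_id)
related = sp.artist_related_artists(drake_id)["artists"]

for a in related[:5]:
    print(a["name"], a["popularity"], a["followers"]["total"])
```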

Here’s a sample of the result when we query Spotify for the artists most similar to Drake, according to listener behavior:

Name | Popularity | Follower Count
Big Sean | 87 | 7,113,709
J. Cole | 90 | 10,379,858
Jeremih | 84 | 4,094,532
Wale | 80 | 2,457,939
Rick Ross | 86 | 3,839,127

The list of similar artists is returned in order of ranked similarity score, meaning that according to the listener data, Drake is most similar to Big Sean, J. Cole, and Jeremih. Surprising? Let’s make the list more visual by creating an interactive plot using Flourish.

It’s a fun visual, but you’d find these same faces if you looked at “Fans Also Like” on Drake’s artist page. Let’s take it a step further and query the API for similar artists for the artists similar to Drake. Then we’ll start to get a sense of the pop-rap landscape.

Right off, it looks like Jeremih is the odd one out, with none of his peer artists overlapping with the rest of the group. In contrast, three of Big Sean’s five peers, J. Cole, Wale, and Rick Ross, overlap with Drake’s.

Let’s see how things look when we pull in the full dataset, with each of Drake’s top 20 most similar artists and each of their 20 most similar artists.

How could we use this data to find new music? Counting the number of times an artist appeared across the second iteration of similar artists, below are the top artists to check out if you’re a Drake fan:
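
(A sketch of that tally, assuming similar maps each of Drake’s related artists to their own list of 20 related artist names.)

```python
from collections import Counter

# Flatten the second-degree lists and count appearances
counts = Counter(name for names in similar.values() for name in names)
for name, n in counts.most_common(10):
    print(name, n)
```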

This has been one approach to understanding “community” in rap music. Another would be to analyze collaboration between artists and the frequency of features shared. However you find new music, “Fans Also Like” is a fantastic tool to explore new artists, and even genres.

You can find the full code to create the dataset used here and the dataset itself here.

Building a Birthday Text Bot using Twilio

A good way to show family and friends you care is remembering their birthday. It seems simple enough, but in practice, birthday tracking for anyone beyond immediate family and very close friends can be time-consuming. Thankfully, you can automate that!

While outsourcing birthday check-in duties does feel a bit impersonal, you can always follow up on the generic message after getting a reply. This post is a tutorial for building a birthday text bot using Twilio.

The first step to building a birthday bot is storing the list of birthdays and contact information somewhere. In this example, I’ve used Coda.io to store name, birthday, and phone number. While I’d prefer to use Google Sheets, Coda’s API interface makes it very easy to import data into the Python environment. Authentication occurs via a bearer token and the API returns a JSON file.
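
Here’s a sketch of that import step; the doc ID, table ID, and token are placeholders, and useColumnNames keys the returned values by column name:

```python
import pandas as pd
import requests

headers = {"Authorization": "Bearer YOUR_CODA_TOKEN"}
url = "https://coda.io/apis/v1/docs/YOUR_DOC_ID/tables/YOUR_TABLE_ID/rows"

resp = requests.get(url, headers=headers, params={"useColumnNames": "true"})
rows = resp.json()["items"]
birthdays = pd.DataFrame([row["values"] for row in rows])
```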

After a bit of unpacking and cleaning, we have a birthday data frame like the one below (this is dummy data, for obvious privacy reasons).

Next, since this code will be deployed to a server and run on a daily schedule, we need to determine which, if any, of our family or friends is celebrating their birthday today.
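
A sketch, assuming the data frame has a birthday column (only the month and day matter here):

```python
from datetime import date

import pandas as pd

birthdays["birthday"] = pd.to_datetime(birthdays["birthday"])
today = date.today()
celebrants = birthdays[
    (birthdays["birthday"].dt.month == today.month)
    & (birthdays["birthday"].dt.day == today.day)
]
```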

Finally, we need to tap into the power of Twilio to send the actual SMS message. Twilio is a really cool API service that allows you to programmatically make phone calls and send or receive text messages.


The code that brings the birthday bot to life runs about eight lines. After supplying an account identifier and authentication token, the message client takes as input the body of your text, your Twilio number, and the recipient’s phone number.
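
A sketch of that step; credentials, numbers, and the column names are placeholders:

```python
from twilio.rest import Client

client = Client(account_sid, auth_token)  # from your Twilio dashboard

for _, friend in celebrants.iterrows():
    client.messages.create(
        body=f"Happy birthday, {friend['name']}! Hope your day is great.",
        from_="+18299321023",        # your Twilio number
        to=friend["phone_number"],   # from the Coda table
    )
```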

That’s it! Let’s see what the message looks like on the recipient’s end.

Very slick. By connecting a database (Coda.io) to a messaging API (Twilio), we’ve created a simple birthday text service, capable of earning you the reputation of most thoughtful friend. Enjoy!

Full code can be found on GitHub here.

Feature photo by Sarah Pflug from Burst.

Building a Simple Crypto Alert Bot in Python

Introduction

In the long run, I think cryptocurrencies will be more valuable than they are today, on average. The investment strategy consistent with that belief is to buy and hold (disclaimer below). However, given crypto’s record of considerable volatility, could an enthusiast be smarter about when to buy, in pursuit of a “bargain”?

This post outlines the process of building a simple crypto “bargain buy” alert system using Python, which sends a notification when a given cryptocurrency (BTC, XRP, ETH, etc.) appears “cheap” relative to historical prices. I use CoinAPI for current and historical cryptocurrency pricing and the Slack API for iOS and web push notifications.

My “Crypto Alerts” Slack bot notifies me of “bargain” opportunities daily

The true focus here is not the specific strategy (i.e. determining the right time to buy) but rather, demonstrating how APIs can power the creation of new and valuable services.

I broke the alert system process into four pieces:

  • Retrieve the crypto’s current price (CoinAPI)
  • Retrieve the crypto’s historical price data (CoinAPI)
  • Determine if current price is a “bargain”
  • Summarize findings via push notification (Slack API)

CoinAPI offers an entry-tier API key with 100 free daily calls

After writing the script in Python, I deployed it to PythonAnywhere and scheduled it to run daily. With that overview in place, let’s dive in and walk through the details!

Code Walkthrough

As usual, we’ll start by bringing in the necessary libraries: the requests library to make the API calls (a GET from CoinAPI and a POST to the Slack API) and the pandas library to organize the JSON responses.
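
In code:

```python
import requests
import pandas as pd
```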

To start, we send a request to CoinAPI to retrieve the current price of the cryptocurrency, measured in USD.
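
A sketch of that call, using BTC and a placeholder API key:

```python
API_KEY = "YOUR_COINAPI_KEY"
headers = {"X-CoinAPI-Key": API_KEY}

url = "https://rest.coinapi.io/v1/exchangerate/BTC/USD"
current_price = requests.get(url, headers=headers).json()["rate"]
```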

To retrieve historical exchange rates, we’ll modify the URL and specify that we’d like daily values for the last 30 days. For simplicity, we can save the results into a pandas data frame.
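
Continuing the sketch with the same headers:

```python
from datetime import date, timedelta

start = (date.today() - timedelta(days=30)).isoformat()
url = (
    "https://rest.coinapi.io/v1/exchangerate/BTC/USD/history"
    f"?period_id=1DAY&time_start={start}&limit=30"
)
history = pd.DataFrame(requests.get(url, headers=headers).json())
# history includes rate_open, rate_high, rate_low, and rate_close per day
```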

Now that we have the current price and a historical benchmark, we can take a stab at determining whether the crypto is a “bargain”.

My approach here is unsophisticated. If the current price is less than the 20th percentile of prices from the last 30 days, it’s considered a bargain. If it’s greater than the 80th percentile, it’s a “rip-off”.
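
In sketch form, using the closing rates from the history frame:

```python
pct_below = (history["rate_close"] < current_price).mean()

if current_price < history["rate_close"].quantile(0.20):
    verdict = "BARGAIN"
elif current_price > history["rate_close"].quantile(0.80):
    verdict = "RIP-OFF"
else:
    verdict = "FAIR"

message = (
    f"BTC is a {verdict} today. The current price of ${current_price:,.2f} is "
    f"higher than {pct_below:.1%} of closing prices during the last 30 days."
)
```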

This goes without saying, but this strategy won’t make you a Bitcoin millionaire! However, it does provide a basic alert bot framework.

When I ran this code while testing, at a price of $11,706, BTC was labeled as a rip-off. Here’s a sample of the message the bot produces:

BTC is a RIP-OFF today. The current price of $11,706.27 is higher than 83.3% of closing prices during the last 30 days.

Finally, the last piece of the alert system is to distribute the trading insight via a push notification. Luckily, this is pretty easily accomplished using the Slack API.

To leverage this free resource, I created a new Slack workspace and registered an application, which supplied the required authentication token.
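
Posting the message is then a single call to Slack’s chat.postMessage endpoint (the bot token is a placeholder):

```python
slack_resp = requests.post(
    "https://slack.com/api/chat.postMessage",
    headers={"Authorization": "Bearer xoxb-YOUR-SLACK-TOKEN"},
    json={"channel": "#crypto-alerts", "text": message},
)
```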

Once automated through PythonAnywhere, the messages look like this inside of my “crypto-alerts” channel. They are also conveniently pushed to my iPhone via the Slack mobile app.

You can find the complete script here. Thanks for reading!

Disclaimer: This content is for informational purposes only. Nothing contained here constitutes a solicitation, recommendation, endorsement, or offer to buy or sell any securities or other financial instruments (including cryptocurrencies) in this or in any other jurisdiction.

Mapping Scarsdale Real Estate Data with Python

This year my wife and I moved to New York for the start of a new job. Initially overwhelmed by the scope and pace of the NYC housing market, we were given the very generous and unexpected opportunity by a family friend to live in a house north of the city in Westchester County. Built in the early 1930s, the historic home is situated in central Scarsdale, an affluent suburban town known for high-achieving schools and extravagant real estate.

As a graduate student of historic preservation, my wife has been especially enthralled by the rich styles and architecture of the houses within the Scarsdale village limits. Naturally, we frequently discuss and analyze the homes we pass on walks and runs, her comments generally centered around history and architecture, mine on economics and valuation.

Sourcing the Data

Wishing to analyze the houses of Scarsdale in a more systematic way, I began to experiment with the Zillow API. Disappointed by both accessibility and content, I continued to search for a superior data source. Soon after, I discovered a tool developed by the Village of Scarsdale to search property information by road name and wrote a Python script to scrape the data. Curious to know if additional variables were available, I contacted the Scarsdale Village administration and was sent an Excel file with the complete set of residential properties, rich with detail and with few missing values (5,000+ rows, 100+ columns).

The dataset includes the address of each residential property, but for visualization purposes, I needed geographic coordinates (latitude, longitude). Luckily, the Google Maps API provides this exact functionality, known as geocoding. Having some experience with this API, it was simple to write a Python script to retrieve the geographic coordinates for each of the 5,000 properties.

After writing an R script to scrub the data (creating more descriptive variable names, filtering, removing duplicates), I was ready to visualize the real estate data of America’s most affluent town. You can find both the raw and cleaned datasets here.

Mapping the Data

After considering the many potential ways to map the properties, I settled on three key views: Year Built, Total Assessed Value, and Sales Date.

After some research, I discovered the folium library, which leverages the mapping strengths of the leaflet.js library within the Python ecosystem to provide Tableau-like functionality. The timing was ideal considering my free Tableau college subscription recently expired!

1. Year Built

With (a few) homes built as early as the 1600s and (some) as recently as 2018, this view shows clusters of homes built in similar time periods and paints a picture of development over time.

Here, the color spectrum plots blue for older houses and red for newer houses. Drag to interact with the map and click on a dot to view the address and year built.

Note the layers of development along the Saxon Woods Golf course border and the concentration of older homes in the Greenacres area.

Full Page Map: Link

2. Assessed Value

In this heatmap, the brighter the dot the higher the assessed value. Clicking on a circle reveals the total assessed value for the current tax year as well as the square footage of the home.

Full Page Map: Link

3. Sales Date

Which neighborhoods are hot on the market? This view maps the data according to sales date, with more recent sales colored in green. No clear trend emerges here, with a fairly equal distribution across the village. Clicking on a dot reveals the latest sales date and the number of years since sale.

Full Page Map: Link

Code Appendix

We’ll now dive into how these maps were created. As usual, we start by calling the necessary libraries. Beyond the essential pandas and numpy libraries, I use folium for map creation and matplotlib.cm for color assignment.
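
For reference (matplotlib.colors is pulled in as well, to convert colors to hex):

```python
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors
import numpy as np
import pandas as pd
```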

In order to visualize a feature such as assessed value or years since last sales date, I needed to be able to bucket the values and assign each bucket a color.

The function below meets that need, allowing the user to specify the number of buckets and a color spectrum. BI software such as Tableau offers this kind of functionality out of the box, with superior algorithms that scale for large datasets.
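
A sketch of such a helper, assuming no missing values in the input:

```python
def assign_colors(values, n_buckets=6, cmap_name="coolwarm"):
    """Split a numeric series into quantile buckets and map each to a hex color."""
    buckets = pd.qcut(values, q=n_buckets, labels=False, duplicates="drop")
    cmap = cm.get_cmap(cmap_name, n_buckets)
    return [colors.rgb2hex(cmap(int(b))) for b in buckets]
```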

Finally, below is the framework used to create each of the maps. A dot is created for each of the properties, colored according to the bucket assigned and labeled by year built, total assessed value, square footage, or sales date.
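
Here’s that framework sketched for the Year Built view; the column names and map center are assumptions about the cleaned dataset:

```python
df = pd.read_csv("scarsdale_clean.csv")
df["color"] = assign_colors(df["year_built"])

m = folium.Map(location=[41.005, -73.785], zoom_start=14, tiles="CartoDB positron")
for _, row in df.iterrows():
    folium.CircleMarker(
        location=[row["latitude"], row["longitude"]],
        radius=3,
        color=row["color"],
        fill=True,
        popup=f"{row['address']}<br>Built: {row['year_built']}",
    ).add_to(m)

m.save("year_built_map.html")
```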

You can find the complete code to replicate these maps here and the dataset here. Thanks for reading!

Scraping Stack Overflow Salaries with Python

I recently discovered a salary calculator on Stack Overflow. The tool takes inputs like role, location, and education and outputs salary predictions at the 25th, 50th, and 75th percentile.

Salary Calculator Interface

Based on the results of the annual developer survey, the calculator seems like an interesting way to study the marginal impact of experience and education on earnings. As a recent undergraduate, I might be interested in understanding the impact of graduate degrees on income potential.

Calculator Output

To extract Data Scientist salary data (or extrapolated data) from the tool, I wrote a Python script using Selenium to loop through 350+ different combinations of location, education, and experience.
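
In outline, the script looked something like this; the form-filling details are elided, and the CSS selector is illustrative rather than the exact one from the original script:

```python
import time
from itertools import product

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
results = []

# locations, educations, and experiences are lists of the form values to test
for location, education, experience in product(locations, educations, experiences):
    driver.get("https://stackoverflow.com/jobs/salary")
    # ... fill in the role, location, education, and experience inputs here ...
    time.sleep(2)  # let the predictions render
    cells = driver.find_elements(By.CSS_SELECTOR, ".salary-percentile")  # illustrative
    if len(cells) == 3:
        results.append((location, education, experience, *[c.text for c in cells]))
```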

Results

There are many reasons to exercise skepticism when analyzing this data, like self-selection bias inherent to surveys. It’s obviously very unlikely that a data scientist responded from each location, education, and experience combination. Even if they did, salaries are likely to vary widely. To strengthen any insight derived from this analysis, I’d also collect data from sources like Glassdoor or Indeed, especially before making any significant education or relocation decisions!

With that long disclaimer in mind, below I visualize the scraped data with an interactive Tableau dashboard. You can filter by years of experience and location to understand salary levels by education level:

One disappointment I had was realizing that much of the data returned from the calculator was the same across locations. The same salaries were also returned across experience and education levels for graduate and postgraduate degrees. Despite the data shortcomings, this was an interesting exercise in automating data extraction from web forms using Selenium. Thanks for reading!

Appendix

Python Script: Link
R Script: Link
Dataset: Link
Tableau Dashboard: Link

Speaker Gender Ratios in LDS General Conference

This weekend was LDS General Conference, a semiannual meeting where leaders speak to church members worldwide. After following the Twitter #GeneralConference hashtag, I became interested in the frequency of women speakers during past conferences. Using Python, I scraped 40+ years of speaker data from LDS.org to understand the speaker gender ratio trend over time. Below is the code used and a graphic illustrating my findings.
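
(A sketch along those lines: the URL pattern reflects the church’s current conference archive, and the speaker-name selector is illustrative, not the original.)

```python
import requests
from bs4 import BeautifulSoup

speakers = []
for year in range(1971, 2019):
    for month in ("04", "10"):  # April and October conferences
        url = (
            "https://www.churchofjesuschrist.org/study/general-conference/"
            f"{year}/{month}?lang=eng"
        )
        soup = BeautifulSoup(requests.get(url).text, "html.parser")
        for tag in soup.select(".subtitle"):  # illustrative speaker-name selector
            speakers.append((year, month, tag.get_text(strip=True)))
```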

Over the past 47 years, on average, women have comprised about 10% of the speakers per conference.

You can find the GitHub gist here and the full dataset here.

Using the Google Maps API to Visualize Chase’s Presence in Utah

I’ve been a happy Chase customer since 2010. I’ve appreciated the investment in their mobile platform and was excited about the recent You Invest announcement, allowing customers to make 100 free stock and ETF trades a year. With 5,100+ branches and 16,000+ ATMs nationwide, Chase has a strong national footprint.

In this post, I use Python to recreate the map below for my home state of Utah, scraping branch and ATM information from Chase.com and obtaining geographic coordinates using the Google Maps geocoding API.

Chase branches in the U.S. in 2010. Source: Wikipedia

Before going further, I’d invite you to read Chase.com’s Terms of Use as well as Roberto Rocha’s article about the ethics of web scraping. To avoid excessive server demands (although an unlikely issue for Chase), we’ll explicitly space out requests, made easy with Python’s time.sleep method.

Scraping Branch & ATM Information with Selenium

As usual, we’ll begin by calling the necessary libraries.

Next, we need to pass the driver a URL. Here I’ve used the Utah URL. This could easily be adapted to other states by changing the last two letters of the link.

Also note the executable path, which is pointed to the directory where my ChromeDriver is located. You can download the driver here.
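
Putting those pieces together, a sketch of the setup (the driver path is a placeholder, and the class name used to grab location names is illustrative):

```python
import time

from selenium import webdriver

# Selenium 3-style setup; point executable_path at your ChromeDriver location
driver = webdriver.Chrome(executable_path="/path/to/chromedriver")
driver.get("https://locator.chase.com/ut")  # swap 'ut' for another state
time.sleep(5)  # be polite: give the page time to load

# Inspect the page for the real class name; this one is illustrative
locations = [el.text for el in driver.find_elements_by_class_name("location-name")]
```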

When this code finishes running, the “locations” list contains location names, such as the following Utah cities:

We then convert these locations into Chase.com URLs.
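
For example (the slug convention here is a guess at the site’s URL pattern):

```python
location_urls = [
    "https://locator.chase.com/ut/" + loc.lower().replace(" ", "-")
    for loc in locations
]
```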

The links now look like this:

The function below represents the process of scraping the data for each location.
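
(A sketch of that function; the class names are placeholders for the real ones on the page.)

```python
def scrape_location(url):
    """Return (name, address, type) tuples for one location page."""
    driver.get(url)
    time.sleep(5)  # keep spacing out requests
    names = [el.text for el in driver.find_elements_by_class_name("place-name")]
    addresses = [el.text for el in driver.find_elements_by_class_name("place-address")]
    types = [el.text for el in driver.find_elements_by_class_name("place-type")]
    return list(zip(names, addresses, types))
```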

We’ll apply the function to each location URL to extract the corresponding branch and ATM information.
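
Roughly:

```python
all_places = []
for url in location_urls:
    all_places.extend(scrape_location(url))
```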

Finally, we’ll clean the information we’ve scraped and organize it into tidy columns.

Here’s a sample of what the final dataset looks like:

Location | Address | Type
Bountiful | 510 S 200 W Bountiful, UT 84010 | Branch
Farmington Station Park | 100 N Station Pkwy Farmington, UT 84025 | Branch
Brigham Young University | 800 E Campus Dr Provo, UT 84602 | ATM
Fashion Place | 6255 S State St Murray, UT 84107 | Branch

Geocoding Branch Addresses via the Google Maps API

Per Google’s Get Started article, geocoding is the process of converting addresses into geographic coordinates, like latitude and longitude. Once we have a longitude and latitude combination, we can plot the branch and ATM locations on a map using Tableau or R.

Here is the Python code used to accomplish the geocoding:
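
(A sketch of that step using the Geocoding API’s JSON endpoint; the address in the usage example comes from the table above.)

```python
import requests

def geocode(address, api_key):
    """Return (lat, lng) for an address via the Google Maps Geocoding API."""
    resp = requests.get(
        "https://maps.googleapis.com/maps/api/geocode/json",
        params={"address": address, "key": api_key},
    )
    results = resp.json()["results"]
    if not results:
        return None, None
    loc = results[0]["geometry"]["location"]
    return loc["lat"], loc["lng"]

lat, lng = geocode("510 S 200 W Bountiful, UT 84010", "YOUR_API_KEY")
```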

Please note that you’d need to insert your own Google Cloud API key to make the code run. Finally, let’s visualize some of the data points with R!

Here’s the code to create this visualization:

You can view the data here and the complete code here. Thanks for reading!
