Sometimes rankings are useful, since they collapse many data points into a single metric, allowing for easy comparison. The problem is when rankings build on subjective methodologies or abstract criteria are taken as absolute truth, rather than a directional guide.
With that disclaimer as backdrop, it’s no surprise that Vault.com surveys professionals to rank the top employers in industries like law, consulting, and banking. The rankings they produce are based on surveys that try to measure things like prestige, culture, satisfaction, work/life balance, training, and compensation.
Obviously, the inputs (“prestige” and “culture”) are inherently abstract and highly subjective, so the output (rankings) is likely to be noisy and subjective as well. That said, I was interested to see how rankings, specifically in banking, had changed over time, so I compiled the Top 50 lists from 2011 to 2020.
The lists are composed of companies across the banking spectrum, from bulge bracket firms like Goldman Sachs and Morgan Stanley to elite boutiques like Centerview and Evercore to middle market banks like Piper Sandler and Raymond James.
Below are the results for the bulge bracket and elite boutique segments, along with a few observations, based on loose categories suggested by mergersandinquisitions.com.
Dominance of GS: Over the ten year period, Goldman only dipped below #1 briefly, in 2012-13.
Decline of JPM: Despite clenching the #1 spot in 2012-13, JPM declined in the following years, landing at #5 in 2020.
Growth of BAML: Starting in #9 in 2011, BAML’s rank steadily improved over time, hovering at #3 in 2020.
I compiled this data manually, but used r and ggplot to clean and filter the data and create the charts. You can find the full repo on Github here.
Import, Define ggplot Theme
Thanks for reading! Feel free to check out my other blog posts or click a tag below to see related blog posts.
Even though it’s been around for years, I just recently discovered Shark Tank, the show where hopeful entrepreneurs pitch business ideas to a panel of wealthy investors, or “sharks”. I usually wonder if there’s a method to the deal-making madness, especially when a pitch that resonates with me falls flat on the sharks.
In this post, I take my fandom to a deeper level by using episode descriptions from Wikipedia to understand what kinds of pitches have the highest chance of being offered a deal. In the process, I’ll use tools like web scraping, natural language processing, and API calls to gather, transform, enhance, and visual the data.
I’ve divided my workflow for this project into four steps:
Obtain episode-level descriptions via web scraping
Visualize key trends by season and pitch categories
1. Obtain episode-level descriptions via web scraping
This analysis is possible because of a Wikipedia page that contains short descriptions of every pitch delivered on Shark Tank.
The first step is to extract this information via the rvest package in R, looping over each of the nine tables (corresponding to nine seasons) within the page.
Next we’ll do a bit of cleaning, simplifying column naming conventions, and adjusting the data types for the air date and viewership fields.
2. Reshape data from episode-level to pitch-level
In its current form, we won’t be able to detect any patterns with this data since the descriptions are bundled at the episode-level, like this:
“Crooked Jaw” a mixed martial arts clothing line (NO); “Lifebelt” a device that prevents the car from starting without the seat belt being fastened (NO); “A Perfect Pear” a gourmet food business (YES);
We need to “un-nest” the descriptions so that each row contains a single pitch. This is easily accomplished using the unnest function from tidyr.
Now we have a clean dataset, ready to enhance and analyze. Here’s a sample of the data structure, highlighting a few variables:
a pie company
an implantable Bluetooth device requiring surgery to insert the device into the user's head
an electronic hand-held device for waiting rooms
a plastic elephant-shaped device that helps parents give small children oral medicine
a packing and organizing service based on an already successful business called College Hunks Hauling Junk
a mixed martial arts clothing line
a device that prevents the car from starting without the seat belt being fastened
a gourmet food business
a Post-It note arm for laptops
a musical way to teach students Shakespeare
3. Enhance data by categorizing descriptions via API
How can we systematically analyze what kind of pitches are more likely to be offered a deal when all we have is a brief text description? Rather than build my own NLP model from scratch to categorize pitches, I used uClassify, which offers “Classification as a Service” (CAAS).
These functions construct a URL with my personal API key, the classifier API name, and the text (pitch description) to be categorized. A GET call then returns a JSON with a list of categories and “match” scores.
For example, take pitch #803, “Thrive+”, which has this description: “capsules that reduce alcohol’s negative effects.” The category with the highest “match” score was Health, followed closely by Science. By categorizing the pitch descriptions, we’ll be more equipped to uncover some key elements of successful Shark Tank pitches.
4. Visualize key trends by season and pitch categories
Now for the fun part! After compiling, cleaning, and enhancing our dataset, we’re ready to visualize and model the data. First, let’s take a look at Shark Tank’s popularity over time, measured in TV viewership (in millions).
Even without the fitted line, it’s easy to see a rise and fall in popularity, with the peak around 2015 with 7.5 million viewers. Next, let’s look at how willing sharks were to make deals over the course of the show, across nine seasons:
During Season 1, less than 50% of pitches were offered a deal from the sharks. By season 9, deals were made over 65% of the time! I wonder if this had anything to do with sliding viewership.
Let’s dig a bit deeper and start looking at characteristics of successful pitches. Using the tidytext methodology, I determined which words within the pitch descriptions were most often associated with a strong response from the sharks (for better or worse).
Clothing is mentioned in 22 pitch descriptions, 70% of which were unsuccessful! On the flip side, when the pitch included something “portable”, the sharks were willing to make a deal 10 out of 13 times. If you make it onto Shark Tank, don’t mention ice! For whatever reason, almost 90% of those pitches resulted in no deal with the sharks.
Now let’s see what else we can learn by using the categories generated from the uClassify API classifiers:
Here we summarize pitch success by category, with the total number of pitches within the category represented above each bar. The dashed grey line represents the 50% cutoff, where a pitch within a given category is equally likely to be accepted or rejected.
Notice how over 65% of deals classified as “Recreation” were offered a deal by the sharks over the course of the nine seasons. It looks like “Game” entrepreneurs didn’t snag funding quite as easily!
This has been a fun and quick way to explore some of the nuance in the world of Shark Tank deal-making. Truthfully, the dataset we created was pretty limited. Adding in information like which shark (or sharks) made the deal, for how much, and for what percentage of equity would add more precision compared to simply knowing if a deal was made or not.
In addition, access to full pitch transcripts (rather than simplistic descriptions of ~10 words or less) would be much more helpful in accurately classifying the pitches into meaningful categories.
You can find the complete R code here and the final dataset here, both hosted on GitHub. Thanks for reading!