Love is all around: Popular words in pop hits

Data scientist Giora Simchoni recently published a fantastic analysis of the history of pop songs on the Billboard Hot 100 using the R language. Giora used the rvest package in R to scrape data from the Ultimate Music Database site for the 350,000 chart entries (and 35,000 unique songs) since 1940, and used those data to create and visualize several measures of song popularity over time.
A novel measure that Giora calculates is “area under the song curve”: for every week a song spends in the Hot 100, take how far its rank sits above position 100, then sum those values over the song’s whole chart run. By that measure, the most popular (and also longest-charting) song of all time is “Radioactive” by Imagine Dragons.

It turns out that calculating this “song integral” is pretty simple in R when you use the tidyverse:

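library(tidyverse); library(lubridate)  # lubridate provides date_decimal()
# The function body and the data frame name (`billboard`) were lost in extraction; both are assumed here.
calculateSongIntegral <- function(positions) sum(100 - positions)
billboard %>%
  filter(EntryDate >= date_decimal(1960)) %>%
  group_by(Artist, Title) %>%
  summarise(positions = list(ThisWeekPosition)) %>%
  mutate(integral = map_dbl(positions, calculateSongIntegral)) %>%
  group_by(Artist, Title) %>%
  tally(integral) %>%
  arrange(desc(n))  # assumed final step: sort songs by total integral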
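
To get a feel for what the measure rewards, here is a minimal sketch on two made-up chart runs. It assumes each charting week contributes 100 minus that week's position, which is one reading of the definition above and not necessarily Giora's exact formula. The two position vectors are hypothetical, not real Billboard data.

```r
# Assumed definition: each week on the chart contributes (100 - position).
calculateSongIntegral <- function(positions) {
  sum(100 - positions)
}

# Hypothetical weekly chart positions (not real Billboard data).
short_hit   <- c(40, 10, 1, 1, 5, 30)            # brief run with two weeks at #1
long_runner <- c(80, 60, 45, 40, 38, 35, 35, 36,
                 40, 42, 45, 50, 55, 60, 70)     # never top 30, but 15 weeks long

integrals <- c(
  short_hit   = calculateSongIntegral(short_hit),
  long_runner = calculateSongIntegral(long_runner)
)
print(integrals)
```

Under this assumption the long runner scores higher than the short-lived #1, which fits the observation that the top song by this measure is also the longest-charting one.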

Another fascinating chart included in Giora’s post is this analysis of the most frequent words to appear in song titles, by decade. He used the tidytext package to extract individual words from song titles and then rank them by frequency of use.
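
The word-counting step might look roughly like the sketch below. It is not Giora's code: the `songs` data frame, its `Title` and `Year` columns, and the five titles in it are all made up for illustration. The only tidytext call used is `unnest_tokens()`, which splits each title into one lowercase word per row.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy data standing in for the scraped chart entries.
songs <- tibble(
  Title = c("Love Me Do", "All You Need Is Love", "Baby Love",
            "Love Is All Around", "Baby One More Time"),
  Year  = c(1962, 1967, 1964, 1994, 1998)
)

word_counts <- songs %>%
  mutate(Decade = floor(Year / 10) * 10) %>%  # e.g. 1967 -> 1960
  unnest_tokens(word, Title) %>%              # one row per (lowercased) title word
  count(Decade, word, sort = TRUE)            # frequency of each word per decade

print(word_counts)
```

A real analysis would probably also drop common words such as “me” and “is” (for example with an `anti_join()` against tidytext's `stop_words` table) before ranking, so that only meaningful words surface.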

So it seems as though Love Is All Around (#41, October 1994) after all! For more analysis of the Billboard Hot 100 data, including Top-10 rankings for various measures of song popularity and the associated R code, check out Giora’s post linked below.
Sex, Drugs and Data: Billboard Bananas

from Revolutions

