My wife and two of her sisters ran cross-country and track in high school. I recently learned that their team website, which hosts thousands of event photos from the past 10 years, is being shut down. Wanting to save my mother-in-law from the unimaginably tedious task of manually downloading each image, I wrote a script in R to automate the process.
The website has a page for each season with links to event photo albums. For example, the 2012 season alone has 81 photo albums and 10,000+ photos.
Each photo album contains somewhere between 80 and 150 photos. I needed to design the script to loop through and download each photo from each photo album.
In other words, I needed a way to pass each photo's URL into the “download.file” function.
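As a minimal sketch of that call (the URL and local paths here are hypothetical placeholders, not the team site's actual structure), base R's download.file() takes a source URL and a local destination:

```r
# Hypothetical photo URL; the real site's URL scheme is not shown here
photo_url <- "https://example.com/albums/2012/meet-01/photo_001.jpg"

# Build a local destination path from the URL's file name
dest_file <- file.path("photos", basename(photo_url))

# mode = "wb" (write binary) matters on Windows so image files aren't corrupted;
# the call is commented out here since the example URL doesn't exist
# download.file(photo_url, destfile = dest_file, mode = "wb")
```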
After downloading the season overview page with the list of photo albums, I used html_nodes and html_attr from the rvest package to extract the link to each album.
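A sketch of that extraction with rvest, using an inline HTML snippet as a stand-in for the real season page (the "album-link" class is an assumption, not the site's actual markup):

```r
library(rvest)

# Stand-in for the downloaded season overview page
season_page <- read_html('
  <html><body>
    <a class="album-link" href="/albums/2012/meet-01">Meet 1</a>
    <a class="album-link" href="/albums/2012/meet-02">Meet 2</a>
  </body></html>')

# html_nodes() selects the album links; html_attr() pulls out their hrefs
album_links <- season_page %>%
  html_nodes("a.album-link") %>%
  html_attr("href")
```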
Finally, I looped through each photo album, replicated the folder structure locally, and downloaded each of the photos.
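The structure of that loop can be sketched as follows; the album names and photo URLs are illustrative assumptions, with the real download call commented out:

```r
# Hypothetical album list: names become local folders, values are photo URLs
albums <- list(
  "meet-01" = c("https://example.com/albums/2012/meet-01/p1.jpg",
                "https://example.com/albums/2012/meet-01/p2.jpg")
)

for (album_name in names(albums)) {
  # Replicate the album structure locally: one folder per album
  album_dir <- file.path("photos", "2012", album_name)
  dir.create(album_dir, recursive = TRUE, showWarnings = FALSE)

  # Download every photo in the album into its folder
  for (photo_url in albums[[album_name]]) {
    dest <- file.path(album_dir, basename(photo_url))
    # download.file(photo_url, destfile = dest, mode = "wb")  # real call
  }
}
```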
The final step was to upload the images to the cloud for easy sharing and storage.
Assuming each image would have taken 20 seconds to download, label, and upload, the manual process would have taken ~500 hours, non-stop! Writing the scripts and monitoring the download and upload process took about 8 hours, for a net time saved of ~492 hours.
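For the curious, the arithmetic behind that estimate works out as follows, assuming roughly 90,000 photos across the ten seasons (my extrapolation from the 10,000+ photos in the 2012 season, not a figure from the site):

```r
seconds_per_photo <- 20
photos <- 90000  # assumed total: ~10 seasons of ~9,000 photos each

# Convert total seconds to hours
total_hours <- photos * seconds_per_photo / 3600
total_hours
# [1] 500
```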
You can find the complete code here and archived photos here.