By now you've probably guessed that I love chocolate and Big Data.
Welcome to my CMPT732 Final Project!
In a recent study by Hu, Manikonda and Kambhampati at Arizona State University, Instagram posts were clustered with the K-Means algorithm, using their hashtags as image descriptions. The posts were successfully sorted into 8 categories, one of which is Food.
This project does not extend that research. Instead, it dives directly into one of the most popular subcategories of Food: Chocolate!
Several weeks of data collection and ETL were performed to build a database of all Instagram posts containing the hashtag #chocolate over the one-year period from October 10, 2014 to October 9, 2015. The database contains over 9 million Instagram posts from across the world!
This project aims to shed light on how Instagram users interact with chocolate and with the many popular chocolate brands. A similar but much larger study was conducted earlier this year by NetBase, which produced a Brand Passion Report on social consumer views of chocolate using over 150 million social media and web sources.