Social media gives people an easy platform to express their views and feelings in real time. It therefore offers an alternative way to measure happiness, based on the sentiment expressed in public posts. This project measures the temporal variation in happiness in Dutch tweets. The method uses the language assessment by Mechanical Turk (labMT) word list to score the happiness of a corpus.
We collect Dutch tweets on an hourly basis via a streaming API. The raw tweets are susceptible to inconsistencies, so they need to be cleaned and sanitized before they can be used.
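The cleaning step could look roughly like the sketch below. The text does not list the exact rules applied, so every step here (lowercasing, dropping URLs and @-mentions, keeping hashtag words, stripping punctuation) is an assumption, and `clean_tweet` is a hypothetical helper name.

```python
import re

# Assumed cleaning rules; the project's actual rules are not specified.
URL_RE = re.compile(r"https?://\S+")      # links
MENTION_RE = re.compile(r"@\w+")          # @user mentions
NON_WORD_RE = re.compile(r"[^\w\s]")      # punctuation and symbols


def clean_tweet(text: str) -> str:
    """Return a lowercased, whitespace-normalized tweet without URLs,
    mentions, or punctuation (hypothetical helper)."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")         # keep the hashtag word itself
    text = NON_WORD_RE.sub(" ", text)
    return " ".join(text.split())         # collapse repeated whitespace


cleaned = clean_tweet("Wat een MOOIE dag! @jan https://t.co/abc #zon")
print(cleaned)
```

Accented Dutch letters such as "ë" survive this step because `\w` matches Unicode word characters in Python 3.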
We first extract the term frequency of individual words (unigram model) in the preprocessed text. We then compute the weighted-average happiness of the preprocessed tweets, that is, of all their terms, weighting the mean labMT score of each word by its frequency. This gives us the hourly happiness score of Dutch tweets. Apart from the score, the algorithm also reports the high-frequency words appearing in the tweets every hour. With this information we can look for correlations between variations in happiness, public opinion expressed on social media, and real-life events that might have happened.
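As a minimal sketch of the scoring step, the weighted average can be read as the sum of (labMT score × term frequency) divided by the total frequency of the scored words. The function names, the decision to skip words missing from the word list, and the toy scores below are assumptions for illustration; the scores are not real labMT values.

```python
from collections import Counter


def term_frequencies(tweets):
    """Count unigrams across an iterable of already-cleaned tweets."""
    return Counter(word for tweet in tweets for word in tweet.split())


def happiness_score(counts, labmt_scores):
    """Frequency-weighted mean of the word scores.

    Words absent from labmt_scores are ignored (an assumption).
    Returns None when no word in the text has a score.
    """
    scored = {w: c for w, c in counts.items() if w in labmt_scores}
    total = sum(scored.values())
    if total == 0:
        return None
    return sum(labmt_scores[w] * c for w, c in scored.items()) / total


# Toy scores for illustration only, NOT real labMT values.
toy_scores = {"blij": 8.0, "verdrietig": 2.0}
hourly_tweets = ["blij blij verdrietig", "blij dag"]

counts = term_frequencies(hourly_tweets)
score = happiness_score(counts, toy_scores)   # (8.0*3 + 2.0*1) / 4
top_words = counts.most_common(3)             # high-frequency words this hour
print(score, top_words)
```

Running this once per hourly batch would yield both the hourly score and the most frequent words for that hour.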