New sources of data also call for new techniques, such as data mining, clustering, and semantic analysis of natural language. Among other things, this includes the analysis of open answers, text files, sections and reports, and messages such as e-mails or social media posts. The textual information can be analysed and represented using semantic weighting and cohesion.
One example of our work in this area is the use of twitteR, an R package for continuously extracting data from the social media pool of Twitter. In the run-up to the 2017 national elections, we extracted and analysed Twitter messages to determine factors such as sentiment and party interactivity. By tracing specific ontological patterns, determining semantic proximity, and determining pragmatics, such as sentiment value, we were able to find correlations between parties, points in time, the nature of the Twitter message, and intentions.
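To illustrate the kind of sentiment value mentioned above, here is a minimal lexicon-based sentiment scorer. It is only a sketch: the lexicon entries and the normalised scoring scheme are assumptions for illustration, not the method actually used in the project.

```python
# Minimal lexicon-based sentiment scorer.
# The word lists and the (pos - neg) / total scoring rule are
# illustrative assumptions, not the project's actual method.
POSITIVE = {"good", "great", "happy", "win"}
NEGATIVE = {"bad", "sad", "lose", "angry"}

def sentiment_score(text: str) -> float:
    """Return a score in [-1, 1]; 0.0 when no lexicon words occur."""
    tokens = [t.strip(".,!?#@").lower() for t in text.split()]
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

print(sentiment_score("Great debate, happy with the result!"))  # 1.0
```

In practice one would swap the toy lexicon for a curated (Dutch) sentiment lexicon or a trained classifier; the normalisation step is what makes scores comparable across tweets of different lengths.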
Assembling large quantities of textual information yields a complex information set. Using visualization techniques, such as chord charts and network graphs, we are able to detect clusters within the information set. Based on correspondence analysis, we estimate the ideology of tweets and are able to detect echo chambers within Dutch (political) tweets.
Generally, we use natural language processing technology, program in various languages, for example Python, R, and C#, and utilize well-established packages and toolboxes for the applications within this discipline. The basis is often writing robust algorithms, setting up automated processes (scripts), and interfacing with open data protocols, such as APIs, that make processing and analysis possible.
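One building block of such robust, scripted API processing is retry logic: a continuously running extraction script must survive transient network failures. The helper below is a generic sketch of that idea, not code from the project; the `flaky` fetcher simulates an API call that fails twice before succeeding.

```python
import time

def with_retries(fetch, attempts=3, delay=0.0):
    """Call fetch() until it succeeds, retrying on exceptions.

    Re-raises the last exception if all attempts fail. In a real
    pipeline, delay would grow between attempts (backoff) and fetch
    would wrap an HTTP request to the API being polled.
    """
    for i in range(attempts):
        try:
            return fetch()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(delay)

# Simulated flaky API call: fails twice, then returns data.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return {"tweets": 42}

print(with_retries(flaky))  # {'tweets': 42}
```

Wrapping every external call this way is what keeps a long-running extraction process, such as a continuous Twitter feed, from dying on the first dropped connection.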
Another example of this type of analysis is a so-called happiness live ticker, derived from Dutch tweets, that tracks how happy people feel; it can be found at Temporal Happiness Score of Dutch Tweets.