Polarized User and Topic Tracking in Twitter
Short paper accepted at SIGIR ’16: ACM Conference on Research and Development in Information Retrieval.
Abstract. Digital traces of conversations in micro-blogging platforms and in OSNs provide information about user opinion with a high degree of resolution. These information sources can be exploited to understand and monitor collective behaviors. In this work, we focus on polarization classes, i.e., those topics that require the user to side exclusively with one position. The proposed method provides an iterative classification of users and keywords: first, polarized users are identified, then polarized keywords are discovered by monitoring the activities of previously classified users. The method thus makes it possible to track users and topics over time. We report several experiments conducted on two Twitter datasets during political election time-frames. We measure the user classification accuracy on a golden set of users and we provide an analysis of the relevance of the extracted keywords for the ongoing political discussion.
Our method requires some initial seed topics that identify the classes of interest. We propose to identify them with a single textual keyword for each class. Although each keyword identifies a topic, e.g., a political party, it is not sufficient on its own to correctly classify users.
The Polarization TRacker (PTR) algorithm iterates two classification steps, UserClass and HashtagsClass, which progressively refine the classification of users and of the hashtags they use into polarization classes. The goal of the first step is to identify polarized users on the basis of the given hashtags. First, we identify polarized tweets, i.e., those that mention seed hashtags. We discard all tweets that contain hashtags belonging to more than one polarized class. We then measure user polarization: if, for a polarization class, the number of tweets by a user is significantly larger than for any other class, the user is labeled with that class. The goal of the second step is to process all the hashtags adopted by classified users in order to discover a new set of discriminating hashtags. We take into consideration all the hashtags used, and not only those occurring in the polarized tweets as in the previous step. This lets us extend the analysis to the full set of topics discussed by the users, even if they were not captured in the early iterations of the algorithm.
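The two steps described above can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation: the function names `user_class`, `hashtag_class` and `ptr`, the `ratio` threshold standing in for "significantly larger", the `min_share` and `min_uses` rule for deciding that a hashtag is discriminating, the choice to always keep the seed hashtags, and the fixed number of iterations are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def user_class(tweets, class_hashtags, ratio=2.0):
    """Label users from polarized tweets (assumed rule: best > ratio * runner-up)."""
    counts = defaultdict(Counter)
    for user, tags in tweets:
        matched = [c for c, hs in class_hashtags.items() if tags & hs]
        # Keep only tweets whose hashtags belong to exactly one class.
        if len(matched) == 1:
            counts[user][matched[0]] += 1
    labels = {}
    for user, per_class in counts.items():
        ranked = per_class.most_common(2)
        best, best_n = ranked[0]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        if best_n > ratio * runner_up:
            labels[user] = best
    return labels


def hashtag_class(tweets, labels, min_share=0.8, min_uses=2):
    """Assign hashtags to classes using ALL tweets of already classified users."""
    usage = defaultdict(Counter)
    for user, tags in tweets:
        if user in labels:
            for tag in tags:
                usage[tag][labels[user]] += 1
    discovered = defaultdict(set)
    for tag, per_class in usage.items():
        total = sum(per_class.values())
        best, best_n = per_class.most_common(1)[0]
        # Assumed criterion: enough uses, mostly by users of a single class.
        if total >= min_uses and best_n / total >= min_share:
            discovered[best].add(tag)
    return discovered


def ptr(tweets, seeds, iterations=3):
    """Alternate the two steps, starting from one seed hashtag set per class."""
    class_hashtags = {c: set(hs) for c, hs in seeds.items()}
    labels = {}
    for _ in range(iterations):
        labels = user_class(tweets, class_hashtags)
        discovered = hashtag_class(tweets, labels)
        # Assumption: seed hashtags are always retained.
        class_hashtags = {c: set(seeds[c]) | discovered.get(c, set()) for c in seeds}
    return labels, class_hashtags


# Toy data: each tweet is (user, set of hashtags); all values are made up.
tweets = [
    ("alice", {"#partyA", "#jobs"}),
    ("alice", {"#partyA"}),
    ("alice", {"#jobs"}),
    ("bob", {"#partyB", "#taxes"}),
    ("bob", {"#taxes"}),
    ("carol", {"#jobs"}),
    ("carol", {"#jobs", "#rally"}),
    ("dave", {"#partyA", "#partyB"}),
]
seeds = {"A": {"#partyA"}, "B": {"#partyB"}}
labels, class_hashtags = ptr(tweets, seeds)
print(labels)
print(class_hashtags)
```

In this toy run, carol never uses a seed hashtag, so she can only be classified in a later iteration, once a hashtag she shares with alice has been attached to class A. The user dave, whose only tweet mixes hashtags of both classes, stays unclassified.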
We built an evaluation dataset by identifying those users whose opinion can be inferred with high confidence. During elections, as for other events, very specific hashtags are used on Twitter to express a strong voting intention or an explicit membership in a group. We assume that users who frequently use one of these hashtags side strongly with one of the competing parties and will not change their mind in the short term. Such hashtags, named golden hashtags, were handpicked among the 500 most frequent in the data. The golden hashtags used are of the form #i-vote-party.
This evaluation dataset is used to evaluate the user classification accuracy of the proposed algorithm. Experimental results show that the F-measure achieved by PTR improves on the k-means baseline by +71% and +7% on the IT13 and EU14 datasets, respectively.
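For reference, a per-class F-measure over a golden set of users could be computed as in the sketch below. It assumes the balanced F1 (harmonic mean of precision and recall) and treats unclassified golden users as misses; the text above does not state which F-measure variant or averaging was used, and the function name `f_measure` and the toy labels are hypothetical.

```python
def f_measure(predicted, golden, cls):
    """F1 for one class; users missing from `predicted` count as not in `cls`."""
    tp = sum(1 for u, c in golden.items() if c == cls and predicted.get(u) == cls)
    fp = sum(1 for u, c in golden.items() if c != cls and predicted.get(u) == cls)
    fn = sum(1 for u, c in golden.items() if c == cls and predicted.get(u) != cls)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up golden labels and predictions; u4 was left unclassified.
golden = {"u1": "A", "u2": "A", "u3": "B", "u4": "B"}
predicted = {"u1": "A", "u2": "B", "u3": "B"}
for cls in ("A", "B"):
    print(cls, round(f_measure(predicted, golden, cls), 3))
```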