
Electoral Predictions with Twitter: a Machine-Learning approach

By admin | April 27, 2015

Several studies have shown how to approximately predict public opinion, for example in political elections, by analyzing user activity on blogging platforms and online social networks. The task is challenging for several reasons: sample bias and the automatic understanding of textual content are two of several non-trivial issues. These approaches rely on indicators based on tweet and user volumes, often combined with sentiment analysis techniques.

We analyzed a data set of 95,000 geo-located tweets related to the 2013 primary elections of the major Italian political party, PD (Partito Democratico). We calculated the percentage of tweets (TweetCount) mentioning each of the candidates (Renzi, Cuperlo or Civati). In addition, we computed the percentage of users mentioning each of the candidates, showing its correlation with the electoral outcome. In particular, we used the predictor UserShare, according to which a single user's vote is split among the candidates that user mentions. Moreover, we studied how a machine-learning approach can learn correction factors for the indicators, based on regional outcomes. To evaluate the methods we calculated MAE (mean absolute error), RMSE (root mean squared error) and MRM (mean rank match: the percentage of regions where the predicted ranking among the candidates is correct).
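As a rough illustration, the two indicators could be computed along the following lines. This is a minimal sketch and not the code used in the study. The `(user, text)` input format, the naive substring matching, the equal split of a user's vote among the candidates that user mentions, and the normalization to shares that sum to 1 are all assumptions made for the example, and the sample tweets are invented.

```python
from collections import Counter, defaultdict

CANDIDATES = ("renzi", "cuperlo", "civati")


def mentioned(text):
    """Return the set of candidates named in a tweet (naive substring match)."""
    lowered = text.lower()
    return {c for c in CANDIDATES if c in lowered}


def tweet_count(tweets):
    """Share of candidate mentions, counting each tweet once per candidate it names."""
    counts = Counter()
    for _user, text in tweets:
        counts.update(mentioned(text))
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in CANDIDATES}


def user_share(tweets):
    """Share of user votes, where each user has one vote split equally
    among all the candidates that user mentions (assumed equal split)."""
    per_user = defaultdict(set)
    for user, text in tweets:
        per_user[user] |= mentioned(text)
    votes = Counter()
    for cands in per_user.values():
        for c in cands:
            votes[c] += 1.0 / len(cands)
    total = sum(votes.values())
    return {c: (votes[c] / total if total else 0.0) for c in CANDIDATES}


# Invented sample: one very active user and one occasional user.
sample_tweets = [
    ("u1", "Renzi wins the debate"),
    ("u1", "Renzi again tonight"),
    ("u1", "Renzi and Cuperlo on stage"),
    ("u2", "Civati speaks in Milan"),
]

print(tweet_count(sample_tweets))
print(user_share(sample_tweets))
```

In this toy sample the active user dominates the tweet-based indicator, while the user-based indicator gives each user the same weight. That is the motivation for counting users rather than tweets.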

Algorithm      MAE      RMSE     MRM
TweetCount     0.0818   0.1024   0.35
UserShare      0.0616   0.0792   0.35
ML-UserShare   0.0536   0.0705   0.75
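For readers who want to reproduce this kind of evaluation, the three measures can be sketched as follows. The nested-dictionary layout (region, then candidate, then vote share), the pooling of errors over all region and candidate pairs, and the alphabetical tie-breaking in the ranking are assumptions made for the example. The two regions and their numbers are invented and are not taken from our data set.

```python
import math


def mae(predicted, actual):
    """Mean absolute error over all (region, candidate) pairs."""
    errors = [abs(predicted[r][c] - actual[r][c]) for r in actual for c in actual[r]]
    return sum(errors) / len(errors)


def rmse(predicted, actual):
    """Root mean squared error over all (region, candidate) pairs."""
    squared = [(predicted[r][c] - actual[r][c]) ** 2 for r in actual for c in actual[r]]
    return math.sqrt(sum(squared) / len(squared))


def ranking(shares):
    """Candidates ordered by decreasing share; ties broken alphabetically."""
    return sorted(shares, key=lambda c: (-shares[c], c))


def mrm(predicted, actual):
    """Mean rank match: fraction of regions whose full predicted ranking is correct."""
    matches = sum(ranking(predicted[r]) == ranking(actual[r]) for r in actual)
    return matches / len(actual)


# Invented vote shares for two hypothetical regions.
actual = {
    "region_a": {"renzi": 0.6, "cuperlo": 0.3, "civati": 0.1},
    "region_b": {"renzi": 0.5, "cuperlo": 0.2, "civati": 0.3},
}
predicted = {
    "region_a": {"renzi": 0.5, "cuperlo": 0.3, "civati": 0.2},
    "region_b": {"renzi": 0.5, "cuperlo": 0.3, "civati": 0.2},
}

print(mae(predicted, actual), rmse(predicted, actual), mrm(predicted, actual))
```

MAE and RMSE measure how far the predicted shares are from the real ones, whereas MRM only checks whether the order of the candidates is right. A method can therefore improve on one kind of measure without improving on the other.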

We agree that the task of predicting elections through Twitter has many critical points: the “file drawer” effect, correlation analysis instead of prediction (all studies report predictions of events whose outcome is already known), Twitter bias (Twitter users are not a random sample of the voters), arbitrariness of data collection (period, keywords, etc.), reliance on a single data set (all studies show results on a single data set, with no comparisons), a minimal cleaning phase (noisy data containing robots, propaganda, spam, etc.), and differing evaluation measures.

Two papers discussing the above work have been published:


