Animesh Mukherjee

I am an Associate Professor and A K Singh Chair in the Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur. My main research interests center around (a) investigation of hate and abusive content on social media platforms, (b) fairness and bias in information retrieval systems, (c) media bias, and (d) quality monitoring of Wikipedia articles. Some of the notable awards I have received include the INAE Young Engineering Award, the INSA Medal for Young Scientist, an IBM Faculty Award, a Facebook AI and Ethics Research Award, a Google TensorFlow award, a GYTI Award, and the Humboldt Fellowship for Experienced Researchers.

10:50 am - 11:20 am
Session: Medical Text and Data Mining

Characterizing the spread of exaggerated news content over social media

In this work, we consider a dataset comprising press releases about health research from different universities in the UK along with a corresponding set of news articles. As a first step, we perform an exploratory data analysis to understand how the basic information published in scientific journals gets exaggerated when it is reported in these press releases or news articles. This initial analysis shows that some news agencies exaggerate almost 60% of the articles they publish in the health domain, that more than 50% of the press releases from certain universities are exaggerated, and that articles on topics such as lifestyle and childhood are heavily exaggerated. Motivated by these observations, the central objective of this paper is to investigate how exaggerated news spreads over an online social network like Twitter. We next study the characteristics of the users who never or rarely post exaggerated news content and compare them with those who post exaggerated news content more frequently. We observe that the latter class of users receive fewer retweets/mentions per tweet, have significantly more followers, use more slang words, fewer hyperbolic words, and fewer word contractions. We also observe that LIWC categories such as ‘bio’, ‘health’, ‘body’, and ‘negative emotion’ are more pronounced in the tweets posted by the users in the latter class. As a final step, we use these observations as features to automatically classify the two groups, achieving an F1-score of 0.83.
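
To make the final classification step concrete, the sketch below shows how user-level features of the kind listed above could feed a standard classifier. This is a minimal illustration, not the authors' pipeline: the feature set, the synthetic data, and the choice of a random forest are all assumptions made for demonstration.

# Minimal sketch (illustrative only): separating users who frequently post
# exaggerated health news from those who rarely do, using user-level features
# of the kind described in the abstract. All data here is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
n_users = 1000

# Hypothetical per-user features: follower count, retweets/mentions per tweet,
# slang-word rate, hyperbolic-word rate, contraction rate, and four LIWC
# category proportions ('bio', 'health', 'body', 'negative emotion').
X = np.column_stack([
    rng.lognormal(6, 1, n_users),        # followers
    rng.exponential(0.5, n_users),       # retweets/mentions per tweet
    rng.beta(2, 20, n_users),            # slang-word rate
    rng.beta(2, 30, n_users),            # hyperbolic-word rate
    rng.beta(2, 10, n_users),            # word-contraction rate
    rng.beta(2, 15, (n_users, 4)),       # LIWC: bio, health, body, negemo
])
y = rng.integers(0, 2, n_users)          # placeholder labels: 1 = frequent exaggerator

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print("F1-score:", f1_score(y_test, clf.predict(X_test)))

With real user data in place of the synthetic arrays, the same feature-matrix-plus-classifier pattern would reproduce the kind of two-group classification the abstract reports.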