Before modeling our data and exploring different techniques for identifying Twitter bots using tweet data from the Twitter developer API, we reviewed the literature on this topic. Many classification models have already been developed or adapted for this field; below are a few highlights.
Stefan Wojcik, “Bots in the Twittersphere”
In this work, Botometer, a tool that uses a machine learning algorithm, was used to classify Twitter accounts. Botometer gives each Twitter account a score between 0 and 1 by analyzing more than 1,000 pieces of information about the account. Then, by manually labeling around 300 Twitter accounts, it was possible to set the threshold that classifies an account as bot or non-bot.
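To make the thresholding step concrete, here is a minimal sketch of how a cutoff could be picked from a small hand-labeled sample. It is an illustration under stated assumptions, not the procedure used in the study: the function name `best_threshold`, the accuracy criterion, and the sample scores and labels are all hypothetical.

```python
def best_threshold(scores, labels):
    """Return (threshold, accuracy) for the cutoff that maximizes accuracy.

    scores: bot scores in [0, 1], one per account.
    labels: True if the account was manually labeled as a bot.
    An account is predicted to be a bot when its score >= threshold.
    Assumption: accuracy is the selection criterion; the study may have
    used a different one.
    """
    best_t, best_acc = 0.0, -1.0
    # Only observed scores need to be tried as candidate cutoffs.
    for t in sorted(set(scores)):
        predictions = [s >= t for s in scores]
        correct = sum(p == y for p, y in zip(predictions, labels))
        acc = correct / len(labels)
        # Strict comparison keeps the lowest cutoff when there is a tie.
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


# Hypothetical hand-labeled sample (not data from the study).
scores = [0.05, 0.20, 0.35, 0.50, 0.65, 0.80, 0.95]
labels = [False, False, False, True, False, True, True]

threshold, accuracy = best_threshold(scores, labels)
print(f"threshold={threshold:.2f}, accuracy={accuracy:.2f}")
```

With a real labeled sample, other criteria such as precision or recall for the bot class may be more appropriate than plain accuracy, depending on which kind of error matters more.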
Chris Baraniuk, “How Twitter Bots Help Fuel Political Feuds”
This article shows how bots can influence people’s political decisions by retweeting messages and trying to give a false perception of the political situation. These bots appear to have influenced several critical political decisions, such as the U.K.’s “Brexit” referendum and Donald Trump’s 2016 campaign. Twitter’s CEO is working on stopping bot abuse on Twitter; the coming US elections will be another test.
Chengcheng Shao et al., “The spread of low-credibility content by social bots”
This paper studies how bots influence the spread of misinformation produced by low-credibility sources. The study found that bots used specific strategies that proved effective in the “viral” spreading of non-credible content. One such strategy was to mention highly influential individuals in tweets that contained or linked to misinforming content. When influential people then spread the information, it creates the false impression that the content is credible.
Asbjørn Ottesen Steinskog et al., “Twitter Topic Modeling by Tweet Aggregation”
This paper explores the use of topic modeling to gain insight into trending topics on Twitter. Because tweets are so short, various methods of tweet aggregation have been studied to make topic modeling more effective. More specifically, hashtag aggregation and author aggregation seem to make topic modeling more effective and result in better interpretability than standard topic models.
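The two aggregation schemes can be sketched as follows. This is a simplified illustration, not the paper’s implementation: the tweet format (dictionaries with `author` and `text` keys), the function names, and the sample tweets are assumptions, and tweets without any hashtag are simply dropped from the hashtag aggregation.

```python
import re
from collections import defaultdict

# Matches a hashtag and captures its text without the leading '#'.
HASHTAG = re.compile(r"#(\w+)")


def aggregate_by_author(tweets):
    """Merge all tweets from the same author into one pseudo-document."""
    docs = defaultdict(list)
    for tweet in tweets:
        docs[tweet["author"]].append(tweet["text"])
    return {author: " ".join(texts) for author, texts in docs.items()}


def aggregate_by_hashtag(tweets):
    """Merge all tweets sharing a hashtag into one pseudo-document.

    Hashtags are compared case-insensitively. A tweet with several
    hashtags contributes to several documents; a tweet with none is
    left out (a simplification made for this sketch).
    """
    docs = defaultdict(list)
    for tweet in tweets:
        tags = set(HASHTAG.findall(tweet["text"].lower()))
        for tag in tags:
            docs[tag].append(tweet["text"])
    return {tag: " ".join(texts) for tag, texts in docs.items()}


# Hypothetical sample tweets.
tweets = [
    {"author": "alice", "text": "Voting today #election"},
    {"author": "bob", "text": "Polls are open #Election #vote"},
    {"author": "alice", "text": "Nice weather"},
]

by_author = aggregate_by_author(tweets)
by_hashtag = aggregate_by_hashtag(tweets)
print(by_author)
print(by_hashtag)
```

Each resulting pseudo-document is longer than a single tweet, which gives a topic model more word co-occurrence information to work with.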
http://www.tweepy.org
https://developer.twitter.com
Twitter’s Tweet object
Twitter’s User object
API Documentation:
https://github.com/IUNetSci/botometer-python
Botometer API Overview:
https://market.mashape.com/OSoMe/botometer