Information Retrieval using Social Media

Social networks like Facebook, Twitter and Instagram have become an important part of our
day-to-day lives. Apart from providing a platform for socializing, they have also become a
medium for relaying and obtaining important information. News of major real-life events
often appears on social media within seconds or minutes of their occurrence. Such is the
impact of social media that when an earthquake occurs, many people open their social
network accounts to confirm it instead of watching the news or reading newspapers.

Social media provides a huge amount of data that can be analyzed to draw interesting
inferences, such as how a particular event has affected the public or which locations were
most affected. It is now seen as an effective medium for relaying information to a large
number of people in seconds, and it is also used for problem-solving.

In 2009, US Airways flight 1549 came down in New York's Hudson River. The image given
below appeared on Twitter just 4 minutes after the crash. It is often cited as one of the
first times social media was used to request immediate help from government organizations.


Source: https://www.telegraph.co.uk/technology/twitter/4269765/New-York-plane-crash-Twitter-breaks-the-news-again.html


Twitter in particular has become one of the most important social networks for finding out
what is happening around the world. A metadata tag called a “hashtag” is usually attached to
tweets about a particular event. News related to elections, sports, celebrities or any other
domain quickly starts trending on Twitter and is open to the public. To obtain and use
information from Twitter, one can use the Twitter API to collect tweets matching a
particular search query and apply various data analysis techniques to them. The example
given below demonstrates this kind of social information retrieval.

To collect tweets for a particular hashtag, say #FakeNews, I wrote the following snippet of
code.



Source: PSOSM Assignment 1, Monsoon semester, 2017
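
As a rough illustration of such a collection step (not the assignment's actual code), the
paging loop can be sketched as below. The network call is deliberately left out:
`fetch_page` is a hypothetical callable that would wrap whichever Twitter API client is in
use, and its signature (query, language and cursor in; tweets and next cursor out) is an
assumption made for this example.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical signature: takes (query, lang, cursor) and returns
# (tweets_on_this_page, next_cursor). next_cursor is None on the last page.
FetchPage = Callable[[str, str, Optional[str]], Tuple[List[Dict], Optional[str]]]


def collect_tweets(fetch_page: FetchPage, hashtag: str,
                   lang: str = "en", limit: int = 10_000) -> List[Dict]:
    """Page through search results until `limit` tweets are collected."""
    query = "#" + hashtag.lstrip("#")  # accept "FakeNews" or "#FakeNews"
    tweets: List[Dict] = []
    cursor: Optional[str] = None
    while len(tweets) < limit:
        page, cursor = fetch_page(query, lang, cursor)
        tweets.extend(page)
        if cursor is None or not page:
            break  # no more results available
    return tweets[:limit]
```

With a real client behind `fetch_page`, a call such as `collect_tweets(fetch_page, "FakeNews")`
would return up to 10,000 tweet dictionaries, or fewer if the search runs out of results.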

The assignment code collects 10,000 recent English-language tweets containing #FakeNews
from all over the world using the Twitter API. Once the dataset is collected, the following
analyses can be done:

  • Make a histogram of the top 20 words used in the tweets after stopword removal
  • Plot a pie chart to analyze the percentage of tweets from each country
  • Plot a graph of the number of tweets vs time
  • Analyze the kinds of photos, videos and links shared
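
The third item, tweets over time, can be sketched with the standard library alone. This is
a minimal sketch that assumes each tweet is a dictionary with a `created_at` string in the
layout used by Twitter's v1.1 API (for example "Wed Oct 10 20:19:24 +0000 2018"). The
function name and the hourly bucket size are illustrative choices, not part of the
assignment.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

# Assumed timestamp layout, e.g. "Wed Oct 10 20:19:24 +0000 2018".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def tweets_per_hour(tweets: Iterable[Dict]) -> List[Tuple[str, int]]:
    """Count tweets in hourly buckets, returned in chronological order."""
    buckets: Counter = Counter()
    for tweet in tweets:
        created = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT)
        buckets[created.strftime("%Y-%m-%d %H:00")] += 1
    # The bucket labels sort chronologically because of their fixed layout.
    return sorted(buckets.items())
```

The resulting (hour, count) pairs can be handed to any plotting library to draw the
tweets-vs-time graph.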


By plotting the words most frequently used with #FakeNews, one can find out the topics on
which people are spreading fake news. In this dataset, the histogram shows that most of the
fake news is related to US President Donald Trump.

Source: PSOSM Assignment 1, Monsoon semester, 2017
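
The word counts behind such a histogram could be computed roughly as follows. This is a
sketch under assumptions: tweets are dictionaries with a `text` field, the stopword list is
a tiny illustrative one rather than a full list, and the searched hashtag itself is
excluded because it appears in every tweet.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# A tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in", "on",
             "for", "it", "this", "that", "rt", "with", "about"}


def top_words(tweets: Iterable[Dict], hashtag: str = "fakenews",
              n: int = 20) -> List[Tuple[str, int]]:
    """Return the n most frequent words after removing links and stopwords."""
    counts: Counter = Counter()
    for tweet in tweets:
        # Lowercase the text and drop links before splitting into words.
        text = re.sub(r"https?://\S+", " ", tweet["text"].lower())
        for word in re.findall(r"[a-z']+", text):
            if word not in STOPWORDS and word != hashtag and len(word) > 1:
                counts[word] += 1
    return counts.most_common(n)
```

The returned (word, count) pairs are what a bar chart of the top 20 words would plot.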


Another interesting thing to find out is where this fake news originates. Tweets can carry
geolocation tags, which let us find the longitude and latitude of the place a tweet came
from. The pie chart given below shows the distribution of fake news tweets across
countries.

Source: PSOSM Assignment 1, Monsoon semester, 2017

We can see that most of the fake news tweets come from the US. Similarly, other interesting 
observations can be made.
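
A per-country breakdown like the one behind the pie chart might be computed as below. The
sketch assumes the v1.1 tweet layout, in which a geotagged tweet has a `place` object with
a `country` field. Tweets without it are skipped, so the percentages describe only the
geotagged subset, which is typically a small share of all tweets.

```python
from collections import Counter
from typing import Dict, Iterable


def country_shares(tweets: Iterable[Dict]) -> Dict[str, float]:
    """Percentage of geotagged tweets per country (untagged tweets skipped)."""
    counts: Counter = Counter()
    for tweet in tweets:
        place = tweet.get("place") or {}  # "place" may be missing or None
        country = place.get("country")
        if country:
            counts[country] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    # Largest share first, which is a convenient order for a pie chart legend.
    return {c: 100.0 * k / total for c, k in counts.most_common()}
```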

Since social media has the power to influence a large number of people, collecting and
analyzing its data can give interesting insights. So far, information retrieval using
social media has received relatively little attention. Yet the data generated by social
networks arguably already exceeds what prominent search engines collect, and in the near
future it is only going to increase. I believe the next major change in information
retrieval systems is going to be the integration of social networks with IR systems. This
is called Social Information Retrieval (SIR).


Search engines can be enhanced to include more data from social networks in real time and
produce analyses that have various uses. For example, during elections, search engines
could extract public opinion from various social media platforms and present a combined
public review. SIR can also help in catching criminals and terrorists. Several terrorist
organizations use Twitter to recruit people because of its mass reach. For geotagged tweets
that contain terms related to such organizations, the location of the tweet can be found.
Law enforcement is increasingly using social media information to solve cases.


Thus, online social networks (OSNs) provide an abundance of useful information about human
relationships, opinions, suggestions and more. It will be interesting to see a search
engine built specifically for information generated on social media.




Sources:

1) https://www.telegraph.co.uk/technology/twitter/4269765/New-York-plane-crash-Twitter-breaks-the-news-again.html
2) PSOSM Assignment 1, Monsoon semester, 2017
