Using Open Source Libraries for Sentiment Analysis on Social Media



Social media plays a crucial role in the formation of public opinion. Sentiment analysis, also known as opinion mining, uses natural language processing, text analysis and computational linguistics to extract subjective information from source material. Sentiment analysis is used in predicting poll results, in marketing and in customer service.
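The core idea behind lexicon-based sentiment scoring can be sketched in a few lines of Python. Open source tools such as NLTK's VADER analyser use a far more refined version of this approach. The tiny word list and the single-word negation rule below are illustrative assumptions, not a real lexicon.

```python
import re

# Toy polarity lexicon (hypothetical; real lexicons score thousands of words)
LEXICON = {"good": 1, "great": 2, "love": 2, "excellent": 3,
           "bad": -1, "poor": -2, "hate": -2, "awful": -3}
NEGATORS = {"not", "no", "never"}


def polarity(text):
    """Return a signed score: > 0 positive, < 0 negative, 0 neutral."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        value = LEXICON.get(tok, 0)
        # Assumed rule: flip the sign when the previous word is a negator
        if i > 0 and tokens[i - 1] in NEGATORS:
            value = -value
        score += value
    return score


print(polarity("I love this movie, the acting is great"))  # 4
print(polarity("Not good at all"))                          # -1
```

Real tweets also contain emoticons, slang, hashtags and sarcasm, which is why curated lexicons and trained models generally perform much better than a word list like this one.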

Sentiment analysis is widely used by research scholars and others, and a number of tools and technologies are available for fetching live datasets, tweets, emotional attributes, etc. Using these tools, real-time tweets and messages can be extracted from Twitter, Facebook, WhatsApp and many other social media portals. This article shows how to fetch live tweets from Twitter using Python and PHP.
The emotional attributes of Internet users on social media portals can be analysed, and conclusions drawn and predictions made, using this method. Suppose, for example, that we want to evaluate the overall cumulative score of a celebrity. Python- or PHP-based scripts can fetch live tweets about that celebrity from Twitter. The fetched tweets or messages can then be analysed using natural language processing toolkits, and the popularity of that particular person or movie can be assessed more accurately.
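Once each fetched tweet has a polarity score, the overall cumulative score of a celebrity can be computed by aggregating those scores. The following is a minimal sketch. The input scores are hypothetical, and the "positive share" metric is one reasonable choice among many, not a standard measure.

```python
def summarize(scores):
    """Aggregate per-tweet polarity scores into an overall picture."""
    positive = sum(1 for s in scores if s > 0)
    negative = sum(1 for s in scores if s < 0)
    opinionated = positive + negative
    return {
        "tweets": len(scores),
        "positive": positive,
        "negative": negative,
        "neutral": len(scores) - opinionated,
        "cumulative": sum(scores),
        "mean": sum(scores) / len(scores) if scores else 0.0,
        # Fraction of opinionated (non-neutral) tweets that are positive
        "positive_share": positive / opinionated if opinionated else 0.0,
    }


# Hypothetical per-tweet scores for one celebrity
print(summarize([4, -1, 0, 2, 3, -2]))
```

Reporting counts alongside the cumulative sum helps. A few very strongly worded tweets can dominate the sum, while the positive share treats every opinionated tweet equally.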

The following statistics give a sense of the volume of real-time data on social media and related Web portals.
Around 350 million tweets flow daily from more than 500 million Twitter accounts. Around 571 new websites are created every minute on the World Wide Web. More than 5 billion users are on their mobile phones concurrently.
On WhatsApp, there are 700 million active users and more than 1 million new user registrations every month. Around 30 billion messages are sent and 34 billion received every day on WhatsApp. On Facebook, five new profiles are created every second, and there are around 83 million fake profiles. Around 300 million photos are uploaded every day by 890 million daily active users. About 320TB of data is processed daily, and each user spends 21 minutes on the site, on average.
The question, then, is how to do research on these datasets, and which technologies can be used to fetch them in real time. Live streaming data can be fetched using Python, PHP, Perl, Java and many other languages used for network programming.



Figure 1: Real-time data analytics by


Fetching live streaming data from Twitter using Python code
The Python packages Tweepy and Twitter are required to fetch live tweets from Twitter. Once they are installed, Python code can fetch live data from Twitter. They can be installed using pip as follows (the scripts use the Tweepy 3.x API, hence the version constraint):

$ python -m pip install "tweepy<4"
$ python -m pip install twitter

The code to fetch live tweets from Twitter is:

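from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener  # Tweepy 3.x; removed in Tweepy 4.0

# Credentials of your Twitter app (placeholders)
my_app_consumerkey = 'XXXXXXXXXXXXXXXXXXXXXXXXXX'
my_app_consumersecret = 'XXXXXXXXXXXXXXXXXXXXXXXXXX'
my_app_accesstoken = 'XXXXXXXXXXXXXXXXXXXXXXXXXX'
my_app_accesssecret = 'XXXXXXXXXXXXXXXXXXXXXXXXXX'

class TweetListener(StreamListener):
    def on_data(self, mydata):
        # Each tweet arrives as a raw JSON string
        print(mydata)
        return True  # keep the stream open

    def on_error(self, status):
        print(status)
        # 420 means the app is being rate-limited, so disconnect
        if status == 420:
            return False

auth = OAuthHandler(my_app_consumerkey, my_app_consumersecret)
auth.set_access_token(my_app_accesstoken, my_app_accesssecret)
stream = Stream(auth, TweetListener())
stream.filter(track=['Name of the Celebrity or Movie or Person'])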

When this script runs, each live tweet is printed as a JSON object, and the output can be redirected to a file to build the dataset. OpenRefine can convert this JSON into XML, CSV or any other format readable by data mining and machine learning tools.
OpenRefine is a powerful tool for cleaning and transforming messy datasets, including JSON files.
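For simple cases, the streamed JSON can also be processed directly in Python. The following is a minimal sketch that assumes the input has one JSON object per line, as printed by the listener above. Non-tweet messages, such as rate-limit notices of the form `{"limit": {...}}`, carry no `text` field and are skipped. The sample lines are hypothetical.

```python
import json


def extract_tweets(lines):
    """Yield (screen_name, text) for each tweet in raw stream output."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank keep-alive lines
        obj = json.loads(line)
        if "text" in obj:  # only tweets have a "text" field
            yield obj["user"]["screen_name"], obj["text"]


sample = [
    '{"text": "Loved the new trailer!", "user": {"screen_name": "fan_01"}}',
    '{"limit": {"track": 12}}',
    '',
]
for name, text in extract_tweets(sample):
    print(name, text)
```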
In a similar way, account data for any Twitter user, such as their list of followers, can be fetched using the following code:

import tweepy  # Tweepy 3.x API (api.followers became api.get_followers in 4.0)

my_app_consumerkey = 'XXXXXXXXXXXXX'
my_app_consumersecret = 'XXXXXXXXXXXXX'
my_app_accesstoken = 'XXXXXXXXXXXXX'
my_app_accesssecret = 'XXXXXXXXXXXXX'

auth = tweepy.OAuthHandler(my_app_consumerkey, my_app_consumersecret)
auth.set_access_token(my_app_accesstoken, my_app_accesssecret)
api = tweepy.API(auth)
print('Connected to Twitter Server')

with open('Twitter.txt', 'w') as outfile:
    # Page through the followers of the account and write the
    # screen names of the first two to the file
    for follower in tweepy.Cursor(api.followers, screen_name="gauravkumarin").items(2):
        outfile.write(follower.screen_name + '\n')
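`tweepy.Cursor` hides Twitter's cursor-based pagination. With the v1.1 API, a follower list is returned one page at a time. Each response carries a `next_cursor` value, where `-1` requests the first page and `0` marks the last one. The loop can be sketched conceptually as follows. The `fetch_page` function is a hypothetical stand-in for the actual HTTP request.

```python
def iterate_cursor(fetch_page, limit=None):
    """Yield items across pages using Twitter-style cursoring.

    fetch_page(cursor) must return (items, next_cursor); -1 requests
    the first page and a next_cursor of 0 means no more pages.
    """
    cursor, count = -1, 0
    while limit is None or count < limit:
        items, cursor = fetch_page(cursor)
        for item in items:
            if limit is not None and count >= limit:
                return
            yield item
            count += 1
        if cursor == 0:
            return


# Hypothetical two-page response standing in for the real API
PAGES = {-1: (["alice", "bob"], 17), 17: (["carol"], 0)}


def fake_fetch(cursor):
    return PAGES[cursor]


print(list(iterate_cursor(fake_fetch)))           # ['alice', 'bob', 'carol']
print(list(iterate_cursor(fake_fetch, limit=2)))  # ['alice', 'bob']
```

Checking the limit before each request avoids fetching a page whose items would be discarded. This matters because Twitter rate-limits follower requests.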

The following Python script converts JSON to CSV format:

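# JSON - CSV Parser
# Reads a JSON array of objects from stdin (or from files named on the
# command line) and writes it to stdout as CSV
import csv
import fileinput
import json
import sys

l = []
for currentline in fileinput.input():
    l.append(currentline)
currentjson = json.loads(''.join(l))
keys = {}  # dict keeps first-seen key order
for i in currentjson:
    for k in i.keys():
        keys[k] = 1
mycsv = csv.DictWriter(sys.stdout, fieldnames=list(keys.keys()))
mycsv.writeheader()
for row in currentjson:
    mycsv.writerow(row)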
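The JSON-CSV parser expects the whole input to be a single JSON array. The streaming listener, however, prints one tweet object per line, and fields such as the author's screen name are nested inside a `user` object. The following is a minimal sketch for that shape of input. It assumes that the columns are given as dotted paths, for example `user.screen_name`. Non-tweet messages are skipped, and the sample lines are hypothetical.

```python
import csv
import io
import json


def jsonl_to_csv(lines, out, fields):
    """Write selected fields of one-object-per-line JSON as CSV rows.

    `fields` are dotted paths such as "user.screen_name"; missing
    values become empty cells.
    """
    writer = csv.writer(out)
    writer.writerow(fields)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        obj = json.loads(line)
        if "text" not in obj:
            continue  # skip non-tweet messages such as rate-limit notices
        row = []
        for path in fields:
            value = obj
            for key in path.split("."):
                value = value.get(key) if isinstance(value, dict) else None
            row.append("" if value is None else value)
        writer.writerow(row)


sample = [
    '{"id_str": "1", "text": "Great show", "user": {"screen_name": "fan_01"}}',
    '{"limit": {"track": 3}}',
]
buf = io.StringIO()
jsonl_to_csv(sample, buf, ["id_str", "user.screen_name", "text"])
print(buf.getvalue())
```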


Figure 2: Statista as a prominent and key portal for statistical data


Figure 3: Live tweets fetched from Twitter in JSON format

Fetching data from Twitter using PHP code
To fetch tweets using PHP, the TwitterAPIExchange class from the open source twitter-api-php library is required. Once this class is included in the PHP script, the script can interact directly with the Twitter servers and retrieve live data.
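Before each call, the `buildOauth()` method of TwitterAPIExchange signs the request with OAuth 1.0a using HMAC-SHA1, so that Twitter can verify the app's credentials. The signing steps are sketched below in Python for illustration. The credentials, nonce, timestamp and endpoint URL are placeholders, and the code is not the library's own implementation.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def pct(value):
    """Percent-encode per RFC 3986, as OAuth 1.0a requires."""
    return quote(str(value), safe="")


def oauth_signature(method, url, params, consumer_secret, token_secret=""):
    """Return the OAuth 1.0a HMAC-SHA1 signature for one request.

    `params` holds the query parameters plus the oauth_* parameters
    (everything except oauth_signature itself).
    """
    # 1. Encode each key and value, sort the pairs, join as k=v&k=v
    pairs = sorted((pct(k), pct(v)) for k, v in params.items())
    param_string = "&".join(f"{k}={v}" for k, v in pairs)
    # 2. Signature base string: METHOD&encoded URL&encoded parameters
    base = "&".join([method.upper(), pct(url), pct(param_string)])
    # 3. Signing key: encoded consumer secret & encoded token secret
    key = f"{pct(consumer_secret)}&{pct(token_secret)}"
    digest = hmac.new(key.encode(), base.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


# Placeholder credentials, nonce and timestamp; hypothetical endpoint URL
params = {
    "screen_name": "gauravkumarin",
    "count": 20,
    "oauth_consumer_key": "XXXXXXXXXXXXXXXXXX",
    "oauth_nonce": "abc123",
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_timestamp": 1459900000,
    "oauth_token": "XXXXXXXXXXXXXXXXXX",
    "oauth_version": "1.0",
}
print(oauth_signature("GET", "https://example.com/endpoint", params,
                      "consumer-secret", "token-secret"))
```

Sorting the encoded pairs makes the signature independent of the order in which parameters are supplied. The server rebuilds the same base string and rejects the request if the two signatures differ.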

<?php
// twitter-api-php library, which provides the TwitterAPIExchange class
require_once('TwitterAPIExchange.php');
// Database settings (placeholder values)
define('MYDBHOST', 'localhost');
define('MYDBUSER', 'root');
define('CURRENTDBPASSWORD', 'Twitter');
define('MYDBNAME', 'twitterdb');
define('CURRENTTWEETTABLE', 'tweets');

// OAuth credentials of your Twitter app
$settings = array(
    'oauth_access_token' => "XXXXXXXXXXXXXXXXXX",
    'oauth_access_token_secret' => "XXXXXXXXXXXXXXXXXX",
    'consumer_key' => "XXXXXXXXXXXXXXXXXX",
    'consumer_secret' => "XXXXXXXXXXXXXXXXXX"
);
$url = "";  // Twitter REST API endpoint to query
$requestMethod = "GET";
$getfield = '?screen_name=gauravkumarin&count=20';

$twitter = new TwitterAPIExchange($settings);
$string = json_decode($twitter->setGetfield($getfield)
    ->buildOauth($url, $requestMethod)
    ->performRequest(), true);
if (!empty($string["errors"][0]["message"])) {
    echo "<h3>Sorry, there was a problem.</h3><p>Twitter returned the following error message:</p><p><em>" . $string["errors"][0]["message"] . "</em></p>";
    exit();
}
if (empty($string)) {
    exit('No Tweets were Added.');
}

$mysqli = new mysqli(MYDBHOST, MYDBUSER, CURRENTDBPASSWORD, MYDBNAME);
if ($mysqli->connect_errno) {
    exit('Failed to connect to Database: (' . $mysqli->connect_errno . ') ' . $mysqli->connect_error);
}
$QueryStmt = 'INSERT INTO ' . MYDBNAME . '.' . CURRENTTWEETTABLE . ' (name, screen_name, text, timestamp, id_str, followers) VALUES (?,?,?,?,?,?);';
if (!($insert_stmt = $mysqli->prepare($QueryStmt))) {
    exit('Prepare failed: (' . $mysqli->errno . ') ' . $mysqli->error);
}
// Bind once; the bound variables are re-read on every execute()
$insert_stmt->bind_param('sssssi', $name, $screen_name, $text, $timestamp, $id_str, $followers);

foreach ($string as $items) {
    $name = $items['user']['name'];
    $screen_name = $items['user']['screen_name'];
    $text = $items['text'];
    $timestamp = $items['created_at'];
    $id_str = $items['id_str'];
    $followers = $items['user']['followers_count'];
    echo "Tweeted by: " . $name . "<br />";
    echo "Screen name: " . $screen_name . "<br />";
    echo "Tweet: " . $text . "<br />";
    echo "Time and Date of Tweet: " . $timestamp . "<br />";
    echo "Tweet ID: " . $id_str . "<br />";
    echo "Followers: " . $followers . "<br /><hr />";
    echo $insert_stmt->execute() ? 'Tweet Added.<br />' : 'Tweet Creation cannot be done at this moment.<br />';
}
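The PHP script stores tweets through a prepared statement with `?` placeholders, so tweet text is never spliced into the SQL string. The same pattern is sketched below in Python, using the standard library's sqlite3 module as a stand-in for MySQL. The table name `tweets`, the primary key on `id_str` and the `INSERT OR IGNORE` de-duplication are assumptions, and the sample tweet is hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table;
# the columns follow the PHP INSERT statement
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE tweets (
    name TEXT, screen_name TEXT, text TEXT,
    timestamp TEXT, id_str TEXT PRIMARY KEY, followers INTEGER)""")


def save_tweets(conn, tweets):
    """Insert tweet dicts (Twitter JSON shape), ignoring duplicate IDs."""
    rows = [
        (t["user"]["name"], t["user"]["screen_name"], t["text"],
         t["created_at"], t["id_str"], t["user"]["followers_count"])
        for t in tweets
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO tweets VALUES (?, ?, ?, ?, ?, ?)", rows)


tweet = {"text": "Hi", "created_at": "Wed Apr 06 10:00:00 +0000 2016",
         "id_str": "1", "user": {"name": "A", "screen_name": "a",
                                 "followers_count": 5}}
save_tweets(conn, [tweet, tweet])  # the duplicate is ignored
print(conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0])  # 1
```

Keying the table on `id_str` lets a collector be re-run over overlapping time ranges without storing the same tweet twice.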


Figure 4: OpenRefine tool for processing of messy datasets

Using these technologies, real-time tweets can be parsed, processed and mapped to a particular event, and predictions can be made from them. News channels adopt these technologies for exit polls, which help to predict the probability of a political party or candidate winning. In a similar manner, the success of a movie can be predicted after careful analysis of the live streaming data.
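A simple first step in mapping tweets to an event is to count mentions of each party, candidate or movie per hour, which shows when reactions spike. The following is a minimal sketch. It parses Twitter's `created_at` format (for example `Wed Apr 06 10:15:00 +0000 2016`) and uses naive substring matching. The keywords and tweets are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Twitter's created_at format; assumes the default C locale so that
# English day and month names parse
CREATED_AT = "%a %b %d %H:%M:%S %z %Y"


def hourly_mentions(tweets, keywords):
    """Count tweets per (hour, keyword) to trace reactions to an event."""
    counts = Counter()
    for t in tweets:
        when = datetime.strptime(t["created_at"], CREATED_AT)
        hour = when.strftime("%Y-%m-%d %H:00")
        text = t["text"].lower()
        for kw in keywords:
            # Naive substring match; "party a" would also match "party about"
            if kw.lower() in text:
                counts[(hour, kw)] += 1
    return counts


tweets = [
    {"created_at": "Wed Apr 06 10:15:00 +0000 2016", "text": "Party A rally was great"},
    {"created_at": "Wed Apr 06 10:45:00 +0000 2016", "text": "Not impressed by Party B"},
    {"created_at": "Wed Apr 06 11:05:00 +0000 2016", "text": "Party A again!"},
]
counts = hourly_mentions(tweets, ["Party A", "Party B"])
for (hour, kw), n in sorted(counts.items()):
    print(hour, kw, n)
```

Combining these counts with per-tweet sentiment scores gives a time series of positive and negative mentions for each candidate or movie. Analysts can then compare that series against the timing of rallies, debates or release dates.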

Research scholars can work on such real-life topics in Big Data analytics and produce effective, presentable research work.


April 6, 2016
