JAZ
Our project is an analysis of a viral issue: ABS-CBN Tax Evasion Mis/Disinformation. Our team, JAZ, aims to use our knowledge of data science to gain new insight into this issue and share it with the world.
Overview
This is the reason why we decided to focus on this case: to get to the root of how misinformation on social media can ultimately contribute to the stifling of the freedom of the press.
Misinformation regarding ABS-CBN, particularly about its tax evasion case, is prevalent on social media sites like Twitter.
Our solution is to use data science to gain insights into the spread of misinformation on Twitter and, subsequently, uncover actionable steps that can battle this misinformation phenomenon.
Background
The renewal of the franchise was opposed by former president Rodrigo Duterte for many reasons. One of his main allegations against the media network was that it had been “cheating [the] government by the billions [of pesos] in taxes”, despite the BIR clearing ABS-CBN of any tax delinquencies. While the NTC said that ABS-CBN would be allowed to continue operating after May 4, it issued a cease and desist letter against ABS-CBN after being pressured by the Solicitor General. This would prevent ABS-CBN from airing on TV and radio stations.
This led us to ask,
All tweets are equally likely to gain interaction, whether they appeal to emotion, credibility, or logic.
Tweets that use a specific type of rhetorical appeal gain more interaction.
Analyze the content of tweets that posted mis/disinformation about ABS-CBN’s tax payment.
Data Collection
Analyze the appeals of the tweets containing disinformation about the ABS-CBN tax evasion case.
View our dataset here
Methods
We performed inferential statistics to learn more about our data.
View our data exploration here
Data is never clean when we first collect it. There are always missing values or incorrect formats. An important first step in our data exploration is handling missing data and ensuring formatting consistency, as well as preparing the data for inferential statistics by encoding categorical data and standardizing numerical data.
When we checked for columns with missing values, the columns 'Account bio', 'Tweet Translated', 'Screenshot', 'Remarks', 'Reviewer' and 'Review' all had missing values, but they were deemed unnecessary for our data exploration, so they were left as is.
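As a minimal sketch of this check, assuming the dataset is loaded into a pandas DataFrame, the snippet below counts missing values per column. The rows are made up for illustration; only the column names come from our dataset.

```python
import pandas as pd

# Hypothetical rows for illustration; only the column names come from the dataset.
df = pd.DataFrame({
    "Account bio": ["news junkie", None, None],
    "Remarks": [None, None, None],
    "Likes": [3, 0, 12],
    "Appeal": ["Logic", "Emotion", "Credibility"],
})

# Count missing values per column, then keep only the columns that have any.
missing_counts = df.isna().sum()
columns_with_missing = missing_counts[missing_counts > 0]
print(columns_with_missing)
```

Columns that appear in this result but are not needed for the analysis can simply be left untouched, as we did.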
The column 'Views' is also disregarded since all the tweets were posted before Dec. 22, 2022, when the Views feature was added to Twitter.
To ensure formatting consistency, all of the data was formatted as instructed at collection time, when we gathered it using our Python algorithm.
The columns 'Account type', 'Tweet Type', 'Content type', 'Rating', and 'Appeal' have qualitative data which can be encoded using numerical values.
We encoded the categories of each column with whole numbers, starting from 1 and going up to the number of categories in that column.
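One way to sketch this encoding is with `pandas.factorize`, which is an assumption on our part and not necessarily the exact call used in the notebook. The sample values are hypothetical, and the alphabetical ordering of the codes is a choice made for this example.

```python
import pandas as pd

# Hypothetical sample values; the column name 'Appeal' comes from the dataset.
appeal = pd.Series(["Logic", "Emotion", "Credibility", "Emotion"], name="Appeal")

# factorize() assigns integer codes starting at 0 (sorted alphabetically here),
# so we add 1 to make the codes run from 1 to the number of categories.
codes, categories = pd.factorize(appeal, sort=True)
appeal_encoded = codes + 1

# Keep the mapping so the numbers can be translated back to labels later.
mapping = {category: i + 1 for i, category in enumerate(categories)}
print(mapping)
print(list(appeal_encoded))
```

Keeping the `mapping` dictionary around makes it easy to label plots with the original category names.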
We used the mean and z-scores to standardize columns like 'Following', 'Followers', 'Likes', 'Replies', 'Retweets', and 'Quote Tweets' to get an overview of the average interaction of a tweet containing mis/disinformation regarding the ABS-CBN tax evasion case.
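The standardization can be sketched as follows. The numbers are hypothetical, and the use of the population standard deviation (`ddof=0`) is an assumption; the sample standard deviation would work equally well for an overview.

```python
import pandas as pd

# Hypothetical interaction counts; the column names come from the dataset.
interactions = pd.DataFrame({
    "Likes": [2, 4, 6],
    "Retweets": [0, 5, 10],
})

# The column means give the "average interaction" overview.
means = interactions.mean()

# z-score: how many standard deviations each value sits from its column mean.
# ddof=0 (population standard deviation) is an assumption of this sketch.
z_scores = (interactions - means) / interactions.std(ddof=0)
print(means)
print(z_scores)
```

After standardizing, every column has a mean of 0 and a standard deviation of 1, which makes columns with very different scales comparable.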
Outliers show that some of the gathered data far exceed the average. This could skew the distributions when standardizing the data and may cause abnormalities in the modeling stage.
However, the outliers also carry meaning about the interactions that tweets containing mis/disinformation receive. Tweets with high interaction are few and far between, and users with high followings are rare.
Hence, the outliers are left as is.
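For readers who want to inspect such outliers, one common rule of thumb is the 1.5 × IQR rule. This is only an illustrative assumption, not necessarily the method used in our exploration, and the like counts below are made up.

```python
import pandas as pd

# Hypothetical like counts with one very popular tweet.
likes = pd.Series([0, 1, 2, 2, 3, 3, 4, 5, 6, 250], name="Likes")

# Interquartile range and the usual 1.5 * IQR fences.
q1, q3 = likes.quantile(0.25), likes.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Flag the outliers for inspection, but keep them in the data.
is_outlier = (likes < lower) | (likes > upper)
outliers = likes[is_outlier]
print(outliers.tolist())
```

Flagging instead of dropping keeps the rare high-interaction tweets available for later analysis.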
Processing the contents of the tweets proved to be difficult in Filipino. We needed to translate the tweets into English and manually check the translations before we could process them for stemming and lemmatization.
Because the majority of the tweets were in Filipino, they had to be manually cleaned to ensure that the translation would be more accurate. Here are some key parts of the manual cleaning:
GoogleTrans was used to translate the cleaned raw tweets from Filipino to English. The translation was further cleaned by:
The Google Sheet was updated to include the translated tweets. The translations were then checked manually to improve the context accuracy, fix erroneous spacing and catch mistakes. After updating, the new data was pulled to be used in the analysis.
Words alone would not demonstrate our findings. After all our hard work, we've collated our newfound knowledge using graphs and plots to make it easily digestible.
This first bar graph displays an average of 24 likes on logically appealing tweets, which is evidently higher than the average for tweets that appeal to emotion or credibility. Overall, logical tweets have higher average interaction than the other appeals.
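The averages behind a bar graph like this can be computed with a group-by, sketched here on hypothetical numbers (only the 'Appeal' and 'Likes' column names come from the dataset).

```python
import pandas as pd

# Hypothetical tweets; these are not the real like counts.
tweets = pd.DataFrame({
    "Appeal": ["Logic", "Logic", "Emotion", "Emotion", "Credibility"],
    "Likes": [10, 30, 5, 15, 12],
})

# Average likes per appeal, from highest to lowest.
average_likes = (
    tweets.groupby("Appeal")["Likes"].mean().sort_values(ascending=False)
)
print(average_likes)
```

Calling `average_likes.plot.bar()` on the result would draw the corresponding bar graph, assuming matplotlib is installed.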
This graph shows that tweets discussing the ABS-CBN tax evasion case peaked in January 2022 and slowly declined. However, the discussion resurfaced around December 2022.
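A timeline like this comes from counting tweets per month. The sketch below assumes the posting dates are available as a pandas Series; the dates shown are made up.

```python
import pandas as pd

# Hypothetical posting dates.
posted = pd.to_datetime(
    pd.Series(["2022-01-05", "2022-01-20", "2022-02-11", "2022-12-03"])
)

# Convert each date to its month, then count tweets per month in time order.
tweets_per_month = posted.dt.to_period("M").value_counts().sort_index()
print(tweets_per_month)
```

Months without any tweets do not appear in this result, so a plot may need the index filled in for a continuous time axis.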
The bar graph shows the number of followers of each Twitter user in our sample at the time we collected the data. On the far left, more than 60% of the sample have fewer than 500 followers. Fewer of the Twitter users who propagated misinformation about ABS-CBN had a high follower count.
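The share of accounts below a follower threshold is a one-line computation. The follower counts here are hypothetical and chosen only to illustrate the calculation.

```python
import pandas as pd

# Hypothetical follower counts; the column name 'Followers' comes from the dataset.
followers = pd.Series([12, 80, 150, 430, 499, 500, 2300, 15000], name="Followers")

# The mean of a boolean mask is the fraction of accounts that satisfy the condition.
share_under_500 = (followers < 500).mean()
print(f"{share_under_500:.1%} of accounts have fewer than 500 followers")
```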
From the histogram, there are some interesting insights that we can gather. While the four most common words aren’t surprising, the word “billion” is predominantly used by disinformation tweets that appeal to logic and credibility.
Additionally, only tweets that appealed to logic used “GMA”, “hectare” and “ITR”, which are key points often brought up to explain why the ABS-CBN tax evasion case is valid.
This heatmap shows the correlation between when a tweet was posted and when the user joined Twitter. This gives further insight into the behavior of the “trolls” who made disinformation posts about the ABS-CBN case.
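A word-frequency comparison across appeals can be sketched with the standard library alone. The tweet texts and the tiny stopword list below are invented for illustration and do not come from our dataset; our actual pipeline also applied stemming and lemmatization, which this sketch skips.

```python
import re
from collections import Counter

# Hypothetical (appeal, translated text) pairs for illustration only.
tweets = [
    ("Logic", "They owe billion in taxes, check the ITR"),
    ("Logic", "GMA pays taxes on every hectare"),
    ("Emotion", "Shame on them, pay your taxes"),
]

# A tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "in", "on", "they", "them", "your", "every"}

# Build one word counter per appeal.
word_counts = {}
for appeal, text in tweets:
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    word_counts.setdefault(appeal, Counter()).update(words)

print(word_counts["Logic"].most_common(3))
```

Comparing the counters side by side shows which words are used by one appeal only.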
Visually, we can observe that around half of the users who made disinformation tweets were relatively new users whose accounts were made from 2020 onwards.
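The table behind such a heatmap can be built with a cross-tabulation of join year against posting year. The column names 'Joined' and 'Date posted' are hypothetical, as are the dates.

```python
import pandas as pd

# Hypothetical accounts; 'Joined' and 'Date posted' are assumed column names.
accounts = pd.DataFrame({
    "Joined": pd.to_datetime(["2010-03-01", "2020-06-15", "2021-01-10", "2022-02-02"]),
    "Date posted": pd.to_datetime(["2022-01-05", "2022-01-20", "2022-02-11", "2022-12-03"]),
})

# Rows: year the account was created. Columns: year the tweet was posted.
year_table = pd.crosstab(accounts["Joined"].dt.year, accounts["Date posted"].dt.year)

# Fraction of accounts created from 2020 onwards.
share_new = (accounts["Joined"].dt.year >= 2020).mean()
print(year_table)
print(share_new)
```

A plotting library's heatmap function can take `year_table` directly as input.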
After getting a clearer picture of our data, we used the Chi-Square Test of Independence to determine whether there is a relationship between the appeal and the average interaction of a tweet.
But first things first, there are two assumptions for this test.
We can assume #2 to always be true, and the table below shows that our dataset satisfies #1, since the lowest expected count is 5.4.
Appeal | Interaction | Observed | Expected |
---|---|---|---|
Logic | Higher than Ave. | 3 | 12.6 |
Logic | Lower than Ave. | 60 | 50.4 |
Emotion | Higher than Ave. | 21 | 16 |
Emotion | Lower than Ave. | 59 | 64 |
Credibility | Higher than Ave. | 10 | 5.4 |
Credibility | Lower than Ave. | 17 | 21.6 |
Observed and Expected Values
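To see where the expected values come from: under independence, each expected count is the row total times the column total divided by the grand total. From the observed counts, the row totals are 63 (Logic), 80 (Emotion) and 27 (Credibility), the column totals are 34 (Higher than Ave.) and 136 (Lower than Ave.), and the grand total is 170. The symbols R_i, C_j and N are our own notation for these totals.

```latex
E_{ij} = \frac{R_i \, C_j}{N}, \qquad
E_{\text{Logic, Higher}} = \frac{63 \times 34}{170} = 12.6, \qquad
E_{\text{Credibility, Higher}} = \frac{27 \times 34}{170} = 5.4
```

The second value is the smallest expected count in the table.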
Using stats from scipy, we got the following results from the test:
Statistic | Value |
---|---|
Significance Level | 0.05 |
Degrees of Freedom | 2 |
Critical Value | 5.99 |
Chi-Square Value | 15.9941 |
P-value | 0.0003 |
Chi Square Results
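These results can be reproduced from the observed counts in the first table. The sketch below assumes `scipy.stats.chi2_contingency` as the specific function, since we only noted that `stats` from scipy was used.

```python
import numpy as np
from scipy import stats

# Observed counts from the Observed and Expected Values table.
# Rows: Logic, Emotion, Credibility.
# Columns: "Higher than Ave.", "Lower than Ave.".
observed = np.array([
    [3, 60],
    [21, 59],
    [10, 17],
])

# Returns the test statistic, p-value, degrees of freedom and expected counts.
chi2_value, p_value, dof, expected = stats.chi2_contingency(observed)

# Critical value of the chi-square distribution at the 0.05 significance level.
critical_value = stats.chi2.ppf(1 - 0.05, dof)

# Reject the null hypothesis when the statistic exceeds the critical value.
reject_null = chi2_value > critical_value

print(f"chi-square = {chi2_value:.4f}, dof = {dof}, p-value = {p_value:.4f}")
print(f"critical value = {critical_value:.2f}, reject null: {reject_null}")
```

Because the table is larger than 2×2, `chi2_contingency` applies no continuity correction here, so the statistic is the plain Pearson chi-square value.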
Results
We combined our insights from our inferential statistics, Natural Language Processing and statistical modeling.
A p-value of 0.0003 is below our significance level of 0.05, so we reject our null hypothesis in favor of the alternative hypothesis, which states that the amount of interaction and the appeal of a tweet are associated. Specifically, most tweets appealing to logic fell below their average of 24 likes.
If we look at emotionally appealing tweets, they had the highest count of the three appeals at the height of the controversy but only a low average interaction. Credibility-appealing tweets were infrequent, and most of the accounts behind them were relatively new at the time, more than half were anonymous, and they had the fewest followers. From this, we can say credibility did not matter as much as the other two appeals.
Even though tweets appealing to logic were associated with low interaction, logic was the only appeal that carried the key arguments relating to “GMA”, “hectare”, and “ITR”. The users who created those tweets generally had more followers, and the number of logic tweets was more consistent over time.
In conclusion,
Now, we have an idea of how misinformation tweets are structured and how they could appeal to the common Twitter user. We must be vigilant when we are presented with multiple pieces of information that could make an argument seem factual or logical, convincing us to believe fallacious statements. A good dose of skepticism and factual verification must always be our default when it comes to new information.
Another interesting insight is the lack of appeal to credibility in misinformation tweets. We must also verify our sources of information and whether they have the credibility to back up their claims. ABS-CBN needed to build its credibility for the people to trust it, yet misinformation can tarnish its reputation without a single ounce of credibility. Before even reading or listening to an argument, verify that the source is credible.
We have realizations about our process that could be improved in future endeavors: