JAZ

Good information may cost a fortune, but bad information can cost a country

Our project is an analysis of a viral issue: mis/disinformation about the ABS-CBN tax evasion case. Our team, JAZ, aims to use our knowledge of data science to gain new insight into this issue and share it with the world.

Overview

While social media can increase freedom of expression and speech, we can also see how it can do the opposite — the spread of misinformation surrounding ABS-CBN’s tax evasion case on Twitter has arguably contributed to the public support for its non-renewal.

This is why we decided to focus on this case: to get to the root of how misinformation on social media can ultimately contribute to stifling the freedom of the press.

Problem

Misinformation regarding ABS-CBN, particularly about its tax evasion case, is prevalent on social media sites like Twitter.

Solution

Our solution is to use data science to gain insights into the spread of misinformation on Twitter and, subsequently, uncover actionable steps to combat it.

Background

On May 5, 2020, ABS-CBN, one of the most prominent Philippine media networks, was shut down despite Congress filing 11 bills to renew the franchise since 2014.

The renewal of the franchise was opposed by former president Rodrigo Duterte for many reasons. One of his main allegations against the media network was that they had been “cheating [the] government by the billions [of pesos] in taxes”, despite BIR clearing ABS-CBN of any tax delinquencies. While the NTC said that ABS-CBN would be allowed to continue operating after May 4, they issued a cease and desist letter against ABS-CBN after being pressured by the Solicitor General. This would prevent ABS-CBN from airing on TV and radio stations.

This led us to ask,

Based on the 3 rhetorical appeals (logic, emotion, credibility), how do tweets containing misinformation possibly influence public opinion to oppose the franchise renewal of ABS-CBN?

Null Hypothesis

All tweets are equally likely to gain interaction, whether they appeal to emotion, credibility, or logic.

Alternative Hypothesis

Tweets that use a specific type of rhetorical appeal gain more interaction.

Action Plan

Analyze the content of tweets that posted mis/disinformation about ABS-CBN’s tax payments.

Data Collection

We mined the internet for fake news data on ABS-CBN Tax Evasion using the following

Analyze the appeals of the tweets containing disinformation about the ABS-CBN tax evasion case.

View our dataset here

Key Words

Tools

Methods

Let's talk about our data science methodology.

We performed inferential statistics to learn more about our data.

View our data exploration here

Preprocessing

Data is never clean when we first collect it. There are always missing values or incorrect formats. An important first step in our data exploration is handling missing data and ensuring formatting consistency, followed by encoding categorical data and standardizing numerical data in preparation for our statistical analysis.

Handling Missing values

When we checked for columns with missing values, the columns 'Account bio', 'Tweet Translated', 'Screenshot', 'Remarks', 'Reviewer' and 'Review' all had missing values, but we deemed them unnecessary for our data exploration, so we left them as is.

The column 'Views' is also disregarded since all the tweets were posted before Dec. 22, 2022, when the Views feature was added to Twitter.
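The missing-value check described above can be sketched as follows. This is a minimal illustration, not our actual notebook code: the rows are hypothetical, the helper name `count_missing` is an assumption, and only the column names come from our dataset.

```python
# Hypothetical rows; only the column names come from the write-up.
rows = [
    {"Account bio": "", "Likes": "12", "Remarks": None},
    {"Account bio": "News fan", "Likes": "3", "Remarks": ""},
    {"Account bio": None, "Likes": "0", "Remarks": "checked"},
]

def count_missing(records):
    """Return {column: number of rows where the value is None or blank}."""
    counts = {}
    for record in records:
        for column, value in record.items():
            is_missing = value is None or str(value).strip() == ""
            counts[column] = counts.get(column, 0) + int(is_missing)
    return counts

missing = count_missing(rows)
# Columns with a non-zero count are the ones to review before analysis.
columns_with_gaps = [name for name, n in missing.items() if n > 0]
```

Columns that turn up in `columns_with_gaps` but are not used in the analysis can simply be left as is, which is the choice we made.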

Ensuring Formatting Consistency

To ensure formatting consistency, we formatted all of the data according to the instructions while collecting it with our Python algorithm.

Categorical Data Encoding

The columns 'Account type', 'Tweet Type', 'Content type', 'Rating', and 'Appeal' have qualitative data which can be encoded using numerical values.

For each column, we encoded the categories as integers from 1 up to the number of categories in that column.
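A minimal sketch of this kind of encoding is shown below. The function name `encode_categories` and the order in which codes are assigned (first appearance) are assumptions for illustration. The three appeal labels are the categories of the 'Appeal' column.

```python
def encode_categories(values):
    """Map each distinct category to an integer code from 1 to n.

    Codes are assigned in order of first appearance (an assumption;
    any fixed order would work equally well).
    """
    codes = {}
    for value in values:
        if value not in codes:
            codes[value] = len(codes) + 1
    return codes, [codes[value] for value in values]

# Hypothetical sample of the 'Appeal' column.
appeals = ["Logic", "Emotion", "Credibility", "Emotion", "Logic"]
mapping, encoded = encode_categories(appeals)
```

The codes are only labels: 3 is not "more" than 1, so they should be treated as nominal categories rather than as quantities.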

Standardization

We used the mean and z-scores to standardize some of the columns, namely 'Following', 'Followers', 'Likes', 'Replies', 'Retweets', and 'Quote Tweets', to get an overview of the average interaction of a tweet containing mis/disinformation about the ABS-CBN tax evasion case.
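As a rough sketch of the standardization, each value is converted to a z-score, z = (x - mean) / standard deviation. The example below uses hypothetical like counts and assumes the population standard deviation. The write-up does not specify which variant was used.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize values to z = (x - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so every z-score is zero.
        return [0.0 for _ in values]
    return [(value - mu) / sigma for value in values]

# Hypothetical 'Likes' values, not taken from our dataset.
likes = [2, 4, 4, 4, 5, 5, 7, 9]
scores = z_scores(likes)
```

A tweet with a large positive z-score received far more interaction than the average tweet in the sample.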

Dealing With Outliers

Outliers show that some of the data gathered exceed average expectations. This could skew the distributions when standardizing the data and may cause abnormalities in the modeling stage.

However, the outliers also carry meaning about the interactions that tweets containing mis/disinformation receive. Tweets with high interaction are few and far between, and users with large followings are rare.

Hence, the outliers are left as is.

Natural Language Processing

Processing the contents of the tweets in Filipino proved to be difficult. We needed to translate the tweets into English and manually check the translations before we could process them for stemming and lemmatization.

Cleaned the Filipino Tweets

Because the majority of the tweets were in Filipino, they had to be manually cleaned so that the translation would be more accurate. Here are some key parts of the manual cleaning:
- Changing Filipino slang to improve accuracy (changing shorthand Filipino to the formal word)
- Normalizing specific non-English terms (words referring to ABS-CBN became “abscbn”)
- Normalizing groups of words in a special category (variants such as haha, hahaha, and HAHAHAH became “laughter”)
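Our cleaning was done by hand, but the same rules could be approximated in code. The sketch below is only an illustration under stated assumptions: the regular expressions, the function name `normalize_tweet`, and the two shorthand entries in `SLANG` are hypothetical examples, not the actual list we applied.

```python
import re

# Illustrative shorthand-to-formal mapping (assumed entries).
SLANG = {"lng": "lang", "nmn": "naman"}

# Matches haha, hahaha, HAHAHAH and similar variants.
LAUGHTER = re.compile(r"\b(?:ha){2,}h?\b", re.IGNORECASE)

# Matches ABS-CBN, ABS CBN, abscbn and similar variants.
NETWORK = re.compile(r"\babs[\s\-]?cbn\b", re.IGNORECASE)

def normalize_tweet(text):
    """Apply the three cleaning rules to a single tweet."""
    text = LAUGHTER.sub("laughter", text)
    text = NETWORK.sub("abscbn", text)
    words = [SLANG.get(word.lower(), word) for word in text.split()]
    return " ".join(words)

cleaned = normalize_tweet("HAHAHAH ang ABS-CBN nmn")
```

Rules like these only catch the variants they were written for, which is why a manual pass was still needed.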

Translated the Tweets

GoogleTrans was used to translate the cleaned raw tweets from Filipino to English. The translation was further cleaned by:
- Lowercasing the translated tweets
- Removing punctuation marks
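The two post-translation cleaning steps can be sketched with the Python standard library. The function name `clean_translation` and the sample sentence are hypothetical, and the translation call itself is left out.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def clean_translation(text):
    """Lowercase a translated tweet and remove punctuation marks."""
    lowered = text.lower()
    stripped = lowered.translate(_PUNCT_TABLE)
    # Collapse any extra whitespace left behind by removed characters.
    return " ".join(stripped.split())

cleaned = clean_translation("ABS-CBN did NOT pay, right?!")
```

Note that `string.punctuation` covers ASCII punctuation only, so curly quotes and other non-ASCII marks would need to be handled separately.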

Manually Corrected the Translation

The Google Sheet was updated to include the translated tweets. The translations were then checked manually to improve contextual accuracy, fix erroneous spacing, and catch mistakes. After updating, the new data was pulled for use in the analysis.

Visualization

Words alone would not do our findings justice. After all our hard work, we've collated our newfound knowledge into graphs and plots to make it easily digestible.

Average interaction (replies & retweets, vs likes) of the three appeals

This first bar graph displays an average of 24 likes on logically appealing tweets, which is evidently higher than the averages for tweets that appeal to emotion and credibility. Overall, tweets appealing to logic have higher average interaction than the other appeals.

Frequency of Tweets about ABS CBN Tax Evasion

This graph shows that tweets discussing the ABS-CBN tax evasion case peaked in January 2022 and then slowly declined. However, the discussion resurfaced around December 2022.

Follower Distribution of Accounts

Follower count can change over time for various reasons, which may make it an unreliable measure of credibility in the long run. However, we still regard follower count as one of the measures of a user’s credibility on Twitter.

The bar graph shows the number of followers of the Twitter users in our sample at the time we collected the data. On the far left, more than 60% of the sample have fewer than 500 followers. Fewer of the Twitter users who propagated misinformation about ABS-CBN had a high follower count.

NLP Word Frequency Visualization

From the histogram, there are some interesting insights that we can gather. While the four most common words aren’t surprising, the word “billion” is predominantly used by disinformation tweets that appeal to logic and credibility.

Additionally, only tweets that appealed to logic used “GMA”, “hectare”, and “ITR”, which are key points often brought up to argue that the ABS-CBN tax evasion case is valid.
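The per-appeal word counts behind a chart like this can be sketched with `collections.Counter`. Everything in the example is hypothetical: the sample texts are made-up stand-ins for cleaned tweets, and the helper names are assumptions.

```python
from collections import Counter

# Made-up (appeal, cleaned text) pairs for illustration only.
samples = [
    ("Logic", "abscbn owes billion unlike gma"),
    ("Logic", "check itr and hectare"),
    ("Emotion", "laughter abscbn deserves it"),
    ("Credibility", "lawyer says abscbn owes billion"),
]

def word_counts_by_appeal(pairs):
    """Count word occurrences separately for each appeal."""
    counts = {}
    for appeal, text in pairs:
        counts.setdefault(appeal, Counter()).update(text.split())
    return counts

def exclusive_words(counts, appeal):
    """Return the words used by one appeal and by no other appeal."""
    others = set()
    for name, counter in counts.items():
        if name != appeal:
            others.update(counter)
    return sorted(set(counts[appeal]) - others)

counts = word_counts_by_appeal(samples)
logic_only = exclusive_words(counts, "Logic")
```

Comparing the exclusive words of each appeal is one way to surface terms that only a single appeal relies on.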

Account Date Joined vs Tweet Date Posted

This heatmap shows the relationship between when a tweet was posted and when the user joined Twitter. This gives further insight into the behavior of the “trolls” who made disinformation posts about the ABS-CBN case.

Visually, we can observe that around half of the users who made disinformation tweets were relatively new users whose accounts were made from 2020 onwards.

Testing

After getting a clearer picture of our data, we used the Chi-Square Test of Independence to determine whether there is a relationship between a tweet's appeal and whether its interaction is higher or lower than average.

But first things first, there are two assumptions for this test:

1. All expected counts are at least 5.
2. Individual observations are independent, and the population is at least 10 times as large as the sample (10n < N).

We assume that assumption 2 holds, and the table below shows that our dataset satisfies assumption 1, since the lowest expected count is 5.4.

Appeal	Interaction	Observed	Expected
Logic	Higher than Ave.	3	12.6
Logic	Lower than Ave.	60	50.4
Emotion	Higher than Ave.	21	16
Emotion	Lower than Ave.	59	64
Credibility	Higher than Ave.	10	5.4
Credibility	Lower than Ave.	17	21.6

Observed and Expected Values

Using stats from scipy, we got the following results from the test:

Statistic	Value
Significance Level	0.05
Degrees of Freedom	2
Critical Value	5.99
Chi-Square Value	15.9941
P-value	0.0003

Chi Square Results
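The figures in both tables can be reproduced by hand from the observed counts. The sketch below recomputes them with the standard library only instead of calling scipy, so it is an illustration of the calculation rather than the exact code we ran. It relies on the fact that, for exactly 2 degrees of freedom, the chi-square survival function is exp(-x / 2). The function name is an assumption.

```python
import math

# Observed counts from the table: (higher than average, lower than average).
observed = {
    "Logic": (3, 60),
    "Emotion": (21, 59),
    "Credibility": (10, 17),
}

def chi_square_independence(table):
    """Return (statistic, degrees of freedom, expected counts) for a table."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)
    # Expected count = row total * column total / grand total.
    expected = [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(rows, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected

statistic, dof, expected = chi_square_independence(observed)

# The closed forms below are valid only for 2 degrees of freedom.
assert dof == 2
p_value = math.exp(-statistic / 2)
critical_value = -2 * math.log(0.05)  # threshold at the 0.05 significance level

reject_null = statistic > critical_value
```

A call such as `scipy.stats.chi2_contingency` on the same 3x2 table should agree with these values. Because the statistic is well above the critical value, the null hypothesis is rejected.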

Results

Here's what we found out about the data we analyzed.

We combined our insights from our inferential statistics, Natural Language Processing and statistical modeling.

Results 1

A p-value of 0.0003 means we reject our null hypothesis in favor of the alternative hypothesis, which states that the amount of interaction and the appeal of a tweet are associated. Specifically, most tweets appealing to logic (60 of 63) had lower-than-average interaction, despite the appeal's average of 24 likes.

Results 2

If we look at emotionally appealing tweets, they had the highest count of all the appeals at the height of the controversy, but only a low average interaction. Tweets appealing to credibility were infrequent. Combined with the fact that most of the accounts were relatively new at the time, more than half of them were anonymous, and they had the fewest followers, we can say that credibility did not matter as much as the other two appeals.

Results 3

Even though tweets appealing to logic were associated with low interaction, logic was the only appeal that carried the key arguments relating to “GMA”, “hectare”, and “ITR”. The users who created those tweets generally had more followers, and the number of logic tweets was more consistent over time.

In conclusion,

Tweets appealing to logic can be said to be a major factor in propagating the misinformation regarding ABS-CBN’s tax evasion by having a consistent narrative of comparing it to GMA as well as providing additional misinformation regarding the land it occupied.

Implications

We now have an idea of how misinformation tweets are structured and how they could appeal to the common Twitter user. We must be vigilant when we are presented with multiple pieces of information that make an argument seem factual or logical, convincing us to believe fallacious statements. A good dose of skepticism and fact-checking must always be our default when it comes to new information.

Another interesting insight is the lack of appeal to credibility in misinformation tweets. We must also verify our sources of information and whether they have the credibility to back up their claims. ABS-CBN needed to build its credibility for people to trust it, yet misinformation can tarnish its reputation without a single ounce of credibility. Before even reading or listening to an argument, verify that the source is credible.

Future Recommendations

We have some realizations that could improve future endeavors:

- We suggest the development of Natural Language Processing steps designed for Filipino texts. The meaning of the tweets could be lost when we analyze our data in English instead of Filipino, so additional insights or definitive conclusions could be missed.
- We encourage the use of Filipino in presenting future data science projects to include ordinary Filipinos in our target audience. Raising awareness about misinformation and sharing our findings are most beneficial when done in our national language.
- In continuing this topic, we suggest the creation of a structured classification for appeals to logic, emotion, and credibility to better explain how the tweets were categorized.
