Bias Investigation in DBpedia - addendum

This addendum looks into additional findings, if any, related to the first two data stories.

The first data story can be found here and the second one can be found here.

The queries in this story check for outliers and try to add value to the findings of the previous stories.


Checking for religious, nationality and ethnicity biases

The first part of the investigation looked for gender bias. Those results were interesting, so we now also look at other likely sources of bias, narrowed down to religion, nationality and ethnicity. The goal is to check for under-representation, misrepresentation, or plain data quality issues.

Religious Bias

First we look at religious bias. In the results above, counting religion values across both the dbo and dbp properties, Roman Catholics have the highest count, followed by Islam. To understand this data better, we narrow the search to just "Catholics". This filter returns many Catholic churches around the world, as well as values labelled as different types of Catholicism, such as Anglican and Lapsed. This may be due to differences in the liturgy and traditions of most of the listed results. However, the sharp drop in result counts indicates a lack of data, as seen below.
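The religion counts above combine the dbo and dbp namespaces. A minimal sketch of such a counting query, written as a Python helper that only builds the SPARQL text, is shown below. It assumes the property has the same local name in both namespaces (religion here), matches the two with a SPARQL 1.1 alternative path, and narrows the values with a case-insensitive substring filter such as "catholic". The helper name and the LIMIT default are illustrative; this is not the exact query used in the story.

```python
PREFIXES = (
    "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
    "PREFIX dbp: <http://dbpedia.org/property/>\n"
)


def attribute_count_query(attribute, keyword=None, limit=20):
    """Count distinct subjects per value of dbo:<attribute> or dbp:<attribute>.

    attribute: local property name shared by both namespaces, e.g. "religion".
    keyword: optional case-insensitive substring that values must contain.
    """
    # Only plain names are spliced into the query text.
    if not attribute.isidentifier():
        raise ValueError(f"unexpected property name: {attribute!r}")
    keyword_filter = ""
    if keyword:
        # Escape backslashes and quotes so the keyword stays a valid string literal.
        escaped = keyword.lower().replace("\\", "\\\\").replace('"', '\\"')
        keyword_filter = f'  FILTER(CONTAINS(LCASE(STR(?value)), "{escaped}"))\n'
    return (
        PREFIXES
        + "SELECT ?value (COUNT(DISTINCT ?subject) AS ?count)\n"
        + "WHERE {\n"
        # Alternative path: the value may sit under either namespace.
        + f"  ?subject dbo:{attribute}|dbp:{attribute} ?value .\n"
        + keyword_filter
        + "}\n"
        + "GROUP BY ?value\n"
        + "ORDER BY DESC(?count)\n"
        + f"LIMIT {int(limit)}\n"
    )


print(attribute_count_query("religion", keyword="Catholic"))
```

The printed text can be pasted into the public endpoint at https://dbpedia.org/sparql. Because GROUP BY compares values exactly, a resource IRI and a plain literal for the same religion are counted as separate rows, which is one way the data quality issues described here show up.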

Ethnicity Bias

As with religion, we look into the ethnicity of individuals and see that Mexicans have the highest count, followed by Bengali people. Looking into the results for Mexicans, we again see the relation being misused: the same value appears as several duplicates that differ only in letter case. This is the same kind of data quality issue seen in most of our findings.
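Case-only duplicates can be folded together before comparing counts. The sketch below assumes the ethnicity results have already been fetched as (label, count) pairs. It groups labels that differ only in letter case or spacing, sums their counts, and keeps the most frequent spelling as the display label. The function name is hypothetical.

```python
from collections import defaultdict


def merge_case_variants(rows):
    """Merge (label, count) pairs whose labels differ only in letter case or spacing.

    Returns (display_label, total_count, variants) tuples, highest total first.
    """
    totals = defaultdict(int)
    spellings = defaultdict(lambda: defaultdict(int))
    for label, count in rows:
        # Normalise internal whitespace and fold case to build the grouping key.
        key = " ".join(label.split()).casefold()
        totals[key] += count
        spellings[key][label] += count
    merged = []
    for key, total in totals.items():
        # Show the spelling that carries the most results.
        display = max(spellings[key], key=spellings[key].get)
        merged.append((display, total, sorted(spellings[key])))
    # Highest total first; ties broken alphabetically.
    merged.sort(key=lambda item: (-item[1], item[0]))
    return merged
```

Keeping the list of variants alongside each merged total makes it easy to see which spellings were collapsed, rather than hiding the data quality issue.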

Nationality Bias

Finally, we look into nationality bias. Here the data does seem biased, as the gap between the first result and the rest is large: the United States of America has by far the largest count. A closer look at results containing "state" turns up duplicates of the same nationality that are needlessly tagged in different ways, which suggests that this, too, may be classified as a data quality issue.
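Unlike case duplicates, differently tagged nationalities (a DBpedia resource IRI in one place, a literal such as "United States of America" or "USA" in another) need an explicit mapping. The sketch below assumes (value, count) pairs as input and uses a small hand-written alias table. The table is hypothetical and illustrative only; it would have to be extended with the variants actually found in the results.

```python
from collections import Counter
from urllib.parse import unquote

DBR = "http://dbpedia.org/resource/"

# Hypothetical alias table; keys are lower-cased, space-normalised labels.
ALIASES = {
    "united states": "United States",
    "united states of america": "United States",
    "usa": "United States",
    "u.s.": "United States",
}


def canonical_nationality(value):
    """Map a resource IRI or literal label to one canonical nationality name."""
    if value.startswith(DBR):
        # Turn e.g. .../United_States into "United States".
        value = unquote(value[len(DBR):]).replace("_", " ")
    key = " ".join(value.split()).casefold()
    return ALIASES.get(key, " ".join(value.split()))


def count_nationalities(rows):
    """Sum (value, count) pairs per canonical nationality, largest first."""
    totals = Counter()
    for value, count in rows:
        totals[canonical_nationality(value)] += count
    return totals.most_common()
```

Re-running the comparison on the merged totals shows how much of the gap remains once the duplicates are collapsed.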


Types of creators in dbo:creator

The idea for this part of the addendum comes from the second part of the investigation, where we saw that more male creators created female characters than female creators created male characters.

dbo:creator links a creation to its creator. These creations are not limited to characters; they also include TV series and shows. For our purposes, we therefore clean the data so that the results list only characters and not TV shows.
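One way to restrict dbo:creator results to characters is to require the creation to be typed as dbo:FictionalCharacter, which excludes TV series and shows. The sketch below builds such a query as a Python string. It assumes character gender is stored in foaf:gender (the property behind the counts is not stated here), keeps the gender optional so ungendered characters are still returned, and pages through results with LIMIT/OFFSET, since a public endpoint may cap how many rows one response contains. Characters that lack the dbo:FictionalCharacter type would be missed by this filter.

```python
PREFIXES = (
    "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
    "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
)


def character_rows_query(limit=10000, offset=0):
    """Build one page of (creator, character, gender) rows, restricted to fictional characters."""
    if limit <= 0 or offset < 0:
        raise ValueError("limit must be positive and offset non-negative")
    return (
        PREFIXES
        + "SELECT DISTINCT ?creator ?character ?gender\n"
        + "WHERE {\n"
        + "  ?character dbo:creator ?creator ;\n"
        + "             a dbo:FictionalCharacter .\n"
        # Optional, so characters without any gender value are still returned.
        + "  OPTIONAL { ?character foaf:gender ?gender }\n"
        + "}\n"
        # A fixed order keeps LIMIT/OFFSET pages consistent between requests.
        + "ORDER BY ?creator ?character ?gender\n"
        + f"LIMIT {int(limit)} OFFSET {int(offset)}\n"
    )


print(character_rows_query(offset=10000))
```

Filtering on the type inside the query, rather than afterwards, keeps TV shows out of every later count.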

The query above shows that, when sorted by male count, the top five results are all male creators. When sorted by female count, the only exception is Shonda Rhimes, who appears third. This is an interesting start, as the discrepancy between male and female creation counts is clear. Comic book writer Chris Claremont is an interesting case: he appears to have created more female than male characters, whereas the other male creators usually have more male characters or roughly equal numbers of both.
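Given rows of (creator, character, gender), the per-creator male and female counts and both sort orders can be reproduced offline. A minimal sketch, assuming gender values are the strings "male" and "female" (compared case-insensitively) and that characters without a gender are skipped rather than counted; the function name is hypothetical:

```python
from collections import defaultdict


def creator_gender_counts(rows, sort_by="male"):
    """Count distinct male and female characters per creator.

    rows: iterable of (creator, character, gender); gender may be None.
    Returns (creator, male_count, female_count) tuples sorted by sort_by, descending.
    """
    if sort_by not in ("male", "female"):
        raise ValueError("sort_by must be 'male' or 'female'")
    chars = defaultdict(lambda: {"male": set(), "female": set()})
    for creator, character, gender in rows:
        g = (gender or "").strip().casefold()
        if g in ("male", "female"):
            # Sets avoid double-counting a character returned in several rows.
            chars[creator][g].add(character)
    table = [(c, len(s["male"]), len(s["female"])) for c, s in chars.items()]
    col = 1 if sort_by == "male" else 2
    # Highest count first; ties broken alphabetically for a stable listing.
    table.sort(key=lambda row: (-row[col], row[0]))
    return table
```

Note that a creator whose characters all lack a gender value does not appear in the table at all, which is exactly how ungendered characters drop out of such counts.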

When we narrow the search further, listing only male creators and then only female creators, we get the results shown below.
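Narrowing to male or female creators needs each creator's own gender. The sketch below assumes a hypothetical mapping from creator to "male" or "female" (for instance taken from the creator's own gender property) and a table of (creator, male_count, female_count) rows. Creators missing from the mapping go into an "unknown" group so they do not silently disappear from the comparison.

```python
def split_by_creator_gender(table, creator_gender):
    """Partition (creator, male_count, female_count) rows by the creator's own gender."""
    groups = {"male": [], "female": [], "unknown": []}
    for creator, male, female in table:
        g = (creator_gender.get(creator) or "").casefold()
        # Anything other than "male"/"female" is kept visible as "unknown".
        groups[g if g in ("male", "female") else "unknown"].append((creator, male, female))
    return groups


def group_totals(rows):
    """Sum male and female character counts over one group of creators."""
    return sum(r[1] for r in rows), sum(r[2] for r in rows)
```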

The query above shows a large gap between male and female counts, with most male creators mainly creating male characters. The only major exception so far is Chris Claremont, as mentioned above. By comparison, comic book creators Stan Lee and Jack Kirby have 76 and 64 male creations respectively, and fewer than half as many female creations. For female creators, on the other hand, there seems to be no clear skew, as seen below.

On inspecting the DBpedia pages themselves, we found cases where characters were not assigned any gender at all. A prominent example is Hermione Granger, a secondary protagonist in J. K. Rowling's hit Harry Potter novel series. Although J. K. Rowling is listed as a creator, Hermione Granger does not appear in the female count, even though she is also listed in dbo:creation for J. K. Rowling.
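Characters like this can be listed directly by asking for creations of a given creator that have no gender value at all. The sketch below builds such a query with FILTER NOT EXISTS. Both the gender property (foaf:gender by default) and the creator IRI in the usage line are assumptions; if the counts were built from a different property, such as dbp:gender, pass that instead.

```python
PREFIXES = (
    "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
    "PREFIX dbp: <http://dbpedia.org/property/>\n"
    "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
)


def ungendered_characters_query(creator_iri, gender_property="foaf:gender"):
    """Build a query for creations of creator_iri that have no gender_property value."""
    # Reject anything that could break out of the <...> IRI syntax.
    if not creator_iri.startswith("http://dbpedia.org/resource/") or any(
        ch in creator_iri for ch in '<>" '
    ):
        raise ValueError(f"not a usable DBpedia resource IRI: {creator_iri!r}")
    return (
        PREFIXES
        + "SELECT DISTINCT ?character\n"
        + "WHERE {\n"
        + f"  ?character dbo:creator <{creator_iri}> .\n"
        + f"  FILTER NOT EXISTS {{ ?character {gender_property} ?gender }}\n"
        + "}\n"
        + "ORDER BY ?character\n"
    )


# Assumed resource IRI for J. K. Rowling.
print(ungendered_characters_query("http://dbpedia.org/resource/J._K._Rowling"))
```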