Notability – Who and What Deserves To Be on Wikipedia?

Description

To be considered worthy of a Wikipedia article, a subject has to be ‘notable’. The concept of notability is an important foundation of Wikipedia, which, unlike traditional print encyclopedias, does not have an editor who makes decisions about the inclusion or exclusion of content. Without a notability requirement, Wikipedia might see a lot of self-promoting content. Unfortunately, the concept can combine with existing biases in society to exclude topics and biographies that arguably should be included.

“[Clarice] Phelps purified the berkelium-249 that was used in the discovery and identification of Tennessine (element 117), named after the location of the lab where she works. But Phelps was not named in the official announcement and was not profiled by international newspapers. Without these crucial pieces of recognition, her biography was quickly deemed not appropriate for Wikipedia.” (Maryam Zaringhalam and Jess Wade in the Washington Post, April 12, 2019)

Phelps is a woman and African American. She does have a Wikipedia entry today, thanks to the efforts of people like Zaringhalam and Wade. However, for each story that ends well (if we can call it that), there will be many that do not. Using several metrics, Wagner et al. (2016) show that the threshold for women to be considered notable by the Wikipedia editorial community is higher than that for men.

The aim of this project is to study the way that the concept of notability operates in the Wikipedia community. How is it operationalised in the Wikipedia guidelines? How do people use the guidelines in discussing the notability of a subject, e.g., in the talk pages or in edit comments? Are there common patterns that can be identified in data from Wikipedia? Do notability guidelines get applied to people of different demographics in different ways? How are decisions about notability countered?

Supervisors

Dr Alexander Voss, Abd Alsattar Ardati

Artefact(s)

This project can use either qualitative or quantitative methods. The former might be some form of virtual ethnography (Hine 2015). The output would be a dataset annotated using a qualitative data analysis tool such as NVivo. This work does not assume knowledge from any CS modules, though some experience with qualitative methods would be ideal, and doing this work convincingly will require good analytical and writing skills.
A quantitative analysis would involve writing code to extract a subset of data from a Wikipedia data dump followed by the construction of a reproducible analysis. This could take the form of a Jupyter notebook, for example. This requires familiarity with a suitable programming environment such as Python as well as basic quantitative data analysis skills. It might therefore best suit someone enrolled in the Data-Intensive Analysis MSc.
A mixed methods approach is also possible but, given the time constraints, it would probably work better as a group project, with one student focusing on the qualitative analysis and the other on the quantitative.
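As a rough illustration of the extraction step in the quantitative option, the sketch below streams a MediaWiki XML export and lists talk pages whose text mentions a keyword. It is only a starting point under stated assumptions: the input is assumed to follow the MediaWiki XML export layout (page, title, ns and revision/text elements), talk pages are assumed to sit in namespace 1, and a plain keyword match is a crude stand-in for identifying notability discussions. The function names and the sample data are hypothetical and exist only for this example.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, namespace, text) for each <page> element.

    `source` is a filename or a binary file object. Pages are read one
    at a time rather than loading the whole document first.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, ns, text = "", None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text or ""
            elif name == "ns":
                ns = int(child.text)
            elif name == "text":
                text = child.text or ""  # if several revisions, the last one wins
        yield title, ns, text
        elem.clear()  # drop this page's children to keep memory use low


def talk_pages_mentioning(source, keyword="notability"):
    """Return titles of talk pages (assumed namespace 1) containing `keyword`."""
    needle = keyword.lower()
    return [
        title
        for title, ns, text in iter_pages(source)
        if ns == 1 and needle in text.lower()
    ]


# Tiny made-up sample in the assumed export layout (not real Wikipedia data).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Example Person</title><ns>0</ns>
    <revision><text>Article text.</text></revision></page>
  <page><title>Talk:Example Person</title><ns>1</ns>
    <revision><text>Does this meet the Notability guideline?</text></revision></page>
  <page><title>Talk:Other</title><ns>1</ns>
    <revision><text>Formatting question only.</text></revision></page>
</mediawiki>"""

print(talk_pages_mentioning(io.BytesIO(SAMPLE)))  # ['Talk:Example Person']
```

Dumps are typically distributed compressed; for a bz2 file, the file object returned by Python's `bz2.open(path, "rb")` could be passed as `source` instead of the in-memory sample. The resulting list of titles would then be the starting point for a reproducible analysis, for example in a Jupyter notebook.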

Background

Hine, C. (2015). Ethnography for the Internet. London: Taylor & Francis Group.

Wagner, C., Graells-Garrido, E., Garcia, D., & Menczer, F. (2016). Women through the glass ceiling: Gender asymmetries in Wikipedia. EPJ Data Science, 5(1), 1-24.
