With gender equality being raised in a wide variety of domains, a data-backed analysis can help our society understand its current position and the steps it needs to take to improve. This project aims to analyse the status of women through the lens of the media.
Using the Quotebank dataset, a corpus of English quotations drawn from a decade of news, the project provides insights into how gender is represented. The data in this project covers quotes published between 2015 and 2020. The name of the site at the origin of each quote was extracted from the URL of the article, which is provided in the original dataset. Based on this list, which uses Google PageRank and other independent web metrics for various search engines (more about the ranking method here), 116 sites were selected for their web ranking scores. Only quotes whose source was on this list were kept for the study. This filtering reduced the media sources to the best-known and most common journals and sites.
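
The site-name extraction and source filtering described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names `site_name` and `keep_quote`, the `urls` key on each quote record, and the three-entry `selected_sites` set are all assumptions made for the example (the real selection contains 116 sites).

```python
from urllib.parse import urlparse


def site_name(url):
    """Return a rough site name for a URL (hypothetical helper).

    Takes the second-to-last label of the host name, so
    'https://www.nytimes.com/...' gives 'nytimes'.
    """
    # urlparse only fills in the host when a scheme or leading '//' is present.
    if "://" not in url:
        url = "//" + url
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    return labels[-2] if len(labels) >= 2 else host


def keep_quote(quote, selected_sites):
    """Keep a quote if at least one of its article URLs is from a selected site.

    Assumes each quote is a dict with a 'urls' list (an assumption of this sketch).
    """
    return any(site_name(u) in selected_sites for u in quote.get("urls", []))


# Placeholder selection for illustration only; the real list has 116 sites.
selected_sites = {"nytimes", "cnn", "reuters"}

quotes = [
    {"quotation": "Example A", "urls": ["https://www.nytimes.com/2019/01/01/a.html"]},
    {"quotation": "Example B", "urls": ["http://some-small-blog.net/post/1"]},
    {"quotation": "Example C", "urls": ["http://edition.cnn.com/2016/c"]},
]

filtered = [q for q in quotes if keep_quote(q, selected_sites)]
```

Taking the second-to-last host label is a simplification: it breaks for multi-part suffixes such as `.co.uk`, where a public-suffix-aware parser would be needed.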