Analysis of Wikipedia Coverage in Spanish-Language Media between 2013 to 2023

Authors

Juan-José Boté-Vericad

DOI:

https://doi.org/10.26441/RC24.1-2025-3726

Keywords:

Encyclopaedias, Press and Spanish Language, News Coverage, Communication Theory, Data Analysis, Trend Analysis, Content Analysis, Virtual Communities, Gender Equality, Lexical Analysis

Abstract

This article analyses how Wikipedia is covered in news published by Spanish-language digital media. Framing Theory is used to examine how media outlets present Wikipedia in their article headlines. A total of 652 news articles published between 2013 and 2023 were retrieved from the Factiva database and analysed. The analyses covered the distribution and temporal trends of the news, word frequencies and heatmaps, topic modelling with the Latent Dirichlet Allocation (LDA) algorithm, and word co-occurrence in content and headlines. Natural language processing and machine learning techniques were applied for the topic analysis. The results show that media outlets in Spain have published the most about Wikipedia, with coverage increasing during global events such as the COVID-19 pandemic and the Ukraine conflict. Controversies over the biographies of public figures, particularly politicians, also come to the fore at key moments. Furthermore, the analysis reveals a gender bias: women participate less in Wikipedia editing, and content related to them is deleted more frequently. The study concludes that greater diversity within the editing community should be promoted and that further measures are needed to mitigate biases on the platform.
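The word-frequency and word co-occurrence steps mentioned in the abstract can be illustrated with a minimal sketch. It is not the author's code: the headlines, the stopword list, and the function names (`tokenize`, `word_frequencies`, `cooccurrences`) are all hypothetical and chosen only for illustration, and the sketch uses the Python standard library rather than whatever tooling the study relied on.

```python
from collections import Counter
from itertools import combinations
import re

# Hypothetical headlines invented for illustration; not taken from the study's corpus.
HEADLINES = [
    "Wikipedia borra la biografía de una científica",
    "Polémica por la biografía de un político en Wikipedia",
    "Wikipedia y la pandemia: más ediciones que nunca",
]

# Tiny illustrative stopword list; an assumption, not the list used in the study.
STOPWORDS = {"la", "de", "una", "un", "en", "por", "y", "que", "más"}


def tokenize(text):
    """Lowercase the text, keep runs of letters, and drop stopwords."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [w for w in words if w not in STOPWORDS]


def word_frequencies(docs):
    """Count how often each token appears across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


def cooccurrences(docs):
    """Count unordered pairs of distinct tokens that share a document."""
    pairs = Counter()
    for doc in docs:
        unique = sorted(set(tokenize(doc)))
        pairs.update(combinations(unique, 2))
    return pairs


freq = word_frequencies(HEADLINES)
pairs = cooccurrences(HEADLINES)
print(freq.most_common(3))
print(pairs[("biografía", "wikipedia")])
```

Pairs are stored in alphabetical order, so each co-occurring word pair is counted under a single key regardless of the order in which the words appear in a headline. A table of such pair counts is the kind of input a co-occurrence map could be built from.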

Author Biography

Juan-José Boté-Vericad, Universidad de Barcelona

He holds a PhD in Information and Documentation from the University of Barcelona (Spain) and a PhD in Linguistics and Information Science from the University of Hildesheim (Germany). He is a professor in the Department of Library Science, Documentation and Audiovisual Communication at the University of Barcelona. His research interests include information behaviour and open science. https://orcid.org/0000-0001-9815-6190, juanjo.botev@ub.edu

Published

2025-02-03

How to Cite

Boté-Vericad, J.-J. (2025). Analysis of Wikipedia Coverage in Spanish-Language Media between 2013 to 2023. Revista De Comunicación. https://doi.org/10.26441/RC24.1-2025-3726

Issue

Section

Papers