Abstract
Negative social media usage during the COVID-19 pandemic has highlighted the importance of understanding how misinformation and toxicity spread in public online discussions. In this paper, we propose a novel unsupervised method to discover the structure of online COVID-19-related conversations. Our method trains a nine-state Hidden Markov Model (HMM) initialized from a biclustering of 23 features extracted from online messages. We apply our method to 16,000 conversations (1.5 million messages) that took place on the Facebook pages of 15 Canadian newspapers following COVID-19 news items, and show that it can effectively extract the conversation structure and discover the main themes of the messages. Furthermore, we demonstrate how the PageRank algorithm and the discovered conversation graph can be used to simulate the impact of five different moderation strategies, making it possible to easily develop and test new strategies to limit the spread of harmful messages. Although this paper focuses on the COVID-19 pandemic, the methodology is general enough to be applied to communications during future pandemics and other crises, or to develop better practices for online community moderation in general.
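To make the idea of simulating moderation on a conversation graph concrete, the following is a minimal sketch, not the procedure used in this paper. It assumes the conversation graph can be represented as a row-stochastic state-transition matrix, runs a standard power-iteration PageRank on it, and compares the result with a moderated copy of the matrix. The 3-state toy matrix, the choice of state 2 as the "harmful" state, and the `moderate` function with its `keep` parameter are all hypothetical illustrations; they are not the nine-state model or any of the five strategies studied here.

```python
import numpy as np


def pagerank(transition, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank over a row-stochastic transition matrix."""
    n = transition.shape[0]
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        # Teleport uniformly with probability (1 - damping), else follow an edge.
        new_rank = (1.0 - damping) / n + damping * (rank @ transition)
        if np.abs(new_rank - rank).sum() < tol:
            return new_rank
        rank = new_rank
    return rank


def moderate(transition, state, keep=0.5):
    """Hypothetical moderation: scale transitions into `state` by `keep`
    (0 < keep <= 1), then renormalise each row so it sums to 1 again."""
    moderated = transition.copy()
    moderated[:, state] *= keep
    return moderated / moderated.sum(axis=1, keepdims=True)


# Toy 3-state graph (illustrative values only, not data from the paper).
# State 2 plays the role of a "harmful message" state.
toy_transition = np.array([
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.3, 0.6],
])

baseline = pagerank(toy_transition)
moderated_rank = pagerank(moderate(toy_transition, state=2, keep=0.5))

print("baseline :", np.round(baseline, 3))
print("moderated:", np.round(moderated_rank, 3))
```

Under these assumptions, a lower PageRank score for the harmful state after moderation would indicate that conversations spend less time there. A different strategy could be tested by swapping in another function that edits the transition matrix.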