Abstract
Determining the reliability of online data is a challenge that has recently received increasing attention. In particular, unreliable health-related content has become pervasive during the COVID-19 pandemic. Previous research has approached this problem with standard classification technology using a set of features that have included linguistic and external variables, among others. In this work, we aim to replicate parts of the study conducted by Sondhi and his colleagues using our own code, and make it available to the research community. The performance obtained in this study is as strong as that reported by the original authors. Moreover, their conclusions are also confirmed by our replicability study. We report on the challenges involved in replication, including that it was impossible to replicate the computation of some features (since some tools or services originally used are now outdated or unavailable). Finally, we also report on a generalisation effort made to evaluate our predictive technology over new datasets.