The pervasiveness of abusive content on the internet can lead to severe
psychological and physical harm. Significant effort in Natural Language
Processing (NLP) research has been devoted to addressing this problem through
abusive content detection and related sub-areas, such as the detection of hate
speech, toxicity, and cyberbullying. Although current technologies achieve
high classification performance in research studies, their real-world
application can cause unintended harms, such as the silencing of
under-represented groups. We review a large body of NLP
research on automatic abuse detection with a new focus on ethical challenges,
organized around eight established ethical principles: privacy, accountability,
safety and security, transparency and explainability, fairness and
non-discrimination, human control of technology, professional responsibility,
and promotion of human values. In many cases, these principles relate not only
to situational, context-dependent ethical codes, but also to universal human
rights, such as the right to privacy, freedom from
discrimination, and freedom of expression. We highlight the need to examine the
broad social impacts of this technology, and to bring ethical and human rights
considerations to every stage of the application life-cycle, from task
formulation and dataset design, to model training and evaluation, to
application deployment. Guided by these principles, we identify several
opportunities for rights-respecting, socio-technical solutions to detect and
confront online abuse, including ‘nudging’, ‘quarantining’, value sensitive
design, counter-narratives, style transfer, and AI-driven public education
applications.