Online negativity is difficult to deal with, and Google is attempting to tackle the problem.
There is just one hiccup: its Perspective AI doesn’t seem to be up to the task.
A group of researchers at Aalto University and the University of Padua have discovered Google’s artificial intelligence can easily be tricked.
State-of-the-art hate speech detection models perform well only when tested on the same type of data they were trained on.
Simple tricks to get around Google’s AI include: inserting typos; adding spaces between words; or adding unrelated words to the original sentence.
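To make those three tricks concrete, here is a minimal Python sketch of how such text perturbations might be generated. It is not the researchers' code: the function names, the choice to swap adjacent letters as the "typo", the choice to split words in the middle as the whitespace trick, and the filler word are all assumptions made for illustration.

```python
import random


def insert_typo(text, rng):
    """Swap two adjacent inner letters of one randomly chosen word.

    Assumption: a "typo" here means a single adjacent-letter swap;
    other kinds of typo (dropped or doubled letters) are not covered.
    """
    words = text.split()
    # Only words with at least four characters have two inner letters to swap.
    candidates = [i for i, w in enumerate(words) if len(w) >= 4]
    if not candidates:
        return text
    i = rng.choice(candidates)
    w = words[i]
    j = rng.randrange(1, len(w) - 2)  # keep first and last letters in place
    words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words)


def add_spaces(text):
    """Insert a space in the middle of every word longer than three letters.

    Assumption: this is one possible reading of the whitespace trick.
    """
    out = []
    for w in text.split():
        if len(w) > 3:
            mid = len(w) // 2
            out.append(w[:mid] + " " + w[mid:])
        else:
            out.append(w)
    return " ".join(out)


def append_unrelated(text, filler):
    """Append an unrelated filler word to the end of the sentence."""
    return text + " " + filler


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the typo is reproducible
    sentence = "this comment is terrible"
    print(insert_typo(sentence, rng))
    print(add_spaces(sentence))
    print(append_unrelated(sentence, "kittens"))
```

Each variant stays readable to a person, which is the point of the attack: the message survives while the surface form the model was trained on does not.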
The AI detects hate speech by assigning a toxicity score to a piece of text.
The score reflects how likely the text is to be perceived as rude, disrespectful, or unreasonable enough that someone might leave the conversation.
However, the system struggles to understand the context in which expletives are used.
A simple change between “I love you” and “I fucking love you” sees a change in score from 0.02 to 0.77.
This suggests that ‘toxicity’, as defined by the Perspective AI, does not correspond to hate speech in any legal or substantive sense.
Similarly, typographical errors or ‘leetspeak’ (replacing common letters with numbers and/or symbols to turn ‘freak’ into ‘fr33k’) are also effective at tricking the AI while still retaining the original message’s readability and emotional impact.
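The leetspeak substitution can be sketched in a few lines of Python. The default substitution table and the function name below are assumptions chosen for illustration; there is no single standard mapping, and the ‘fr33k’ example uses its own substitution, which can be passed in explicitly.

```python
# Hypothetical default table: one common set of letter-to-digit swaps.
DEFAULT_SUBSTITUTIONS = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5"}


def to_leetspeak(text, substitutions=None):
    """Replace letters using a substitution table (lowercase letters only)."""
    table = str.maketrans(substitutions or DEFAULT_SUBSTITUTIONS)
    return text.translate(table)


if __name__ == "__main__":
    print(to_leetspeak("freak"))                        # fr34k
    print(to_leetspeak("freak", {"e": "3", "a": "3"}))  # fr33k
```

A classifier that works on exact word or character patterns sees ‘fr34k’ as a different token from ‘freak’, while a human reader recovers the original word with no effort.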
With many social media platforms, such as Facebook, Twitter, and YouTube, struggling to find the boundary between offensive and acceptable speech, an easily applicable artificial intelligence would clearly have its benefits.
Unfortunately, with this news, and the recent examples of artificially intelligent chatbots tweeting racist content, it seems AI will need to improve before it’s let loose on the comments section.