The Data Science of Abuse

Findings suggest that the more men were abused as children, the more they were likely to rape as adults.

Let us talk about the good side of Facebook, Google, Instagram and all the other platforms that collect data in order to monetize it. I want to talk about rape, rapists and what data science can do about it. If you have worked with data long enough, you must have come to two conclusions: nothing happens without data, and once you manage to get data, cleaning and compressing it can lead to wonderful results.

This article by the New York Times, published in 1991, has some interesting and contradictory insights. But I cannot expect more from 1991, since data science did not exist as a discipline and machine learning was more of a taboo than a science; something people did in labs to publish papers. When someone with no data science background looks at the data and patterns associated with a behaviour, he is first overwhelmed by the diversity of patterns and then starts believing it is too difficult to conclude anything. But if you outsource the work to algorithms such as boosted tree ensembles, they can perform surprisingly well. But how do you get the right data? Even crude oil would do, if not refined oil!

What I propose is that government agencies run a data science experiment on all confirmed rapists. Since we are dealing with a variety of patterns and a large number of features, we need to do this across the globe to collect enough data to crack our highly imbalanced dataset problem. There are two approaches to collecting the data.

Offline approach: create a dataset by asking questions about their behaviour and history, which is quite an excruciating task. There can be issues with this: the rapists might not reveal everything because they are ashamed to talk about something like child abuse, their view of their past or of themselves may differ from reality, or they may simply have forgotten parts of their past.

Online approach: collect all the search and browsing history from Google (or other search engines), and collect social networking data, including interests, likes and posts, from Facebook. This approach has gaps too: the person might not have an online presence, or might not like or post anything in order to avoid attention. A lot of rapists look normal in real life and are hiding in plain sight. Some are opportunistic rapists: they don’t plan a rape but commit it on impulse. I suspect Facebook keeps browsing history as well. Chat data is another interesting source to look into.

Now that we have data, we can turn to NLP to find the content rapists spend time on. I feel this is the holy grail for building a psychological map of a rapist. People with a history of child abuse often go through severe depression and are likely to read content about depression more often. Certain rapists also indulge in humiliating women, and hence may be drawn to posts where women are humiliated or put to shame. It has also been reported that men who enjoy or are aroused by videos in which women are abused are potential rapists. To capture this kind of signal, we can collect porn browsing history and Google searches. The NYT article noted, “Rapists often recall being intensely angry, depressed or feeling worthless for days or even months leading up to the rape.” NLP can help us find people who show these symptoms: depression, feelings of worthlessness and anger towards women. Since many features can be highly correlated, we can use a dimensionality reduction algorithm to reduce their number. We can also turn to a Bayesian belief network to uncover underlying feature interactions.
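To make the dimensionality reduction step concrete, here is a minimal sketch of principal component analysis (PCA) on purely synthetic data. PCA is only one possible choice; the article does not name a specific algorithm. The three features, the noise levels and the helper names (`center`, `covariance`, `top_component`) are all assumptions made for illustration. The top component is found by power iteration so the example needs nothing beyond the standard library.

```python
import math
import random


def center(rows):
    """Subtract the per-feature mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def top_component(cov, iterations=200):
    """Power iteration: returns the leading eigenvector and its eigenvalue."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)
    )
    return v, eigenvalue


# Synthetic data (an assumption): three features driven by one hidden factor,
# so they are highly correlated with each other.
random.seed(0)
base = [random.gauss(0, 1) for _ in range(200)]
rows = [[b, 2 * b + random.gauss(0, 0.1), -b + random.gauss(0, 0.1)] for b in base]

centered = center(rows)
cov = covariance(centered)
component, eigenvalue = top_component(cov)

# Share of total variance captured by the single retained component.
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained = eigenvalue / total_variance

# One score per row replaces the three original features.
scores = [sum(r[j] * component[j] for j in range(len(component))) for r in centered]
print(f"explained variance ratio of first component: {explained:.3f}")
```

Because the three synthetic features move together, a single component captures almost all of the variance. With real behavioural data the correlations would be weaker, and more components would likely be needed.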

Once we collect enough data and train a model, what can we do with it? Facebook has an algorithm in place to predict who is showing suicidal tendencies and to get them help from trained human moderators. Whether we should use a similar model for rapists is a question worth pondering. Is prevention better than cure?

Finally, there are two technicalities to be thought about in detail:

1. How sure can the model be about false positives (i.e., people falsely predicted to be rapists)? From the perspective of data science, we can reduce false positives by gathering more data and understanding these prediction failures.

2. Sharing of this information with the government – is this a breach of privacy? Should we value privacy over safety?
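
The false positive question in point 1 can be made concrete with a short worked calculation. The sketch below applies Bayes' rule to a highly imbalanced population. Every number in it (population size, prevalence, sensitivity, specificity) is a hypothetical assumption chosen for illustration, not a measured value, and `precision` is a helper defined here rather than a library function.

```python
def precision(prevalence, sensitivity, specificity):
    """Share of flagged people who are true positives (Bayes' rule)."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)


# Hypothetical figures, assumed purely for illustration.
population = 1_000_000
prevalence = 0.001    # 1 in 1,000 people belongs to the positive class
sensitivity = 0.99    # share of true cases the model flags
specificity = 0.99    # share of non-cases the model correctly leaves alone

flagged_correctly = population * prevalence * sensitivity
flagged_wrongly = population * (1.0 - prevalence) * (1.0 - specificity)
share_correct = precision(prevalence, sensitivity, specificity)

print(f"correctly flagged: {flagged_correctly:.0f}")
print(f"wrongly flagged:   {flagged_wrongly:.0f}")
print(f"precision:         {share_correct:.1%}")
```

Under these assumed numbers, only about 9% of the flagged people are true positives, even though the model is right 99% of the time on each class. This is the imbalanced dataset problem seen from the prediction side, and it is why the false positive rate deserves as much attention as overall accuracy.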

Please post your views for a ‘healthy’ discussion 🙂

An AI evangelist and a multi-disciplinary engineer. Loves to read business and psychology during leisure time. Connect with him any time on LinkedIn for a quick chat on AI!