It Is Your Time to Beat the Attackers
Three Steps You Should Add to Your Adversarial Text NLP Model
Winning a war against an adversary is not an easy task. The best-known classic NLP models tend to work great on “ideal” text, but real-world text can get nasty. In this post, I present some ways to deal with dirty and malicious texts, ways that I have found to work well in such cases from my own experience.
Introduction
Usually, when you search for NLP models you will find plenty of academic material and theoretical ways to deal with textual information. Most of the models you will read about, such as BERT, ELMo and similar ones, deal with “ideal” text (text without misspellings and other possible problems).
But what happens when the textual information you are trying to analyze is created by a malicious party that is trying to exploit vulnerabilities in your system? In such cases, the common models will yield poor results, because the text they are trained on is “ideal” text, which is not the case for adversarial text. Adversarial text will often contain a lot of intentional spelling mistakes, invisible characters, emojis, letters swapped with look-alike numbers or other letters, and more tricks.
I invite you to my house ➨ I i.n.vi.te yo-u t0 rny h0u$e
Steps to Deal with Adversarial Texts
The basic steps in a standard NLP model can include segmentation, tokenization, lemmatization, and more. Once those basic steps are in place, you can improve your model with the ones below.
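As a quick reference, here is a minimal baseline pipeline sketched with spaCy (assuming the en_core_web_sm model is installed; any other tokenizer and lemmatizer would work just as well):

```python
# Minimal baseline pipeline: segmentation, tokenization and lemmatization.
# Assumes: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("I invite you to my house. See you there!")

for sent in doc.sents:                      # sentence segmentation
    tokens = [tok.text for tok in sent]     # tokenization
    lemmas = [tok.lemma_ for tok in sent]   # lemmatization
    print(tokens, lemmas)
```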
Step 1 — Proper Preprocessing
Proper preprocessing is one of the key steps. In any NLP model it is important to know how to clean the text appropriately, and it is particularly crucial with adversarial text. In many cases the preprocessing will look similar: removing invisible characters, URLs, email addresses, numbers and punctuation.
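To make this concrete, here is a minimal cleaning sketch; the regular expressions and the list of invisible characters below are illustrative and should be tuned to your own data:

```python
import re
import string

# Rough patterns to strip; adjust them for your own data.
INVISIBLE_CHARS = re.compile(r"[\u200b\u200c\u200d\u2060\ufeff]")  # zero-width characters
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
EMAIL_PATTERN = re.compile(r"\S+@\S+\.\S+")
NUMBER_PATTERN = re.compile(r"\d+")

def preprocess(text: str) -> str:
    """Remove invisible characters, URLs, email addresses, numbers and punctuation."""
    text = INVISIBLE_CHARS.sub("", text)
    text = URL_PATTERN.sub(" ", text)
    text = EMAIL_PATTERN.sub(" ", text)
    text = NUMBER_PATTERN.sub(" ", text)
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.lower().split())  # normalize case and whitespace
```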
Step 2 — Switch to “Bag of Characters”
After we have cleaned the text as much as possible, we can move on to the next key step. If you already have a model, it most probably includes a step that builds a “Bag of Words” (BoW) representation of the text. In the case of adversarial text, however, it can be better to replace it with a “Bag of Characters” representation. The idea is the same: we take the text and split it into n-grams in order to vectorize it, but instead of splitting it by words, we split each word into character n-grams. For example, “this is my text” turns into “th hi is is my te ex xt” for bigrams.
The following is a minimal Python sketch of that logic; it uses scikit-learn’s CountVectorizer with a custom character-level analyzer, but any vectorizer that accepts a callable will do.
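```python
from typing import List
from sklearn.feature_extraction.text import CountVectorizer

def char_ngrams(text: str, n: int = 2) -> List[str]:
    """Split each word into overlapping character n-grams.

    "this is my text" -> ["th", "hi", "is", "is", "my", "te", "ex", "xt"]
    Words shorter than n are kept as-is.
    """
    grams: List[str] = []
    for word in text.lower().split():
        if len(word) < n:
            grams.append(word)
        else:
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams

# The same CountVectorizer you would use for BoW, only with a
# character-level analyzer instead of the default word-level one.
boc_vectorizer = CountVectorizer(analyzer=char_ngrams)
X = boc_vectorizer.fit_transform(["this is my text", "th1s 1s my t3xt"])
print(boc_vectorizer.get_feature_names_out())
```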
It will still find repeating pairs, only this time it will be less sensitive to misspellings and other changes that may appear in the text. Keep in mind that it will cost more memory and computation time than regular BoW, since there are many more pairs to consider, so don’t use it without a good reason.
Step 3 — Extract Important Features
Next, if you haven’t done so already, you should extract features from the original text that may be relevant to your specific model. Remember that in the previous step we removed some of the textual information, information that might still be valuable to your model. So extract it, along with other features that may also be interesting, and analyze it in order to decide whether it should be part of your model or not. For example, in many spam cases one of the important features is the presence of a URL in the text. Therefore, there should be a preprocessing step that extracts this information from the raw text and creates a feature out of it.
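For illustration, a small extractor that runs on the raw text before cleaning might look like the following; the feature names and the extra counts (digits, non-ASCII characters, length) are only examples of side signals you could consider:

```python
import re
from typing import Dict

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")

def extract_features(raw_text: str) -> Dict[str, float]:
    """Pull side features out of the raw text before it gets cleaned."""
    return {
        "has_url": float(bool(URL_PATTERN.search(raw_text))),
        "num_digits": float(sum(ch.isdigit() for ch in raw_text)),
        "num_non_ascii": float(sum(ord(ch) > 127 for ch in raw_text)),
        "text_length": float(len(raw_text)),
    }

print(extract_features("I i.n.vi.te yo-u t0 rny h0u$e, see http://example.com"))
```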
In the end, your model can look like this:
Potential Model
Is It Enough?
I have found that with these slight modifications to existing models we can overcome some of the problems that arise with adversarial text, and they can help stabilize your model. But from my own experience, it won’t be enough, and you will have to combine it with non-ML approaches, like those in the Pyramid of Pain.
The Pyramid of Pain
For example, in the past, I contributed to a spam-labeling system. We tried to prevent malicious behavior with different methodologies. At some point, we created an ML model that labels this kind of behavior.
The tactics I suggested above were helpful and enabled the creation of a stable model that could be fitted to the data. It reduced the amount of spam drastically. But the spammers kept adapting their behavior, and we found ourselves labeling new samples every day.
Eventually, we decided it would be easier to change the product’s behavior in a way that would hurt the spammers the most: we allowed URL insertion only for paying customers, and it killed the spam.
If you want to hear more about the spam war story, you can watch this short presentation that I gave.
Conclusions
It can be difficult to win this fight. We can always find ways to deal with adversaries; the big question is the price of each approach and how much effort it will cost us. From my own experience, I suggest going back to the Pyramid of Pain, finding the root of the problem and getting rid of it. That will be an easier solution than continuing to play cat and mouse with your attackers. It is important to remember that NLP models are sensitive, and when a malicious party is involved, it will often just continue as a back and forth between the parties.
Special Thanks
This is the time to express my gratitude to
with whom I have worked closely over the past few years and from whom I learned most of the methods mentioned above, and to
Ethan Brown; both were part of our wonderful team, fighting and revealing new ideas.