From the Cloud to the Crowd

A robot's attempt to blend in with humans... Can you spot it?

Jun 26, 2023

Figure 1. A robot's attempt to blend in with a human crowd. Image credit: Stable Diffusion XL

Companies start using fraud detection to protect their digital marketing campaigns because they are experiencing serious fraud problems and the effects that fraud causes. Short-term effects: wasted marketing budget and potential litigation costs (TCPA). Long-term effects: data pollution. For example, a company changes its interface multiple times based on the outcome of experiments (A/B tests), then realizes it had ~15% fraud during those experiments. The outcomes are unreliable, and the company starts to wonder whether all these incremental changes need to be undone.
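To make the data pollution concrete, here is a minimal sketch of how a fraud share of ~15% can flip the outcome of an A/B test. Only the 15% figure comes from the text above. The conversion rates are made-up illustration values, and the assumption that bots complete one variant's form far more often than the other's is a hypothetical scenario, not measured data.

```python
def observed_rate(human_rate, bot_rate, fraud_share):
    """Conversion rate you measure when humans and bots are mixed."""
    return (1 - fraud_share) * human_rate + fraud_share * bot_rate

FRAUD_SHARE = 0.15  # from the text: ~15% fraud during the experiments

# Hypothetical rates: humans prefer variant A, bots "convert" mostly on B.
variant_a = observed_rate(human_rate=0.050, bot_rate=0.00, fraud_share=FRAUD_SHARE)
variant_b = observed_rate(human_rate=0.045, bot_rate=0.10, fraud_share=FRAUD_SHARE)

print(f"A observed: {variant_a:.4f}, B observed: {variant_b:.4f}")
```

With these assumed numbers, variant B appears to win (roughly 5.3% against 4.25%) even though real humans convert better on variant A. That is exactly the kind of result that makes you wonder which incremental changes to undo.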

One of the questions during an initial call in the onboarding process is: “Do you currently have a fraud detection vendor?” If so, they are apparently not happy and are looking for alternatives. The reason is usually one of two: they were hit by a brand-new, never-before-seen type of fraud (less likely), or their current fraud detection vendor stinks (more likely).

The first step is to start measuring the traffic that hits their landing page(s), fills out their lead generation forms, passes through their e-commerce shop’s checkout, etc. Once enough data is collected, the preliminary data exploration starts. In most cases they are hit with already known fraud types, either based on publicly available knowledge or on fraud as a service. But sometimes it is fraud-hunting time! This means they are hit by something new, something unknown, something interesting!

Before showing exactly how the fraud hunt works, let’s answer the question: how and why can detecting this type of fraud be a problem?

Impression and click fraud, i.e. loading or pretending to load ads and clicking on a subset of them, does not make much money per impression or click, so fraudsters need to run their scheme at high volume in order to make some money. That means you’ll be able to see outliers easily, for example malicious apps, fake websites, etc. In lead generation or credit card testing, the volume a fraudster needs to earn the same dollar amount can be 1,000 to 10,000 times lower. This means (unknown) fraud traffic is able to hide much, much better within the total traffic: a classic needle-in-a-haystack problem.
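The volume difference is simple arithmetic. The sketch below uses hypothetical payouts (one cent per fraudulent click, $50 per fraudulent lead) and a hypothetical $1,000 target; these values are assumptions chosen for illustration, not figures from real campaigns.

```python
def events_needed(target_cents, payout_cents):
    """Smallest number of paid events that reaches the target (ceiling division)."""
    return -(-target_cents // payout_cents)

TARGET_CENTS = 100_000      # hypothetical goal: $1,000
CLICK_PAYOUT_CENTS = 1      # hypothetical: $0.01 per fraudulent click
LEAD_PAYOUT_CENTS = 5_000   # hypothetical: $50 per fraudulent lead

clicks = events_needed(TARGET_CENTS, CLICK_PAYOUT_CENTS)
leads = events_needed(TARGET_CENTS, LEAD_PAYOUT_CENTS)

print(clicks, leads, clicks // leads)  # prints: 100000 20 5000
```

Under these assumptions the click fraudster needs 100,000 events and the lead fraudster only 20, a factor of 5,000. Twenty fake leads spread over a campaign are far easier to hide than 100,000 fake clicks.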

The reason unknown fraud traffic is (still) unknown: fraudsters did their homework well. They know how to bypass the detection of existing fraud detection vendors, or have bought the tech to do so. Reverse engineering has taught them exactly what data is collected, and they have a solution to bypass or spoof that data in order to pass the fraud detection test. Solutions like this can be bought.

Figure 2. Retro way of signal analysis determining traffic’s true nature. Image credit: Stable Diffusion XL

Fortunately, some things can’t be faked, forged, bent or regenerated: the laws of physics. Timing well-defined challenges is the perfect way to spot outliers, and it’s nearly impossible to fake. Of course some might try, but I can assure you it’s hard, really hard, and the faked results will appear as another type of outlier.

Unfortunately, this technique doesn’t enable you to flag and tag individual visits, but it will show you when you have a group of unknown fraud visitors, for example visitors from the cloud hiding in your human audience crowd. How? Some examples:

Emulated phones have a different signature compared to real phones.
Browsers in the cloud (headless and headful) running in containers have a different signature compared to real devices.
Human workers abroad in low-wage countries, again, show a completely different signature.

The only prerequisite: the fraud activity has to be above the noise level in order to be spotted. That’s not really an issue, because once a fraud scheme works, fraudsters get sort of greedy, and the average noise level isn’t that high.
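One way to picture "above the noise level" is to compare a source's timing histogram with a clean reference and only react to bins that stick out by more than the expected counting noise. The sketch below is an illustration under assumptions, not the detection method described in this post: the function name `excess_bins`, the square-root (Poisson-style) noise estimate, the threshold of 4 and all counts are hypothetical.

```python
import math

def excess_bins(source_counts, baseline_counts, z_threshold=4.0):
    """Indices of bins where the source clearly exceeds the scaled baseline."""
    # Scale the baseline so both histograms describe the same total volume.
    scale = sum(source_counts) / sum(baseline_counts)
    flagged = []
    for i, (observed, base) in enumerate(zip(source_counts, baseline_counts)):
        expected = max(base * scale, 1.0)           # avoid dividing by zero
        z = (observed - expected) / math.sqrt(expected)
        if z > z_threshold:
            flagged.append(i)
    return flagged

# Hypothetical counts per time bin for clean reference traffic.
baseline = [10, 50, 200, 400, 200, 50, 10]
# Same shape plus 5 extra sessions in bin 1: lost in the noise.
quiet_source = [10, 55, 200, 400, 200, 50, 10]
# Same shape plus 120 extra sessions in bin 1: a greedy fraudster.
greedy_source = [10, 170, 200, 400, 200, 50, 10]

print(excess_bins(quiet_source, baseline))   # prints: []
print(excess_bins(greedy_source, baseline))  # prints: [1]
```

The small bump stays invisible while the large one is flagged, which matches the point above: the scheme only has to get greedy enough to rise above the noise.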

The fraud hunting game

Back to the hunt for unknown fraud. A fraud hunt starts by looking at a few charts which break down traffic into different subsets, e.g. per source, per campaign, per device type, etc. These charts are histograms and show how the timings of sessions are distributed within each subset. Figure 3 shows two examples of clean human traffic. Clean traffic generally resembles a log-normal distribution; in this example it’s not exactly a log-normal curve, but close enough.

Figure 3. Histogram showing what pure human traffic looks like. X-axis represents time
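The idea of a timing histogram that "resembles a log-normal distribution" can be sketched in a few lines. Everything here is an assumption for illustration: the timings are synthetic, the parameters are invented, and the skewness of the log-timings is just one rough check (it is close to 0 when data is log-normal). It is not the analysis used to produce the figures in this post.

```python
import math
import random
import statistics

def log_histogram(timings, bins=20):
    """Count timings in equal-width bins on a log10 axis."""
    logs = [math.log10(t) for t in timings]
    lo, hi = min(logs), max(logs)
    width = (hi - lo) / bins or 1.0   # fall back if all values are identical
    counts = [0] * bins
    for x in logs:
        counts[min(int((x - lo) / width), bins - 1)] += 1
    return counts

def log_skewness(timings):
    """Skewness of log(timing); near 0 for log-normal data."""
    logs = [math.log(t) for t in timings]
    mean = statistics.fmean(logs)
    sd = statistics.pstdev(logs)
    return statistics.fmean([((x - mean) / sd) ** 3 for x in logs])

rng = random.Random(42)
# Synthetic "human" timings: log-normal by construction (hypothetical parameters).
humans = [rng.lognormvariate(7.0, 0.5) for _ in range(5000)]
# Synthetic "bot" timings: a tight, fast cluster (hypothetical parameters).
bots = [rng.gauss(120.0, 5.0) for _ in range(750)]

print(log_histogram(humans, bins=12))
print(round(log_skewness(humans), 2), round(log_skewness(humans + bots), 2))
```

In this synthetic setup the human-only sample has a log-skewness near zero, while mixing in the fast cluster pulls it clearly negative. A single number like this will not tell you which visits are fraud, but it hints at which subsets deserve a closer look.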

Looking at human data can be quite boring, but luckily new clients mostly start because they have a fraud problem, which means there HAS to be fraud. So, what does this fraud look like? Figure 4 shows four examples of what traffic with fraud looks like. Each of the four charts shows data from a single (traffic) source only. In charts A and D, two separate types of fraud can be seen. The fraud in chart C might look low, but this is exactly what happens when fraud traffic blends in well, as the colorized version will show. Figure 5 shows the same sources, now broken down into human (blue) and fraud (red) traffic.

Figure 4. Four histograms of traffic, each a different source, each has potential fraud. X-axis represents time

The four histograms in Figure 4 don’t yet tell you what to detect or how to detect fraud at the individual visit level, but once you are aware of it, you’re able to zoom in on specific traffic to find these sessions and look for clues. Then it’s just a matter of adding detection and the fraud hunt has been successful. Experience has shown that once a new type of fraud is isolated, it’s quite easy to see what makes it unique and how it works. This is exactly why Oxford BioChronometrics is able to flag these visits without false positives, as also shown in Figure 5.

Now that the fraud is flagged automatically and the client is happy with our service (we know they’re happy because we get goodies instead of legal threats), we’re able to make colorized versions of these charts. Figure 5 shows the same 4 examples as shown in Figure 4: human visitors are blue and fraudulent visitors are red. Again, each subchart shows only traffic from a single source. Both the fraud part and the human part have been normalized vertically to clearly show where and how the fraud is present within the total. Depending on the source, fraud is generally less than ~20% of the volume, and without normalization it would look small in comparison, making it harder to compare the fraud distribution with the human distribution. So, this type of chart enables us to compare the human and fraud traffic per specific source very quickly, without any hard mental calculations, and thus determine its quality.
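The vertical normalization can be sketched as follows. This assumes "normalized vertically" means scaling each histogram to the same peak height, which is one plausible reading (normalizing to equal area would be another). The per-bin counts are hypothetical.

```python
def normalize_to_peak(counts):
    """Scale a histogram so its tallest bin equals 1.0."""
    peak = max(counts)
    if peak == 0:
        return [0.0 for _ in counts]
    return [c / peak for c in counts]

# Hypothetical session counts per time bin for one source (same bins for both).
human_counts = [5, 40, 160, 300, 180, 60, 15]
fraud_counts = [2, 45, 60, 10, 3, 0, 0]

fraud_share = sum(fraud_counts) / (sum(human_counts) + sum(fraud_counts))
human_norm = normalize_to_peak(human_counts)
fraud_norm = normalize_to_peak(fraud_counts)

print(f"fraud share: {fraud_share:.1%}")
print("human:", [round(v, 2) for v in human_norm])
print("fraud:", [round(v, 2) for v in fraud_norm])
```

In raw counts the fraud would be a small bump at the foot of the human curve. After scaling, both curves reach the same height, so it becomes obvious that the assumed fraud peaks one bin earlier than the human traffic.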

Having a completely separate way of verifying traffic and/or identifying new types of fraud reconfirms that the fraud detection works pretty darn well. New fraud will always emerge, and the time between its appearance and its detection is what we call ‘the gap’. We try to keep this gap as short as possible, because the shorter the gap, the better the overall quality of your campaigns, and the less money flows to fraudsters.

Figure 5. The chart shows the colorized histograms as shown in Fig 4. X-axis is time. Blue is human. Red is fraud.

Figure 5 contains the same 4 example charts as shown in Figure 4. These are all from well-established sources and are based on highly optimized and targeted campaigns. This means that these sources, which are companies, some even publicly listed, are getting away (unknowingly?) with selling inferior traffic to their clients. This is just a friendly reminder: don’t think you’re good just because you buy your campaign traffic or lead generation traffic from a well-established name.

Fraud hunting at existing clients

Charts like Figure 5 are based on traffic arriving at our clients’ landing pages, meaning a click has been made prior to the arrival, which also means that the click fraud detection of these “sources” is suboptimal. It also proves that fraud will keep on appearing, even if you don’t have to pay for it.

That makes perfect sense, because without fraud detection you would pay for these clicks: easy money! That’s why our clients are happy that we flag the individual visits. It enables them to go back and refuse to pay for fraud. Simply put, when they’re not happy with the performance of one of their affiliates or traffic sources, they’ll eventually look for a replacement. Or, when they are happy, they will try to buy more of the same quality traffic. But unfortunately quality in this case means ‘humans’, and the humans in your target audience are finite, so bots and fraudsters might appear instead.

This is why continuously monitoring traffic at existing clients is a must. Once the fraud pattern shown in the examples starts to emerge in the human traffic, someone is trying to cheat you! And in a lot of cases that someone might not even be aware of it, because they in turn buy traffic from external sources.

So, how do you know you’re safe and sound? You don’t, unless you start measuring your traffic. Secondly, it also depends on which vendor is measuring and validating your traffic. Some vendors report only 1% fraud, which is WAY too low. Other vendors reject way too much and thus have a lot of false positives, which isn’t optimal either, because rejecting humans means fewer (human) conversions, thus lower sales, and thus higher customer acquisition costs.

Thirdly, “what is the best fraud detection vendor?” Every vendor will tell you they are the best, have the best AI and the biggest computers, do trillions of interactions, etc. Simply put: a vendor should convince you with data and analytics, and show exactly what portion of the traffic is fraud and why it has been flagged. Do you still experience fraud in the remaining portion, e.g. when contacting generated leads? Have you ever manually contacted generated leads which were flagged as fraud, to check whether they were false positives? There’s no TCPA risk when manually calling a generated lead. Doing these checks, as we encourage our clients to do, will allow you to determine the accuracy of the fraud detection and will tell you whether your fraud detection vendor is doing a good job.
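The manual checks described above boil down to two numbers: how many flagged leads turned out to be real people, and how many passed leads turned out to be fraud. Here is a minimal sketch of that bookkeeping. The function name `audit_vendor` and the sample counts are hypothetical, and small samples like these only give a rough indication.

```python
def audit_vendor(flagged_checked, flagged_real, passed_checked, passed_fraud):
    """Summarize a manual spot check of a vendor's verdicts.

    flagged_real: flagged leads that turned out to be genuine people.
    passed_fraud: leads the vendor passed that turned out to be fraud.
    """
    if flagged_checked <= 0 or passed_checked <= 0:
        raise ValueError("each sample needs at least one checked lead")
    return {
        "wrongly_flagged_share": flagged_real / flagged_checked,
        "missed_fraud_share": passed_fraud / passed_checked,
    }

# Hypothetical spot check: 50 flagged leads and 200 passed leads were called manually.
result = audit_vendor(flagged_checked=50, flagged_real=1,
                      passed_checked=200, passed_fraud=6)
print(result)
```

A high share of wrongly flagged leads means the vendor is rejecting paying customers. A high share of missed fraud means the reported fraud percentage is too low.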

Conclusion

You’ll be surprised to experience the difference when your campaign data is suddenly clean: you know how your real human audience really responds to a campaign and creative (and not the bots who are eager to), your forms are filled out by real humans, which increases true conversion rates, your legal department isn’t receiving TCPA violation letters which have to be settled at $500+, you see real impact based on the A/B tests and experiments, you’re retargeting only humans (again, not the bots who are eager to), you’re selling to humans who want your digital product (and not to fraudsters testing credit cards), etc.

Does this sound unrealistically attractive? Then you seriously have to think about becoming an Oxford BioChronometrics client, because this is what digital marketing looks like without all these types of digital fraud.

#leadgeneration #frauddetection #digitalmarketing #tcpa #sales #revenue #CMO #CFO #CEO
