In 2024 "Can a bot fake .... ?" -- FAQ

Jul 18, 2024

The typical flow of a conversation during the first call with a new customer having fraud problems can be boiled down to this:
Client: “What are bots capable of in 2024?”
Me: “Quite a lot!” (with a big laugh)
Client: “Haha, but... realistically, what exactly?”
Whenever I talk to new clients, the conversation follows this sequence of questions. That’s why I made this FAQ from a technical point of view: what can bots fake, spoof, manipulate, alter, etc. in order to appear as a genuine browser with a genuine human sitting behind the screen?
After the initial questions from the marketers, the tech people jump in. They ask the more difficult questions, which are very specific and based on research they did before the call while trying to solve their problem. I have collected and combined the most frequently asked questions into this FAQ.
This article contains in-depth answers to 10 questions about bots that you’ve always wanted answered. It has been written for marketing professionals looking for ad-fraud and/or lead-generation-fraud solutions, who can use this FAQ as a guideline when asking vendors: "How does your solution detect a bot faking ... ?"

1. Are bots able to fake the domain name of the website they are accessing?

The domain name can be read from the browser using JavaScript through the window.location.origin, window.location.host, window.location.hostname and window.location.href properties [1]. Browser automation, i.e. advanced bots, is capable of loading webpages and executing JavaScript. These bots remote control the browser in all its glory. This is achieved by starting the browser and controlling each tab using the Chrome DevTools Protocol (CDP), giving instructions to click, scroll, type, etc. This works for Chromium-based browsers, e.g. Chrome and MS Edge. Firefox and Safari have similar protocols, though they are less commonly used.
Figure 1 shows a post-bid example. In pre-bid, the browser sends one or more bid requests for an advertisement. These requests are fired from the same browser and contain plain text, which the bot can fake to be anything it wants.

Figure 1. Financial Times website with Chrome Developer Tools opened. The window.location object in the browser contains the URLs which are currently shown in the tab

So, can bots fake the domain they are on? Yes, they can, both pre-bid and post-bid.
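To make the pre-bid case concrete, here is a minimal Python sketch of a bid-request payload. The field names (site.domain, site.page) loosely follow the OpenRTB convention, and the helper build_bid_request and both example domains are made up for illustration. Real header-bidding adapters each use their own format. The only point is that the domain is a plain string chosen by whoever builds the request.

```python
import json


def build_bid_request(domain: str, page_url: str) -> str:
    """Serialize a simplified, OpenRTB-like bid request (illustrative only)."""
    payload = {
        "id": "example-request-1",  # hypothetical request id
        "imp": [{"id": "1", "banner": {"w": 300, "h": 250}}],
        "site": {"domain": domain, "page": page_url},
    }
    return json.dumps(payload)


# The sender decides what goes into the payload. Nothing in the payload
# itself ties it to the page that is actually loaded in the tab.
honest = build_bid_request("low-quality-site.test", "https://low-quality-site.test/")
spoofed = build_bid_request("ft.com", "https://www.ft.com/")

print(json.loads(spoofed)["site"]["domain"])
```

A receiver that trusts only this field has no way to tell the two requests apart, which is why the declared domain alone is weak evidence.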

2. Are bots able to fake the referrer URL?

The referrer URL informs third-party websites which URL is being visited when a resource is requested. For example, if you are browsing www.usatoday.com and a JavaScript file is loaded from a different domain, e.g. static.adsafeprotected.com, the Referer header is set. Figure 2 shows the Referer header and its value. Bots can change this value by intercepting the network traffic within the browser, overriding the value with whatever they want, and sending the modified traffic.

Figure 2. The browser requesting the JavaScript https://static.adsafeprotected.com/iasPET.1.js from https://www.usatoday.com will set the Referer HTTP header to https://www.usatoday.com/
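The same idea in a few lines of Python, as a sketch only: it uses the standard library to build (not send) a request with a caller-chosen Referer. The helper name build_script_request is made up, and a browser-based bot would do this through request interception rather than with urllib. Still, the header is just text either way.

```python
from urllib.request import Request


def build_script_request(script_url: str, claimed_page: str) -> Request:
    """Build (but do not send) a GET request with a caller-chosen Referer."""
    return Request(script_url, headers={"Referer": claimed_page})


req = build_script_request(
    "https://static.adsafeprotected.com/iasPET.1.js",
    "https://www.usatoday.com/",  # any string is accepted here
)
print(req.get_header("Referer"))
```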

3. Are bots able to fake the Mobile App name?

Advertisements in Mobile Apps are displayed using a web browser that is embedded within the App. This embedded browser is called a WebView. Within this WebView an advertising platform loads and refreshes ads based on your cookies, geolocation, etc. Google’s AdMob is such a platform [2][3]. InMobi, AppLovin, Glispa and Amobee are prominent alternatives.

Figure 3. Screenshot of Charles proxy of the communication between the FT App installed on a real Android device and Google in order to display an Ad embedded within the App

When an App accesses a website using a WebView, the App name is conveyed in the HTTP header x-requested-with. Figure 3 shows the communication, captured with a MITM (man-in-the-middle) proxy, between the FT App installed on a real Android phone and Google. The line highlighted in blue shows the x-requested-with HTTP header. This is of course only a single request made from the App. Another interesting fact visible in Figure 3 is the User Agent (UA). Apps have full control over the UA, and in this case the App name and full version are included in the UA as well. Both values are ordinary request headers, so a bot that already rewrites headers such as the referrer can set them to any App name it wants.

4. Can bots fake utm_ parameters and other forms of link decoration?

Querystring parameters like utm_source, utm_campaign, gclid and/or fbclid are technically nothing more than an ampersand-separated string appended to the URL of the GET request. When bots load the advertisement in the browser, they already know what the target link will be. They can simply change the parameters upfront, click, and let the browser do its work. Another method is to change the parameters dynamically using request interception. If a bot already uses this technique to change HTTP headers such as the referrer, it is fairly easy to add some rules that rewrite destination URLs. So, yes they can. Easy!

Figure 4. The querystring part of a request is given in blue after the question mark ‘?’ in the request URL
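Rewriting link decoration takes only a few lines. The sketch below uses Python's standard urllib.parse. The function name rewrite_query, the example.com URL and the parameter values are made up for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def rewrite_query(url: str, overrides: dict[str, str]) -> str:
    """Return the URL with query parameters replaced or added."""
    parts = urlsplit(url)
    # Keep existing parameters, then overwrite or append the overrides.
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    params.update(overrides)
    return urlunsplit(parts._replace(query=urlencode(params)))


original = "https://example.com/landing?utm_source=newsletter&utm_campaign=spring"
decorated = rewrite_query(original, {"utm_source": "google", "gclid": "fake-click-id"})
print(decorated)
```

The landing page receives the rewritten URL and cannot tell from the parameters alone that they were changed.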

5. Can bots fake cookies in the browser?

Browsing to a website means that all cookies in the browser associated with that URL are sent along with the request. Figure 5 shows what this looks like for a request to the website.

Figure 5. The cookies sent along with the request when browsing to https://ft.com

So, can bots fake cookies? Yes, they can, and they will warm up a browsing session (visit sites beforehand to collect cookies) in order to maximize their profit.
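On the wire, cookies are again plain text in a single request header. The sketch below, using only Python's standard library, shows a stored set of cookies being turned into a Cookie header and read back. The cookie names and values and both helper names are hypothetical.

```python
from http.cookies import SimpleCookie


def to_cookie_header(cookies: dict[str, str]) -> str:
    """Join name=value pairs the way a browser sends them in the Cookie header."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


def from_cookie_header(header: str) -> dict[str, str]:
    """Parse a Cookie header back into a dict, as a server would."""
    jar = SimpleCookie()
    jar.load(header)
    return {name: morsel.value for name, morsel in jar.items()}


# Cookies collected during an earlier "warm-up" session (made-up values).
warmed = {"session_id": "abc123", "consent": "accepted"}
header = to_cookie_header(warmed)
print(header)
```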

6. Can bots fake the fingerprint of the browser?

Using CDP (Chrome DevTools Protocol) any property or value in the browser can be overridden. This enables fraudsters to change values like screen resolution, keyboard language, available plugins, time zone, WebGL vendor and renderer, etc. The combination of these property values (and many others) is called the browser's fingerprint. This also means that if a value changes, the fingerprint changes. CreepJS is an open source tool that calculates your browser's fingerprint. When developing a bot, CreepJS is typically your litmus test.

Figure 6. CreepJS is one of the most extensive open source tools to calculate your browser fingerprint
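Why one changed value changes the whole fingerprint is easy to see in a toy example. The sketch below hashes a handful of made-up property values with SHA-256. This is an assumption for illustration only, not how CreepJS or any particular vendor computes a fingerprint.

```python
import hashlib
import json


def fingerprint(properties: dict) -> str:
    """Hash a canonical JSON form of the properties (key order does not matter)."""
    canonical = json.dumps(properties, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical property values as a script might collect them.
real = {
    "screen": [1920, 1080],
    "language": "en-US",
    "timezone": "Europe/Amsterdam",
    "webgl_vendor": "Example GPU Vendor",
}
# The bot overrides a single property.
spoofed = dict(real, timezone="America/New_York")

print(fingerprint(real))
print(fingerprint(spoofed))
```

The two printed hashes differ, so a bot that rotates even one property per session presents itself as a new device each time.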

In addition to fingerprinting a series of static values, it is also possible to fingerprint responses to challenges. For example, WebGL shapes are drawn, and because different OSes, browsers, video cards and their drivers may use different anti-aliasing methods, have different IEEE 754 floating point implementations, use different rounding modes, etc., the color values of individual pixels may differ per video card type [4][5].

7. Can bots fake the TLS fingerprint of the browser?

TLS fingerprints are generated server side. They are based on the client-server handshake that takes place before the communication is encrypted. In this handshake the browser says: "Hello, I support these cipher suites." The server answers with the selected cipher and key (simplified) [7]. Different browsers on different OSes support different cipher suites. This is most relevant for request-based bots, as by default their fingerprint does not resemble that of any browser [23]. That’s why request-based bots have to use special clients and tools, for example curl-cffi (see Figure 7), curl-impersonate, AzureTLS and CycleTLS [8][9][10].

Figure 7. Curl-cffi enables you to send requests to web servers from Python code emulating the TLS handshake of common web browsers
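One published way to turn the handshake into a fingerprint is JA3 [24]. As I understand the method, five fields of the Client Hello are written as decimal numbers, joined with "-" inside a field and "," between fields, and the resulting string is hashed with MD5. The sketch below follows that recipe. The numeric values are illustrative and were not captured from a real browser.

```python
import hashlib


def ja3_string(tls_version: int, ciphers: list[int], extensions: list[int],
               curves: list[int], point_formats: list[int]) -> str:
    """Build the comma/dash separated string that JA3 hashes."""
    fields = [[tls_version], ciphers, extensions, curves, point_formats]
    return ",".join("-".join(str(v) for v in field) for field in fields)


def ja3_hash(*client_hello_fields) -> str:
    """MD5 of the JA3 string, as a 32-character hex digest."""
    return hashlib.md5(ja3_string(*client_hello_fields).encode("ascii")).hexdigest()


# Same cipher suites, different order: the fingerprint changes.
client_a = ja3_hash(771, [4865, 4866, 4867], [0, 10, 11], [29, 23, 24], [0])
client_b = ja3_hash(771, [4866, 4865, 4867], [0, 10, 11], [29, 23, 24], [0])
print(client_a, client_b)
```

Because order matters, a plain HTTP library that offers the same ciphers as Chrome in a different order still stands out. Impersonation clients such as the ones listed above exist to reproduce a browser's handshake.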

Browser-based bots need to connect through a proxy in order to change the fingerprint. In such a setup the proxy server sets up the secure connection to the web server hosting the publisher site and/or landing page, and forwards the requests on behalf of the automated browser. In this case the website (and its fraud detection) will fingerprint the proxy server's requests.

8. Can bots fake (prevent) WebRTC from leaking your real IP address?

WebRTC (Web Real-Time Communication) is the technology that enables videoconferencing from a browser [11]. WebRTC uses peer-to-peer communication, bypassing proxy servers configured in the browser. In fraud detection WebRTC can be used to detect the real internet-facing IP address of the client, even if the client is using a proxy or VPN. The detection can be split into two parts: capturing the IP address server side, and extracting the local IP address(es) at the client using JavaScript.

WebRTC Server side

In order to detect bots and fraudsters using residential proxies, anti-bot detection companies have set up their own WebRTC infrastructure. Cheap and low-quality VPN clients allow anti-bot and fraud detection companies to extract the true external IP address the bot or fraudster uses. Premium quality (non-free) VPN and proxy software typically doesn’t have this issue.

Figure 8. Browserleaks.com screenshot made while using a VPN client (ProtonVPN). The IP addresses shown are the addresses of the VPN endpoint

Figure 8 shows a screenshot of the browserleaks’ WebRTC test page [13]. Both the IPv4 and IPv6 addresses in the screenshot are located in New York, United States. I made this screenshot in the Netherlands, so my local IP address did not leak while I was using ProtonVPN. Other VPN clients or residential proxy services may have different results.

WebRTC Client Side

In 2015 Daniel Roesler exposed a WebRTC vulnerability on his GitHub page [12], see also Figure 9. This vulnerability enables code running at the client to learn its external IP address, even if the client is part of a local infrastructure with local addresses, e.g. a corporate network or your home network, behind a firewall.

Figure 9. Screenshot from Daniel Roesler's GitHub page that explains what JavaScript code can do to determine your local (ISP facing) IP address

Depending on your network configuration, the JavaScript code on the GitHub page will be able to extract the local IP addresses of your device. In the case of IPv4 the code will typically extract your internal network IP address, which in most cases is a private address behind NAT (Network Address Translation), e.g. 192.168.x.x or 10.x.x.x. With IPv6, however, the need for NAT has disappeared. That means in many cases this technique reveals your true IPv6 address, and thus your true location, even though you are connected through a VPN and sit behind a router/firewall.
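What a detection script does with the addresses it collects can be sketched as follows. The example assumes the address is the fifth space-separated field of an ICE candidate line, and it uses Python's ipaddress module to separate private (NAT) addresses from publicly routable ones. The candidate lines and helper names are made up for illustration.

```python
import ipaddress


def candidate_address(candidate: str):
    """Return the IP address in an ICE candidate line, or None if there is none."""
    fields = candidate.split()
    # Assumed layout: foundation component transport priority address port typ ...
    if len(fields) < 6:
        return None
    try:
        return ipaddress.ip_address(fields[4])
    except ValueError:
        return None  # the address field holds a hostname, not an IP literal


def classify(candidate: str) -> str:
    """Label a candidate as 'private', 'public' or 'unknown'."""
    addr = candidate_address(candidate)
    if addr is None:
        return "unknown"
    return "private" if addr.is_private else "public"


samples = [
    "candidate:1 1 udp 2122260223 192.168.1.10 54321 typ host",     # IPv4 behind NAT
    "candidate:2 1 udp 2122262783 2a02:a45f:1::10 54322 typ host",  # made-up global IPv6
    "candidate:3 1 udp 2113937151 example-host.local 54323 typ host",
]
for line in samples:
    print(classify(line), line)
```

The IPv4 sample only reveals a NAT address, while the IPv6 sample is globally routable, which is the leak described above.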

9. Can bots solve CAPTCHAs?

With the rise of AI you would expect that bots will be able to solve all CAPTCHAs automatically. That is correct up to a certain degree [14][15]. Reading text in images and simple image recognition can be done with high accuracy. Figure 10 shows examples of CAPTCHAs which can be solved automatically.

Figure 10. Text CAPTCHAs do not deter bots. This type of CAPTCHA can be solved automatically as bots are able to read using OCR (optical character recognition)

More recent CAPTCHAs have become knowledge-based puzzles, where you need some subject knowledge in order to solve the CAPTCHA. Sometimes they even resemble an IQ test. Figure 11 contains a few example CAPTCHAs that require general knowledge, e.g. which animals lay eggs, how objects are used (e.g. vehicles on paved roads), and/or the monetary value of goods.

Figure 11. Examples of CAPTCHAs that aren't based on OCR, but require a deeper level of interpretation and knowledge to solve

Figure 12. Animated examples of the sliding CAPTCHA.

10. Can bots fake human interactions like mouse movements, clicks, scrolls and/or touches?

When a browser is controlled by browser automation software, that software is able to move the mouse to new locations. In CDP (Chrome DevTools Protocol) mouse movements, clicks and scrolls are controlled by dispatchMouseEvent [16]. This enables a developer to fully control the mouse, its buttons and its wheel. The same goes for touch events, using dispatchTouchEvent, which can be used to emulate mobile behavior [17].
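To give an idea of how little is needed, here is a sketch that builds the JSON messages for a single click with Input.dispatchMouseEvent [16]. It only constructs the messages. Sending them over the browser's DevTools WebSocket is left out, and the helper name mouse_event and the coordinates are made up.

```python
import json
from itertools import count

_message_ids = count(1)  # every CDP command carries a unique id


def mouse_event(event_type: str, x: float, y: float, **extra) -> str:
    """Serialize one Input.dispatchMouseEvent command as a CDP JSON message."""
    params = {"type": event_type, "x": x, "y": y, **extra}
    return json.dumps({
        "id": next(_message_ids),
        "method": "Input.dispatchMouseEvent",
        "params": params,
    })


# Move to (200, 150), then press and release the left button.
click = [
    mouse_event("mouseMoved", 200, 150),
    mouse_event("mousePressed", 200, 150, button="left", clickCount=1),
    mouse_event("mouseReleased", 200, 150, button="left", clickCount=1),
]
for message in click:
    print(message)
```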

Figure 13. More complex mouse movements can be simulated with Bézier curves or B-splines. The simulated mouse movements shown in this figure are made by Mouse Synthesizer [19]

In order to simulate human behavior, humanlike mouse paths must be generated instead of straight lines. This can be achieved using B-splines [18] or Bézier curves. The software generates a series of X,Y points between a starting point and an end point, based on the coordinates of elements in the page. The second step is to calculate a spline curve through these points, together with timestamps that determine how fast the mouse moves from the starting coordinate to the destination coordinate and at what time resolution. This technique enables fraudsters to perform humanlike mouse movements. This is exactly what Mouse Synthesizer [19] (see Figure 13) and ghost-cursor [20] do. But don't worry, of course this can be detected, as no human is able to draw perfectly smooth curves with a mouse.
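The two steps can be sketched in a few lines of Python. This is a simplified cubic Bézier with two randomly shifted control points and evenly spaced timestamps. It is my own illustration, not the algorithm used by Mouse Synthesizer or ghost-cursor, and the function name and default values are assumptions.

```python
import random


def bezier_path(start, end, steps=20, duration_ms=400, jitter=80, rng=None):
    """Return (x, y, t_ms) samples along a randomly bent cubic Bezier curve."""
    rng = rng or random.Random()
    (x0, y0), (x3, y3) = start, end
    # Two control points near 1/3 and 2/3 of the straight line, shifted randomly
    # so the path bends instead of running straight to the target.
    x1 = x0 + (x3 - x0) / 3 + rng.uniform(-jitter, jitter)
    y1 = y0 + (y3 - y0) / 3 + rng.uniform(-jitter, jitter)
    x2 = x0 + 2 * (x3 - x0) / 3 + rng.uniform(-jitter, jitter)
    y2 = y0 + 2 * (y3 - y0) / 3 + rng.uniform(-jitter, jitter)
    path = []
    for i in range(steps + 1):
        t = i / steps
        u = 1 - t
        x = u**3 * x0 + 3 * u**2 * t * x1 + 3 * u * t**2 * x2 + t**3 * x3
        y = u**3 * y0 + 3 * u**2 * t * y1 + 3 * u * t**2 * y2 + t**3 * y3
        path.append((round(x, 1), round(y, 1), round(t * duration_ms)))
    return path


path = bezier_path((100, 100), (500, 300), rng=random.Random(42))
print(path[0], path[len(path) // 2], path[-1])
```

Each sample would then be sent as one mouse-move event. The smoothness and even pacing of such a path is the kind of regularity the detection side can look for.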

Conclusion

You might wonder why I didn’t write anything about blacklisting IP addresses. If you’re able to create a sophisticated bot able to buy #taylorswift tickets, then you’ll KNOW how to spoof, fake and/or emulate browser functionality. You are well aware that you have to use residential proxies, which means IP blacklisting will cause false positives. Filtering on IP addresses only works to exclude traffic from outside your country, but if you’re in the US, don’t forget to include overseas territories like Guam, American Samoa, the Virgin Islands, etc.

By now you are aware that bots are able to fake (almost) anything. They can’t fake dynamic WebGL / GPU challenges, but they will poison these challenges with noise in order to hide their headless appearance. CAPTCHAs do work to a certain degree, but they annoy humans and thus add friction to their journey.

So, can these sophisticated bots be detected? Sure they can. Once you know how bots override properties, fake answers, and hide their true appearance you know where and what to look for. Lastly, browser automation will cause “browser automation”-leakage. Spotting these leakages and traces of automation will reveal the true nature of the bot accessing your website.

Questions? Corrections? Remarks? Need help with bots and fraud? Feel free to connect, comment or DM

#adfraud #bots #CMO #digitalmarketing #browserautomation #clickfraud
[1] https://developer.mozilla.org/en-US/docs/Web/API/Location
[2] https://developers.google.com/admob/ios/browser/webview/api-for-ads
[3] https://developers.google.com/admob/android/browser/webview
[4] https://elie.net/static/files/picasso-lightweight-device-class-fingerprinting-for-web-clients/picasso-lightweight-device-class-fingerprinting-for-web-clients-paper.pdf
[5] https://cdn.elie.net/static/files/picasso-lightweight-device-class-fingerprinting-for-web-clients/picasso-lightweight-device-class-fingerprinting-for-web-clients-slides.pdf
[6] https://privacybadger.org/
[7] https://en.wikipedia.org/wiki/Cipher_suite#TLS_1.0%E2%80%931.2_handshake
[8] https://github.com/lwthiker/curl-impersonate
[9] https://github.com/Danny-Dasilva/CycleTLS
[10] https://github.com/Noooste/azuretls-client
[11] https://en.wikipedia.org/wiki/WebRTC
[12] https://github.com/diafygi/webrtc-ips
[13] https://browserleaks.com/webrtc
[14] https://arxiv.org/abs/2307.12108
[15] https://arxiv.org/abs/2307.10239
[16] https://chromedevtools.github.io/devtools-protocol/tot/Input/#method-dispatchMouseEvent
[17] https://chromedevtools.github.io/devtools-protocol/tot/Input/#method-dispatchTouchEvent
[18] https://en.wikipedia.org/wiki/B-spline
[19] https://github.com/MIMIC-LOGICS/Mouse-Synthesizer/tree/main
[20] https://github.com/Xetera/ghost-cursor
[21] https://www.mimic.sbs/antibot/On-Anti-Bot-Biometric-Protections.md/
[23] https://www.linkedin.com/posts/kouwenhovensander_taylorswift-adfraud-adfraud-activity-7199384350369431552-t8EZ
[24] https://github.com/salesforce/ja3
[25] https://chromestatus.com/feature/5124606246518784
[26] https://github.com/yifeikong/curl_cffi