General and sophisticated invalid traffic filtration methodology
To identify and filter (exclude) invalid click activity, including but not limited to non-human click activity and suspected click fraud, Microsoft employs techniques based on identifiers, activity, and patterns found in web log data. Microsoft Advertising receives sufficient signals to apply a 100% invalid traffic decision rate on click activity, including traffic from both Microsoft and partner properties. However, because user identification and intent cannot always be known or discerned by publishers, advertisers, or their respective agents, it is unlikely that all invalid click activity can be identified and excluded from the reported results. Microsoft Advertising engineering updates the filtration methods below on an as-needed basis.
Microsoft Advertising uses the “multiple-click-per-impression method” in which a click is discarded when the time between the click and the previous click on an ad impression or search-result content is less than the repeat-click-refractory period specified by Microsoft. This rule is meant to correct for navigational mistakes such as unintentional double-clicking on an impression.
User frequency caps
Microsoft Advertising limits the number of click events that can be billed per search user for a given period of time. If a user is found to exceed the limits, all activity from the user within the period of time is considered invalid. The definition of a search user is proprietary and varies by context, and the frequency caps employed are also proprietary and not disclosed.
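The cap described above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the actual caps, periods, and definition of a search user are proprietary, so the values `MAX_CLICKS_PER_PERIOD` and `PERIOD_SECONDS`, the fixed (non-sliding) time buckets, and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical values: the real caps and periods are proprietary and undisclosed.
MAX_CLICKS_PER_PERIOD = 3
PERIOD_SECONDS = 3600


def over_cap_buckets(clicks, max_clicks=MAX_CLICKS_PER_PERIOD, period=PERIOD_SECONDS):
    """Return the (user_id, period_index) buckets whose click count exceeds the cap.

    `clicks` is an iterable of (user_id, timestamp_seconds) pairs.
    Fixed time buckets are an assumption; a sliding window is equally plausible.
    """
    counts = defaultdict(int)
    for user_id, ts in clicks:
        counts[(user_id, int(ts // period))] += 1
    return {bucket for bucket, n in counts.items() if n > max_clicks}


def billable_clicks(clicks, max_clicks=MAX_CLICKS_PER_PERIOD, period=PERIOD_SECONDS):
    """Drop ALL clicks from a user within any period in which the user exceeded the cap."""
    clicks = list(clicks)
    bad = over_cap_buckets(clicks, max_clicks, period)
    return [(u, ts) for u, ts in clicks if (u, int(ts // period)) not in bad]
```

Note that exceeding the cap invalidates every click from that user in the period, not only the clicks beyond the limit, which matches the rule stated above.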
A click is considered invalid if the time between the ad impression (or search result) and the click is more than the impression-staleness window. For example, if a user clicks an ad outside the impression-staleness window, Microsoft Advertising considers the click “out of context” and invalid.
Impression-click refractory period
The click will be filtered as invalid if the time between the ad impression (or search result) and the click is less than a minimum amount called the impression-click refractory window.
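The two impression-to-click timing rules (a click that arrives too late after the impression, and a click that arrives too soon) can be sketched together as a single window check. The threshold values and names below are hypothetical; the source does not disclose the actual windows.

```python
# Hypothetical thresholds in seconds; the real windows are not disclosed.
IMPRESSION_CLICK_REFRACTORY = 0.5
IMPRESSION_STALENESS_WINDOW = 1800.0


def classify_click_timing(
    impression_ts,
    click_ts,
    min_delay=IMPRESSION_CLICK_REFRACTORY,
    max_delay=IMPRESSION_STALENESS_WINDOW,
):
    """Classify a click by the delay between its impression and the click."""
    delay = click_ts - impression_ts
    if delay < min_delay:
        # Also covers a click timestamped before its impression (negative delay).
        return "invalid: impression-click refractory"
    if delay > max_delay:
        return "invalid: stale impression"
    return "valid"
```

For example, with these assumed thresholds a click 30 seconds after the impression is classified as valid, while a click one hour after the impression is treated as out of context.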
Repeat-click refractory period
Rapid repeat clicks can also signify robotic behavior or non-commercial intent. For a second click to be considered billable, a refractory period (that is, a minimum delay between repeated ad clicks) must elapse between the two clicks.
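As a rough sketch of the repeat-click rule, the function below marks each click on a single impression as billable or not, based on the delay since the previous click. The refractory value and the function name are hypothetical, and comparing against the immediately preceding click (whether or not that click was itself billable) is an assumption drawn from the wording of the multiple-click-per-impression method.

```python
# Hypothetical value in seconds; the real refractory period is set by Microsoft.
REPEAT_CLICK_REFRACTORY = 2.0


def filter_repeat_clicks(click_times, refractory=REPEAT_CLICK_REFRACTORY):
    """Return (timestamp, billable) pairs for the clicks on one impression.

    A click is discarded when it follows the previous click on the same
    impression by less than `refractory` seconds.
    """
    result = []
    previous = None
    for ts in sorted(click_times):
        billable = previous is None or (ts - previous) >= refractory
        result.append((ts, billable))
        previous = ts  # compare against the previous click, billable or not
    return result
```

Under these assumptions an accidental double-click (two clicks 0.3 seconds apart) yields one billable click and one discarded click.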
Microsoft Advertising uses machine learning systems to predict the quality of traffic and determine commercial intent. Microsoft Advertising invalidates traffic that fails to meet a minimum level of quality as assessed through these techniques.
Served ad location
Microsoft Advertising might choose to not bill for a click that originates from a source other than the source that requested the ad.
Each click request must have an appropriate request type (GET or POST, for example) and response code (300-series, for example).
Microsoft Advertising uses several techniques, including upfront filtration, to detect and exclude invalid traffic. Upfront filtration may affect up to a third of the traffic and includes industry-standard techniques such as blocklists and allowlists for risky endpoints.
Self-announced prefetch activity
Mozilla-based prefetchers should set the X-MOZ HTTP header to a value of "prefetch". To ensure that clicks are counted only upon a direct user request for the ad and the required ad interaction, Microsoft Advertising blocks prefetch ad requests by returning a 403 (Forbidden) response code.
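A minimal sketch of this check follows. It assumes the request headers are available as a simple mapping; the function name and the convention of returning `None` to mean "continue with normal handling" are hypothetical and are not taken from any Microsoft Advertising interface.

```python
def prefetch_block_status(headers):
    """Return 403 for a self-announced prefetch request, otherwise None.

    `headers` maps HTTP header names to values. Header names are compared
    case-insensitively, as HTTP header names are not case-sensitive.
    """
    normalized = {name.lower(): value.strip().lower() for name, value in headers.items()}
    if normalized.get("x-moz") == "prefetch":
        return 403  # Forbidden: the prefetch request is blocked
    return None  # not a self-announced prefetch; handle normally
```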
IAB/ABCe International Spiders & Robots List
The IAB/ABCe International Spiders & Robots List is used to identify and remove known bots.
IAB/ABCe International Known Browsers List
The IAB/ABCe International Known Browsers List (included in the Spiders & Robots List) is used to identify browsers that conform to known browser types. Browsers that are not on this list are filtered.
Internal Microsoft Advertising filter lists
Microsoft Advertising maintains various internal house filter lists, including:
- The House User Agent (“Crawler”) Filter List is a list of user agents that Microsoft Advertising has determined are robotic.
- The House IP Filter List helps track IPs that are identified as demonstrating robotic or unauthorized activity, so that the IPs can be removed from future billing. The IPs on this filter list are monitored to determine whether they should be removed from the list.
- The House Formcode Filter List identifies formcodes that should not be used for billing. A formcode is a query-string parameter of the form FORM=x that is added to the URLs of Microsoft-owned webpages. One of these formcodes, MONITR, identifies an internal test process. Any incoming request with the query string FORM=MONITR is deemed non-billable by Microsoft Advertising.
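The formcode check can be sketched with the Python standard library. Only the MONITR formcode comes from the text above; the set name, the function name, the example URLs, and the case-insensitive matching are assumptions made for illustration.

```python
from urllib.parse import parse_qs, urlsplit

# MONITR is named in the text; any other entries on the real list are not disclosed.
NON_BILLABLE_FORMCODES = {"MONITR"}


def has_non_billable_formcode(url, formcodes=NON_BILLABLE_FORMCODES):
    """Return True if the URL's query string carries a filtered FORM value."""
    query = parse_qs(urlsplit(url).query)
    for name, values in query.items():
        # Case-insensitive matching is an assumption, not a documented behavior.
        if name.upper() == "FORM" and any(v.upper() in formcodes for v in values):
            return True
    return False
```

For instance, under these assumptions a request to a hypothetical URL such as `https://example.com/search?q=shoes&FORM=MONITR` would be flagged as non-billable.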
External filter lists
Microsoft Advertising also uses many external filter lists collected by the Blacklist Capture System.
Defect and data integrity checks
Microsoft Advertising analyzes click and impression records to determine whether they are properly formed. Any records found to be defective are not billed. Note that defective records are not ordinarily reported in customer-facing reports. As a result, more clicks might be observed from an advertiser landing page than are reported by Microsoft Advertising. Here are two examples of defect and data integrity checks:
- Missing listingIDs: A click that has a listingID that does not match any known active campaign is regarded as defective and is filtered as invalid.
- Improper click links: A click that fails certain tests for integrity, which might indicate tampering or some other error, is rejected as defective and is filtered as invalid.
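The first of these checks can be illustrated with a short sketch. The record layout (a dictionary with a `listingID` key) and the function name are hypothetical; the sketch covers only the listingID check, because the integrity tests applied to click links are not described in enough detail to illustrate.

```python
def is_defective_click(record, active_listing_ids):
    """Return True if the click's listingID is missing or matches no active campaign.

    `record` is a dict representing one click; `active_listing_ids` is a set
    of listingIDs for currently active campaigns.
    """
    listing_id = record.get("listingID")
    return listing_id is None or listing_id not in active_listing_ids
```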
Test traffic is any traffic that uses FORM=MONITR, originates from a Microsoft or Yahoo internal IP address, or is a recognized benign crawler or scraper. Test traffic is not billed.
Microsoft Advertising uses a variety of bots, such as msnptc for page-ad analysis. These bots navigate directly to the advertiser landing page, bypassing the paid links, and do not affect click counts. The search engine indexing bot, bingbot, is an exception to this rule: it navigates through paid links to reach the advertiser page. This bot is critical for ensuring that the advertiser page appears in organic listings as well as paid listings. Activity from this bot is classified as test traffic and is not billed.
Robot instruction files
Microsoft Advertising maintains robots.txt files on critical servers to discourage robotic activity from affecting click counts. However, because internet bots might not comply with the robots.txt file, they might still reach the advertiser's webpage.
- Redirection server: A robots.txt file on the Microsoft Advertising redirection server contains the following:
  User-agent: *
  Disallow: /
  This file notifies bots that they should not crawl through the paid links on the search engine results page to reach the advertiser’s site.
- Bing: To discourage inflated impressions, a robots.txt file is maintained on Bing.com to notify bots that they should not request search results.
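To show how a compliant bot would interpret the redirection server's rules, the sketch below feeds the same two directives to Python's standard `urllib.robotparser`. The bot name and the URL are hypothetical placeholders.

```python
from urllib.robotparser import RobotFileParser

# The two directives quoted above for the redirection server.
ROBOTS_TXT = """User-agent: *
Disallow: /
"""


def may_crawl(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if a compliant bot with this user agent may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, `may_crawl("examplebot", "https://redirect.example.com/click?id=1")` returns `False` for any user agent and any path. This reflects only what a compliant bot would do; as noted above, bots that ignore robots.txt are not stopped by it.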
IP address lookups are used to determine the geographical location of search users. Because of limitations in geographical lookups, geotargeted ads might be served outside the requested geographical area. Conversely, an ad that could have been served within the requested location might not be served because of an error in the lookup. For example, the IP address might belong to a proxy located somewhere other than the location of the search user.
Prohibition of bot activity
The use of any unauthorized automated process to access Microsoft Advertising is prohibited. Unauthorized bots that ignore this prohibition and robots.txt protocols might click paid links. Microsoft Advertising makes every effort to filter out this traffic when it occurs.
Sophisticated invalid traffic
Microsoft Advertising has implemented Sophisticated Invalid Traffic (SIVT) detection procedures in line with those defined by the MRC SIVT filtration requirements. Sophisticated invalid traffic consists of situations that are more difficult to detect and that require advanced analytics, multi-point corroboration or coordination, significant human intervention, and similar measures to analyze and identify. SIVT filtration methods are based on activity-based rules, machine learning, and the results of human investigations.
The amount of data used to train Microsoft Advertising machine learning models differs depending on the nature of the model and data availability. For some machine learning models, 100% of user activities are currently used. For other machine learning models, user activities are sampled. Currently, both approaches use data from a period of 1 week. Microsoft Advertising employs both supervised and unsupervised machine learning models, such as deep neural networks, tree-based models, and more. Systems and processes are in place to ensure that machine learning models are performing properly. These rely on human activities, including ongoing alert-based investigation, quarterly machine learning model review, and machine learning training label corrections. New machine learning model development occurs as frequently as weekly.
Nature and scope of process and transaction auditing exercised
On a periodic basis, internal and external auditors perform procedures to gain assurance over the transactions and processes employed in invalid traffic detection.