In simple terms, user tracking describes the practice of collecting data about users over time (Kraft, Miller, and Skiera 2021). User tracking gathers data that reveal insights into various characteristics of the user, such as the user’s demographics (e.g., female), interests (e.g., high interest in fashion), brand preferences (e.g., Adidas), or purchase intentions (e.g., being in the market for sports shoes). Publishers and advertisers can use such tracking of a user over time to generate a profile for the user to target him or her with unique advertising or content.
Numerous technologies exist for user tracking; in what follows, we discuss some of the most important ones. For clarity of presentation, in our discussion, we classify the various technologies along two main dimensions:
The number of devices on which the user is tracked: just one device (i.e., single-device tracking) versus multiple devices (i.e., cross-device tracking);
The number of websites on which the user is tracked: a single publisher’s website (i.e., first-party website) versus multiple websites (i.e., third-party websites).
Figure 7 presents this classification.
Single-device user tracking technologies track a user only on one specific device (e.g., on a desktop computer, mobile phone, or tablet). Moreover, in most cases, single-device user tracking technologies only track a user within one browser (e.g., Google Chrome) on that device. The most popular single-device tracking technologies are first- and third-party cookies. Additional technologies include digital fingerprinting, advertising identifiers, local storage, and tracking pixels.
Digital fingerprinting involves gathering information about a user’s device, and exploiting this information to identify the user. Fingerprinting can be either passive or active. Passive fingerprinting involves gathering information about the configuration of a user’s device. Such a configuration has many attributes—e.g., CPU type, computer clock skew, display settings, scripts that are used, browser and operating system information, IP address, or language settings—and a passive fingerprint is essentially a string that contains all of this information. For example, the string “intel:00:00:01:chrome:windows” would be a passive fingerprint that includes CPU type, computer clock skew, browser, and operating system. Because there are so many different ways to configure a device, the specific combination that a particular user has is likely to be unique, thereby providing a means of identifying the user. Still, there is no guarantee that there are no other devices with the same combination of these attributes. Active fingerprints, in turn, are digital fingerprints that include information that is guaranteed to be unique to the user’s device (e.g., the media access control (MAC) address provided by the chipmaker). To get an active and thus unique fingerprint, the publisher or the advertiser interested in tracking the user installs executable code on the user’s device and reads its MAC (or another unique serial number).
A publisher can use active or passive fingerprinting to track a user on its first-party website. Advertisers can use active or passive fingerprinting to follow a user on third-party websites. Advertisers can obtain the information required to generate a fingerprint by displaying an ad on the publisher’s website. When the user accesses the publisher’s website, the user’s browser loads the website’s content from the publisher’s server, and it loads the content of the ad from the advertiser’s server (see Figure 6). When the user’s browser accesses the advertiser’s server to display the ad, the advertiser’s server can generate the digital fingerprint. In addition to tracking a user on a third-party website, advertisers (such as Adidas) can also track a user on their own (first-party) websites (e.g., Adidas.com) using active or passive fingerprinting.
Behavioral biometric features, namely dynamics that occur when typing, moving, and clicking the mouse, or touching a touch screen, can provide further information to improve active digital fingerprinting and, hence, user identification.
Another single-device user tracking technology used on mobile devices (so-called mobile apps) relies on advertising identifiers, called mobile ad IDs (MAIDs). An advertising identifier is a string of hexadecimal digits assigned to a given device by the device’s operating system, e.g., Apple’s iOS or Google’s Android. Apple’s MAID is called Identifier for Advertisers (IDFA), and Google’s MAID is called Google Advertising Identifier (GAID). The identifiers are device-specific. Thus, all ad networks in all apps running on the same device will get the same ID. In mobile browsers, the advertising IDs are not usable. Advertising identifiers are nowadays also used for other connected devices such as for example voice assistants, connected television (CTV), or over-the-top (OTT) devices.
Local storage–based tracking relies on the possibility to store data in the so-called local storage of the user’s browser. Publishers and advertisers can use the local storage to save text-based information such as a unique user ID and other information to track a user’s online behavior. The browser’s local storage is a place to store items that are usually not passed back and forth constantly to publishers’ or advertisers’ servers. Also, first- and third-party websites can access and use local storage to identify a user. The local storage is usually part of the user’s browser and allows publishers and advertisers to save data with up to 5 MB in the user’s browser. There is no expiration date for the data stored. Thus, data items within the local storage are available until the website or the user deletes them. One downside of local storage is that it is not very secure. Therefore, unencrypted private or personal information should not be stored in the local storage.
Tracking pixels are invisible to the user and do not store on a user’s computing device. Accordingly, without inspecting a website’s underlying HTML code, users cannot know whether they are being tracked by a pixel. Tracking pixels can also document how far a user scrolls down a page.
Cross-device user tracking technologies enable a user’s online behavior to be tracked across multiple devices. One means by which firms accomplish cross-device user tracking is by asking a user to log in to a personal account from any device connected to the internet. For example, if a user uses multiple devices—e.g., her mobile phone, her laptop, and her desktop computer—to access a particular website (e.g., her favorite news website, e-mail service, or social networking site), the website can easily and accurately track her activities across all those devices (and across multiple browsers within those devices) on the basis of her login. As will be elaborated in what follows, a login can facilitate cross-device tracking not only on first-party websites but also on third-party websites.
Technically, a user login on a first-party website is accomplished using a single-device user tracking technology such as a cookie. Suppose a user accesses a website through a web browser on her device. In that case, the website can implement the user login by placing a cookie on the device to remember the user in the future. In this case, the cookie enables a so-called automatic login so that the user does not have to reenter her password every time she visits the website. Such a login identifies a user across multiple visits to the same website.
However, firms also use a cookie to keep a user logged in while the user browses multiple webpages during a single visit to a website. Other devices may allow similar tracking tools to enable the website to recognize the device in the future, such as a device-specific advertising identifier on smartphones. The user’s data is then typically stored on the server of the website that provides the login to the user.
Another form of user login that tracks the user across multiple third-party websites is the single sign-on (SSO). Here the user login is forwarded by the provider of the user login to other websites. From the user’s perspective, only one login exists. With this user login, the user can quickly log in to all websites that support the SSO. Examples of SSO providers are Facebook, Google, and the German provider netID. NetID was established in March 2018 as a foundation to offer an independent alternative to the SSO offerings of Google and Facebook (see also Section 9.2).
Table 1 presents a comparison of the user tracking technologies discussed in the previous subsections. We compare the various technologies by the following six criteria:
User Identification: Describes whether a user tracking technology identifies a user on a first-party website (e.g., the publisher’s website) or third-party website (e.g., other publishers’ websites), and whether a user tracking technology identifies a user on a single device (e.g., only on a desktop computer) or on multiple devices (e.g., on a desktop computer and a mobile phone).
Storage of User Identifier: Describes whether a user tracking technology stores a user’s identifier (e.g., a cookie) on the user’s side (i.e., the user’s client, for example, a user’s browser) or on the firm’s side (i.e., the firm’s server).
Storage of Information on User: Describes whether a user tracking trechnology stores a user”s information on the user’s side (i.e., the user’s client, for example, a user’s browser) or on the firm’s side (i.e., the firm’s server).
Expiration of User Identifier: Describes whether a user identifier (e.g., a cookie) expires after some pre-defined date (e.g., after one year of setting the user identifier).
Deletability of User Identifier and Information on User: Describes whether the user can delete the user’s identifier (e.g., a cookie) or the information about the user (e.g., by deleting the user’s browser cache).
Alteration of User Identifier: Describes whether a user can alter the user identifier, for example, by changing the user’s browser configuration (e.g., choosing a different browser font or language) or by changing the user’s login information (e.g., the user’s email address).
Table 1 illustrates that the most favorable user tracking technology from a firm’s perspective is the SSO. An SSO allows a firm to track a user on its own (first-party) website and on other (third-party) websites, as well as across multiple devices (e.g., on the user’s desktop and mobile device). Because the user’s identifier and information are stored on the firm’s side (i.e., the firm’s server), a firm has more control over the identifier and the data collected on a specific user in the past. In addition, the SSO does not expire (unlike cookies, for example). However, the user can delete or alter the SSO, preventing a firm from connecting existing data to new data from the same user.
Despite their advantages, SSOs are difficult for a firm to obtain, as not all firms provide users with sufficient value to justify their signing up for an SSO. In such a case, firms have to rely on other tracking technologies such as cookies, which may also explain cookies’ enduring popularity as a user tracking technology despite their disadvantages.
In this subsection, we discuss the practical applications of the technologies discussed above, from advertisers’ and publishers’ perspectives. Following Kraft, Miller, and Skiera (2021), we distinguish between tracking, profiling, and targeting (Figure 8). Loosely speaking, as noted above, tracking refers to collecting data about users over time (which might include personal data). Profiling involves identifying the data that are valuable for the firm, and using these data to create information about individual users (e.g., characterizing users according to demographic information such as age and gender). This step can enable a firm to distinguish between users that it views as more valuable versus less valuable. Finally, targeting refers to using these profiles to treat some users differently from others. For advertisers, targeting involves selecting profiles of users who are likely to be suitable audiences for a specific ad (e.g., women with kids), or conversely, selecting ads that are likely to be suitable for a specific user. For a publisher, targeting generally involves presenting users with content (e.g., news content for a news publisher) that suits their interests.
We note that herein, we focus on targeting users on the basis of data that have been collected about them through tracking technologies; this form of targeting is referred to as “behavioral targeting” in Figure 9. It contains “retargeting,” also referred to as “remarketing” or “behavioral retargeting” (Lambrecht and Tucker 2013; Bleier and Eisenbeiss 2015; Sahni, Narayanan, and Kalyanam 2019). A typical setting for retargeting is an online shop where a user puts a product into a shopping basket but does not purchase it. The online shop can now inform a retargeting provider such as Criteo about this behavior. The retargeting provider then puts up ads of the online shop and the abandoned product on many other websites. So, the user suddenly observes an ad about the specific product on another website (e.g., an online newspaper) even if this website is unrelated to the online shop (Miller and Skiera 2022).
Figure 9 outlines that “contextual targeting” is the other major form of targeting in online advertising. It uses the context in which the user appears (e.g., viewing a news forum on investment advice) to draw conclusions about the user’s interests and the ads that are likely to be relevant for her. For example, a user reading an article about investment advice might be interested in financial products.
The capacity to accurately target users benefits advertisers in enabling them to avoid wastage, i.e., displaying ads to irrelevant users (users who are unlikely to wish to purchase the advertised products). For example, all else being equal, if an advertiser decreases wastage from 90% to 50%, then for any 10 users viewing the ad, the number of relevant users viewing the ad is expected to increase from 1 to 5. Advertisers are willing to pay for such a decrease in wastage: In our example, since the advertiser becomes five times more likely to reach a relevant user, she might be willing to pay five times more for the ad.
A common prerequisite for being willing to pay more for an ad is the ability to measure the success of an ad, and thereby to confirm that the ad is indeed reaching a relevant audience (in our example, this would mean confirming that the number of relevant users has increased from 1 to 5). Many success measures exist, with the most common being the following:
users’ probability of clicking on the ad, referred to as the “click-through rate” (i.e., the number of clicks divided by the number of impressions of the ad);
users’ probability of converting, referred to as the “conversion rate” (i.e., the number of conversions divided by the number of clicks on the ad); in many cases, a conversion is defined as a purchase, but the term can also refer to a wide range of other actions that benefit the advertiser, such as subscribing to an online newsletter or signing up for a product demo;
the product of the click-through rate and the conversion rate.
Advertisers’ use of the success metrics above is not contingent on user tracking and profiling. That is, advertisers can use these metrics to compare the success rates of different ads, or of the same ad across different contexts—even without possessing knowledge of the specific behavior of individual users. Consider, for example, an advertiser who displays two ads for the same product, ad X and ad Y. If the advertiser observes that 50% of users who viewed ad X clicked on it, whereas 0% of users who viewed ad Y clicked on it, then she can determine that ad X was more successful than ad Y, even without knowing which specific users clicked on each ad.
Yet, information about individual behavior—obtained through the tracking technologies discussed above—enables the advertiser to analyze ad success on a more granular level. Going back to our example, let us assume that the advertiser can observe that ad X was viewed by user A—a male—and by user B—a female, and that user A clicked on the ad, whereas user B did not. In turn, ad Y was viewed by users C and D—both female—neither of whom clicked on the ad. This information might suggest to the advertiser that females are a less relevant audience for the advertised product, and that ad X was only more successful than ad Y because it was also shown to males.
The more detailed the information at the advertiser’s disposal, the greater the capacity of the advertiser to link users’ characteristics—e.g., demographics and interests such as being female and interested in running shoes —to their reactions towards the ad (e.g., their likelihood of clicking the ad). After establishing these links, the advertiser can derive the characteristics of users who are most likely to click, and subsequently target those users, i.e., ensure that ads are displayed only to them. For example, on the basis of user responses, an advertiser selling a protein shake may determine that its target audience is male users between the ages of 30 and 40 who are interested in sports. The advertiser can then ensure that she displays her ad only to users whose profiles match those characteristics. However, it is important to note that though user profiles may contain an enormous amount of information, this information is not always accurate or consistent (Neumann, Tucker, and Whitfield 2019; Kraft, Miller, and Skiera 2021). Erroneous profiles decrease advertisers’ success with targeted ads and, thus, their willingness to pay for ads.
In addition to facilitating user targeting, tracking can enable advertisers to ensure that the same user is not exposed to the same ad too many times. Limitation of exposure can be achieved either through “frequency capping,” i.e., limiting the number of times a user sees a particular ad, or through “recency capping,” i.e., making sure that a minimum amount of time has elapsed since the user last saw the ad. Such capping could save the advertiser money and might also avoid annoying the user too much with the ad.
Finally, tracking enables the advertiser to conduct attribution modeling. It determines how much value each of several actions (also referred to as events or touchpoints) contributes to the desired outcome. Suppose, for example, that the user clicked on two ads and then purchased a product. The question is then whether both ads contributed equally to that purchase (so an attribution of 50% each) or not.
Publishers also have an interest in tracking, profiling, and targeting. First, a publisher may offer a wide range of content, with different levels of appeal for each user. In these cases, the publisher may want to present each user with the content that is most suited to the user’s interests. For example, a news website could prioritize displaying news about the user’s favorite sports team or show the weather forecast for the particular area where the user lives. Profiling users can enable publishers to personalize their content in this manner.
Second, a publisher can track users to observe what they are doing on the website, and then use this knowledge for various purposes—such as improving the website. For example, user behavior might lead a publisher to make changes to the user interface (e.g., the publisher observes that users often leave the website on a particular page and then realizes that links were missing from the page), to the content of the website (e.g., by recognizing that certain topics of news articles are more attractive than others) or the presentation of the content (e.g., more pictures versus more text and vice versa). The improved website could then attract more users.
Third, user tracking enables publishers to document their websites’ reach. While it is possible to measure a website’s overall number of page impressions without tracking individual users, tracking is necessary in order to measure the number of unique (or different) users who visit a website—for the simple reason that such measurement requires observing whether a given user has visited the website before.
Advertisers often prefer publishers with an extensive reach. Accordingly, publishers are interested in reporting a high reach, and advertisers fear that publishers might over-report their reach. To avoid a lack of trust, publishers often ask a (trusted) third party to conduct the measurement, such as AGOF in Germany, which relies on information provided by INFOnline. AGOF also provides examples of their reports on their website (e.g., at www.agof.de/studien/daily-digital-facts/monatsberichte/, however, only in German).
The fourth benefit of tracking relates to the fact that the price that a publisher realizes from an ad impression is a function of the advertiser’s willingness to pay (WTP) to display an ad to a particular user on the publisher’s website. As discussed above, advertisers value the capacity to target specific users; thus, information that the publisher obtains about the user from tracking can, in theory, increase or decrease ad prices (e.g., Board 2009, Chen and Stallaert 2014, Levin and Milgrom 2010).
The following example illustrates how information gathered about a user can influence the ad prices that a publisher commands. Suppose there are two advertisers: Advertiser A, a producer of sports cars, and advertiser B, a producer of SUVs. Both advertisers prefer to target male users over female users. The advertisers are informed that an ad impression is available on a car review website. If the advertisers are informed that the user who will view the ad impression is male—as opposed to being provided with no information about the user’s gender—their WTP may increase, leading to higher bids in the auction. As a result, the final price that the publisher receives for this impression can increase.
Yet, the influence of individual information on ad prices is not always straightforward. Suppose that, in addition, to determining that the user is male, the publisher has also determined that the user is interested in sports cars. This information might increase the WTP of advertiser A even more—but the WTP of advertiser B could decrease. In this case, the auction could lead to a lower price than in the situation where both advertisers only had information about the user’s gender. Thus, theoretically, information about a user can increase or decrease ad prices. Empirical evidence suggests, however, that, on average, more information leads to higher ad prices (Johnson, Shriver, and Du 2020; Laub, Miller, Skiera 2021).
There is little doubt that tracking, profiling and targeting benefit advertisers and publishers. It is, however, much more challenging to evaluate how users are affected by such tracking. In particular, users face a trade-off between certain benefits—such as utility from a more personalized browsing experience—and drawbacks, particularly a loss in privacy. In what follows, we elaborate on these benefits and drawbacks.
Personalization of content on publishers’ websites—such as news feeds on social networks or product recommendations on online shopping or streaming platforms—can increase users’ utility by providing them with experiences more directly related to their interests (Celis et al. 2019). It is important to acknowledge, however, that in providing personalized content, a publisher may seek to enhance not only the user’s utility but also its own profit. Though these two perspectives may be aligned—because it is hard to make a profit by exposing a user to products or content that she is not interested in—they can still differ. For example, if product A provides slightly more utility to a user than product B, but the firm makes much more profit on product B, the firm is tempted to recommend product B.
Content personalization may also have certain disadvantages from the user’s perspective. because an algorithmic recommendation can result in a “filter bubble,” also referred to as an “echo chamber.” In a filter bubble, a user is exposed only to content that is aligned with the user’s own cultural background or ideology—a situation that is, at least in the long-run, in the interest of neither the user nor society. Furthermore, personalized content also discloses the preferences of the user, which the user might not appreciate.
In addition to seeking to persuade users, advertising serves an informative function—for example, making users aware of products they may wish to purchase. Consequently, personalized (targeted) advertising may benefit users in providing them with information that is relevant to them, thereby helping them make better purchase decisions (for additional information on how users benefit from personalized ads, see a review by Boerman, Kruikemeier, and Borgesius 2017).
Personalized advertising can also benefit users in more indirect ways. The most obvious is that better targeting of ads enables the publisher to command higher prices for ads—and this revenue may enable the publisher to provide more content for “free,” i.e., without charging a fee to users. Notably, as discussed in Section 2.1, such content is not actually “free,” as users “pay” for it with their data and with the attention they devote to ads. Still, some users (e.g., those with little income) may prefer this mode of payment instead of paying with actual money.
From the user’s perspective, the major drawback of tracking is a violation of privacy. In general, surveys consistently suggest that users are uncomfortable with the idea of firms tracking their behavior over time and building up profiles. For example, a survey in 2019 showed that 79% of users worldwide are concerned about how firms use their data (Pew Research Center 2019). In another survey, 40% of users indicated feeling they have no control over their data (Presthus and Sørum 2018), partly due to an inability to oversee which data firms collect.
These reports notwithstanding, it is complex to evaluate users’ actual preferences with regard to their privacy, because there is a significant disparity between users’ stated preferences concerning privacy and the actual steps they take to protect it. This disparity is referred to as the “privacy paradox” (Aguirre et al. 2015, Norberg, Horne, and Horne 2007, Beke, Eggers, and Verhoef 2018). Usually, stated preferences reflect a desire for high privacy levels, whereas revealed preferences reflect an acceptance of substantially lower levels. For example, Gross and Acquisti (2005) scraped users’ real-world social media privacy settings and found that few (1.2%) had altered permissive default settings. Athey, Catalini, and Tucker (2017) also documented a large gap between stated and revealed preferences for privacy.
Regardless of what users actually prefer, and as we will see below, regulators have concluded that a user’s loss of privacy outweighs the potential benefits that come with better personalization of content and ads.
The main takeaways from Section 3 are:
Cookies are a popular tracking technology, but they only track a user’s behavior on one device and in one browser. Furthermore, users can easily delete cookies or prevent that firms set cookies.
Single-Sign-On (SSO) is the most favorable user tracking technology from a firm’s perspective. However, not all firms provide users with sufficient value to justify signing up for an SSO service. This shortcoming of SSOs may explain the enduring popularity of other tracking technologies, such as cookies.
Advertisers and publishers benefit from tracking, profiling, and the resulting ability to target users with personalized content or ads. Tracking also enables advertisers to measure the success of advertising.
It is challenging to evaluate whether users benefit from tracking, profiling and targeting. It requires trading-off between certain benefits—such as utility from a more personalized browsing experience—and drawbacks, particularly a loss in privacy.
Measuring users’ preference for privacy is tricky, because most users say that privacy is important—but their actions reveal a far more lenient stance toward privacy, a phenomenon called the “privacy paradox.”