Web measurement studies can shed light on not yet fully understood phenomena and thus are essential for analyzing how the modern Web works. This often requires building new and adjustinng existing crawling setups, which has led to a wide variety of analysis tools for different (but related) aspects. If these efforts are not sufficiently documented, the reproducibility and replicability of the measurements may suffer—two properties that are crucial to sustainable research. In this paper, we survey 117 recent research papers to derive best practices for Web-based measurement studies and specify criteria that need to be met in practice. When applying these criteria to the surveyed papers, we find that the experimental setup and other aspects essential to reproducing and replicating results are often missing. We underline the criticality of this finding by performing a large-scale Web measurement study on 4.5 million pages with 24 different measurement setups to demonstrate the influence of the individual criteria. Our experiments show that slight differences in the experimental setup directly affect the overall results and must be documented accurately and carefully.
Third-party tracking is a common and broadly used technique on the Web. Different defense mechanisms have emerged to counter these practices (e.g. browser vendors that ban all third-party cookies). However, these countermeasures only target third-party trackers and ignore the first party because the narrative is that such monitoring is mostly used to improve the utilized service (e.g. analytical services). In this paper, we present a large-scale measurement study that analyzes tracking performed by the first party but utilized by a third party to circumvent standard tracking preventing techniques. We visit the top 15,000 websites to analyze first-party cookies used to track users and a technique called “DNS CNAME cloaking”, which can be used by a third party to place first-party cookies. Using this data, we show that 76% of sites effectively utilize such tracking techniques. In a long-running analysis, we show that the usage of such cookies increased by more than 50% over 2021.