Can you trust every browser to generate globally unique identifiers at scale? At Teads, we have tried, and the answer is yes, with a few caveats. This article describes the experiments we’ve run and the discoveries we made along the way. By Matthieu Wipliez, senior software engineer @ Teads.
Generating unique identifiers is a common need for third-party scripts integrated into web pages and e-commerce sites, whether for analytics, marketing, or advertising.
Once these scripts are used at a big enough scale, they are almost always loaded from a CDN (Content Delivery Network), both to get optimal response times and to reduce the load on origin servers.
This means that scripts cannot be generated on the fly. A workaround could be (or used to be) to have the CDN generate a unique identifier and store it in a cookie, except that user privacy legislation such as the GDPR and the ePrivacy Directive in Europe or the CCPA in the USA prevents cookies from being set until the user has given their unambiguous consent.
The article then deals with:
- Uniquely identifying advertising experiences
- Universally Unique IDentifiers
- Pick your version (4 versions of UUID)
- Let’s generate a UUID in the browser
- Experiments for UUID generation
- Analysis of generated UUIDs
- Collisions
The vast majority of browsers (99.9%) provide the APIs needed to generate random (version 4) UUIDs, either with URL.createObjectURL or crypto.getRandomValues. From what we have seen in the source code of major browsers, the implementation of these functions is of a similar quality to what can be found on servers. It is therefore highly surprising that they generate a significant number of collisions with 5 non-unique identifiers per million.
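To make the `crypto.getRandomValues` route concrete, here is a minimal TypeScript sketch of building a random (version 4) UUID from 16 random bytes. The names `uuidV4FromBytes`, `generateUuidV4`, and the injected `fillRandom` parameter are assumptions made for this illustration and do not come from the article. The version and variant bits follow the layout in the UUID specification.

```typescript
// Source of randomness: fills the given buffer with random bytes and returns it.
// In a browser this would typically be: (buf) => crypto.getRandomValues(buf)
type FillRandom = (buf: Uint8Array) => Uint8Array;

// Hypothetical helper: formats 16 random bytes as a version 4 UUID string.
function uuidV4FromBytes(random: Uint8Array): string {
  if (random.length !== 16) {
    throw new Error("expected exactly 16 random bytes");
  }
  const bytes = new Uint8Array(random); // copy, so the caller's buffer is untouched
  bytes[6] = (bytes[6] & 0x0f) | 0x40; // high nibble 0100: version 4
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // top two bits 10: standard variant
  let hex = "";
  for (let i = 0; i < bytes.length; i++) {
    // Adding 0x100 guarantees three hex digits; dropping the first keeps two.
    hex += (bytes[i] + 0x100).toString(16).slice(1);
  }
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}

// Hypothetical wrapper: draws 16 bytes from the supplied source and formats them.
function generateUuidV4(fillRandom: FillRandom): string {
  return uuidV4FromBytes(fillRandom(new Uint8Array(16)));
}
```

In a browser, `generateUuidV4((buf) => crypto.getRandomValues(buf))` would yield a fresh identifier on each call. Passing the randomness source as a parameter is only a convenience here: it keeps the formatting logic testable with fixed bytes.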
Upon closer inspection, the APIs are not at fault. Rather, these collisions seem to be mainly (92%) due to Googlebot and some other Google-related services. The rest of the collisions (8%) either come from a fringe browser (PS Vita) or automated browser agents (HTML to PDF converters), or are associated with fraudulent activity, most likely because of man-in-the-middle agents/proxies.
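For readers who want to reproduce this kind of measurement on their own logs, the following TypeScript sketch counts duplicates in a list of logged identifiers. It is not the article's analysis pipeline. The function `duplicateStats`, the `DuplicateStats` shape, and the two metrics it exposes are assumptions made for illustration, and the article may define its rates differently.

```typescript
interface DuplicateStats {
  totalRequests: number;
  distinctIds: number;
  nonUniqueIds: number; // distinct identifiers seen more than once
  requestsWithDuplicateId: number; // requests whose identifier is seen more than once
}

// Hypothetical helper: one entry of `ids` per logged request.
function duplicateStats(ids: string[]): DuplicateStats {
  // Prototype-less object so ids such as "constructor" cannot clash with built-ins.
  const counts: Record<string, number> = Object.create(null);
  for (const id of ids) {
    counts[id] = (counts[id] || 0) + 1;
  }
  let distinctIds = 0;
  let nonUniqueIds = 0;
  let requestsWithDuplicateId = 0;
  for (const id in counts) {
    distinctIds++;
    if (counts[id] > 1) {
      nonUniqueIds++;
      requestsWithDuplicateId += counts[id];
    }
  }
  return {
    totalRequests: ids.length,
    distinctIds,
    nonUniqueIds,
    requestsWithDuplicateId,
  };
}
```

Under these assumed definitions, `nonUniqueIds / distinctIds` measures identifiers, while `requestsWithDuplicateId / totalRequests` measures requests. A single identifier that is replayed many times inflates the second ratio far more than the first.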
What the author found initially was that close to 2 requests per thousand carried a duplicate UUID. This is sobering, to say the least. _The theory says that there’s a 50% chance of having one collision if you generate 1 billion UUIDs per second for 85 years. In our case, we will be generating about 1 billion UUIDs per day, so we should be safe for about 7 million years._ The difference is that we were looking at duplicated requests instead of colliding identifiers. Plenty of charts explaining various concepts, along with links to further reading, are provided. Excellent read!
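The quoted figures can be sanity-checked with the usual birthday-bound approximation, p ≈ 1 - exp(-n² / (2·2^122)), where 122 is the number of random bits in a version 4 UUID. The TypeScript sketch below is an illustration under that approximation. The names `collisionProbability` and `countForProbability` are made up for this example, and the rates of 1 billion UUIDs per second and per day are taken from the quote above.

```typescript
// A version 4 UUID has 128 bits, of which 4 (version) and 2 (variant) are fixed.
const RANDOM_BITS = 122;

// Approximate probability of at least one collision among n random identifiers.
function collisionProbability(n: number, bits: number = RANDOM_BITS): number {
  const x = (n * n) / (2 * Math.pow(2, bits));
  // For tiny x, 1 - exp(-x) loses precision; x itself is a good approximation.
  return x < 1e-8 ? x : 1 - Math.exp(-x);
}

// Approximate number of identifiers needed to reach collision probability p.
function countForProbability(p: number, bits: number = RANDOM_BITS): number {
  return Math.sqrt(2 * Math.pow(2, bits) * Math.log(1 / (1 - p)));
}

const SECONDS_PER_YEAR = 365.25 * 24 * 3600;
const n50 = countForProbability(0.5); // roughly 2.7e18 identifiers

// Time to reach a 50% collision chance at the two rates mentioned in the quote.
const yearsAtOneBillionPerSecond = n50 / 1e9 / SECONDS_PER_YEAR;
const yearsAtOneBillionPerDay = n50 / 1e9 / 365.25;

console.log(yearsAtOneBillionPerSecond.toFixed(0) + " years at 1e9 UUIDs per second");
console.log((yearsAtOneBillionPerDay / 1e6).toFixed(1) + " million years at 1e9 UUIDs per day");
```

Under this approximation the two durations come out at roughly 86 years and a little over 7 million years, in line with the figures quoted above.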