Randomization in Online Experiments

Most scientists consider randomized experiments the best available method for establishing causality. Over the past twenty-five years, randomized experiments, often called A/B tests, have become common on the Internet. For practical reasons, much A/B testing does not use pseudo-random number generators to implement randomization. Instead, hash functions are used to transform the distribution of identifiers of experimental units into a uniform distribution. Using two large industry data sets, I demonstrate that the success of hash-based quasi-randomization strategies depends greatly on the hash function used: MD5 yielded good results, while SHA512 yielded less impressive ones.
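
To make the idea of hash-based assignment concrete, the following is a minimal Python sketch, not the procedure used in the paper. It assumes that a unit identifier is concatenated with an experiment-specific salt, hashed, and that the first 8 bytes of the digest are reduced modulo the number of buckets. The names assign_bucket and bucket_counts, the salt "exp-001", and the synthetic "user-N" identifiers are all hypothetical and chosen only for illustration.

```python
import hashlib
from collections import Counter


def assign_bucket(unit_id: str, salt: str = "exp-001",
                  n_buckets: int = 2, algo: str = "md5") -> int:
    """Map a unit identifier to a bucket in [0, n_buckets).

    Assumption (not from the paper): the salt and identifier are joined
    with ':' and only the first 8 bytes of the digest are used.
    """
    digest = hashlib.new(algo, f"{salt}:{unit_id}".encode("utf-8")).digest()
    # Interpret the leading 8 bytes as an unsigned big-endian integer.
    value = int.from_bytes(digest[:8], "big")
    return value % n_buckets


def bucket_counts(unit_ids, **kwargs) -> Counter:
    """Count how many identifiers land in each bucket."""
    return Counter(assign_bucket(u, **kwargs) for u in unit_ids)


if __name__ == "__main__":
    # Synthetic identifiers; real identifiers may be far less regular.
    ids = [f"user-{i}" for i in range(10_000)]
    for algo in ("md5", "sha512"):
        print(algo, sorted(bucket_counts(ids, algo=algo).items()))
```

The same identifier and salt always produce the same bucket, which is the practical appeal of this approach: no assignment table needs to be stored. Whether the resulting buckets are balanced on real identifiers is an empirical question, and a sketch on synthetic identifiers like this one says nothing about the MD5 versus SHA512 comparison reported in the abstract.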

Data and Resources


Golyaev, Konstantin (2018): Randomization in Online Experiments. Version: 1. Journal of Economics and Statistics. Dataset. http://dx.doi.org/10.15456/jbnst.2018192.235844