
sample

Performs reservoir sampling on very large newline-delimited input files, minimizing memory usage by storing byte offsets to lines rather than the lines themselves.



Performs reservoir sampling (Vitter, "Random sampling with a reservoir"; cf. http://dx.doi.org/10.1145/3147.3165) on very large newline-delimited text files. Sampling can be done with or without replacement. The approach used in this application reduces memory usage by storing a pool of 8-byte offsets to the start of each line, instead of the lines themselves.
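To illustrate the offset-based idea, here is a minimal Python sketch. It is not this project's implementation. The function names `sample_offsets` and `read_lines_at` are hypothetical. Sampling without replacement uses Vitter's Algorithm R. Sampling with replacement treats each of the k slots as an independent size-1 reservoir, which is an assumed approach chosen for simplicity. The reservoir is an `array('Q')` of unsigned 64-bit integers, mirroring the 8-byte offsets described above. Lines are read back afterwards by seeking to each stored offset.

```python
import random
from array import array
from typing import List, Optional


def sample_offsets(path: str, k: int, replace: bool = False,
                   rng: Optional[random.Random] = None) -> array:
    """Return byte offsets of k sampled lines from a newline-delimited file.

    Without replacement (Algorithm R): the first k offsets fill the reservoir,
    then line i (0-based) overwrites a random slot with probability k / (i + 1).
    With replacement (assumed approach): each slot independently keeps line i
    with probability 1 / (i + 1). If replace is False and the file has fewer
    than k lines, every line's offset is returned.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = rng or random.Random()
    reservoir = array("Q")  # unsigned 64-bit offsets, one per sampled line
    offset = 0              # byte offset of the start of the current line
    with open(path, "rb") as f:
        for i, line in enumerate(f):
            if replace:
                if i == 0:
                    # The first line is kept by every slot with probability 1.
                    reservoir = array("Q", [offset] * k)
                else:
                    for s in range(k):
                        if rng.randrange(i + 1) == 0:
                            reservoir[s] = offset
            elif i < k:
                reservoir.append(offset)
            else:
                j = rng.randrange(i + 1)
                if j < k:
                    reservoir[j] = offset
            offset += len(line)  # raw byte length, newline included
    return reservoir


def read_lines_at(path: str, offsets: List[int]) -> List[bytes]:
    """Seek to each stored offset and read one line back (newline stripped)."""
    out = []
    with open(path, "rb") as f:
        for off in offsets:
            f.seek(off)
            out.append(f.readline().rstrip(b"\n"))
    return out
```

In this sketch, memory held during the pass grows with k (the number of sampled offsets) rather than with the total size of the sampled lines. Calling `read_lines_at(path, sorted(sample_offsets(path, k)))` emits the sample in file order. Because the second step seeks, the input must be a regular, seekable file rather than a pipe.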