A model of genome length estimation based on k-mers detection

Mateusz Garbulowski, Andrzej Polański


Estimating genome length directly from raw sequencing data provides practical knowledge about the size of the DNA sequence at an early stage of analysis. In our research, we created a model based on random sampling of k-mers (very short DNA fragments), which we used to predict genome size. Furthermore, we compared the model results with empirical whole-genome sequencing data.
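
To make the idea of k-mer-based length estimation concrete, the following minimal sketch may help. It is not the model described in this paper. It illustrates the widely used heuristic that genome length is roughly the total number of sampled k-mers divided by the typical k-mer depth (here, the peak of the k-mer frequency histogram). The function names, the simulated error-free reads, and all parameter values are assumptions chosen for illustration only.

```python
import random
from collections import Counter


def simulate_reads(genome, n_reads, read_len, rng):
    """Draw reads uniformly at random from the genome (error-free; an assumption)."""
    max_start = len(genome) - read_len
    starts = (rng.randint(0, max_start) for _ in range(n_reads))
    return [genome[s:s + read_len] for s in starts]


def count_kmers(reads, k):
    """Count every k-mer occurrence across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def estimate_genome_length(counts):
    """Estimate length as (total k-mers sampled) / (peak k-mer depth)."""
    total_kmers = sum(counts.values())
    # Histogram: how many distinct k-mers were seen exactly d times.
    depth_histogram = Counter(counts.values())
    # The most common depth serves as the typical coverage of a genomic k-mer.
    peak_depth = max(depth_histogram, key=depth_histogram.get)
    return total_kmers / peak_depth


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical random genome with no biological structure.
    genome = "".join(rng.choices("ACGT", k=50_000))
    reads = simulate_reads(genome, n_reads=10_000, read_len=100, rng=rng)
    estimate = estimate_genome_length(count_kmers(reads, k=15))
    print(f"true length: {len(genome)}, estimated length: {estimate:.0f}")
```

With real sequencing data, sequencing errors add a large number of low-depth k-mers and repeats add high-depth ones, so the peak would normally be located after discarding the low-depth part of the histogram; this sketch ignores both effects.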


genome length estimation; genome size; sequencing model

DOI: http://dx.doi.org/10.21936/si2015_v36.n4.739