IJHPCA 2016 datasets

Introduction

The following datasets were used in the article “A Framework for Genomic Sequencing on Clusters of Multicore and Manycore Processors”, submitted to the International Journal of High Performance Computing Applications (IJHPCA).

DNA dataset

D40M Single-end dataset of 40 million 100-nt reads, generated with the dwgsim simulator (derived from the wgsim tool in SAMtools). The mutation rate was set to 0.1%, with 10% of the mutations being indels.

RNA datasets

R10M0.1 Single-end dataset of 10 million 100-nt reads, generated with the BEERS simulator. The mutation rate was set to 0.1%, and the indel frequency was left at its default value (0.05%).
R10M2.0 Single-end dataset of 10 million 100-nt reads, generated with the BEERS simulator. The mutation rate was set to 2.0%, and the indel frequency was left at its default value (0.05%).
R80M0.1 Single-end dataset of 80 million 100-nt reads, generated with the BEERS simulator. The mutation rate was set to 0.1%, and the indel frequency was left at its default value (0.05%).
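For quick reference, the specifications above can be collected into a small Python structure that also derives two figures implied by the text: the total number of sequenced nucleotides per dataset (reads × read length) and, for D40M, the effective indel rate (0.1% × 10% = 0.01%). This is only a sketch: the `Dataset` class and `DATASETS` list are hypothetical names introduced here, and treating every rate as a per-base fraction is an assumption about how the simulators interpret their parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    simulator: str
    reads: int            # number of single-end reads
    read_length: int      # nucleotides per read
    mutation_rate: float  # assumed per-base fraction (0.001 == 0.1%)
    indel_rate: float     # assumed per-base fraction (0.0005 == 0.05%)

    @property
    def total_nt(self) -> int:
        """Total nucleotides across all reads (reads x read length)."""
        return self.reads * self.read_length


# Values copied from the dataset descriptions; rate units are an assumption.
DATASETS = [
    # D40M: 10% of the 0.1% mutations are indels, i.e. 0.01% indels overall
    Dataset("D40M", "dwgsim", 40_000_000, 100, 0.001, 0.001 * 0.10),
    # RNA datasets: indel frequency left at the BEERS default of 0.05%
    Dataset("R10M0.1", "BEERS", 10_000_000, 100, 0.001, 0.0005),
    Dataset("R10M2.0", "BEERS", 10_000_000, 100, 0.020, 0.0005),
    Dataset("R80M0.1", "BEERS", 80_000_000, 100, 0.001, 0.0005),
]

if __name__ == "__main__":
    # Print one summary line per dataset
    for d in DATASETS:
        print(f"{d.name:8} {d.simulator:7} {d.reads / 1e6:4.0f}M reads  "
              f"{d.total_nt / 1e9:4.1f} Gnt  "
              f"mutation {d.mutation_rate:.2%}  indel {d.indel_rate:.3%}")
```

Running the script prints one line per dataset; for example, D40M amounts to 4.0 billion nucleotides and R80M0.1 to 8.0 billion.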