Introduction
The following datasets were used in the article “A Framework for Genomic Sequencing on Clusters of Multicore and Manycore Processors”, submitted to the International Journal of High Performance Computing Applications.
DNA dataset
| Dataset | Description |
|---|---|
| D40M | Single-end dataset of 40 million reads of 100 nt each, generated with the dwgsim simulator from SAMtools. The mutation rate was set to 0.1%, with 10% of the mutations being indels. |
RNA datasets
| Dataset | Description |
|---|---|
| R10M0.1 | Single-end dataset of 10 million reads of 100 nt each, generated with the BEERS simulator. The mutation rate was set to 0.1%. The indel frequency was left at its default value (0.05%). |
| R10M2.0 | Single-end dataset of 10 million reads of 100 nt each, generated with the BEERS simulator. The mutation rate was set to 2.0%. The indel frequency was left at its default value (0.05%). |
| R80M0.1 | Single-end dataset of 80 million reads of 100 nt each, generated with the BEERS simulator. The mutation rate was set to 0.1%. The indel frequency was left at its default value (0.05%). |
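
To give a rough sense of scale, the read counts, read length, and mutation rates listed for the DNA and RNA datasets can be turned into total nucleotide counts and an approximate number of mutated positions per read. The sketch below is illustrative only: the `Dataset` class and its property names are hypothetical, and the per-read estimate assumes the stated mutation rate applies uniformly per base, which simplifies how the simulators actually place variants.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical summary record for one of the datasets described above."""
    name: str
    reads: int            # number of single-end reads
    read_len: int         # read length in nucleotides
    mutation_rate: float  # mutation rate as a fraction (0.001 == 0.1%)

    @property
    def total_bases(self) -> int:
        # Total number of nucleotides across all reads.
        return self.reads * self.read_len

    @property
    def expected_mutations_per_read(self) -> float:
        # Assumption: the rate applies independently and uniformly per base.
        return self.read_len * self.mutation_rate


# Values taken from the dataset descriptions.
DATASETS = [
    Dataset("D40M", 40_000_000, 100, 0.001),
    Dataset("R10M0.1", 10_000_000, 100, 0.001),
    Dataset("R10M2.0", 10_000_000, 100, 0.02),
    Dataset("R80M0.1", 80_000_000, 100, 0.001),
]

if __name__ == "__main__":
    for d in DATASETS:
        print(
            f"{d.name}: {d.total_bases / 1e9:.0f} Gnt, "
            f"~{d.expected_mutations_per_read:.1f} mutated positions per read"
        )
```

Under this simplification, the 2.0% dataset carries about twenty times as many mutated positions per read as the 0.1% datasets, while R80M0.1 contains eight times as many nucleotides as either 10-million-read dataset.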