GRASS, a novel assembly scaffolder
07 Jun 2012
Sequencing a genome may have become routine, but putting the millions of small DNA fragments, or short reads, back together to form the complete genome remains a serious puzzle. The first step is assembling the short reads into longer sequences, the contigs. Using additional information, such as reference sequences of related organisms, the contigs are then placed in the right order, in the right orientation and at the right distance from one another, forming even longer sequences called scaffolds.
In this scaffolding process, researchers are confronted with the Contig Scaffolding Problem (CSP): finding the best ordering and orientation of the contigs without violating (too many of) the constraints on contig order, orientation and distance that are derived from existing data. Several scaffolding algorithms are available, each with its own approach to the CSP. Alexey Gritsenko and colleagues from Delft University of Technology add a new scaffolding algorithm to the repertoire. GRASS (GeneRic ASsembly Scaffolder) tackles the CSP by combining contig order, distance and orientation in a single optimization objective. Compared to SSPACE, OPERA and MIP, three established scaffolding algorithms, GRASS generated a comparable or lower number of scaffolds with higher accuracy.
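To make the CSP concrete, here is a minimal Python sketch that scores every possible layout of three contigs by the number of constraints it violates and keeps the best one. It is a brute-force toy, not the optimization GRASS uses. The contig names, lengths, links and tolerance are invented for illustration. The model is also simplified: contigs are placed end to end with no gaps, and a layout and its mirror image are scored as different layouts.

```python
from itertools import permutations, product

# Hypothetical toy data: contig lengths in base pairs.
CONTIG_LENGTHS = {"A": 500, "B": 300, "C": 400}

# Hypothetical links: (first, second, same_strand, expected_gap).
# Each link claims that `first` lies before `second`, states whether the two
# contigs share a strand, and gives the expected number of bases between them.
LINKS = [
    ("A", "B", True, 0),
    ("B", "C", False, 0),
    ("A", "C", False, 300),
]


def count_violations(order, orientation, links, lengths, tolerance=50):
    """Count violated order, orientation and distance constraints."""
    # Place contigs end to end, in the given order, to get start coordinates.
    start = {}
    offset = 0
    for name in order:
        start[name] = offset
        offset += lengths[name]

    violations = 0
    for first, second, same_strand, expected_gap in links:
        # Orientation constraint: do the strands agree as the link claims?
        if (orientation[first] == orientation[second]) != same_strand:
            violations += 1
        # Order constraint: `first` must start before `second`.
        if start[first] >= start[second]:
            violations += 1
            continue  # the gap is not meaningful when the order is wrong
        # Distance constraint: gap between end of `first` and start of `second`.
        gap = start[second] - (start[first] + lengths[first])
        if abs(gap - expected_gap) > tolerance:
            violations += 1
    return violations


def best_scaffold(links, lengths):
    """Try every order and orientation; return the first layout with the fewest violations."""
    names = sorted(lengths)
    best = None
    for order in permutations(names):
        for strands in product("+-", repeat=len(names)):
            orientation = dict(zip(names, strands))
            score = count_violations(order, orientation, links, lengths)
            if best is None or score < best[0]:
                best = (score, order, orientation)
    return best


if __name__ == "__main__":
    score, order, orientation = best_scaffold(LINKS, CONTIG_LENGTHS)
    print(score, order, orientation)
```

Exhaustive search grows factorially with the number of contigs, so it is only feasible for a handful of them. That is why practical scaffolders rely on heuristics or on formal optimization instead.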
GRASS source code is freely available: http://code.google.com/p/tud-scaffolding
Gritsenko AA, Nijkamp JF, Reinders MJT and de Ridder D
GRASS: a generic algorithm for scaffolding next-generation sequencing assemblies
Bioinformatics 2012, 28(11):1429-1437, doi:10.1093/bioinformatics/bts175