Many small improvements
22 May 2012
First described in 1993, Nicolaides-Baraitser syndrome (NBS) is a rare, severe disorder that causes a variety of symptoms. The most notable characteristics of NBS are sparse hair, typical facial morphology, distal-limb anomalies and intellectual disability. The lack of known familial cases of NBS suggests that the disorder is caused by de novo mutations. Using exome sequencing in ten individuals with NBS and extended molecular screening in a larger group, Jeroen van Houdt (Catholic University Leuven) and a large group of colleagues from various institutes identified mutations in SMARCA2 as a cause of NBS. SMARCA2 is a member of the family of Snf2 helicase proteins. It was a nice study, according to Antoine van Kampen, who performed the bioinformatics analysis together with shared first author Barbera van Schaik (Academic Medical Centre, Amsterdam). "For us, this was the first exome sequencing study. We identified SMARCA2 to be linked to NBS and we were able to start functional studies into this protein."
Genome of the Netherlands
The ever-increasing amount of sequencing data proved helpful, says Van Kampen. "In one of the validation steps, we compared our data to those of more than 200 genomes sequenced as part of the Genome of the Netherlands project. In contrast to the NBS exomes, with common SMARCA2 mutations clustered in the ATPase region, we found no mutations in this region in the genomes, which supports our findings about the role of SMARCA2 in causing NBS." Although this study did not result in radically new insights from a bioinformatics perspective, projects like these are worthwhile, Van Kampen feels. "Here, the biological question and the biological data are in the lead. We implement and apply a bioinformatics analysis pipeline, which to a large extent is similar to the pipelines used by other groups. Nevertheless, every exome study allows us to further extend and refine the methods used in our pipeline."
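The validation step Van Kampen describes, checking whether control genomes carry variants in the region where patient mutations cluster, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `carriers_in_region`, the sample names, the positions and the region boundaries are made-up toy values, not data or code from the study.

```python
from typing import Dict, List, Tuple

# A region is an inclusive (start, end) pair of positions on one sequence.
Region = Tuple[int, int]


def carriers_in_region(
    positions_by_sample: Dict[str, List[int]], region: Region
) -> List[str]:
    """Return the samples that have at least one variant inside the region."""
    start, end = region
    return sorted(
        sample
        for sample, positions in positions_by_sample.items()
        if any(start <= pos <= end for pos in positions)
    )


# Toy data only: hypothetical variant positions per sample.
region_of_interest: Region = (100, 200)
cases = {"case_1": [120], "case_2": [135], "case_3": [410]}
controls = {"ctrl_1": [20], "ctrl_2": [], "ctrl_3": [500]}

case_carriers = carriers_in_region(cases, region_of_interest)
control_carriers = carriers_in_region(controls, region_of_interest)

print("cases with a variant in the region:", case_carriers)
print("controls with a variant in the region:", control_carriers)
```

In this toy setup two of the three cases and none of the controls carry a variant in the region, which is the kind of contrast the comparison is meant to bring out. A real analysis would read variant calls from files and account for coverage and call quality, which this sketch ignores.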
Filters
The filter steps are crucial to the accuracy and usability of the analysis results. Starting with filtering out low-quality sequencing data, every step in the pipeline comes down to choosing which data to remove. Van Kampen: "We try to generate a manageable, high-quality dataset for the biologists to process further. For bioinformaticians, the challenge is to choose the best filters for each particular question and dataset. What you find depends on how you search. We deliver suggestions for candidate genes, which the biologists try to validate. Depending on discussions with the biologists and/or the outcome of the validation, we have to adjust the filters and go through the data again. It sometimes happens that your filters remove the crucial data."
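The idea of a pipeline as a sequence of adjustable filters can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the `Variant` fields, the three filters, their thresholds and the gene names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Variant:
    gene: str
    quality: float       # hypothetical call-quality score
    control_freq: float  # hypothetical frequency among unaffected controls
    effect: str          # e.g. "missense", "synonymous"


# A named filter returns True for variants that should be kept.
NamedFilter = Tuple[str, Callable[[Variant], bool]]


def apply_filters(
    variants: Sequence[Variant], filters: Sequence[NamedFilter]
) -> Tuple[List[Variant], List[Tuple[str, int]]]:
    """Apply filters in order and record how many variants each one removes."""
    kept = list(variants)
    removed_per_step = []
    for name, keep in filters:
        survivors = [v for v in kept if keep(v)]
        removed_per_step.append((name, len(kept) - len(survivors)))
        kept = survivors
    return kept, removed_per_step


# Illustrative thresholds; in practice these are tuned per question and dataset.
filters: List[NamedFilter] = [
    ("quality", lambda v: v.quality >= 30.0),
    ("rare", lambda v: v.control_freq <= 0.01),
    ("protein-altering", lambda v: v.effect in {"missense", "nonsense"}),
]

toy_variants = [
    Variant("GENE_A", 55.0, 0.000, "missense"),
    Variant("GENE_B", 12.0, 0.000, "missense"),
    Variant("GENE_C", 60.0, 0.200, "missense"),
    Variant("GENE_D", 48.0, 0.001, "synonymous"),
]

candidates, removed = apply_filters(toy_variants, filters)
print("candidate genes:", [v.gene for v in candidates])
print("removed per step:", removed)
```

Recording how many variants each step removes makes it easier to revisit a filter when validation fails, since an overly strict threshold shows up as an unexpectedly large drop at one step.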
Imperfections and unknowns
With the outcomes so dependent on the selection method applied, how reliable is the information generated by such an analysis? Van Kampen: "The bioinformatics workflow is certainly not perfect yet. Chances are that when you apply a different workflow to the same dataset, you come up with different or partially different results. Studies that compare the effect of different components of the workflows could potentially help in identifying the best way to set up an analysis pipeline. However, such studies are scarce." He points to the lack of a gold standard. "That is a major problem. Work in that direction is ongoing, but we are not there yet." Imperfections in bioinformatics are not the only problem, however, as the sequencing data are not perfect either. Van Kampen: "We are dealing with many unknowns. You're looking for an unknown gene using experimental and analysis methods that are both still subject to continuous development."
Van Houdt JKJ, Nowakowska BA, Sousa SB, van Schaik BDC, et al.
Heterozygous missense mutations in SMARCA2 cause Nicolaides-Baraitser syndrome
Nature Genetics 44, 445-449 (2012); doi:10.1038/ng.1105



