Dutch Bioinformatics Product showcase
On this page NBIC lists a selection of software and database projects that you should know about. These projects were created by NBIC, or in collaboration with NBIC, and anyone is invited to use them. If you are looking for something in particular, check out the more complete list of tools.
The Leiden Open Variation Database (LOVD) provides a flexible, freely available tool for the gene-centered collection and display of DNA variations. LOVD 3.0, the latest version, also provides patient-centered data storage and storage of NGS data, including variants outside genes. LOVD, in all its versions, has been installed at hundreds of sites all over the world; in many cases, the data in such an installation are maintained and curated by world experts in a specific genetic disease.
MRS, a search engine for biological and medical databanks, is used to search well over a terabyte of indexed text in all the major bioinformatics resources. These include, for example, EMBL/TrEMBL, GenBank, Swiss-Prot, RefSeq, PDB/pdbfinder2, GO, InterPro and PubMed Central.
The MRS software provides the tools to rapidly and reliably download, store, index, and query flat-file databanks. Data stored and indexed by MRS take considerably less disk space than the raw data; nevertheless, the raw data are included in their entirety. The MRS index information is part of the stored data. Public data can be combined with private data by simple concatenation, without any computational overhead (Hekkelman M.L., Vriend G. Nucleic Acids Research 2005 33: W766-W769; doi:10.1093/nar/gki422).
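To make the idea of a compact yet fully searchable store of flat-file entries concrete, here is a minimal Python sketch. It is not MRS code (MRS has its own storage format): it assumes a SwissProt-like layout in which each entry ends with a `//` line, uses `zlib` as a stand-in for whatever compression MRS actually applies, and keeps a simple word-to-entry inverted index next to the compressed raw entries. The entries are made up.

```python
import re
import zlib
from collections import defaultdict

# Toy flat-file databank in a SwissProt-like layout: each entry starts with an
# "ID" line and ends with a "//" terminator line. (Illustrative data only.)
FLAT_FILE = """ID   ENTRY_A
DE   Hypothetical kinase protein
//
ID   ENTRY_B
DE   Hypothetical membrane transporter
//
"""

def split_entries(text):
    """Yield (entry_id, raw_entry_text) for each '//'-terminated entry."""
    entry_lines = []
    for line in text.splitlines(keepends=True):
        entry_lines.append(line)
        if line.startswith("//"):
            raw = "".join(entry_lines)
            entry_id = raw.split()[1]  # second token of the ID line
            yield entry_id, raw
            entry_lines = []

def build_store(text):
    """Keep every raw entry (zlib-compressed) plus a word -> entry ids index."""
    store = {}
    index = defaultdict(set)
    for entry_id, raw in split_entries(text):
        store[entry_id] = zlib.compress(raw.encode("utf-8"))
        for word in re.findall(r"[a-z0-9]+", raw.lower()):
            index[word].add(entry_id)
    return store, index

def query(index, *words):
    """Return the ids of entries that contain all query words (boolean AND)."""
    sets = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*sets) if sets else set()

store, index = build_store(FLAT_FILE)
print(sorted(query(index, "hypothetical", "kinase")))  # ['ENTRY_A']
# The full raw entry is still retrievable from the compressed store.
print(zlib.decompress(store["ENTRY_A"]).decode("utf-8").splitlines()[0])  # ID   ENTRY_A
```

Because the index only points at entry identifiers, adding another (for example private) databank in this sketch just means building its entries and index and merging them; this is an illustration of the principle, not of how MRS concatenates its files.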
Rite is a pilot job framework written in Java that allows you to submit jobs to various compute resources (e.g. cluster, grid). It consists of a robust pilot job framework client and a server with an integrated MongoDB database. Key features of the system are:
- Robust pilot framework that will retry failed or timed-out jobs
- Recipes describing reusable jobs via JSON documents or a Java API
- On-the-fly resolution of files through indirection
- Central storage of console output and status of jobs
- Querying of job status and results with MongoDB's native query language or through the MongoDB web services
Rite is an open source project released under the GNU Lesser General Public License version 3 and can be downloaded from the NBIC Trac.
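The retry behaviour of a pilot framework can be sketched in a few lines. The Python below is a hypothetical illustration, not Rite's Java API: `run_pilot`, the job dictionary and the status names are invented for the example, and a plain dictionary stands in for the MongoDB job store. Each job is a command that is requeued until it succeeds or exhausts its attempts; a job that exceeds its time limit is killed and treated like a failure.

```python
import subprocess
import sys
from collections import deque

def run_pilot(jobs, max_attempts=3, timeout_s=5.0):
    """Run each job (an argv list), retrying failures and timeouts.

    Returns job name -> {"status", "attempts", "stdout"}; this dict stands in
    for the central job store (a MongoDB database in Rite).
    """
    store = {name: {"status": "queued", "attempts": 0, "stdout": ""} for name in jobs}
    queue = deque(jobs)
    while queue:
        name = queue.popleft()
        record = store[name]
        record["attempts"] += 1
        try:
            # subprocess.run kills the child process if the timeout expires
            result = subprocess.run(jobs[name], capture_output=True,
                                    text=True, timeout=timeout_s)
            record["stdout"] = result.stdout  # keep console output centrally
            outcome = "done" if result.returncode == 0 else "failed"
        except subprocess.TimeoutExpired:
            outcome = "timed_out"
        if outcome != "done" and record["attempts"] < max_attempts:
            record["status"] = "retrying"
            queue.append(name)  # requeue the job for another attempt
        else:
            record["status"] = outcome
    return store

jobs = {
    "hello": [sys.executable, "-c", "print('hello')"],
    "broken": [sys.executable, "-c", "import sys; sys.exit(1)"],
}
status = run_pilot(jobs, max_attempts=2)
print(status["hello"]["status"], status["broken"]["status"], status["broken"]["attempts"])
# done failed 2
```

Requeuing at the back of the queue lets other jobs make progress before a failed job is retried, which is one reasonable policy among several.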
BreeDB is a relational database that aims to support breeding for quantitative agronomic traits. The database can be explored through a web-based interface, which offers tools for basic statistical overviews, such as box plots and histograms, as well as multivariate tools. Graphical genotyping tools show molecular marker data and QTL data in relation to genetic linkage maps. In addition, photos of each accession can be shown together with a detailed report of the observations made on that accession.
BreeDB is designed to store data from both inbreeding and outbreeding crop species, and its analysis and visualization methods adapt automatically to the type of population at hand. Some features of BreeDB require integration with a third-party database.
ConceptWiki is a universal, open access repository of editable concepts. For each concept, the ConceptWiki features an Also Known As table containing identifiers and URIs (from various ontologies and databases) and the associated linguistic and terminological information. Additional information is available by clicking the "More about this concept" link. The terminology and identifiers can be downloaded freely from the ConceptWiki and used as a thesaurus to identify concept-denoting tokens in text and databases. ConceptWiki is a recognized core element of the Identity Management part of the Open PHACTS project and is being developed further with core partners such as UniProt/neXtProt, ChemSpider/RSC and NCBO. A very important feature of the ConceptWiki is the separation between Community and Authority. Authorities on specific terminology areas, several of them mentioned above, fill the ConceptWiki with approved terminology and mappings, while the community can add additional symbols used to refer to the same concepts. Users can choose to include or exclude the community contributions.
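The Community/Authority separation can be pictured as tagging every label with its origin and filtering on that tag when the thesaurus is built. In this minimal sketch, the `Label` type, the concept identifier `C1` and the assignment of each label to a source are made up for illustration; they do not reflect ConceptWiki's actual data model or download format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Label:
    concept_id: str  # hypothetical concept identifier
    text: str        # a term or symbol that denotes the concept
    source: str      # "authority" or "community"

# Illustrative labels for one concept; the source tags are invented.
LABELS = [
    Label("C1", "TP53", "authority"),
    Label("C1", "tumor protein p53", "authority"),
    Label("C1", "p53", "community"),
]

def build_thesaurus(labels, include_community=True):
    """Map lower-cased term -> set of concept ids, optionally dropping community labels."""
    thesaurus = {}
    for label in labels:
        if label.source == "community" and not include_community:
            continue
        thesaurus.setdefault(label.text.lower(), set()).add(label.concept_id)
    return thesaurus

print(sorted(build_thesaurus(LABELS)))
# ['p53', 'tp53', 'tumor protein p53']
print(sorted(build_thesaurus(LABELS, include_community=False)))
# ['tp53', 'tumor protein p53']
```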
WikiPathways is an open, collaborative platform dedicated to the curation of biological pathways with participation from the community. This approach also shifts the bulk of peer review, editorial curation, and maintenance to the community. WikiPathways presents a new model for pathway databases that enhances and complements ongoing efforts such as KEGG, Reactome and Pathway Commons. It builds on MediaWiki, the software that powers Wikipedia, and adds a custom graphical pathway editing tool and integrated databases covering major gene, protein, and small-molecule systems.
BridgeDB is a software package that can be used to translate between two different sets of database identifiers, or to search for references by ID or symbol. The BridgeDB software consists of two parts. The first part is a (web) service that does the actual translation, based on a mapping file, a database of identifiers or other mapping web services. The second part is a Java library that can be used to extend a variety of software (such as Cytoscape) with a generic translation capability. BridgeDB is used for the Identifier Mapping Service (IMS) in the Open PHACTS project.
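Translation based on a mapping file can be pictured as follows. This Python sketch is not the BridgeDB library (which is Java); it assumes a hypothetical tab-separated mapping file with one column per identifier system and shows how an identifier from one system is looked up in another. The example row uses the identifiers of the human TP53 gene.

```python
import csv
import io
from collections import defaultdict

# Hypothetical mapping file: one row per gene, one column per identifier system.
MAPPING_TSV = """Entrez Gene\tEnsembl\tUniProt
7157\tENSG00000141510\tP04637
"""

class MappingFileTranslator:
    """Translate identifiers between systems using a tab-separated mapping file."""

    def __init__(self, tsv_text):
        rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
        # (system, identifier) -> rows containing it, for constant-time lookup
        self._by_id = defaultdict(list)
        for row in rows:
            for system, identifier in row.items():
                self._by_id[(system, identifier)].append(row)

    def translate(self, identifier, source, target):
        """Return all target-system ids mapped to `identifier` in `source`."""
        return sorted({row[target] for row in self._by_id.get((source, identifier), [])})

translator = MappingFileTranslator(MAPPING_TSV)
print(translator.translate("7157", "Entrez Gene", "UniProt"))         # ['P04637']
print(translator.translate("ENSG00000141510", "Ensembl", "Entrez Gene"))  # ['7157']
```

Returning a list rather than a single value allows for one-to-many mappings, which are common between identifier systems.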
NBIC Galaxy server
NBIC Galaxy is based on the Galaxy system developed by Penn State University. BioAssist task forces use this server to build and publish their workflows. The server is maintained on an academic best-effort basis, and anyone is welcome to use it.
We try to keep this machine as stable as possible, but we cannot guarantee to keep your data sets indefinitely, so make sure you keep backups of your precious data. Each registered user has a disk quota of 10 GB; anonymous users have a quota of 10 MB.
Peregrine is a very fast software package for recognizing interesting multi-word terms in natural-language text. Peregrine was originally developed by Martijn Schuemie at the Department of Medical Informatics of the Erasmus University Medical Center (EMC) in Rotterdam. In 2009, the package became the first project taken up by NBIC's BioAssist Engineering team, who have been preparing the open source release together with the EMC by making the program easier to use and its code easier to extend and maintain.
Peregrine can now be found at https://trac.nbic.nl/data-mining/ and downloaded under an AGPL license.
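Dictionary-based term recognition of this kind is often implemented as a longest-match lookup over tokens. The sketch below is a generic illustration under that assumption, not Peregrine's actual algorithm; the term list and the concept identifiers are made up. Preferring the longest match means that "breast cancer" is recognized as one term instead of also matching the shorter term "cancer".

```python
import re

def tokenize(text):
    """Lower-cased word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_term_index(terms):
    """Map each term's token tuple to the term's concept id."""
    return {tuple(tokenize(term)): concept for term, concept in terms.items()}

def tag(text, term_index):
    """Greedy longest-match recognition of multi-word terms.

    Returns a list of (start_token, end_token, concept_id) spans.
    """
    tokens = tokenize(text)
    max_len = max((len(key) for key in term_index), default=0)
    spans, i = [], 0
    while i < len(tokens):
        # Try the longest possible window first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            concept = term_index.get(tuple(tokens[i:i + n]))
            if concept is not None:
                spans.append((i, i + n, concept))
                i += n
                break
        else:
            i += 1  # no term starts at this token
    return spans

TERMS = {"breast cancer": "CONCEPT_A", "cancer": "CONCEPT_B",
         "tumor protein p53": "CONCEPT_C"}
term_index = build_term_index(TERMS)
print(tag("Mutations in tumor protein p53 are common in breast cancer.", term_index))
# [(2, 5, 'CONCEPT_C'), (8, 10, 'CONCEPT_A')]
```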
The CMBI provides a series of facilities for protein structure bioinformatics that run parallel to the PDB, the world-wide repository of macromolecular structure information. Each database holds, where possible, one entry for each PDB entry. The components are:
- DSSP: the secondary structure of the proteins.
- PDBREPORT: the structure quality and errors.
- HSSP: a multiple sequence alignment for all proteins.
- PDBFINDER: easy-to-parse summaries of the PDB file content, augmented with essentials from the other systems.
- PDB_REDO: re-refined, and often improved, copies of all structures solved by X-ray crystallography.
- WHY_NOT: a summary of why certain files could not be produced.
All these databases are updated weekly. The data sets can be used to analyze properties of protein structures in areas ranging from structural genomics to cancer biology and protein design.
CLI-mate: Galaxy tool generator
CLI-mate is a service that helps developers create user-friendly interfaces for command line tools.
In the agile development environment of bioinformatics, many command line tools are created quickly to fill gaps between complex information processes. A command line interface (CLI) is sometimes sufficient for the task, but it limits adoption by a broader audience. Developers therefore often need to create a wrapper that provides a more user-friendly interface. The CLI-mate interface generator makes this easy: it can generate several kinds of wrappers, one of which turns the program into a Galaxy tool.
CLI-mate was developed at the Department of Human Genetics, Leiden University Medical Center (LUMC).
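To illustrate what a generated Galaxy wrapper looks like, the sketch below builds a minimal Galaxy tool XML description for a command line program, using the `tool`, `command`, `inputs/param` and `outputs/data` elements of Galaxy's tool format. It is not CLI-mate's output: the function name, the hypothetical script `count_lines.py`, and the assumption that every data input is a plain-text file passed as a positional argument are choices made for this example. A real wrapper would normally also include help text and tests.

```python
import xml.etree.ElementTree as ET

def galaxy_tool_xml(tool_id, name, executable, inputs, output_format):
    """Build a minimal Galaxy tool wrapper (as an XML string) for a CLI program.

    inputs: list of (param_name, galaxy_type, label) tuples. Each becomes a
    <param> element and a positional argument on the generated command line.
    """
    tool = ET.Element("tool", id=tool_id, name=name, version="0.1.0")
    # Galaxy substitutes $name with the value of the parameter called "name".
    args = " ".join(f"'${param_name}'" for param_name, _, _ in inputs)
    ET.SubElement(tool, "command").text = f"{executable} {args} > '$output'"
    inputs_el = ET.SubElement(tool, "inputs")
    for param_name, param_type, label in inputs:
        attrs = {"name": param_name, "type": param_type, "label": label}
        if param_type == "data":
            attrs["format"] = "txt"  # assumption: inputs are plain-text files
        ET.SubElement(inputs_el, "param", attrs)
    outputs_el = ET.SubElement(tool, "outputs")
    ET.SubElement(outputs_el, "data", name="output", format=output_format)
    return ET.tostring(tool, encoding="unicode")

# Hypothetical example: wrap a script that counts the lines of a text file.
print(galaxy_tool_xml("count_lines", "Count lines", "python count_lines.py",
                      [("infile", "data", "Input file")], "txt"))
```

Generating the XML from a structured description, rather than writing it by hand, keeps the command line and the parameter list consistent with each other.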
PMID2DOI is a service that converts between two types of identifiers for scientific publications: the PubMed Identifier (PMID), a unique number assigned to PubMed citations of life science journal articles, and the Digital Object Identifier (DOI™), which identifies digital content and is registered for scholarly publications through CrossRef. DOIs are used to provide current information, including where the content (or information about it) can be found on the Internet, and can be used as part of the provenance information for each nanopublication. PMID2DOI provides SOAP and REST web services for this conversion. In addition, a SPARQL endpoint can be used to query the conversion system.
Taverna-Galaxy Tool Generator
Galaxy and Taverna are two widely used tools for combining bioinformatics tools to perform a larger analysis. Taverna is the more sophisticated workflow system, while Galaxy is popular among genomics researchers and is used by many bioinformaticians to make scripts available to colleagues. Each has its own strengths. Therefore, we built a generator that constructs a Galaxy tool from a Taverna workflow, enabling it to run seamlessly in Galaxy.
The generator is available for download, and is part of http://myExperiment.org/, a community web site for computational scientists. Here, you can simply download a workflow as a Galaxy tool and install it into a Galaxy server.
Generic Study Capture Framework
The Generic Study Capture Framework (GSCF), originally developed under the name Nutritional Phenotype Database (DbNP), helps biologists interpret the results of biology studies that involve multiple 'omics' techniques. Initially, it was aimed at medium-sized nutrigenomics intervention studies, but it is essentially much more generic and is now also used to store studies from other areas of biology, such as environmental plant studies. GSCF can be used to store detailed information about the design of your studies, to link those study designs to actual 'omics' data, and to interpret the measured data along the axes of your study design.
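Interpreting measurements along the axes of a study design essentially means grouping measured values by the levels of one or more design factors. The minimal sketch below assumes a hypothetical design with a diet factor and a time factor, one made-up measurement per subject, and group means as the summary; it does not reflect GSCF's data model or API.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical study design: the level of each design factor for every subject.
DESIGN = {
    "s1": {"diet": "control", "time": "week0"},
    "s2": {"diet": "control", "time": "week4"},
    "s3": {"diet": "high_fat", "time": "week0"},
    "s4": {"diet": "high_fat", "time": "week4"},
}
# Made-up measurements of a single feature (e.g. one metabolite) per subject.
MEASUREMENTS = {"s1": 4.0, "s2": 4.5, "s3": 5.0, "s4": 6.0}

def summarize_by(design, measurements, *factors):
    """Group measurements by the given design factors and return the group means."""
    groups = defaultdict(list)
    for subject, value in measurements.items():
        key = tuple(design[subject][factor] for factor in factors)
        groups[key].append(value)
    return {key: mean(values) for key, values in groups.items()}

print(summarize_by(DESIGN, MEASUREMENTS, "diet"))
# {('control',): 4.25, ('high_fat',): 5.5}
print(summarize_by(DESIGN, MEASUREMENTS, "time"))
# {('week0',): 4.5, ('week4',): 5.25}
```

Passing several factors, e.g. `summarize_by(DESIGN, MEASUREMENTS, "diet", "time")`, groups by their combinations instead.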
CitedIn is a web service with an API for finding citations of scientific publications in online public data. CitedIn contains literature citations from a broad selection of online resources, including bibliographic databases (PubMed, Google Scholar, etc.), biomedical databases (UniProt, KEGG), wikis (Wikipedia, WikiPathways, Brede Wiki), social networks (Connotea, CiteULike) and blogs (Nature Blogs, Google Blogs).
CitedIn is available at http://www.citedin.org/
Warp2D is a tool containing a new algorithm for the time alignment of multiple MS spectra, mainly in proteomics. Because pairwise time alignment of a large set of spectra can take a long time, NBIC's BioAssist program and the NPC have created a web service that allows users to run Warp2D on the life science grid. This is the first tool made available using the DAF software under development in BioAssist.
The Warp2D web service is available through the web site of the Netherlands Bioinformatics for Proteomics Platform, NBPP.
CytoscapeRPC is an extension to Cytoscape that allows your own software to use Cytoscape as a graphical front end for data visualisation. CytoscapeRPC is used through a standard XML-RPC interface and can therefore be used from almost any programming language.
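For illustration, the Python standard library's `xmlrpc.client` module shows what such an XML-RPC call looks like. The method name `Cytoscape.createNetwork`, its argument, and the server address in the comment are assumptions made for this sketch; check the CytoscapeRPC documentation for the methods and port your version actually exposes. The code only builds and decodes the XML request, so it runs without a Cytoscape instance.

```python
import xmlrpc.client

# Hypothetical method name and argument, used only to show the wire format.
METHOD = "Cytoscape.createNetwork"
payload = xmlrpc.client.dumps(("my_network",), methodname=METHOD)
print(payload)

# The server side decodes the request into the method name and its arguments.
params, method = xmlrpc.client.loads(payload)
print(method, params)  # Cytoscape.createNetwork ('my_network',)

# Against a running server (address is an assumption), the same call would be:
# proxy = xmlrpc.client.ServerProxy("http://localhost:9000")
# proxy.Cytoscape.createNetwork("my_network")
```

Because the request is plain XML over HTTP, any language with an XML-RPC client library can send the same message.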
R/GPU is a package that allows programmers to use a GPU (the graphics processing unit in a computer) to speed up bioinformatics analyses in R. Once R/GPU is installed, your R scripts will automatically use it and achieve much higher speeds; large matrix multiplications, for example, may run 50 times faster.
R/GPU is available as an open source project. It is in beta release and can be found on the NBIC project development site.
MOLGENIS is a system that takes a relatively simple description of the kind of information you would like to store and, at the push of a button, generates a complete database system with an associated web site that allows you to add data to the database and query it.
MOLGENIS is open source software written in Java, and it is available through its own web site.
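The core idea of generating a database from a model description can be sketched with SQLite. MOLGENIS itself uses its own model format and generates far more, including the web interface; here the model is a hypothetical Python dictionary, and only the table definitions are generated.

```python
import sqlite3

# Hypothetical, minimal data model: entity name -> list of (field, SQL type).
MODEL = {
    "Sample": [("name", "TEXT NOT NULL"), ("tissue", "TEXT")],
    "Measurement": [("sample_id", "INTEGER REFERENCES Sample(id)"), ("value", "REAL")],
}

def generate_ddl(model):
    """Turn the model description into one CREATE TABLE statement per entity."""
    statements = []
    for entity, fields in model.items():
        # Every entity gets an automatically numbered primary key.
        columns = ["id INTEGER PRIMARY KEY"] + [f"{name} {sqltype}" for name, sqltype in fields]
        statements.append(f"CREATE TABLE {entity} ({', '.join(columns)});")
    return statements

conn = sqlite3.connect(":memory:")
for ddl in generate_ddl(MODEL):
    conn.execute(ddl)
conn.execute("INSERT INTO Sample (name, tissue) VALUES (?, ?)", ("S1", "liver"))
print(conn.execute("SELECT id, name, tissue FROM Sample").fetchall())
# [(1, 'S1', 'liver')]
```

Keeping the model as data means a change to the description (a new field, a new entity) regenerates the schema without hand-editing SQL.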
GPCRDB is an information system for G-protein coupled receptors (GPCRs). It collects, combines, validates, and disseminates large amounts of heterogeneous data. The GPCRDB contains experimental data on sequences, ligand binding constants, mutations, and oligomers, as well as many different types of computationally derived data such as multiple sequence alignments and homology models.
GPCRDB is a web resource that offers several access methods. The authors are open to collaboration on the data.
StatQuant is an analysis toolbox for quantitative mass spectrometry. It offers a set of statistical tools to process, filter, compare and represent data from several quantitative proteomics software packages, such as MSQuant. StatQuant offers researchers post-processing methods that improve confidence in the obtained protein ratios.
StatQuant runs on Windows, Mac and Linux and is available as Open Source from the NBIC project repository.