Changes between Version 14 and Version 15 of ProjectOverview


Timestamp:
Sep 5, 2010 8:50:07 PM (14 years ago)
Author:
Morris Swertz
Comment:

--

  • ProjectOverview

    v14 v15  
    1313
    1414All these resources will be made publicly available both as __centralized, secured, web accessible national services__, i.e. central hubs assembled in partnership to support the rainbow projects, as well as __downloadable and customizable ‘tools-in-a-box’__ meant for local installation by biobanks and their local projects (local hubs). This project will develop in parallel the scientific, professional and physical infrastructures needed to effectively communicate expertise, procedures and tools between all Dutch biobanks as well as the provision of bioinformatics experts building on the infrastructure organized in the Netherlands Bioinformatics Center (NBIC) !BioAssist program. This group will work in coordination with the BBMRI-NL ethical-legal working group to develop a code of practice and guidelines for large scale harmonized data pooling and for the use of data from multiple biobanks.
    15 == Approach ==
    16 This project will combine a hub-and-spoke research & development organization to harmonize data between biobanks together with the provision of experts who will provide innovative model-driven software methods to efficiently produce ready-to-use software infrastructures needed by biologists and researchers. This includes:
    17 === Agile hub-and-spoke organization ===
    18 At the core of BBMRI there is the vision to develop all resources in a hub-and-spoke manner such that we maximize use of local expertise and innovation and minimize duplicated efforts and barriers to integration via centralized harmonization and enrichment. The smallest hubs within the Dutch biobank landscape are the individual biobanks, the larger hubs are the participating institutes, and the largest hubs are central deployments of key data and analysis resources (which in turn can connect to pan-European hubs). This project will mirror this organization to bridge between biomedical researchers, bioinformaticians and hardcore software engineers to ensure the multi-disciplinary interplay needed:
    19 
    20  * A __central engineering team of hardcore programmers__ is responsible for the overarching infrastructure and will ensure harmonization of tools, pipelines and databases between working groups. This group will function as one of the eight NBIC task forces and will meet every week to ensure knowledge and method transfer.
    21 
    22  * __Participating experts will host programmers and scientific staff to pilot the planned tools and pipelines__ in close support of (their) BBMRI-NL complementation and rainbow projects. These bioinformaticians will be organized in themed working groups as described in appendix 1. Each working group will have a lead programmer who is part of the central engineering team. All members will meet monthly and will have weekly Skype meetings.
    23 
    24  * __This project is strongly linked with leading international sister projects to avoid duplicated efforts__ and efficiently achieve these aims by having project members participating in, or staying at, institutes like the European Bioinformatics Institute (1KG, EGA, !ArrayExpress) and the Netherlands Bioinformatics Center (NGS, eScience, CWA), projects like EU-GEN2PHEN, EU-BIOSHARE, OMII-UK, ESFRI/ELIXIR, Parelsnoer, Mondriaan, CTMM, TIFN, NPC, NMC, P3G and the Human Variome Project, and open source collaborations like !ObiBa, MOLGENIS/XGAP, ABEL and Concept Web Alliance.
    25 === Model driven software ===
    26 Flexible model driven software development as described in Swertz & Jansen (2007) has proven to be an efficient method to rapidly produce harmonized software infrastructures for life scientists while sharing the best models, software and tools notwithstanding large variation in research aims. This project will build and extend upon open source implementations of these methods such as MOLGENIS and Galaxy focusing on:
    27 
    28  * Implementing __extensible standard data models and software components__ developed internationally (we co-piloted data models for microarrays, QTLs, GWAS studies [Swertz 2010], and phenotypes in EU consortia like GEN2PHEN and EBI and participated in international GWAS and sequencing initiatives like the 1KG project).
    29 
    30  * Making tools and protocols reusable in __a user-friendly catalog of bioinformatics tools and workflows__ that captures all necessary inputs, outputs, optimization properties and user interactions in models to automatically incorporate existing tools (building on or inspired by Taverna and Galaxy).
    31 
    32  * __Generating automatically from these data and tool models the scalable back-ends and front-ends needed__. This automatic procedure ensures harmonized software results building on industry standard databases for metadata and innovative approaches like cloud computing activities at SARA/Amsterdam, CIT/Groningen and BigGRID/Rotterdam to connect to the scalable compute power and storage needed.
    33 
    34  * __Easing the finding and integration of resources using semantic and ontology technologies__ such as those developed at EBI and NBIC/Concept Web Alliance to build bridges between data and tools, tapping into existing ontologies for data (e.g. HPO, the Human Phenotype Ontology) and for analysis protocols to help users and system developers bring tools together.
    35 === Ready-to-use databases and tools ‘in-a-box’ that can federate into national resources ===
    36 As detailed below in the description of work section, this project aims to develop novel, or incorporate internationally proven, key bioinformatics tools, databases, models and software such that they can be re-used by the smallest hubs (to accommodate and improve local research and complementation projects) up to the largest hubs (supporting rainbow projects, starting with Genoom van Nederland). By sharing the same components between all hubs we provide an effective path to
    37 
    38  * harmonize and enrich available data management, exchange and analysis protocols
    39 
    40  * avoid duplicated efforts between local hubs
    41 
    42  * make it more likely that everyone’s needs are supported
    43 
    44  * improve quality because more users test the available bioinformatics infrastructure
    45 
    46  * preserve flexibility to go beyond standardization and accommodate specific local needs.
    47 == Roadmap ==
    48 Total duration 3 years
    49 
    50 ||Planning (matching GvNL planning where appropriate) ||||
    51 ||Short read archive                  ||                                                                       Month 0 – 8||
    52 
    53 ||Biobank catalog pilot                ||                                                                    Month 0 – 6||
    54 
    55 ||Sequence analysis Phase 1 (GvNL)           ||                                             Month 4 – 16||
    56 
    57 ||Harmonized exchange formats                    ||                                                           Month 6 – 24||
    58 
    59 ||Establish variation QC and analysis pipeline     ||                                  Month 8 – 20||
    60 
    61 ||Sequence analysis Phase 2 (GvNL)          ||                                              Month 8 – 20||
    62 
    63 ||Variation catalog/Dutch !HapMap     ||                                                      Month 20||
    64 
    65 ||GWAS data release server        ||                                                                Month 0 – 12||
    66 
    67 ||GWAS QC and imputation protocols     ||                                                Month 6 – 20||
    68 
    69 ||Dutch GWAS Control Cohort (DGCC)      ||                                              Month 12 – 24||
    70 
    71 ||Imputation of available GWA data (GvNL)  ||                                        Month 20 – 30||
    72 
    73 ||Make sequence data available (GvNL)   ||                                               Month 12 – 30||
    74 
    75 ||GWAS analysis tools catalog       ||                                                              Month 12 – 36||
    76 
    77 ||Web access tools             ||                                                                              Month 22 – 30||
    78 
    79 ||Integrated DGCC and Variation catalog web access tools ||           Month 24 – 36||
    8015
    8116== Deliverables ==
     
    11752 * __Web access tools__ – harmonized user interfaces and programmer interfaces to provide a single point of access to all the resources developed in this project.
    11853·         __Web access tools__ – harmonized user interfaces and programmer interfaces to provide a single point of access to all the resources developed in this project.
     54
     55== Approach ==
     56This project will combine a hub-and-spoke research & development organization to harmonize data between biobanks together with the provision of experts who will provide innovative model-driven software methods to efficiently produce ready-to-use software infrastructures needed by biologists and researchers. This includes:
     57=== Agile hub-and-spoke organization ===
     58At the core of BBMRI there is the vision to develop all resources in a hub-and-spoke manner such that we maximize use of local expertise and innovation and minimize duplicated efforts and barriers to integration via centralized harmonization and enrichment. The smallest hubs within the Dutch biobank landscape are the individual biobanks, the larger hubs are the participating institutes, and the largest hubs are central deployments of key data and analysis resources (which in turn can connect to pan-European hubs). This project will mirror this organization to bridge between biomedical researchers, bioinformaticians and hardcore software engineers to ensure the multi-disciplinary interplay needed:
     59
     60 * A __central engineering team of hardcore programmers__ is responsible for the overarching infrastructure and will ensure harmonization of tools, pipelines and databases between working groups. This group will function as one of the eight NBIC task forces and will meet every week to ensure knowledge and method transfer.
     61
     62 * __Participating experts will host programmers and scientific staff to pilot the planned tools and pipelines__ in close support of (their) BBMRI-NL complementation and rainbow projects. These bioinformaticians will be organized in themed working groups as described in appendix 1. Each working group will have a lead programmer who is part of the central engineering team. All members will meet monthly and will have weekly Skype meetings.
     63
     64 * __This project is strongly linked with leading international sister projects to avoid duplicated efforts__ and efficiently achieve these aims by having project members participating in, or staying at, institutes like the European Bioinformatics Institute (1KG, EGA, !ArrayExpress) and the Netherlands Bioinformatics Center (NGS, eScience, CWA), projects like EU-GEN2PHEN, EU-BIOSHARE, OMII-UK, ESFRI/ELIXIR, Parelsnoer, Mondriaan, CTMM, TIFN, NPC, NMC, P3G and the Human Variome Project, and open source collaborations like !ObiBa, MOLGENIS/XGAP, ABEL and Concept Web Alliance.
     65=== Model driven software ===
     66Flexible model driven software development as described in Swertz & Jansen (2007) has proven to be an efficient method to rapidly produce harmonized software infrastructures for life scientists while sharing the best models, software and tools notwithstanding large variation in research aims. This project will build and extend upon open source implementations of these methods such as MOLGENIS and Galaxy focusing on:
     67
     68 * Implementing __extensible standard data models and software components__ developed internationally (we co-piloted data models for microarrays, QTLs, GWAS studies [Swertz 2010], and phenotypes in EU consortia like GEN2PHEN and EBI and participated in international GWAS and sequencing initiatives like the 1KG project).
     69
     70 * Making tools and protocols reusable in __a user-friendly catalog of bioinformatics tools and workflows__ that captures all necessary inputs, outputs, optimization properties and user interactions in models to automatically incorporate existing tools (building on or inspired by Taverna and Galaxy).
     71
     72 * __Generating automatically from these data and tool models the scalable back-ends and front-ends needed__. This automatic procedure ensures harmonized software results building on industry standard databases for metadata and innovative approaches like cloud computing activities at SARA/Amsterdam, CIT/Groningen and BigGRID/Rotterdam to connect to the scalable compute power and storage needed.
     73
     74 * __Easing the finding and integration of resources using semantic and ontology technologies__ such as those developed at EBI and NBIC/Concept Web Alliance to build bridges between data and tools, tapping into existing ontologies for data (e.g. HPO, the Human Phenotype Ontology) and for analysis protocols to help users and system developers bring tools together.
     75=== Ready-to-use databases and tools ‘in-a-box’ that can federate into national resources ===
     76As detailed below in the description of work section, this project aims to develop novel, or incorporate internationally proven, key bioinformatics tools, databases, models and software such that they can be re-used by the smallest hubs (to accommodate and improve local research and complementation projects) up to the largest hubs (supporting rainbow projects, starting with Genoom van Nederland). By sharing the same components between all hubs we provide an effective path to
     77
     78 * harmonize and enrich available data management, exchange and analysis protocols
     79
     80 * avoid duplicated efforts between local hubs
     81
     82 * make it more likely that everyone’s needs are supported
     83
     84 * improve quality because more users test the available bioinformatics infrastructure
     85
     86 * preserve flexibility to go beyond standardization and accommodate specific local needs.
     87
     88== Roadmap ==
     89Total duration 3 years
     90
     91||Planning (matching GvNL planning where appropriate) ||||
     92||Short read archive                  ||                                                                       Month 0 – 8||
     93
     94||Biobank catalog pilot                ||                                                                    Month 0 – 6||
     95
     96||Sequence analysis Phase 1 (GvNL)           ||                                             Month 4 – 16||
     97
     98||Harmonized exchange formats                    ||                                                           Month 6 – 24||
     99
     100||Establish variation QC and analysis pipeline     ||                                  Month 8 – 20||
     101
     102||Sequence analysis Phase 2 (GvNL)          ||                                              Month 8 – 20||
     103
     104||Variation catalog/Dutch !HapMap     ||                                                      Month 20||
     105
     106||GWAS data release server        ||                                                                Month 0 – 12||
     107
     108||GWAS QC and imputation protocols     ||                                                Month 6 – 20||
     109
     110||Dutch GWAS Control Cohort (DGCC)      ||                                              Month 12 – 24||
     111
     112||Imputation of available GWA data (GvNL)  ||                                        Month 20 – 30||
     113
     114||Make sequence data available (GvNL)   ||                                               Month 12 – 30||
     115
     116||GWAS analysis tools catalog       ||                                                              Month 12 – 36||
     117
     118||Web access tools             ||                                                                              Month 22 – 30||
     119
     120||Integrated DGCC and Variation catalog web access tools ||           Month 24 – 36||
     121