DataFlow project
The DataFlow Project is building a two-stage, cloud-deployable data management infrastructure for researchers that can be used across national Higher Education Institutions. It has two components: (a) DataStage, which lets researchers manage their research data locally, and (b) DataBank, which preserves and publishes valuable research data.
For local data management, rather than storing datasets on external hard drives in the lab, DataFlow lets researchers save their work to a DataStage file system that appears as a mapped drive on their computer, with no special software to install. DataStage will allow read/write permissions to be set for Principal Investigators and for individual members of a research group, to ensure appropriate levels of data confidentiality. The system will be lightweight, and will adopt best-practice standards to make sure data is secure and easy to retrieve.
The DataFlow architecture will create modular web services linked via RESTful APIs.
DataStage is based on the local research data management infrastructure developed within the JISC ADMIRAL project, a secure personalized ‘local’ file management environment for use at the research group level, appearing as a mapped drive on the user’s PC.
DataBank is an institutional-level research data repository, which will expose both human- and machine-readable RDF metadata describing datasets, and will assign Digital Object Identifiers (DOIs) to hosted datasets, obtained automatically using the DataCite API, to aid discovery and citation.
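As a rough illustration of machine-readable dataset metadata, the sketch below renders a small RDF description in Turtle syntax. It is not DataBank's actual output: the choice of Dublin Core terms, the function name dataset_turtle, and the example URI and DOI are all assumptions made for illustration, and minting the DOI through DataCite is not shown.

```python
def dataset_turtle(dataset_uri, title, creator, doi):
    """Render a minimal Turtle description of one dataset.

    Assumption: Dublin Core terms are used here only as a familiar
    vocabulary; the source does not say which vocabularies DataBank uses.
    """
    def literal(text):
        # Escape backslashes and double quotes for a Turtle string literal.
        return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

    lines = [
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset_uri}>",
        f"    dcterms:title {literal(title)} ;",
        f"    dcterms:creator {literal(creator)} ;",
        f"    dcterms:identifier <https://doi.org/{doi}> .",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical dataset URI and placeholder DOI, for illustration only.
    print(dataset_turtle(
        "http://databank.example.org/datasets/soil-2011",
        "Soil samples 2011",
        "A. Researcher",
        "10.5072/example",
    ))
```

Publishing the DOI as a resolvable URI alongside the title and creator is one way a description like this could support both discovery and citation.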
Both services are virtualized using VMware and may be deployed locally or on a variety of cloud infrastructures. Both will be SWORD-compliant, using the SWORD 2 communication protocol to wrap datasets for repository submission. DataStage will use SWORD to submit valuable datasets to any compliant institutional or subject-specific repository, while DataBank will provide a SWORD-compliant ingest service for datasets from DataStage or similar SWORD-compliant clients. Both the SWORD communication protocol and the DataStage data packaging protocol can be used with any data type.
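To make the submission step more concrete, the following sketch shows how a client might zip a dataset and prepare the HTTP headers for a SWORD 2 binary deposit. It is a minimal sketch, not DataStage's implementation: the helper names make_package and sword_deposit_headers are hypothetical, the SimpleZip packaging format is an assumption (the DataStage packaging format is not described here), and the use of a hex MD5 digest should be checked against the target server.

```python
import hashlib
import io
import zipfile

# Packaging identifier defined by the SWORD 2 profile for plain zip files.
# Assumption: the DataStage packaging format may use a different identifier.
SIMPLE_ZIP = "http://purl.org/net/sword/package/SimpleZip"


def make_package(files):
    """Zip a {filename: bytes} mapping in memory and return the archive bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()


def sword_deposit_headers(package, filename, in_progress=False):
    """Return the HTTP headers for POSTing a package to a SWORD 2 collection."""
    return {
        "Content-Type": "application/zip",
        "Content-Disposition": f"attachment; filename={filename}",
        # Checksum so the server can verify the transfer (hex digest assumed).
        "Content-MD5": hashlib.md5(package).hexdigest(),
        "Packaging": SIMPLE_ZIP,
        # "true" tells the server that more content will follow later.
        "In-Progress": "true" if in_progress else "false",
    }


if __name__ == "__main__":
    package = make_package({"readme.txt": b"Example dataset"})
    for key, value in sword_deposit_headers(package, "dataset.zip").items():
        print(f"{key}: {value}")
```

A client would then send the package bytes with these headers in an HTTP POST to the repository's collection URL, typically with authentication. Because the payload is an opaque zip archive, the same request shape works whatever kind of data the package contains.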