Research Infrastructure

Hosted Data Sets

If a data set is used by several University research groups, we may be able to host a local mirror, which gives fast access from local computational platforms.

Currently-Hosted Data Sets

UK BioBank Full Release

The UK BioBank Genotyping and Imputation Data Release v3 (data for all 500,000 participants in UK BioBank) is now available for use on central compute platforms (the CSF and iCSF). It is also available as a storage share that can be mapped as a network drive on campus PCs / desktops.

Full details are provided here on requesting access to datasets, the available formats, and accessing the data on central compute platforms and campus PCs.

UK BioBank Activity Data

The UK BioBank Activity Data is now available for use on central compute platforms (the CSF and iCSF). This is the objective physical activity data recorded from over 100,000 UK adults.

gnomAD Dataset

The Genome Aggregation Database (gnomAD) is available for use on central compute platforms (the CSF, DPSF and iCSF). Please see the Broad Institute’s overview of gnomAD and this blog post for a detailed description of the data. The Broad Institute’s download page shows exactly what has been downloaded.

On the CSF, DPSF and iCSF the data is immediately accessible at the path:

/mnt/data-sets/gnomeAD/
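
As a minimal sketch, a check like the following can confirm that the data set directory is visible from the machine you are logged in to before you start a job. The function name `check_dataset_dir` is hypothetical (it is not a tool installed on the platforms); the default path is the one quoted above.

```shell
# Hypothetical helper, not part of any installed tool set.
# Reports whether a hosted data set directory exists and is readable,
# and lists its first few entries if so.
check_dataset_dir() {
    # Default to the gnomAD path quoted in this page.
    dataset_dir="${1:-/mnt/data-sets/gnomeAD/}"

    if [ -d "$dataset_dir" ] && [ -r "$dataset_dir" ]; then
        echo "Data set directory is available: $dataset_dir"
        # Show at most 20 entries so large directories stay readable.
        ls "$dataset_dir" | head -n 20
        return 0
    fi

    echo "Data set directory not found or not readable: $dataset_dir" >&2
    return 1
}
```

Once the function is defined in your shell session, running `check_dataset_dir` with no arguments checks the gnomAD path; passing another directory as the first argument checks that directory instead.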

All users of the data should read the Broad Institute’s Terms of Use and follow the citation request on that page.

Download Tools

Downloads to Research Data Storage shares should be run on the RDS-SSH servers, because the transfer then takes place entirely over fast data-centre networks. Downloading to a storage share mapped on your desktop will usually be slower because the campus network link to your desktop is slower.

A number of download tools from the UK BioBank and the European Genome-phenome Archive (EGA) are available on the RDS-SSH servers. These are automatically in your PATH upon login: simply log in and run the commands you would normally run. The tools installed are:

# UKBioBank tools
ukbmd5        # Calculate size and MD5 of a file
ukbconv       # Convert unpacked UKB data to other formats
ukbunpack     # Unpack (decrypt and decompress) UKB data
ukbfetch      # The bulk data download tool
ukblink       # Download Returned-datasets and link between Applications
ukbgene       # Download approved genetic data. This tool supersedes a tool named gfetch.

egaclient     # EGAdemoClient tools (will automatically load the EgaDemoClient.jar file)
egacryptor

ascp          # Aspera downloader tools (uses the default aspera private key)
ascp_noid     # You should add the '-i PRIVATE-KEY-FILE' flag to supply a key

basemount     # Illumina BaseSpace tools
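
Because the tools listed above are expected to be on your PATH after login, a quick check such as the following sketch can confirm which of them your session can actually see. The function name `check_tools` is hypothetical and is not installed on the RDS-SSH servers; it relies only on the standard `command -v` shell built-in.

```shell
# Hypothetical helper, not part of the installed tool set.
# Prints one line per tool name given, saying whether it is on PATH.
# Returns 0 if every tool was found and 1 if any was missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example call for the tools listed on this page (run on an RDS-SSH server):
# check_tools ukbmd5 ukbconv ukbunpack ukbfetch ukblink ukbgene \
#     egaclient egacryptor ascp ascp_noid basemount
```

If any tool is reported as missing, the check says nothing about why; it only reflects what is on PATH in the current session.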

For more information on what is installed on the rds-ssh.itservices.manchester.ac.uk server, please see our RDS-SSH server download tools documentation.

To obtain an account on the RDS-SSH service, please email its-ri-team@manchester.ac.uk. This system has access to your Research Data Storage areas and your CSF / iCSF home directory, as well as to common UK BioBank data-download sites.

Last modified on January 10, 2020 at 11:19 am by Site Admin