{"id":4045,"date":"2017-07-25T14:18:57","date_gmt":"2017-07-25T14:18:57","guid":{"rendered":"http:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/?page_id=4045"},"modified":"2017-07-26T08:31:15","modified_gmt":"2017-07-26T08:31:15","slug":"bgen","status":"publish","type":"page","link":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/software\/applications\/bgen\/","title":{"rendered":"BGEN"},"content":{"rendered":"<h2>Overview<\/h2>\n<p><a href=\"https:\/\/bitbucket.org\/gavinband\/bgen\">BGEN<\/a> provides utility programs to process files in the <a href=\"http:\/\/www.well.ox.ac.uk\/~gav\/bgen_format\/bgen_format_v1.2.html\">BGEN format<\/a>.<\/p>\n<p>These tools will be of interest to researchers working with the <a href=\"http:\/\/ri.itservices.manchester.ac.uk\/hosted-data-sets\/\">UK BioBank full-release<\/a> dataset.<\/p>\n<p>The <em>master<\/em> branch from the repository (commit id d1f03a2c308a, downloaded 24-July-2017) is installed on the CSF.<\/p>\n<h2>Restrictions on use<\/h2>\n<p>This version is released under the <a href=\"http:\/\/www.boost.org\/users\/license.html\">Boost Software License v1.0<\/a> &#8211; there are no restrictions on access on the CSF.<\/p>\n<h2>Set up procedure<\/h2>\n<p>To access the software you must first load the modulefile:<\/p>\n<pre>\r\nmodule load apps\/gcc\/bgen\/latest\r\n\r\n# The <em>latest<\/em> modulefile currently gives you:\r\n# apps\/gcc\/bgen\/d1f03a2c308a\r\n<\/pre>\n<p>The following tools are available:<\/p>\n<pre>\r\nbgenix\r\ncat-bgen\r\nedit-bgen\r\n<\/pre>\n<p>Run any tool with the <code>-help<\/code> flag to list its command-line flags (you may do this on the login node). 
For example:<\/p>\n<pre>\r\nbgenix -help\r\n<\/pre>\n<p>You may also wish to load the utility modulefile:<\/p>\n<pre>\r\nmodule load tools\/env\/ukbiobank-full-release\r\n<\/pre>\n<p>This sets some <a href=\"\/csf-apps\/software\/applications\/ukbiobank\">environment variables<\/a>, used in the examples below, that make it easier to access the folder where the UK BioBank datasets are kept.<\/p>\n<h2>Running the application<\/h2>\n<p>Please do not run BGEN tools on the login node. Jobs should be submitted to the compute nodes via the batch system.<\/p>\n<h3>Serial batch job submission<\/h3>\n<p>Make sure you have the modulefile loaded, then create a batch submission script. For example:<\/p>\n<pre>\r\n#!\/bin\/bash\r\n#$ -cwd             # Job will run from the current directory\r\n#$ -V               # Inherit settings from modulefile loaded on login node\r\n\r\nbgenix -g <em>path\/to\/filename.bgen<\/em> <em>arg2<\/em> <em>arg3<\/em> ... \r\n<\/pre>\n<p>Submit the jobscript using: <\/p>\n<pre>qsub <em>scriptname<\/em><\/pre>\n<p>where <em>scriptname<\/em> is the name of your jobscript.<\/p>\n<h2>Working with the UK BioBank Full Release Dataset<\/h2>\n<p>The BGEN website offers <a href=\"https:\/\/bitbucket.org\/gavinband\/bgen\/wiki\/Using_the_UK_Biobank_full_release_index_files\">some advice<\/a> on using <code>bgenix<\/code> with the UK BioBank data. In summary, the UK BioBank index filenames do <em>not<\/em> match the naming format that <code>bgenix<\/code> expects.<\/p>\n<p>For example, the EGAD00010001225\/001\/ dataset contains files of the form:<\/p>\n<pre>\r\nBioBank BGEN filename    BioBank INDEX filename     INDEX filename expected by bgenix\r\n---------------------    ----------------------     ---------------------------------\r\nukb_imp_chr1_v2.bgen     ukb_bgi_chr1_v2.bgi        ukb_imp_chr1_v2.bgen.bgi\r\nukb_imp_chr2_v2.bgen     ukb_bgi_chr2_v2.bgi        ukb_imp_chr2_v2.bgen.bgi\r\n...                      ...                        
...\r\nukb_<strong>imp<\/strong>_chr<em>N<\/em>_v2.<strong>bgen<\/strong>     ukb_<strong>bgi<\/strong>_chr<em>N<\/em>_v2.<strong>bgi<\/strong>        ukb_<strong>imp<\/strong>_chr<em>N<\/em>_v2.<strong>bgen.bgi<\/strong>\r\n<\/pre>\n<p>If you simply run:<\/p>\n<pre>\r\nbgenix -g ukb_imp_chr1_v2.bgen -list\r\n  #\r\n  # Let bgenix generate the index filename based\r\n  # on the name of the input bgen filename.\r\n  # THIS WILL FAIL!\r\n<\/pre>\n<p><code>bgenix<\/code> will look for an index file named <code>ukb_imp_chr1_v2.bgen.bgi<\/code>, but the UK BioBank dataset does not name its index files this way. You will receive an error message:<\/p>\n<pre>\r\n!! Error opening index file \"ukb_imp_chr1_v2.bgen.bgi\":\r\nCould not open the index file \"ukb_imp_chr1_v2.bgen.bgi\"\r\n<\/pre>\n<p>You can add the <code>-i<\/code> flag to the <code>bgenix<\/code> command-line to specify explicitly which index filename to use. For example:<\/p>\n<pre>\r\nbgenix -g ukb_imp_chr1_v2.bgen <strong>-i ukb_bgi_chr1_v2.bgi<\/strong> -list\r\n  #\r\n  # Tell bgenix what index filename to use.\r\n  # THIS WILL SUCCEED!\r\n<\/pre>\n<p>If you wish to script the generation of index filenames from bgen filenames inside a jobscript, you can use commands such as:<\/p>\n<pre>\r\nBGENFILE=ukb_imp_chr1_v2.bgen\r\nINDEXFILE=`echo $BGENFILE | sed 's\/imp\/bgi\/g;s\/bgen\/bgi\/g'`\r\nbgenix -g $BGENFILE -i $INDEXFILE <em>args...<\/em>\r\n<\/pre>\n<h3>Job Array Example<\/h3>\n<p>The following <a href=\"http:\/\/ri.itservices.manchester.ac.uk\/userdocs\/sge\/job-arrays\/\">job array<\/a> will process all of the files:<\/p>\n<pre>\r\nukb_imp_chr1_v2.bgen\r\nukb_imp_chr2_v2.bgen\r\n...\r\nukb_imp_chr22_v2.bgen\r\n<\/pre>\n<p>Create a batch submission script similar to:<\/p>\n<pre>\r\n#!\/bin\/bash --login\r\n#$ -cwd\r\n### Note we will load the modulefiles in the jobscript hence\r\n### no '#$ -V' line and we've added --login above.\r\n\r\n### Automatically run 22 copies of this job (each uses 1 core)\r\n#$ -t 1-22\r\n\r\n# 
We load the modulefiles in the jobscript (hence no #$ -V line)\r\nmodule load apps\/gcc\/bgen\/latest\r\nmodule load tools\/env\/ukbiobank-full-release\r\n\r\n### ${SGE_TASK_ID} is automatically replaced by the number 1, 2, 3, ..., 22\r\nBGENFILE=${UKBB_IMPUTATION_DIR}\/ukb_imp_chr${SGE_TASK_ID}_v2.bgen\r\n\r\n### Generate the correct index filename for bgenix. Only the filename is\r\n### rewritten, so any 'imp' or 'bgen' in the directory path is left alone.\r\nINDEXFILE=$(dirname \"$BGENFILE\")\/$(basename \"$BGENFILE\" | sed 's\/imp\/bgi\/g;s\/bgen\/bgi\/g')\r\n\r\n### Run bgenix\r\nbgenix -g \"$BGENFILE\" -i \"$INDEXFILE\" -list\r\n<\/pre>\n<p>Submit the jobscript using: <\/p>\n<pre>qsub <em>scriptname<\/em><\/pre>\n<p>where <em>scriptname<\/em> is the name of your jobscript.<\/p>\n<h2>Further info<\/h2>\n<ul>\n<li><a href=\"https:\/\/bitbucket.org\/gavinband\/bgen\">BGEN website<\/a><\/li>\n<li><a href=\"\/csf-apps\/software\/applications\/ukbiobank\">The CSF UK BioBank modulefile<\/a><\/li>\n<li><a href=\"http:\/\/www.ukbiobank.ac.uk\/wp-content\/uploads\/2017\/07\/UKB-Genotyping-and-Imputation-Data-Release-FAQ.pdf\">UK BioBank Full Release FAQ<\/a><\/li>\n<\/ul>\n<h2>Updates<\/h2>\n<p>None.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Overview BGEN provides utility programs to process files in the BGEN format. These tools will be of interest to researchers working with the UK BioBank full-release dataset. The master branch from the repository (commit id d1f03a2c308a, downloaded 24-July-2017) is installed on the CSF. Restrictions on use This version is released under the Boost Software License v1.0 &#8211; there are no restrictions on access on the CSF. Set up procedure To access the software you must.. 
<a href=\"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/software\/applications\/bgen\/\">Read more &raquo;<\/a><\/p>\n","protected":false},"author":15,"featured_media":0,"parent":31,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-4045","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/pages\/4045","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/users\/15"}],"replies":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/comments?post=4045"}],"version-history":[{"count":14,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/pages\/4045\/revisions"}],"predecessor-version":[{"id":4066,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/pages\/4045\/revisions\/4066"}],"up":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/pages\/31"}],"wp:attachment":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/media?parent=4045"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}