{"id":2493,"date":"2015-05-22T10:52:52","date_gmt":"2015-05-22T10:52:52","guid":{"rendered":"http:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/?page_id=2493"},"modified":"2015-05-29T09:12:22","modified_gmt":"2015-05-29T09:12:22","slug":"bioawk","status":"publish","type":"page","link":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/software\/applications\/bioawk\/","title":{"rendered":"bioawk"},"content":{"rendered":"<h2>Overview<\/h2>\n<p>Bioawk is an extension of <a href=\"http:\/\/www.cs.princeton.edu\/%7Ebwk\/btl.mirror\/\">Brian Kernighan&#8217;s awk<\/a>, created by <a href=\"http:\/\/lh3lh3.users.sourceforge.net\/\">Heng Li<\/a>, that adds support for several common biological data formats, including optionally gzip&#8217;ed BED, GFF, SAM, VCF, FASTA\/Q and TAB-delimited formats with column names. It also adds a few built-in functions and a command-line option to use TAB as the input\/output delimiter. When the new functionality is not used, bioawk is intended to behave exactly the same as the original BWK awk.<\/p>\n<p>The latest version, dated 27-08-2013, is installed on the CSF. Bioawk does not have version numbers, so the date of the GitHub commit is used to identify versions.<\/p>\n<h2>Restrictions on use<\/h2>\n<p>No licence information available.<\/p>\n<h2>Set up procedure<\/h2>\n<p>To access the software you must first load the modulefile:<\/p>\n<pre>module load apps\/gcc\/bioawk\/27-08-2013<\/pre>\n<h2>Running the application<\/h2>\n<p>Please do not run bioawk on the login node. 
Jobs should be submitted to the compute nodes via the batch system.<\/p>\n<h3>Serial batch job submission<\/h3>\n<p>Make sure you have the modulefile loaded, then create a batch submission script, for example:<\/p>\n<pre>#!\/bin\/bash\n#$ -S \/bin\/bash\n#$ -cwd             # Job will run from the current directory\n#$ -V               # Job will inherit current environment settings\n\nbioawk -Hc sam '!and($flag,4)' aln.sam.gz<\/pre>\n<p>Submit the jobscript using:<\/p>\n<pre>qsub <em>scriptname<\/em><\/pre>\n<p>where <em>scriptname<\/em> is the name of your jobscript.<\/p>\n<h3>Parallel batch job submission<\/h3>\n<p>bioawk is not designed to run in parallel. If you need to run multiple similar jobs then please consider using <a title=\"Job Arrays\" href=\"http:\/\/ri.itservices.manchester.ac.uk\/csf2\/csf-user-documentation\/sge-job-arrays\/\">job arrays<\/a>.<\/p>\n<h3>Examples of usage:<\/h3>\n<ol>\n<li>List the supported formats:\n<pre><code>bioawk -c help\n<\/code><\/pre>\n<\/li>\n<li>Extract unmapped reads without header:\n<pre><code>bioawk -c sam 'and($flag,4)' aln.sam.gz\n<\/code><\/pre>\n<\/li>\n<li>Extract mapped reads with header:\n<pre><code>bioawk -Hc sam '!and($flag,4)' aln.sam.gz\n<\/code><\/pre>\n<\/li>\n<li>Reverse complement FASTA:\n<pre><code>bioawk -c fastx '{print \"&gt;\"$name;print revcomp($seq)}' seq.fa.gz<\/code><\/pre>\n<\/li>\n<\/ol>\n<p>Further examples can be found on the <a title=\"Bioawk Help Page\" href=\"https:\/\/github.com\/ialbert\/bioawk\/blob\/master\/README.bio.rst\">bioawk help page<\/a>.<\/p>\n<h3>Recognized Formats<\/h3>\n<p>These format names may be passed to the -c option:<\/p>\n<dl>\n<dt>bed<\/dt>\n<dd>1:chrom 2:start 3:end 4:name 5:score 6:strand 7:thickstart 8:thickend 9:rgb 10:blockcount 11:blocksizes 12:blockstarts<\/dd>\n<dt>sam<\/dt>\n<dd>1:qname 2:flag 3:rname 4:pos 5:mapq 6:cigar 7:rnext 8:pnext 9:tlen 10:seq 11:qual<\/dd>\n<dt>vcf<\/dt>\n<dd>1:chrom 2:pos 3:id 4:ref 5:alt 6:qual 7:filter 
8:info<\/dd>\n<dt>gff<\/dt>\n<dd>1:seqname 2:source 3:feature 4:start 5:end 6:score 7:filter 8:strand 9:group 10:attribute<\/dd>\n<dt>fastx<\/dt>\n<dd>1:name 2:seq 3:qual<\/dd>\n<\/dl>\n<p>The fastx format handles both FASTA and FASTQ input.<\/p>\n<h2>Further info<\/h2>\n<ul>\n<li><a title=\"Github\" href=\"https:\/\/github.com\/lh3\/bioawk\">GitHub<\/a><\/li>\n<li><a href=\"http:\/\/lh3lh3.users.sourceforge.net\/\">Heng Li<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/ialbert\/bioawk\/blob\/master\/README.bio.rst\">Bioawk help page<\/a><\/li>\n<\/ul>\n<h2>Updates<\/h2>\n<p>None.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Overview Bioawk is an extension of Brian Kernighan&#8217;s awk, created by Heng Li, that adds support for several common biological data formats, including optionally gzip&#8217;ed BED, GFF, SAM, VCF, FASTA\/Q and TAB-delimited formats with column names. It also adds a few built-in functions and a command-line option to use TAB as the input\/output delimiter. When the new functionality is not used, bioawk is intended to behave exactly the same as the original BWK awk. Latest version dated 27-08-2013\u00a0is.. 
<a href=\"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/software\/applications\/bioawk\/\">Read more &raquo;<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"parent":31,"menu_order":0,"comment_status":"open","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2493","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/pages\/2493","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/comments?post=2493"}],"version-history":[{"count":5,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/pages\/2493\/revisions"}],"predecessor-version":[{"id":2553,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/pages\/2493\/revisions\/2553"}],"up":[{"embeddable":true,"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/pages\/31"}],"wp:attachment":[{"href":"https:\/\/ri.itservices.manchester.ac.uk\/csf-apps\/wp-json\/wp\/v2\/media?parent=2493"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}