Azure Batch
===========
`Azure Batch `__ provides
a simple way to run containers and simple commands on Azure without
closely managing the underlying VM infrastructure (although knowledge of
that infrastructure is always useful). Azure Batch does not understand
CWL the way a full workflow engine does, but it offers a straightforward
way to run a large number of Dockstore tools at scale.

Azure Batch also has a client-side tool called `Batch
Shipyard `__ which offers a
number of features, including a simple command-line interface for
submitting batch jobs.

Keep in mind that if you know CWL and/or do not need the Dockstore
command-line to do file provisioning, you can decompose the underlying
command-line invocation for the tool and use that as the command for
your jobs, gaining a bit of performance. This tutorial focuses on using
cwltool and the Dockstore command-line to provide an experience that is
closer to running Dockstore or cwltool :ref:`on the
command-line ` out of
the box.
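
To make the difference between the two invocation styles concrete, here is a minimal sketch that builds each one as an argument list: a launch through the Dockstore command-line, and a direct call to cwltool that skips Dockstore's file provisioning. The function names are illustrative, and the entry, descriptor, and parameter file names passed to them are placeholders. The commands actually used by the sample recipes may differ, and the third option mentioned above (calling the tool's own command directly) depends on the tool and is not shown.

```python
import shlex


def dockstore_launch_command(entry, params_json):
    """Task command that goes through the Dockstore command-line.

    Dockstore provisions the input and output files and calls cwltool itself.
    `entry` and `params_json` are supplied by the caller (placeholders here).
    """
    return ["dockstore", "tool", "launch", "--entry", entry, "--json", params_json]


def cwltool_command(descriptor, params_json):
    """Task command that calls cwltool directly on a CWL descriptor.

    This skips Dockstore's file provisioning, so inputs and outputs must
    already be reachable by cwltool.
    """
    return ["cwltool", descriptor, params_json]


def as_shell(argv):
    """Quote an argument list so it can be pasted into a job definition."""
    return shlex.join(argv)
```

For example, ``as_shell(dockstore_launch_command("quay.io/example/my-tool", "md5sum.s3.json"))`` yields a single command string, where ``quay.io/example/my-tool`` is a hypothetical entry name.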

1. Run through Batch Shipyard's `Linux Installation
   Guide `__
   and then the
   `Quickstart `__
   guide with one of the sample tools such as Torch-CPU.
2. With the shipyard CLI set up, get the md5sum sample recipes from
   GitHub

   ::

       $ git clone https://github.com/dockstore/batch_wrapper.git
       $ cd batch_wrapper/azure/

3. Fill out your ``config.json``, ``credentials.json``, and
   ``jobs.json`` in ``config.dockstore.md5sum``. If you have trouble
   finding your access keys, take a look at this
   `article `__.
   Note that ``jobs.json`` uses AWS keys to provision (save) the
   final output files. You will also need to modify the parameter JSON
   file ``md5sum.s3.json`` to reflect the location of your S3 bucket.
4. Create a compute pool. Note that this pool is not set up to
   automatically resize. You may also need to pick a larger VM size for
   a larger dataset.

   ::

       $ ./shipyard pool add --configdir config.dockstore.md5sum

5. Submit the job and watch the output (this should take roughly a
   minute if the pool already exists)

   ::

       $ ./shipyard jobs add --configdir config.dockstore.md5sum --tail stdout.txt
       2017-05-24 14:19:21.543 INFO - Adding job dockstorejob to pool dockstore
       2017-05-24 14:19:21.989 INFO - uploading file /tmp/tmp7lgz7_j7 as 'shipyardtaskrf-dockstorejob/dockertask-00012.shipyard.envlist'
       2017-05-24 14:19:22.027 DEBUG - submitting 1 tasks (0 -> 0) to job dockstorejob
       2017-05-24 14:19:22.090 INFO - submitted all 1 tasks to job dockstorejob
       2017-05-24 14:19:22.090 DEBUG - attempting to stream file stdout.txt from job=dockstorejob task=dockertask-00012
       Creating directories for run of Dockstore launcher at: ./datastore//launcher-e849c691-cc47-4bfa-a443-b8830794ae0a
       Provisioning your input files to your local machine
       Downloading: #input_file from https://raw.githubusercontent.com/briandoconnor/dockstore-tool-md5sum/master/md5sum.input into directory: /mnt/batch/tasks/workitems/dockstorejob/job-1/dockertask-00012/wd/./datastore/launcher-e849c691-cc47-4bfa-a443-b8830794ae0a/inputs/ce735ade-8c46-4736-a7d8-2fc0cb7d2e87
       [##################################################] 100%
       Calling out to cwltool to run your tool
       ...
       Final process status is success
       Saving copy of cwltool stdout to: /mnt/batch/tasks/workitems/dockstorejob/job-1/dockertask-00012/wd/./datastore/launcher-e849c691-cc47-4bfa-a443-b8830794ae0a/outputs/cwltool.stdout.txt
       Saving copy of cwltool stderr to: /mnt/batch/tasks/workitems/dockstorejob/job-1/dockertask-00012/wd/./datastore/launcher-e849c691-cc47-4bfa-a443-b8830794ae0a/outputs/cwltool.stderr.txt
       Provisioning your output files to their final destinations
       Uploading: #output_file from /mnt/batch/tasks/workitems/dockstorejob/job-1/dockertask-00012/wd/./datastore/launcher-e849c691-cc47-4bfa-a443-b8830794ae0a/outputs/md5sum.txt to : s3://dockstore.temp/md5sum.txt
       Calling on plugin io.dockstore.provision.S3Plugin$S3Provision to provision to s3://dockstore.temp/md5sum.txt
       [##################################################] 100%

6. You can repeat the process with ``config.dockstore.bwa``, which is a
   more realistic bioinformatics workflow from the `PCAWG
   project `__ and takes
   roughly seven hours.
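
Step 3 asks you to edit ``md5sum.s3.json`` so that the output lands in your own S3 bucket. As a minimal sketch, the helper below rewrites the output location in a parameter file. It assumes the file maps each CWL input or output name to an object with a ``path`` field, and that the output is named ``output_file`` (the name that appears as ``#output_file`` in the log above); check your copy of the file before relying on this layout. The function name and the bucket and key values are illustrative.

```python
import json
from pathlib import Path


def retarget_output(params_path, bucket, key, output_name="output_file"):
    """Point one output of a parameter file at s3://<bucket>/<key>.

    Assumed layout (verify against your own file):
        {"output_file": {"class": "File", "path": "s3://..."}, ...}
    """
    path = Path(params_path)
    params = json.loads(path.read_text())

    if output_name not in params:
        raise KeyError(f"{output_name!r} not found in {params_path}")

    # Only the destination changes; every other entry is left untouched.
    params[output_name]["path"] = f"s3://{bucket}/{key}"

    path.write_text(json.dumps(params, indent=2) + "\n")
    return params
```

Calling ``retarget_output("config.dockstore.md5sum/md5sum.s3.json", "my-bucket", "md5sum.txt")`` would rewrite the file in place; the path to the parameter file and the bucket name ``my-bucket`` are placeholders for your own values.
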
See Also
--------
- :doc:`AWS Batch `
- :doc:`Terra Launch With `

.. discourse::
    :topic_identifier: 1282