Running CWL from the Command Line

Prerequisite

If this is your first time running a CWL workflow, we recommend that you follow the initial setup guide first.

Initial Setup
  • A CWL

  • Access to Iris cluster

    • We use islogin01.mskcc.org

  • Load Singularity version 3.7.1

    • module load migration-testing/singularity/3.7.1

  • Load Python 3.10.13

    • module load migration-testing-spack/python/3.10.13

  • A Python virtualenv with the latest version of Toil installed

  • Node.js installed

    • Check if you have node installed with node -v ; if not, see Install Node.js
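
Once the modules are loaded and the virtualenv is active, a quick check that the tools listed above are on your PATH can save a failed run later. The sketch below is one possible way to do that; the function name check_tools is hypothetical and it relies only on the standard `command -v` shell builtin.

```shell
# Report which of the given commands are missing from PATH.
# Prints one "missing: <name>" line per absent command and
# returns non-zero if at least one command is absent.
check_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            rc=1
        fi
    done
    return "$rc"
}

# Example (run after loading the modules and activating the venv):
#   check_tools singularity python node toil-cwl-runner
```

If the function prints nothing and returns 0, all named tools were found.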

Get the ARGOS CWL

Clone the ARGOS CWL repository from GitHub into your desired working directory:

git clone https://github.com/mskcc/argos-cwl

This creates a subdirectory named argos-cwl in your working directory.

Set up the Singularity cache

Follow the "Setup Singularity cache" guide to set up the Singularity cache.

Running CWL commands with toil-cwl-runner

toil-cwl-runner is an executor for CWL that can run a workflow on a single machine (single node) or through the LSF or SLURM scheduler.
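
As an illustration of choosing between these three modes, the sketch below picks a value for --batchSystem based on which scheduler client is on your PATH. The function name detect_batch_system is hypothetical, and the check (presence of sbatch or bsub) is an assumption about how your login node is set up, not something Toil does for you.

```shell
# Print a Toil batch system name based on the scheduler client found on PATH.
# Falls back to single_machine when neither sbatch (SLURM) nor bsub (LSF) exists.
detect_batch_system() {
    if command -v sbatch >/dev/null 2>&1; then
        echo slurm
    elif command -v bsub >/dev/null 2>&1; then
        echo lsf
    else
        echo single_machine
    fi
}

# Example:
#   toil-cwl-runner --batchSystem "$(detect_batch_system)" ...
```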

Prerequisites

  • A Python virtual environment (venv) with Toil installed. Assuming its path is stored in VENV_PATH, activate it with:

    • source $VENV_PATH/bin/activate

  • A CWL in your working directory; we will use CMO-Fillout, from the argos-cwl repository cloned above, as an example

  • Input files for the CWL

If you are running argos-cwl, you can find an example input.yaml for alignment-pair.cwl or project-workflow-sv.cwl located below.

Option 1: Running with SBATCH command on SLURM

  1. Create a SLURM script, such as my_script.slurm. For alignment-pair.cwl, the script below can be used:

#!/bin/bash
#SBATCH --job-name=toil_cwl
#SBATCH --partition=test01        # Replace with the correct partition name
#SBATCH --time=02:00:00           # Wall time (hh:mm:ss)
#SBATCH -o single.%j.out
#SBATCH -e single.%j.err

#module load python/3.8           # Load necessary modules
#sudo systemctl start docker.service   # Start docker service
source venv/bin/activate          # Activate your virtual environment

# Export queue name
export TOIL_SLURM_ARGS='--partition test01'

# Navigate to your workflow directory (if needed)
#cd /path/to/your/workflow

export TMPDIR=$PWD/tmp
mkdir -p "$TMPDIR"                # Make sure the temporary directory exists
export CWL_SINGULARITY_CACHE=$PWD/singularity_cache
export SINGULARITYENV_LC_ALL=en_US.UTF-8

toil-cwl-runner \
--batchSystem slurm \
--jobStore file:job-store-name \
--singularity \
--disableCaching \
--logFile cwltoil15.log \
--logDebug \
--preserve-environment SINGULARITY_DOCKER_USERNAME SINGULARITY_DOCKER_PASSWORD TMPDIR \
--outdir argos_output_directory \
argos-cwl/modules/pair/alignment-pair.cwl argos-cwl/test/modified_input.yaml

  2. Submit the script with sbatch my_script.slurm

  3. Standard output and standard error are written to single.%j.out and single.%j.err, where "%j" is replaced by the job ID
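
To follow a submitted job, it helps to capture the job ID at submission time. The sketch below assumes the -o and -e patterns from the script above; the function names submit_job and slurm_log_paths are hypothetical. It uses `sbatch --parsable`, which prints the job ID alone (optionally followed by `;` and the cluster name) instead of the usual "Submitted batch job" message.

```shell
# Print the stdout and stderr log names that "-o single.%j.out" and
# "-e single.%j.err" produce for the given SLURM job ID.
slurm_log_paths() {
    job_id=$1
    echo "single.${job_id}.out single.${job_id}.err"
}

# Submit a script and print only its job ID.
# "sbatch --parsable" prints "<jobid>" or "<jobid>;<cluster>",
# so everything from the first ";" onward is stripped.
submit_job() {
    out=$(sbatch --parsable "$1") || return 1
    echo "${out%%;*}"
}

# Example:
#   job_id=$(submit_job my_script.slurm)
#   squeue -j "$job_id"              # check the job's state
#   tail -f "single.${job_id}.out"   # follow its standard output
```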

Option 2: Running with input yaml generated through cwltool

You can use cwltool to create a template YAML from the inputs defined by the CWL:

cwltool --make-template argos-cwl/tools/cmo-utils/1.9.15/cmo-fillout.cwl > my_input.yaml

The my_input.yaml file should look like the following:

ref_fasta:  # type "File"
    class: File
    path: a/file/path
portal_output: a_string  # type "string" (optional)
pairing:  # type "File" (optional)
    class: File
    path: a/file/path
output_format: a_string  # type "string"
output: a_string  # type "string" (optional)
maf:  # type "File"
    class: File
    path: a/file/path
fillout: a_string  # type "string" (optional)
bams:  # array of 
  - a_string  # type "string"
  - # type "File"
    class: File
    path: a/file/path

Configure my_input.yaml with the input data for our test. Optional fields that will not be used should be deleted or commented out:

ref_fasta:  # type "File"
    class: File
    path: /juno/work/ci/first_time_setup/test_fillout/GRCh37/b37.fasta
# portal_output: a_string  # type "string" (optional)
pairing:  # type "File" (optional)
    class: File
    path: /juno/work/ci/first_time_setup/test_fillout/sample_pairing.txt
output_format: "1"  # type "string"
# output: a_string  # type "string" (optional)
maf:  # type "File"
    class: File
    path: /juno/work/ci/first_time_setup/test_fillout/sample1.sample2.muts.maf
# fillout: a_string  # type "string" (optional)
bams:  # array of 
  - # type "File"
    class: File
    path: /juno/work/ci/first_time_setup/test_fillout/sample2.rg.md.abra.printreads.bam
  - class: File
    path: /juno/work/ci/first_time_setup/test_fillout/sample1.rg.md.abra.printreads.bam

Note: bams accepts an array of File, an array of string, or a combination of both. When specifying a list, follow standard YAML syntax: put each element on its own line with a leading hyphen (-), or enclose the comma-separated elements in square brackets.
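
Before launching a run, it can be worth confirming that every file named in my_input.yaml exists. The sketch below is a rough check, not a YAML parser: the function name check_input_paths is hypothetical, and it assumes each path is written unquoted on its own "path:" line, as in the example above. Relative paths are tested against the current directory, which may differ from how the CWL runner resolves them.

```shell
# List every "path:" value in a CWL input YAML that does not exist on disk.
# Lines starting with "#" are skipped because they do not match the pattern.
# Returns non-zero if at least one path is missing.
check_input_paths() {
    rc=0
    # Keep only the value that follows "path:" (with an optional leading "- ").
    paths=$(sed -n -E 's/^[[:space:]]*(-[[:space:]]+)?path:[[:space:]]*//p' "$1")
    while IFS= read -r p; do
        if [ -n "$p" ] && [ ! -e "$p" ]; then
            echo "not found: $p"
            rc=1
        fi
    done <<EOF
$paths
EOF
    return "$rc"
}

# Example:
#   check_input_paths my_input.yaml && echo "all input files found"
```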

Running on LSF:

export SINGULARITYENV_LC_ALL=en_US.UTF-8
export CWL_SINGULARITY_CACHE=[cache]
export TOIL_LSF_ARGS="-sla [SLA] -S 1"

export TMP=$PWD/tmp
export TMPDIR=$TMP
export WORKDIR=$PWD/work
mkdir -p $TMP
mkdir -p $WORKDIR

toil-cwl-runner \
    --singularity \
    --batchSystem lsf \
    --logFile toil_log.log \
    --coalesceStatusCalls \
    --disableCaching \
    --doubleMem \
    --disable-user-provenance \
    --disable-host-provenance \
    --not-strict \
    --realTimeLogging \
    --preserve-environment CWL_SINGULARITY_CACHE SINGULARITYENV_LC_ALL \
    --jobStore $TMP/jobstore \
    --tmpdir-prefix $TMP \
    --workDir $WORKDIR \
    --outdir $PWD/out \
    --maxLocalJobs 500 \
    --clean never \
    argos-cwl/tools/cmo-utils/1.9.15/cmo-fillout.cwl my_input.yaml

Option 3: Running through command line arguments

Usage

To view the arguments that can be passed to the CWL, run:

toil-cwl-runner argos-cwl/tools/cmo-utils/1.9.15/cmo-fillout.cwl

usage: argos-cwl/tools/cmo-utils/1.9.15/cmo-fillout.cwl [-h] --bams BAMS
                                                        [--fillout FILLOUT]
                                                        --maf MAF
                                                        [--output OUTPUT]
                                                        --output_format
                                                        OUTPUT_FORMAT
                                                        [--pairing PAIRING]
                                                        [--portal_output PORTAL_OUTPUT]
                                                        --ref_fasta REF_FASTA
                                                        [job_order]

If you need to pass an array of files, specify the argument multiple times. See "bams" in the example below.

If you are running Toil on LSF, make sure to add the LSF arguments from the "Running on LSF" section above.

toil-cwl-runner \
    --singularity \
    --batchSystem single_machine \
    --disableCaching \
    --preserve-environment SINGULARITY_DOCKER_USERNAME SINGULARITY_DOCKER_PASSWORD \
    --outdir my_output_directory \
    argos-cwl/tools/cmo-utils/1.9.15/cmo-fillout.cwl \
    --bams /juno/work/ci/first_time_setup/test_fillout/sample2.rg.md.abra.printreads.bam \
    --bams /juno/work/ci/first_time_setup/test_fillout/sample1.rg.md.abra.printreads.bam \
    --maf /juno/work/ci/first_time_setup/test_fillout/sample1.sample2.muts.maf \
    --output_format 1 \
    --ref_fasta /juno/work/ci/first_time_setup/test_fillout/GRCh37/b37.fasta \
    --pairing /juno/work/ci/first_time_setup/test_fillout/sample_pairing.txt

Getting outputs

When your run finishes, you will get output like the following:

{
    "fillout_out": {
        "location": "file:///juno/work/ci/first_time_setup/test_fillout/sample1.sample2.muts.fillout",
        "basename": "sample1.sample2.muts.fillout",
        "nameroot": "sample1.sample2.muts",
        "nameext": ".fillout",
        "class": "File",
        "checksum": "sha1$805e084b6d2db7b90c857d10e7153882ce7d7ce1",
        "size": 1572304
    },
    "portal_fillout": {
        "location": "file:///juno/work/ci/first_time_setup/test_fillout/sample1.sample2.muts.fillout.portal.maf",
        "basename": "sample1.sample2.muts.fillout.portal.maf",
        "nameroot": "sample1.sample2.muts.fillout.portal",
        "nameext": ".maf",
        "class": "File",
        "checksum": "sha1$6ef3d3867d5389b2e9eb0efe1d1e0389bbabb59c",
        "size": 6187721
    }
}
[2021-06-03T15:11:49-0400] [MainThread] [I] [toil.common] Successfully deleted the job store: FileJobStore(/scratch/tmpfbed637q)
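
If you save the output object to a file, the result paths can be pulled out of it with standard tools. The sketch below is a minimal example that assumes the pretty-printed layout shown above, with one "location" key per line; the function name output_paths and the file name outputs.json are hypothetical. It does not decode URL-escaped characters in the location.

```shell
# Read a CWL output object (JSON) on stdin and print the local path of each
# File, taken from its "location" field with the "file://" prefix removed.
output_paths() {
    sed -n 's|.*"location": *"file://\([^"]*\)".*|\1|p'
}

# Example, assuming the runner's standard output was redirected to outputs.json:
#   output_paths < outputs.json
```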
