The following examples demonstrate the commands in the BaseSpace CLI tool. For more information about the CLI and a list of commands, see CLI Overview.

Note that in examples where the output is very long, it has been truncated to keep this document manageable. Users are encouraged to follow these examples by trying the commands on their own data, where they can see the full output.

Authentication, listing projects and appresults


See main instructions.

Authenticate using default settings:

$ bs auth                                                           
Please go to this URL to authenticate:                                       
Created config file /home/username/.basespace/default.cfg                                                         

The default settings include:

  • The BSSH instance to use (default: US Virginia)
  • The scopes to request; these dictate the actions that can be performed
  • A timeout - how long to wait for authentication before the command fails

Inspect the token to see what it does:

$ bs whoami
| Name           | User Name                                          |
| Id             | 1234567                                            |
| Email          |                                 |
| DateCreated    | 2014-09-25 16:29:21 +0000 UTC                      |
| DateLastActive | 2021-09-16 10:38:17 +0000 UTC                      |
| Host           |                 |
| Scopes         | READ GLOBAL, CREATE GLOBAL, BROWSE GLOBAL,         |
|                | MOVETOTRASH GLOBAL, WRITE GLOBAL                   |

List projects:

$ bs list projects   
|                       Name                       |    Id    |   TotalSize   |
| NovaSeq: TruSeq Nano 550 (Replicates of NA12878) | 36080093 | 2233311909088 |

List datasets:

$ bs list datasets
|           Name           |                 Id                  |                   Project.Name                   |   DataSetType.Id    |
| NA12878-I13_L002         | ds.184ba3d796f343f4886b4aa7fb43c496 | NovaSeq: TruSeq Nano 550 (Replicates of NA12878) | illumina.fastq.v1.8 |
| NA12878-I13_L001         | ds.c805113ed9884caa8912dafdf8edd63d | NovaSeq: TruSeq Nano 550 (Replicates of NA12878) | illumina.fastq.v1.8 |
| NA12878-I54_L001         | ds.dc5657d91983479eb0dd6abb53b9d60f | NovaSeq: TruSeq Nano 550 (Replicates of NA12878) | illumina.fastq.v1.8 |
| NA12878-I85_L001         | ds.0a7781b4d7684113a4c64c1f2ca3c175 | NovaSeq: TruSeq Nano 550 (Replicates of NA12878) | illumina.fastq.v1.8 |

List all the available headers for datasets:

$ bs dataset headers                                                 

List the datasets again, using custom columns selected from the headers list:

$ bs list datasets -F Name -F QcStatus -F TotalSize -F AppSession.Application.Name                                           
|           Name           | QcStatus  |  TotalSize  |   AppSession.Application.Name   |
| NA12878-I13_L002         | Undefined | 4047606231  | FASTQ Generation                |
| NA12878-I13_L001         | Undefined | 4118853581  | FASTQ Generation                |
| NA12878-I54_L001         | Undefined | 3496655065  | FASTQ Generation                |
| NA12878-I85_L001         | Undefined | 2462111619  | FASTQ Generation                |
| NA12878-I87_L001         | Undefined | 2271497152  | FASTQ Generation                |

Filtering list results

BaseSpaceCLI provides several options for filtering the entities that are output by a list command.

Filtering by name

The option to filter results on an entity field is --filter-term. By default, this filters on the Name field.

$ bs list projects --filter-term=examples
|     Name      |   Id    | TotalSize |
| data_examples | 5472467 | 26510     |

The filter term is specified as a regular expression:

$ bs list appsessions --filter-term=" .* "
|                   Name                   |    Id    | ExecutionStatus |
| Illumina's Uploader 2012-11-19 22:06:29Z | 1313312  | Complete        |
| Illumina's Uploader 2012-11-19 22:06:29Z | 1306305  | Complete        |
| BaseSpaceCLI 2017-04-04 11:07:54Z        | 10743733 | Complete        |

Filtering by another field

You can also specify the field on which to filter by using the --filter-field option:

$ bs list datasets --filter-field=Project.Name --filter-term=data
|   Name    |                 Id                  | Project.Name  |   DataSetType.Id    |
| valid     | ds.5c4200d4f52e4a9dae86fd8b166e296d | data_examples | illumina.fastq.v1.8 |
| test_data | ds.46c118551d51497789ddaf84bbc9bff0 | data_examples | common.files        |

This is necessary for entities that do not have a "Name" field, like biosamples:

$ bs list biosamples --filter-term=demo
ERROR: *** Name "Name" not found in object ***
$ bs list biosamples --filter-term=demo --filter-field=BioSampleName
|         BioSampleName         |   Id    | ContainerName | ContainerPosition | Status |
| HiSeq_2500_NA12878_demo_2x150 | 2280211 |               |                   | New    |

Note that only one --filter-field and --filter-term pairing can be used per command. For additional filtering, consider post-processing the CLI results with tools such as grep (see BaseSpaceCLI filtering vs. POSIX).

Filtering by date

You can also specify the age of entities to be displayed by using --older-than and --newer-than.

$ bs list datasets  -F Name -F DateModified --newer-than=400d
|                   Name                   |         DateModified          |
| HiSeq 2500 NA12878 demo 2x150            | 2017-06-30 22:17:17 +0000 UTC |
| BWA GATK - HiSeq 2500 NA12878 demo 2x150 | 2017-07-04 03:09:15 +0000 UTC |

By default, date filtering applies to the DateModified field. You can alter this to another date field by using the --date-field option.

$ bs list datasets --date-field=DateCreated --older-than=1y -F Name -F DateCreated
|   Name    |          DateCreated          |
| valid     | 2017-04-04 11:07:56 +0000 UTC |
| test_data | 2017-04-04 11:20:09 +0000 UTC |

Server-side filtering

The --filter-term and --filter-field options use client-side filtering: the API returns all entities, which are filtered before they are displayed. This means that even if you only end up listing a handful of results, the command can take a long time on a large account.

Some entities have specific filtering options that make use of server-side filtering, where the API does the filtering and only returns the matching entities. These are available on an entity-specific basis:

$ bs list appsessions --exec-status=Complete
|                   Name                   |    Id    | ExecutionStatus |
| test_data                                | 10743734 | Complete        |
| Illumina's Uploader 2012-11-19 22:06:29Z | 1313312  | Complete        |
| Illumina's Uploader 2012-11-19 22:06:29Z | 1306305  | Complete        |
| BaseSpaceCLI 2017-04-04 11:07:54Z        | 10743733 | Complete        |

$ bs list datasets --is-type=common.files
|                   Name                   |                 Id                  |         Project.Name          | DataSetType.Id |
| test_data                                | ds.46c118551d51497789ddaf84bbc9bff0 | data_examples                 | common.files   |
| BWA GATK - HiSeq 2500 NA12878 demo 2x150 | ds.2f03151b6c9b4a909d05b1af729a6fc2 | HiSeq 2500 NA12878 2x150 Demo | common.files   |

You can discover the server-side filtering options for each entity by using --help:

$ bs list datasets --help
[dataset command options]
          --like-type=                       Filter DataSets that are LIKE this type
          --is-type=                         Filter DataSets that are this type
          --not-type=                        Filter DataSets that are NOT this type
          --input-biosample=                 Filter by Input BioSample
          --project-name=                    Name of parent project
          --project-id=                      ID of parent project

You can also combine server-side and client-side filtering:

$ bs list datasets --is-type=common.files --filter-term=data
|   Name    |                 Id                  | Project.Name  | DataSetType.Id |
| test_data | ds.46c118551d51497789ddaf84bbc9bff0 | data_examples | common.files   |

BaseSpaceCLI filtering vs. POSIX

An alternative to using the BaseSpaceCLI filter options is to use standard POSIX tools such as grep and cut. The advantage of the BaseSpaceCLI filters is that you can still use other options, such as column selection and output formatting, to get the output you want, which can be more convenient:

#using POSIX tools:
$ bs list datasets -f csv | grep common.files | cut -d, -f2
# the equivalent, with BSCLI filters:
$ bs list datasets -f csv --terse --is-type=common.files
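To make the comparison concrete, here is a self-contained sketch of the POSIX approach, using made-up CSV rows standing in for real bs list datasets -f csv output (the rows below are illustrative, not real CLI output):

```shell
# Made-up rows in the shape that 'bs list datasets -f csv' emits
csv='Name,Id,Project.Name,DataSetType.Id
test_data,ds.46c118551d51497789ddaf84bbc9bff0,data_examples,common.files
valid,ds.5c4200d4f52e4a9dae86fd8b166e296d,data_examples,illumina.fastq.v1.8'

# Keep only the common.files rows, then cut out the Id column
printf '%s\n' "$csv" | grep common.files | cut -d, -f2
```

The BSCLI-native equivalent needs no extra tooling and keeps working if the column order changes.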

Downloading data for runs, projects and biosamples

The BaseSpace CLI downloader will download files incrementally. If the connection is interrupted, re-running the download command will, by default, check for files that have already downloaded successfully and will avoid unnecessarily downloading them again.

Download all run data:

The example below downloads all files from a run; these can be used to generate FASTQ files locally as well as to inspect run metrics.

$ bs download run -i <RunID> -o <output>

Multiple runs can be downloaded in succession by iterating through a list of run IDs. The example below generates a list of all run IDs associated with an account and downloads each run into a folder named after its numerical run ID.

$ bs list runs --terse > download.txt
$ while read run; do bs download run -i $run -o $run; done < download.txt

or in a more compact form:

$ bs list runs --terse | xargs -I@ bs download run -i @ -o @

Download files associated with a project:

The example below can be used to download all files associated with a project from FASTQ files to analysis results.

$ bs download project -i <ProjectID> -o <output>

A subset of files can be downloaded from a project by specifying the desired file extension. The example below can be used to download all FASTQ files in a project and only the FASTQ files.

$ bs download project -i <ProjectID> -o <output> --extension=fastq.gz

Download all datasets associated with a biosample:

The example below will download all datasets associated with a biosample, even if the datasets are spread across multiple projects and aggregated under the single biosample name.

$ bs download biosample -i <BiosampleID> -o <output>

Drilling down into a dataset

Get details about one dataset

This example is from the VCAT app:

$ bs get dataset -i ds.f45e4fcccbce4fb18dd91bdad7dcb272
| Id                                                | ds.f45e4fcccbce4fb18dd91bdad7dcb272                                                    |
| Name                                              | NA12878-R1S1vcf-38337470                                                               |
| AppSession.Id                                     | 42463886                                                                               |
| AppSession.Name                                   | NA12878-R1_S1.vcf.gz_2                                                                 |
| AppSession.Application.AppFamilySlug              | basespace-labs.variant-calling-assessment-tool                                         |

Get dataset attributes

This example is a FASTQ app:

$ bs list attributes dataset -i ds.2f5b56dddc0440858943246ba4ac9d11
|        Name         |     Value     |
| TotalReadsPF        | 4.1119628e+07 |
| MaxLengthIndexRead1 | 8             |
| MaxLengthRead1      | 151           |
| MaxLengthRead2      | 151           |
| IsPairedEnd         | true          |
| TotalClustersPF     | 2.0559814e+07 |
| TotalClustersRaw    | 2.6711606e+07 |
| TotalReadsRaw       | 5.3423212e+07 |
| MaxLengthIndexRead2 | 8             |

Get file contents for a dataset

This is a VCAT example:

$ bs contents dataset -i ds.f45e4fcccbce4fb18dd91bdad7dcb272
|     Id     |                                        FilePath                                         |
| 7240583239 | happy/NA12878-R1_S1-vcf-38337470__NA12878-Platinum-Genomes-v2016-1-0-hg38-.vcf.gz.tbi   |
| 7240583238 | happy/NA12878-R1_S1-vcf-38337470__NA12878-Platinum-Genomes-v2016-1-0-hg38-.vcf.gz       |
| 7240583237 | happy/NA12878-R1_S1-vcf-38337470__NA12878-Platinum-Genomes-v2016-1-0-hg38-.summary.csv  |
| 7240583236 | happy/NA12878-R1_S1-vcf-38337470__NA12878-Platinum-Genomes-v2016-1-0-hg38-.metrics.json |
| 7240583235 | happy/NA12878-R1_S1-vcf-38337470__NA12878-Platinum-Genomes-v2016-1-0-hg38-.extended.csv |
| 7240583234 | happy/NA12878-R1_S1-vcf-38337470__NA12878-Platinum-Genomes-v2016-1-0-hg38-.counts.json  |
| 7240583233 | happy/NA12878-R1_S1-vcf-38337470__NA12878-Platinum-Genomes-v2016-1-0-hg38-.counts.csv   |
| 7240583232 | report.log                                                                              |
| 7240583231 | report.json                                                                             |

Download subset of files (by extension) from a dataset:

Simple filtering of files to download by their extension can be done using the --extension flag. For more sophisticated filtering, see the Selective filtering of uploads and downloads section.

# will download into directory /tmp/vcat
$ bs download dataset -i ds.2f5b56dddc0440858943246ba4ac9d11 --extension=json -o /tmp/vcat
NA12878-R1_S1-vcf-38337470__NA12878-Platinum-Genomes-v2016-1-0-hg38-.metrics.json  27.44 KB / 27.44 KB [============] 100.00% 348.12 KB/s 0s
NA12878-R1_S1-vcf-38337470__NA12878-Platinum-Genomes-v2016-1-0-hg38-.counts.json  5.02 KB / 5.02 KB [================] 100.00% 22.04 MB/s 0s
report.json  2.53 KB / 2.53 KB [=====================================================================================] 100.00% 13.41 MB/s 0s
NA12878-R1S1vcf-38337470.ds.f45e4fcccbce4fb18dd91bdad7dcb272.json  2.71 KB / 2.71 KB [===============================] 100.00% 61.58 MB/s 0s

Working with properties

Many BSSH entities can be tagged with properties: key/value pairs that label those entities. Some entities, like appsessions, are tagged with properties automatically, but properties can also be added to entities manually. BSCLI lets you inspect existing properties of any type and create string properties.

BSSH entities that can be labelled with properties include projects, runs, biosamples, appsessions, appresults and datasets.

Inspecting properties of an appsession

The command to see all the properties of an entity is property list:

$ bs appsession property list -i 46664618
|             Name              |        Description         |    Type     |              Content               |
| Output.Projects               |                            | project[]   | <use `bs get` to obtain more info> |
| Output.Datasets               |                            | dataset[]   | <use `bs get` to obtain more info> |
| Input.snp_vqsr                | SNP VQSR sensitivity       | string      | "99.5"                             |
| Input.sample-id.attributes    | Sample Attributes          | map[]       | <use `bs get` to obtain more info> |
| Input.reference_genome        | Reference genome           | string      | "b37_decoy"                        |
| Input.Projects                |                            | project[]   | <use `bs get` to obtain more info> |
| Input.project-id.attributes   | Save Results To Attributes | map[]       | <use `bs get` to obtain more info> |
| Input.project-id              | Save Results To            | project     | <use `bs get` to obtain more info> |
| Input.indel_vqsr              | Indel VQSR sensitivity     | string      | "99.5"                             |
| Input.Datasets                |                            | dataset[]   | <use `bs get` to obtain more info> |
| Input.BioSamples              |                            | biosample[] | <use `bs get` to obtain more info> |
|        | Analysis Name              | string      | "Sentieon [LocalDateTime]"         |
| BaseSpace.Private.IsMultiNode |                            | string      | "True"                             |

To see an individual property, use property get with the --property-name switch:

$ bs appsession property get -i 46664618 --property-name="Input.snp_vqsr"

Note that many of the properties of this appsession are themselves BSSH entities, which can be listed directly by using property get:

$ bs appsession property get -i 46664618 --property-name="Output.Projects"
|     Name      |    Id    |   TotalSize    |
| sgdp_sentieon | 38827790 | 32372028734815 |

This is particularly useful for finding the inputs and outputs of an app:

$ bs appsession property get -i 46664618 --property-name="Input.Datasets"
|    Name    |                 Id                  |          Project.Name           |   DataSetType.Id    |
| ERR1347692 | ds.4fa74a92d9b04a69a5cb53f603d965fa | Simons Genome Diversity Project | illumina.fastq.v1.8 |

$ bs appsession property get -i 46664618 --property-name="Output.Datasets"
|   Name   |                 Id                  | Project.Name  | DataSetType.Id |
| 36701821 | ds.e599c516419e4470b03f26c480fce45d | sgdp_sentieon | common.files   |

We can use some shell features to see the list of files that were output in a single command:

$ bs contents dataset -i $(bs appsession property get -i 46664618 --property-name="Output.Datasets" --terse)
|     Id     |                  FilePath                  |
| 7776871946 | .basespace/ERR1347692_128_hs37d5.cov.gz    |
| 7776871945 | .basespace/ERR1347692_128_Y.cov.gz         |
| 7776871944 | .basespace/ERR1347692_128_X.cov.gz         |
| 7776871943 | .basespace/ERR1347692_128_NC_007605.cov.gz |
| 7776871942 | .basespace/ERR1347692_128_MT.cov.gz        |
| 7776871941 | .basespace/ERR1347692_128_GL0002491.cov.gz |
| 7776871940 | .basespace/ERR1347692_128_GL0002481.cov.gz |

Setting and inspecting properties on a project

Projects by default do not have properties set:

$ bs projects properties list -i 27932921

We can add string properties:

$ bs projects properties set -i 27932921 --property-name="MyNamespace.TestProperty" --property-content="TestValue"
$ bs projects properties list -i 27932921
|          Name            | Description |  Type  |   Content   |
| MyNamespace.TestProperty |             | string | "TestValue" |

Note that a namespace prefix for a property name is compulsory:

$ bs projects properties set -i 27932921 --property-name="TestProperty" --property-content="TestValue"
ERROR: *** BASESPACE.PROPERTIES.NAME_INVALID: Property name: TestProperty must contain 2 or more segments split by a period. Each segment may contain letters, numbers, '-', and '_'. First segment must start with a letter or number. ***
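The rule quoted in the error message can be captured in a short regular expression. The sketch below is an illustration of that rule, not the server's actual validation code:

```python
import re

# Two or more period-separated segments; segments may contain letters,
# numbers, '-' and '_'; the first segment must start with a letter or number.
PROPERTY_NAME_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9_-]*(\.[A-Za-z0-9_-]+)+$")

def is_valid_property_name(name):
    return bool(PROPERTY_NAME_RE.match(name))

print(is_valid_property_name("MyNamespace.TestProperty"))  # True
print(is_valid_property_name("TestProperty"))              # False - only one segment
```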

Setting lane QC thresholds

Authenticate with the appropriate scopes into a config called "laneqc"; we will refer to this config in subsequent commands.

$ bs auth --scopes 'READ GLOBAL,CREATE GLOBAL,CONFIGURE QC' -c laneqc
Please go to this URL to authenticate:
Created config file  /Users/basespaceuser/.basespace/laneqc.cfg
Welcome, BSSH CLI TestUser

Look at lane QC thresholds to confirm they are blank:

$ bs -c laneqc lane threshold export

Create a CSV file defining the QC thresholds, then import that file to set the lane QC thresholds:

$ cat > /tmp/thresholds.txt                                       
$ bs -c laneqc lane threshold import -f /tmp/thresholds.txt          
# should finish without errors!                                      

View lane QC thresholds:

$ bs -c laneqc lane threshold export                                 

Clear the lane QC thresholds, then export again to confirm they have been removed:

$ bs -c laneqc lane threshold clear                                  
$ bs -c laneqc lane threshold export

Creating a biosample and uploading FASTQ data against it

Biosample Creation

# warning! if your project does not already exist it will be implicitly created                                                    
$ bs create biosample -n "MyBioSample" -p "MyProject"                

Note that there are quite a few optional metadata parameters for biosamples. These are primarily designed to help high-throughput labs classify and display biosamples:

$ bs create biosample --help
    BioSample Options:
      -n, --name=                                  Name of the BioSample
      -p, --project=                               Name of the project where FastQs will be stored. Created if not found.
          --container-name=                        Name of container
          --container-position=                    Position within the container
          --analysis-workflow=                     Name of the analysis to schedule
          --prep-request=                          Name of the lab workflow that LIMS should perform
          --required-yield=                        Required yield in Gbp that is needed before launching analysis (required if --prep-request is provided)
          --metadata=                              Key/Value metadata properties to set on the BioSample
          --delivery-mode=[Deliver|Do Not Deliver] Initial delivery mode

Use the --metadata flag to attach arbitrary labels to a new biosample:

$ bs create biosample -n MyBioSample -p MyProject --metadata Type:FFPE --metadata SequencingLab:12

You can preview biosample creation with --preview to validate that the data provided is correct:

$ bs create biosample -n "MyBioSample" -p "MyProject" --preview      
ERROR:  *** Error in BioSample Name: BioSample 'MyBioSample' already exists and cannot be imported  ***                                  

To attach a new analysis workflow to an existing biosample, use the --allow-existing option.

FASTQ upload

There are two options to associate uploaded FASTQ files with a biosample:

  • Upload files that match the biosample name
  • Force the upload to attach FASTQs to the specified biosample, regardless of their name

Upload FASTQ files that match the biosample name

$ ls                                                                 
# note that you need a project ID here, not a project name as you did when you created the biosample!                                       
$ bs upload dataset -p 27943921 MyBioSample_S1_L001_R1_001.fastq.gz MyBioSample_S1_L001_R2_001.fastq.gz               
Creating sample: MyBioSample
MyBioSample_S1_L001_R1_001.fastq.gz 1.07 GiB / 1.07 GiB [=========================================================] 100.00%
MyBioSample_S1_L001_R2_001.fastq.gz 1.11 GiB / 1.11 GiB [=========================================================] 100.00%
Upload complete

# note that datasets are created asynchronously and there can be a delay
$ bs list dataset --input-biosample="MyBioSample"  
|    Name     |                 Id                  | Project.Name |   DataSetType.Id    |
| MyBioSample | ds.94f7e9663e86473c8582dcf85a830195 | MyProject    | illumina.fastq.v1.8 |

Force the upload to attach FASTQs to the specified biosample, regardless of their name

$ ls
valid_S1_L001_R1_001.fastq.gz   valid_S1_L001_R2_001.fastq.gz
# note that you need a project ID here, not a project name as you did when you created the biosample!
$ bs upload dataset --biosample-name="MyBioSample" -p 27943921 valid_S1_L001_R1_001.fastq.gz valid_S1_L001_R2_001.fastq.gz
MyBioSample_S1_L001_R1_001.fastq.gz 1.07 GiB / 1.07 GiB [=========================================================] 100.00%
MyBioSample_S1_L001_R2_001.fastq.gz 1.11 GiB / 1.11 GiB [=========================================================] 100.00%
Upload complete
$ bs list dataset --input-biosample="MyBioSample"
|    Name     |                 Id                  | Project.Name |   DataSetType.Id    |
| MyBioSample | ds.94f7e9663e86473c8582dcf85a830195 | MyProject    | illumina.fastq.v1.8 |
| MyBioSample | ds.64706f7d2e504e1c9495c00c468d6640 | MyProject    | illumina.fastq.v1.8 |

Note that even though the dataset has a name to match the biosample, the files within retain their original names:

$ bs dataset contents -i ds.94f7e9663e86473c8582dcf85a830195
|     Id     |              FilePath               |
| 8652306587 | MyBioSample_S1_L001_R1_001.fastq.gz |
| 8652306586 | MyBioSample_S1_L001_R2_001.fastq.gz |
$ bs dataset contents -i ds.64706f7d2e504e1c9495c00c468d6640
|     Id     |           FilePath            |
| 8652572270 | valid_S1_L001_R2_001.fastq.gz |
| 8652572269 | valid_S1_L001_R1_001.fastq.gz |

Uploading directories of FASTQ files

The bs upload dataset command supports a --recursive option that scans a directory and its subdirectories for FASTQ files:

$ ls
MyBioSample2_S1_L002_R1_001.fastq.gz    MyBioSample2_S1_L002_R2_001.fastq.gz    MyBioSample_S1_L001_R1_001.fastq.gz MyBioSample_S1_L001_R2_001.fastq.gz
$ bs upload dataset -p 21646627 --recursive .
Creating sample: MyBioSample2
MyBioSample2_S1_L002_R1_001.fastq.gz 1.08 GiB / 1.08 GiB [==========================================================] 100.00%
MyBioSample2_S1_L002_R2_001.fastq.gz 1.11 GiB / 1.11 GiB [==========================================================] 100.00%
Upload complete
Creating sample: MyBioSample
MyBioSample_S1_L001_R1_001.fastq.gz 1.07 GiB / 1.07 GiB [==========================================================] 100.00%
MyBioSample_S1_L001_R2_001.fastq.gz 1.11 GiB / 1.11 GiB [==========================================================] 100.00%
Upload complete 

By default, the files will be automatically grouped by name and uploaded as individual FASTQ datasets, but you can also force them all to be uploaded to the same biosample:

$ bs upload dataset -p 21646627 --recursive . --biosample-name=MyBioSample
Creating sample: MyBioSample
MyBioSample2_S1_L002_R2_001.fastq.gz 1.11 GiB / 1.11 GiB [==========================================================] 100.00%
MyBioSample2_S1_L002_R1_001.fastq.gz 1.08 GiB / 1.08 GiB [==========================================================] 100.00%
MyBioSample_S1_L001_R2_001.fastq.gz 1.11 GiB / 1.11 GiB [==========================================================] 100.00%
MyBioSample_S1_L001_R1_001.fastq.gz 1.07 GiB / 1.07 GiB [==========================================================] 100.00%
Upload complete
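The default grouping follows the Illumina FASTQ naming convention, <sample>_S<num>_L<lane>_R<read>_001.fastq.gz. As a rough illustration of how files end up grouped by sample name (a sketch, not the CLI's actual logic):

```python
import re
from collections import defaultdict

# Illumina convention: <sample>_S<num>_L<lane>_R<read>_001.fastq.gz
FASTQ_RE = re.compile(r"^(?P<sample>.+)_S\d+_L\d{3}_R[12]_001\.fastq\.gz$")

def group_by_sample(filenames):
    groups = defaultdict(list)
    for name in filenames:
        match = FASTQ_RE.match(name)
        if match:
            groups[match.group("sample")].append(name)
    return dict(groups)

files = [
    "MyBioSample_S1_L001_R1_001.fastq.gz",
    "MyBioSample_S1_L001_R2_001.fastq.gz",
    "MyBioSample2_S1_L002_R1_001.fastq.gz",
    "MyBioSample2_S1_L002_R2_001.fastq.gz",
]
print(sorted(group_by_sample(files)))  # ['MyBioSample', 'MyBioSample2']
```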

Uploading a run

Uploading a run requires an additional authentication scope CREATE RUNS:

$ bs authenticate -c run-upload --scopes "CREATE RUNS"

The upload run command takes a run folder and uploads it to Sequence Hub:

$ bs -c run-upload upload run -n MyNewRunName -t HiSeqX /path/to/runFolder

The run folder must contain standard Illumina run files:

  • RunParameters.xml
  • RunInfo.xml

A run sample sheet file (SampleSheet.csv) is recommended to kick off automatic FASTQ Generation once the run upload has completed.

Runs often consist of many small files. To optimise the upload, you may want to tune concurrency settings, for example by increasing --concurrent-files and possibly decreasing --concurrent-parts.

Instrument types

The upload run command requires a named instrument type (--instrument / -t). Valid options for this flag include:

  • HiSeq family: HiSeq1000, HiSeq1500, HiSeq2000, HiSeq2500, HiSeq3000, HiSeq4000, HiSeqX
  • NovaSeq family: NovaSeq5000, NovaSeq6000
  • Other instruments: MiniSeq, MiSeq, MiSeqDx, NextSeq, NextSeqDx, NextSeq2000, iSeq100

Uploading arbitrary data

You can use bs upload dataset to upload arbitrary files by supplying the --type common.files option:

$ ls
testfile1.txt   testfile2.txt
$ bs upload dataset -p 21646627 --type common.files --recursive .
Creating dataset: testfile1.txt+
testfile2.txt 2.76 KiB / 2.76 KiB [==========================================================] 100.00%
testfile1.txt 2.31 KiB / 2.31 KiB [==========================================================] 100.00%
Upload complete

If no name for the dataset is specified, a name will be created based on one of the files to be uploaded. You can specify a name with the --name option.

Controlling upload options

The bs upload dataset command contains a number of options to control how the upload is conducted and reported. For example, to upload with high concurrency and no progress bars:

$ bs upload dataset -p 21646627 --type common.files --no-progress-bars --concurrency=high --recursive .

Selective filtering of uploads and downloads

Simple filtering by extension

To only download a set of file extensions, such as BAMs and VCFs, you can supply the --extension flag to download commands multiple times. This will only pull files with names ending in the given suffixes:

$ bs download dataset -n MyDataSetName --extension bam --extension bai --extension vcf.gz --extension vcf.gz.tbi -o downloads

For uploading runs, there's a --skip-ext option for excluding files from the set being uploaded, and --skip-dir for omitting directories:

$ bs upload run -n MyRun -t HiSeqX --skip-ext jpg --skip-dir Thumbnail_Images /path/to/run-folder

For more complex filtering when uploading or downloading a set of files, see the Advanced filtering section.

Advanced filtering

Upload and download commands offer --include and --exclude flags for flexible filtering of file sets. These flags may be used multiple times to give full control over which subset of files is uploaded or downloaded by a given command. Include and exclude patterns are not regular expressions but simple UNIX shell patterns, as implemented by fnmatch:

  • * matches any string including path separators
  • ? matches any single character
  • [] defines a character set, so [AB] matches one of A or B
  • ! or ^ negates a match, so [!1] matches any single character except 1

Note that to prevent your shell from expanding wildcards, you may need to use single quotes around your include and exclude patterns. Do not use single quotes when running under Windows CMD.

The following examples will use this directory set up to demonstrate how these filters work:

├── NA12877.bam
├── NA12877.bam.bai
├── NA12878.bam
├── NA12878.bam.bai
├── NA12878.vcf.gz
├── NA12878.vcf.gz.tbi
├── metrics
│   ├── exome-coverage.csv
│   └── wgs-coverage.csv
├── plots
│   └── coverage-histogram.png
└── summary.csv

Filters are applied in the order in which they are supplied on the commandline, starting from a position of including all files. That means that only using --include flags will have no effect on the files being uploaded — they're already included. To selectively include files, you may want to exclude everything first.

Each command below is followed by the file set it selects:

--exclude '*' --include '*.bam*'

├── NA12877.bam
├── NA12877.bam.bai
├── NA12878.bam
└── NA12878.bam.bai

--exclude '*' --include '*.csv'

├── metrics
│   ├── exome-coverage.csv
│   └── wgs-coverage.csv
└── summary.csv

--exclude 'plots/*' --exclude 'metrics/*'
--exclude '*/*'

(both of the above select the same file set)

├── NA12877.bam
├── NA12877.bam.bai
├── NA12878.bam
├── NA12878.bam.bai
├── NA12878.vcf.gz
├── NA12878.vcf.gz.tbi
└── summary.csv

A given file may be matched by multiple conditions; its final inclusion status is determined after applying all include and exclude patterns in the order supplied, from left to right. Note that patterns are relative to the root directory being uploaded and are matched against the full relative path of each file being considered. Here are some more examples to demonstrate how this works:

--exclude '*' --include '*.bam' --exclude 'NA12877*'
--exclude '*' --include NA12878.bam

(both of the above select the same file set)

└── NA12878.bam

--exclude '*' --include 'metrics/*' --exclude 'metrics/exome*'

└── metrics
    └── wgs-coverage.csv

You can also use negative match characters to filter a file set:

--exclude '*' --include 'NA1287[!8]*'
--exclude '*' --include 'NA1287[^8]*'

(both of the above select the same file set)

├── NA12877.bam
└── NA12877.bam.bai
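The left-to-right semantics above can be sketched in a few lines of Python. This is an illustration of the ordering rules, not the CLI's implementation (note that Python's fnmatch supports only the [!...] negation form):

```python
from fnmatch import fnmatch

def apply_filters(paths, rules):
    """Apply (kind, pattern) rules left to right, starting with
    every path included, mirroring --include/--exclude ordering."""
    selected = set(paths)  # everything starts included
    for kind, pattern in rules:
        for path in paths:
            if fnmatch(path, pattern):
                if kind == "include":
                    selected.add(path)
                else:
                    selected.discard(path)
    return sorted(selected)

files = [
    "NA12877.bam", "NA12877.bam.bai", "NA12878.bam", "NA12878.bam.bai",
    "NA12878.vcf.gz", "NA12878.vcf.gz.tbi", "metrics/exome-coverage.csv",
    "metrics/wgs-coverage.csv", "plots/coverage-histogram.png", "summary.csv",
]

# --exclude '*' --include '*.bam*'
print(apply_filters(files, [("exclude", "*"), ("include", "*.bam*")]))
# ['NA12877.bam', 'NA12877.bam.bai', 'NA12878.bam', 'NA12878.bam.bai']
```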

Deleting data

Basic deletion

Many core BSSH entities can be deleted with a bs delete command:

# delete dataset by ID
$ bs delete dataset -i ds.123
# delete project by name
$ bs delete project -n MyProject

You can see which entities support deletion by running bs delete:

$ bs delete
Please specify one command of: appresult, appsession, dataset, lane, project, property, run or workflow

Note that some of these subcommands (bs delete lane, bs delete workflow) delete configuration for the automated workflow feature, rather than deleting entities.

Deleting properties

To delete a property, you need to specify the entity that owns the property, as well as the property name you want to delete:

# show the properties in a project
$ bs list properties project --name="Project test"
|          Name           | Description |  Type  |   Content    |
| myproperty.testproperty |             | string | "testvalue2" |
# delete a property - specify both project and property name
$ bs delete properties project --name="Project test" --property-name="myproperty.testproperty"
# list again - the property has disappeared
$ bs list properties project --name="Project test"

Deleting bulk files but retaining other information

Some entities (runs and datasets) support deleting the space-consuming files while retaining other information. This is achieved with the --preserve-metadata switch.

# this will delete the files in ds.123, but keep the metrics
$ bs delete dataset -i ds.123 --preserve-metadata
# after deleting with preserve metadata, you'll still be able to see dataset attributes 
$ bs list attributes dataset -i ds.123
# but the file contents will be gone
$ bs contents dataset -i ds.123

If you delete a run preserving metadata, a number of key files are also retained, including the XML files that describe how the run was configured and the InterOp files, which allow the run's graph views to be displayed. This is ideal if you want to hugely reduce the data footprint of a run by deleting BCL files while leaving behind the metrics for long-term trending and record-keeping.

Bulk deletion by age

A common and useful pattern is to delete the BCL files in runs above a certain age. This can be achieved with the following combination:

# delete BCL files in runs older than 30 days,
# whilst retaining interops and other metadata
$ bs list runs --older-than=30d --terse | xargs -n1 bs delete run --preserve-metadata -i
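
Before running a destructive pipeline like this, it can be previewed by substituting a fixed list of run IDs for the bs list runs output and prefixing the delete command with echo, so the commands are printed rather than executed:

```shell
# Dry-run of the bulk pipeline: each input line becomes one
# `bs delete run` invocation, printed instead of executed.
printf '%s\n' 123456 123457 123458 \
  | xargs -n1 echo bs delete run --preserve-metadata -i
```

This prints one `bs delete run --preserve-metadata -i <id>` line per ID; removing the echo restores the real behaviour.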

Archiving data

Archiving is a Sequence Hub feature which migrates data to lower-cost cold storage, useful for data that cannot be deleted but does not need to be accessed in the near future. Archived data must be restored before it can be used either as an input to a Sequence Hub application or downloaded for local use. For more information about storage costs, see the iCredits information page.

Archiving runs and datasets

Use the archive command to send data to long-term storage:

$ bs archive run -i 123456
$ bs archive dataset -i ds.123

Use the IsArchived field to check whether runs or datasets have been archived:

# show all archived runs
$ bs list runs --filter-field IsArchived --filter-term true

As with deletion using --preserve-metadata, archival moves only the Data/ directory of run files, while InterOp files and other metadata remain accessible. For datasets, all files are archived but metadata, including any dataset attributes, remains available.

Restoring data from the archive

To regain access to archived data, use the unarchive command:

$ bs unarchive run -i 123456
$ bs unarchive dataset -i ds.123

Note that the restore process can take up to several days to complete. The IsArchived field will remain true until the data has been fully restored.

Bulk archival

As with other CLI commands, it's easy to combine archive with the list command for powerful results:

# archive all runs which are over a year old
$ bs list runs --older-than 1y --terse | xargs -n1 bs archive run -i

# archive all datasets in a given project
$ bs list datasets --project-id 123456 --terse | xargs -n1 bs archive dataset -i

Configure Automated Workflows

The analysis workflow feature of BSSH allows apps to be launched automatically when they meet a set of conditions or dependencies. The feature also allows automated quality control to be applied so that the appsession is automatically marked as "QCPassed" or "QCFailed" based on the metrics it generated.

Before an app can be launched automatically in this way, a "workflow" needs to be created which wraps the app and describes its dependencies and (optionally) any QC thresholds. The workflow takes as input a template appsession, an app launch that has been configured and launched manually using the desired settings; automated launches for this workflow will be based on these settings.

It is also possible to chain automated app launches by creating another workflow for the downstream app and creating a dependency on the upstream step.

Generate a Token

The MANAGE APPLICATIONS scope is required to create and work with workflows. If you do not already have a CLI configuration with this scope, generate a new one. In the following example commands the new config is named docs-demo:
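
The token-generation command itself is not shown here; assuming the standard bs auth flags (check bs auth --help on your CLI version, as the exact scope string may differ by instance), it would look something like:

```shell
$ bs auth --config docs-demo --scopes "MANAGE APPLICATIONS" --force
```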


Create an Analysis Workflow

An analysis workflow is created with the workflow create command. The example commands below contain placeholder values for appsession IDs and other fields, which will need to be changed for your usage.

Fixed parameters for the analysis workflow will be taken from a template appsession provided via the --appsession-id option. This appsession should be one previously launched with each of the settings required by your analysis workflow; for example, if you want your analysis workflow to use the hg19 human reference genome setting, that is what this template appsession must have been launched with.

$ bs -c docs-demo workflow create -n TestWorkflow -d CLICreated --appsession-id 123456789

The value returned is the ID of the newly-created workflow. You can see all available workflows using the list applications command:

$ bs -c docs-demo list applications --category=workflow
|     Name     |   Id    | VersionNumber |
| TestWorkflow | 3978975 | 1.0.0         |

Set Dependencies

The commands below use the analysis workflow ID from the example in the previous section; for your own usage, supply your own analysis workflow ID.

This example will demonstrate adding two dependencies, both of which will then gate the launching of the analysis workflow:

  1. A biosample with at least 100,000 base pairs of yield
  2. A completed DRAGEN Germline appsession

Create a biosample yield dependency

In order to launch the analysis workflow with a biosample input, we must add a "BioSample Yield" dependency:

$ bs -c docs-demo workflow dependency add biosample-yield --chooser-id=automation-sample-id --can-use-primary-biosample --required-yield 100000 -i 3978975

The optional --required-yield parameter specifies that the analysis workflow should only be launched once the biosample has a yield of at least 100,000 base pairs. It's a good idea to set a required yield to prevent launching with empty or in-progress biosamples.

Create an app completion dependency

In this example, our analysis workflow also requires a file generated by an upstream Sequence Hub application, specifically DRAGEN Germline v3.8.4 which has application ID 11786775 on US Sequence Hub.

Look up the AppVersionSlug value of the upstream application you want to use:

$ bs -c docs-demo get application -i 11786775
| AppFamilySlug              | illumina-inc.dragen-germline                                                           |
| AppVersionSlug             | illumina-inc.dragen-germline.3.8.4                                                     |
| Id                         | 11786775                                                                               |
| VersionNumber              | 3.8.4                                                                                  |

Find the name of the parameter that will accept the file from the upstream application.

$ bs launch application -i 11786775 --list
|             Option             |    Type     | Choices | Default | Multiselect | Required |
| vcf-file                       | FileChooser | <File>  |         | false       | false    |

The input file parameter is named vcf-file. We can add an "App Completion" dependency to our existing workflow as follows:

$ bs -c docs-demo workflow dependency add app-completion -i 3978975 --application-id illumina-inc.dragen-germline.3.8.4 --chooser-id vcf-file --qc-pass --file-selector='.*\\.vcf\\.gz$|.*\\.vcf$'

A regular expression supplied via the --file-selector argument determines which single file is selected from the set of output files generated by the upstream application. This pattern must match exactly one file, otherwise the launch will fail.
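
A selector pattern can be sanity-checked locally before adding the dependency. Here grep -E stands in for the CLI's matcher (an assumption; the CLI may apply its own regex flavour), using a single-escaped copy of the pattern against an illustrative list of output filenames:

```shell
# Filenames a germline run might produce (illustrative list) filtered
# with a single-escaped version of the selector regex.
printf '%s\n' NA12878.vcf.gz NA12878.vcf NA12878.bam metrics.csv \
  | grep -E '.*\.vcf\.gz$|.*\.vcf$'
```

This prints both NA12878.vcf.gz and NA12878.vcf; two matches like that would violate the exactly-one-file requirement, so in practice the pattern may need tightening for apps that emit both forms.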

The --qc-pass flag ensures that the dependency will only be met if the upstream appsession automated QC has passed successfully.

Review the analysis workflow's dependencies using the workflow dependency export command:

$ bs -c docs-demo workflow dependency export -i 3978975
    "Type": "BioSampleYield",
    "Attributes": {
      "BioSampleChooserId": "sample-id",
      "CanUsePrimaryBioSample": false,
      "Label": "",
      "LibraryPrepId": "",
      "MixLibraryTypesAllowed": false,
      "RequiredYield": 100000
    "Dependencies": null
    "Type": "AppCompletion",
    "Attributes": {
      "ApplicationId": "illumina-inc.dragen-germline.3.8.4",
      "CanUsePrimaryResource": false,
      "ColumnId": "",
      "Label": "",
      "RequireQcPass": true,
      "ResourceChooserId": "vcf-file"
    "Dependencies": null

In summary, our analysis workflow now has two launch conditions:

  1. A biosample with at least 100,000 base pairs of yield
  2. A completed DRAGEN Germline appsession, run on that same biosample, which has written a VCF and has passed QC

When both of those conditions are met, our analysis workflow will launch.

Create a specific resource dependency for fixed file inputs

Most analysis workflow settings are copied from the template appsession. One exception is any user-provided file set via a FileChooser app control. These need to be set explicitly as "Specific Resource" dependencies, so that if these files are deleted the dependency will no longer be met and the analysis workflow will not launch.

$ bs workflow dependency add specific-resource -i <analysisWorkflowID> --chooser-id <appOptionID> --resource-reference v1pre3/files/<fileID>

Note that this is used for files which are fixed for every execution of the analysis workflow; for files or datasets output by a previous step in a chain of analysis workflows, you should instead use an App Completion dependency.

Set QC Thresholds

Automated QC makes use of registered dataset metrics, so can be set on any value shown by the dataset attributes list command. For example, to auto-QC an analysis workflow wrapping a DRAGEN Germline application, the available metrics can be viewed with:

$ bs -c docs-demo dataset attributes list -i ds.abcdef12345678910987654
|                          Name                          |       Value       |
| number_of_duplicate_marked_reads_pct                   | 12.19             |
| paired_reads_different_chromosomes_mapq_gt_eq_10_pct   | 0.78              |
| pct_of_genome_with_coverage_10x_inf                    | 96.11             |
| secondary_alignments                                   | 0                 |

View the dataset type ID for a dataset using get dataset:

$ bs -c docs-demo get dataset -i ds.abcdef12345678910987654 -F DataSetType.Id
| DataSetType.Id | illumina.dragen.complete.v0.3.1 |

This type ID is used to reference dataset metrics in the threshold definitions. In this example, the thresholds are configured in the following CSV file:

$ cat /tmp/qcthresholds.csv
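
The contents of the CSV are not reproduced here. As a purely hypothetical sketch (the column names below are invented for illustration and are not the documented schema; consult the CLI help for the real format), a threshold file pairing the dataset type ID with a metric, an operator, and one or two values might look like:

```shell
# Hypothetical qcthresholds.csv; column names are illustrative only.
cat > /tmp/qcthresholds.csv <<'EOF'
DataSetType.Id,Metric,Operator,Value1,Value2
illumina.dragen.complete.v0.3.1,pct_of_genome_with_coverage_10x_inf,GreaterThanOrEqual,90,
illumina.dragen.complete.v0.3.1,number_of_duplicate_marked_reads_pct,Between,0,20
EOF
cat /tmp/qcthresholds.csv
```

Note the Between row supplies two values, matching the operator list below.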

Available operators are:

  • Equal / NotEqual
  • LessThan / LessThanOrEqual
  • GreaterThan / GreaterThanOrEqual
  • Between / Outside (requires two threshold values)

To register these thresholds with your analysis workflow, use the workflow threshold import command:

$ bs -c docs-demo workflow threshold import -f /tmp/qcthresholds.csv -i 3978975

Accession Biosample

To create a new instance of your workflow, create a biosample referencing that workflow. Note that you need to use the project name, not the ID, here; this matches the manifest import mechanism.

$ bs -c docs-demo biosample create -p "MyProject" -n "MyBiosample" --analysis-workflow "TestWorkflow"

Inspect the Biosamples you've just created

$ bs -c docs-demo list biosamples --newer-than=1d

Find the appsession that's been created for your workflow

Setting different fields here shows you extra information about the status. Note that you need to use the biosample ID and not the name here.

$ bs -c docs-demo appsession list --input-biosample=$BIOSAMPLEID -F Id -F Status -F StatusSummary

Upload Fastq files against that biosample

Uploading data against your new biosample can be carried out as listed in a previous example. This will register yield against this biosample that can trigger an app launch if the workflow has a yield dependency.

Launching Apps

This section provides some examples of manual app launch at the command line. This is distinct from configuring apps for automated launch, as covered in an earlier example.

The CLI parses app forms, giving the user access via the command line to view and configure options for the relevant app.

View the available options for an app

For a given app name and version, the --list flag will return a large table showing all available options for a command-line launch:

$ bs launch application -n "Whole Genome Sequencing" --app-version 7.0.1 --list
|         Option         |      Type      |                                   Choices                                   |                                  Default                                   | Multiselect | Required |
| app-session-name       | TextBox        | <String>                                                                    | Example [LocalDateTime]                                                    | false       | true     |
| project-id             | ProjectChooser | <Project>                                                                   |                                                                            | false       | true     |
| sample-id              | SampleChooser  | <Sample>                                                                    |                                                                            | true        | false    |
| bam-file-id            | FileChooser    | <File>                                                                      |                                                                            | true        | false    |
| reference-genome       | Select         | /data/scratch/hg19/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta,        | /data/scratch/hg38/Homo_sapiens/NCBI/GRCh38Decoy/Sequence/WholeGenomeFasta | false       | true     |
|                        |                | /data/scratch/GRCh37/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta, |                                                                            |             |          |
|                        |                | /data/scratch/hg38/Homo_sapiens/NCBI/GRCh38Decoy/Sequence/WholeGenomeFasta  |                                                                            |             |          |
| enable-variant-calling | CheckBox       | 1                                                                           | 1                                                                          | false       | false    |
| enable-sv-calling      | CheckBox       | 1                                                                           | 1                                                                          | false       | false    |
| enable-cnv-calling     | CheckBox       | 1                                                                           | 1                                                                          | false       | false    |
| annotation-source      | RadioButton    | ensembl, refseq, both, none                                                 | ensembl                                                                    | false       | false    |

Build a launch command

To launch an app, supply the app name and version along with any settings using the --option / -o flag in the format optionName:value. The launch command expects "New" entities for all inputs, such as biosamples and datasets rather than samples and appresults. Sequence Hub entities can be referred to in an app launch by their unique ID, while other launch arguments can accept plain text:

$ bs launch application -n 'Whole Genome Sequencing' --app-version 7.0.1 \
-o project-id:1232 -o bam-file-id:3534333 -o annotation-source:both \
-l "My test appsession"

You can also launch apps by their ID:

$ bs launch application -i 5143138 -o project-id:1232 -o bam-file-id:3534333

If an option accepts multiple arguments (check with --list), you can supply these as comma-separated values:

$ bs launch application -i 5143138 -o project-id:1232 -o bam-file-id:3534333,232321

Launching an app with biosamples

In most cases, biosamples can be passed to a launch command by just their ID:

$ bs launch application -n 'Whole Genome Sequencing' --app-version 7.0.1 \
-o project-id:1232 -o sample-id:2323244

If a biosample contains FASTQ datasets with a mix of library preps, however, you will need to specify the library prep ID for the FASTQ datasets you wish to launch with, in the format:

-o sample-id:343432/librarypreps/1014015

Launching an app with advanced form controls


Apps such as Tumor Normal v5 use a ResourceMatcher to submit matched pairs of WGS datasets. For CLI launch, the format for these fields is:

-o input-id:'col1_dataset1,col1_dataset2;col2_dataset1,col2_dataset2'

Here commas separate multiple inputs and a semicolon delimits a column, so the above string would render as the following table if launched through the Sequence Hub web user interface:

Normal            Tumor
col1_dataset1     col2_dataset1
col1_dataset2     col2_dataset2
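
The delimiter structure can be checked with a quick bash sketch (an illustration only, not part of the CLI):

```shell
# Split a ResourceMatcher value into its semicolon-delimited columns
# and show each column's comma-separated entries.
value='col1_dataset1,col1_dataset2;col2_dataset1,col2_dataset2'
IFS=';' read -r -a columns <<< "$value"
for i in "${!columns[@]}"; do
  echo "column $((i + 1)): ${columns[$i]//,/ }"
done
# prints:
# column 1: col1_dataset1 col1_dataset2
# column 2: col2_dataset1 col2_dataset2
```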


TabularFieldsets are sophisticated form controls shown as an expandable table of sub-controls when launching an application through the web user interface. To set options for this type of control through the CLI, the format is: -o tabularFieldsetControlName.subControlName:value

For example, VCAT v2.3.0 uses a TabularFieldset named sample-pairs containing a FileChooser named file-id and a TextBox named file-label (as shown by --list). This application can be launched as shown:

$ bs launch application -n "Variant Calling Assessment Tool" --app-version 2.3.0 \
-o sample-pairs.file-id:11232444,11232445 -o sample-pairs.file-label:vcf1,vcf2


DisplayFields are additional controls that pop up underneath SampleChoosers and AppResultChoosers; these are not yet supported by the launch API. The CLI will warn you if the app you're trying to launch uses these, for example:

$ bs launch application -n 'SPAdes Genome Assembler' --app-version 3.5.0 --list
WARNING: Input field 'sample-id' uses DisplayFields which are not yet supported, you may not be able to launch this app !

Application launch FAQs

How can I find biosample/project/dataset IDs?

The CLI can get many entities by name and return their ID:

$ bs get project -n MyProjectName --terse
$ bs get biosample -n MyBioSampleName --terse
$ bs get dataset -n MyDatasetName --terse

Alternatively you can retrieve IDs via list commands, for example:

$ bs list datasets --project-name ProjectContainingMyDataset

Project and Biosample IDs are also visible in URLs when browsing Sequence Hub through the web user interface.

How can I find a file ID?

You can list the contents of an appresult or dataset to get the file IDs:

$ bs contents appresult -i 12224237 | head
|     Id     |                               FilePath                               |
| 8618632807 | Plots/s_0_1_2212_MismatchByCycle.png                                 |
| 8618632806 | Plots/s_0_1_2113_MismatchByCycle.png                                 |
| 8618632805 | Plots/sorted_S1_G1_chr2_MismatchByCycle.png                          |
| 8618632804 | Plots/s_0_1_1115_MismatchByCycle.png                                 |
| 8618632803 | Plots/s_1_1_2108_MismatchByCycle.png                                 |
| 8618632802 | Plots/s_0_2_1107_MismatchByCycle.png                                 |
| 8618632801 | Plots/s_0_1_1112_NoCallByCycle.png                                   |

You can also use BaseMount to find the file by navigating to the project and appresult which generated it:

${BASEMOUNT}/Projects/<project>/appresults/<appsession name>/Files/file.vcf

Then use the Files.metadata directory to get the ID:

$ cat ${BASEMOUNT}/Projects/<project>/appresults/<appsession name>/Files.metadata/file.vcf/.id  

How can I change my analysis name?

The analysis name is a special parameter; best practice is to set it through both the form control and the --appsession-label / -l argument:

bs launch application -n "Whole Genome Sequencing" --app-version 7.0.1 \
-o app-session-name:"Your appsession name" -l "Your appsession name"

How can I fix an 'AppSession name is required' error?

This error is due to an outdated CLI version which is no longer compatible with application launch. Please update to the latest CLI version.