API Data Model Overview

The API data model consists of the following major entities:

  • User: A user is the individual performing the analysis in BaseSpace.
  • Runs: A run is a collection of .bcl files associated with an individual flow cell. It contains the metrics and reads generated by the instrument on which the flow cell was sequenced.
  • Projects: A project is a logical grouping of Samples and AppResults for a user.
  • Samples: A sample is a collection of .fastq files of reads. A Sample also contains metadata about the physical subject. Sample files are generally used as inputs to applications.
  • AppResults: An AppResult is an output of an App. It contains BAMs, VCFs and other output file formats produced by an App.
  • AppSessions: An AppSession extends the information about each Sample and AppResult and allows them to be grouped by the instance of the application that produced them. AppSessions are called Analyses in the BaseSpace UI, which is the name that the user will see.
  • Files: The files associated with a Run, Sample, or AppResult. All files for each resource are in the File resource.
  • Genomes: These are the reference genomes that exist in BaseSpace. This resource gives information about the origin and build of each genome.

Users

This resource identifies the user who is accessing information on BaseSpace. A user's permissions to projects and files on BaseSpace are associated with their user Id. Refer to Users in the API Reference for more information.

Runs

Processing a flow cell on a sequencing instrument produces a variety of files, collectively referred to as a Run. A Run contains log files, instrument health data, and run metrics, as well as the base calls and other information used in secondary analysis. Runs contain .bcl files, which are demultiplexed in BaseSpace to create Sample files.

[Figure: DataModel Runs]

The BaseSpace API makes all these files available. Please refer to the API Reference page under Runs for more information.

Projects

Projects are the fundamental unit of organization in BaseSpace. Users may create projects, place items into projects, and share projects (and their contents) with other users. Project sharing has permission levels, including READ and WRITE. A Project holds no files directly; it contains only Samples and AppResults, each of which has files within it. Think of a Project as a folder that contains multiple subfolders of Samples and AppResults.

[Figure: DataModel Projects]

Projects can contain Samples and AppResults. More information can be found in the API Reference page under Projects.
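
The folder analogy can be sketched in a few lines of Python. This is only an illustrative model, not part of the BaseSpace API or any client library: the class and field names (`File`, `Sample`, `AppResult`, `Project`, `all_files`) are hypothetical and were chosen to mirror the description in this section.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class File:
    name: str
    size: int  # size in bytes


@dataclass
class Sample:
    name: str
    files: List[File] = field(default_factory=list)


@dataclass
class AppResult:
    name: str
    files: List[File] = field(default_factory=list)


@dataclass
class Project:
    name: str
    samples: List[Sample] = field(default_factory=list)
    app_results: List[AppResult] = field(default_factory=list)

    def all_files(self) -> List[File]:
        """A Project holds no files itself; gather them from its children."""
        children = [*self.samples, *self.app_results]
        return [f for child in children for f in child.files]
```

In this sketch a Project never stores a File directly, so the only way to reach files is through its Samples and AppResults, which matches the containment rule described in this section.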

Samples

Samples are the result of demultiplexing the output of a sequencing instrument flow cell run. Run .bcl files are demultiplexed to create FASTQ files, which are grouped into Samples. Samples are generally the inputs for an app. A Sample in BaseSpace contains metadata about how it was produced (i.e., from the sample sheet) and a collection of files representing all the reads in compressed FASTQ format. If a genome is specified in the sample sheet, the Sample will also have a field called HrefGenome indicating the preferred genome build. Please refer to the API Reference page under Samples for more information.

AppResults

AppResult is a general term for any result that is produced from other inputs. For example, if an app performed a resequencing workflow, the corresponding AppResults would be a collection of aligned reads (stored as BAM files) and variants (stored as VCF files). AppResults hold the files written back to BaseSpace by an application after it has run on a set of files. The inputs that lead to AppResults are generally Sample files but may also be Run files.

Every tool can produce a variety of output files, and the BaseSpace API makes these accessible from the AppResults. Please refer to the API Reference page under AppResults for more information.

AppSessions

An AppSession extends the information about each Sample and AppResult and allows them to be grouped. It is accessed by calling GET: appsessions/{id}. An AppSession represents an instance of an application: it lists the inputs passed from BaseSpace to the application and references the outputs. Each Sample and AppResult references an AppSession as a means of grouping together the Samples and AppResults that were created by the same instance of an app. The AppSession contains information about how an App was started (including what inputs were used) as well as what Samples and AppResults were produced. It contains a status field that applies to all of its Samples and AppResults. Each AppResult and Sample inherits its status from the AppSession to allow for sorting and filtering. Please refer to the API Reference page under AppSessions for more information.
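
As a rough illustration of the GET: appsessions/{id} call and of filtering by the inherited status, the sketch below builds (without sending) such a request and filters a list of resources. Several details are assumptions rather than facts taken from this page: the base URL and version prefix in `API_BASE`, the `x-access-token` header, and the nested `AppSession`/`Status` field names should all be checked against the API Reference. The helper names are hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumption: base URL and version prefix; confirm both in the API Reference.
API_BASE = "https://api.basespace.illumina.com/v1pre3"


def build_appsession_request(appsession_id: str, access_token: str) -> Request:
    """Build (but do not send) a GET request for appsessions/{id}."""
    url = f"{API_BASE}/appsessions/{quote(appsession_id, safe='')}"
    # Assumption: the access token is passed in an x-access-token header.
    return Request(url, headers={"x-access-token": access_token}, method="GET")


def filter_by_status(resources: list, status: str) -> list:
    """Keep Samples/AppResults whose AppSession carries the given status.

    Assumption: each resource is a dict with a nested
    {"AppSession": {"Status": ...}} entry.
    """
    return [
        r for r in resources
        if r.get("AppSession", {}).get("Status") == status
    ]
```

The request object can then be passed to `urllib.request.urlopen` to perform the call. Because the status lives on the AppSession, the filter reads it from the nested entry rather than from the Sample or AppResult itself.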

Files

A File in BaseSpace is what one would expect: a collection of attributes (date created, size, etc.) and a data stream. The BaseSpace API lets an app read any file it has access to. In addition, certain files (VCF and BAM files) have additional information available. Runs, Samples, and AppResults have references to their files. There are no files associated with Projects. Please refer to the API Reference page under Files for more information.

Genomes

BaseSpace provides a few reference genomes that a user can compare their data against, or that an application can reference when analyzing a particular genome. All of these genomes are archived on BaseSpace. They exist within the Genomes resource, which provides each genome's name, build, species name, and source. Sample responses show which genome was referenced in the HrefGenome response parameter. Please refer to the Genomes portion of the API Reference for more information.
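
To show how an app might follow the HrefGenome parameter from a Sample to a genome, here is a minimal sketch. It assumes the Sample response has already been parsed into a dict and that the href ends in the genome's id (for example a value shaped like ".../genomes/4"); both the value shape and the helper name `genome_id_from_sample` are assumptions for illustration.

```python
from typing import Optional


def genome_id_from_sample(sample: dict) -> Optional[str]:
    """Return the trailing id of a Sample's HrefGenome, or None if unset."""
    href = sample.get("HrefGenome")
    if not href:
        # No genome was specified for this Sample.
        return None
    # Assumption: the href ends in the genome id, e.g. ".../genomes/4".
    return href.rstrip("/").rsplit("/", 1)[-1]
```

Returning None for a missing field reflects that HrefGenome is only present when a genome was specified for the Sample.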

Properties

For every resource listed above (except Genomes), the API response also includes the Properties attached to that resource. Within a resource's API response, there is always a Properties section, which may or may not be populated. Properties are a way to tag a resource (File, Project, Run, Sample, AppResult, AppSession) with additional metadata that helps distinguish it. In addition, applications will generally store additional metadata as Properties on every resource that the app creates. Please refer to the Properties portion of the API Reference for more information.
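
Since the Properties section may or may not be populated, client code has to tolerate both cases. The sketch below flattens a resource's Properties into a name-to-value mapping. The response shape it relies on (a `Properties` object holding an `Items` list whose entries have `Name` and `Content` keys) is an assumption made for illustration, as is the helper name; the Properties portion of the API Reference is the authority on the actual layout.

```python
def properties_by_name(resource: dict) -> dict:
    """Map each property name to its content; empty dict if none are set.

    Assumption: properties arrive as
    {"Properties": {"Items": [{"Name": ..., "Content": ...}, ...]}}.
    """
    items = (resource.get("Properties") or {}).get("Items") or []
    return {item["Name"]: item.get("Content") for item in items}
```

An unpopulated or absent Properties section yields an empty mapping, so callers can look up tags without special-casing resources that carry no metadata.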