NASA GeneLab Open API

View level

This part of the URL is required.

The "assays" view generates overview tables: each column represents a property (such as a factor value), and each row represents one assay, with boolean values (True/False) indicating whether each property is associated with that assay.
- The fields id.accession and id.assay name are always included;
- There are two levels of column names:
  - The bottom one is the target nested field;
  - the top one is the preceding nested fields, joined by a period, e.g.:
    id → accession, study.characteristics → age.
- This tabular data can be represented in CSV, TSV, JSON, and interactive formats.
The "samples" view generates annotation (metadata) tables: each column represents a property (such as a factor value), and each row represents one sample, with that sample's values for each property.
- The output format is analogous to the "assays" view.
- This tabular data can be represented in CSV, TSV, JSON, and interactive formats.
The "data" view outputs data associated with sample(s).
- All tabular formats (CSV, TSV, JSON) are supported if the underlying data is a table; the output then contains three levels of column names, from top to bottom:
  - accession (e.g., GLDS-1);
  - assay name (e.g., E-GEOD-53196_GeneChip_assay);
  - sample name (e.g., Dmel_OR_wo_FLT_uninfd_Rep1).
  Thus, the header corresponds to the leftmost three columns (index) of the "samples" view – the "id" fields.
- Otherwise, if a single non-tabular file matches the query, it can be returned raw (see formats below).

Example: /samples/ (we will build upon this below).

Retrieval of entries from specific datasets and/or assays

URL component: &id=

This part of the URL, as well as all following parts, can be specified any number of times (including zero times).

The search can be constrained to only the samples from specific datasets (id.accession=ACCESSION) and/or assays (id.assay name=ASSAY).
As a shorthand, a mixed query can also be passed, where an accession and an assay name are separated by a forward slash, and multiple accessions or assay names are joined by a vertical pipe (id=ACCESSION_1/ASSAY_1A|ACCESSION_2).

Example: /samples/?id.accession=GLDS-38
Example: /samples/?id=GLDS-38/proteomics|GLDS-276
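Programmatic clients need to percent-encode spaces in field names such as id.assay name (the schema example on this page writes a space as %20). The Python sketch below is illustrative only: build_id_shorthand and build_url are hypothetical helper names, and the code assumes the server treats "/" and "|" as literal separators, as in the examples above.

```python
from urllib.parse import quote


def build_id_shorthand(selection):
    """Join {accession: assay name or None} into the mixed id= shorthand.

    None selects the whole dataset; otherwise "ACCESSION/ASSAY" is emitted.
    """
    return "|".join(
        accession if assay is None else f"{accession}/{assay}"
        for accession, assay in selection.items()
    )


def build_url(view, params):
    """Assemble "/VIEW/?key=value&..." with spaces percent-encoded.

    params is a list of (key, value) pairs; a value of None emits the bare
    key. "/" and "|" are left unencoded so the shorthand stays readable.
    """
    pieces = []
    for key, value in params:
        encoded = quote(key, safe="/|")
        if value is not None:
            encoded += "=" + quote(value, safe="/|")
        pieces.append(encoded)
    return f"/{view}/?" + "&".join(pieces)


shorthand = build_id_shorthand({"GLDS-38": "proteomics", "GLDS-276": None})
print(build_url("samples", [("id", shorthand)]))
# /samples/?id=GLDS-38/proteomics|GLDS-276
print(build_url("samples", [("id.assay name", "E-GEOD-53196_GeneChip_assay")]))
# /samples/?id.assay%20name=E-GEOD-53196_GeneChip_assay
```

Passing None as a value emits a bare key, which also covers category-only queries such as /samples/?study.factor value.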

Retrieval of entries with metadata categories

Information on all values under a given ISA-Tab category can be retrieved by passing the category name as a query parameter (e.g., study.factor value).

Querying by a category (e.g., assay.parameters) constrains the results to only the datasets, assays, and samples that have this category annotated.
Under view levels "assays" and "samples", the values of fields in the category will be reported in table columns.

Example: /samples/?study.factor value

Retrieval of entries with metadata fields

Each ISA category contains multiple fields, addressed by appending the field name to the category (e.g., study.factor value.spaceflight).

The logical AND (inclusion of several fields) is achieved by passing them at the same time (=x.a.b&=y.c.d).
The logical OR of several fields within a single category is achieved by passing the target fields joined by a vertical pipe (=x.a.b|d).
Under view levels "assays" and "samples", the values of any such field will be reported in table columns.
The leading "=" (=x.a.b) queries for existing values of the field(s), i.e., it constrains the results to only the datasets, assays, and samples that have this field annotated (with non-NaN values).
Without the leading "=" (x.a.b), the columns are still reported but may contain NaN values (i.e., this selects columns without filtering on the values within them).

Example: /samples/?=study.factor value.spaceflight
Example: /samples/?study.factor value.spaceflight
Example: /samples/?=study.factor value.radiation dose|absorbed radiation dose
Example: /samples/?study.factor value.radiation dose|absorbed radiation dose
Example: /samples/?study.factor value.radiation dose|absorbed radiation dose&=assay.factor value.radiation type
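The field syntax above can be generated mechanically. In this hypothetical Python sketch, field_selector returns one query part (a leading "=" when values are required, "|" for OR within a category) and join_parts ANDs parts with "&", percent-encoding spaces while keeping "=" and "|" literal; the assumption is that the server decodes %20 back to a space.

```python
from urllib.parse import quote


def field_selector(category, fields, require_values=False):
    """Build "[=]category.field1|field2" for one ISA category.

    Fields listed together are OR-ed; require_values adds the leading "="
    that keeps only entries with non-NaN values for the field(s).
    """
    prefix = "=" if require_values else ""
    return f"{prefix}{category}.{'|'.join(fields)}"


def join_parts(parts):
    """AND several parts with "&", encoding spaces but not "=" or "|"."""
    return "&".join(quote(part, safe="=|") for part in parts)


parts = [
    field_selector("study.factor value", ["radiation dose", "absorbed radiation dose"]),
    field_selector("assay.factor value", ["radiation type"], require_values=True),
]
print("/samples/?" + join_parts(parts))
# /samples/?study.factor%20value.radiation%20dose|absorbed%20radiation%20dose&=assay.factor%20value.radiation%20type
```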

Retrieval of entries with metadata field values

The search can be constrained to only the samples that are annotated with specific value(s) of an ISA field.

The logical AND of several conditions is achieved by passing them at the same time (x=a&y=b).
The logical OR for a single condition is achieved by passing the target values joined by a vertical pipe (x=a|b).
Under view levels "assays" and "samples", the values of target field(s) will be reported in table columns.

Example: /samples/?study.characteristics.genotype=WT
Example: /samples/?study.characteristics.genotype=WT|TK6
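Conditions of this kind can be assembled from a Python mapping. value_filters is a hypothetical helper name, and the sketch assumes the server decodes percent-encoded values (e.g., Homo%20sapiens) before matching them.

```python
from urllib.parse import quote


def value_filters(conditions):
    """Turn {field: [accepted values]} into "field=a|b&other=c".

    Values for one field are OR-ed with "|"; separate fields are AND-ed
    with "&". Each value is percent-encoded on its own, so a space or "&"
    inside a value cannot break the query string.
    """
    parts = []
    for field, values in conditions.items():
        encoded_values = "|".join(quote(value, safe="") for value in values)
        parts.append(f"{quote(field)}={encoded_values}")
    return "&".join(parts)


print("/samples/?" + value_filters({"study.characteristics.genotype": ["WT", "TK6"]}))
# /samples/?study.characteristics.genotype=WT|TK6
```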

Retrieval of files or data

URL components: &file.filename=, &file.datatype, &file.datatype=

The datasets, assays, and samples returned by a query may have associated files.

All files with recognized datatypes can be queried by passing file.datatype.
The query can be constrained to files of particular datatypes by specifying these datatypes in the query (file.datatype=pca).
Finally, the query can be constrained by file name (file.filename=), given either as a full file name of interest or as a regular expression pattern enclosed in forward slashes.

Examples:
/samples/?file.datatype
/samples/?file.datatype=differential expression
/samples/?file.datatype=visualization table|pca
/samples/?file.filename=/trimmed\.fastq\.gz$/
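A hypothetical Python sketch of these file filters follows; datatype_filter and filename_filter are illustrative names. Compiling the pattern locally with re only checks that it is valid Python regular-expression syntax, which is an assumption about, not a guarantee of, what the server accepts. Backslashes and "$" are percent-encoded (%5C, %24), assuming the server decodes them before matching.

```python
import re
from urllib.parse import quote


def datatype_filter(datatypes=None):
    """Return "file.datatype" alone, or "file.datatype=a|b" for given datatypes."""
    if not datatypes:
        return "file.datatype"
    return "file.datatype=" + "|".join(quote(d, safe="") for d in datatypes)


def filename_filter(name=None, regex=None):
    """Exact file name, or a regular expression wrapped in forward slashes."""
    if (name is None) == (regex is None):
        raise ValueError("pass exactly one of name or regex")
    if regex is not None:
        re.compile(regex)  # raises re.error locally on a malformed pattern
        return "file.filename=" + quote(f"/{regex}/", safe="/")
    return "file.filename=" + quote(name, safe="")


print("/samples/?" + datatype_filter(["visualization table", "pca"]))
# /samples/?file.datatype=visualization%20table|pca
print("/samples/?" + filename_filter(regex=r"trimmed\.fastq\.gz$"))
# /samples/?file.filename=/trimmed%5C.fastq%5C.gz%24/
```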

Display formats

URL components: &format=, &schema=

Data returned for the query can be formatted in multiple ways, depending on the output type.
Refer to the table below for a matrix of valid requested formats.

Note that annotation columns only appear in the "assays" and "samples" views, while files are only sourced for the "data" view.

View	Annotation columns in output	Files data is sourced from	Output type	format=csv*	format=tsv	format=json	format=raw	format=cls	format=gct	schema=0*	schema=1
/assays/	1	-	table	yes	yes	yes	no	yes	no	yes	yes
/assays/	>1	-	table	yes	yes	yes	no	no	no	yes	yes
/samples/	1	-	table	yes	yes	yes	no	yes	no	yes	yes
/samples/	>1	-	table	yes	yes	yes	no	no	no	yes	yes
/data/	-	1	table	yes	yes	yes	yes	no	maybe¹	yes	yes
/data/	-	>1	table	maybe²	maybe²	maybe²	no	no	maybe¹,²	maybe²	maybe²
/data/	-	1	other	no	no	no	yes	no	no	no	no
/data/	-	>1	other	no	no	no	no	no	no	no	no

* Default
¹ Only for transcription profiling data
² Only for data that can be merged across assays (currently only unnormalized counts RNA-seq data)
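For client-side validation before sending a request, the format matrix can be transcribed into a lookup table. This Python sketch is a hypothetical transcription (OPTIONS, MATRIX, and format_support are illustrative names) that has to be kept in sync with the table by hand; "maybe" entries still depend on footnotes ¹ and ², which the lookup does not evaluate.

```python
OPTIONS = ("csv", "tsv", "json", "raw", "cls", "gct", "schema=0", "schema=1")

# One entry per matrix row: (view, more than one column/file?, output type).
MATRIX = {
    ("assays", False, "table"): "yes yes yes no yes no yes yes",
    ("assays", True, "table"): "yes yes yes no no no yes yes",
    ("samples", False, "table"): "yes yes yes no yes no yes yes",
    ("samples", True, "table"): "yes yes yes no no no yes yes",
    ("data", False, "table"): "yes yes yes yes no maybe yes yes",
    ("data", True, "table"): "maybe maybe maybe no no maybe maybe maybe",
    ("data", False, "other"): "no no no yes no no no no",
    ("data", True, "other"): "no no no no no no no no",
}


def format_support(view, count, output_type, option):
    """Return "yes", "no", or "maybe" for one cell of the matrix.

    count is the number of annotation columns (assays/samples views) or of
    source files (data view).
    """
    row = MATRIX[(view, count > 1, output_type)].split()
    return dict(zip(OPTIONS, row))[option]


print(format_support("samples", 1, "table", "cls"))  # yes
print(format_support("data", 3, "table", "gct"))     # maybe
```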

Character-separated formats (&format=csv, &format=tsv):
- Tabular results can be represented in comma-separated (CSV) or tab-separated (TSV) formats.
- Header lines (column names) are preceded with a hash sign (#):
  - Annotation headers (i.e., for views "assays" and "samples") contain two levels of column names:
    - The bottom one is the target nested field
    - The top one is the preceding nested fields, joined by a period, e.g.:
      id → accession, study.characteristics → age
  - Headers of tabular data (for view "data") contain three levels of column names, from top to bottom:
    - accession (e.g., GLDS-1);
    - assay name (e.g., E-GEOD-53196_GeneChip_assay);
    - sample name (e.g., Dmel_OR_wo_FLT_uninfd_Rep1).
    Thus, the header corresponds to the leftmost three columns (index) of the "samples" view – the "id" fields.
- In all remaining lines,
  - string values are always quoted with double quotes ("value"),
  - numeric and boolean values are never quoted,
  - and missing values are always displayed as NaN (unquoted).
Example: /data/?study.characteristics.organism=Homo sapiens&file.datatype=unnormalized counts&format=tsv
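A minimal Python sketch for reading this layout with the standard csv module follows; read_genelab_csv and the sample text are hypothetical, not actual GeneLab output. Because csv.reader does not report whether a field was quoted, typing is heuristic: NaN becomes None, True/False become booleans, and anything that parses as a number becomes one, so a quoted numeric string would also be converted. Quoted fields spanning several lines are not handled.

```python
import csv
import io


def convert(cell):
    """Heuristically type one cell (csv.reader drops the quoting information)."""
    if cell == "NaN":
        return None
    if cell in ("True", "False"):
        return cell == "True"
    for cast in (int, float):
        try:
            return cast(cell)
        except ValueError:
            pass
    return cell


def read_genelab_csv(text, delimiter=","):
    """Return (columns, rows) from '#'-prefixed header lines plus data lines.

    Each column is a tuple of its names at every header level, top to bottom.
    Use a tab delimiter for &format=tsv.
    """
    header_levels, rows = [], []
    for line in io.StringIO(text):
        if line.startswith("#"):
            header_levels.append(next(csv.reader([line[1:]], delimiter=delimiter)))
        elif line.strip():
            fields = next(csv.reader([line], delimiter=delimiter))
            rows.append([convert(cell) for cell in fields])
    return list(zip(*header_levels)), rows


sample = (
    "#id,id,study.characteristics\n"
    "#accession,assay name,age\n"
    '"GLDS-1","assay_a",12\n'
    '"GLDS-2","assay_b",NaN\n'
)
columns, rows = read_genelab_csv(sample)
print(columns[2])  # ('study.characteristics', 'age')
print(rows[1])     # ['GLDS-2', 'assay_b', None]
```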

JSON format (&format=json):
- JSON-formatted output is derived from the tabular output in split orientation, placing:
  - column names under the "columns" key,
  - index names under the "index" key,
  - cell values under the "data" key,
  - and additionally providing the "meta" key with index names.
- The following are treated as indices:
  - The leftmost two columns of the "assays" view (i.e., the "id" columns);
  - The leftmost three columns of the "samples" view (i.e., the "id" columns);
  - The leftmost column of the "data" view (e.g., ENSEMBL gene names).
- String values are always enclosed in double quotes ("value"),
- numeric and boolean values are never quoted,
- and missing values are always displayed as NaN (unquoted).
Example: /samples/?study.factor value.spaceflight&id=GLDS-15&format=json
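A bare NaN token is not valid in strict JSON (RFC 8259), but Python's json module accepts it by default and returns float('nan'). The sketch below rebuilds rows from the split layout; split_json_to_dict and the sample payload are hypothetical, multi-level labels are assumed to arrive as JSON arrays, and the "meta" key is simply ignored.

```python
import json


def _label(value):
    # Multi-level labels arrive as JSON arrays; tuples make them hashable keys.
    return tuple(value) if isinstance(value, list) else value


def split_json_to_dict(text):
    """Map each index label to {column label: cell value}."""
    payload = json.loads(text)  # accepts bare NaN tokens by default
    columns = [_label(column) for column in payload["columns"]]
    return {
        _label(index): dict(zip(columns, row))
        for index, row in zip(payload["index"], payload["data"])
    }


text = (
    '{"columns": [["study.factor value", "spaceflight"]],'
    ' "index": [["GLDS-15", "assay_a", "sample_1"], ["GLDS-15", "assay_a", "sample_2"]],'
    ' "data": [["Space Flight"], [NaN]]}'
)
table = split_json_to_dict(text)
print(table[("GLDS-15", "assay_a", "sample_1")])
# {('study.factor value', 'spaceflight'): 'Space Flight'}
```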

Raw format (&format=raw):
- Any /data/ output (tabular or not) can be requested raw, i.e., exactly as it is stored in the GeneLab database.
- Such requests are possible if the query resolves to exactly one underlying file (e.g., a request for a specific file name, or for a single data type from a single assay).
Example: /data/?id=GLDS-4&file.filename=/.*normPCA.png$/&format=raw

CLS format (&format=cls):
- Annotation output (from views "assays" and "samples") can be represented in the CLS format if it contains exactly one non-id column (e.g., study.factor value.genotype).
Example: /samples/?study.factor value.genotype&format=cls
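CLS is the categorical class-label format used by GSEA and related Broad Institute tools: a first line with the number of samples, the number of classes, and 1; a second line starting with # that lists the class names; and a third line with one label per sample. Assuming GeneLab's output follows that layout (not confirmed here), it can be read as below; read_cls and the sample text are hypothetical.

```python
def read_cls(text):
    """Return (class_names, labels) from a three-line CLS file."""
    lines = [line for line in text.splitlines() if line.strip()]

    def fields(line):
        # Prefer tabs so class names containing spaces survive; else whitespace.
        return line.split("\t") if "\t" in line else line.split()

    n_samples, n_classes, _ = (int(x) for x in fields(lines[0]))
    class_names = fields(lines[1].lstrip("#").strip())
    labels = fields(lines[2])
    if len(labels) != n_samples or len(class_names) != n_classes:
        raise ValueError("CLS header counts do not match the label lines")
    return class_names, labels


print(read_cls("4 2 1\n# WT TK6\nWT WT TK6 TK6\n"))
# (['WT', 'TK6'], ['WT', 'WT', 'TK6', 'TK6'])
```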

GCT format (&format=gct):
- Data output (from the "data" view) can be represented in the GCT format if it contains transcriptomics data (i.e., microarrays or RNA-seq counts);
- in the output, the id information (accession, assay name, sample name) is joined with a forward slash (/).
Example: /data/?file.datatype=normalized counts&id=GLDS-38&format=gct
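GCT (version 1.2) is the Broad Institute expression format: a #1.2 line, a line with the row and column counts, a Name/Description header followed by sample names, then one row per feature. Assuming GeneLab emits that layout, the "/"-joined id information can be split back into (accession, assay name, sample name). read_gct and the sample text are hypothetical, and the split assumes "/" does not occur inside accessions or assay names.

```python
def read_gct(text):
    """Parse GCT 1.2 text into (sample_ids, rows).

    sample_ids are (accession, assay name, sample name) tuples; rows map the
    Name column to a list of floats (float() also accepts "NaN").
    """
    lines = text.splitlines()
    if lines[0].strip() != "#1.2":
        raise ValueError("expected a GCT 1.2 header")
    n_rows, n_cols = (int(x) for x in lines[1].split()[:2])
    header = lines[2].split("\t")
    # maxsplit=2 keeps any further "/" inside the sample name itself.
    sample_ids = [tuple(name.split("/", 2)) for name in header[2:]]
    rows = {}
    for line in lines[3:3 + n_rows]:
        fields = line.split("\t")
        rows[fields[0]] = [float(v) for v in fields[2:]]
    if len(sample_ids) != n_cols or len(rows) != n_rows:
        raise ValueError("GCT dimensions do not match its contents")
    return sample_ids, rows


gct = (
    "#1.2\n2\t2\n"
    "Name\tDescription\tGLDS-38/a1/s1\tGLDS-38/a1/s2\n"
    "GENE1\tna\t1.5\t2.0\n"
    "GENE2\tna\t0\t3\n"
)
ids, values = read_gct(gct)
print(ids[0], values["GENE1"])  # ('GLDS-38', 'a1', 's1') [1.5, 2.0]
```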

Schema (&schema=1):
- Rather than retrieving the entire table, a description of tabular data can be requested.
- The output contains the same header as the full table, followed by a single row in which each cell contains a string of the form type[(minimum)..(maximum)|NaN].
  - If type is "str", minima and maxima are omitted;
  - if the column does not have any missing values, NaN is omitted.
Example: /data/?id=GLDS-4&file.datatype=differential%20expression&format=tsv&schema=1
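The schema cell syntax is only outlined above, so this Python sketch assumes literal brackets and cells such as int64[0..42], float64[-1.5..3.2|NaN], or str[NaN]. These strings, the type names, and the helper parse_schema_cell are invented for illustration; the regular expression should be adjusted once real output has been inspected.

```python
import re

# Assumed shape: TYPE[MIN..MAX|NaN], with MIN..MAX and NaN each optional.
SCHEMA_CELL = re.compile(
    r"^(?P<type>[^\[\]]+)\["
    r"(?:(?P<min>[^|\]]*?)\.\.(?P<max>[^|\]]*?))?"
    r"\|?(?P<nan>NaN)?\]$"
)


def parse_schema_cell(cell):
    """Split one schema cell into type, range bounds, and a missing-value flag."""
    match = SCHEMA_CELL.match(cell)
    if match is None:
        raise ValueError(f"unrecognized schema cell: {cell!r}")
    return {
        "type": match["type"],
        "min": match["min"],  # kept as strings; cast numeric types as needed
        "max": match["max"],
        "has_missing": match["nan"] is not None,
    }


print(parse_schema_cell("float64[-1.5..3.2|NaN]"))
# {'type': 'float64', 'min': '-1.5', 'max': '3.2', 'has_missing': True}
```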
