HTAN Bulk DNA Sequencing Data Standard

Overview

This page describes the data levels, metadata attributes, and file structure for bulk DNA sequencing.

Description of Assay

Bulk DNA sequencing produces the DNA sequence of a biological sample. The sequence is summarized as a list of variants relative to a given reference genome. This data model is intended to apply to assays including bulk tumor Whole Genome Sequencing (WGS), bulk tumor Whole Exome Sequencing (WES), bulk cell-free DNA (cfDNA) WES, bulk tumor targeted DNA sequencing, and bulk circulating tumor DNA (ctDNA) targeted DNA sequencing.

Metadata Levels

The metadata attributes build on existing common data elements from the Genomic Data Commons (GDC). The HTAN data model currently supports Levels 1, 2, and 3 of DNA sequencing data:

Level Number  Definition               Example Data
1             Raw unaligned read data  FASTQ
2             Genome aligned reads     BAM
3             Sample level summary     VCF/MAF
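
As a rough illustration of how the level table might be applied when sorting files, the sketch below guesses a data level from a file name. The suffix list (`.fastq`, `.fq`, `.bam`, `.vcf`, `.maf`), the handling of compressed files, and the function name `infer_level` are assumptions made for this example; they are not defined by the HTAN standard.

```python
from typing import Optional

# Assumed suffix-to-level mapping, based on the example data in the level table.
# The exact suffixes are common conventions, not part of the standard itself.
LEVEL_BY_SUFFIX = {
    ".fastq": 1,  # raw unaligned reads
    ".fq": 1,
    ".bam": 2,    # genome aligned reads
    ".vcf": 3,    # sample level summary
    ".maf": 3,
}

# Compression suffixes that are stripped before matching (an assumption).
COMPRESSION_SUFFIXES = (".gz", ".bz2")


def infer_level(filename: str) -> Optional[int]:
    """Return the guessed data level (1, 2, or 3), or None if unrecognized."""
    name = filename.lower()
    # Drop one trailing compression suffix, e.g. "reads.fastq.gz" -> "reads.fastq".
    for comp in COMPRESSION_SUFFIXES:
        if name.endswith(comp):
            name = name[: -len(comp)]
            break
    for suffix, level in LEVEL_BY_SUFFIX.items():
        if name.endswith(suffix):
            return level
    return None
```

For example, `infer_level("sample_R1.fastq.gz")` yields level 1 and `infer_level("tumor.bam")` yields level 2. A file name is only a hint; the level recorded in the submitted metadata remains the authoritative value.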

Data Schema:

Attribute         Label          Description
Bulk DNA Level 1  BulkDNALevel1  Bulk Whole Exome Sequencing raw files
Bulk DNA Level 2  BulkDNALevel2  Bulk Whole Exome Sequencing aligned files and QC
Bulk DNA Level 3  BulkDNALevel3  Bulk Whole Exome Sequencing called variants
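
The schema table can be held as a small lookup so that tooling refers to component labels consistently. The following is a minimal sketch: the attribute names, labels, and descriptions are copied from the table, while the `Component` class and the `component_for_level` helper are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One row of the Data Schema table (hypothetical helper type)."""
    attribute: str
    label: str
    description: str


# Rows copied from the Data Schema table, keyed by level number.
COMPONENTS = {
    1: Component("Bulk DNA Level 1", "BulkDNALevel1",
                 "Bulk Whole Exome Sequencing raw files"),
    2: Component("Bulk DNA Level 2", "BulkDNALevel2",
                 "Bulk Whole Exome Sequencing aligned files and QC"),
    3: Component("Bulk DNA Level 3", "BulkDNALevel3",
                 "Bulk Whole Exome Sequencing called variants"),
}


def component_for_level(level: int) -> Component:
    """Look up the schema component for a level, rejecting unsupported levels."""
    try:
        return COMPONENTS[level]
    except KeyError:
        raise ValueError(f"unsupported bulk DNA level: {level!r}") from None
```

Calling `component_for_level(2).label` gives `BulkDNALevel2`, and any level outside 1 to 3 raises a `ValueError` rather than returning a default.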