VDJML: A File Format with Tools for Capturing the Results of Inferring Immune Receptor Rearrangements

The genes that produce antibodies and the immune receptors expressed on lymphocytes are not germline encoded; rather, they are somatically generated in each developing lymphocyte by a process called V(D)J recombination, which assembles specific, independent gene segments into mature composite genes. The full set of composite genes in an individual at a single point in time is referred to as the immune repertoire. V(D)J recombination is the distinguishing feature of adaptive immunity and enables effective immune responses against an essentially infinite array of antigens. Characterization of immune repertoires is critical in both basic research and clinical contexts. Recent technological advances in repertoire profiling via high-throughput sequencing have resulted in an explosion of research activity in the field, accompanied by a proliferation of software tools for analyzing repertoire sequencing data. Despite the widespread use of immune repertoire profiling and analysis software, there is currently no standardized format for the output files of V(D)J analysis. Researchers use software such as IgBLAST and IMGT/High V-QUEST to perform V(D)J analysis and infer the structure of germline rearrangements. However, each of these tools produces results in a different file format and may annotate the same result with different labels. These differences make it difficult for users to perform additional downstream analyses.

To help address this problem, we propose a standardized file format for representing V(D)J analysis results. The proposed format, VDJML, gives different V(D)J analysis applications a common representation for their output, so that results can be processed downstream in an application-agnostic manner. The VDJML file format specification is accompanied by a support library, written in C++ and Python, for reading and writing VDJML files.
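To illustrate why a common representation helps, here is a minimal sketch of the label-mismatch problem described above. It does not use the VDJML library or schema: the tool names (`tool_a`, `tool_b`), the input labels, and the common field names (`v_segment`, `d_segment`, `j_segment`) are all hypothetical and chosen only for illustration.

```python
from typing import Dict

# Hypothetical label maps: two imaginary tools report the same assignments
# under different column names. These are NOT the real labels used by
# IgBLAST, IMGT/High V-QUEST, or the VDJML schema.
LABEL_MAPS: Dict[str, Dict[str, str]] = {
    "tool_a": {"v_call": "v_segment", "d_call": "d_segment", "j_call": "j_segment"},
    "tool_b": {"V-GENE": "v_segment", "D-GENE": "d_segment", "J-GENE": "j_segment"},
}


def normalize(tool: str, record: Dict[str, str]) -> Dict[str, str]:
    """Rename a tool-specific record to the shared (hypothetical) field names.

    Labels that the mapping does not know about are dropped.
    """
    mapping = LABEL_MAPS[tool]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


# Illustrative records: the same rearrangement as each imaginary tool reports it.
record_a = {"v_call": "IGHV1-2*02", "d_call": "IGHD3-10*01", "j_call": "IGHJ4*02"}
record_b = {"V-GENE": "IGHV1-2*02", "D-GENE": "IGHD3-10*01", "J-GENE": "IGHJ4*02"}

# Once normalized, downstream code no longer needs to know which tool ran.
same = normalize("tool_a", record_a) == normalize("tool_b", record_b)
print(same)
```

A standardized file format moves this mapping step out of each user's downstream scripts: every tool's output is converted once, and later analyses read a single set of labels.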

The VDJML suite will allow users to streamline their V(D)J analysis and facilitate the sharing of scientific knowledge within the community. The VDJML suite and documentation are available from https://vdjserver.org/vdjml/. We welcome participation from the community in developing the file format standard, as well as code contributions.

Publications

BMC Bioinformatics. 2016-10-06; 17(Suppl 13): 333.
VDJML: a file format with tools for capturing the results of inferring immune receptor rearrangements
Toby IT, Levin MK, Salinas EA, Christley S, Bhattacharya S, Breden F, Buntzman A, Corrie B, Fonner J, Gupta NT, Hershberg U, Marthandan N, Rosenfeld A, Rounds W, Rubelt F, Scarborough W, Scott JK, Uduman M, Vander Heiden JA, Scheuermann RH, Monson N, Kleinstein SH, Cowell LG
PMID: 27766961

Funding

This work is funded by the National Institute of Allergy and Infectious Diseases (NIH/DHHS) under grant no. AI097403.

Principal Investigator

Collaborators

Inimary Toby, Scott Christley, Mikhail Levin, Nancy Monson, William Rounds, Edward Salinas, and Lindsay Cowell
University of Texas Southwestern Medical Center

John Fonner and Walter Scarborough
Texas Advanced Computing Center

Sanchita Bhattacharya
University of California San Francisco

Felix Breden, Nishanth Marthandan, and Jamie Scott
Simon Fraser University, Canada

Adam Buntzman
University of Arizona School of Medicine

Brian Corrie
University of Auckland, New Zealand

Namita Gupta, Steven Kleinstein, Mohamed Uduman, and Jason A. Vander Heiden
Yale University

Uri Hershberg and Aaron Rosenfeld
Drexel University

Florian Rubelt
Stanford University School of Medicine
