VDJML: a file format with tools for capturing the results of inferring immune receptor rearrangements
Toby IT, Levin MK, Salinas EA, Christley S, Bhattacharya S, Breden F, Buntzman A, Corrie B, Fonner J, Gupta NT, Hershberg U, Marthandan N, Rosenfeld A, Rounds W, Rubelt F, Scarborough W, Scott JK, Uduman M, Vander Heiden JA, Scheuermann RH, Monson N, Kleinstein SH, Cowell LG
The genes that encode antibodies and the immune receptors expressed on lymphocytes are not germline encoded; rather, they are somatically generated in each developing lymphocyte by a process called V(D)J recombination, which assembles specific, independent gene segments into mature composite genes. The full set of composite genes in an individual at a single point in time is referred to as the immune repertoire. V(D)J recombination is the distinguishing feature of adaptive immunity and enables effective immune responses against an essentially infinite array of antigens. Characterization of immune repertoires is critical in both basic research and clinical contexts. Recent technological advances in repertoire profiling via high-throughput sequencing have resulted in an explosion of research activity in the field, accompanied by a proliferation of software tools for analyzing repertoire sequencing data. Despite the widespread use of immune repertoire profiling and analysis software, there is currently no standardized format for the output files of V(D)J analysis. Researchers use software such as IgBLAST and IMGT/HighV-QUEST to perform V(D)J analysis and infer the structure of germline rearrangements. However, each of these tools produces results in a different file format and may annotate the same result using different labels. These differences make it difficult for users to combine or compare results across tools and to perform additional downstream analyses.