PAST PROJECT

NSforest: A Machine Learning Method to Identify Marker Genes from Single Cell/Single Nuclei RNA Sequencing Data

Cells are fundamental functional units of multicellular organisms, with different cell types playing distinct physiological roles in the body. The recent advent of single cell transcriptional profiling using RNA sequencing is producing "big data," enabling the identification of novel human cell types at an unprecedented rate.

NSforest is a random forest machine learning method for identifying sets of necessary and sufficient marker genes. These marker sets can be used for quantitative PCR and multiplex FISH, and to assemble consistent, reproducible cell type definitions for incorporation into the Cell Ontology (CL). Representing the defined cell type classes and their relationships in the CL using this strategy will make them findable, accessible, interoperable, and reusable (FAIR), allowing the CL to serve as a reference knowledgebase on the roles that distinct cellular phenotypes play in human health and disease.

Funding

This work is funded by the Chan Zuckerberg Initiative DAF under grant no. 2018-182730.
