Discovering Patterns in Sparse Datasets

Lung Cancer

Discovering hidden patterns in sparse mutation data from lung cancer patients

Extracting useful information from sparse, unlabeled datasets is a long-standing problem in machine learning. We developed a novel method to tackle this problem and applied it to sparse, unlabeled lung cancer mutation data in which more than 99% of the values are zero. We built standardized representations of all participating genes (genes that are mutated in at least 1% of the population), found hidden patterns in the dataset, and constructed a 3D network graph showing associations between genes. The method has been validated and provides a detailed view of the genomic landscape of lung cancer. It can also be applied to other forms of human cancer, again using datasets where more than 99% of the values are zero and no labels are available.
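To make the two figures above concrete, here is a minimal sketch of how sparsity and the 1% participation threshold could be computed. It is not our pattern discovery method, only the preprocessing arithmetic. It assumes the data is a binary patients-by-genes matrix (1 = mutated); the function names `sparsity` and `participating_genes` and the toy data are hypothetical.

```python
import numpy as np


def sparsity(matrix):
    """Return the fraction of entries in the matrix that are zero."""
    return 1.0 - np.count_nonzero(matrix) / matrix.size


def participating_genes(matrix, min_fraction=0.01):
    """Return column indices of genes mutated in at least min_fraction of patients.

    Assumes rows are patients and columns are genes (an illustrative layout).
    """
    mutated_fraction = np.count_nonzero(matrix, axis=0) / matrix.shape[0]
    return np.flatnonzero(mutated_fraction >= min_fraction)


# Toy data: 200 patients x 5 genes, almost entirely zero.
toy = np.zeros((200, 5), dtype=np.int8)
toy[:4, 0] = 1    # gene 0 mutated in 4 of 200 patients (2%): kept
toy[0, 1] = 1     # gene 1 mutated in 1 of 200 patients (0.5%): dropped
toy[:10, 3] = 1   # gene 3 mutated in 10 of 200 patients (5%): kept

kept = participating_genes(toy)
zero_fraction = sparsity(toy)
```

On real data the matrix would typically be stored in a sparse format rather than as a dense array, since nearly all entries are zero.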

Read our paper [here](<../../img/pdf/Sonar - Pattern Discovery in Oncogenomics 3-11-2019 Final.pdf>). Watch our video here.

Do you need to extract insights from sparse, unlabeled datasets? We can help you find them. To learn more about Pattern Computer and how to partner with us, e-mail us at