publications
Publications by category in reverse chronological order. Generated by jekyll-scholar.
2024
- [ISMB] Integrating patients in time series clinical transcriptomics data. Euxhen Hasanaj, Sachin Mathur, and Ziv Bar-Joseph. Apr 2024.
Analysis of time series transcriptomics data from clinical trials is challenging. Such studies usually profile very few time points from several individuals with varying response patterns and dynamics. Current methods for these datasets are mainly based on linear, global orderings using visit times, which do not account for the varying response rates and subgroups within a patient cohort. We developed a new method that utilizes multi-commodity flow algorithms for trajectory inference in large-scale clinical studies. Recovered trajectories satisfy individual-based timing restrictions while integrating data from multiple patients. Testing the method on multiple drug datasets demonstrated improved performance compared to prior approaches suggested for this task, while identifying novel disease subtypes that correspond to heterogeneous patient response patterns.
2023
- [Preprint] Optimal Transport for Mapping Senescent Cells in Spatial Transcriptomics. Nam D. Nguyen, Lorena Rosas, Timur Khaliullin, and 10 more authors. Aug 2023.
Spatial transcriptomics (ST) provides a unique opportunity to study cellular organization and cell-cell interactions at the molecular level. However, due to the low resolution of the sequencing data, additional information is required to utilize this technology, especially for cases where only a few cells are present for important cell types. To enable the use of ST to study senescence, we developed scDOT, which combines ST and single-cell RNA sequencing (scRNA-seq) to improve the ability to reconstruct single-cell-resolved spatial maps. scDOT integrates optimal transport and expression deconvolution to learn non-linear couplings between cells and spots and to infer cell placements. Application of scDOT to existing and new lung ST data improves on prior methods and allows the identification of the spatial organization of senescent cells, their neighboring cells, and novel genes involved in cell-cell interactions that may be driving senescence.
2022
- [NatureAging] NIH SenNet Consortium to Map Senescent Cells throughout the Human Lifespan to Understand Physiological Health. SenNet Consortium. Nature Aging, Dec 2022.
Cells respond to many stressors by senescing, acquiring stable growth arrest, morphologic and metabolic changes, and a proinflammatory senescence-associated secretory phenotype. The heterogeneity of senescent cells (SnCs) and senescence-associated secretory phenotype are vast, yet ill characterized. SnCs have diverse roles in health and disease and are therapeutically targetable, making characterization of SnCs and their detection a priority. The Cellular Senescence Network (SenNet), a National Institutes of Health Common Fund initiative, was established to address this need. The goal of SenNet is to map SnCs across the human lifespan to advance diagnostic and therapeutic approaches to improve human health. State-of-the-art methods will be applied to identify, define and map SnCs in 18 human tissues. A common coordinate framework will integrate data to create four-dimensional SnC atlases. Other key SenNet deliverables include innovative tools and technologies to detect SnCs, new SnC biomarkers and extensive public multi-omics datasets. This Perspective lays out the impetus, goals, approaches and products of SenNet.
- [PMLR NeurIPS] AutoML Decathlon: Diverse Tasks, Modern Methods, and Efficiency at Scale. Nicholas Roberts, Samuel Guo, Cong Xu, and 24 more authors. In Proceedings of the NeurIPS 2022 Competitions Track, Nov 2022.
The vision of Automated Machine Learning (AutoML) is to produce high-performing ML pipelines that require very little human involvement or domain expertise to use. Competitions and benchmarks have been critical tools for accelerating progress in AutoML. However, much of the prior work on AutoML competitions has focused on well-studied domains in machine learning such as vision and language. These domains have benefited from several years of ML pipeline design by domain experts, which brings the usage of AutoML into question in the first place. Recently, AutoML for diverse tasks has emerged as an important research area that aims to bring AutoML to the domains where it can have the most impact: the long tail of ML tasks beyond vision and language. We present a retrospective report of the AutoML Decathlon, an AutoML for diverse tasks competition hosted at NeurIPS 2022. The AutoML Decathlon presented participants with a set of 10 machine learning tasks that are diverse along several axes: domain, input dimension, output dimension, output type, objective function, and scale. Participants were tasked with developing AutoML methods that performed well on a separate set of 10 hidden diverse test tasks within a certain time budget, so as to discourage overfitting to the initial set of tasks and to encourage efficiency. In this report, we outline the details of the competition, discuss the top-5 submissions, analyze the results, and compare top submissions to additional state-of-the-art baselines designed specifically for diverse tasks. We conclude that the combination of existing efficient AutoML techniques with modern advancements in ML such as large-scale transfer learning, modern architectures, and differentiable Neural Architecture Search (NAS) is a promising direction for AutoML for diverse tasks.
- [Cell R. M.] Multiset multicover methods for discriminative marker selection. Euxhen Hasanaj, Amir Alavi, Anupam Gupta, and 2 more authors. Cell Reports Methods, Oct 2022.
Markers are increasingly being used for several high throughput data analysis and experimental design tasks. Examples include the use of markers for assigning cell types in scRNA-seq studies, for deconvolving bulk gene expression data, and for selecting marker proteins in single cell spatial proteomics studies. Most marker selection methods focus on differential expression (DE) analysis. While such methods work well for data with a few non-overlapping marker sets, they are not appropriate for large atlas-size datasets where several cell types and tissues are considered. To address this, we define the phenotype cover (PC) problem for marker selection and present algorithms that can improve the discriminative power of marker sets. Analysis of these sets on several marker selection tasks suggests that these methods can lead to solutions that accurately distinguish different phenotypes in the data.
- [NatureComm] Interactive single-cell data analysis using Cellar. Euxhen Hasanaj, Jingtao Wang, Arjun Sarathi, and 2 more authors. Nature Communications 13:1, Apr 2022.
Cell type assignment is a major challenge for all types of high-throughput single-cell data. In many cases such assignment requires the repeated manual use of external and complementary data sources. To improve the ability to uniformly assign cell types across large consortia, platforms and modalities, we developed Cellar, a software tool that provides interactive support for all the different steps involved in the assignment and dataset comparison process. We discuss the different methods implemented by Cellar, how these can be used with different data types, how to combine complementary data types and how to analyze and visualize spatial data. We demonstrate the advantages of Cellar by using it to annotate several HuBMAP datasets from multi-omics single-cell sequencing and spatial proteomics studies. Cellar is open-source and includes several annotated HuBMAP datasets. Availability: https://cellar.cmu.hubmapconsortium.org/app/cellar