Daily Digest | April 13, 2024

Unsupervised ensemble-based phenotyping enhances discoverability of genes related to left-ventricular morphology | Nature Machine Intelligence

Recent genome-wide association studies have successfully identified associations between genetic variants and simple cardiac morphological parameters derived from cardiac magnetic resonance images. However, the emergence of large databases that link genetic data to cardiac magnetic resonance makes it possible to investigate more nuanced patterns of cardiac shape variability than those studied so far. Here, researchers propose a framework for gene discovery termed unsupervised phenotype ensembles. The unsupervised phenotype ensemble builds a redundant yet highly expressive representation by pooling a set of phenotypes learnt in an unsupervised manner, using deep learning models trained with different hyperparameters. These phenotypes are then analysed via genome-wide association studies, retaining only highly confident and stable associations across the ensemble. They applied this approach to the UK Biobank database to extract geometric features of the left ventricle from image-derived three-dimensional meshes.
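The "stable across the ensemble" step can be pictured with a small sketch. It is not the authors' pipeline: the function name `stable_loci`, the `min_fraction` threshold, and the locus labels are all hypothetical, and the sketch assumes each ensemble member has already produced its own set of genome-wide significant loci.

```python
from collections import Counter


def stable_loci(runs, min_fraction=0.5):
    """Return loci that are significant in at least min_fraction of runs.

    runs: list of sets, one per ensemble member (hypothetical input format).
    """
    if not runs:
        return set()
    # Count in how many runs each locus appears (each run counts once per locus).
    counts = Counter(locus for hits in runs for locus in set(hits))
    threshold = min_fraction * len(runs)
    return {locus for locus, n in counts.items() if n >= threshold}


# Hypothetical significant loci from three models trained with different hyperparameters.
runs = [
    {"locus_A", "locus_B"},
    {"locus_A", "locus_C"},
    {"locus_A", "locus_B"},
]
print(sorted(stable_loci(runs, min_fraction=0.6)))
```

Raising `min_fraction` trades discovery for stability: at 1.0, only loci found by every ensemble member survive.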

Research paper

 

A comparison of methods for detecting DNA methylation from long-read sequencing of human genomes | Genome Biology

Long-read sequencing can enable the detection of base modifications, such as CpG methylation, in single molecules of DNA. In this study, researchers systematically compare CpG methylation detection from long-read sequencing against oxidative bisulfite sequencing (oxBS). They demonstrate that CpG methylation detection from 7179 nanopore-sequenced DNA samples is highly accurate and consistent with 132 oxBS samples isolated from the same blood draws. They introduce quality filters for CpGs that further enhance the accuracy of CpG methylation detection from nanopore-sequenced DNA, while removing at most 30% of CpGs. This study provides the first systematic comparison of CpG methylation detection tools for long-read sequencing methods.

Research paper

 

AlphaPept: a modern and open framework for MS-based proteomics | Nature Communications

In common with other omics technologies, mass spectrometry (MS)-based proteomics produces ever-increasing amounts of raw data, making efficient analysis a principal challenge. Many computational tools can process MS data to derive peptide and protein identification and quantification. However, in recent years there has been dramatic progress in computer science, including collaboration tools that have transformed research and industry. To leverage these advances, researchers developed AlphaPept, a Python-based open-source framework for efficient processing of large high-resolution MS data sets. Using Numba for just-in-time compilation on CPU and GPU, it achieves hundred-fold speed improvements. AlphaPept builds on the Python scientific stack of highly optimized packages, reducing the code base to domain-specific tasks while accessing the latest advances.
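For readers unfamiliar with Numba, the sketch below shows the general pattern of decorating a tight numeric loop with `njit` so it is compiled on first call. It is a generic illustration, not AlphaPept code: the function `sum_of_squares` is hypothetical, the fallback decorator exists only so the sketch runs where Numba is not installed, and no speed-up figure is implied.

```python
try:
    from numba import njit  # Numba's nopython-mode JIT decorator
except ImportError:
    def njit(func):
        # Fallback (assumption for portability): run as plain Python without Numba.
        return func


@njit
def sum_of_squares(n):
    """Sum i*i for i in 0..n-1 using an explicit loop, the kind of code JIT compilation targets."""
    total = 0.0
    for i in range(n):
        total += i * i
    return total


print(sum_of_squares(4))
```

The first call includes compilation time when Numba is present, so any timing comparison should use a second call.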

Research paper

 
