scGHOST: identifying single-cell 3D genome subcompartments | Nature Methods
Single-cell Hi-C (scHi-C) technologies allow for probing of genome-wide cell-to-cell variability in three-dimensional (3D) genome organization from individual cells. Here researchers present scGHOST, a single-cell subcompartment annotation method using graph embedding with constrained random walk sampling. Applications of scGHOST to scHi-C data and contact maps derived from single-cell 3D genome imaging demonstrate reliable identification of single-cell subcompartments, offering insights into cell-to-cell variability of nuclear subcompartments.
BISCUIT: an efficient, standards-compliant tool suite for simultaneous genetic and epigenetic inference in bulk and single-cell studies | Nucleic Acids Research
Data from both bulk and single-cell whole-genome DNA methylation experiments are under-utilized in many ways. This is attributable to inefficient mapping of methylation sequencing reads, routinely discarded genetic information, and neglected read-level epigenetic and genetic linkage information. Researchers introduce the BISulfite-seq Command line User Interface Toolkit (BISCUIT) and its companion R/Bioconductor package, biscuiteer, for simultaneous extraction of genetic and epigenetic information from bulk and single-cell DNA methylation sequencing.
PLMSearch: Protein language model powers accurate and fast sequence search for remote homology | Nature Communications
Homologous protein search is one of the most commonly used methods for protein annotation and analysis. Compared to structure search, detecting distant evolutionary relationships from sequences alone remains challenging. Here researchers propose PLMSearch (Protein Language Model), a homologous protein search method with only sequences as input. PLMSearch uses deep representations from a pre-trained protein language model and trains the similarity prediction model with a large number of real structure similarity. This enables PLMSearch to capture the remote homology information concealed behind the sequences.