### The TR Prize is a monthly award for the best new research.

#### 1. Generalized Methods and Solvers for Noise Removal from Piecewise Constant Signals

23 October 2019 | arXiv

Removing noise from piecewise constant (PWC) signals is a challenging signal processing problem arising in many practical contexts. For example, in exploration geosciences, noisy drill hole records need to be separated into stratigraphic zones, and in biophysics, jumps between molecular dwell states need to be extracted from noisy fluorescence microscopy signals. Many PWC denoising methods exist, including total variation regularization, mean shift clustering, stepwise jump placement, running medians, convex clustering shrinkage and bilateral filtering; conventional linear signal processing methods, however, are fundamentally unsuited to the task. This paper shows that most of these methods are associated with a special case of a generalized functional, which is minimized to achieve PWC denoising. The minimizer can be obtained by diverse solver algorithms, including stepwise jump placement, convex programming, finite differences, iterated running medians, least angle regression, regularization path following, and coordinate descent. We introduce novel PWC denoising methods that, for example, combine global mean shift clustering with local total variation smoothing. Head-to-head comparisons between these methods on synthetic data reveal that our new methods have a useful role to play. Finally, we touch on overlaps between the methods of this paper and others such as wavelet shrinkage, hidden Markov models, and piecewise smooth filtering.
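To make one of the listed solvers concrete, here is a minimal sketch of iterated running medians: a centred median filter is applied repeatedly until the output stops changing. The window half-width of 2, the endpoint-padding edge rule, and the function names are illustrative assumptions, not the paper's implementation.

```python
from statistics import median


def running_median(x, half_width):
    """One pass of a centred running median with window length 2*half_width + 1.

    The ends are padded by repeating the first and last samples (an assumed
    edge rule), so every window has odd length and returns an actual sample.
    """
    padded = [x[0]] * half_width + list(x) + [x[-1]] * half_width
    window = 2 * half_width + 1
    return [median(padded[i:i + window]) for i in range(len(x))]


def iterated_running_median(x, half_width=2, max_iter=100):
    """Repeat the running median until the output no longer changes."""
    y = list(x)
    for _ in range(max_iter):
        z = running_median(y, half_width)
        if z == y:  # fixed point reached: further passes change nothing
            break
        y = z
    return y


# Two flat levels, each corrupted by one isolated spike.
noisy = [0, 0, 0, 9, 0, 0, 0, 5, 5, 5, -3, 5, 5, 5]
print(iterated_running_median(noisy))  # [0, 0, 0, 0, 0, 0, 0, 5, 5, 5, 5, 5, 5, 5]
```

The spikes are removed while the jump between levels stays sharp, which is the behaviour that makes median-type filters suitable for PWC signals where linear smoothing would blur the step.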

#### 2. The polyvertex score (PVS): a whole-brain phenotypic prediction framework for neuroimaging studies

23 October 2019 | bioRxiv

The traditional brain mapping approach has greatly advanced our understanding of the localized effect of the brain on behavior. However, the statistically significant brain regions identified by the standard mass univariate models explain only a small fraction of the variance in a behavior despite increased sample sizes and statistical power, highlighting the nonsparseness of the explanatory signal in the brain. We introduce the Bayesian polyvertex score (PVS-B), a whole-brain prediction framework that aggregates the effect sizes across all vertices to predict individual variability in behavior. The PVS-B estimates the posterior mean effect size at each vertex from the summary statistics of the brain mapping approach and the correlation structure of the imaging phenotype. Empirical data showed that the PVS-B doubled the variance explained in total composite cognition by an nBack fMRI contrast, compared with prediction models based on the mass univariate parameter estimates and models based on p-value thresholds. A fivefold improvement in the variance explained by the stop signal fMRI contrast was observed when the PVS-B was used to predict individual variability in stop signal reaction time. We believe that the PVS-B can shed light on the multivariate investigation of brain-behavior associations and will empower small-scale neuroimaging studies with more reliable and accurate effect size estimates.
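The aggregation step described above amounts to a weighted sum over vertices. The sketch below shows only that step and assumes the per-vertex effect sizes (in the PVS-B, posterior means) have already been estimated; the function name and the example numbers are hypothetical.

```python
def polyvertex_score(vertex_values, effect_sizes):
    """Weighted sum of one subject's vertex-wise imaging values.

    `effect_sizes` stands in for the per-vertex (posterior mean) effect sizes;
    how they are estimated is not shown here.
    """
    if len(vertex_values) != len(effect_sizes):
        raise ValueError("need one effect size per vertex")
    return sum(x * b for x, b in zip(vertex_values, effect_sizes))


# Hypothetical subject with three vertices and hypothetical effect sizes.
print(polyvertex_score([1.0, 2.0, 3.0], [0.5, 0.0, -1.0]))  # -2.5
```

Because every vertex contributes, many small effects can add up to a useful predictor even when few vertices would pass a significance threshold individually.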

#### 3. Steps and bumps: precision extraction of discrete states of molecular machines using physically-based, high-throughput time series analysis

23 October 2019 | arXiv

We report new statistical time-series analysis tools providing significant improvements in the rapid, precision extraction of discrete state dynamics from large databases of experimental observations of molecular machines. By building physical knowledge and statistical innovations into analysis tools, we demonstrate new techniques for recovering discrete state transitions buried in highly correlated molecular noise. We demonstrate the effectiveness of our approach on simulated and real examples of step-like rotation of the bacterial flagellar motor and the F1-ATPase enzyme. We show that our method can clearly identify molecular steps, symmetries and cascaded processes that are too weak for existing algorithms to detect, and can do so much faster than existing algorithms. Our techniques represent a major advance in the drive towards automated, precision, high-throughput studies of molecular machine dynamics. Modular, open-source software that implements these techniques is provided at http://www.eng.ox.ac.uk/samp/members/max/software/

#### 4. Robust, automated sleep scoring by a compact neural network with distributional shift correction

22 October 2019 | bioRxiv

Studying the biology of sleep requires the accurate assessment of the state of experimental subjects, and manual analysis of relevant data is a major bottleneck. Recently, deep learning applied to electroencephalogram and electromyogram data has shown great promise as a sleep scoring method, approaching the limits of inter-rater reliability. As with any machine learning algorithm, the inputs to a sleep scoring classifier are typically standardized in order to remove distributional shift caused by variability in the signal collection process. However, in scientific data, experimental manipulations introduce variability that should not be removed. For example, in sleep scoring, the fraction of time spent in each arousal state can vary between control and experimental subjects. We introduce a standardization method, mixture z-scoring, that preserves this crucial form of distributional shift. Using both a simulated experiment and mouse in vivo data, we demonstrate that a common standardization method used by state-of-the-art sleep scoring algorithms introduces systematic bias, but that mixture z-scoring does not. We present a free, open-source user interface that uses a compact neural network and mixture z-scoring to allow for rapid sleep scoring with accuracy that compares well to contemporary methods. This work provides a set of computational tools for the robust automation of sleep scoring.
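The idea behind mixture z-scoring can be sketched as follows, under our reading and with stated assumptions: each arousal state's mean and variance in a recording are estimated from a small set of labeled epochs, and the state weights are fixed across subjects (for example, typical proportions of time spent in each state). Standardizing with the resulting mixture mean and SD, rather than the recording's own mean and SD, keeps a genuine change in state proportions from being normalized away. Function names are hypothetical; this is not the authors' code.

```python
from math import sqrt


def mixture_stats(state_means, state_vars, weights):
    """Mean and SD of a mixture whose components are the arousal states.

    Uses E[x] = sum_k w_k * mu_k and E[x^2] = sum_k w_k * (var_k + mu_k^2).
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    mu = sum(w * m for w, m in zip(weights, state_means))
    second_moment = sum(w * (v + m * m) for w, m, v in zip(weights, state_means, state_vars))
    return mu, sqrt(second_moment - mu * mu)


def mixture_z_score(x, state_means, state_vars, weights):
    """Standardize features with fixed state weights, not the recording's own mean/SD."""
    mu, sd = mixture_stats(state_means, state_vars, weights)
    return [(xi - mu) / sd for xi in x]


# Two hypothetical states with means 0 and 2, unit variance, equal fixed weights.
print(mixture_stats([0.0, 2.0], [1.0, 1.0], [0.5, 0.5]))  # (1.0, 1.4142135623730951)
print(mixture_z_score([1.0, 3.0], [0.0, 2.0], [1.0, 1.0], [0.5, 0.5]))
```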

#### 5. Neural Codes and the Factor Complex

22 October 2019 | arXiv

We introduce the factor complex of a neural code, and show how intervals and maximal codewords are captured by the combinatorics of factor complexes. We use these results to obtain algebraic and combinatorial characterizations of max-intersection-complete codes, as well as a new combinatorial characterization of intersection-complete codes.

#### 6. Ascorbate deficiency does not limit non-photochemical quenching in Chlamydomonas reinhardtii

22 October 2019 | bioRxiv

Ascorbate (vitamin C) plays essential roles in development, signaling, hormone biosynthesis, regulation of gene expression, stress resistance and photoprotection. In vascular plants, violaxanthin de-epoxidase (VDE) requires ascorbate (Asc) as a reductant; Asc is therefore required for the energy-dependent component of non-photochemical quenching (NPQ). To assess the role of Asc in NPQ in green algae, which are known to contain low amounts of Asc, we searched for an insertional Chlamydomonas reinhardtii mutant affected in the VTC2 gene, which is essential for Asc biosynthesis. The Crvtc2-1 knockout mutant was viable and, depending on the growth conditions, contained 10 to 20% Asc relative to its wild type. When Chlamydomonas was grown photomixotrophically at moderate light, the zeaxanthin-dependent component of NPQ emerged upon strong red illumination both in the Crvtc2-1 mutant and in its wild type. De-epoxidation was unaffected by Asc deficiency, demonstrating that the Chlorophycean VDE found in Chlamydomonas does not require Asc as a reductant. The rapidly induced, energy-dependent NPQ component, characteristic of photoautotrophic Chlamydomonas cultures grown at high light, was not limited by Asc deficiency either. On the other hand, a reactive oxygen species-induced photoinhibitory NPQ component was greatly enhanced upon Asc deficiency, under both photomixotrophic and photoautotrophic conditions. These results demonstrate that Asc plays different roles in NPQ formation in Chlamydomonas than in vascular plants.

#### 7. Recombination and mutational robustness in neutral fitness landscapes

22 October 2019 | arXiv

Mutational robustness quantifies the effect of random mutations on fitness. When mutational robustness is high, most mutations do not change fitness or have only a minor effect on it. From the point of view of fitness landscapes, robust genotypes form neutral networks of almost equal fitness. Using deterministic population models, it has been shown that selection favors genotypes inside such networks, which results in increased mutational robustness. Here we demonstrate that this effect is massively enhanced by recombination. Our results are based on a detailed analysis of mesa-shaped fitness landscapes, for which we derive precise expressions for the dependence of robustness on the landscape parameters in recombining and non-recombining populations. In addition, we carry out numerical simulations on different types of random holey landscapes as well as on an empirical fitness landscape. We show that the mutational robustness of a genotype generally correlates with its recombination weight, a new measure that quantifies the likelihood of the genotype arising from recombination. We argue that the favorable effect of recombination on mutational robustness is a highly universal feature that may have played an important role in the emergence and maintenance of mechanisms of genetic exchange.
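As an illustration of the robustness measure, the sketch below computes the fraction of single-point mutants of a genotype that remain viable. It assumes binary genotypes and a simple mesa in which a genotype is viable if it lies within Hamming distance k of the all-zeros reference; this is an illustrative landscape, and the paper's exact parametrization may differ. Function names are hypothetical.

```python
def mutational_robustness(genotype, is_viable):
    """Fraction of single-point mutants of `genotype` that remain viable."""
    viable = 0
    for i in range(len(genotype)):
        # flip site i (binary alleles 0/1)
        mutant = genotype[:i] + (1 - genotype[i],) + genotype[i + 1:]
        if is_viable(mutant):
            viable += 1
    return viable / len(genotype)


def mesa(k):
    """Assumed mesa: viable iff within Hamming distance k of the all-zeros reference."""
    return lambda g: sum(g) <= k


print(mutational_robustness((1, 0, 0, 0, 0, 0), mesa(2)))  # 1.0 (interior of the mesa)
print(mutational_robustness((1, 1, 0, 0, 0, 0), mesa(2)))  # 0.3333333333333333 (on the edge)
```

Genotypes in the interior of the plateau are fully robust, while those on its rim lose most neighbours, which is why the distribution of a population across the mesa determines its average robustness.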

#### 8. Transmigration of Trypanosoma cruzi trypomastigotes through 3D cultures resembling a physiological environment

21 October 2019 | bioRxiv

Chagas' disease, caused by the kinetoplastid parasite Trypanosoma cruzi, presents a variety of chronic clinical manifestations whose determinants are still unknown but are probably influenced by the host-parasite interplay established during the first stages of infection, when circulating bloodstream trypomastigotes disseminate to different organs and tissues. After leaving the blood, trypomastigotes must migrate through tissues to invade cells and establish a chronic infection. How this process occurs remains unexplored. Three-dimensional (3D) cultures are physiologically relevant because they mimic the microarchitecture of tissues and provide an environment similar to that encountered in natural infections. In this work, we combined 3D culture technology with the study of host-pathogen interaction by examining the transmigration of trypomastigotes into 3D spheroids. T. cruzi strains with similar infection dynamics in 2D monolayer cultures but different in vivo behavior (CL Brener, virulent; SylvioX10, non-virulent) showed different infection rates in spheroids (CL Brener ~40%, SylvioX10 <10%). Confocal microscopy images showed that trypomastigotes from CL Brener and other highly virulent strains had a great ability to transmigrate into 3D spheroids: as early as 4 hours post-infection, parasites were found 50 µm deep inside the spheroids. CL Brener trypomastigotes were evenly distributed and systematically observed in the spaces between cells, suggesting a paracellular route of transmigration into the spheroids. In contrast, poorly virulent strains showed weak migratory capacity and remained in the outer layers of the spheroids (<10 µm) with a patch-like distribution pattern. Invasiveness, understood as the ability to transmigrate deep into spheroids, was not transferable between strains, either through soluble or secreted factors or through co-cultivation of trypomastigotes from invasive and non-invasive strains.
We also studied the transmigration of recent T. cruzi isolates from congenitally infected children, which showed a highly migratory phenotype, whereas an isolate from an infected mother (who never transmitted the infection to any of her 3 children) was significantly less migratory. Altogether, our results demonstrate that in a 3D microenvironment each strain presents a characteristic migration pattern and distribution of parasites in the spheroids that can be associated with its in vivo behavior. These findings could not have been obtained with traditional 2D monolayer cultures.

#### 9. Deep learning tools for the measurement of animal behavior in neuroscience

21 October 2019 | arXiv

Recent advances in computer vision have made accurate, fast and robust measurement of animal behavior a reality. In recent years, powerful tools specifically designed to aid the measurement of behavior have come to fruition. Here we discuss how capturing the postures of animals (pose estimation) has been rapidly advancing with new deep learning methods. While challenges remain, we envision that the fast-paced development of new deep learning tools will rapidly change the landscape of realizable real-world neuroscience.

#### 10. A lysophospholipase plays role in generation of neutral-lipids required for hemozoin formation in malaria parasite

21 October 2019 | bioRxiv

Phospholipid metabolism is crucial for membrane dynamics in malaria parasites throughout the entire cycle in the host cell. Plasmodium falciparum harbours several members of the phospholipase family, which play key roles in phospholipid metabolism. Here we have functionally characterized a parasite lysophospholipase (PfLPL1) with a view to understanding its role in lipid homeostasis. We used regulated fluorescence affinity tagging, which allowed endogenous localization and transient knock-down of the protein. PfLPL1 localizes to dynamic vesicular structures that traffic from the parasite periphery through the cytosol and become associated as a multi-vesicular, neutral-lipid-rich body next to the food vacuole during blood stages. Down-regulation of PfLPL1 disrupted parasite lipid homeostasis, leading to a significant reduction of neutral lipids in lipid bodies. This hindered the conversion of heme to hemozoin, leading to food-vacuole abnormalities, which in turn disrupted the parasite development cycle and significantly inhibited parasite growth. Detailed lipidomic analyses of inducible knock-down parasites confirmed the role of PfLPL1 in generating neutral lipids through the recycling of phospholipids. Our study thus suggests a specific role for PfLPL1 in generating neutral lipids in the parasite, which are essential for parasite survival.

#### 11. Emergent three-dimensional sperm motility: Coupling calcium dynamics and preferred curvature in a Kirchhoff rod model

21 October 2019 | arXiv

Changes in calcium concentration along the sperm flagellum regulate sperm motility and hyperactivation, characterized by an increased flagellar bend amplitude and beat asymmetry, enabling the sperm to reach and penetrate the ovum (egg). The signaling pathways by which calcium increases within the flagellum are well established. However, the exact mechanisms of how calcium regulates flagellar bending are still under investigation. We extend our previous model of planar flagellar bending by developing a fluid-structure interaction model that couples the three-dimensional motion of the flagellum in a viscous, Newtonian fluid with the evolving calcium concentration. The flagellum is modeled as a Kirchhoff rod: an elastic rod with preferred curvature and twist. The calcium dynamics are represented as a one-dimensional reaction-diffusion model on a moving domain, the centerline of the flagellum. The two models are coupled assuming that the preferred curvature and twist of the sperm flagellum depend on the local calcium concentration. To investigate the effect of calcium on sperm motility, we compare model results of flagellar bend amplitude and swimming speed for three cases: planar, helical (spiral with equal amplitude in both directions), and quasi-planar (spiral with small amplitude in one direction). We observe that for the same parameters, the planar swimmer is faster and a turning motion is more clearly observed when calcium coupling is accounted for in the model. In the case of flagellar bending coupled to the calcium concentration, we observe emergent trajectories that can be characterized as a hypotrochoid for both quasi-planar and helical bending.
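For readers unfamiliar with the term, a hypotrochoid is the curve traced by a point at distance d from the centre of a circle of radius r rolling inside a fixed circle of radius R. The sketch below evaluates the standard parametric form; the parameter values are arbitrary and are not fitted to the model trajectories reported in the paper.

```python
from math import cos, pi, sin


def hypotrochoid(R, r, d, ts):
    """Points (x, y) of a hypotrochoid for each parameter value t in `ts`.

    x(t) = (R - r) cos t + d cos(((R - r) / r) t)
    y(t) = (R - r) sin t - d sin(((R - r) / r) t)
    """
    k = (R - r) / r
    return [((R - r) * cos(t) + d * cos(k * t),
             (R - r) * sin(t) - d * sin(k * t)) for t in ts]


# Arbitrary illustrative parameters; sample the curve over three revolutions.
points = hypotrochoid(5.0, 3.0, 5.0, [2 * pi * i / 200 for i in range(601)])
print(points[0])  # (7.0, 0.0)
```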

#### 12. Evidence for a role for BK channels in the regulation of ADAM17 activity.

20 October 2019 | bioRxiv

Large-conductance voltage- and calcium-activated potassium channels, KCa1.1, have a large single-channel conductance (~250 pS) and are highly selective for potassium ions. As a result, they have been termed big potassium channels (BK channels). Because of the channels' ability to integrate multiple physical and chemical signals, they have received much attention in excitable cells. In comparison, they have received relatively little attention in non-excitable cells such as those of the immune system. Here we report evidence that the BK channel regulates ADAM17 activity. Upon macrophage activation, BK channels translocate to the cell membrane. Genetic or pharmacological inhibition of cell membrane BK channels resulted in elevated TNF-α release and increased activity of the metalloproteinase a disintegrin and metalloproteinase domain 17 (ADAM17). Inhibitors of BK channels also increased the release of IL-6R, a second ADAM17 substrate. In comparison, a BK channel opener decreased TNF-α release. Taken together, our results demonstrate a novel mechanism by which an ion channel regulates ADAM17 activity. Given the broad range of ADAM17 substrates, this finding has implications in many fields of cell biology, including immunology, neurology and cancer biology.

#### 13. Suppressor mutations in Mecp2-null mice reveal that the DNA damage response is key to Rett syndrome pathology

20 October 2019 | bioRxiv

Mutations in X-linked methyl-CpG-binding protein 2 (MECP2) cause Rett syndrome (RTT). We carried out a genetic screen for secondary mutations that improved phenotypes in Mecp2-/Y mice after mutagenesis with N-ethyl-N-nitrosourea (ENU), aiming to identify potential therapeutic entry points. Here we report the isolation of 106 founder animals that show suppression of Mecp2-null traits from screening 3,177 Mecp2-/Y genomes. Using exome sequencing, genetic crosses and association analysis, we identify 33 candidate genes in 30 of the suppressor lines. A network analysis shows that 61% of the candidate genes cluster into the functional categories of transcriptional repression, chromatin modification or DNA repair, delineating a pathway relationship with MECP2. Many mutations lie in genes that are predicted to modulate synaptic signaling or lipid homeostasis. Surprisingly, mutations in genes that function in the DNA damage response (DDR) also improve symptoms in Mecp2-/Y mice. The combinatorial effects of multiple loci can be resolved by employing association analysis. One line, which was previously reported to carry a suppressor mutation in a gene required for cholesterol synthesis, Sqle, carries a second mutation in retinoblastoma binding protein 8 (Rbbp8 or CtIP), which regulates pathway choice in double-strand break (DSB) repair. Cells from Mecp2-/Y mice have increased DSBs, so this finding suggests that the balance between homology-directed repair and non-homologous end joining is important for neuronal cells. In this and other lines, the presence of two suppressor mutations confers better symptom improvement than one locus alone, suggesting that combination therapies could be effective in RTT.

#### 14. Erosion of the Epigenetic Landscape and Loss of Cellular Identity as a Cause of Aging in Mammals

19 October 2019 | bioRxiv

All living things experience entropy, manifested as a loss of inherited genetic and epigenetic information over time. As budding yeast cells age, epigenetic changes result in a loss of cell identity and sterility, both hallmarks of yeast aging. In mammals, epigenetic information is also lost over time, but what causes it to be lost and whether it is a cause or a consequence of aging is not known. Here we show that the transient induction of genomic instability, in the form of a low number of non-mutagenic DNA breaks, accelerates many of the chromatin and tissue changes seen during aging, including the erosion of the epigenetic landscape, a loss of cellular identity, advancement of the DNA methylation clock and cellular senescence. These data support a model in which a loss of epigenetic information is a cause of aging in mammals.

#### 15. Species Tree Estimation Using ASTRAL: Practical Considerations

19 October 2019 | arXiv

ASTRAL is a method for reconstructing species trees after inferring a set of gene trees and is increasingly used in phylogenomic analyses. It is statistically consistent under the multi-species coalescent model, is scalable, and has shown high accuracy in simulated and empirical studies. This chapter discusses practical considerations in using ASTRAL, starting with a review of published results and pointing to the strengths and weaknesses of species tree estimation using ASTRAL. It then details the best ways to prepare input gene trees, interpret ASTRAL outputs, and perform follow-up analyses.

#### 16. Liquid-like and rigid-body motions in molecular-dynamics simulations of a crystalline protein

19 October 2019 | bioRxiv

To gain insight into crystalline protein dynamics, we performed molecular-dynamics (MD) simulations of a periodic 2×2×2 supercell of staphylococcal nuclease. We used the resulting MD trajectories to simulate X-ray diffraction and to study collective motions. The agreement of simulated X-ray diffraction with the data is comparable to that of previous MD simulation studies. We studied collective motions by statistically analyzing the covariance of alpha-carbon position displacements. The covariance decreases exponentially with the distance between atoms, which is consistent with the liquid-like motions (LLM) model, in which the protein behaves like a soft material. To gain finer insight into the collective motions, we examined the covariance behavior within a protein molecule (intra-protein) and between different protein molecules (inter-protein). The inter-protein atom pairs, which dominate the overall statistics, exhibit LLM behavior; however, the intra-protein pairs exhibit behavior that is consistent with a superposition of LLM and rigid-body motions (RBM). Our results indicate that LLM behavior of global dynamics is present in MD simulations of a protein crystal. They also show that RBM behavior is detectable in the simulations but that it is subsumed by the LLM behavior. Finally, the results provide clues about how correlated motions of atom pairs both within and across proteins might manifest in diffraction data. Overall, our findings increase our understanding of the connection between molecular motions and diffraction data, and therefore advance efforts to extract information about functionally important motions from crystallography experiments.
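An exponential decay of covariance with distance, C(r) = C0 exp(-r/γ), can be characterized by its correlation length γ. The sketch below fits γ by ordinary least squares on log C(r); it assumes all covariances are positive, that distances share one unit (for example, Å), and it uses synthetic values rather than the paper's data. The function name is hypothetical.

```python
from math import exp, log


def fit_llm_decay(distances, covariances):
    """Least-squares fit of log C(r) = log C0 - r / gamma.

    Returns (C0, gamma). Requires all covariances > 0 and a decaying trend
    (negative slope); otherwise gamma is meaningless.
    """
    ys = [log(c) for c in covariances]
    n = len(distances)
    mx = sum(distances) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in distances)
    sxy = sum((x - mx) * (y - my) for x, y in zip(distances, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return exp(intercept), -1.0 / slope


# Synthetic, noise-free covariances with C0 = 0.3 and gamma = 5.0.
r = [2.0, 4.0, 6.0, 8.0, 10.0]
c = [0.3 * exp(-x / 5.0) for x in r]
print(fit_llm_decay(r, c))  # approximately (0.3, 5.0)
```

A log-linear fit is the simplest choice for this illustration; with noisy or near-zero covariances, a direct nonlinear fit would be more robust.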

#### 17. Competition-driven evolution of organismal complexity

19 October 2019 | arXiv

Non-uniform rates of morphological evolution and evolutionary increases in organismal complexity, captured in metaphors like "adaptive zones", "punctuated equilibrium" and "blunderbuss patterns", require more elaborate explanations than a simple gradual accumulation of mutations. Here we argue that non-uniform evolutionary increases in phenotypic complexity can be caused by a threshold-like response to growing ecological pressures resulting from evolutionary diversification at a given level of complexity. Acquisition of a new phenotypic feature allows an evolving species to escape this pressure but can typically be expected to carry significant physiological costs. Therefore, the ecological pressure should exceed a certain level to make such an acquisition evolutionarily successful. We present a detailed quantitative description of this process using a microevolutionary competition model as an example. The model exhibits sequential increases in phenotypic complexity driven by diversification at existing levels of complexity and the resulting increase in competitive pressure, which can push an evolving species over the barrier of physiological costs of new phenotypic features.

#### 18. Plants with self-sustained luminescence

18 October 2019 | bioRxiv

In contrast to fluorescent proteins, light emission from luciferase reporters requires exogenous addition of a luciferin substrate. Bacterial bioluminescence has been the single exception, where an operon of five genes is sufficient to produce light autonomously. Although commonly used in prokaryotic hosts, toxicity of the aldehyde substrate has limited its use in eukaryotes. Here we demonstrate autonomous luminescence in a multicellular eukaryotic organism by incorporating a recently discovered fungal bioluminescent system into tobacco plants. We monitored these light-emitting plants from germination to flowering, observing temporal and spatial patterns of luminescence across time scales from seconds to months. The dynamic patterns of luminescence reflected progression through developmental stages, circadian oscillations, transport, and response to injuries. As with other fluorescent and luminescent reporters, we anticipate that this system will be further engineered for varied purposes, especially where exogenous addition of substrate is undesirable.

#### 19. A note on the complexity of evolutionary dynamics in a classic consumer-resource model

18 October 2019 | arXiv

We study how the complexity of evolutionary dynamics in the classic MacArthur consumer-resource model depends on resource uptake and utilization rates. The traditional assumption in such models is that the utilization rate of the consumer is proportional to the uptake rate. More generally, we show that if these two rates are related through a power law (which includes the traditional assumption as a special case), then the resulting evolutionary dynamics in the consumer is necessarily a simple hill-climbing process leading to an evolutionary equilibrium, regardless of the dimension of phenotype space. When utilization and uptake rates are not related by a power law, more complex evolutionary trajectories can occur, including the chaotic dynamics observed in previous studies for high-dimensional phenotype spaces. These results draw attention to the importance of distinguishing between utilization and uptake rates in consumer-resource models.

#### 20. Beyond generalization: Enhancing accurate interpretation of flexible models

18 October 2019 | bioRxiv

Machine learning optimizes flexible models to predict data. In scientific applications, there is a rising interest in interpreting these flexible models to derive hypotheses from data. However, it is unknown whether good data prediction guarantees accurate interpretation of flexible models. We test this connection using a flexible, yet intrinsically interpretable framework for modeling neural dynamics. We find that many models discovered during optimization predict data equally well, yet they fail to match the correct hypothesis. We develop an alternative approach that identifies models with correct interpretation by comparing model features across data samples to separate true features from noise. Our results reveal that good predictions cannot substitute for accurate interpretation of flexible models and offer a principled approach to identify models with correct interpretation.

#### 21. Network modelling of topological domains using Hi-C data

18 October 2019 | arXiv

Chromosome conformation capture experiments such as Hi-C are used to map the three-dimensional spatial organization of genomes. One specific feature of the 3D organization is known as topologically associating domains (TADs), which are densely interacting, contiguous chromatin regions playing important roles in regulating gene expression. A few algorithms have been proposed to detect TADs. In particular, the structure of Hi-C data naturally inspires the application of community detection methods. However, one drawback of community detection is that most methods take the exchangeability of the nodes in the network for granted, whereas the nodes in this case, i.e., the positions on the chromosomes, are not exchangeable. We propose a network model for detecting TADs using Hi-C data that takes into account this non-exchangeability. In addition, our model explicitly makes use of cell-type-specific CTCF binding sites as biological covariates and can be used to identify conserved TADs across multiple cell types. The model leads to a likelihood objective that can be efficiently optimized via relaxation. We also prove that, when suitably initialized, this model finds the underlying TAD structure with high probability. Using simulated data, we show the advantages of our method and the caveats of popular community detection methods, such as spectral clustering, in this application. Applying our method to real Hi-C data, we demonstrate that the identified domains have desirable epigenetic features and compare them across different cell types.

#### 22. Inter-reader agreement of 18F-FDG PET/CT for the quantification of carotid artery plaque inflammation

17 October 2019 | bioRxiv

Background: A significant proportion of ischemic strokes are caused by emboli from unstable atherosclerotic carotid artery plaques, with inflammation being a key feature of plaque instability and stroke risk. Positron emission tomography (PET) depicting the uptake of 2-deoxy-2-(18F)-fluoro-D-glucose (18F-FDG) in carotid artery plaques is a promising technique to quantify plaque inflammation. A consensus on the methodology for plaque localization and quantification of inflammation by 18F-FDG PET/computed tomography (CT) in atherosclerosis has not been established. High inter-reader agreement is essential if 18F-FDG PET/CT is to be used as a clinical tool for the assessment of unstable plaques and stroke risk. The aim of our study was to assess the inter-reader variability of different methods for quantifying 18F-FDG uptake in carotid atherosclerotic plaques, with a separate CT angiography (CTA) providing anatomical guidance. Methods and results: Forty-three patients with carotid artery stenosis ≥70% underwent 18F-FDG PET/CT. Two independent readers separately delineated the plaque in all axial PET slices containing the atherosclerotic plaque, and the maximum standardized uptake value (SUVmax) in each slice was measured. Uptake values with and without background correction were calculated. Intraclass correlation coefficients were highest for uncorrected uptake values (0.97-0.98), followed by those background-corrected by subtraction (0.89-0.94), and lowest for those background-corrected by division (0.74-0.79). There was a significant difference between the two readers' definitions of plaque extension, but this did not affect the inter-reader agreement of the uptake parameters. Conclusions: Quantification methods without background correction have the highest inter-reader agreement for 18F-FDG PET of carotid artery plaque inflammation.
The use of the single highest uptake value (max SUVmax) from the plaque will facilitate the method's clinical utility in stroke prevention.
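The three kinds of uptake value compared above can be sketched from per-slice SUVmax measurements: the uncorrected single highest value (max SUVmax), background correction by subtraction, and background correction by division. The abstract does not state which background region was used, so `background_suv` (for example, a blood-pool SUV) is an assumption, as are the function name and example numbers.

```python
def plaque_uptake_metrics(slice_suvmax, background_suv):
    """Summarize per-slice SUVmax values for one plaque.

    `background_suv` is an assumed reference value (e.g. a blood-pool SUV).
    """
    max_suvmax = max(slice_suvmax)  # single highest uptake value in the plaque
    return {
        "max_SUVmax": max_suvmax,                            # no background correction
        "max_SUVmax_minus_bg": max_suvmax - background_suv,  # correction by subtraction
        "max_SUVmax_over_bg": max_suvmax / background_suv,   # correction by division
    }


# Hypothetical per-slice SUVmax values and background SUV.
metrics = plaque_uptake_metrics([2.0, 3.5, 2.5], 1.4)
print({name: round(value, 2) for name, value in metrics.items()})
```

Both corrected values depend on a second measurement (the background), which plausibly adds reader-dependent variability; this is consistent with, but not proof of, the lower agreement reported for the corrected values.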

#### 23. Learning protein sequence embeddings using information from structure

17 October 2019 | arXiv

Inferring the structural properties of a protein from its amino acid sequence is a challenging yet important problem in biology. Structures are not known for the vast majority of protein sequences, but structure is critical for understanding function. Existing approaches for detecting structural similarity between proteins from sequence are unable to recognize and exploit structural patterns when sequences have diverged too far, limiting our ability to transfer knowledge between structurally related proteins. We take a new approach to this problem through the lens of representation learning. We introduce a framework that maps any protein sequence to a sequence of vector embeddings, one per amino acid position, that encode structural information. We train bidirectional long short-term memory (LSTM) models on protein sequences with a two-part feedback mechanism that incorporates information from (i) global structural similarity between proteins and (ii) pairwise residue contact maps for individual proteins. To enable learning from structural similarity information, we define a novel similarity measure between arbitrary-length sequences of vector embeddings based on a soft symmetric alignment (SSA) between them. Our method is able to learn useful position-specific embeddings despite lacking direct observations of position-level correspondence between sequences. We show empirically that our multi-task framework outperforms other sequence-based methods, and even a top-performing structure-based alignment method, when predicting structural similarity, our goal. Finally, we demonstrate that our learned embeddings can be transferred to other protein sequence problems, improving the state of the art in transmembrane domain prediction.
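A soft symmetric alignment between two embedding sequences can be sketched as follows. This is our reading of the idea, not the authors' implementation: L1 distances between position embeddings, softmax alignment weights computed in each direction (α over positions of the second sequence, β over positions of the first), combined as α + β - αβ, and the negative alignment-weighted average distance as the similarity score. The function names and toy embeddings are hypothetical.

```python
from math import exp


def l1_distance(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def soft_symmetric_alignment(z1, z2):
    """Similarity between two sequences of embedding vectors (higher = more similar)."""
    d = [[l1_distance(a, b) for b in z2] for a in z1]
    n, m = len(z1), len(z2)
    # alpha[i][j]: softmax of -d over j, for each position i of z1 (shifted for stability)
    alpha = []
    for i in range(n):
        row_min = min(d[i])
        w = [exp(-(d[i][j] - row_min)) for j in range(m)]
        s = sum(w)
        alpha.append([x / s for x in w])
    # beta[i][j]: softmax of -d over i, for each position j of z2
    beta = [[0.0] * m for _ in range(n)]
    for j in range(m):
        col_min = min(d[i][j] for i in range(n))
        w = [exp(-(d[i][j] - col_min)) for i in range(n)]
        s = sum(w)
        for i in range(n):
            beta[i][j] = w[i] / s
    # combine both directions and average the distances under the alignment weights
    num = den = 0.0
    for i in range(n):
        for j in range(m):
            a = alpha[i][j] + beta[i][j] - alpha[i][j] * beta[i][j]
            num += a * d[i][j]
            den += a
    return -num / den


z = [[0.0, 1.0], [3.0, 0.0], [1.0, 1.0]]
shifted = [[2.0, 3.0], [5.0, 2.0]]
print(soft_symmetric_alignment(z, z), soft_symmetric_alignment(z, shifted))
```

The two sequences may have different lengths, and swapping the arguments gives the same score, which matches the "arbitrary-length" and "symmetric" properties named in the abstract.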

#### 24. miR-181a Regulates p62/SQSTM1, Parkin and Protein DJ-1 Promoting Mitochondrial Dynamics in Skeletal Muscle Ageing

17 October 2019 | Biorxiv link

One of the key mechanisms underlying the functional deterioration of skeletal muscle during ageing is disrupted mitochondrial dynamics. Regulation of mitochondrial dynamics is essential to maintain a healthy mitochondrial population and prevent the accumulation of damaged mitochondria; however, the regulatory mechanisms are poorly understood. We demonstrated loss of mitochondrial content and disrupted mitochondrial dynamics in muscle during ageing, concomitant with dysregulation of miR-181a target interactions. Using functional approaches and the mitoQc assay, we established that miR-181a is an endogenous regulator of mitochondrial dynamics through concerted regulation of Park2, p62/SQSTM1 and DJ-1 in vitro. Downregulation of miR-181a with age was associated with an accumulation of autophagy-related proteins and abnormal mitochondria. Restoring miR-181a levels in old mice prevented the accumulation of p62, DJ-1 and PARK2 and improved mitochondrial quality and muscle function. These results provide physiological evidence for the potential of microRNA-based interventions in age-related muscle atrophy, with wider significance for diseases involving disrupted mitochondrial dynamics.

#### 25. Exploring the threshold of epidemic spreading for a stochastic SIR model with local and global contacts

17 October 2019 | Arxiv link

The spread of an epidemic process is considered in the context of a spatial SIR stochastic model that includes a parameter $0\le p\le 1$ that assigns weights $p$ and $1-p$ to global and local infective contacts, respectively. The model was previously studied by other authors in different contexts. In this work we characterized the behavior of the system around the threshold for epidemic spreading. We first used a deterministic approximation of the stochastic model and checked the existence of a threshold value of $p$ for exponential epidemic spread. An analytical expression approximating this threshold is obtained as a function of the ratio $\alpha$ between the transmission and recovery rates. We then performed several analyses based on intensive stochastic simulations and found that this expression is also a good estimate of the corresponding threshold value of $p$ in the stochastic model. The dynamics of the average number of infected individuals and the average size of outbreaks show a behavior across the threshold that is well described by the deterministic approximation. At the threshold, the distributions of outbreak sizes share common features across all the cases considered, which correspond to different values of $\alpha>1$. These features are already known to hold for the standard stochastic SIR model at its threshold, $\alpha=1$: (i) the probability of an outbreak of size $n$ decays asymptotically as $n^{-3/2}$ for an infinite system, and (ii) the maximal size of an outbreak scales as $N^{2/3}$ for a finite system of size $N$.
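A minimal simulation of an SIR process with mixed local and global contacts is sketched below. It is an illustration under loudly stated assumptions, not the model of the paper: we assume a one-dimensional ring of $N$ sites whose local contacts are the two nearest neighbours, a recovery rate of 1 and a transmission rate $\alpha$ per infected individual. Since every infected site then has the same total event rate, each event is a transmission attempt with probability $\alpha/(\alpha+1)$; only the final outbreak size is tracked, so event times are not sampled. The function name is hypothetical.

```python
import random

SUSCEPTIBLE, INFECTED, RECOVERED = 0, 1, 2


def sir_outbreak_size(n: int, alpha: float, p: float, seed: int = 0) -> int:
    """Return the final size of one stochastic SIR outbreak on a ring of n sites.

    Assumptions (illustrative, not taken from the paper): recovery rate 1,
    transmission rate alpha per infected individual; each transmission
    attempt is global (uniform random target) with probability p and local
    (one of the two ring neighbours) with probability 1 - p.
    """
    rng = random.Random(seed)
    state = [SUSCEPTIBLE] * n
    first = rng.randrange(n)
    state[first] = INFECTED
    infected = [first]          # sites that are currently infected
    total_infected = 1          # everyone ever infected, i.e. the outbreak size
    p_transmit = alpha / (alpha + 1.0)

    while infected:
        # All infected sites have the same total rate, so the next event
        # involves a uniformly chosen infected site.
        k = rng.randrange(len(infected))
        site = infected[k]
        if rng.random() < p_transmit:
            # Transmission attempt: global with probability p, else local.
            if rng.random() < p:
                target = rng.randrange(n)
            else:
                target = (site + rng.choice((-1, 1))) % n
            if state[target] == SUSCEPTIBLE:
                state[target] = INFECTED
                infected.append(target)
                total_infected += 1
        else:
            # Recovery: remove site k from the infected list in O(1).
            state[site] = RECOVERED
            infected[k] = infected[-1]
            infected.pop()
    return total_infected
```

Repeating the call over many seeds at fixed $\alpha$ and $p$ yields an empirical outbreak-size distribution, which is the kind of quantity whose tail is compared with $n^{-3/2}$ at the threshold.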

#### 26. Cryo-electron microscopy structure of a nucleosome-bound SWI/SNF chromatin remodeling complex

16 October 2019 | Biorxiv link

The multi-subunit chromatin remodeling complex SWI/SNF is highly conserved from yeast to humans and plays critical roles in various cellular processes including transcription and DNA damage repair. It uses the energy from ATP hydrolysis to remodel chromatin structure by sliding and evicting the histone octamer, creating DNA regions that become accessible to other essential protein complexes. However, our mechanistic understanding of the chromatin remodeling activity is largely hindered by the lack of a high-resolution structure of any complex from this family. Here we report the first near-atomic-resolution structure of SWI/SNF from the yeast S. cerevisiae bound to a nucleosome, determined by cryo-electron microscopy (cryo-EM). In the structure, the Arp module is sandwiched between the ATPase and the Body module of the complex, with the Snf2 HSA domain connecting all modules. The HSA domain also extends into the Body and anchors at the opposite side of the complex. The Body contains an assembly scaffold composed of conserved subunits Snf12 (SMARCD/BAF60), Snf5 (SMARCB1/BAF47/INI1) and an asymmetric dimer of Swi3 (SMARCC/BAF155/170). Another conserved subunit, Swi1 (ARID1/BAF250), folds into an Armadillo (ARM) repeat domain that resides in the core of the SWI/SNF Body, acting as a molecular hub. In addition to the interaction between Snf2 and the nucleosome, we observed interactions between the conserved Snf5 subunit and the histones at the acidic patch, which could serve as an anchor point during active DNA translocation. Our structure allows us to map and rationalize a subset of cancer-related mutations in the human SWI/SNF complex and to propose a model of how SWI/SNF recognizes and remodels the +1 nucleosome to generate nucleosome-depleted regions during gene activation.

#### 27. Magic numbers in polymer phase separation -- the importance of being rigid

16 October 2019 | Arxiv link

Cells possess non-membrane-bound bodies, many of which are now understood as phase-separated condensates. One class of such condensates is composed of two polymer species, each consisting of repeated binding sites that interact in a one-to-one fashion with the binding sites of the other polymer. Previous biologically motivated modeling of such a two-component system surprisingly revealed that phase separation is suppressed for certain combinations of numbers of binding sites. This phenomenon, dubbed the "magic-number effect", occurs if the two polymers can form fully bonded small oligomers because the number of binding sites in one polymer is an integer multiple of the number of binding sites in the other. Here we use lattice-model simulations and analytical calculations to show that this magic-number effect can be greatly enhanced if one of the polymer species has a rigid shape that allows for multiple distinct bonding conformations. Moreover, if one species is rigid, the effect is robust over a much greater range of relative concentrations of the two species. Our findings advance our understanding of the fundamental physics of two-component, polymer-based phase separation and suggest implications for biological and synthetic systems.
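The magic-number condition is a counting argument that can be checked directly. The sketch below uses hypothetical function names and assumes strictly one-to-one bonds: a cluster of $n_A$ copies of a polymer with $a$ sites and $n_B$ copies of a polymer with $b$ sites can have every site bonded only if $n_A a = n_B b$, so the smallest such cluster has $n_A = b/\gcd(a,b)$ and $n_B = a/\gcd(a,b)$. The pair is "magic" when one of these copy numbers equals 1, i.e. when one site count is an integer multiple of the other. It deliberately ignores chain geometry and rigidity, the factors this study examines.

```python
from math import gcd
from typing import Tuple


def smallest_saturated_cluster(sites_a: int, sites_b: int) -> Tuple[int, int]:
    """Smallest copy numbers (n_a, n_b) with n_a * sites_a == n_b * sites_b.

    With strictly one-to-one bonds this is the smallest cluster in which
    every binding site could be bonded (a counting argument only).
    """
    g = gcd(sites_a, sites_b)
    return sites_b // g, sites_a // g


def is_magic(sites_a: int, sites_b: int) -> bool:
    """True when one site count is an integer multiple of the other, i.e.
    a single copy of one species can be saturated by the other species."""
    n_a, n_b = smallest_saturated_cluster(sites_a, sites_b)
    return min(n_a, n_b) == 1


# Magic pairs among species with 2 to 6 binding sites (sites_a <= sites_b).
magic_pairs = [(a, b) for a in range(2, 7) for b in range(a, 7) if is_magic(a, b)]
```

For example, a 4-site and a 6-site species can only saturate all sites in a cluster of three 4-site and two 6-site polymers, whereas a 2-site and a 6-site species can already do so with one 6-site polymer and three 2-site polymers.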

#### 28. Astrocyte-selective AAV-ADAMTS4 gene therapy combined with hindlimb rehabilitation promotes functional recovery after spinal cord injury

16 October 2019 | Biorxiv link

#### 29. Navigating through the R packages for movement

16 October 2019 | Arxiv link

The advent of miniaturized biologging devices has provided ecologists with unprecedented opportunities to record animal movement across scales, and has led to the collection of ever-increasing quantities of tracking data. In parallel, sophisticated tools have been developed to process, visualize and analyze tracking data; however, many of these tools have proliferated in isolation, making it challenging for users to select the most appropriate method for the question at hand. Indeed, within the R software alone, we listed 58 packages created to deal with tracking data, or 'tracking packages'. Here we reviewed and described each tracking package based on a workflow centered around tracking data (i.e. spatio-temporal locations (x,y,t)), broken down into three stages: pre-processing, post-processing and analysis, the latter consisting of data visualization, track description, path reconstruction, behavioral pattern identification, space use characterization, trajectory simulation and others. Supporting documentation is key to making a package accessible to users. Based on a user survey, we reviewed the quality of the packages' documentation and identified 11 packages with good or excellent documentation. Links between packages were assessed through a network graph analysis. Although a large group of packages showed some degree of connectivity (either depending on functions of another tracking package or suggesting its use), one third of the packages worked in isolation, reflecting a fragmentation in the R movement-ecology programming community. Finally, we provide recommendations for users when choosing packages, and for developers to maximize the usefulness of their contribution and strengthen the links within the programming community.
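The connectivity part of such a network analysis can be illustrated with a small undirected graph in which an edge joins two packages when one depends on or suggests the other. This is a sketch with made-up package names (`pkg_a` to `pkg_f`), not the survey's data or code; it reports isolated packages and connected components using an iterative depth-first search.

```python
from typing import Dict, Iterable, List, Set, Tuple


def package_connectivity(
    packages: Iterable[str], links: Iterable[Tuple[str, str]]
) -> Tuple[List[str], List[List[str]]]:
    """Return (isolated packages, connected components) of an undirected graph.

    An edge (a, b) means package a depends on or suggests package b; direction
    is ignored because only connectedness matters here.
    """
    adjacency: Dict[str, Set[str]] = {p: set() for p in packages}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)

    isolated = sorted(p for p, nbrs in adjacency.items() if not nbrs)

    seen: Set[str] = set()
    components: List[List[str]] = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Iterative depth-first search collecting one component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nbr in adjacency[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        components.append(sorted(component))
    return isolated, components


# Hypothetical example: six packages, three dependency/suggestion links.
pkgs = ["pkg_a", "pkg_b", "pkg_c", "pkg_d", "pkg_e", "pkg_f"]
links = [("pkg_a", "pkg_b"), ("pkg_b", "pkg_c"), ("pkg_d", "pkg_e")]
isolated, components = package_connectivity(pkgs, links)
```

In this toy example one of six packages has no links at all; the share of such isolated nodes is the kind of figure the "one third of the packages" statement summarizes.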

#### 30. Structure of SWI/SNF chromatin remodeller RSC bound to a nucleosome

15 October 2019 | Biorxiv link

Chromatin remodelling complexes of the SWI/SNF family function in the formation of nucleosome-depleted regions and transcriptionally active promoters in eukaryotic genomes. The structure of the Saccharomyces cerevisiae SWI/SNF family member RSC in complex with a nucleosome substrate reveals five protein modules and suggests key features of the remodelling mechanism. A DNA-interacting module grasps extra-nucleosomal DNA and helps to recruit RSC to promoters. The ATPase and arm modules sandwich the nucleosome disc with their 'SnAC' and 'finger' elements, respectively. The translocase motor engages with the edge of the nucleosome at superhelical location +2 to pump DNA along the nucleosome, resulting in sliding of the histone octamer along DNA. The results elucidate how nucleosome-depleted regions are formed and provide a basis for understanding human chromatin remodelling complexes of the SWI/SNF family and the consequences of cancer mutations that frequently occur in these complexes.