From the Bioinformatics Institute

The seventh week of the journal club was not devoted to any particular topic. Its distinguishing feature was that all talks were given in English. In the fourth week of the journal club we discussed papers on alternative splicing. The questions on the agenda were: how widespread this form of gene expression regulation is and how it is carried out; what role alternative splicing plays in the pathogenesis of various diseases and in the development of organs and tissues; how alternative RNA splicing affects protein interactions; and how and why to study the "splicing" code.

Article 1 A whole-cell computational model predicts phenotype from genotype

Understanding how complex phenotypes arise from individual molecules and their interactions is a primary challenge in biology that computational approaches are poised to tackle. The talk covered a whole-cell computational model of the life cycle of the human pathogen Mycoplasma genitalium that includes all of its molecular components and their interactions. An integrative approach to modeling that combines diverse mathematics enabled the simultaneous inclusion of fundamentally different cellular processes and experimental measurements. The whole-cell model accounts for all annotated gene functions and was validated against a broad range of data. The model provides insights into many previously unobserved cellular behaviors, including in vivo rates of protein-DNA association and an inverse relationship between the durations of DNA replication initiation and replication. In addition, experimental analysis directed by model predictions identified previously undetected kinetic parameters and biological functions.

Model organism: Mycoplasma genitalium.

The algorithm, implemented in MATLAB, models all processes occurring in the cell.

The predicted DNA, lipid, protein and RNA content of the cell and the cell division time were compared with experimental data (the authors considered processes connected with DNA, RNA and proteins). Processes were treated as discrete: each process was analysed independently. The authors tried to predict the cell division process and compared the model with reality; agreement was rather good, except for DNA content. It turned out, however, that the model was right and the 'experimental' results were wrong this time. They also surveyed (nucleotide) metabolism and protein interactions with DNA/RNA.

The researchers also tried to model all the processes in the organism (a single cell). The results were pretty close to reality.

To sum up, the authors proposed a model that can be used to predict various phenotypic characteristics of an organism knowing only its genome.
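The idea of analysing each process independently and then combining the results can be illustrated with a toy simulation loop. This is only a sketch in Python (the original model is in MATLAB); the process functions, rate constants and state variables below are invented for illustration and are not taken from the paper.

```python
from typing import Callable, Dict, List

State = Dict[str, float]
Process = Callable[[State, float], State]


def transcription(state: State, dt: float) -> State:
    # Hypothetical rule: RNA is produced in proportion to DNA content.
    return {"rna": 0.5 * state["dna"] * dt}


def translation(state: State, dt: float) -> State:
    # Hypothetical rule: protein is produced in proportion to RNA content.
    return {"protein": 0.2 * state["rna"] * dt}


def replication(state: State, dt: float) -> State:
    # Hypothetical rule: DNA grows linearly until it has doubled.
    remaining = max(2.0 - state["dna"], 0.0)
    return {"dna": min(0.01 * dt, remaining)}


def simulate(initial: State, processes: List[Process], steps: int, dt: float = 1.0) -> State:
    state = dict(initial)
    for _ in range(steps):
        # Each sub-model sees the same snapshot of the shared state...
        deltas = [process(state, dt) for process in processes]
        # ...and the changes are merged only after all of them have run.
        for delta in deltas:
            for key, change in delta.items():
                state[key] += change
    return state


if __name__ == "__main__":
    start = {"dna": 1.0, "rna": 0.0, "protein": 0.0}
    print(simulate(start, [transcription, translation, replication], steps=10))
```

Because every sub-model reads the same snapshot, the order in which the processes are listed does not affect the result of a step.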

Article 2 Joint detection of CNV in parent-offspring trios

Motivation: Whole genome sequencing (WGS) of parent-offspring trios is a powerful approach for identifying disease-associated genes via detecting copy number variations (CNVs). Existing approaches, which detect CNVs for each individual in a trio independently, usually yield low detection accuracy. Joint modeling approaches leveraging Mendelian transmission within the parent-offspring trio can be an efficient strategy to improve CNV detection accuracy.

Results: In this study, TrioCNV was developed, a novel approach for jointly detecting CNVs in parent-offspring trios from WGS data. The read depth signal was modeled using negative binomial regression while accounting for both GC content bias and mappability bias. The family relationship was also incorporated, and a hidden Markov model was used to jointly infer CNVs for the three samples of a parent-offspring trio. Application to both simulated data and a trio from the 1000 Genomes Project showed that TrioCNV achieved superior performance compared with existing approaches.


  • CNVs - changes in the copy number of large genomic segments
  • CNVs are connected with human diseases including neuropsychiatric disorders and cancers

CNV analysis in trios is a powerful way to identify disease-associated genetic variants for both common and rare diseases.

Existing approaches detect CNVs for each individual in the trio independently, which gives low detection accuracy.

New approach: detect CNVs jointly. A joint modeling approach was applied to improve detection accuracy for SNPs.
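Why a pedigree helps can be seen from a simple Mendelian consistency check on copy numbers. The sketch below is a deliberately simplified assumption, not TrioCNV's transmission model: it supposes that a parent with total copy number n can pass on anywhere from 0 to n copies on one haplotype, and it ignores de novo events. All function names are hypothetical.

```python
from typing import Set


def transmissible(parent_cn: int) -> Set[int]:
    # Simplifying assumption: the parent's copies may be split between its two
    # haplotypes in any way, so 0..parent_cn copies can be transmitted.
    return set(range(parent_cn + 1))


def consistent_child_cns(father_cn: int, mother_cn: int) -> Set[int]:
    # The child receives one haplotype from each parent.
    return {f + m for f in transmissible(father_cn) for m in transmissible(mother_cn)}


def is_mendelian(father_cn: int, mother_cn: int, child_cn: int) -> bool:
    return child_cn in consistent_child_cns(father_cn, mother_cn)


if __name__ == "__main__":
    # Both parents carry a homozygous deletion, so a child with one copy
    # would be inconsistent under this simplified rule.
    print(is_mendelian(0, 0, 1))
    print(sorted(consistent_child_cns(0, 2)))
```

A joint caller can use constraints of this kind to down-weight combinations of calls that are unlikely to occur together in one family.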

CNV detection

  • fluorescent in situ hybridization (FISH)
  • comparative genomic hybridization (array CGH)
  • SNP microarray
  • NGS data

How to use NGS to detect deletions and duplications: map reads and then look at read depth, GC content, mappability score. The authors have created a workflow for this.
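The per-window signals mentioned here can be sketched as follows. This is a minimal illustration, assuming reads are given as 0-based start positions on a single reference string; the function name and input format are hypothetical, and the mappability score is left out because it requires an external track.

```python
from typing import List, Tuple


def window_features(reference: str, read_starts: List[int], window_size: int) -> List[Tuple[int, int, float]]:
    """Return (window_start, read_depth, gc_fraction) for consecutive windows."""
    n_windows = (len(reference) + window_size - 1) // window_size
    depth = [0] * n_windows
    for pos in read_starts:
        # Count each read in the window that contains its start position.
        if 0 <= pos < len(reference):
            depth[pos // window_size] += 1
    features = []
    for i in range(n_windows):
        start = i * window_size
        seq = reference[start:start + window_size].upper()
        gc = sum(base in "GC" for base in seq) / len(seq)
        features.append((start, depth[i], gc))
    return features


if __name__ == "__main__":
    for row in window_features("GGCCAATTGC", [0, 1, 5, 9, 9], window_size=4):
        print(row)
```

In practice read depth would come from an aligned BAM file and windows would span hundreds of bases; the toy strings only show the bookkeeping.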

Detection via NGS data. Map reads and look at:

  • read depth (RD)
  • paired-end mapping (PEM)
  • split read (SR)

Or de-novo assembly

CNVs can be detected by FISH, array CGH, or NGS data (mapping reads and inspecting read depth, paired-end mapping and split reads, or de novo assembly). Workflow for joint detection: trio reads + pedigree + reference → divide the whole genome into windows and calculate read depth, GC content and mappability score → perform CNV segmentation → refine. A hidden Markov model is used for CNV segmentation.
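The segmentation step can be illustrated with a small Viterbi decoder over per-window read depths. This sketch makes several assumptions that differ from the paper: it handles a single sample rather than a trio, it uses a Poisson emission in place of negative binomial regression, and the three states, their depth ratios and the `stay_prob` parameter are chosen purely for illustration.

```python
import math
from typing import List

STATES = ("deletion", "normal", "duplication")
# Assumed expected depth relative to the diploid baseline (1, 2 and 3 copies).
COPY_RATIO = {"deletion": 0.5, "normal": 1.0, "duplication": 1.5}


def log_poisson(k: int, mean: float) -> float:
    return k * math.log(mean) - mean - math.lgamma(k + 1)


def viterbi_segment(depths: List[int], baseline: float, stay_prob: float = 0.99) -> List[str]:
    """Most likely state per window for one sample (log-space Viterbi)."""
    if not depths:
        return []
    stay = math.log(stay_prob)
    switch = math.log((1.0 - stay_prob) / (len(STATES) - 1))

    def trans(prev: str, cur: str) -> float:
        return stay if prev == cur else switch

    # Uniform prior over the starting state.
    score = {s: -math.log(len(STATES)) + log_poisson(depths[0], baseline * COPY_RATIO[s])
             for s in STATES}
    backpointers = []
    for depth in depths[1:]:
        new_score, pointers = {}, {}
        for cur in STATES:
            best_prev = max(STATES, key=lambda prev: score[prev] + trans(prev, cur))
            pointers[cur] = best_prev
            new_score[cur] = (score[best_prev] + trans(best_prev, cur)
                              + log_poisson(depth, baseline * COPY_RATIO[cur]))
        backpointers.append(pointers)
        score = new_score

    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    depths = [100] * 5 + [50] * 5 + [100] * 5
    print(viterbi_segment(depths, baseline=100.0))
```

A joint model would replace the three single-sample states with combinations of states for father, mother and child, with transition and prior terms that reflect the pedigree.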

Article 3 Use of a Probabilistic Motif Search to Identify Histidine Phosphotransfer Domain-Containing Proteins

The wealth of newly obtained proteomic information affords researchers the possibility of searching for proteins of a given structure or function. The authors describe a general method for the detection of a protein domain of interest in any species for which a complete proteome exists.

In particular, this approach was applied to identify histidine phosphotransfer (HPt) domain-containing proteins across a range of eukaryotic species. From the sequences of known HPt domains, an amino acid occurrence matrix was created and then used to define a conserved, probabilistic motif.

Examination of various organisms either known to contain (plant and fungal species) or believed to lack (mammals) HPt domains established criteria by which new HPt candidates were identified and ranked. Search results using a probabilistic motif matrix compare favorably with data to be found in several commonly used protein structure/function databases: the method identified all known HPt proteins in the Arabidopsis thaliana proteome, confirmed the absence of such motifs in mice and humans, and suggested new candidate HPts in several organisms.

Moreover, probabilistic motif searching can be applied more generally, in a manner both readily customized and computationally compact, to other protein domains; this utility is demonstrated by identification of histones in a range of eukaryotic organisms.

The problem is to find new HPt proteins in eukaryotic proteomes using a protein motif with very low conservation. A consensus sequence of this motif was created for all eukaryotes. In the end the authors obtained significant and valuable results with a very low false-positive rate. The method is computationally efficient and, importantly, gives good results for sequences with a low level of conservation.

So the authors searched for motifs in full proteomes → scored k-mers and kept the highest scores → the payoff is that the method works with poorly conserved sequences and has a low false-positive rate.

HPt domain: a histidine at the center of an alpha helix, with variability beyond that. The likelihood of amino acid occurrence in the stretches flanking the key His residue is used to identify new HPts in proteomes.
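The scoring of k-mers against an amino acid occurrence matrix can be sketched as below. This is not the authors' code: the example motifs are invented, and the pseudocount, the uniform background frequency and the function names are assumptions made for the illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_matrix(aligned_motifs: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Per-position amino acid probabilities from equal-length motif examples."""
    length = len(aligned_motifs[0])
    total = len(aligned_motifs) + pseudocount * len(AMINO_ACIDS)
    matrix = []
    for i in range(length):
        counts = Counter(motif[i] for motif in aligned_motifs)
        matrix.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return matrix


def score_kmer(kmer: str, matrix: List[Dict[str, float]], background: float = 1 / 20) -> float:
    # Log-odds against a uniform background; unknown residues score as neutral.
    return sum(math.log(col.get(aa, background) / background) for aa, col in zip(kmer, matrix))


def best_hit(protein: str, matrix: List[Dict[str, float]]) -> Optional[Tuple[float, int]]:
    """Return (score, start) of the highest-scoring k-mer, or None if too short."""
    k = len(matrix)
    hits = [(score_kmer(protein[i:i + k], matrix), i) for i in range(len(protein) - k + 1)]
    return max(hits) if hits else None


if __name__ == "__main__":
    # Invented toy motifs with a fixed histidine at the second position.
    matrix = build_matrix(["AHGLK", "SHGLK", "AHGIK"])
    print(best_hit("MMMMAHGLKMMMM", matrix))
```

Ranking proteins by their best k-mer score and applying a cutoff calibrated on proteomes with and without the domain would correspond to the candidate-ranking step described in the notes.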

  • Hits all the Pfam-predicted proteins
  • Computationally effective
  • Bias from the prokaryotic data is avoided by generating 2 consensus sequences
  • Finally: low conservation sequences gave significant and valuable results


  • Works for poorly conserved sequences
  • Computationally effective
  • Finds all putative HPts from Pfam, SMART and PROSITE, and more
  • Low false-positive rate
  • Generality and flexibility: can be adjusted for other protein families
  • Python code on GitHub

Article 4 The effect of chaperonin buffering on the protein evolution

Molecular chaperones are highly conserved and ubiquitous proteins that help other proteins in the cell to fold.

Pioneering work by Rutherford and Lindquist suggested that the chaperone Hsp90 could buffer (i.e., suppress) phenotypic variation in its client proteins and that alternate periods of buffering and expression of these variants might be important in adaptive evolution. More recently, Tokuriki and Tawfik presented an explicit mechanism for chaperone-dependent evolution, in which the Escherichia coli chaperonin GroEL facilitated the folding of clients that had accumulated structurally destabilizing but neofunctionalizing mutations in the protein core.

But how important an evolutionary force is chaperonin-mediated buffering in nature? The authors address this question by modeling the per-residue evolutionary rate of the crystallized E. coli proteome, evaluating the relative contributions of chaperonin buffering, functional importance, and structural features such as residue contact density. Previous findings suggest an interaction between codon bias and GroEL in limiting the effects of misfolding errors.

Results suggest that the buffering of deleterious mutations by GroEL increases the evolutionary rate of client proteins. The evolutionary fate of GroEL clients was then examined in the Mycoplasmas, a group of bacteria containing the only known organisms that lack chaperonins. It was shown that GroEL was lost once in the common ancestor of a monophyletic subgroup of Mycoplasmas, and the authors evaluate the effect of this loss on the subsequent evolution of client proteins, providing evidence that client homologs in 11 Mycoplasma species have lost their obligate dependency on GroEL for folding.

Analyses indicate that individual molecules such as chaperonins can have significant effects on proteome evolution through their modulation of protein folding.

Experimental evolution on four enzymes in E. coli.

So, chaperonin buffering does affect protein evolution. Chaperonin client proteins evolve faster than nonclients, either because chaperonins help stabilize the structure of a protein carrying an innovative mutation or for some other reason.

Experiments showed that GroEL/GroES could maintain function of enzymes that have accumulated lots of destabilizing mutations. Without GroEL they did not function.


Refutation of the hypothesis that chaperone clients should evolve faster than nonclients in the case of the E. coli chaperonin clients and their homologs.

Support for the hypothesis that chaperones facilitate adaptive evolution under the condition that functionally innovative mutations tend to interfere with protein folding.

The authors offer two ideas as to why chaperonin clients tend to be more functionally important:

1. chaperonin clients are able to more easily fix functionally innovative mutations despite their structurally destabilizing effects

2. functionally important proteins are highly constrained and have more need of chaperone-assisted folding following the fixation of functionally innovative mutations

Result of this paper: client proteins evolve significantly slower than non-clients. Is it because of their functional importance? The authors think so.