Moscow Seminar

on Bioinformatics




Brief abstracts of talks

   


 

 


21.

16.08.1994

 

I.B.Rogozin, Luciano Milanesi*, Nicolay A. Kolchanov

Institute of Cytology and Genetics SD RAS, Russia

and

* Istituto di Tecnologie Biomediche Avanzate, Consiglio Nazionale delle Ricerche, Milano, Italy

Gene structure prediction using information on homologous protein sequence

A new approach to predicting the structure of protein-coding genes is suggested. The principal scheme of prediction is as follows. First, the best potential exons are predicted in a sequence of unknown function by revealing potential splice sites and regions with high coding potential. A list of potential amino acid fragments encoded by these exons is formed. The next step is testing the homology between each amino acid fragment from the list and proteins from the SWISS-PROT database of amino acid sequences. Out of all the homologous proteins, only the sequence with the best identity score is chosen. The third step is reconstruction of the exon-intron structure based on the homology of the protein sequences. Testing of this method on an independent control set (20 genes) has shown that the accuracy of exon/intron structure prediction is comparable with that of Grail: 21% of real exons were lost and 3% of non-real exons were found.


22.

05.09.1994

 

Eugene Kolker, Edward N. Trifonov

Department of Structural Biology, The Weizmann Institute of Science

Modular structure of protein sequences and its possible origins

Analysis of protein sequence length distributions showed that ~20% of proteins are made of standard-size units of ~123 amino acids for eukaryotes and ~152 amino acids for prokaryotes. This underlying regularity is approximately twice as strong in more conserved proteins, such as enzymes and proteins with a subunit structure.

Among other possible reasons for such an organization of protein sequences, a recombinational origin was proposed. One could think that in early evolution DNA segments of the standard size were shuffled among themselves. If that was the case, the initiation triplets (methionine residues) should prefer positions corresponding to multiples of the unit size. Our analysis of eukaryotic sequences confirms this hypothesis.


23.

22.09.1994

 

V.Makeev

Institute of Molecular Biology

Usage of different amino acid similarity scores in Fourier analysis of protein sequences. Comparison of periodic patterns in the genic and protein sequences of collagen

The Fourier transform of the autocorrelation function of a sequence makes it possible to take amino acid similarity into account accurately. Moreover, it is feasible to study periodic patterns composed only of amino acids of some particular type, e.g. charged or hydrophobic. Calculations of periodic patterns in the primary structure of collagen using different similarity matrices show that the periodic structures found in collagen originate from the distribution of amino acids of different types. Nevertheless, a comparative analysis of periodic patterns in the protein sequence and in the sequence of the gene coding for this protein shows that at least some patterns originate from the gene sequence and seem to arise via gene duplications. [Makeev et al., 1995].
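The idea of the abstract above can be illustrated with a minimal sketch. It is not the author's implementation: the function names, the identity similarity score, and the toy collagen-like sequence are assumptions made for illustration. The autocorrelation is the mean similarity between residues a fixed number of positions apart, and its cosine transform is evaluated at a few integer periods.

```python
import math

def autocorrelation(seq, similarity, max_lag):
    """Mean similarity score between residues k positions apart, for k = 1..max_lag."""
    n = len(seq)
    return {
        k: sum(similarity(seq[i], seq[i + k]) for i in range(n - k)) / (n - k)
        for k in range(1, max_lag + 1)
    }

def spectrum(acf, periods):
    """Cosine transform of the mean-subtracted autocorrelation at the given periods."""
    mean = sum(acf.values()) / len(acf)
    return {
        p: sum((c - mean) * math.cos(2 * math.pi * k / p) for k, c in acf.items())
        for p in periods
    }

def identity(a, b):
    """Simplest similarity score (an assumption): 1 for identical residues, else 0."""
    return 1.0 if a == b else 0.0

# Toy collagen-like sequence: glycine at every third position,
# the other positions cycle through seven different residues.
others = "PAEKQRS"
residues = []
j = 0
for i in range(90):
    if i % 3 == 0:
        residues.append("G")
    else:
        residues.append(others[j % len(others)])
        j += 1
toy_sequence = "".join(residues)

acf = autocorrelation(toy_sequence, identity, max_lag=18)
amplitudes = spectrum(acf, periods=range(2, 10))
dominant_period = max(amplitudes, key=amplitudes.get)
print(dominant_period)
```

For this toy sequence the strongest component is at period 3, the Gly-X-Y repeat. Replacing `identity` with a function that scores residues of the same type (e.g. charged or hydrophobic) as similar would restrict the analysis to patterns formed by that type.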


24.

1.12.1994

 

M.Gelfand

Statistical aspects of forensic DNA analysis (an overview)

I'll present a not-too-deep overview of various forensic DNA techniques, including, in particular, analysis of variable number of tandem repeats polymorphisms (DNA fingerprinting), analysis of hypervariable loci in mitochondrial DNA, and applications of phylogenetic analysis. Some well-known cases will be considered, namely, the “Florida dentist case” (a dentist who transmitted AIDS to a number of his patients), the identification of the remains of the Romanov family, and, to some extent, the O.J.Simpson case (a former football player who is on trial in the USA for murdering his wife).


25.

15.12.1994

 

Sh.R.Sunyaev, V.G.Tumanyan, E.N.Kuznetsov

Institute of Molecular Biology

Statistical approach to the inverse folding problem

·  The inverse folding problem.

·  Brief overview of the current situation in the field.

·  The problem of description of the 3D structure.

·  Classification of the existing protein data bank.

·  Our approach: statistical criteria and similarity functionals used.

·  The alignment problem in the considered case.

·  The probabilistic model and the problem of the decision rule.

·  Estimation of influence of certain amino acids.

[Sunyaev et al., 1994 (in Russian); Sunyaev et al., 1997; Sunyaev, 1997].


26.

29.12.1994

 

A.A.Mironov, I.V.Grigoriev

Contacts of alpha helices and the peptide architecture

The Protein Data Bank has been analysed. Rules for major contacts of alpha helices are formulated. It is shown that the hydrophobicity of a contact is a wrong criterion for the selection of major contacts. [Grigoriev et al., 1997b (in Russian); Grigoriev et al., 1998].


27.

19.1.1995

 

A.M.Leontovich

What should a weight matrix for alignment be?

Various approaches to choosing the matrix of weights for residue substitutions in biosequences are discussed. Under one of them, some theorems on the optimality of the Dayhoff matrix are proved. The other, more pragmatic approach ensures the "normality" of the weight matrix. The problem of gap penalties will also be discussed.


30.

1.3.1995

 

V.V.Panjukov

Institute of Mathematical Problems of Biology

Finding steady alignments: Similarity and distance

Some alignments remain optimal when the weight parameters vary over a range of values. Alignments of this kind are called steady. A method for finding all steady optimal alignments of two sequences will be presented. It assumes that the gap penalty is directly proportional to the gap length.

Previously it has been shown that if the weight of one insertion/deletion is <0.5, the similarity-based and distance-based alignments are not equivalent. An explanation for this fact will be given.

[Panjukov, 1993].
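The notion of a steady alignment can be illustrated with a brute-force check. This is a sketch under stated assumptions, not the method of the talk: the match and mismatch weights are fixed at 1, only the per-residue gap weight is varied over a small grid, and all function names are hypothetical. A given alignment is tested for optimality by comparing its score with the dynamic-programming optimum.

```python
def optimal_score(a, b, match, mismatch, gap):
    """Best global alignment score; the gap penalty is proportional to gap length."""
    prev = [-gap * j for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [-gap * i]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else -mismatch
            cur.append(max(prev[j - 1] + sub, prev[j] - gap, cur[j - 1] - gap))
        prev = cur
    return prev[-1]

def alignment_score(row_a, row_b, match, mismatch, gap):
    """Score of one explicit alignment given as two gapped rows of equal length."""
    score = 0.0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            score -= gap
        elif x == y:
            score += match
        else:
            score -= mismatch
    return score

def steady_gaps(row_a, row_b, gap_grid, match=1.0, mismatch=1.0, tol=1e-9):
    """Gap weights from the grid for which the given alignment is optimal."""
    a = row_a.replace("-", "")
    b = row_b.replace("-", "")
    return [
        g for g in gap_grid
        if abs(alignment_score(row_a, row_b, match, mismatch, g)
               - optimal_score(a, b, match, mismatch, g)) < tol
    ]

grid = [0.1, 0.3, 0.5, 0.7, 1.0, 2.0]
ungapped = steady_gaps("ACGT", "AGGT", grid)
gapped = steady_gaps("ACG-T", "A-GGT", grid)
print(ungapped, gapped)
```

In this toy case the ungapped alignment is optimal for gap weights of 0.5 and above, while the alignment with two gaps is optimal for 0.5 and below: once one insertion/deletion costs less than half a mismatch, two gaps become cheaper than one substitution.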


32.

17.5.1995

 

M.Gelfand, A.Mironov, P.Pevzner

Spliced alignment: A new approach to gene recognition

The standard way of utilizing information about homologous proteins in exon assembly, which is to predict several candidate exon-intron structures and then to submit them to a similarity search, has several obvious drawbacks. An alternative is provided by the spliced alignment approach. We consider a set of probable splicing sites or a set of candidate exons and apply an efficient procedure that simultaneously aligns all structures generated by this set with a target protein sequence.

The program implementing the spliced alignment algorithm correctly predicts all human genes from the test set if a mammalian relative is known. More distant targets provide a less perfect, but still very good, level of recognition. Several apparent errors proved to be results of alternative splicing or errors in GenBank feature tables. The results on simulated data demonstrate that the quality of prediction with strongly mutated targets depends crucially on the quality of filtering of candidate exons. [Gelfand et al., 1996b; Gelfand et al., 1996c; Mironov et al., 1998; Mironov et al., 1999b; Mironov et al., 2000].
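The problem statement above can be made concrete with a brute-force sketch. It is deliberately not the spliced alignment algorithm itself, which avoids enumerating structures: here every chain of non-overlapping candidate blocks is enumerated and scored against the target, which is exponential in the number of blocks. The nucleotide-level target, the scoring weights, and all names are assumptions made for illustration.

```python
from itertools import combinations

def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of two strings (linear gap penalty)."""
    prev = [gap * j for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [gap * i]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]

def best_chain(genomic, blocks, target):
    """Try every ordered chain of non-overlapping blocks; return (score, chain)."""
    blocks = sorted(blocks)
    best = (float("-inf"), [])
    for size in range(1, len(blocks) + 1):
        for chain in combinations(blocks, size):
            # Skip chains in which a block starts before the previous one ends.
            if any(nxt[0] < cur[1] for cur, nxt in zip(chain, chain[1:])):
                continue
            spliced = "".join(genomic[start:end] for start, end in chain)
            score = global_score(spliced, target)
            if score > best[0]:
                best = (score, list(chain))
    return best

# Toy data: three "exons" separated by two spacer blocks; blocks are (start, end).
genomic = "ATGCC" + "TTTT" + "GATTA" + "GGGG" + "CGTAA"
candidate_blocks = [(0, 5), (5, 9), (9, 14), (14, 18), (18, 23)]
target = "ATGCCGATTACGTAA"

score, chain = best_chain(genomic, candidate_blocks, target)
print(score, chain)
```

On this toy input the best chain consists of the three blocks that spell out the target exactly. The enumeration is only practical for a handful of blocks, which is why an efficient simultaneous alignment and good filtering of candidate exons matter.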


33.

31.5.1995

 

I.Dedinsky

Institute of Biomedical Chemistry

Prediction of B-epitopes using grammar parsing

A B-epitope (antigenic site) is modeled as a sequence site having a stabilized (rigid) conformation. Stabilization is provided by interactions of amino acids within the epitope. We introduce the notion of epitope structure and construct a set of empirical rules of structure formation. These rules are then transformed into a context-sensitive grammar. Parsing with this grammar recognizes antigenic and non-antigenic sites. As opposed to existing algorithms for epitope prediction, which construct profiles of amino acids along the sequence, the developed method is more sensitive to the amino acid context, provides less ambiguous results, and is in general more reliable.

This work led to the problem of formally constructing a grammar system from a sample of positive and negative example sequences. To do that, we introduce a distributive similarity operation on sequences that allows us to form generalized images of example sets.


34.

1.5.1995

 

I.Dedinsky

Institute of Biomedical Chemistry

Similarity between biological sequences from the point of view of mathematics and biology

1.  Pairwise alignment of biological sequences taking into account structural information.

2.  Distributive operation of consensus construction without alignment for biological sequences.


36.

13.07.1995

 

Leonid Mirny

Harvard University, Dept. of Chemistry

How protein may fold...

We study the thermodynamic and kinetic behavior of a simple model for protein folding. Different scenarios of folding are observed for a chain on a cubic lattice. We simulate protein folding experiments and compare observed kinetics with the data obtained in recent experiments. Detailed study of protein kinetics and thermodynamics reveals two physically different mechanisms providing fast folding. The role of intermediates in protein folding is discussed.


37.

20.7.1995

 

Leonid Mirny

Harvard University, Dept. of Chemistry

Fold recognition and dynamics in the space of contact maps

We introduce an energy function for contact maps of proteins that takes into account pairwise interactions between amino acids as well as hydrophobic interactions of amino acids with water. The hydrophobic energy term is of a form that prefers an optimal number of inter-protein contacts, specific for each amino acid. We derived the parameters of the energy function from a statistical analysis of the contact maps of known structures.

The energy function was tested in several ways. First, the sequences obtained by randomly scrambling the amino acids of a protein were screened by calculating for each the energy of the protein's known contact map. This test demonstrated strong sequence specificity of the introduced energy function.

Next, we simulated protein dynamics by performing Monte-Carlo moves in the space of contact maps. Topological and polymeric constraints were taken into account by dynamic rules that reduced the set of allowed steps. In good agreement with expectations, the method identifies a set of local minima in the vicinity of the native state. We simulated melting of a protein by performing Monte-Carlo dynamics at a high temperature. Slow cooling from a partially unfolded state refolds a protein to conformations very similar to the native one.

We also performed fold recognition experiments, i.e. screening a set of known structures against a given sequence. The results for the BPTI and myoglobin sequences are presented. In both cases the energy of the native structure lies significantly below the average value for the set. Moreover, the myoglobin sequence is able to identify the structures of the other members of the globin family as having the lowest energy values in the set. The method is also able to identify incorrect folds of BPTI in a case where other currently used potentials failed to achieve this. Perspectives of application of the method to structure checking and fold recognition are discussed.
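The scrambling test described above can be sketched as follows. Everything numerical here is an assumption for illustration: a two-letter alphabet (H and P), made-up pair energies, made-up optimal contact numbers, a quadratic hydrophobic penalty, and a tiny hand-written contact map. The energy function of the talk was derived from known structures and is not reproduced here.

```python
import random
import statistics

# Hypothetical toy parameters; not fitted to real structures.
PAIR_ENERGY = {("H", "H"): -1.0, ("H", "P"): 0.0, ("P", "P"): 0.0}
OPTIMAL_CONTACTS = {"H": 2, "P": 0}
HYDROPHOBIC_WEIGHT = 0.25

def contact_map_energy(seq, contacts):
    """Pairwise term over contacts plus a penalty for deviating from the
    residue-specific optimal number of contacts."""
    counts = [0] * len(seq)
    energy = 0.0
    for i, j in contacts:
        energy += PAIR_ENERGY[tuple(sorted((seq[i], seq[j])))]
        counts[i] += 1
        counts[j] += 1
    for residue, n in zip(seq, counts):
        energy += HYDROPHOBIC_WEIGHT * (n - OPTIMAL_CONTACTS[residue]) ** 2
    return energy

def scrambling_z_score(seq, contacts, trials=200, seed=0):
    """Z-score of the native sequence energy against shuffled sequences
    evaluated on the same, fixed contact map."""
    rng = random.Random(seed)
    residues = list(seq)
    energies = []
    for _ in range(trials):
        rng.shuffle(residues)
        energies.append(contact_map_energy(residues, contacts))
    native = contact_map_energy(seq, contacts)
    return (native - statistics.mean(energies)) / statistics.pstdev(energies)

native_sequence = "HPPHPHPP"
native_contacts = [(0, 3), (0, 5), (3, 5)]  # the three H residues touch each other

native_energy = contact_map_energy(native_sequence, native_contacts)
z = scrambling_z_score(native_sequence, native_contacts)
print(native_energy, z)
```

A strongly negative z-score means the native sequence fits its own contact map much better than scrambled sequences do, which is the sense of "sequence specificity" in the abstract. The Monte-Carlo dynamics in contact-map space is not sketched here.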


38.

12.9.1995

 

Ross Overbeek

Argonne National Laboratory

Interpreting microbial genomes

Two microbial genomes have already been completely sequenced. Two more will be completed during the next few months. I believe that there will be 10-15 complete genomes available within 18 months. What can be learned from these genomes? I propose to discuss the central issues of how to reduce the cost of determining functions for genes, for determination of operons, for analysis of regulatory mechanisms, and how to acquire and organize the data required to support this research.


40.

6.10.1995

 

M.Gelfand and M.Roytberg

New developments in recognition of coding regions

Since none of the gene recognition algorithms is perfect, developers have to keep a balance between over- and underprediction. Usually some parameter that depends more or less symmetrically on both these values (e.g. the correlation coefficient) is optimized. However, there exist situations where there is no symmetry, and errors of one type are much more serious than those of the other type. We will consider two such situations.

(1) The number of candidate exons in a sequence fragment is typically very large. On the other hand, many approaches use algorithms that are polynomial (or even exponential) in this number. Thus the problem of preliminary filtration of the exon set arises. Such a procedure should have sensitivity close to 100% (lose nothing), although the specificity can be rather low.

(2) On the other hand, sometimes it is important to predict only fragments of a gene (not even complete exons), but with a very high specificity. This problem arises, in particular, in construction of oligonucleotide probes and PCR primers for screening cDNA libraries given a genomic fragment.

We will present algorithms based on the vector dynamic programming approach that address these problems.

[Roytberg et al., 1997 (in Russian); Roytberg et al., 1997; Mironov et al., 1998; Sze et al., 1998]


41.

1.12.1995

 

N.N.Vtyurin

Institute of Molecular Genetics

Modeling of the spatial structure of protein molecules

·  Types of protein architecture.

·  Search for structural analogs of a protein with known amino acid sequence.

·  Technology of computer modeling.

·  Mekler's constructions.


42-43.

15.12.1995, 22.12.1995

 

Sh.R.Sunyaev

Institute of Molecular Biology

Statistical approach to the inverse protein folding problem. Criteria of 3D-1D compatibility.

1)  Brief introduction. Reduced representations of the protein tertiary structure.

2)  Basic assumptions of our approach.

3)  The problem of 3D-1D compatibility as a problem of statistical hypothesis testing.

4)  Some criteria of 3D-1D compatibility.

5)  Requirements for environmental variables used for reduced structure representations. Tests performed on a representative set of proteins from PDB.

6)  Can statistical approach help to invent new environmental variables?

[Sunyaev et al., 1995 (in Russian); Sunyaev et al., 1996 (in Russian)].


44.

26.1.1996

 

V.A.Shepelev

Institute of Molecular Genetics

Multidimensional dot-matrices

Dot-matrices of similarity are widely used for visualization of similarity regions in a pair of nucleotide or amino acid sequences. A generalization of the dot-matrix of homology to n sequences is suggested. To visualize the n-dimensional dot-matrix, a special projection that conserves distances along the sequence is displayed. The common regions of similarity are revealed as segments of straight lines parallel to the main diagonal. An efficient algorithm for n-dimensional dot-matrix calculation is suggested. The method is useful for visualization of similarity regions, e.g. protein-coding regions, for a wide variety of sequence families, as illustrated by a number of examples. Up to ten sequences of 10 kb each can be analysed with this program. Some further improvements of the program are discussed. [Shepelev & Yanishevsky, 1994].
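A naive way to compute the dots of an n-dimensional dot-matrix is sketched below. It is an assumption-laden illustration, not the algorithm of the talk: a dot is placed at a tuple of positions when all n sequences carry the same word of a chosen length there, and the word-indexing approach and all names are hypothetical.

```python
from itertools import product

def n_dimensional_dots(seqs, word_length):
    """Tuples of start positions at which every sequence has the same word."""
    indexes = []
    for s in seqs:
        positions = {}
        for i in range(len(s) - word_length + 1):
            positions.setdefault(s[i:i + word_length], []).append(i)
        indexes.append(positions)
    # Words present in all sequences.
    shared = set(indexes[0]).intersection(*indexes[1:])
    dots = []
    for word in shared:
        dots.extend(product(*(index[word] for index in indexes)))
    return sorted(dots)

sequences = ["TTACGTAA", "ACGTCC", "GGGACGT"]
dots = n_dimensional_dots(sequences, word_length=3)
print(dots)
```

Consecutive dots that increase by one in every coordinate, such as the two found here, form a segment parallel to the main diagonal, i.e. a region common to all sequences. For repetitive sequences the number of position tuples per word can grow quickly, so this sketch is suitable only for small inputs.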


45.

9.2.1996

 

O.D.Ermolaeva

Institute of Bioorganic Chemistry

Mathematical model of subtractive hybridization and its practical application

The first theory of subtractive hybridization is developed. A kinetic model of this process is proposed and implemented in a computer program modeling the subtraction process. A new method of subtractive hybridization based on the theory allows one to perform routine comparison of genomes and products of genome expression. It is used in studies of the genetic mechanisms of embryogenesis, regeneration, cell differentiation and tumor transformation. [Ermolaeva & Wagner, 1995; Ermolaeva & Sverdlov, 1996; Ermolaeva et al., 1996].


46.

23.2.1996

 

A.V.Prokhorov

Department of Mathematics, Moscow State University

Mathematical analysis of verse

1.  Metric organization of speech.

2.  Probabilistic models of the speech rhythm.

3.  Mathematical analysis of verse.


47.

15.3.1996

 

G. Kutuzova

Institute of Molecular Biology

Artificial Neural Networks: some neural network models and their applications in computer analysis of DNA and protein sequences

Hopfield model, Kohonen model, back-propagation model: architecture and topology, weight modification rules, learning algorithms. Applications to E. coli promoter recognition, prediction of protein secondary structure, and search for unusual motifs in DNA sequences. Comparison with traditional methods. [Kutuzova & Polozov, 1995 (in Russian)].


48.

3.9.1996

 

M.Gelfand

Genetics of two-spotted ladybird Adalia bipunctata (a review)

Although the genetics of the two-spotted ladybird is not as widely studied as, say, the genetics of Drosophila, its tradition goes back to Dobzhansky and Timofeev-Ressovsky. This beetle has some peculiar and interesting features. Its genome carries a large number of recessive lethal mutations. It can be infected by a microbe that kills all male eggs, whereas the female offspring are infected in turn. The most interesting and widely used phenomenon is the color polymorphism. There exist at least 12 coloration variants controlled by a single gene. In general, melanic (black with red spots) alleles are dominant, and typical (red with black spots) alleles are recessive. The percentage of melanics in various populations ranges from 0% to 80%. There exist various explanations for this phenomenon, ranging from ecological and geographical (melanics seem to be favored in seashore populations, in large industrial cities, and at the boundaries of the ladybird's range) to purely genetic (there seems to exist a gene responsible for the preference of females for melanic males). The history of the polymorphism studies is quite dramatic, with sharp contradictions between different groups, retractions and re-retractions, loss of pure lines, etc. Finally, the papers themselves, especially those dedicated to the sexual preferences of the ladies, are rather amusing.


49.

19.9.1996

 

A.A.Belyaev

Vernadsky Institute of Geochemistry and Analytical Chemistry

Geochemical earthquake precursors

Long-term observations of the groundwater composition in several seismically active regions allowed us to obtain a new geochemical predictive indicator of earthquake preparation. The observed natural phenomenon involves the appearance of a regular sequence of specific geochemical anomalies. The duration of the observed preseismic period may exceed two years.

The serial regularity of such anomalies indicates that an oscillating force of changing (increasing) frequency acts on the observed chemical system during this period. The discovered effect formed the basis for an earthquake prediction method that uses the analysis of frequency-modulated oscillations (FMO) in the geochemical system.


50.

3.10.1996

 

T.A.Borovina

Institute of Mathematical Problems of Biology

On the resolution of methods for calculation of DNA redundancy

Three approaches to estimating DNA redundancy are compared: the Shannon entropy, the Lempel-Ziv complexity, and a new method, computation of the low-frequency component of the l-gram graph. Although these methods are based on different ideas, they satisfy some reasonable requirements. The ability of these methods to find various kinds of repeats is compared. [Kislyuk et al., 1995 (in Russian)].
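Two of the three measures named above are standard and can be sketched briefly. The code below is a minimal illustration, not the implementation compared in the talk: it computes the Shannon entropy of l-gram frequencies and the Lempel-Ziv (1976) complexity as the number of components in the exhaustive parsing of the sequence. The function names are assumptions, and the l-gram graph method is not sketched.

```python
import math
from collections import Counter

def shannon_entropy(seq, l=1):
    """Entropy (in bits) of the distribution of overlapping l-grams."""
    grams = [seq[i:i + l] for i in range(len(seq) - l + 1)]
    total = len(grams)
    counts = Counter(grams)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def lempel_ziv_complexity(seq):
    """Number of components in the Lempel-Ziv (1976) parsing: each component
    is the shortest word that does not occur in the text preceding its last
    symbol (overlap with the component itself is allowed)."""
    i, components, n = 0, 0, len(seq)
    while i < n:
        length = 1
        while i + length <= n and seq[i:i + length] in seq[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components

repetitive = "ACGT" * 8
print(shannon_entropy(repetitive), lempel_ziv_complexity(repetitive))
```

The example shows why the two measures differ in what they detect: a perfect tandem repeat of ACGT has maximal single-letter entropy (2 bits), yet its Lempel-Ziv parsing needs only five components, because everything after the first four letters is a copy of earlier text.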


51.

19.12.1996

 

Sh. Sunyaev

Institute of Molecular Biology

Statistical analysis of residue conformational properties, or what makes knowledge-based protein fold prediction so difficult?

The validity of the theoretical basis of currently used knowledge-based techniques for protein fold recognition was investigated.

The following three points were considered:

i)  Is it possible to introduce a probability distribution for various conformational properties of amino acid residues?

ii)  How strong are the statistical preferences of amino acids for specific environments?

iii)  How conserved are conformational properties among proteins with the same folding type?

[Sunyaev et al., 1998].


     

 

 

 

© Seminar, 1993 - 2016