Moscow seminar on Bioinformatics 2003-05

Brief abstracts of talks

142.

23.1.2003

A.M. Leontovich, V.K. Nikolaev

Belozersky Institute of Physico-Chemical Biology, Moscow State University

On Power Threshold Values in Homology Searches

Researchers often need to find similarities between biological sequences. They must then distinguish similarities that indicate genuine biological kinship from purely accidental ones. In the first case the sequences are not merely similar but homologous: for example, they share a common evolutionary origin or perform the same function.
A natural approach to this problem is to measure the similarity obtained when random Bernoulli sequences are used instead of biological ones.
Such an analysis has been carried out in (3-7), using the maximum sum of substitution weights in a motif as the similarity measure (as in BLAST (1, 2)). A rigorous mathematical theory supporting it has also been developed.
The purpose of this paper is to present an analogous analysis in which the power of motifs (8, 9), rather than the sum of weights, is used as the similarity measure. The analysis is done by numerical simulation, running random Bernoulli sequences against a biological database. The paper also presents partially justified theoretical estimates of the threshold values.
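The abstract does not define motif power, so this sketch uses the sum-of-weights measure cited above. It shows how a threshold can be estimated by simulation on random Bernoulli sequences. The uniform DNA alphabet, the +1/-1 substitution weights, and the names `max_ungapped_score` and `empirical_threshold` are illustrative assumptions, not the authors' setup.

```python
import random


def max_ungapped_score(a, b, match=1, mismatch=-1):
    """Maximum sum of substitution weights over all ungapped segment pairs of a and b.

    Each diagonal is scanned with a running maximum-segment sum that resets at zero,
    giving a BLAST-style ungapped local score (hypothetical +1/-1 weights)."""
    best = 0
    n, m = len(a), len(b)
    for offset in range(-(n - 1), m):  # every diagonal j - i = offset
        i = max(0, -offset)
        j = i + offset
        run = 0
        while i < n and j < m:
            run = max(0, run + (match if a[i] == b[j] else mismatch))
            best = max(best, run)
            i += 1
            j += 1
    return best


def bernoulli_seq(length, rng, alphabet="ACGT"):
    """Random sequence with independent, equiprobable letters (Bernoulli model)."""
    return "".join(rng.choice(alphabet) for _ in range(length))


def empirical_threshold(length, trials=200, alpha=0.01, seed=0):
    """Empirical (1 - alpha) quantile of scores between pairs of random sequences."""
    rng = random.Random(seed)
    scores = sorted(
        max_ungapped_score(bernoulli_seq(length, rng), bernoulli_seq(length, rng))
        for _ in range(trials)
    )
    index = min(len(scores) - 1, int((1 - alpha) * len(scores)))
    return scores[index]
```

A biological hit would then be reported only if its score exceeds `empirical_threshold(...)` for sequences of comparable length.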

A.M. Leontovich, V.K. Nikolaev

Belozersky Institute of Physico-Chemical Biology, Moscow State University

On threshold values of power in homology search

The problem of finding similarity between biological sequences arises frequently. One must then determine when the similarity found indicates genuine biological kinship and when it is purely accidental. In the first case the sequences are not merely similar but "homologous": for example, they share a common evolutionary origin or perform the same functions. A natural approach to this problem is to determine the similarity obtained when random Bernoulli sequences are taken instead of biological ones.
Such an analysis was carried out in (3-7) for the case where the similarity measure is the maximum sum of substitution weights in a motif, as in BLAST (1, 2). A rigorous mathematical theory justifying this analysis was also developed.
The aim of the present work is an analogous analysis for the case where the similarity measure is not the sum of weights but the power of motifs (8, 9). This is done by numerical simulation, running random Bernoulli sequences through a biological database. The work also presents partially justified theoretical estimates of the threshold values.

143.

6.2.2003

Sergey Lukyanov

Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry RAS

A novel method for SNP detection using a new duplex-specific nuclease from crab hepatopancreas
See [D.A.Shagin et al. (2002) Genome Res. 12: 1935-1942].

Sergey Lukyanov

Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry RAS

A new method for detecting single-nucleotide substitutions using a duplex-specific nuclease from the Kamchatka crab

Detection of single-nucleotide substitutions (SNPs) has recently become very important in many areas of biology and medicine. These include the study of hereditary diseases, pharmacogenetics and population genetics, and genome mapping. Although many SNP detection methods exist, none has yet gained universal acceptance and widespread use. An ideal method should combine accurate and simple analysis, easy data interpretation, and suitability for full automation (robotization).
We have proposed a new, highly efficient SNP detection method called DSNP (Duplex-Specific Nuclease Preference). It is based on the unique properties of a new duplex-specific nuclease, DSN (Duplex-Specific Nuclease). DSN specifically discriminates between perfect and imperfect duplexes formed by short oligonucleotides with DNA: it hydrolyzes the former but not the latter. This allows the mutant form of DNA to be efficiently discriminated from wild-type DNA. In general terms, the method comprises the following steps. A DNA fragment containing the SNP is amplified by PCR. Without intermediate purification, it is then mixed with DSN and a probe, a sequence-specific 10-mer oligonucleotide carrying a fluorescent label and a quencher. During the reaction only perfect duplexes between the target DNA and the probe are cleaved, and this leads to accumulation of a fluorescent signal. The signal can be detected visually or with standard laboratory equipment.

144.

13.2.2003

Fedor Kondrashov

NCBI, NLM, NIH, Bethesda, USA

Selection and introns in protein-coding genes

Fyodor Alekseevich Kondrashov

National Center for Biotechnology Information, USA

Selection and introns in protein-coding genes

Introns are present in nearly all known eukaryotic genomes, and in some genomes they are very numerous. Nevertheless, introns are generally assumed to be unaffected by selection and to carry no selective consequences. I will present data on how natural selection acts on intron length and intron presence. I will also show how introns influence the efficiency of selection in protein-coding genes.

147.

20.3.2003

P. Novichkov

Integrated Genomics-Moscow and NCBI/NLM/NIH, USA

A procedure for automatic detection of horizontal gene transfer events

Analysis of clusters of orthologous groups of proteins (the COG database) in groups of closely related organisms yields an estimate of the relative rate of evolution for each protein family.
In some cases, however, horizontal gene transfer makes such an estimate difficult. This work proposes an approach that automatically detects horizontal transfer events and gives an adequate estimate of the relative evolutionary rate in these cases.
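The abstract does not spell out the procedure, so the following is only a hypothetical illustration of the idea. It assumes that a family's pairwise distances are proportional to reference species distances, with a family-specific factor `k` (the relative rate). Species pairs whose distance departs from `k` times the reference by more than a chosen ratio are flagged as candidate transfers, removed, and the rate is refit. The model, the `max_ratio` cut-off and all names are assumptions.

```python
def fit_rate(family_dist, reference_dist):
    """Least-squares k in family_dist[p] ~ k * reference_dist[p] over species pairs p."""
    num = sum(d * reference_dist[p] for p, d in family_dist.items())
    den = sum(reference_dist[p] ** 2 for p in family_dist)
    return num / den


def rate_excluding_transfers(family_dist, reference_dist, max_ratio=2.0):
    """Fit k, flag pairs off by more than max_ratio in either direction as
    hypothetical HGT candidates, drop them and refit until nothing is flagged."""
    pairs = dict(family_dist)
    flagged = []
    while True:
        k = fit_rate(pairs, reference_dist)
        outliers = [
            p for p, d in pairs.items()
            if d > max_ratio * k * reference_dist[p] or d * max_ratio < k * reference_dist[p]
        ]
        # stop when nothing is flagged or too few pairs would remain to refit
        if not outliers or len(pairs) - len(outliers) < 2:
            return k, flagged
        for p in outliers:
            flagged.append(p)
            del pairs[p]
```

In a toy example, a pair of species that is much closer in one family than the species distances predict is flagged. The rate is then estimated from the remaining pairs.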

148.

3.4.2003

D. Rodionov (with A. Vitreschak, A. Mironov, M. Gelfand)

GosNIIGenetika

Comparative genomics of the vitamin B12 metabolism and regulation in prokaryotes

Using comparative analysis of genes, operons, and regulatory elements, we describe the cobalamin (vitamin B12) biosynthetic pathway in available prokaryotic genomes. A highly conserved RNA secondary structure, the regulatory B12-element, is widely distributed in upstream regions of cobalamin biosynthetic/transport genes in Eubacteria. The binding signal (CBL-box) for a hypothetical B12 regulator was identified in some Archaea. Searching for B12-elements and CBL-boxes, combined with positional analysis, identified a large number of new candidate B12-regulated genes in various prokaryotes. The new proteins associated with cobalamin biosynthesis include:
- several new types of cobalt transporters;
- the ChlI and ChlD subunits of the CobN-dependent cobaltochelatase complex;
- cobalt reductase BluB and adenosyltransferase PduO;
- several new proteins linked to the lower ligand assembly pathway;
- L-threonine kinase PduX;
- a large number of other hypothetical proteins.
Nonorthologous substitutes were identified for most genes missing from the cobalamin biosynthetic pathways of various bacteria. The most variable parts of cobalamin metabolism appear to be cobalt transport and insertion, the CobG/CbiG- and CobF/CbiD-catalyzed reactions, and the lower ligand synthesis pathway. The analysis of B12-elements also showed that B12-independent isozymes of methionine synthase and ribonucleotide reductase are regulated by B12-elements in bacteria that have both B12-dependent and B12-independent isozymes. Moreover, the B12 regulons of various bacteria appear to include enzymes from known B12-dependent or alternative pathways, as well as several hypothetical enzymes from unknown pathways.
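Searching genomes for a DNA binding signal such as the CBL-box is often done with a position weight matrix (PWM). The abstract does not say which method was used. The sketch below shows a generic log-odds PWM scan under these assumptions: equal background frequencies, a 0.5 pseudocount, and uppercase ACGT input. The training sites are invented and are not real CBL-boxes. The RNA B12-element is a secondary structure, so it would need a different, structure-aware search.

```python
import math


def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Log2-odds weight per position and base from aligned, equal-length sites."""
    pwm = []
    for pos in range(len(sites[0])):
        column = [s[pos] for s in sites]
        weights = {}
        for base in "ACGT":
            freq = (column.count(base) + pseudocount) / (len(sites) + 4 * pseudocount)
            weights[base] = math.log2(freq / background)
        pwm.append(weights)
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, score) for every window scoring at or above threshold.
    Assumes the sequence contains only uppercase A, C, G, T."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, round(score, 2)))
    return hits
```

For positional analysis, upstream regions of candidate genes would be scanned, and hits conserved across related genomes would be kept as candidate regulon members.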

D. Rodionov (with A. Vitreschak, A. Mironov, M. Gelfand)

GosNIIGenetika

Reconstruction of cobalamin metabolism and analysis of regulatory B12-elements in prokaryotes

Using comparative analysis of genes, operons and regulatory elements, we analyzed the metabolism of vitamin B12 (cobalamin) and its regulation in all available bacterial genomes. In most eubacteria, a highly conserved RNA structure, named the regulatory B12-element, was found in the 5'-untranslated regions of cobalamin biosynthesis and transport genes (CBL genes). In some archaea, candidate binding sites for a hypothetical B12 regulator (CBL-boxes) were found. Positional analysis of CBL genes and B12-elements, and of CBL-boxes in archaea, revealed a large number of new members of the B12 regulon in various genomes. In particular, new cobalt transporters were found; Co2+ is required for cobalamin biosynthesis. Components of the CobN-dependent cobaltochelatase complex, which inserts cobalt ions into the tetrapyrrole ring of the cobalamin precursor, were also found. The following were assigned to the B12 regulon:
- the new cobalt reductase BluB;
- threonine kinase PduX;
- adenosyltransferase PduO;
- new enzymes involved in assembly of the cobalamin nucleotide loop;
- other hypothetical proteins.
Nonorthologous replacements were also identified for cobalamin biosynthesis genes missing in some bacteria. The most variable stages of the cobalamin pathway turned out to be:
- the transport of cobalt ions and their insertion into the corrin ring;
- synthesis of the cobalamin nucleotide loop;
- the reactions catalyzed by CobG/CbiG and CobF/CbiD.
In bacteria that have both B12-dependent and B12-independent enzymes, regulation of the B12-independent isozymes of methionine synthase and ribonucleotide reductase by vitamin B12 was shown for the first time. In addition, the B12 regulons of some bacteria were found to contain various enzymes involved in known B12-dependent metabolic pathways or their alternatives, as well as hypothetical enzymes from unknown metabolic pathways.

150.

28.4.2003

Martin Farach-Colton

Department of Computer Science, Rutgers University

Genome Assemblies and Interval Graphs

The Sequence Assembly Problem is the computational end of the Human Genome Project (HGP). After individual chromosomal fragments are sequenced in the laboratory, the complete sequence of the genome must be assembled from the constituent parts. This makes sequence assembly one of the fundamental problems of bioinformatics.
The HGP started out with one general approach to sequencing (Hierarchical Shotgun Sequencing) and changed to another (Clone-based Sequencing). Both methods rely on similar mathematics related to interval graphs. However, the differences are crucial. We explore some of the implications of this change in sequencing technology. We present Barnacle, a method for assembling clone-based sequences and compare it with the NCBI assembler used for the official NIH assembly.
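In an interval graph, each clone is an interval on the chromosome, and two clones are adjacent when their intervals overlap. The connected components of this graph are the contigs. The toy sketch below assumes that clone positions are already known and finds the components with a single sweep over intervals sorted by start. In real assembly, positions must be inferred from overlaps. This sketch does not reproduce Barnacle or the NCBI assembler.

```python
def contigs_from_clones(intervals):
    """Connected components of the interval graph of (start, end) clones.

    Intervals are end-inclusive; touching at a single point counts as overlap.
    A sweep in start order merges each clone into the current contig when it
    begins at or before that contig's rightmost end."""
    order = sorted(range(len(intervals)), key=lambda i: intervals[i])
    contigs = []
    for i in order:
        start, end = intervals[i]
        if contigs and start <= contigs[-1]["end"]:
            contigs[-1]["members"].append(i)
            contigs[-1]["end"] = max(contigs[-1]["end"], end)
        else:
            contigs.append({"start": start, "end": end, "members": [i]})
    return contigs
```

With clones `[(0, 10), (5, 20), (30, 40), (18, 25)]` this gives two contigs. The first spans 0-25 and contains clones 0, 1 and 3; the second is clone 2 alone.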

http://athos.rutgers.edu/~farach/

151.

22.5.2003

Olga Kalinina, A.B. Rakhmaninova (with M. Gelfand and A. Mironov)

Identification of positions that determine the functional specificity of proteins by comparative analysis of orthologous groups in protein families

Genome research opens new possibilities for studying protein specificity. We present a method for finding specificity-determining positions (SDPs). The input is a multiple alignment of a protein family, divided into groups with the same specificity. SDPs are assumed to be conserved within groups and to differ between groups of proteins with different specificities. The capabilities of the algorithm are demonstrated on the LacI family of bacterial transcription regulators. Results for the MIP family of transporters are also presented.
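One simple way to score "conserved within groups, different between groups" is the mutual information between the residue in an alignment column and the specificity group. The abstract does not give the authors' exact score. Practical methods also need corrections for small samples and phylogenetic bias, which this sketch omits. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_mutual_information(column, groups):
    """Mutual information (bits) between residues in one column and group labels."""
    n = len(column)
    joint = Counter(zip(column, groups))
    residues = Counter(column)
    labels = Counter(groups)
    mi = 0.0
    for (aa, g), count in joint.items():
        p_joint = count / n
        mi += p_joint * math.log2(p_joint / ((residues[aa] / n) * (labels[g] / n)))
    return mi


def rank_positions(alignment, groups):
    """Return (score, position) pairs, highest mutual information first."""
    scores = []
    for pos in range(len(alignment[0])):
        column = [seq[pos] for seq in alignment]
        scores.append((column_mutual_information(column, groups), pos))
    return sorted(scores, reverse=True)
```

In the toy alignment `["AKL", "AKM", "ARL", "ARM"]` with groups `["g1", "g1", "g2", "g2"]`, position 1 (K vs R) separates the groups perfectly and scores 1 bit. The fully conserved column and the group-independent column both score 0.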

152.

12.6.2003

Matthew S. Meselson

Department of Molecular and Cellular Biology, Harvard University, USA

The evolutionary genetics of ancient asexuality in Bdelloid rotifers

See the Meselson Laboratory home page at
http://mcb.harvard.edu/meselson/
and the abstracts at
http://www.mcb.harvard.edu/Faculty/Meselson.html
http://mcb.harvard.edu/meselson/research.html
http://hermes.mbl.edu/labs/JBPC/Pages/meselsondesc.html

153.

19.6.2003

G. M. Crippen

College of Pharmacy, University of Michigan, Ann Arbor, USA

Building a conformation space for proteins and furnishing it with a potential function

We often refer vaguely to the conformation space of proteins and to a potential energy surface defined over it, but choosing a mathematically solid definition of this idea is not trivial. We have recently generalized the discrete Haar wavelet transform to polypeptide chains of arbitrary length. This transform can be used to define a standard position for a single chain in three-dimensional space, instead of the usual pairwise superposition of two conformers. Each conformer is thereby mapped to a point in an abstract multi-dimensional conformation space. The distance between points corresponds closely to the customary RMSD measure of conformational similarity. However, the high dimensionality of conformation space complicates the construction of empirical potential functions for discriminating between native and nonnative folds. We will show some examples of exciting pitfalls.
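The generalization to arbitrary chain length described above is not reproduced here. This sketch uses the textbook orthonormal Haar transform, which assumes a chain length that is a power of two. It illustrates why distances in the transformed space track RMSD: an orthonormal transform applied separately to the x, y and z coordinates preserves Euclidean distance. The RMSD here is computed without optimal superposition, and all function names are illustrative.

```python
import math


def haar(values):
    """Orthonormal discrete Haar transform; length must be a power of two here."""
    n = len(values)
    if n & (n - 1):
        raise ValueError("length must be a power of two in this sketch")
    out = list(values)
    details = []
    while len(out) > 1:
        pairs = range(0, len(out), 2)
        avg = [(out[i] + out[i + 1]) / math.sqrt(2) for i in pairs]
        diff = [(out[i] - out[i + 1]) / math.sqrt(2) for i in pairs]
        details = diff + details  # coarser detail coefficients go first
        out = avg
    return out + details


def transform_chain(coords):
    """Apply the Haar transform to the x, y and z coordinate series of a chain."""
    xs, ys, zs = zip(*coords)
    return list(zip(haar(xs), haar(ys), haar(zs)))


def rmsd(a, b):
    """Root-mean-square deviation of two point lists, without superposition."""
    total = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(total / len(a))
```

For any two four-residue chains `A` and `B`, `rmsd(transform_chain(A), transform_chain(B))` equals `rmsd(A, B)` up to rounding. Each conformer can therefore be treated as a single point whose distances agree with RMSD.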

154.

10.7.2003

Dmitry G. Vassylyev

Institute of Physical and Chemical Research, RIKEN Harima Institute (Harima, Japan)

D.G. Vassylyev

Structural studies of single- and multisubunit RNA polymerases: similarities and differences

The single-subunit bacteriophage T7 RNA polymerase carries out the transcription cycle in an identical manner to that of bacterial and eukaryotic multisubunit enzymes. Here we report the crystal structure of a T7 RNA polymerase elongation complex, which shows that incorporation of an 8-base-pair RNA-DNA hybrid into the active site of the enzyme induces a marked rearrangement of the amino-terminal domain. This rearrangement involves alternative folding of about 130 residues and a marked reorientation (about 130 degrees rotation) of a stable core subdomain, resulting in a structure that provides elements required for stable transcription elongation. A wide opening on the enzyme surface that is probably an RNA exit pathway is formed, and the RNA-DNA hybrid is completely buried in a newly formed, deep protein cavity. Binding of 10 base pairs of downstream DNA is stabilized mostly by long-distance electrostatic interactions. The structure implies plausible mechanisms for the various phases of the transcription cycle, and reveals important structural similarities with the multisubunit RNA polymerases. [Tahirov TH, Temiakov D, Anikin M, Patlan V, McAllister WT, Vassylyev DG, Yokoyama S. (2002) Structure of a T7 RNA polymerase elongation complex at 2.9 Å resolution. Nature. 420(6911): 43-50].
In bacteria, the binding of a single protein, the initiation factor sigma, to a multi-subunit RNA polymerase core enzyme results in the formation of a holoenzyme, the active form of RNA polymerase essential for transcription initiation. Here we report the crystal structure of a bacterial RNA polymerase holoenzyme from Thermus thermophilus at 2.6 Å resolution. In the structure, two amino-terminal domains of the sigma subunit form a V-shaped structure near the opening of the upstream DNA-binding channel of the active site cleft. The carboxy-terminal domain of sigma is near the outlet of the RNA-exit channel, about 57 Å from the N-terminal domains. The extended linker domain forms a hairpin protruding into the active site cleft, then stretching through the RNA-exit channel to connect the N- and C-terminal domains. The holoenzyme structure provides insight into the structural organization of transcription intermediate complexes and into the mechanism of transcription initiation. [Vassylyev DG, Sekine S, Laptenko O, Lee J, Vassylyeva MN, Borukhov S, Yokoyama S. (2002) Crystal structure of a bacterial RNA polymerase holoenzyme at 2.6 Å resolution. Nature. 417(6890): 712-719].

155.

25.8.2003

Dmitry Pervouchine

Center for BioDynamics, Boston University (USA)

Dmitry Pervouchine

Structure prediction for RNA-RNA complexes

Considerable progress has recently been made in studying riboregulators, a class of non-coding RNAs that regulate gene expression at the post-transcriptional level. Riboregulators are small RNA molecules that hybridize with mRNA through base complementarity and thereby modulate translation of the encoded protein. Here we present a generic approach to predicting the structure of RNA-RNA complexes. The structure is obtained by free energy minimization using standard thermodynamic parameters for RNA folding. Essentially, the method can be regarded as a product of one sequence alignment and two MFOLD-type secondary structure prediction algorithms, implemented as four-dimensional dynamic programming. Unlike other secondary structure prediction models, our method partially accounts for geometrical constraints on the RNA backbone. We believe this is the first attempt to take a step towards three-dimensional RNA structure prediction. We successfully predicted the structures of several known RNA-RNA complexes, including the oxyS-fhlA, dsrA-rpoS and U6atac-U4atac systems, and found a number of functional suboptimal foldings. The method can be used to identify antisense regulatory systems in sequenced organisms and to design artificial riboregulators such as antisense drugs.
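The "sequence alignment" factor of the method can be illustrated in isolation. The simplified sketch below maximizes the number of non-crossing, antiparallel base pairs between two RNAs, using Watson-Crick and G-U pairs. Unlike the method described above, it ignores intramolecular structure, free energies, stacking and backbone geometry. It is only the alignment-like component, not the four-dimensional algorithm itself.

```python
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def max_intermolecular_pairs(a, b):
    """Maximum number of non-crossing base pairs between RNA a and RNA b (both 5'->3').

    Pairing is antiparallel, so a is aligned against b read 3'->5'; the recursion is
    the same as for a longest common subsequence, with 'complementary' replacing
    'equal'."""
    rb = b[::-1]
    n, m = len(a), len(rb)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = max(dp[i - 1][j], dp[i][j - 1])  # leave a[i-1] or rb[j-1] unpaired
            if (a[i - 1], rb[j - 1]) in PAIRS:
                best = max(best, dp[i - 1][j - 1] + 1)  # pair them
            dp[i][j] = best
    return dp[n][m]
```

Replacing the pair count with stacking energies, and adding an MFOLD-style recursion for each strand, would move this toward the full model. The full model has a much higher cost.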

156.

29.8.2003

Georgii Bazykin

Dept. of Ecology and Evolutionary Biology, Princeton University

Positive selection at sites of multiple amino acid replacements since mouse-rat divergence

Egor Bazykin

Department of Ecology and Evolutionary Biology, Princeton University

Positive selection at sites with multiple amino acid replacements since the divergence of mouse and rat

Some codons have undergone two or three nonsynonymous nucleotide substitutions since the mouse-rat divergence. In these codons, both (or all three) substitutions occurred in the same lineage (mouse or rat) significantly more often than expected. This tendency implies that most double and triple amino acid replacements in these lineages were driven by positive selection.
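"More often than expected" can be made concrete for double substitutions. Suppose each substitution independently falls on the mouse branch with probability p. Both then fall on the same branch with probability q = p² + (1 - p)². A one-sided binomial test checks for an excess of same-lineage doubles. Using branch-length shares for p, and all the counts, are assumptions for illustration; the abstract does not describe the actual test.

```python
from math import comb


def same_lineage_test(n_same, n_total, p_mouse):
    """One-sided binomial test for an excess of same-lineage double substitutions.

    Under independent placement, q = p^2 + (1 - p)^2 is the chance that both
    substitutions land on the same branch; the p-value is P(X >= n_same) for
    X ~ Binomial(n_total, q)."""
    q = p_mouse ** 2 + (1 - p_mouse) ** 2
    p_value = sum(
        comb(n_total, k) * q ** k * (1 - q) ** (n_total - k)
        for k in range(n_same, n_total + 1)
    )
    return q, p_value
```

With equal branches (p = 0.5), q = 0.5. If all of 10 hypothetical double-substitution codons had both changes in one lineage, the p-value would be 0.5^10 ≈ 0.001.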

157.

29.8.2003

Leonid Mirny

MIT

Complexes and modules in the network of protein interactions

Proteins, nucleic acids, and small molecules form a dense network of molecular interactions in a cell. Molecules are nodes of this network and the interactions between them are edges. The architecture of molecular networks can reveal important principles of cellular organization and function, much as protein structure tells us about the function and organization of a protein. Computational analysis of molecular networks has been primarily concerned with node degree [Wagner, A. & Fell, D. A. (2001) Proc R Soc Lond B Biol Sci 268, 1803-10; Jeong, H. et al. (2000) Nature 407, 651-4] or degree correlation [Maslov, S. & Sneppen, K. (2002) Science 296, 910-3], and hence has focused on single- and two-body properties of these networks. Here, by analyzing the multi-body structure of the network of protein-protein interactions, we discovered molecular modules that are densely connected within themselves but sparsely connected with the rest of the network. Comparison with experimental data and functional annotation of genes showed two types of modules: (1) protein complexes (splicing machinery, transcription factors, etc.), and (2) dynamic functional units (signaling cascades, cell-cycle regulation, etc.). The discovered modules are highly statistically significant, as shown by comparison with random graphs, and are robust to noise in the data.
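The module criterion above is "dense inside, sparse to the rest". It can be checked for a given node set by counting internal and boundary edges. The internal density is then compared with the density of the whole graph, which is the expected edge probability in a random graph with the same number of nodes and edges. This sketch only evaluates a candidate module and does not search for modules. The authors' search procedure is not described here, and the function names are hypothetical.

```python
def module_stats(edges, module):
    """Internal edge count, boundary edge count and internal density of a node set."""
    module = set(module)
    internal = boundary = 0
    for u, v in edges:
        inside = (u in module) + (v in module)
        if inside == 2:
            internal += 1
        elif inside == 1:
            boundary += 1
    k = len(module)
    density = internal / (k * (k - 1) / 2) if k > 1 else 0.0
    return internal, boundary, density


def global_density(edges):
    """Fraction of possible node pairs that are connected (random-graph baseline)."""
    nodes = {x for edge in edges for x in edge}
    n = len(nodes)
    return len(edges) / (n * (n - 1) / 2)
```

Consider two triangles joined by one edge. Each triangle has internal density 1.0 and only one boundary edge, while the whole graph has density 7/15.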

158.

25.9.2003

Mikhail Abramovich Roytberg

Institute of Mathematical Problems of Biology (IMPB)

Alignment of biological sequences: new algorithms

An overview will be given of the author's recent work on comparative analysis of the primary structures of biopolymers. This includes genome comparison (the OWEN program) and alignment of protein amino acid sequences using secondary structure data (the StrSW program). Both possible problem statements and algorithms will be discussed.

159.

6.11.2003

Michael Brudno

Dept. of Computer Science, Stanford University

Alignment of Whole Genomes: Techniques and Algorithms

To compare entire genomes from different species, biologists increasingly need alignment methods that are efficient enough to handle long sequences, and accurate enough to correctly align the conserved biological features between distant species. The forthcoming availability of the rat genome in conjunction with the currently available mouse and human genomes necessitates the development of tools for multiple alignment of large genomes. While several tools for multiple alignment of individual genomic sequences have recently become available, applying these tools in the context of aligning whole genomes introduces many novel challenges. In this talk I will describe a method for multiple alignment of several long sequences, a novel approach to aligning two genomic sequences in the presence of rearrangements, and also present a multiple alignment of the whole human, mouse and rat genomes.
The two main classes of pairwise alignments are global alignment, where one string is transformed into the other, and local alignment, where all places of similarity between the two strings are returned. Global alignments are less prone to detecting false homology as each letter of one sequence is constrained to being aligned to only one letter of the other. On the other hand, local alignments can cope better with sequence rearrangements such as inversions, transpositions, and duplications; this, however, comes at the expense of a higher false positive rate, and a more fragmented global map.
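The difference between the two classes can be seen on a pair of sequences in which two blocks are transposed. A minimal sketch computes the global (Needleman-Wunsch) and local (Smith-Waterman) score with one dynamic-programming routine and a linear gap cost. The scoring values are arbitrary assumptions, and this is not the CHAOS, LAGAN or Shuffle-LAGAN code.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Needleman-Wunsch (global) or Smith-Waterman (local) score, linear gap cost."""
    n, m = len(a), len(b)
    prev = [0 if local else j * gap for j in range(m + 1)]
    best = 0
    for i in range(1, n + 1):
        cur = [0 if local else i * gap] + [0] * m
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            if local:
                cell = max(cell, 0)  # a local alignment may start anywhere
                best = max(best, cell)
            cur[j] = cell
        prev = cur
    return best if local else prev[m]
```

For `"AAAACCCC"` vs `"CCCCAAAA"`, the local score is 4 because one conserved block is found. The global score is negative because a single collinear path cannot match both transposed blocks. Shuffle-LAGAN's glocal approach addresses exactly this case.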
I present Shuffle-LAGAN, a glocal alignment algorithm that is based on the CHAOS local alignment algorithm and the LAGAN global alignment algorithm, and is able to align long genomic sequences with rearrangements. To test Shuffle-LAGAN we aligned the entire human and mouse genomes using the Berkeley Genome Pipeline. We demonstrate that Shuffle-LAGAN has higher sensitivity and specificity than the leading local and global aligners.
We have designed and implemented a pipeline to align multiple whole genomes by combining the pairwise pipeline approach of the Berkeley Genome Pipeline and the MLAGAN multiple alignment algorithm. Our progressive alignment-based approach consists of two general steps: (1) identifying potential mouse-rat homology, and (2) searching the human genome for similarity to either the mouse-rat homologue pairs or the remaining unaligned mouse and rat sequences. This scheme easily generalizes to multiple alignment of any number of genomes with arbitrary phylogenetic relationship.
I present a whole-genome multiple alignment of human, mouse, and rat that was obtained using our pipeline. Using this alignment I discuss the sequence conservation between the three species in the different classes of biological features, and also present novel methods for visualization of multiple sequence alignments.

160.

2.12.2003

Pavel Pevzner

University of California at San Diego

Transforming Men into Mice: Lessons from Human and Mouse Genomic Sequences

Despite some differences in appearance and habits, men and mice are genetically very similar. In a pioneering paper, Nadeau and Taylor (1984) estimated that surprisingly few genomic rearrangements (about 200) have occurred since the divergence of human and mouse 75 million years ago.
The genomic sequences of human and mouse provide evidence for a larger number of rearrangements than previously thought and shed some light on previously unknown features of mammalian evolution. In particular, they provide evidence for extensive re-use of breakpoints from the same relatively short regions, and they reveal great variability in the rate of micro-rearrangements along the genome. Our analysis also implies the existence of a large number of very short "hidden" synteny blocks. These blocks were invisible in comparative mapping data and were ignored in previous studies of chromosome evolution. These results suggest a new model of chromosome evolution in which breakpoints are chosen from relatively short fragile regions that have a much higher propensity for rearrangements than the rest of the genome.
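Rearrangement counts of this kind start from breakpoints between the orders of synteny blocks in two genomes. A minimal sketch writes one genome's block order as a signed permutation of the other's. It then counts breakpoints after framing the permutation with 0 and n+1. Because a reversal removes at most two breakpoints, half the breakpoint count is a lower bound on the number of reversals. The block numbering in the example is hypothetical.

```python
def count_breakpoints(perm):
    """Breakpoints of a signed permutation relative to the identity 1..n.

    The permutation is framed by 0 and n+1. A neighbouring pair (x, y) is an
    adjacency when y - x == 1, which also covers reversed runs such as -3, -2."""
    framed = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for x, y in zip(framed, framed[1:]) if y - x != 1)
```

For the order `[1, -3, -2, 4]`, obtained by reversing blocks 2-3, there are 2 breakpoints, so at least one reversal is needed.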
This is a joint work with Glenn Tesler.

161.

23.1.2004

Shamil Sunyaev

Harvard University Medical School

Approaches to multiorganismal comparative proteomics

Shamil Sunyaev

Harvard Medical School (Boston, USA)

Comparative proteomics of multiple organisms

Mass spectrometry is the most efficient tool for identifying proteins and protein complexes. Conventional methods of protein identification rely on the presence of the protein sequences in a sequence database. We developed a series of mass-spectrometry-driven homology search algorithms to extend proteomics to a large number of organisms whose genomes have not yet been sequenced. The algorithms were tested in computational simulations and in multiple real experimental tests. Because they can identify mispredicted proteins and alternative splicing isoforms, these algorithms are also useful for organisms with fully sequenced genomes. I will also present our new project on comparative analysis of protein-protein interaction graphs.
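The conventional, database-dependent step that the homology-search algorithms go beyond can be sketched as follows. A database protein is digested in silico with trypsin, which cleaves after K or R unless the next residue is P. The peptide masses (monoisotopic residue masses plus one water) are then matched to observed masses within a tolerance. The tolerance value and function names are assumptions, and the homology-tolerant matching from the talk is not shown.

```python
# Monoisotopic residue masses (Da); a peptide adds one water.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_digest(protein):
    """Cleave after K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def match_masses(observed, protein, tol=0.02):
    """Pairs (observed mass, peptide) agreeing within tol Da."""
    hits = []
    for pep in tryptic_digest(protein):
        mass = peptide_mass(pep)
        hits.extend((obs, pep) for obs in observed if abs(obs - mass) <= tol)
    return hits
```

A single amino acid difference in an unsequenced species shifts a peptide's mass, so this exact matching fails. Homology-tolerant search is designed to handle that case.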

162.

26.1.2004

Shamil Sunyaev

Harvard University Medical School

Computational analysis of the genome variation and divergence

Shamil Sunyaev

Harvard Medical School (Boston, USA)

Computational analysis of human genome variation

I will present several recent projects on the analysis of human population genetic variation and on comparative genome analysis. We analysed how human SNPs in protein-coding regions and in highly conserved non-coding regions affect molecular function and fitness. The results suggest new approaches to candidate gene association studies. Collaborative projects with medical geneticists on the inheritance of plasma cholesterol level, Alzheimer disease and IBD are ongoing. On the comparative genomics side, I will discuss evolutionary distance estimation based on insertions and deletions. I will also discuss using the upcoming chimpanzee sequence to detect natural selection.

164.

26.2.2004

S. Borinskaya

Vavilov Institute of General Genetics RAS

Myths, genes, peoples: reconstructing elements of prehistoric folklore

The current picture of human dispersal across the Earth and of the sociocultural evolution of human societies is based on data from anthropology, genetics, archaeology and comparative linguistics. These data mainly reconstruct elements of material culture. Learning what our ancestors told one another 10,000 years ago, before writing existed, seems almost impossible. However, genetic data help to restore the worldview of ancient people and the myths that have reached their descendants across millennia.

Comparing the mythological traditions of different peoples reveals similar elements. In many cases, however, it is impossible to establish whether a similarity is a chance coincidence. For example, myths about a rabbit in the moon occur among both the Chinese and the Aztecs. Is this coincidence accidental, or did both peoples inherit the myth from common ancestors?

Suppose such "chance" coincidences of myths occur predominantly among genetically related peoples. This points to inheritance of the myth from a common ancestral group. It distinguishes this situation from the independent emergence of similar myths in different regions, among peoples that share no genetic lineages. Comparison with genetic data makes it possible to estimate approximately the date and area in which the ancestral version of a myth formed, as well as the directions of its "dispersal".
The work is based on a comparison of mythological and genetic databases.

165.

11.3.2004

A.R. Rubinov

Algodign LLC

Comparison of graphs and molecular structures

A.R. Rubinov

Comparison of graphs and molecular structures

An overview will be given of the graph-theoretic problems that arise when working with molecular structures: graph embedding and decomposition, and the search for dense subgraphs. Methods for analyzing and comparing molecular structures and binding sites will also be described. These methods are based on matching the substructures from which the structures and sites are built.

166.

25.3.2004

Dmitry Rodionov with Alexey Vitreschak, Andrey Mironov and Mikhail Gelfand

Institute for Problems of Information Transmission RAS

Comparative genomics of amino acid biosynthesis in bacteria: a variety of regulatory systems

Comparative analysis of genes, operons and regulatory elements was applied to the metabolism of two amino acids synthesized from aspartate: lysine and methionine. We report identification of a lysine-specific RNA element, named the LYS element, in the regulatory regions of bacterial genes involved in lysine biosynthesis and transport. The lysine-responsive riboswitch is highly conserved at the sequence and structural levels, like the previously described riboswitches for three vitamins (riboflavin, thiamin and cobalamin). In contrast, regulation of methionine biosynthesis and transport genes in bacteria is rather labile. It involves two RNA-level regulatory systems and at least three DNA-level systems. In particular, methionine metabolism in Gram-positive bacteria is controlled by the S-box riboswitch and the T-box mechanism, both acting at the level of premature transcription termination. A candidate binding signal (MET-box) for a hypothetical methionine regulator was identified in Streptococcaceae. From the biochemical point of view, positional analysis of regulatory sites and genome context analysis allow us to identify new members of these amino acid regulons. They also allow us to reconstruct the corresponding metabolic pathways in bacterial genomes.

Dmitry Rodionov with A. Vitreschak, A. Mironov, M. Gelfand

Institute for Problems of Information Transmission RAS

Comparative genomics of amino acid biosynthesis in bacteria: a variety of regulatory systems

Using comparative analysis of genes, operons and regulatory elements, we studied the biosynthetic pathways of two amino acids, lysine and methionine. A new conserved RNA element was found in the regulatory regions of lysine biosynthesis and transport genes. Like the previously described vitamin-specific riboswitches, the lysine riboswitch is highly conserved in both sequence and secondary structure. In contrast, regulation of methionine biosynthesis and transport in bacteria is quite labile: at least two RNA-level and three DNA-level regulatory systems are known. In particular, methionine metabolism in Gram-positive bacteria is controlled by the S-box riboswitch and by the T-box mechanism of premature transcription termination. In the streptococcal group, a new signal, the MET-box, was found for a hypothetical methionine regulator. From the biochemical point of view, positional analysis of regulatory sites and genome context analysis allowed us to find new members of these amino acid regulons. We used these findings to reconstruct the corresponding metabolic pathways.

167.

8.4.2004

V.Yu. Makeev

NIIGenetika

Local nucleotide composition and genome structure

Local variations in nucleotide composition arise from the interplay of several factors that affect composition directly. They also arise from selection, both at the level of the genome itself and at the level of the encoded proteins. The factors stabilizing local nucleotide composition are quite strong. One sign of this is that amino acid frequencies in the encoded proteins depend on this composition. However, the nature of these factors and their relation to mutation rates and to selection at the DNA level remain unclear.
At the same time, this genome-level stabilization of the "background" composition makes it possible to "reveal" individual regions with a specific balance of variability and selection. For example, long exons have a uniform composition along their length, with markedly reduced spatial variability. In lower eukaryotes, some groups of genes stand out that may be co-regulated through chromatin rearrangement. Non-coding regions, in contrast, show sharp variations with a characteristic length of about 100 base pairs. This may reflect some new level in the organization and functioning of eukaryotic genomes.
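Local composition is usually profiled with a sliding window. The minimal sketch below computes GC content, one aspect of nucleotide composition, over every window of a given width. It updates the count incrementally as the window moves. The abstract does not say which statistic or window the author used. A width near 100 bp, matching the characteristic length mentioned above, is only a suggested choice.

```python
def gc_profile(sequence, window):
    """GC fraction of every window of the given width, in order of window start."""
    seq = sequence.upper()
    if window <= 0 or window > len(seq):
        return []
    gc = sum(1 for base in seq[:window] if base in "GC")
    profile = [gc / window]
    for i in range(window, len(seq)):
        # slide by one: add the entering base, drop the leaving one
        gc += (seq[i] in "GC") - (seq[i - window] in "GC")
        profile.append(gc / window)
    return profile
```

Sharp changes in `gc_profile(seq, 100)` along a non-coding region would show up as jumps between neighbouring values. A long exon would give a comparatively flat profile.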

168.

8.7.2004

King Jordan¹, Fyodor A. Kondrashov², Ivan A. Adzhubei³, Yuri I. Wolf¹, Eugene V. Koonin¹, Alexey S. Kondrashov¹, Shamil Sunyaev³

¹ National Center for Biotechnology Information, NIH, Bethesda

² Section of Evolution and Ecology, University of California at Davis

³ Division of Genetics, Department of Medicine, Brigham & Women's Hospital, Harvard Medical School, Boston

A universal trend of amino acid gain and loss in protein evolution

Amino acid compositions of proteins differ substantially between taxa and, thus, can evolve. For example, proteins from organisms with GC-rich (AT-rich) genomes contain more (fewer) amino acids encoded by GC-rich codons. However, no universal trends in ongoing changes of amino acid frequencies have been reported. We compared sets of orthologous proteins encoded by triplets of closely related genomes from 15 taxa representing all three domains of life (bacteria, archaea and eukaryotes). Five amino acids (Cys, Met, His, Ser and Phe) are being gained in at least 14 taxa, whereas four others (Pro, Ala, Glu and Gly) are consistently being lost. Data on human polymorphisms show that the same nine amino acids are also currently being gained (lost) in human proteins. All amino acids with declining frequencies are thought to be among the first incorporated into the genetic code. Conversely, all amino acids with increasing frequencies, except Ser, were probably recruited late. Thus, the expansion of initially underrepresented amino acids, which began over 3 billion years ago, continues to this day.
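The direction of amino acid changes in an ortholog triplet can be inferred by simple parsimony. The sketch below assumes an aligned triplet in which two closely related sequences (a, b) are compared and a third, outgroup sequence decides which state is ancestral. Columns with gaps, and columns where all three residues differ, are skipped. The function names and this exact counting rule are illustrative assumptions, not the authors' published procedure.

```python
from collections import Counter


def gains_and_losses(a: str, b: str, outgroup: str) -> tuple[Counter, Counter]:
    """Parsimony-inferred amino acid gains and losses on the a and b lineages."""
    if not (len(a) == len(b) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    gains, losses = Counter(), Counter()
    for x, y, o in zip(a, b, outgroup):
        if "-" in (x, y, o) or x == y:
            continue  # gapped or unchanged column
        if y == o:    # the a-lineage changed: ancestral y -> derived x
            gains[x] += 1
            losses[y] += 1
        elif x == o:  # the b-lineage changed: ancestral x -> derived y
            gains[y] += 1
            losses[x] += 1
        # all three differ: the direction is ambiguous, so the column is skipped
    return gains, losses


def net_flux(gains: Counter, losses: Counter) -> dict[str, int]:
    """Gains minus losses per amino acid; positive values mean accumulation."""
    return {aa: gains[aa] - losses[aa] for aa in set(gains) | set(losses)}
```

Summing `net_flux` over many ortholog triplets and taxa gives the kind of per-amino-acid trend the abstract reports.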

169.

7.10.2004

Alexei Kourakine

Buck Institute for Age Research, Novato, CA, USA

Self-Organization versus Watchmaker: stochasticity and determinism in molecular and cell biology

Progress in our understanding of living systems at the molecular and cellular levels depends critically on two things. One is technological advances, which expand the scope and improve the precision of experimental data. The other is the paradigm used to interpret those data. It is argued that the new experimental data produced by recent progress in fluorescent imaging and single-cell/single-molecule analytical methods are increasingly inconsistent with conventional “clockwork” interpretations of a living cell and its organization. Conventional views use determinism and linear causality as interpretational defaults. These defaults sharply contradict the experimental reality, which points to inherent stochasticity and non-linear behavior underlying the responses of individual molecules, cells and their populations. The same picture can be seen either as two faces or as a vase. In the same way, one set of experimental data viewed from two different paradigms gives rise to distinct perceptions of the same phenomena.
Examples from studies of single molecules, subcellular architecture and inducible gene expression illustrate two distinct perceptions of biosystems. The first is the traditional clockwork-like image. The second is a newly emerging view of biological systems as self-organizing fluxes, or ever-evolving, dynamic organizations of interacting components.

170.

10.11.2004

Vadim Arshavsky

Harvard Medical School and the Massachusetts Eye and Ear Infirmary

Regulation of signal amplification, duration and adaptation in phototransduction

Phototransduction is the process by which a photon of light captured by a molecule of visual pigment generates an electrical response in a photoreceptor cell. Vertebrate rod phototransduction is one of the best-studied G protein signaling pathways. In this pathway the photoreceptor-specific G protein, transducin, mediates between the visual pigment, rhodopsin, and the effector enzyme, cGMP phosphodiesterase. Specific questions I will address are aimed at understanding the mechanisms underlying three basic properties of photoreceptor signaling. The first is light sensitivity, including the striking ability of rods to be activated by single photons. The second is rapid photoresponse recovery, which ensures repetitive signaling of photoreceptors. The third is adaptation, which allows photoreceptors to be sensitive in dim light and yet not become "blind" on sunny days.

171.

25.11.2004

G.V. Lebedeva, O.V. Demin

Belozersky Institute of Physico-Chemical Biology, Moscow State University

Building a kinetic model and studying regulatory effects in complex metabolic systems (the case of purine metabolism in E. coli cells)

A kinetic model of a biochemical system is a system of ordinary differential equations that describes the system's dynamic and regulatory properties.
The talk will present the basic principles of building kinetic models. Several examples will show how in vitro and in vivo experimental data from the literature can be used to build, analyze and verify a model. These data cover the kinetics of both individual enzymes and entire metabolic pathways. A kinetic model of purine metabolism in E. coli will show why both the metabolic and the genetic levels of regulation must be taken into account.
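Here is a toy version of “a system of ordinary differential equations”. It models a hypothetical two-step pathway: a supply reaction produces X, X is converted to P by a Michaelis-Menten enzyme, and P is consumed by a first-order reaction. P also feeds back to inhibit the supply. The system is integrated with a fixed-step fourth-order Runge-Kutta scheme. The pathway, rate laws and parameter values are illustrative assumptions, not the purine model from the talk.

```python
# Hypothetical pathway: (supply) -> X -> P -> (consumption); P inhibits the supply step.
PARAMS = {"V1": 1.0, "Ki": 1.0, "n": 2.0, "V2": 2.0, "K2": 0.5, "k3": 0.5}


def rates(x: float, p: float, params: dict) -> tuple[float, float, float]:
    """Rates of the three reactions for metabolite concentrations x and p."""
    v1 = params["V1"] / (1.0 + (p / params["Ki"]) ** params["n"])  # feedback-inhibited supply
    v2 = params["V2"] * x / (params["K2"] + x)                      # Michaelis-Menten step X -> P
    v3 = params["k3"] * p                                           # first-order consumption of P
    return v1, v2, v3


def derivatives(state: tuple[float, float], params: dict) -> tuple[float, float]:
    """Right-hand side of the ODE system: dX/dt = v1 - v2, dP/dt = v2 - v3."""
    x, p = state
    v1, v2, v3 = rates(x, p, params)
    return v1 - v2, v2 - v3


def integrate_rk4(state, params, t_end: float, dt: float = 0.01):
    """Integrate with the classical fixed-step fourth-order Runge-Kutta method."""
    for _ in range(int(round(t_end / dt))):
        s1 = derivatives(state, params)
        s2 = derivatives(tuple(y + 0.5 * dt * k for y, k in zip(state, s1)), params)
        s3 = derivatives(tuple(y + 0.5 * dt * k for y, k in zip(state, s2)), params)
        s4 = derivatives(tuple(y + dt * k for y, k in zip(state, s3)), params)
        state = tuple(
            y + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for y, a, b, c, d in zip(state, s1, s2, s3, s4)
        )
    return state


x_ss, p_ss = integrate_rk4((0.0, 0.0), PARAMS, t_end=100.0)
# With these parameters the steady state is X = 1/6, P = 1, where v1 = v2 = v3 = 0.5.
```

A genetic level of regulation could be added in the same framework. One way would be to turn enzyme amounts (here folded into V1 and V2) into state variables with their own synthesis and degradation equations.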

173.

23.12.2004

Konstantin Severinov

Rutgers University

Phageomics: genomic, biochemical and modeling studies of bacteriophage development

It has been estimated that bacteriophages are the most abundant and diverse life form on our planet. Recent advances in genome sequencing have led to explosive growth in the number of new bacteriophage genomes. However, bioinformatic analysis alone has proven insufficient for understanding the gene expression strategies used by novel, highly diverse bacteriophages. Using the example of bacteriophage Xp10, which infects Xanthomonas oryzae, we will present the results of a concerted effort to understand how new bacteriophages infect their hosts. The effort combines bioinformatics, biochemistry, global transcription profiling and kinetic modeling.

174.

10.02.2005

Vsevolod Makeev

GosNIIGenetika

Clusters of transcription factor binding sites in Drosophila CRMs

In eukaryotic regulatory modules, binding sites for transcription factors form homotypic and heterotypic clusters, which are needed for combinatorial regulation. Experiments show that one function of such clustering is to shape a functional response curve to factor concentration. Using a collection of annotated Drosophila enhancers, we observed the following:

1. Approximately 1/3 of regulatory proteins show clustering of cognate binding sites.

2. Binding sites in clusters are arranged in a non-trivial way.

3. The arrangement and presence of sites in clusters across different Drosophila species are non-trivial, and clusters do not always coincide with conserved regions.
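The sketch below flags homotypic clusters as windows that contain at least a minimum number of sites. It assumes the binding sites have already been predicted and are given as positions along an enhancer. The window length and site threshold are arbitrary placeholders, not values from the talk, and the function names are hypothetical.

```python
def merge_overlapping(spans: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping (start, end) spans."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def homotypic_clusters(positions: list[int], window: int, min_sites: int) -> list[tuple[int, int]]:
    """Spans in which at least `min_sites` sites lie within `window` bp of each other."""
    pos = sorted(positions)
    spans = []
    left = 0
    for right in range(len(pos)):
        # Shrink the window from the left until it spans at most `window` bp.
        while pos[right] - pos[left] > window:
            left += 1
        if right - left + 1 >= min_sites:
            spans.append((pos[left], pos[right]))
    return merge_overlapping(spans)
```

Comparing such spans between species, or with conserved blocks, is one simple way to check observations like item 3.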

175.

24.02.2005

Eugene V. Koonin

National Center for Biotechnology Information, Bethesda, USA

Social status of a gene in the genomic community

In addition to multiple complete genome sequences, genome-wide data on biological properties of genes are rapidly accumulating. These include knockout effect, expression level, protein-protein interactions and others. Many significant correlations between these variables have been detected. For example, the tendency of a gene to be lost during evolution correlates positively with its sequence evolution rate. Each of these measures of evolutionary variability correlates negatively with expression level and with the phenotypic effect of gene knockout. However, most of these correlations are relatively weak and explain a small fraction of the variation in the data. We propose that a single composite variable, the gene's "social status in the genomic community", better describes the relationships between the phenotypic (“input”) and evolutionary (“output”) variables. "High-status" genes are involved in housekeeping processes. They are more likely to be expressed at higher levels and more broadly, to have more interaction partners, and to produce lethal or severely impaired knockout mutants. These genes also tend to evolve more slowly and are less prone to gene loss across various taxa. "Low-status" genes tend to be weakly expressed, have fewer interaction partners, and show a narrower (and less coherent) phyletic distribution. On average, these genes evolve faster and are lost more often during evolution than high-status genes. The “gene status” notion may serve as a generator of null hypotheses about the connections between phenotypic and evolutionary parameters of genes. Any deviation from the expected pattern calls for attention to the quality of the data, the nature of the analyzed relationship, or both. This is joint work with Yuri I. Wolf and Liran Carmel.
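One way to build a single composite variable from several correlated gene-level measures is principal component analysis. The sketch below standardizes each measure and projects genes onto the leading principal component. The abstract does not say exactly how the status variable is computed, so this construction is an assumption. The power iteration keeps the example self-contained; a real analysis would normally use a linear algebra library.

```python
import math


def standardize(column: list[float]) -> list[float]:
    """Z-scores of one measure across genes (assumes the measure is not constant)."""
    mean = sum(column) / len(column)
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return [(v - mean) / sd for v in column]


def first_principal_component(rows: list[list[float]], iterations: int = 200) -> list[float]:
    """Leading eigenvector of the correlation matrix, found by power iteration."""
    n, m = len(rows), len(rows[0])
    corr = [[sum(r[i] * r[j] for r in rows) / n for j in range(m)] for i in range(m)]
    vec = [1.0] * m
    for _ in range(iterations):
        new = [sum(corr[i][j] * vec[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(v * v for v in new))
        vec = [v / norm for v in new]
    # An eigenvector's sign is arbitrary; orient it so the first measure loads positively.
    return vec if vec[0] >= 0 else [-v for v in vec]


def status_scores(measures: dict[str, list[float]]) -> tuple[dict[str, float], list[float]]:
    """Loading of each measure and a composite score for each gene."""
    names = list(measures)
    columns = [standardize(measures[name]) for name in names]
    rows = [list(gene) for gene in zip(*columns)]
    loading = first_principal_component(rows)
    scores = [sum(w * z for w, z in zip(loading, row)) for row in rows]
    return dict(zip(names, loading)), scores
```

Suppose expression and interaction counts load with one sign and evolution rate with the opposite sign. Then the score behaves like the "status" axis described above: high for highly expressed, well-connected, slowly evolving genes.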

176.

25.02.2005

Michael Galperin

National Center for Biotechnology Information, Bethesda, USA

Complete microbial genomes: A long way to complete understanding

Полные бактериальные геномы: долгий путь до полного понимания

177.

10.03.2005

Maxim Frank-Kamenetskii

Center for Advanced Biotechnology, Boston University

Torturing DNA with and without PNA

Peptide nucleic acid (PNA) can open up the DNA double helix sequence-specifically at designated sites. This makes it possible to develop various new techniques for manipulating duplex DNA, such as DNA capture, detection of ligand-binding sites, assembly of topological links and introduction of sequence-specific nicks. A new class of PNAs, pseudocomplementary PNAs, can very effectively deny proteins their binding sites on DNA in a sequence-specific way, and can bend DNA at any site.
Recently introduced nicking enzymes have allowed us to prepare unique samples of nicked and gapped DNA fragments with all possible base pairs flanking the nick. Such DNA exists in equilibrium between straight and kinked conformations. The equilibrium is governed by the stacking interaction between the base pairs flanking the nick. All 10 stacking parameters have been determined from the gel mobility of nicked DNA. As a result, the separate sequence-specific contributions of base stacking and base pairing to the stability of the DNA double helix have been determined for the first time. The results show that the double helix owes its thermal stability mainly to stacking interactions.
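There are exactly 10 stacking parameters for a simple combinatorial reason. A stack is formed by two adjacent base pairs, which can be written as a dinucleotide step 5'-XY-3' on one strand. Of the 16 possible steps, a step and its reverse complement (for example 5'-AG-3' and 5'-CT-3') describe the same stack read from the other strand. Four steps (AT, TA, CG, GC) are their own reverse complements. That leaves 4 + 12/2 = 10 distinct stacks. The short script below simply enumerates them.

```python
from itertools import product

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def unique_stacks() -> list[str]:
    """One representative dinucleotide step per reverse-complement class."""
    classes = set()
    for x, y in product("ACGT", repeat=2):
        step = x + y
        # Use the lexicographically smaller of the step and its reverse complement.
        classes.add(min(step, reverse_complement(step)))
    return sorted(classes)
```

`unique_stacks()` returns the 10 representatives AA, AC, AG, AT, CA, CC, CG, GA, GC and TA.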

178.

31.03.2005

V.S. Fridman

Laboratory of Ecology and Nature Conservation, Faculty of Biology, Moscow State University

Signal systems and mechanisms of animal communication: current views and problems of analysis

The communication systems of birds and other vertebrates have been studied in great detail with respect to the effects and social consequences of using displays, which are potential signals. Yet it is still reasonably doubted whether the behavioral units identified by ethologists have signal properties at all, and whether they serve any communicative function. What is the reason for such a paradoxical situation?
In all vertebrate species the signal system is mixed in nature: one and the same presentation of a signal combines the properties of a sign and of a stimulus. One and the same communication process can run in one of several alternative modes. Each mode uses signals in its own way and has different consequences at the population level. Which mode applies depends on the temporal pattern of interactions, which is set and regulated by the community, not by individuals. In lower vertebrates, situation-dependent switching between modes is practically the only way to regulate the meaning of messages. In higher groups it is supplemented and gradually displaced by another mechanism: the meaning of the signals themselves becomes richer as the situation grows more complex, as the motivational level rises, and, finally, as individual experience accumulates. It is this more specialized type of control that dominates the human signal system completely.
The signal systems of animals are non-specific and contradictory because every signal action (display) is simultaneously a sign and a stimulus. In humans, by contrast, the functions of “information” and “influence” are carried out within the same act of communication but by different carriers. Displays can work as signals and as stimuli only one after the other, not in parallel. As a result, animal communication is intermittent and pulsating, unlike the continuous communication of human sign systems. Its outcome is an optimal distribution of individuals in the population rather than an increasingly complex vocabulary, as in human language communities.
A reconstruction of the signal systems of different vertebrate groups, from fish to birds and mammals, that has withstood criticism will allow these systems to take their place in the comparative series of natural semiotic systems, from the genotype to human speech.
We propose a procedure for identifying potential signals in the behavioral stream. They are defined as structures that all participants in an interaction perceive unambiguously, regardless of the “noise” in the communication channel, which is caused mainly by the participants' own behavior. We have also developed a method for separating structures with signal value (displays) from expressive actions and expressive movements that reflect only the level of general arousal. We show that the stability and discreteness of displays in the behavioral stream depend on how well the performance of all elementary movements (the components of a posture or ritual) is correlated in time. They do not depend on the expressiveness (display-like character) of each elementary movement.
We show that it is precisely the correlated performance of motorically unrelated movements, elements of a signal that are meaningless by themselves, that measures how stable a given structure is. The structure has to stay stable under internal and external pressure aimed at suppressing and destabilizing it. We also show that animal signals have “double articulation”: their structures are formed by combining subunits that are meaningless by themselves (elementary motor acts).
A set of species-specific rules orders these combinations, analogous to the system of oppositions by which the phonemes of a language are identified. These rules allow different species of a genus to build species-specific signal repertoires from a similar set of elementary motor acts (EMAs). “Double articulation” makes animal signal systems comparable with human sign systems. The difference is that in animals meaningful units exist at several levels: a unit of meaning can be a single EMA, a display, or a ritual (an ordered sequence of displays).
Similarly, it is questioned whether populations have special communication systems that exercise control. It is claimed to be unproven that the system has mechanisms that regulate the behavior of individuals on a population rather than an evolutionary time scale, even mechanisms that are not signal-based.
Such influences are assumed to be exclusively the province of ultimate mechanisms associated with directional selection of individual behavioral traits, rather than stabilizing selection of an optimal group structure. The concept of a population is thereby effectively reduced to a sample of the inhabitants of a particular territory. At the same time, much indirect evidence has accumulated over the last decade that the population is real. It appears to be an object of a systemic nature that controls how individuals of a given species are distributed over a particular territory.
Only a correct reconstruction of species-specific signal systems can restore the ontological status of the concept of population. This requires separating signals that convey information from expressive actions that only raise general arousal, and reconstructing, with proper evidence, the mechanisms of animal communication in a community. Then the population can move from the category of “useful fictions” into the set of real systems studied by naturalists. In this way, biologists' population thinking will finally obtain an object that actually corresponds to its original conception.
Communication establishes horizontal links between individuals. Without such links, and without uncovering the specific mechanisms by which signals spread through a community, the population cannot be identified as an intermediary between the individual and selection. In that case, should we assume that populations really exist?
Hypotheses about the factors in the evolution of vertebrate sociality and signal systems on the “macroscale” of families, orders and classes depend on our view of how “real” communication networks and specific signals are in all the species involved. They also depend on whether the population is real as an object of a systemic nature controlled by information processes. Incidentally, any human community (scientific, educational, political or market) is an object isomorphic to a population. It too is structured by information flows and by competition based on how signal information is perceived (Nalimov, Mulchenko, 1969).

179.

21.04.2005

F.A.Kondrashov

Prediction of pathogenic mutations in mitochondrial tRNAs

Although tRNAs make up only 10% of the human mitochondrial genome, half of all known pathogenic mutations are found in them. Previous attempts to identify deleterious and pathogenic mutations in mitochondrial tRNAs a priori have been only moderately successful. The ability to identify pathogenic mutations computationally would presumably greatly benefit the diagnosis of mitochondrial disease. I will describe an ad hoc method, based on patterns of compensatory evolution, that identifies benign substitutions in mitochondrial tRNA stems with a negligible false-positive rate. The method cannot be applied to loops without extensive data on the 3D structures of these mitochondrial tRNAs. However, comparative analysis also gives reasonable prediction rates when appropriate species are used in the comparison. This analysis also shows that a substantial fraction of the mitochondrial tRNA polymorphisms segregating in the human population are deleterious.
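The idea of using compensatory evolution to call stem substitutions benign can be sketched with a simplified rule. This is an illustration only, not the exact published method. A substitution on one side of a stem base pair is called benign if two conditions hold. First, the resulting pair in the human molecule must still be Watson-Crick or G·U. Second, some other species' orthologous tRNA must already carry exactly that pair. The data layout and function name are hypothetical.

```python
CANONICAL_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def classify_stem_substitution(
    human_pair: tuple[str, str],
    side: int,
    mutant_base: str,
    other_species_pairs: list[tuple[str, str]],
) -> str:
    """Return 'benign' or 'unclassified' for a substitution at one side of a stem pair.

    human_pair: (5' base, 3' base) of the pair in the human tRNA.
    side: 0 if the 5' base is substituted, 1 if the 3' base is substituted.
    other_species_pairs: the same stem pair in orthologous tRNAs of other species.
    """
    new_pair = (mutant_base, human_pair[1]) if side == 0 else (human_pair[0], mutant_base)
    if new_pair not in CANONICAL_PAIRS:
        # Pairing is broken in the human molecule; a compensating change on the
        # other side would be needed, so no benign call is made.
        return "unclassified"
    if new_pair in set(other_species_pairs):
        # Another lineage already carries exactly this pair, so it is evolutionarily tolerated.
        return "benign"
    return "unclassified"
```

The rule is deliberately conservative: it never calls a substitution pathogenic. This matches the abstract's emphasis on identifying benign changes with few false positives.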

180.

28.04.2005

Leonid Mirny

Massachusetts Institute of Technology

How does a protein find its binding sites in DNA?

Recognition and binding of specific sites on DNA by proteins is central to many cellular functions, such as transcription, replication and recombination. During recognition, a protein rapidly searches for its specific site on a long DNA molecule and then binds this site strongly. We aim to find a mechanism that provides both a fast search (1-10 s) and high stability of the specific protein-DNA complex. Earlier studies suggested that rapid search involves sliding of the protein along the DNA. Here we treat sliding as one-dimensional (1D) diffusion in a rough, sequence-dependent energy landscape. We show that, despite this roughness, rapid search can be achieved if 1D sliding is combined with 3D diffusion. We estimate the range of specific and non-specific DNA-binding energies required for rapid search and suggest experiments that can test our mechanism. We show that optimal search requires a protein to spend half of its time sliding along the DNA and half diffusing in 3D. We also find that, paradoxically, realistic energy functions cannot provide both rapid search and strong binding for a rigid protein. To reconcile these two fundamental requirements, we propose a search-and-fold mechanism in which protein binding is coupled to partial protein folding. The proposed mechanism has several important biological implications, for example for search in the presence of other proteins and nucleosomes, and for simultaneous search by several proteins. It also provides a new framework for interpreting experimental, chromatin-IP and structural data on protein-DNA interactions.
References:
Slutsky M. and Mirny L.A. Kinetics of protein-DNA interaction: facilitated target location in sequence-dependent potential. Biophys J. 2004 87(6):4021-35.
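The “half of its time sliding” result follows from a simple scaling argument. It is sketched here under stated assumptions and is not quoted from the talk; the Slutsky and Mirny reference above gives the full treatment. Assume a DNA of length M (in bp) is searched in independent rounds. Each round consists of 1D sliding for a mean time τ_1D followed by 3D diffusion for a mean time τ_3D. During one round the protein scans n ≈ √(D_1 τ_1D) bp, where D_1 is the 1D diffusion coefficient. Prefactors are dropped.

```latex
n \simeq \sqrt{D_1 \tau_{1D}}, \qquad
t_s \simeq \frac{M}{n}\,(\tau_{1D} + \tau_{3D})
    = \frac{M\,(\tau_{1D} + \tau_{3D})}{\sqrt{D_1 \tau_{1D}}}

\frac{\partial t_s}{\partial \tau_{1D}}
    = \frac{M}{\sqrt{D_1}} \cdot \frac{\tau_{1D} - \tau_{3D}}{2\,\tau_{1D}^{3/2}} = 0
    \quad\Longrightarrow\quad \tau_{1D} = \tau_{3D}
```

The derivative is negative for τ_1D < τ_3D and positive for τ_1D > τ_3D, so this point is a minimum of the search time. The fastest search splits time equally between sliding and 3D diffusion.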

181.

12.05.2005

Vasily Ramensky

Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow

PolyPhen: метод вычислительного анализа несинонимичных SNPs человека.

PolyPhen: a tool for computational analysis of human non-synonymous SNPs.

Single-nucleotide polymorphisms (SNPs) are the basis of intraspecies variation. Finding SNPs that affect the phenotype is a priority task of medical genetics. Such SNPs are primarily variants in the coding regions of genes that lead to amino acid substitutions (non-synonymous SNPs). We have developed PolyPhen (http://www.bork.embl-heidelberg.de/PolyPhen), a method that predicts the structural and functional effects of such substitutions with high confidence. We carried out a computational analysis of all currently known human single-nucleotide polymorphisms that are stably maintained in the human population. The resulting predictions will help clarify the genetic factors behind predisposition to so-called complex, or multifactorial, human diseases. They will also help identify priority candidate polymorphisms for direct association studies in medical genetics.
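Here is a much simplified example of the kind of conservation signal that profile-based predictors draw on. It computes the log-ratio of the frequencies of the wild-type and mutant residues in one column of a multiple alignment of homologs, with a pseudocount. This is an assumption-laden sketch. It is not PolyPhen's actual profile score and it ignores PolyPhen's structural rules. The cut-off of 1.5 is a hypothetical placeholder.

```python
import math
from collections import Counter


def substitution_score(column: str, wild_type: str, mutant: str, pseudocount: float = 1.0) -> float:
    """log(f_wt / f_mut) in one alignment column; larger means the mutant residue is rarer there.

    `wild_type` and `mutant` are expected as upper-case one-letter codes.
    """
    residues = [r for r in column.upper() if r != "-"]   # ignore gaps
    counts = Counter(residues)
    total = len(residues) + 20 * pseudocount             # one pseudocount per amino acid
    f_wt = (counts[wild_type] + pseudocount) / total
    f_mut = (counts[mutant] + pseudocount) / total
    return math.log(f_wt / f_mut)


def call_effect(score: float, threshold: float = 1.5) -> str:
    """Hypothetical cut-off; a real predictor would calibrate it on known variants."""
    return "possibly damaging" if score > threshold else "likely benign"
```

For a column of ten leucines, L→P scores log(11) ≈ 2.4 and is flagged. For a column that mixes L and I equally, L→I scores 0.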

182.

26.05.2005

Andrey Fomenko

Institute of Biomedical Chemistry, Russian Academy of Medical Sciences, Moscow

Предсказание специфичности ферментов с использованием MNA дескрипторов

Prediction of enzyme specificity based on structural MNA descriptors

In bioinformatics, an amino acid sequence is traditionally represented as a character string. With this representation, the chemical structure of a protein is described only indirectly, as residue frequencies at alignment positions. We present a study of predicting enzyme specificity in which the amino acid sequence is described as a set of MNA descriptors. On a small set of proteins, we showed that MNA descriptors achieve rather high accuracy in this task, which is predicting a protein's position in the enzyme classification.
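The sketch below turns a sequence into a set of descriptors instead of a string. Each residue gets a hierarchical “neighborhood” label, built recursively from its own type and the sorted labels of its chain neighbors at the previous level. This follows the general recursive scheme of MNA (multilevel neighborhoods of atoms) descriptors. Two simplifications are assumptions of this sketch and not necessarily the representation used in the study: residues rather than atoms are the graph nodes, and only sequence neighbors are linked.

```python
def mna_descriptors(sequence: str, levels: int = 2) -> set[str]:
    """Set of level-`levels` neighborhood descriptors for a linear residue chain."""
    n = len(sequence)
    labels = list(sequence)  # level 0: residue type only
    for _ in range(levels):
        new_labels = []
        for i in range(n):
            neighbors = [labels[j] for j in (i - 1, i + 1) if 0 <= j < n]
            # Own type plus the sorted previous-level labels of the neighbors.
            new_labels.append(sequence[i] + "(" + "".join(sorted(neighbors)) + ")")
        labels = new_labels
    return set(labels)


def jaccard(a: set[str], b: set[str]) -> float:
    """Similarity of two descriptor sets (1.0 for identical sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Descriptor-set similarities like `jaccard` could then feed a nearest-neighbor or other classifier over enzyme classes. The study's actual prediction procedure is not described in the abstract.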

183.

20.06.2005

Pavel Pevzner

University of California, San Diego

Полногеномный анализ Alu-повторов раскрывает сложную эволюционную историю

Whole-genome analysis of Alu repeat elements reveals complex evolutionary history

Alu repeats are the most abundant family of repeats in the human genome, with over 1 million copies comprising 10% of the genome. They have been implicated in human genetic disease and in the enrichment of gene-rich segmental duplications in the human genome, and they form a rich fossil record of primate and human history. Alu repeat elements are believed to have arisen from the replication of a small number of source elements, whose evolution over time gives rise to the 31 Alu subfamilies currently reported in Repbase Update. We apply a novel method to identify and statistically validate 213 Alu subfamilies. We build an evolutionary tree of these subfamilies and conclude that the history of Alu evolution is more complex than previous studies had indicated.

This is joint work with Alkes Price and Eleazar Eskin.

184.

10.11.2005

Michael Brudno

University of Toronto

Реконструкция порядка генов в предковых геномах через потоки в сетях

Reconstructing Ancestral Genome Order via Network Flow

We present a novel alignment algorithm that reconstructs a likely ordering of the genome of the common ancestor of two organisms. Although the genome of this ancestor is not known, outgroup genomic sequences can be used to establish this order. For example, when reconstructing the ancestor of mouse and rat, human, dog and chicken are all outgroups. Under several assumptions, we formulate the problem of ordering a set of alignments by a likely ancestral order as a network flow problem, which can be solved efficiently with linear programming. The ancestral order of two genomes can be used, for example, for progressive alignment. Progressive alignment is hard to apply to whole genomes or other sequences with rearrangements because there is no general ordering of the alignments: they can be ordered by either of the two genomes, but not both. With the algorithm above, we order the alignments according to the most recent common ancestor and then align the resulting chain of alignments. This approach has several advantages over previous algorithms:

1) it does not assume a base genome, to which all other genomes are aligned, but creates a symmetric alignment equally valid for all genomes,

2) it penalizes various rearrangement events progressively based on an evolutionary tree, creating a set of alignments that mirrors the evolutionary history of the sequences, and

3) it is able to align short, low similarity syntenic areas based on their adjacency to higher similarity areas even when there has been a rearrangement event between the two areas.

We will present the results of some first analyses of whole genome alignment using this algorithm.

185.

24.11.2005

Валентина Боева (и Всеволод Ю. Макеев)

ФБиБи МГУ и ГосНИИГенетика

Вырожденные тандемные повторы в последовательностях ДНК: алгоритм поиска и его применение в геномах D. melanogaster and D. pseudoobscura.

Я собираюсь рассказать о достаточно новом алгоритме поиска вырожденных тандемных повторов в ДНК, который основан на оценивании статистической значимости множества найденных "кандидатных" повторов. Используя этот алгоритм, были получены распределения тандемных повторов в различных функциональных участках D. melanogaster and D. pseudoobscura; были найдены "предпочитаемые" периоды повторов в кодирующих и некодирующих областях, включая UTRs, гетерохроматин, межгенные и регуляторные участки.

Valentina Boeva (and Vsevolod Makeev)

Dept. of Bioengineering and Bioinformatics, MSU and GosNIIGenetika

Fuzzy tandem repeats in DNA sequences: identification algorithm and biological applications on genomes of D. melanogaster and D. pseudoobscura

I will talk about a new algorithm for identifying fuzzy tandem repeats, based on calculating their statistical significance (p-value). Using the algorithm, we observed some interesting features of how fuzzy tandem repeats are distributed across the genomes of D. melanogaster and D. pseudoobscura. We also looked at various functional regions: coding and non-coding regions, including UTRs and heterochromatic, intergenic and enhancer sequences.
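Here is a toy version of the significance idea for a single candidate region and period p. It counts the positions i where s[i] equals s[i+p]. It then computes the binomial probability of seeing at least that many matches in a random sequence with the same base composition, where the match probability is q = sum of squared base frequencies. Treating the comparisons as independent is an approximation. The published algorithm assesses a whole set of candidate repeats and may estimate significance differently. The function names are hypothetical.

```python
import math
from collections import Counter


def binomial_tail(n: int, k: int, q: float) -> float:
    """P(X >= k) for X ~ Binomial(n, q)."""
    return sum(math.comb(n, j) * q**j * (1 - q) ** (n - j) for j in range(k, n + 1))


def period_pvalue(region: str, period: int, background: str) -> tuple[int, int, float]:
    """Count s[i] == s[i + period] matches in `region`; return (matches, comparisons, p-value)."""
    region = region.upper()
    comparisons = len(region) - period
    if comparisons <= 0:
        raise ValueError("region must be longer than the period")
    freqs = Counter(background.upper())
    total = sum(freqs.values())
    q = sum((c / total) ** 2 for c in freqs.values())  # chance that two random bases match
    matches = sum(region[i] == region[i + period] for i in range(comparisons))
    return matches, comparisons, binomial_tail(comparisons, matches, q)
```

A perfect (CAG)7 repeat tested at period 3 against a uniform background gives 18 matches out of 18 comparisons and p = 0.25^18. A fuzzy repeat with some mismatches gives a larger but possibly still small p-value.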

186.

8.12.2005

A.A. Mironov

Department of Bioengineering and Bioinformatics, Moscow State University

Using rank statistics for natural threshold selection

“Yesterday I saw crayfish. Big ones. At five each. And today I saw crayfish at three each. But small ones. Choose, but carefully: either the big ones at five, but yesterday, or at three today, but small.” (M. Zhvanetsky)

This talk addresses that difficult dilemma. The problem of choosing a threshold arises in almost every bioinformatics task, and more or less arbitrary values are usually used. This work presents a new threshold selection method, which reduces to the following simple scheme. Suppose we have made n observations, and the task is to select the k significant ones among them. To do this, the observed values are sorted in descending order. For each i, we estimate the probability that, for random data, at least i values are greater than or equal to the value at position i. This probability can be regarded as p-value(i). Taking the minimum, p-value = min(p-value(i)), we determine the number of significant observations as k = argmin(p-value(i)). Several examples of using this technique in comparative genomics problems are given, and possible pitfalls and ways to deal with them are discussed.
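The threshold scheme described above can be sketched as follows. The sketch assumes that the observations are independent and that, for any observed value x, we can compute its null tail probability P(X ≥ x) under random data. Here that probability is supplied as a function; an empirical null from shuffled data would also work. Under these assumptions, the number of random observations at least as large as the i-th largest value is binomial, so p-value(i) is a binomial upper tail. The function names are hypothetical.

```python
import math
from typing import Callable


def binomial_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def rank_threshold(values: list[float], null_tail: Callable[[float], float]) -> tuple[int, float]:
    """Return (k, p-value): the k largest values are called significant.

    null_tail(x) must give P(X >= x) for a single observation under random data.
    """
    ranked = sorted(values, reverse=True)
    n = len(ranked)
    best_k, best_p = 0, 1.0
    for i, x in enumerate(ranked, start=1):
        # Probability that at least i of n random observations reach the i-th largest value.
        p_i = binomial_tail(n, i, null_tail(x))
        if p_i < best_p:  # strict '<' keeps the smallest k on ties
            best_k, best_p = i, p_i
    return best_k, best_p


values = [10, 9, 8, 0.5, 0.3, 0.2, 0.1, 0.05]
k, p = rank_threshold(values, lambda x: math.exp(-x))  # exponential null, for illustration
```

With an exponential null, the three large values stand out and the minimum p-value is reached at k = 3. One pitfall of this kind of scheme is that the minimum over i is itself optimistic. If the null is misspecified or the observations are correlated, the binomial model no longer holds.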