Moscow seminar on Bioinformatics 2009-2011

Brief abstracts of talks

240.

2.02.2009

Christoph Haag

University of Fribourg, Switzerland

Evolutionary genetics of Daphnia metapopulations

Population subdivision and extinction-recolonization dynamics can lead to increased genetic load and reduced genetic diversity. We study the evolutionary genetic effects of population subdivision and extinction-recolonization dynamics in metapopulations of Daphnia inhabiting small rock pools on the coast of the Baltic Sea in southern Finland. Results from this ongoing work suggest that local populations suffer from a high genetic load, mainly because colonization involves strong founder events. I will show experimental results for two important consequences of the high genetic load, suggesting that (i) hybrids, which are formed after immigration into local populations, enjoy a high fitness advantage, and (ii) the invasion of obligate asexual competitor strains is facilitated. Finally, I will show preliminary evidence that, despite the strong population structure, a large number of fitness-affecting genes segregate within populations, but most of these genes have only weak fitness effects. http://www.unifr.ch/biol/ecology/haag/haag_lab_home.html

241.

19.02.2009

K.Yu. Eskov

Paleontological Institute, Russian Academy of Sciences

"Transitional forms" and "missing links": too few of them or too many?

The problem of "transitional forms" between high-rank taxa has confronted the theory of evolution since its inception. At first it seemed that there were "too few" of them (compared with the theoretical Darwinian model); the paradox of the "rarity of transitional forms" was more or less successfully resolved within the theories of "punctuated equilibrium" and "coherent/incoherent evolution". Moreover, new discoveries in paleontology show that there are rather "more than needed" of these forms, in the sense that they confuse rather than clarify the picture of the macroevolution of large taxa familiar to us from textbooks...

242.

2.03.2009

Shamil Sunyaev

Brigham and Women's Hospital, Harvard Medical School

Human mutations: learning from resequencing data and functional genomics

Human mutation rate is highly heterogeneous. It exhibits context-specific and regional variation. Analysis of replication timing and chromatin accessibility vis-a-vis population resequencing data and comparative genomics reveals biological mechanisms underlying mutation rate variation. Surprisingly, rate and effects of mutations are not statistically independent. Understanding human mutation rate is of importance for interpreting individual human genomes. Of immediate practical interest is interpretation of sequence data in a genetics diagnostics lab. Population genetics modeling and computational prediction of the effect of mutations may assist genetic counseling.

243.

10.03.2009

Eugene V. Koonin

National Center for Biotechnology Information, NLM, NIH

Search for a tree of life in the midst of the phylogenetic forest

Prokaryotic genomics revealed the wide spread of horizontal gene transfer (HGT) among prokaryotes, a major development that is often considered to undermine the Tree of Life (TOL) concept. However, the possibility remains that the TOL could be salvaged as a statistical central trend in the phylogenetic “Forest of Life” (FOL). I present the results of a comprehensive comparative analysis of 6901 phylogenetic trees for prokaryotic genes including a group of 102 nearly universal trees (NUTs). A complete matrix of topological distances between the trees was constructed and analyzed using Principal Component Analysis, Classical Multidimensional Scaling, and a newly developed measure termed Inconsistency Score that reflects the average topological consistency of a given tree with the rest of the trees in the FOL. Although we detected high levels of inconsistency among the trees comprising the FOL, most likely owing to extensive HGT, a distinct signal of vertical inheritance that was particularly strong among the NUTs was found as well. Despite the substantial amount of apparent HGT seen in the NUTs, the transfer events seemed to be distributed randomly and did not obscure the vertical signal. The topology of the NUTs was highly similar to those of numerous other trees in the FOL, so although the NUTs cannot represent the FOL completely, they might reflect a significant central trend. However, the consistency between the trees in the FOL was seen mostly at shallow depths of the trees and abruptly dropped at greater phylogenetic depths that correspond to the radiation of archaeal and bacterial phyla. This observation suggests the possibility that the early phases of evolution were non-tree-like (a Biological Big Bang), so the search for a central trend in the FOL could be futile.
We addressed this problem by simulating evolution under the Compressed Cladogenesis model and the Biological Big Bang model, and found that the Compressed Cladogenesis scenario provides a better approximation of the observed dependence between tree inconsistency and phylogenetic depth.
To conclude, HGT is pervasive in the prokaryotic world, so that there are very few fully consistent phylogenetic trees. These findings make the original TOL concept obsolete. Nevertheless, the signal of vertical inheritance seems to be discernible throughout the evolution of archaea and bacteria although, under the Compressed Cladogenesis model, it seems unlikely that the relationships between the major archaeal and bacterial clades can be unequivocally resolved. However, the idea of a TOL as a central trend in the thicket of the forest of life appears viable and worth further investigation.
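As a rough illustration of the kind of analysis described in this abstract, the sketch below builds a matrix of topological distances for a toy set of rooted trees and ranks each tree by its mean distance to the rest. It is only a minimal sketch under stated assumptions: the nested-tuple tree encoding, the clade-based distance, and all function names are hypothetical, and the mean distance is a simple stand-in, not the Inconsistency Score used in the study.

```python
from itertools import combinations

def clades(tree):
    """Collect the non-trivial clades of a rooted tree given as nested tuples of leaf names."""
    found = set()

    def leaves(node):
        if isinstance(node, str):          # a leaf
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)                 # every internal node defines a clade
        return members

    root = leaves(tree)
    found.discard(root)                    # the root clade is shared by all trees on the same leaves
    return found

def clade_distance(t1, t2):
    """Fraction of clades present in exactly one of the two trees (0 = identical topology)."""
    a, b = clades(t1), clades(t2)
    total = len(a) + len(b)
    return len(a ^ b) / total if total else 0.0

def distance_matrix(trees):
    """Symmetric matrix of pairwise clade distances."""
    n = len(trees)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = clade_distance(trees[i], trees[j])
    return d

def mean_distance_to_rest(d):
    """Average distance from each tree to all others (assumes at least two trees)."""
    n = len(d)
    return [sum(row) / (n - 1) for row in d]

# Hypothetical toy "forest": two identical trees and one that disagrees with them.
toy_trees = [
    ((("A", "B"), "C"), "D"),
    ((("A", "B"), "C"), "D"),
    ((("A", "C"), "B"), "D"),   # disagrees on the innermost grouping
]
matrix = distance_matrix(toy_trees)
scores = mean_distance_to_rest(matrix)
```

In this toy forest the third tree receives the largest mean distance, which is the sense in which a tree can be called less consistent with the rest. A matrix like `matrix` is also the kind of input that ordination methods such as multidimensional scaling take.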

244.

22.04.2009

Ivan Kulakovskiy, Alexander Favorov, Vsevolod Makeev

GosNIIGenetika

Improved identification of transcription factor binding sites and recognition motifs using DNase footprinting data mapped to the genome

Footprinting data are an important source of information about the motifs recognized on DNA by specific transcription-regulating factors. Sometimes the footprinting result is shifted relative to the true binding site, which can be detected and corrected by examining the genomic fragments adjacent to the original footprint. We built a computational pipeline that includes construction of positional matrices and estimation of motif length and quality. To align the footprints we used the Gibbs sampler SeSiMCMC and the heuristic algorithm Bigfoot. Recognition motifs were built for 41 factors from the DNase footprinting database for D. melanogaster. In about half of the cases, footprints containing no sites were found. Quite often the "missing" site could be found by considering flanking sequences only 2 bp long on each side of the footprint. The refined motifs, built with the flanking regions taken into account, recognize sites in footprints with better specificity than existing models. We also paid attention to the possible problem of model "overfitting".
As a development of the idea of building refined models from footprinting data, we will discuss the possibility of integrating various sources of data on the sequences bound by a transcription factor. We will consider the fast motif discovery algorithm Chipmunk and its application to the direct integration of data from various sources. In addition, we will consider an algorithm for comparing binding motifs of the same factor obtained by different methods or from different source data.
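The idea of recovering a "missing" site by extending a footprint with short flanks can be sketched as follows. This is an illustrative sketch only, not the SeSiMCMC, Bigfoot, or Chipmunk algorithms: the toy sites, the toy genome, the pseudocount, the uniform background, and all function names are assumptions made for the example.

```python
import math

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds position weight matrix from equal-length aligned sites."""
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pwm

def best_hit(pwm, seq):
    """Return (score, offset) of the best-scoring window, or None if seq is too short."""
    width = len(pwm)
    hits = [
        (sum(pwm[i][seq[start + i]] for i in range(width)), start)
        for start in range(len(seq) - width + 1)
    ]
    return max(hits) if hits else None

def extend(genome, start, end, flank=2):
    """Footprint genome[start:end] widened by `flank` bp on each side."""
    return genome[max(0, start - flank):min(len(genome), end + flank)]

# Hypothetical aligned sites and genome fragment.
aligned_sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]
pwm = build_pwm(aligned_sites)

genome = "CCGTTGACTCAGGCA"
fp_start, fp_end = 6, 13   # footprint shifted 2 bp to the right of the true site
original_hit = best_hit(pwm, genome[fp_start:fp_end])
extended_hit = best_hit(pwm, extend(genome, fp_start, fp_end))
```

Here the shifted footprint contains only a truncated site and scores poorly, while the footprint extended by 2 bp on each side contains the full site `TGACTCA` at offset 0 and scores much higher.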

245.

30.04.2009

Sergei Nuzhdin

University of Southern California, Los Angeles, USA

Population genomics of local adaptations

A powerful way to map functional genomic variation, and reveal the genetic basis of local adaptation, is to associate allele frequency at polymorphisms across the genome with environmental conditions (Hancock et al. 2008; Turner et al. 2008). Serpentine soils, characterized by naturally high heavy metal content, low calcium-to-magnesium ratios, and low nutrient and moisture content, are a classic context for studying adaptation of plants to local soil conditions (Kruckeberg 1951; Kruckeberg 1984). To investigate whether Arabidopsis lyrata is locally adapted to serpentine soil, and to map polymorphisms responsible for such adaptation, we pooled DNA from individuals on serpentine and non-serpentine soils and sequenced each “gene pool” with the Illumina Genome Analyzer. This allowed the identification and characterization of 8.4 million genomic polymorphisms in two serpentine and two non-serpentine populations in close proximity. A large proportion of the most differentiated polymorphisms occur at loci involved in heavy metal detoxification, calcium and magnesium transport, and the osmotic stress response, providing numerous candidate mutations for serpentine adaptation. Sequencing of several candidate loci in the European subspecies of A. lyrata indicates parallel differentiation of the same polymorphism at one locus, confirming ecological adaptation, and different polymorphisms at other loci, possibly indicating convergent evolution.
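Ranking polymorphisms by how strongly allele frequencies differ between two sequenced pools might look like the sketch below. It is a simplified illustration, not the analysis used in the study: the read counts, site names, minimum depth, and the use of a plain absolute frequency difference are all assumptions.

```python
def rank_by_differentiation(pool_a, pool_b, min_depth=20):
    """Rank sites by absolute allele-frequency difference between two pools.

    pool_a, pool_b: dicts mapping site -> (alt_reads, total_reads).
    Sites with too little coverage in either pool are skipped.
    """
    ranked = []
    for site in pool_a.keys() & pool_b.keys():
        alt_a, depth_a = pool_a[site]
        alt_b, depth_b = pool_b[site]
        if min(depth_a, depth_b) >= min_depth:
            diff = abs(alt_a / depth_a - alt_b / depth_b)
            ranked.append((diff, site))
    return sorted(ranked, reverse=True)   # most differentiated first

# Hypothetical pooled read counts for three sites.
serpentine = {"chr1:100": (45, 50), "chr1:200": (10, 40), "chr2:300": (3, 10)}
non_serpentine = {"chr1:100": (5, 50), "chr1:200": (12, 40), "chr2:300": (9, 10)}

ranked = rank_by_differentiation(serpentine, non_serpentine)
```

With these toy counts `chr1:100` comes out as the most differentiated site, and `chr2:300` is dropped because its coverage is too low for a reliable frequency estimate.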

246.

18.06.2009

Wladek Minor

University of Virginia, Charlottesville, USA

Protein Crystallography with Speed and Finesse - Toward the Future of Structural Biology

The ultimate mission of structural genomics (SG) is the characterization of the entire protein universe. Experimental procedures are being developed to expedite and automate all stages of the experimental protein structure determination process (e.g. cloning, expression, crystallization, data collection, processing, phasing and deposition to the Protein Data Bank). The detailed analysis of structure-function relationships as applicable to drug discovery is still a major bottleneck from both the bioinformatics and the experimental points of view.
Over 90% of X-ray protein structures deposited in the Protein Data Bank contain ordered small molecules, such as enzyme substrates, cofactors or ions. These ligands can be divided into two groups: molecules that are relevant to protein function and non-physiological agents introduced during sample preparation (purification, crystallization or cryocooling). The analysis shows that the structural and chemical quality of small molecule models in protein structures does not correlate with structure resolution. In particular, the analysis of metal-protein interaction distances, coordination numbers, B-factors (displacement parameters), and occupancies of metal binding sites in protein structures determined by X-ray crystallography and deposited in the PDB shows many unusual values and unexpected correlations. An approach for fast identification of metal-binding structural motifs that cannot be identified on the basis of sequence similarity alone will be discussed, and the application of these analyses to drug discovery techniques will be presented.

247.

29.10.2009

Dmitry Rodionov

IITP RAS, Moscow

A genomic encyclopedia of sugar utilization pathways obtained by comparative analysis of bacterial genomes

To address the fundamental problem of reconstructing the sugar utilization machinery of any microorganism directly from its genome sequence, a new approach was developed, based on metabolic reconstruction and comparative analysis of subsystems. This approach was applied to a group of 19 different bacteria of the genus Shewanella with available genome sequences. The main steps of the approach are: identification of candidate genes based on amino acid sequence similarity to genes from a collection of known components of sugar utilization pathways; functional annotation of orthologs and prediction of alternative genes and pathway variants by analysis of their genomic and functional contexts (operons, regulons, subsystems); and verification of the bioinformatic predictions by biochemical, genetic, and physiological experiments.
The new genomic encyclopedia of sugar utilization pathways contains about 170 protein families (enzymes, transporters, regulators), unevenly distributed among the Shewanella genomes studied and forming 17 individual metabolic pathways. Comparison of the corresponding genes from Shewanella and other bacteria makes it possible to model their possible evolution and their influence on the adaptation of bacteria to different environmental conditions. More than a third of the genes found and more than two thirds of the reconstructed pathways are previously unknown variants, that is, either non-orthologous displacements of known genes or genes used in alternative biochemical pathways. Some of the bioinformatic predictions were confirmed experimentally in the course of the work. The proposed approach can be applied to any other group of bacteria with available genome sequences. The accumulated genomic annotations of all components of carbohydrate utilization pathways in complete microbial genomes will allow accurate recognition of the corresponding functions and metabolic variants in new metagenomic data.

248.

19.11.2009

A.S. Rautian

Paleontological Institute, Russian Academy of Sciences

On the nature of phenotype and heredity

The term "genotype" denotes both the content of hereditary information and its material carrier.
The content of the genotype is a consequence not so much of its properties as a material carrier as of the properties of the phenotype to which it is addressed.
The definiteness of the genotype's content depends not so much on the stability of its elements, the genes, as on the stability (definiteness) of the historically formed phenotype of the adaptive norm (the "wild type").
A genotype has a definite content only for a given, already existing phenotype.
The genotype, as the whole of hereditary information, is an aspect of the phenotype rather than a part of it, and in this sense it is not an independent entity.
Each element of the phenotype, including the genotype, is with respect to its other elements both a recipient and, at the same time, a carrier of hereditary information.
The genotype as a genetic code is a specialized, but far from the only, "organ" for storing and transmitting genetic information.
The biological meaning of the genetic code and of its structural and functional separation within the soma lies in creating a pool of hereditary information that cannot be destroyed in the course of ontogenesis.
The high universality of many semantic units of the genetic code is evidence of the deep phylogenetic unity of all organisms living today, not of an immanent connection between the carriers of hereditary information and their content.
Heredity in its broad original sense, as the ability of descendants to reproduce the properties of their ancestors in the course of individual development over successive generations, is an integral, indecomposable (more precisely, decomposable only for operational purposes) property of living things. There is no "substance of heredity" and there cannot be one, just as there is no and cannot be a "substance of information".
A more detailed text is attached.

249.

4.03.2010

Artem Novozhilov

On the origin and evolution of the standard genetic code

The genetic code is nearly universal, and the arrangement of the codons in the standard codon table is highly non-random. The fundamental question is how these regularities of the standard code came into being. In this talk I review the three main concepts of the origin and evolution of the genetic code: the stereochemical theory, the coevolution theory, and the error minimization theory. I present some recent results concerning the level of adaptation of the genetic code. I also discuss putative primordial genetic codes, which are shown to possess an exceptional level of robustness to mistranslation, and describe possible pathways of the code's evolutionary expansion.
1. Novozhilov, Wolf, Koonin: Evolution of the genetic code: partial optimization of a random code for robustness to translation error in a rugged fitness landscape. Biology Direct 2007, 2:24. doi:10.1186/1745-6150-2-24
2. Novozhilov, Koonin: Exceptional error minimization in putative primordial genetic codes. Biology Direct 2009, 4:44. doi:10.1186/1745-6150-4-44
3. Koonin, Novozhilov: Origin and evolution of the genetic code: the universal enigma. IUBMB Life 2009, 61(2):99-111
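The non-randomness of the standard codon table can be illustrated with a very crude robustness measure: the share of single-nucleotide substitutions between sense codons that leave the amino acid unchanged, compared with random codes. This is a hedged sketch, not the cost functions or the randomization schemes of the cited papers; here random codes are made simply by permuting the 64 assignments over the codons, and the function names are hypothetical.

```python
import random

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
STANDARD_CODE = dict(zip(CODONS, AMINO_ACIDS))

def neighbours(codon):
    """All nine codons that differ from `codon` by a single nucleotide."""
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                yield codon[:pos] + base + codon[pos + 1:]

def synonymous_fraction(code):
    """Share of single-nucleotide changes between sense codons that keep the amino acid."""
    same = total = 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for other in neighbours(codon):
            if code[other] == "*":
                continue
            total += 1
            same += code[other] == aa
    return same / total

def shuffled_code(rng):
    """Random code: the same 64 assignments permuted over the codons (an assumed null model)."""
    assignments = list(AMINO_ACIDS)
    rng.shuffle(assignments)
    return dict(zip(CODONS, assignments))

rng = random.Random(0)
standard = synonymous_fraction(STANDARD_CODE)
random_scores = [synonymous_fraction(shuffled_code(rng)) for _ in range(1000)]
better = sum(score >= standard for score in random_scores)
```

Under this null model essentially no random code matches the standard code, because the standard code groups synonymous codons into blocks. Studies of error minimization usually go further and weigh substitutions by the similarity of the amino acids involved, which this sketch does not attempt.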

250.

18.03.2010

Yulia Medvedeva

Intergenic, gene terminal, and intragenic CpG islands in the human genome

Recently, it has been discovered that the human genome contains a hundred times more transcription start sites (TSSs) than protein-coding genes. At least part of those TSSs are associated with non-coding RNAs of different functions. Regulatory regions related to the transcription of such RNAs are poorly studied. Some of these regulatory regions may be associated with CpG islands located far from the transcription start sites of any protein-coding gene. The human genome contains many such CpG islands; however, until now their properties have not been systematically studied.
We studied CpG islands located in different regions of the human genome using methods of bioinformatics and comparative genomics. We observed that CpG islands tend to overlap with exons, including exons located far from the TSS of the gene, but usually extend well into introns. The synonymous substitution rate of CpG-containing codons is substantially reduced in regions where CpG islands overlap with protein-coding exons, even if they are located far downstream from TSSs, which means that selection pressure at the nucleic acid level exists in CpG islands. CAGE tag analysis showed frequent transcription start sites in all CpG islands, including those found far from the transcription start sites of protein-coding genes. Computational prediction and analysis of published ChIP-chip data revealed that CpG islands contain an increased number of sites recognized by the Sp1 protein. CpG islands containing more CAGE tags usually also contain more Sp1 binding sites. This is especially relevant for CpG islands located in 3' gene regions. Various examples of transcription, confirmed by mRNAs or ESTs but with no evidence of protein-coding genes, were found in CAGE-enriched CpG islands located far from the TSS of any known protein-coding gene.
In summary, CpG islands located far from the transcription start sites of protein-coding genes have transcription initiation activity and display Sp1 binding properties. In exons overlapping with these islands, the synonymous substitution rate of CpG-containing codons is decreased. This suggests that such CpG islands are involved in transcription initiation, possibly of some non-coding RNAs.
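For readers unfamiliar with how CpG islands are usually recognized, the sketch below computes GC content and the observed/expected CpG ratio of a sequence. The thresholds (length of at least 200 bp, GC content above 50%, ratio above 0.6) are the widely used Gardiner-Garden and Frommer criteria; the abstract does not say which definition was used in the study, so treating them as the defaults here is an assumption, as are the function names and toy sequences.

```python
def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)

def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_ratio=0.6):
    """Apply the three classical criteria to a candidate region."""
    return (
        len(seq) >= min_len
        and gc_content(seq) > min_gc
        and cpg_obs_exp(seq) > min_ratio
    )

# Hypothetical 240 bp toy sequences.
cpg_rich = "CGCGGCGCTACG" * 20
at_rich = "ATTAGCATTACA" * 20

rich_result = looks_like_cpg_island(cpg_rich)
poor_result = looks_like_cpg_island(at_rich)
```

In practice such a test is applied in sliding windows along the genome, and overlapping windows that pass are merged into islands.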

251.

2.04.2010

Mikhail Spivakov, Charles Girardot, Javier Herrero, Junaid Akhtar, Gautier Koscielny, Eileen Furlong and Ewan Birney

Individual variation at gene regulatory sequences: lessons from Drosophila mesoderm

Using a combination of wet-lab and in silico approaches, we are studying the regulatory logic of Drosophila developmental enhancers. Data on individual sequence variation in D. melanogaster has recently become available and we have started taking advantage of it to understand the rules underlying the sequence and positioning of transcription factor binding sites within TF-bound regions. I will present the preliminary findings of this analysis and discuss how the data on intra-species variation can complement the more traditional evo-devo approaches based on the comparison of genotype and phenotype across species.

252.

15.04.2010

Svetlana Burlak

Reconstruction of proto-languages: how it is done

If some languages are related to each other, it means that a common proto-language of theirs once existed. Can we learn anything about it using data from languages that exist today, and if so, what? How can the reliability of the results be assessed? And, finally, why does a proto-language have to be reconstructed at all: perhaps it should be sought in written records, or in some hard-to-reach place where it has survived as a "living fossil"? The talk will answer these questions, and any others the audience may wish to ask.

253.

6.05.2010

Pavel Pevzner

Multiplex De Novo Sequencing of Antibiotics

Sequencing antibiotics, once a heroic effort, remains time-consuming and error-prone. Most antibiotics are cyclic nonribosomal peptides (NRPs) that contain nonstandard amino acids. Moreover, the dominant technique for sequencing antibiotics (NMR) requires large amounts (milligrams) of highly purified material that, for most compounds, are nearly impossible to obtain. Therefore, there is a need for sequencing NRPs by tandem mass spectrometry from picograms of material. Since nearly all NRPs are produced as related analogs by the same microorganism, we develop a mass spectrometry approach for sequencing all related peptides at once (in contrast to the existing approach, which analyzes individual peptides). Our results suggest that the current experimental protocol for sequencing antibiotics (and other NRPs) should be changed. Instead of attempting to isolate and NMR-sequence the most abundant compound, one should acquire spectra of many related compounds and sequence all of them at once using mass spectrometry.
The talk will mainly focus on biological/technological challenges and applications of the developed algorithms for studies of bacterial cannibalism (collaboration with Pieter Dorrestein) and allelopathy (collaboration with Pedro Leao). If time allows I will also cover the computational aspects of this work.
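To give a feel for the computational side, the sketch below generates the idealized theoretical spectrum of a cyclic peptide, that is, the masses of all its contiguous fragments. It is a textbook-style illustration only, not the multiplex sequencing algorithm of the talk: it assumes integer residue masses, ignores ion types, noise, and missing peaks, and the function name and example peptide are hypothetical.

```python
def cyclic_spectrum(masses):
    """Sorted masses of all contiguous fragments of a cyclic peptide.

    `masses` lists the residue masses in ring order; the result also
    contains 0 (empty fragment) and the mass of the whole ring.
    """
    n = len(masses)
    doubled = masses + masses            # unroll the ring so fragments can wrap around
    spectrum = [0, sum(masses)]
    for length in range(1, n):
        for start in range(n):
            spectrum.append(sum(doubled[start:start + length]))
    return sorted(spectrum)

# Hypothetical four-residue ring with integer masses 114, 128, 129, 113.
peptide = [114, 128, 129, 113]
spectrum = cyclic_spectrum(peptide)
rotated = cyclic_spectrum([128, 129, 113, 114])   # same ring, different starting residue
```

Because the peptide is cyclic, any rotation of the residue list yields the same spectrum, which is one reason sequencing cyclic peptides from spectra is harder than sequencing linear ones. Working directly with residue masses rather than amino acid letters also accommodates nonstandard amino acids.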

254.

11.05.2010

Philipp Khaitovich

A molecular survey across lifespan: role of developmental timing in human brain evolution and aging

Phenotypically, humans stand out from other primate species in many respects. Here, we focus on two characteristics specific to humans: unique cognitive abilities and extended lifespan.
To approach the first question, we surveyed gene expression and metabolic changes taking place during postnatal development in the prefrontal cortex and cerebellum of humans, chimpanzees, and rhesus macaques. In the prefrontal cortex, but not in the cerebellum, we find an excess of human-specific changes. Evolutionarily, these changes reflect shifts in developmental timing rather than human innovations. Functionally, these changes predominantly affect calcium signaling, synaptic transmission and long-term potentiation pathways.
To approach the second question, we studied lifelong gene expression changes and their regulation in the brains of humans and rhesus macaques. We find that in both species many expression changes observed in old age, including the previously reported down-regulation of neural genes, begin in early childhood. Furthermore, both developmental and aging changes appear to be controlled by microRNAs and transcription factors. Thus, differences in the rates of developmental changes might influence the pace of aging in the two species.

255.

20.05.2010

Yuri Wolf

Universal distribution of protein evolution rates as a consequence of protein folding physics

We explore the connection between folding robustness and the evolution rate of proteins using a coarse-grained off-lattice model. The distribution of the logarithm of the evolution rates across distinct "folds" exhibits a peak with a long tail on the low rate side and resembles the universal empirical distribution of the evolutionary rates. The results suggest that the universal distribution of the evolutionary rates of protein-coding genes is a direct consequence of the basic principles of protein folding physics.

256.

27.05.2010

Olga Vitek

An insight into computational and statistical mass spectrometry-based proteomics

The goal of proteomics is to identify, quantify and characterize proteins in biological mixtures. Tremendous progress in the performance of mass spectrometers and in the associated experimental workflows has resulted in substantial advances in this field. In particular, Liquid Chromatography coupled with Mass Spectrometry (LC-MS) can be used for global genome-wide discovery-oriented research. Alternatively, targeted Selected Reaction Monitoring (SRM) is sensitive and specific, but requires prior knowledge of the targets and their biochemical characteristics. Measurements generated by these workflows are complex and large-scale, and are subject to stochastic variation and uncertainty in interpretation. Computational and statistical methods are therefore key for their accurate interpretation and for experimental design.
This talk focuses on computational and statistical methods and tools for mass spectrometry-based proteomics. We briefly introduce proteomic workflows and their biological applications. We then discuss computational challenges and state-of-the-art solutions for identification, quantification, and characterization of peptides and proteins from mass spectra, and highlight opportunities and resources for bioinformatics research in this field.
The talk is a short version of the tutorial that will be offered in July 2010 in conjunction with ISMB.
Introductory reading: O. Vitek. "Getting started in computational mass spectrometry-based proteomics". PLoS Computational Biology, 5(5), 2009.

257.

10.06.2010

Konstantin Mishurovsky

Bronze voices of the Moscow Kremlin

The present collection of Kremlin bells took shape gradually over five centuries. During this time the country's principal cathedral bell tower and its set of bells underwent many changes, both constructive and destructive. Many of these changes and losses also affected the bell ringing: its order, composition, and arrangement and, as a consequence, its harmonic euphony.
The variability, and even a certain lack of deliberate design, of the Kremlin ringing was due to the specific tasks facing the Kremlin bell ringers. The Kremlin ringing was perceived primarily as a kind of sound environment interacting with the architecture of the ancient Kremlin of our capital. This is attested, in particular, by the accounts of foreign travelers, who admired the power and richness of the ringing in the Kremlin but almost never touched on any musical features of this ringing.
Many bells of Ivan the Great are first and foremost historical rarities, material witnesses of important state and church events, and not merely sounding instruments for ringing. We know of no single project ever having existed to put in order or purposefully create the selection of bells on Ivan the Great: each newly cast bell merely added to the already existing heterogeneous set of historical bells, and this continually increased the multifaceted character of the Kremlin ringing.
This talk presents the results of studies of the harmonic properties of the bells that have survived to this day on Ivan the Great. The method of analyzing bell consonances developed by the speaker makes it possible to present clearly, even to unprepared listeners, computer sound models of the elementary tones that make up the polyphony of each bell. Computer comparison of single sound recordings (samples) of different Kremlin bells is also used. Constructing multichannel "peals" makes it possible, in turn, to analyze combinations of bells in various peals. The results of the analysis show that bells of at least three different harmonic types have survived on Ivan the Great, and that they share no single scale.
The perception of bell ringing depends largely on the architectural features of the bell structure. In this respect the complex of the Assumption Belfry and the Ivan the Great pillar is an unprecedentedly complex bell-bearing building. The Ivan pillar, with its three tiers, does not allow listeners to hear all the bells at once. The first two tiers are, in plan, ring-shaped platforms with bells hung around the perimeter of the octagon. In the middle of the "ring" is solid masonry: standing on any one side of the Ivan pillar, listeners cannot clearly hear the bells located on the opposite side. The third tier of the bell tower is screened from listeners standing below by a solid parapet, which hinders the propagation of bell sounds from the upper platform. The Assumption Belfry, which holds three large bells, has a complex configuration, as a result of which the heavy blagovest bells are located in bays isolated from one another.
For historical reasons, the Kremlin ringing is in a state of continuous development. Present-day technology makes it possible to model bell peals in advance, so that the knowledge gained can later be applied in practice.

258.

9.12.2010

A. Kel

Flexibility of regulatory code. Application in evolution and diseases

The multiplicity of cellular conditions under which eukaryotic genes are expressed gives rise to the polyfunctionality of their transcription regulatory regions. I propose a "fuzzy puzzle" hypothesis of the organization of the transcription regulatory code, which allows multiple regulatory messages to be encoded in the same DNA sequence in the regulatory regions of eukaryotic genes. The structure of regulatory sequences on the one hand and the specific features of transcription factors on the other make it possible to encode several regulatory programs within one regulatory region. It is known that each transcription factor can bind to a variety of different DNA sites. This is maintained by flexible mechanisms of DNA-protein interaction, in which DNA conformation rather than the particular sequence context often plays the major role in the selection of DNA targets. In addition, the ability of TFs to operate through a so-called "induced fit" mechanism (when a TF becomes fully structured only upon interaction with DNA; Frankel and Kim, 1991) greatly relaxes the restrictions on binding to various DNA sites. Besides that, protein-protein interactions between different transcription factors in multiprotein regulatory complexes can stabilize low-energy protein-DNA contacts, further widening the variety of target sites for particular transcription factors. The huge diversity of transcription factors functioning in living cells, multiplied by the wide choice of target sites for each TF, creates the precondition for forming multiple alternative DNA-protein complexes on the same gene regulatory region. As a result, extremely complex patterns of gene expression are observed. I will discuss how, in my view, the "fuzzy puzzle" removes the evolutionary limitations on multicellular organization and how it becomes the basis for a new, very effective mechanism of evolution.
I will also discuss the application of the "fuzzy puzzle" hypothesis to understanding the mechanisms of complex diseases and possible ways to combat them.

259.

17.02.2011

S.A. Vakulenko*, O. Radulescu

Genetic networks with complex robust behavior

We will consider genetic networks of a special structure which, we believe, may describe the interaction between microRNAs and transcription factors. Such scale-free structures arise as a result of natural evolution. We will show that the dynamics of these networks generates complex attractors and bifurcations; a nontrivial bifurcation can result from a mutation in a single gene, while on the other hand mutations in many genes may leave the structure of the attractor unaffected. Structures formed by such networks can remain stable under large fluctuations of morphogenetic fields.
Presentation

260.

24.02.2011

Shamil Sunyaev

Brigham and Women's Hospital, Harvard Medical School

Evolutionary and statistical genetics of rare human alleles and whole-exome sequencing

In lieu of an abstract, a quote from private correspondence with the speaker: "And we already have exomes by the thousands."

261.

3.03.2011

G.S.Starostin

Center for Comparative Linguistics and Institute for Oriental and Classical Studies, Russian State University for the Humanities

Principles of constructing a global lexicostatistical tree of the world's languages

Of all the theoretically possible mathematical methods for constructing a genealogical tree of the world's languages, the most convenient and (at present) the only universally applicable one is lexicostatistics, which is based on the percentage of matches in the basic vocabulary of the languages being compared (a match being, at best, proven common historical origin and, at worst, at least general phonetic similarity). Nevertheless, the lexicostatistical method, especially as applied to the absolute dating of proto-language divergence (glottochronology), is still not generally accepted among comparative linguists. The talk will cover several variants of lexicostatistics used to build linguistic trees, as well as recent improvements to the method that largely answer the criticism raised against it.
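
As a rough illustration of the percentage-match idea described above, the following Python sketch computes the share of matching basic-vocabulary items for each pair of languages. The three toy languages, the four meanings and the cognate-class labels are invented for the example, and the function names are hypothetical; this is not the speaker's actual procedure, which relies on curated word lists and etymological judgment.

```python
from itertools import combinations

# Hypothetical toy data: for each language, meaning -> cognate-class label.
# Two words count as a match when they carry the same label for the same meaning.
WORDLISTS = {
    "A": {"water": 1, "fire": 1, "stone": 1, "hand": 1},
    "B": {"water": 1, "fire": 1, "stone": 2, "hand": 1},
    "C": {"water": 2, "fire": 1, "stone": 3, "hand": 2},
}


def match_percentage(list1, list2):
    """Percentage of shared meanings whose words fall in the same cognate class."""
    shared = [meaning for meaning in list1 if meaning in list2]
    if not shared:
        raise ValueError("the two word lists have no meanings in common")
    matches = sum(1 for meaning in shared if list1[meaning] == list2[meaning])
    return 100.0 * matches / len(shared)


def pairwise_matches(wordlists):
    """Match percentage for every unordered pair of languages."""
    return {
        (a, b): match_percentage(wordlists[a], wordlists[b])
        for a, b in combinations(sorted(wordlists), 2)
    }
```

The resulting table of percentages could then be handed to any distance-based tree-building method; which method suits language data best is exactly the kind of question the talk addresses.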

262.

17.03.2011

A. Y. Mulkidjanian

A.N. Belozersky Institute of Physico-Chemical Biology, Moscow State University; School of Physics, University of Osnabrück, Germany

Photosynthesizing, porous zinc sulfide deposits: the cradle of life?

We attempt to reconstruct the chemical features of both the first organisms and their environment by analyzing properties common to all forms of life.
For example, RNA and DNA molecules are built of nucleotides, which are extremely photostable. This may mean that natural nucleotides were selected under sunlight, whose intensity in the ultraviolet (UV) range, before oxygen and ozone appeared in the atmosphere, was tens of times higher than it is now. The results of our kinetic modeling show that such UV selection should have favored the accumulation of RNA-like, photostable polymers.
Analysis of the transition metal content of modern cells indicates that the first cells formed in environments enriched in zinc ions. However, the equilibrium concentration of zinc in the waters of the ancient, still anoxic ocean must have been very low, below one picomolar (<10^-12 M). Significant amounts of zinc and other transition metals could accumulate only at geothermal springs. One can suppose that on the early Earth very hot metal-bearing fluids could, owing to the high pressure of the carbon dioxide atmosphere, deliver zinc ions to the surface of the first continents as well, so that porous ZnS-containing sediments covered illuminated areas around hot springs. Being able to store light energy for long periods, crystals of zinc sulfide, known as a "phosphor" (from the word "phosphorescence"), are the most efficient natural broad-spectrum photocatalysts. ZnS particles can carry out abiogenic photosynthesis, reducing carbon dioxide to mono-, di-, tri- and tetracarboxylic acids with an efficiency of up to 80%, and can also catalyze various synthetic reactions, including photopolymerization.
The proposed evolutionary scenario yields testable predictions about the role of zinc ions in modern organisms. In particular, it predicts an elevated zinc content in the most ancient cellular structures. Analysis of publicly available databases confirms these predictions, which may indicate that the evolution of the first life forms, up to the stage of the common ancestor of all cellular organisms, took place in photosynthesizing, porous, zinc sulfide-rich deposits around hot springs. Porous ZnS could not only photosynthesize but also serve as a template for the synthesis of the first biopolymers and promote the photoselection of RNA-like macromolecules while protecting them from photodissociation.
Mulkidjanian, A. Y., Koonin, E. V., Makarova, K. S., Mekhedov, S. L., Sorokin, A., Wolf, Y. I., Dufresne, A., Partensky, F., Burd, H., Kaznadzey, D., Haselkorn, R., and Galperin, M. Y. (2006), 'The cyanobacterial genome core and the origin of photosynthesis', Proc Natl Acad Sci U S A, 103, 13126-31.
Mulkidjanian, A. Y., Galperin, M. Y., Makarova, K. S., Wolf, Y. I., and Koonin, E. V. (2008b), 'Evolutionary primacy of sodium bioenergetics', Biol Direct, 3, 13.
Mulkidjanian, A. Y. (2009), 'On the origin of life in the zinc world: 1. Photosynthesizing, porous edifices built of hydrothermally precipitated zinc sulfide as cradles of life on Earth', Biol Direct, 4, 26.
Mulkidjanian, A. Y. and Galperin, M. Y. (2009), 'On the origin of life in the zinc world. 2. Validation of the hypothesis on the photosynthesizing zinc sulfide edifices as cradles of life on Earth', Biol Direct, 4, 27.
Mulkidjanian, A. Y. and Galperin, M. Y. (2010b), 'On the abundance of zinc in the evolutionarily old protein domains', Proc Natl Acad Sci U S A, 107, E37.

263.

24.03.2011

A. Favorov

Johns Hopkins University, GosNIIGenetika and Vavilov Institute of General Genetics (IOGen)

GenometriCorr: geometric correlation or independence of genomic interval annotations

Interval annotations, or markups, are sets of intervals on chromosomes, for example, all genes. Independence or correlation of two annotations allows one to judge indirectly how closely related the things they represent are (a trivial example: genes and CpG islands).
Intuitive as the notion of proximity is, the choice of a statistic that characterizes the tendency of intervals from one annotation to occur near intervals from another is not always obvious. We have devised a collection of possible statistics (which, naturally, does not claim to be complete) and assembled them into the GenometriCorr package for the R language.
The seminar will present this collection and describe how the package was built.
Presentation
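
To make the idea of a proximity statistic concrete, here is a minimal Python sketch of one plausible choice: the relative distance of each query interval midpoint to its two flanking reference midpoints, compared with a uniform distribution by a Kolmogorov-Smirnov distance. If the two annotations are independent, relative distances should be roughly uniform on [0, 0.5]. The function names are hypothetical and this is not the GenometriCorr API (the package itself is written in R); intervals are assumed to be (start, end) pairs on a single chromosome.

```python
from bisect import bisect_right


def midpoints(intervals):
    """Sorted midpoints of (start, end) intervals on a single chromosome."""
    return sorted((start + end) / 2.0 for start, end in intervals)


def relative_distances(query, reference):
    """For each query midpoint lying between two reference midpoints, return the
    distance to the nearer one divided by the gap between them (0 to 0.5)."""
    ref = midpoints(reference)
    result = []
    for q in midpoints(query):
        i = bisect_right(ref, q)  # ref[i - 1] <= q < ref[i]
        if i == 0 or i == len(ref):
            continue  # query point lies outside the span of the reference
        left, right = ref[i - 1], ref[i]
        result.append(min(q - left, right - q) / (right - left))
    return result


def ks_statistic_uniform(values, upper=0.5):
    """Kolmogorov-Smirnov distance between the empirical distribution of
    `values` and the uniform distribution on [0, upper]."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("no values to compare")
    d = 0.0
    for k, x in enumerate(xs, start=1):
        cdf = min(max(x / upper, 0.0), 1.0)
        d = max(d, k / n - cdf, cdf - (k - 1) / n)
    return d
```

A statistic near 0 is consistent with independence, while a value near 1 means the query intervals sit almost exactly on (or exactly between) reference intervals. Turning the statistic into a p-value, for instance by permutation, is left out of this sketch.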

264.

18.04.2011

G.A. Savostyanov

I.M. Sechenov Institute of Evolutionary Physiology and Biochemistry, Russian Academy of Sciences, St. Petersburg

Development of multicellularity and the spatial organization of tissues

The talk considers the topological, geometric and social aspects of developmental biology that lie outside the scope of genomics and proteomics. A concept is proposed for a predictive theory of the emergence of multicellularity and the development of biological tissues (cell sheets). The concept rests on two ideas: 1) elementary units of multicellularity, called histions; 2) tissues as regular cellular networks (lattices) that arise through the polymerization of histions.
Histions are groups of two or more cells that arise through their specialization and integration (that is, the division of labor between cells). A formalized theory of this division is proposed, which makes it possible to compute the composition and structure of the set of cell groups and to build models of them. The development of histions is shown to obey a periodic law, and their classification takes the form of a periodic table whose parameters have biological meaning and are suitable for measuring progressive development. For the first time, the emergence of stemness is explained as an inevitable consequence of the division of labor.
The polymerization of histions makes it possible to compute variants of cellular network structure as regular lattices of various dimensions. On this basis, families of two- and three-dimensional topological and geometric models of tissue structure have been built, visualized by computer and verified experimentally. The theory makes it possible to predict and then detect previously unknown topological variants of epithelial histoarchitecture and to forecast their changes in normal development and in pathology. Such changes are regarded as a switch in the structure of histions and cellular lattices, analogous to phase transitions. Thus, histions and cellular networks constitute distinct levels of biological organization that have so far been overlooked. These ideas form the basis of a nomogenetic theory that makes it possible to describe, predict and measure tissue development and, ultimately, to control it.
http://members.tripod.com/~Gensav

265.

25.04.2011

A.M. Raigorodskii

Faculty of Mechanics and Mathematics, MSU; Faculty of Innovation and High Technology, MIPT; Yandex

Models of web graphs, their mathematical characteristics and applications

Over the past fifteen years or so, many mathematical models have appeared that aim to describe the growth of the Internet "adequately". In this talk we will trace the history of the study of these models, present several very recent results in the field, and discuss various practical applications of such results to search and data analysis on the Internet.

266.

19.05.2011

I.V.Kulakovskiy*, A.A.Belostotsky, A.S.Kasianov, Y.A.Medvedeva, I.A.Eliseeva, V.Y.Makeev

Engelhardt Institute of Molecular Biology RAS*, GosNIIgenetika, Vavilov Institute of General Genetics RAS, Institute of Protein Research RAS

Patterns of preferred mutual distances for combinations of DNA segments that bind transcription regulatory factors

The mechanisms that regulate gene expression in higher eukaryotes are still poorly understood. It is not clear how regulatory modules, i.e. DNA segments containing characteristic sequences specifically recognized by regulatory proteins, determine the formation of the protein complex that controls gene expression in specific tissues or conditions. One approach to studying regulatory modules is the analysis of "composite elements", which are combinations of binding sites for different transcription factors located at defined distances from one another.
A number of preferred distances between binding sites have previously been shown to exist for some pairs of transcription factors.
In this work we attempted to show that this phenomenon is fairly general, at least for proteins involved in regulating the response to hypoxia in human cells. We suggest that information on the correct mutual positioning of binding sites can be used to identify functional binding sites when analyzing both experimental data and in silico predictions.
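
One simple way to look for preferred spacings, sketched here in Python under stated assumptions, is to pool the signed offsets between sites of two factors across many regulatory regions and see which offsets recur. The function names, the 50 bp window and the input layout (one pair of position lists per sequence) are hypothetical choices for illustration, not the authors' method; a real analysis would also need a background model to judge significance.

```python
from collections import Counter


def spacing_counts(sites_a, sites_b, max_distance=50):
    """Count signed offsets (b - a) between every pair of site positions
    that lie within max_distance of each other on the same sequence."""
    counts = Counter()
    for a in sites_a:
        for b in sites_b:
            offset = b - a
            if abs(offset) <= max_distance:
                counts[offset] += 1
    return counts


def pooled_spacing_counts(sequences, max_distance=50):
    """Sum offset counts over many sequences.

    `sequences` is an iterable of (sites_a, sites_b) pairs, each a list of
    site start positions for factor A and factor B on one sequence."""
    total = Counter()
    for sites_a, sites_b in sequences:
        total.update(spacing_counts(sites_a, sites_b, max_distance))
    return total
```

An offset that appears far more often than its neighbors in the pooled counts is a candidate preferred distance for that pair of factors.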

267.

30.05.2011

P. Mazin*, M.S. Gelfand, P. Khaitovich

IITP RAS and MPG-CAS Partner Institute of Computational Biology, Shanghai

Splicing changes during brain development in primates

Despite intensive research, the molecular mechanisms responsible for the anatomical, behavioral and cognitive differences between humans and their closest evolutionary relatives, the other primates, remain unclear. In this work we used next-generation sequencing to investigate splicing differences between human, chimpanzee and macaque in two brain regions (prefrontal cortex and cerebellum) and at two ages (newborn and adult). We showed that of the more than 10 thousand genes expressed at a detectable level, more than 12% show significant differences in the splicing of protein-coding exons between species, whereas only 3.5% and 3% are spliced differently between brain regions or between ages, respectively. Interestingly, 108 genes (1.1%) change their splicing with age differently in different species, and in more than half of them the changes are human-specific. Functional analysis showed that these 108 genes are enriched in functions related to the development, and regulation of development, of the higher nervous system. Thus, our results show that alternative splicing is involved in interspecies differences among primates and may be partly responsible for human-specific features of brain development and function.

268.

19.05.2011

Philipp Khaitovich

MPG-CAS Partner Institute of Computational Biology, Shanghai

Insights into human biology from RNA-seq data

In this seminar I will discuss the use of RNA-seq data to investigate two biological questions. First, we studied the expression, evolution and functions of a putative miRNA that appeared and underwent copy number amplification on the human evolutionary lineage after the split from the common ancestor of humans and chimpanzees. Second, we determined the extent of antisense expression in the human brain throughout the lifespan and investigated its influence on the expression of the corresponding sense genes.

269.

15.12.2011

I.S. Povolotskaya*, F.A. Kondrashov, P.K. Vlasov

Centre for Genomic Regulation, Barcelona

Stop codons in bacteria are not selectively equivalent

Many global patterns in molecular evolution are defined by the genetic code, including rates of nonsynonymous and synonymous evolution, synonymous codon usage and the optimality of the genetic code. The evolution and usage of stop codons, however, have not been rigorously studied, with the exception of the coding of non-canonical amino acids. Here, we study the rate of evolution and genomic frequency of the canonical stop codons TAA, TGA and TAG in bacterial genomes. We find that stop codons evolve more slowly than synonymous sites, suggesting the action of weak negative selection. However, the frequency of stop codon usage relative to genomic nucleotide content indicates that this selection regime is not straightforward. The usage of TAA and TGA stop codons is GC-content dependent, with TAA decreasing and TGA increasing with GC content, while TAG frequency is independent of nucleotide content. We thus modeled stop codon usage and nucleotide content with mutation rates and selection on nucleotide content and on TAG frequency as parameters. We found that the relationship between stop codon frequencies and nucleotide content cannot be explained by mutational biases or by selection on nucleotide content. However, with weak nucleotide content-dependent selection on TAG, -0.5 < N_e s < 1.5, the model fits all of the data and recapitulates the lack of a relationship between TAG and nucleotide content. For biologically plausible mutation rates we show that, in bacteria, the TAG stop codon is universally associated with lower fitness, with TAA being the optimal stop codon for G-content < 16%, while for G-content > 16% TGA has a higher fitness than TAG.
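
The basic quantities discussed above, stop codon usage and nucleotide content, can be tabulated with a few lines of Python. The sketch below assumes the input is a list of coding sequences given 5' to 3' and ending in their stop codon; the function names are hypothetical, and the mutation-selection model itself is not reproduced here.

```python
from collections import Counter

STOP_CODONS = ("TAA", "TGA", "TAG")


def gc_content(sequence):
    """Fraction of G and C nucleotides in a DNA sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def stop_codon_frequencies(coding_sequences):
    """Relative usage of TAA, TGA and TAG as the terminal codon.

    Sequences whose last three nucleotides are not a canonical stop codon
    are ignored."""
    counts = Counter()
    for cds in coding_sequences:
        codon = cds.upper()[-3:]
        if codon in STOP_CODONS:
            counts[codon] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no sequence ends in a canonical stop codon")
    return {codon: counts[codon] / total for codon in STOP_CODONS}
```

Computing both values for each genome and plotting stop codon frequency against GC content would give the kind of relationship the abstract describes, under the assumption that the annotated coding sequences are reliable.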

270.

16.12.2011

K. Bozek

Max Planck Institute for Informatics, Saarbrücken (now at PICB, Shanghai)

Physicochemical and structural properties determining HIV-1 coreceptor usage

The entry of the human immunodeficiency virus (HIV) into human cells is a multistep process involving binding to one of the cell-surface coreceptors, CCR5 or CXCR4. The binding site of the coreceptor is partially situated on the third variable region (V3) of the gp120 viral protein. Whether a virus can bind to CCR5 only (R5 virus), to CCR5 and CXCR4 alternately (dual virus) or to CXCR4 only (X4 virus) is determined predominantly by the sequence and structure of this region. The phenotype related to the virus coreceptor usage is termed viral tropism. While in the early, asymptomatic stages of infection mainly R5 viruses are observed, progression to AIDS is often correlated with the emergence of X4 viruses. The relationship of HIV tropism with disease progression and the recent development of CCR5-blocking drugs underscore the importance of monitoring virus coreceptor usage. As an alternative to costly phenotypic assays, computational methods aim at predicting virus tropism based on the V3 loop sequence of the virus gp120 protein and on its structure. The major drawback of the binary sequence representation is that it offers little insight into the physicochemical properties of amino acids and their spatial arrangement in the binding site that determines coreceptor binding.
Here we present a structural descriptor of the V3 loop that encodes the physicochemical properties of the loop together with their locations on the protein structure. We map 54 amino acid indices representing the physicochemical properties of amino acids onto the V3 loop structure and use machine learning methods to extract the features that are most informative for coreceptor usage. The extracted set of features represents a small fraction of the initial feature set, and models based on this set attain higher prediction accuracy with decreased computational load.
Our descriptor, used as input to a support vector machine predicting tropism, shows a statistically significant improvement over the binary representation of the V3 sequence. At the specificity of the 11/25 rule, a sensitivity of 69% was achieved, comparing favorably with the 62% sensitivity of sequence-based prediction. In addition to the data inferred from lab-cloned viruses (clonal data), we assessed the predictive power of our method on clinically derived 'bulk' sequence data from patient samples and obtained a statistically significant 3% improvement over the sequence representation, evaluated using the receiver operating characteristic (ROC) curve. We also demonstrated the capacity of our method to predict the outcome of coreceptor blocker-based therapy by applying it to 53 samples from patients undergoing Maraviroc therapy.
Our structural descriptor affords direct interpretation of the features of the V3 loop relevant for viral tropism by pointing to specific physicochemical properties of amino acids in specific parts of the loop that are predictive of coreceptor usage. The analysis of features important for the classification pointed to two loop regions whose physicochemical properties play a determining role in coreceptor usage. The regions are located on opposite strands of the loop stem and show predominantly structure-, hydrophobicity- and charge-related properties. These regions are in close proximity in the bound conformation of the loop, forming a potentially determinant site for coreceptor usage. The resulting method offers higher performance than sequence-based methods, with comparable efficiency and a direct interpretation of the structural and physicochemical determinants of tropism.
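
For readers unfamiliar with the baseline mentioned above, the 11/25 rule is commonly stated as follows: a virus is predicted to use CXCR4 if position 11 or position 25 of the V3 loop carries a positively charged residue (arginine or lysine). The Python sketch below implements that rule together with the sensitivity and specificity used to compare predictors. The example sequence and function names are illustrative assumptions; the code presumes an ungapped V3 sequence whose positions already match the standard numbering, which real data would require an alignment step to guarantee. It is not the structural descriptor or the support vector machine described in the abstract.

```python
POSITIVELY_CHARGED = {"R", "K"}

# Illustrative 35-residue V3 loop sequence, used here only as toy input.
EXAMPLE_V3 = "CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"


def predict_x4_11_25(v3_sequence):
    """Return True if the 11/25 rule predicts CXCR4 usage.

    Positions are 1-based, so residues 11 and 25 are indices 10 and 24."""
    seq = v3_sequence.upper()
    if len(seq) < 25:
        raise ValueError("sequence is too short to contain position 25")
    return seq[10] in POSITIVELY_CHARGED or seq[24] in POSITIVELY_CHARGED


def sensitivity_specificity(predicted_x4, observed_x4):
    """Sensitivity and specificity for X4 detection from paired booleans."""
    pairs = list(zip(predicted_x4, observed_x4))
    tp = sum(1 for p, o in pairs if p and o)
    fn = sum(1 for p, o in pairs if not p and o)
    tn = sum(1 for p, o in pairs if not p and not o)
    fp = sum(1 for p, o in pairs if p and not o)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one X4 and one non-X4 observation")
    return tp / (tp + fn), tn / (tn + fp)
```

Fixing the specificity of a richer classifier at the value reached by this rule, and then comparing sensitivities, is the kind of comparison the abstract reports.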