IV. Ecosmomics: Independent Complex Network Systems, Computational Programs, Genetic Ecode Scripts

2. The Innate Affinity of Genomes, Proteomes and Language

Searls, David. A Primer in Macromolecular Linguistics. Biopolymers. 99/3, 2013. The philosophical geneticist has long and presciently observed that nature’s dual domains of informational nucleotides and literary discourse are innately similar in kind. This entry describes their parallel, self-similar essence by way of graphic, evidential visuals. The import is that if the relation could move from metaphor to analogy to fact, both genetics and linguistics could benefit much from cross-application of methods and analytic techniques.

Polymeric macromolecules, when viewed abstractly as strings of symbols, can be treated in terms of formal language theory, providing a mathematical foundation for characterizing such strings both as collections and in terms of their individual structures. In addition this approach offers a framework for analysis of macromolecules by tools and conventions widely used in computational linguistics. This article introduces the ways that linguistics can be and has been applied to molecular biology, covering the relevant formal language theory at a relatively nontechnical level. Analogies between macromolecules and human natural language are used to provide intuitive insights into the relevance of grammars, parsing, and analysis of language complexity to biology. (Abstract)

Searls, David. Reading the Book of Life. Bioinformatics. 17/7, 2001. A report on a conference between geneticists and linguists to explore the systematic affinities between the DNA molecular code and human language.

Searls, David. The Language of Genes. Nature. 420/211, 2002. An affirmation that the molecular genetic code, as now studied by computer-based bioinformatics, is in fact a true language with its own grammar and syntax.
And these techniques are also being used to explore the structures of literature.

…nucleic acids may be said to be at about the same level of linguistic complexity as natural human languages.…genes do convey information, and furthermore this information is organized in a hierarchical structure whose features are ordered, constrained and related in a manner analogous to the syntactic structure of sentences in a natural language. (213)

Searls, David. Trees of Life and of Language. Nature. 426/391, 2003. The same pattern occurs for the lineage of ancient languages and the reconstruction of evolutionary ancestors, whereby “philology recapitulates phylogeny.”

Shabi, Uri, et al. Processing DNA Molecules as Text. Systems and Synthetic Biology. 4/3, 2010. Weizmann Institute of Science, Rehovot, Israel mathematicians, biochemists, and a cell biologist spell out an extensive, technically detailed consideration of the nucleotide genome as if it were a linguistic document. A paper in the next issue (4/4, 2010), “Creating Novel Protein Scripts beyond Natural Alphabets” by Anil Kumar, University of Toronto, and Vibin Ramakrishan, Rajiv Gandhi Centre for Biotechnology, similarly reinforces this view. Are we altogether at last verifying a true literal nature, as prior traditions well know, whose creative genetic program then manifests in kind at each emergent level? And could it now be passing into our conscious knowledge, indeed to commence a new era of “synthetic biology”?

Polymerase Chain Reaction (PCR) is the DNA-equivalent of Gutenberg’s movable type printing, both allowing large-scale replication of a piece of text. De novo DNA synthesis is the DNA-equivalent of mechanical typesetting, both ease the setting of text for replication. What is the DNA-equivalent of the word processor?
(227)

Here we present a novel operation on DNA molecules,…and show that it provides a foundation for DNA processing as it can implement all basic text processing operations on DNA molecules including insert, delete, replace, cut and paste and copy and paste. (227) In this work we present a uniform framework for DNA processing that encompasses DNA edition, DNA synthesis, and DNA library construction. (228)

Sheinman, Michael, et al. Evolutionary Dynamics of Selfish DNA Explains the Abundance Distribution of Genomic Sequences. Nature Scientific Reports. 6/30851, 2016. As an instance of genome complexity, MPI Molecular Genetics researchers Michael Sheinman, Anna Ramisch, Florian Massip, and Peter Arndt draw upon physics and linguistics to tease out features shared by these realms. See Massip in the next section for more from this team. As of 2016, genomes are commonly treated as whole entities, which are then seen to have deep affinities with universal nonlinear systems before and after.

Since the sequencing of large genomes, many statistical features of their sequences have been found. One intriguing feature is that certain subsequences are much more abundant than others. In fact, abundances of subsequences of a given length are distributed with a scale-free power-law tail, resembling properties of human texts, such as Zipf’s law. Despite recent efforts, the understanding of this phenomenon is still lacking. Here we find that selfish DNA elements, such as those belonging to the Alu family of repeats, dominate the power-law tail. Interestingly, for the Alu elements the power-law exponent increases with the length of the considered subsequences. Motivated by these observations, we develop a model of selfish DNA expansion. The predictions of this model qualitatively and quantitatively agree with the empirical observations. This allows us to estimate parameters for the process of selfish DNA spreading in a genome during its evolution.
The obtained results shed light on how evolution of selfish DNA elements shapes non-trivial statistical properties of genomes. (Abstract)
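
As a rough illustration of the quantity the Sheinman abstract refers to, the abundance of subsequences of a given length can be tabulated much as word frequencies are in a text. The following is a minimal sketch, not the authors' method: the function names `kmer_abundances` and `rank_abundance` and the toy sequence are assumptions made here for illustration, and no power-law fit is attempted.

```python
from collections import Counter


def kmer_abundances(sequence: str, k: int) -> Counter:
    """Count every overlapping length-k subsequence (k-mer) in `sequence`."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def rank_abundance(counts: Counter) -> list[tuple[int, str, int]]:
    """Return (rank, k-mer, count) rows, most abundant first.

    Ties are broken alphabetically so the ordering is deterministic.
    A Zipf-like text or genome would show count falling off roughly
    as a power of rank in the tail of this table.
    """
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, kmer, n) for rank, (kmer, n) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    # Hypothetical toy sequence: a short repeat copied several times,
    # loosely mimicking a repeated element inside unique flanks.
    toy = "ACGT" + "GGCCGGGCGC" * 5 + "TTAGC"
    for rank, kmer, n in rank_abundance(kmer_abundances(toy, 4))[:5]:
        print(rank, kmer, n)
```

On a real genome one would plot count against rank on log-log axes; the repeated element in the toy sequence only hints at how copied segments come to dominate the most abundant k-mers.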
Soares, Eduardo, et al. Beyond Chemical Language: A Multimodal Approach to Enhance Molecular Property Prediction. arXiv:2306.14919.
Seven IBM researchers based in Rio de Janeiro, Brazil and San Jose, USA, including Dmitry Zubarev, first describe current approaches as this broad field of biomolecule parsing actively shifts to deep machine learning methods. A number of technique proposals are then advanced. See also Artificial Intelligence-aided Protein Engineering from Topological Data Analysis to Deep Protein Language Models at arXiv:2307.14587 for another instance, whose abstract is excerpted here. Altogether such novel literacies add more evidence for an affine genetic and protein equivalence.

Protein engineering is an emerging field in biotechnology that has the potential to revolutionize various areas, such as antibody design, drug discovery, food security, ecology, and more. However, the mutational space involved is too vast to be handled through experimental means alone. Leveraging accumulative protein databases, machine learning (ML) models, particularly those based on natural language processing (NLP), have considerably expedited protein engineering. Moreover, advances in topological data analysis (TDA) and artificial intelligence-based protein structure prediction, such as AlphaFold2, have made more powerful structure-based ML-assisted protein engineering strategies possible. This review aims to offer a comprehensive, systematic, and indispensable set of methodological components, including TDA and NLP, for protein engineering and to facilitate their future development. (Excerpt)

Sondka, Zbyslaw, et al. COSMIC: a curated database of somatic variants and clinical data for cancer. Nucleic Acids Research. 52/D1, 2024. Wellcome Sanger Institute geneticists describe the latest four-year version of their extensive, actively used informational resource for treating this malady.
Steels, Luc. Analogies between Genome and Language Evolution. Pollack, J., et al., eds. Proceedings of Artificial Life IX. Cambridge: MIT Press, 2004. The Vrije Universiteit Brussel computer scientist and SONY Paris AI laboratory director contributes to the growing comparison between these molecular and textual programmatic modes.

The paper develops an analogy between genomic evolution and language evolution, as it has been observed in the historical change of languages through time. The analogy suggests a reconceptualisation of evolution as a process that makes implicit meanings or functions explicit.

Suhr, Stephanie. Is the Notion of Language Transferable to the Genes? Dörries, Matthias, ed. Experimenting in Tongues. Stanford: Stanford University Press, 2002. From a volume on how metaphors inform scientific paradigms, a history of linguistic interpretations and analogies of the molecular genetic code. These two “information-trading processes” share much affinity, which springs from a long tradition of imagining nature as a book to be read and translated.

Recursivity is indeed a universal phenomenon, as it shows in fractal pattern formation; it is an economical phenomenon as well creating complex variety – as for example the human brain – out of a few elements; and it is an important creative principle, which applies to areas beyond linguistics and information transmission. (60)

Tavares, Ana, et al. DNA Word Analysis Based on the Distribution of the Distances Between Symmetric Words. Nature Scientific Reports. 7/728, 2017. We note this 2017 paper by University of Aveiro, Portugal, medical and computational mathematicians as an example of how it has become common to consider genetic phenomena by way of similar linguistic features.

We address the problem of discovering pairs of symmetric genomic words (i.e., words and the corresponding reversed complements) occurring at distances that are overrepresented.
For this purpose, we developed new procedures to identify symmetric word pairs with uncommon empirical distance distribution and with clusters of overrepresented short distances. We focused on the human genome, and analysed both the complete genome as well as a version with known repetitive sequences masked out. We reported several well-defined features in the distributions of distances, which can be classified into three different profiles, showing enrichment in distinct distance ranges. (Abstract excerpts)

Turenne, Nicolas. On a Possible Similarity between Gene and Semantic Networks. arXiv:1606.00414. The University of Paris, INRA Science and Society bioinformatics researcher contributes to a growing realization, after decades of intimations since Jean Piaget and Roman Jakobson, that as similar self-organizing systems, the disparate realms of literature and genomes are necessarily one and the same natural testament.

In several domains such as linguistics, molecular biology or social sciences, holistic effects are hardly well-defined by modeling with single units, but more and more studies tend to understand macro structures with the help of meaningful and useful associations in fields such as social networks, systems biology or semantic web. A stochastic multi-agent system offers both accurate theoretical framework and operational computing implementations to model large-scale associations, their dynamics and patterns extraction. We show that clustering around a target object in a set of associations of object prove some similarity in specific data and two case studies about gene-gene and term-term relationships leading to an idea of a common organizing principle of cognition with random and deterministic effects. (Abstract)
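
For the symmetric word pairs in the Tavares et al. entry above, a small sketch may help make the idea concrete: a genomic word is paired with its reversed complement, and the distances between their occurrences are collected into a histogram. This is only an illustration under stated assumptions. The helper names (`reverse_complement`, `find_all`, `symmetric_pair_distances`) are hypothetical, and the distance used here, from each occurrence of a word to the nearest following occurrence of its partner, is one plausible reading and not necessarily the paper's definition.

```python
from collections import Counter

# Translation table for the four DNA bases (assumes an uppercase ACGT alphabet).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word: str) -> str:
    """Return the reversed complement of a DNA word, e.g. AAC -> GTT."""
    return word.translate(COMPLEMENT)[::-1]


def find_all(sequence: str, word: str) -> list[int]:
    """Return the start positions of all (possibly overlapping) occurrences."""
    positions = []
    start = sequence.find(word)
    while start != -1:
        positions.append(start)
        start = sequence.find(word, start + 1)
    return positions


def symmetric_pair_distances(sequence: str, word: str) -> Counter:
    """Histogram of distances from each occurrence of `word` to the nearest
    later occurrence of its reversed complement (an assumed definition)."""
    partner_positions = find_all(sequence, reverse_complement(word))
    distances = Counter()
    for p in find_all(sequence, word):
        following = [q - p for q in partner_positions if q > p]
        if following:
            distances[min(following)] += 1
    return distances
```

Comparing such a histogram against the distances expected for a shuffled sequence would be the natural next step toward spotting overrepresented distances, though that comparison is left out of this sketch.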