IV. Ecosmomics: Independent Complex Network Systems, Computational Programs, Genetic Ecode Scripts

2. The Innate Affinity of Genomes, Proteomes and Language

Waseem, Muhammad, et al. Language-independence of DisCoCirc’s Text Circuits: English and Urdu. arXiv:2208.10281. An Oxford University Computational Intelligence team, including Bob Coecke, continues to finesse reasons why genomic and linguistic descriptive phases can be found to have a common character which arises from a communicative reality. DisCoCirc is a newly proposed framework for representing the grammar and semantics of texts using compositional, generative circuits. While it advances the Categorical Distributional Compositional (DisCoCat) framework, it achieves radical new features toward eliminating grammatical differences between languages. In this paper we suggest that this is indeed the case for restricted fragments of English and Urdu. There is a simple translation from English grammar to Urdu grammar, and vice versa. We then show that differences in grammatical structure between English and Urdu - primarily relating to the ordering of words and phrases - vanish when passing to DisCoCirc circuits. (Abstract)

Wilson, Erin, et al. Genotype Specification Language. ACS Synthetic Biology. 5/6, 2016. We cite this entry by a nine-member team including Darren Platt of Amyris Biotechnologies, Emeryville, CA as an example of how the nascent field of genoinformatics or genolinguistics is moving to respectfully reinvent a much better life, environment and sustainable planet. See also Double Dutch: A Tool for Designing Combinatorial Libraries of Biological Systems by Nicholas Roehner in this same issue. We describe here the Genotype Specification Language (GSL), a language that facilitates the rapid design of large and complex DNA constructs used to engineer genomes.
The GSL compiler implements a high-level language based on traditional genetic notation, as well as a set of low-level DNA manipulation primitives. The language allows facile incorporation of parts from a library of cloned DNA constructs and from the “natural” library of parts in fully sequenced and annotated genomes. GSL was designed to engage genetic engineers in their native language while providing a framework for higher level abstract tooling. To this end we define four language levels, Level 0 (literal DNA sequence) through Level 3, with increasing abstraction of part selection and construction paths. GSL targets an intermediate language based on DNA slices that translates efficiently into a wide range of final output formats, such as FASTA and GenBank. (Abstract)

Witzany, Gunther. Biocommunication and Natural Genome Editing. Dordrecht: Springer, 2010. A book-length exposition by the Austrian philosopher and editor (see next) of the linguistic turn to perceive living systems, across many nested whole scales, as most characterized by text-like, informational qualities. In an initial chapter an historic sequence of worldviews from “monistic-organismic” to “pluralistic-mechanistic” to this nascent “organic-morphological” phase is laid out. (Reading this synopsis, one is struck by an apparent Right to Left to Whole Brain passage for humankind that we retrace in our own lives.) The next chapters dutifully span flora and fauna from viral, genomic, fungal, bacterial, cellular, and honey bee realms so as to highlight their dialogic, quorum-sensing essence. Altogether with other recent postings (e.g. Beckner, et al) a sense of a natural genesis may accrue that is intrinsically textual as genetics and language meld in a singular evolutionary emergence. Since both modes are being found to express a self-organizing complex dynamics, this phenomenal propensity itself could take on the guise of a universe to human genetic code.
Current molecular biology as well as cell biology investigates its scientific object by using key terms such as genetic code, code without commas, misreading of the genetic, coding, open frame reading, genetic storage medium DNA, genetic information, genetic alphabet, genetic expression, messenger RNA, cell-to-cell communication…. All these terms combine a linguistic and communication theoretical vocabulary with a biological one. In this book I try to introduce an appropriate model to exemplify this vocabulary (which is used in biology all the time without people thinking about it), on the basis of explanation and understanding of a linguistic action, the great variety of communicative actions. (v)

Wu, Fang, et al. Integration of pre-trained protein language models into geometric deep learning networks. Communications Biology. 6/876, 2023. Westlake University, Hangzhou, China, Yale University, and Tsinghua University, Beijing computational biologists provide another example of this frontier cross-adoption of protein linguistics with AI neural net contents. Our comment for these contributions is that as genetic and metabolic processes are able to be grammatically parsed, so to say, they gain a common textual basis. As a result, a wide and deep natural narrative is being realized in our midst written in an ecosmome to geonome code script. See also ProtLLM: An Interleaved Protein-Language LLM with Protein-as-Word Pre-Training by Le Zhuo, et al at arXiv:2403.07920 for more work in this regard. Geometric deep learning has achieved much success in defining 3D structures of large biomolecules. Meanwhile, protein language models trained on 1D sequences apply to a broad range of applications. In this work, we integrate the knowledge learned by protein language models into geometric networks and evaluate a variety of protein representation learning benchmarks.
The incorporation of protein language knowledge enhances geometric networks’ capacity and can be generalized to complex tasks. (Excerpt)

Wu, Yanying and Quanlong Wang. A Categorical Compositional Distributional Modelling for the Language of Life. arXiv:1902.09303. Oxford University computer neuroscientists are able to treat and parse protein biology in a linguistic manner by use of the title computational method developed by the Oxford computer science group (search Bob Coecke). We log it in amongst concurrent papers which establish this deep affinity between genetic and literary domains, broadly conceived, as a later evolutionary stage of one, same natural script. The Categorical Compositional Distributional (DisCoCat) Model is a powerful mathematical method for composing the meaning of sentences in natural languages. Since we can think of biological sequences as the "language of life", here we apply this model to see if we can obtain new insights and a better understanding of life’s language. We choose to focus on proteins as their linguistic features are more prominent as compared with other macromolecules such as DNA or RNA. Thus, we treat each protein as a sentence and its constituent domains as words. The meaning of a word or the sentence is its biological function, and the arrangement of protein domains corresponds to the syntax. Putting all those into the DisCoCat frame, we can "compute" the function of a protein with grammar rules that combine them together. (Abstract excerpts)

Xiao, Yi, et al. Bridging Text and Molecule: A Survey on Multimodal Frameworks for Molecules. arXiv:2403.13830. Chinese Academy of Sciences AI researchers provide an example of how readily language-based content can be assimilated by computational methods as they are then employed to parse protein linguistics.
Altogether a common natural narrative from nucleotides to nouns is being read and written anew. A recent trend in machine learning and natural language processing is aimed at building multimodal frameworks to jointly model molecules with textual domain knowledge. In this paper, we present the first systematic survey of this integrative endeavor. We focus on advances in text-molecule alignment methods, categorizing current models into two groups based on their architectures and listing relevant pre-training tasks. We next delve into the utilization of large language models and prompting techniques for molecular tasks and present significant applications in drug discovery. (Excerpt)

Xin, Lei, et al. Artificial Intelligence for Central Dogma-Centric Multi-Omics. arXiv:2412.12668. We note this work by eleven analysts at Peking, Wuhan, Hunan, Nanjing Universities, China, University of Michigan Medical School and Harvard Medical School (Chinese and American cooperation) because it is based on a growing appreciation of a wide array of organismic genetic-like encodings. With the development of high-throughput sequencing platforms, many omic methods such as genomics, metabolomics, and transcriptomics are being applied to disease genetics research. However, biological data often exhibits high dimensionality and significant noise. To address this, current studies have turned to artificial intelligence for multi-omics research. This paper reviews the mathematical strategies for integrating multi-omics data, applications of AI and deep learning, foundational theories, and new technologies. We seek to provide practical guidance for computational biologists to effectively utilize AI-based multi-omics machine learning algorithms. (Excerpt)

Zaccagnino, Rocco, et al. Testing DNA Code Words Properties of Regular Languages. Theoretical Computer Science. 608/84, 2015.
In a special issue From Computer Science to Biology and Back, we cite this entry by University of Salerno informatics researchers as an example of how DNA nucleotides are finding a common utility across disparate genetic, linguistic and computational domains. One aspect of DNA Computing is the possibility of using DNA molecules for solving some “complicated” computational problems. In this context, the DNA code word design problem assumes a fundamental role: given a problem encoded in DNA strands and biochemical processes, the final computation is a concatenation of the input DNA strands that must allow us to recover the solution of the given problem in terms of the input (unique decipherability). Thus the initial set of DNA strands must be a code. In addition, it should satisfy some restrictions, called here DNA properties, in order to prevent them from interacting in undesirable ways. So a new interest towards the design of efficient algorithms for testing whether a language X is a code, has arisen from (wet) DNA Computing, but, as far as we know, only when X is a finite set. In this paper we provide an algorithm for testing whether an infinite but regular set of words is a code that avoids some DNA properties among unwanted intermolecular and intramolecular hybridizations. (Abstract)

Zambon, A., et al. Structure of the space of folding protein sequences defined by large language models. Physical Biology. January, 2024. We cite this entry by Center for Complexity and Biosystems, University of Milan researchers as another instance of this mid-2020s cross-integrity of metabolic methods with AI computational network capabilities. Proteins populate a sequence space whose geometrical structure guides their natural evolution. By way of transformer models, we examine the protein landscape as an effective energy of sequence foldability, an approach similar to optimization methods in machine learning.
We then employ a statistical mechanics algorithm to explore regions with high local entropy in relatively flat landscapes. Our work thus combines machine learning and statistical physics so as to provide new insights into the exploration of sequence landscapes where wide, flat minima coexist alongside narrower minima. (Excerpt)

Zhang, Hong-Yu. The Evolution of Genomes and Language. EMBO Reports. 7/8, 2006. EMBO = European Molecular Biology Organization. A brief note, but one which cites a growing convergence between the “information sciences” of genetics and linguistics. Of especial interest is the author’s comparison with historic inscriptions of Chinese characters.

Zolyan, Suren. From Matter to Form: The Evolution of the Genetic Code as Semio-poiesis. Semiotica. March, 2022. A senior Russian linguist (search) considers how a better perception of nature’s deeply pervasive, self-similar, genomic procreativity can be attained by a turn to and inclusion of an informative biosemiotic essence. We address issues of description of the origin and evolution of the genetic code from a semiotics standpoint. Developing the concept of code-poiesis introduced by (Marcello) Barbieri, a new idea of semio-poiesis is proposed. Such a recursive auto-referential processing of semiotic system could become a form of organization of the bio-world when notions of meaning are introduced into it. The description of the genetic code as a semiotic system (grammar and vocabulary) allows us to apply an internal reconstruction on the basis of heterogeneity so to explicate forms of coding and textualization. (Abstract excerpt)

Zolyan, Suren. On the Context-Sensitive Grammar of the Genetic Codes. Biosystems. Volume 206, 2021. Into the 2020s, an Immanuel Kant Baltic Federal University, Kaliningrad, Russia senior linguist can identify deeper affinities between these two prime phases of nature’s generative and descriptive expressions.
Further comparisons then involve semiotic communications, alphabetic comparisons, nucleotide profiles and more. By a reflective view, it seems that whatever maelstroms may consume archaic nations (thinking with tanks), worldwise human intellects continue their quest to parse and read our gravid liferature endowment. We address the possibilities of the semiotic description of the genetic information as a dual and self-replicative correspondence between its biochemical substance and organization. Combining the principles of contextual dependence and arbitrariness leads to the conclusion that the genetic code's primary elements (nucleotides) can be considered as not only biochemical constants but as having a grammatical quality and informational content. We thus view the genetic phase as a language consisting of 1) units of the alphabet; 2) a vocabulary that includes meaningful items; and 3) context-sensitive rules for the formation of units based on grammatical categories. (Abstract excerpt)