Natural Genesis
A Sourcebook for the Worldwide Discovery of a Creative Organic Universe

IV. Ecosmomics: Independent, UniVersal, Complex Network Systems and a Genetic Code-Script Source

2. The Innate Affinity of Genomes, Proteomes and Language

Wu, Fang, et al. Integration of pre-trained protein language models into geometric deep learning networks. Communications Biology. 6/876, 2023. Westlake University, Hangzhou, China, Yale University, and Tsinghua University, Beijing computational biologists provide another example of this frontier cross-adoption of protein linguistics with AI neural net methods. Our comment for these contributions is that as genetic and metabolic processes are able to be grammatically parsed, so to speak, they gain a common textual basis. As a result, a wide and deep natural narrative is being realized in our midst, written in an ecosmome to geonome code script. See also ProtLLM: An Interleaved Protein-Language LLM with Protein-as-Word Pre-Training by Le Zhuo, et al. at arXiv:2403.07920 for more work in this regard.

Geometric deep learning has achieved much success in defining 3D structures of large biomolecules. Meanwhile, protein language models trained on 1D sequences apply to a broad range of applications. In this work, we integrate the knowledge learned by protein language models into geometric networks and evaluate a variety of protein representation learning benchmarks. The incorporation of protein language knowledge enhances geometric networks’ capacity and can be generalized to complex tasks. (Excerpt)

Wu, Yanying and Quanlong Wang. A Categorical Compositional Distributional Modelling for the Language of Life. arXiv:1902.09303. Oxford University computer neuroscientists treat and parse protein biology in a linguistic manner by way of the title computational method, an insight achieved by the Oxford computer science group (search Bob Coecke). We log it in among concurrent papers which establish this deep affinity between genetic and literary domains, broadly conceived, as a later evolutionary stage of one and the same natural script.

The Categorical Compositional Distributional (DisCoCat) Model is a powerful mathematical method for composing the meaning of sentences in natural languages. Since we can think of biological sequences as the "language of life", here we apply this model to see if we can obtain new insights and a better understanding of life’s language. We choose to focus on proteins as their linguistic features are more prominent as compared with other macromolecules such as DNA or RNA. Thus, we treat each protein as a sentence and its constituent domains as words. The meaning of a word or the sentence is its biological function, and the arrangement of protein domains corresponds to the syntax. Putting all those into the DisCoCat frame, we can "compute" the function of a protein with grammar rules that combine them together. (Abstract excerpts)
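As a rough illustration of the compositional idea in this abstract, the following Python sketch treats one protein domain as a "noun-like" vector over functional features and another as an "adjective-like" matrix that acts on it, which is the simplest DisCoCat-style composition. It is only a toy under stated assumptions: the domain names, the two feature axes and every number are hypothetical and are not taken from the paper, whose actual grammar types and data may differ.

```python
# Toy DisCoCat-style composition: a protein is a "sentence" of domains.
# Every name and number below is a made-up placeholder for illustration.

# Hypothetical functional feature axes that span the "meaning" space.
FUNCTION_AXES = ["binding", "catalysis"]

# "Noun-like" domains (grammar type n): vectors in the meaning space.
HEAD_DOMAINS = {
    "kinase_domain": [0.1, 0.9],
}

# "Adjective-like" domains (grammar type n . n^l): matrices that map
# one meaning vector to another.
MODIFIER_DOMAINS = {
    "regulatory_domain": [[1.0, 0.0],
                          [0.0, 0.5]],  # assumed to halve catalysis
}


def apply_modifier(matrix, vector):
    """Contract a modifier matrix with a meaning vector (matrix times vector)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def protein_meaning(domains):
    """Compose a domain list of the form [modifier, ..., modifier, head].

    The modifier nearest the head is applied first, mirroring how
    adjectives stack on a noun.
    """
    *modifiers, head = domains
    meaning = list(HEAD_DOMAINS[head])
    for name in reversed(modifiers):
        meaning = apply_modifier(MODIFIER_DOMAINS[name], meaning)
    return meaning


if __name__ == "__main__":
    result = protein_meaning(["regulatory_domain", "kinase_domain"])
    print(dict(zip(FUNCTION_AXES, result)))
```

The design choice here is the usual one for such models: word order fixes which tensor indices are contracted, so "syntax" (domain arrangement) determines how the individual "meanings" (domain functions) combine.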

Xiao, Yi, et al. Bridging Text and Molecule: A Survey on Multimodal Frameworks for Molecules. arXiv:2403.13830. Chinese Academy of Sciences AI researchers provide an example of how readily language-based content can be assimilated by computational methods as they are then employed to parse protein linguistics. Altogether a common natural narrative from nucleotides to nouns is being read and written anew.

A recent trend in machine learning and natural language processing is aimed at building multimodal frameworks to jointly model molecules with textual domain knowledge. In this paper, we present the first systematic survey of this integrative endeavor. We focus on advances in text-molecule alignment methods, categorizing current models into two groups based on their architectures and listing relevant pre-training tasks. We next delve into the utilization of large language models and prompting techniques for molecular tasks and present significant applications in drug discovery. (Excerpt)

Zaccagnino, Rocco, et al. Testing DNA Code Words Properties of Regular Languages. Theoretical Computer Science. 608/84, 2015. In a special issue From Computer Science to Biology and Back, we cite this entry by University of Salerno informatics researchers as an example of how DNA nucleotides are finding a common utility across disparate genetic, linguistic and computational domains.

One aspect of DNA Computing is the possibility of using DNA molecules for solving some “complicated” computational problems. In this context, the DNA code word design problem assumes a fundamental role: given a problem encoded in DNA strands and biochemical processes, the final computation is a concatenation of the input DNA strands that must allow us to recover the solution of the given problem in terms of the input (unique decipherability). Thus the initial set of DNA strands must be a code. In addition, it should satisfy some restrictions, called here DNA properties, in order to prevent them from interacting in undesirable ways. So a new interest towards the design of efficient algorithms for testing whether a language X is a code, has arisen from (wet) DNA Computing, but, as far as we know, only when X is a finite set. In this paper we provide an algorithm for testing whether an infinite but regular set of words is a code that avoids some DNA properties among unwanted intermolecular and intramolecular hybridizations. (Abstract)
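For the finite case mentioned in this abstract, unique decipherability can be checked with the classical Sardinas-Patterson test. The Python sketch below is a minimal version of that standard algorithm, not the paper's method: it handles only finite word sets, it does not cover infinite regular languages, and it ignores the hybridization constraints ("DNA properties") that the authors add. The function names are our own.

```python
def dangling_suffixes(prefixes, words):
    """Return every suffix w such that p + w is in `words` for some p in `prefixes`."""
    return {w[len(p):] for p in prefixes for w in words if w.startswith(p)}


def is_code(words):
    """Sardinas-Patterson test for a finite set of words.

    Returns True when every concatenation of words can be split back
    into words in exactly one way (unique decipherability).
    """
    code = set(words)
    if "" in code:
        return False  # the empty word can be inserted anywhere

    # First round: suffixes left over when one code word is a proper
    # prefix of a different code word.
    current = {w[len(p):] for p in code for w in code
               if p != w and w.startswith(p)}
    seen = set()
    while current:
        if current & code:
            return False  # a dangling suffix is itself a code word
        key = frozenset(current)
        if key in seen:
            return True  # the suffix sets repeat without a clash
        seen.add(key)
        current = (dangling_suffixes(code, current)
                   | dangling_suffixes(current, code))
    return True  # no dangling suffixes remain


if __name__ == "__main__":
    print(is_code({"A", "CA", "CCA"}))  # no word is a prefix of another
    print(is_code({"A", "AC", "CA"}))   # "ACA" splits as A+CA and as AC+A
```

The loop always ends because every dangling suffix is a suffix of some code word, so only finitely many suffix sets can occur.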

Zambon, A., et al. Structure of the space of folding protein sequences defined by large language models. Physical Biology. January, 2024. We cite this entry by Center for Complexity and Biosystems, University of Milan researchers as another instance of this mid-2020s cross-integrity of metabolic methods with AI computational network capabilities.

Proteins populate a sequence space whose geometrical structure guides their natural evolution. By way of transformer models, we examine the protein landscape as an effective energy of sequence foldability, an approach similar to optimization methods in machine learning. We then employ a statistical mechanics algorithm to explore regions with high local entropy in relatively flat landscapes. Our work thus combines machine learning and statistical physics so as to provide new insights into the exploration of sequence landscapes where wide, flat minima coexist alongside narrower minima. (Excerpt)

Zhang, Hong-Yu. The Evolution of Genomes and Language. EMBO Reports. 7/8, 2006. EMBO = European Molecular Biology Organization. A brief note, but one which cites a growing convergence between the “information sciences” of genetics and linguistics. Of especial interest is the author’s comparison with historic inscriptions of Chinese characters.

Zolyan, Suren. From Matter to Form: The Evolution of the Genetic Code as Semio-poiesis. Semiotica. March, 2022. A senior Russian linguist (search) considers how a better perception of nature’s deeply pervasive, self-similar, genomic procreativity can be attained by a turn to and inclusion of an informative biosemiotic essence.

We address issues of description of the origin and evolution of the genetic code from a semiotics standpoint. Developing the concept of code-poiesis introduced by (Marcello) Barbieri, a new idea of semio-poiesis is proposed. Such a recursive auto-referential processing of a semiotic system could become a form of organization of the bio-world when notions of meaning are introduced into it. The description of the genetic code as a semiotic system (grammar and vocabulary) allows us to apply an internal reconstruction on the basis of heterogeneity so as to explicate forms of coding and textualization. (Abstract excerpt)

The dual nature – biochemical and informational – of the genetic code and genome presupposes that one should be based on the principle of complementarity for its description (cf. Pattee 2007). As in the case of the wave-particle duality of physical entities, only when taken together will biochemical and informational descriptions represent a comprehensive state of affairs. The duality of genetic information will be represented through the double theoretical description. (18)

In philosophy, poiesis (from Ancient Greek) is the activity in which something is brought into being that did not exist before. It is a combined word for the making or formation of some compound result. The term autopoiesis refers to a system capable of producing and maintaining itself.

Suren Zolyan: National Academy of Sciences of the Republic of Armenia, Yerevan, Armenia; Institute of Scientific Information on Social Sciences of the Russian Academy of Sciences, Moscow, Russia; and Immanuel Kant Baltic Federal University, Kaliningrad, Russia.

Zolyan, Suren. On the Context-Sensitive Grammar of the Genetic Codes. Biosystems. Volume 206, 2021. Into the 2020s, an Immanuel Kant Baltic Federal University, Kaliningrad, Russia senior linguist can identify deeper affinities between these two prime phases of nature’s generative and descriptive expressions. Further comparisons then involve semiotic communications, alphabetic comparisons, nucleotide profiles and more. By a reflective view, it seems that whatever maelstroms may consume archaic nations (thinking with tanks), worldwise human intellects continue their quest to parse and read our gravid liferature endowment.

We address the possibilities of the semiotic description of the genetic information as a dual and self-replicative correspondence between its biochemical substance and organization. Combining the principles of contextual dependence and arbitrariness leads to the conclusion that the genetic code's primary elements (nucleotides) can be considered as not only biochemical constants but as having a grammatical quality and informational content. We thus view the genetic phase as a language consisting of 1) units of the alphabet; 2) a vocabulary that includes meaningful items; and 3) context-sensitive rules for the formation of units based on grammatical categories. (Abstract excerpt)

Zolyan, Suren and Renad Zhdanov. Genome as (hyper)Text: From Metaphor to Theory. Semiotica. 225/1, 2018. Immanuel Kant Baltic Federal University, Kaliningrad and Moscow State Pedagogical University senior scholars present the strongest claim to date of a natural identity between these preeminent generative codes. With a past reference to I. Kant and Johann Goethe, in our age of global communication a true “isomorphism” is evident for these informative processes. A common trait is their code script (Schrödinger) and sign system (semiotic) quality. A conclusion can then be stated: they are two prime manifest exemplars of an inherently literate cosmos. In closing, if we might fully appreciate this identity, a novel beneficial phase of “social genomics” is proposed.

The similarity between language and genetic information transmission has been recognized since molecular genetics was founded. Numerous attempts have been made to use linguistics techniques to decipher protein genes. However, this approach cannot describe a language nor the semantic and textual structures that are decisive for communication. A text should be regarded as an artifact of the creation, conservation and conveyance of information. A general theory should be capable of describing linguistic writings and the process of their structuring, functioning and transformation. A (hyper) text can be considered as a quasi-organism that possesses memory, creative-cognitive characteristics and communicative force, and a cell as a quasi-intelligence capable of manipulating semiotic entities. (Abstract excerpt)
