IV. Ecosmomics: Independent, UniVersal, Complex Network Systems and a Genetic Code-Script Source

2. The Innate Affinity of Genomes, Proteomes and Language

Chiang, David, et al. Grammatical Representations of Macromolecular Structure. Journal of Computational Biology. 13/5, 2006. With co-authors Aravind Joshi and David Searls, this paper contributes to the long convergence of genomes and language, and of genetic and linguistic composition and function, which exemplify the “same core principles.” Upon reflection, the achievement is to realize molecular realms as a literal text, while our written and spoken discourse then becomes biologically instructive in kind, if only we could learn to perceive and read this testament. Since the first application of context-free grammars to RNA secondary structures in 1988, many researchers have used both ad hoc and formal methods from computational linguistics to model RNA and protein structure. We show how nearly all of these methods are based on the same core principles and can be converted into equivalent approaches in the framework of tree-adjoining grammars and related formalisms. (1077)

Chaudhuri, Pramit and Joseph Dexter. Bioinformatics and Classical Literary Study. arXiv:1602.08844. As part of an ongoing program, a Dartmouth College professor of classics and a Harvard University molecular biologist explain how a genomic sequence alignment technique can be applied to parse classic Latin epics such as Vergil’s Aeneid. The literary humanities can thus become amenable to a common “-omics” analysis. As other entries in this new section report, a historic discovery, mostly unbeknownst, of an innately textual, ultimately genomic nature which reaches from cosmome and quantome to our human epitome is dawning in our midst. This paper describes a collaborative project between classicists, quantitative biologists, and computer scientists to apply ideas and methods drawn from the sciences to the study of literature.
A core goal of the project is the use of computational biology, natural language processing, and machine learning techniques to investigate intertextuality, reception, and related phenomena of literary significance. As a case study in our approach, here we describe the use of sequence alignment, a common technique in genomics, to detect intertextuality in Latin literature. Sequence alignment is distinguished by its ability to find inexact verbal parallels, which makes it ideal for identifying phonetic resemblances in large corpora of Latin texts. Although especially suited to Latin, sequence alignment in principle can be extended to many other languages. (Abstract)

Consens, Micaela, et al. To Transformers and Beyond: Large Language Models for the Genome. arXiv:2311.07621. As an example of how far an integrative merger of AI large language models, deep neural nets and genetic sequence techniques had proceeded by 2023, in November of that year nine computational biologists, mainly in Toronto and Munich, posted a 33-page, 124-citation tutorial that introduces these novel capabilities as they open a new frontier. See also BEND: Benchmarking DNA Language Models by F. Marin, et al. (arXiv:2311.12570) for a similar guide. In the rapidly evolving landscape of genomics, deep learning has emerged as a useful tool for tackling complex computational challenges. This review focuses on the transformative role of Large Language Models (LLMs), which are mostly based on the transformer architecture, in genomics. Building on convolutional and recurrent neural network abilities, we explore the strengths and limitations of transformers and other LLMs for genomics. Our paper is meant as a guide for computational biologists and computer scientists interested in LLMs for genomic data. (Excerpt)

Delwiche, Charles. The Genomic Palimpsest: Genomics in Evolution and Ecology. BioScience. 54/11, 2004.
Advances in the sequencing and analysis of complete genomes can now inform the study of populations, interacting organisms and the course of evolution. A clever metaphor is then enlisted whereby the DNA code is seen to have evolved in a way similar to medieval manuscripts. Because good parchment was scarce, earlier texts were partially erased and written over by later, superimposed passages, producing what is known as a palimpsest. Genomes likewise evolve not by adding new genes but by modifying prior ones, whose remnants reflect their history.

Deming, Laura, et al. Genetic Architect: Discovering Genomic Structure with Learned Neural Architectures. arXiv:1605.07156. As the genetic and brain sciences merge and cross-inform, researchers at the UC San Francisco Institute for Human Genetics show how deep learning network algorithms can also effectively parse genome phenomena. An inclusive synthesis then seems underway, from celestial webs (Coutinho) to literature (Rosetta Cosmos), as a cosmos-to-culture anatomy and physiology is just now being realized. Each human genome is a 3 billion base pair set of encoding instructions. Decoding the genome using deep learning fundamentally differs from most tasks, as we do not know the full structure of the data and therefore cannot design architectures to suit it. As such, architectures that fit the structure of genomics should be learned not prescribed. Here, we develop a novel search algorithm, applicable across domains, that discovers an optimal architecture which simultaneously learns general genomic patterns and identifies the most important sequence motifs in predicting functional genomic outcomes. The architectures we find using this algorithm succeed at using only RNA expression data to predict gene regulatory structure, learn human-interpretable visualizations of key sequence motifs, and surpass state-of-the-art results on benchmark genomics challenges. (Abstract)

Dunn, Ian.
Are Molecular Alphabets Universal Enabling Factors for the Evolution of Complex Life? Origins of Life and Evolution of Biospheres. 43/6, 2013. From the CytoCure LLC geneticist and research director comes a description of nucleotides and proteins as akin to alphabetic strings of characters. While a correspondence between genetics and linguistics has been in the offing for some time, Ian Dunn gives it an updated, robust affirmation. Thus “a digital self-organizing complementary primary replicative alphabet” is seen as a universal property of genomic phenomena. And as ever the implication is a textual, inscribed naturome as life’s newly legible script. Terrestrial biosystems depend on macromolecules, and this feature is often considered as a likely universal aspect of life. While opinions differ regarding the importance of small-molecule systems in abiogenesis, escalating biological functional demands are linked with increasing complexity in key molecules participating in biosystem operations, and many such requirements cannot be efficiently mediated by relatively small compounds. It has long been recognized that known life is associated with the evolution of two distinct molecular alphabets (nucleic acid and protein), specific sequence combinations of which serve as informational and functional polymers. In contrast, much less detailed focus has been directed towards the potential universal need for molecular alphabets in constituting complex chemically-based life, and the implications of such a requirement.

Eetemadi, Ameen and Ilias Tagkopoulos. Genetic Neural Networks: An Artificial Neural Network Architecture for Capturing Gene Expression Relationships. Bioinformatics. 35/13, 2019. We cite this entry by UC Davis computer scientists to show how readily these popular analytic methods seem to find similar application everywhere, even, in this case, to parse life’s heredity.
Could such commonality imply that brains and genomes and all else are deeply cerebral, information-bearing, and relatively aware in kind? Results: We present the Genetic Neural Network (GNN), an artificial neural network for predicting genome-wide gene expression given gene knockouts and master regulator perturbations. At its core, the GNN maps existing gene regulatory information in its architecture and it uses cell nodes that have been specifically designed to capture the dependencies and non-linear dynamics that exist in gene networks. Our results argue that GNNs can become the architecture of choice when building predictors of gene expression from the growing corpus of genome-wide transcriptomics data.

Elnaggar, A., et al. ProtTrans: Towards Cracking the Language of Life’s Code Through Self-Supervised Deep Learning and High Performance Computing. IEEE Transactions on Pattern Analysis and Machine Intelligence. 14/8, 2021. Twelve computer scientists, mainly at the Technical University of Munich, explore these 2020 frontiers of new deep learning methods and insights into the natural grammars of the “language of life” (a phrase from the text). A long Abstract cites many computational methods with a facile ability to read, write and take up natural, ecosmomic code-scripts. See also CodeTrans: Towards Cracking the Language of Silicon's Code Through Self-Supervised Deep Learning and High Performance Computing by the same group (arXiv:2104.02443).

Faltynek, Dan, et al. Bases are not Letters: On the Analogy between the Genetic Code and Natural Language by Sequence Analysis. Biosemiotics. Online April, 2019. Palacky University, Olomouc, Czech Republic system scholars DF, Vladimir Matlach, and Ludmila Lackova (search) continue their project to parse an endemic, natural affinity between the prime informative occasions of biochemical nucleotide genomes and human linguistic complexities.
The article deals with the notion of the genetic code and its metaphorical understanding as a “language”. In the traditional view of the language metaphor of the genetic code, combinations of nucleotides are signs of amino acids. Similarly, words combined from letters (speech sounds) represent certain meanings. The language metaphor of the genetic code assumes that the nucleotides stay in the analogy to letters, triples to words and genes to sentences. We propose an application of mathematical linguistic methods on the notion of the genetic code. We provide quantitative analysis (n-gram structure, Zipf’s law) of mRNA strings and natural language texts, along with a representative analysis of DNA, RNA and proteins. Our analysis of mRNA confirms an assumption that the design of the genetic code cannot analogize DNA bases and letters. The notion of the letter is much more appropriate if analogized with triplets or amino acids. (Abstract excerpt)

Fang, Jing-Kai, et al. Divide-and-Conquer Quantum Algorithm for Hybrid de novo Genome Assembly of Short and Long Reads. PRX Life. 2/023006, 2024. We note this contribution by computational geneticists at BGI Research, Shenzhen, China, as a frontier instance of how genetic studies are being taken to a new dimension by virtue of quantum computing capabilities. The evidential result implies that life’s implicate genomic prescription can gain an affinity with this fundamental physical ground. Researchers have begun to apply quantum computing in genome assembly implementation, but the issue of repetitive sequences remains unresolved. Here, we propose a hybrid assembly quantum algorithm using short reads and long reads which utilizes divide-and-conquer strategies to approximate the ground state of a larger Hamiltonian while conserving quantum resources. The convergence speed is improved via the problem-inspired Ansatz based on the known information.
In addition, we verify that entanglement within quantum circuits may accelerate the assembly path optimization. (Excerpt)

Ferrer-i-Cancho, Ramon and Nuria Forns. The Self-Organization of Genomes. Complexity. Online First, March, 2010. As the quote notes, Barcelona researchers contribute to the recent robust affirmation that genetic and linguistic codes are one and the same in their expression of universal complex system dynamics, which, one may add, could then take on the likeness of an independent, mathematical cosmic genotype. Menzerath-Altmann law is a general law of human language stating, for instance, that the longer a word, the shorter its syllables. With the metaphor that genomes are words and chromosomes are syllables, we examine if genomes also obey the law. We find that longer genomes tend to be made of smaller chromosomes in organisms from three different kingdoms: fungi, plants, and animals. Our findings suggest that genomes self-organize under principles similar to those of human language. (Abstract)

Ferrer-i-Cancho, Ramon, et al. The Challenges of Statistical Patterns of Language: The Case of Menzerath’s Law in Genomes. Complexity. Online December, 2012. With coauthors Nuria Forns, Antoni Hernandez-Fernandez, Gemma Bel-Enguix and Jaume Baixeries, Barcelona systems scientists advise that, along with (George Kingsley) Zipf’s law, the law of German linguist Paul Menzerath, which relates the size of a construct in a text or score (such as a word) to the mean size of its constituents (such as syllables), can hold equally well for biomolecular nucleotide genomes. By these lights, another entry is gained to appreciate a deep, parallel affinity between the genetic code and literate languages. The importance of statistical patterns of language has been debated over decades. Although Zipf's law is perhaps the most popular case, recently, Menzerath's law has begun to be involved.
Menzerath's law manifests in language, music and genomes as a tendency of the mean size of the parts to decrease as the number of parts increases in many situations. This statistical regularity emerges also in the context of genomes, for instance, as a tendency of species with more chromosomes to have a smaller mean chromosome size. It has been argued that the instantiation of this law in genomes is not indicative of any parallel between language and genomes because (a) the law is inevitable and (b) noncoding DNA dominates genomes. Here mathematical, statistical, and conceptual challenges of these criticisms are discussed. Two major conclusions are drawn: the law is not inevitable and languages also have a correlate of noncoding DNA. However, the wide range of manifestations of the law in and outside genomes suggests that the striking similarities between noncoding DNA and certain linguistics units could be anecdotal for understanding the recurrence of that statistical law. (Abstract)
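Menzerath's law, as described in the two Ferrer-i-Cancho entries, predicts that wholes with more parts tend to have smaller parts on average; for genomes, species with more chromosomes should show a smaller mean chromosome size. The Python sketch below is a minimal illustration of how such a tendency might be checked, assuming a simple rank correlation (Kendall's tau-a, with no tie correction) between chromosome count and mean chromosome size. The papers' own statistical tests may differ, and the genome figures are invented toy values, not data from either study.

```python
from itertools import combinations


def mean_part_sizes(wholes):
    """For each whole given as (number_of_parts, total_size), return (n, total / n)."""
    return [(n, total / n) for n, total in wholes]


def kendall_tau(xs, ys):
    """Kendall's tau-a rank correlation; tied pairs count as neither concordant nor discordant."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical toy genomes: (chromosome count, genome size in Mb).
# These numbers are invented for illustration, not real species data.
TOY_GENOMES = [(4, 400.0), (8, 480.0), (12, 540.0), (20, 600.0), (30, 660.0)]

pairs = mean_part_sizes(TOY_GENOMES)
counts = [n for n, _ in pairs]
means = [m for _, m in pairs]
tau = kendall_tau(counts, means)
print(f"Kendall tau = {tau:.2f}")  # prints "Kendall tau = -1.00" for this toy data
```

A negative tau is consistent with Menzerath's law. Because mean part size is total size divided by the number of parts, a negative correlation would be arithmetically guaranteed if total size were held fixed; in this toy data the totals grow with chromosome count, so the result is not automatic, which is the kind of distinction at stake in the 2012 abstract's discussion of whether the law is "inevitable."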