IV. Ecosmomics: Independent, UniVersal, Complex Network Systems and a Genetic Code-Script Source

3. Iteracy: A Rosetta Ecosmos Textuality

Bibliography on Laws of Language Outside Human Language. http://www.lsi.upc.edu/~rferrericancho/laws_of_language_outside_human_language.htmll. A 2012 resource posted by Ramon Ferrer-i-Cancho, Universitat Politecnica de Catalunya (BarcelonaTech), linguist, for the “Statistical Laws of Language on the Behavior of Other Species, Genomes and Beyond.” Its topical list is Law of Brevity, Menzerath-Altmann Law, Number of Meanings versus Frequency, and Zipf’s Law for Word Frequencies. Search Ferrer-i-Cancho for more contributions.

Physicists' Papers on Natural Language from a Complex Systems Viewpoint. www.pks.mpg.de/mpi-doc/sodyn/physicist-language. A bibliography on this field posted by Eduardo Altmann and Martin Gerlach, director and member of the Dynamical Systems and Social Dynamics group, Max Planck Institute for the Physics of Complex Systems. As of August 2012, it contains over 250 citations from 1989 to 2012. A similar “Physics Papers in Natural Languages” listing on the Mendeley site can also be reached from this page.

Stanisz, Tomasz, et al. Complex Systems Approach to Natural Language. Physics Reports. Volume 1053, 2024. Polish Academy of Sciences system theorists TS, Jarosław Kwapień, and Stanisław Drożdż (search JK, SD) contribute a 2020s complexity science survey with 113 pages and 380 references, after their 2012 paper Physical Approach to Complex Systems in this journal (515/3-4). In this entry they achieve a thorough exegesis of the many ways that our vital conversant literacy strongly exemplifies a wide array of nonlinear dynamical qualities. As the quotes say, a constant fractal-like similarity can be seen to course through writings everywhere. Word-adjacency networks, sentence lengths, word frequency and other aspects can be parsed as an emblematic animate complexity. By a philosophia vista, an ecosmic genesis universe might be well appreciated as a natural narrative we Earthlings seem meant to read, avail and continue. See also Scale in Language by N. J. Enfield in Cognitive Science (47/10, 2023).

The science of complexity studies what rules nature uses to assemble matter and energy into structures and dynamical patterns that cascade through the hierarchy of scales in the Universe. Natural language can mirror such phenomena by its ability to encode and transmit information and thus deserves a central place in the quantitative project. Here we review its basic concepts and identify common as well as specific features of written examples in several Western languages. Three research trends in quantitative linguistics are then covered. The first addresses word frequencies while the second uses methods from time series analysis to study long-range correlations. It turns out that these textual features match signals generated by a fractal or multifractal geometry. In the third part, a network formalism is applied to word-adjacency statistics, co-occurrences and a semantic hierarchy of word associations. (Excerpt)

Scale invariance: There is another common property of complex systems related to the hierarchies. Both the structure and processes are the same over a broad range of spatial and temporal scales. The examples are ubiquitous such as the circulatory and nervous systems, shore coastlines, crystal growth, earthquake magnitudes, turbulent vortices, animal metabolic rates, social wealth distribution, web page links, and so on. Various mechanisms were proposed in an attempt to account for this universality of scale invariance in nature. Self-organization is closely related to critical phenomena, therefore the scale invariance, which is universal for all such phenomena, may in a natural way be among the most important mechanisms leading to the scale invariance in complex systems. (11)

Natural language is a system at the interface between biology and personal interactions. In one way, language is aligned to the principles of human brain function of which it is a product. However, language emerged in a long process of socialization so to facilitate vital exchanges between members of a primitive human group. In regard, language is an emergent phenomenon born out of complex nonlinear interrelations among neurons in many different brain areas. (16)

Natural language is an advanced and efficient achievement of nature and it is for this reason alone that one can expect it to serve as a paradigm of a complex system. Furthermore, because the laws of nature can be so effectively formalized in the mathematical terms, one can also expect that certain characteristics of natural language can be described within this framework. Indeed, as the present review documents various aspects of linguistic patterns and organization can be grasped using mathematical tools ranging from basic methods of statistics and time series analysis to fractal geometry and network theory. (113)

Altmann, Eduardo and Martin Gerlach. Statistical Laws in Linguistics. arXiv:1502.03296. As another example of a universe to human textual narrative, MPI Physics of Complex Systems researchers are able to treat historic literature as a singular dynamic corpus amenable to the same nonlinear complex principles as everywhere else. The paper was presented at the 2014 Flow Machines Workshop: Creativity and Universality in Language in Paris (Google) where an array of similar works add confirmation, such as Fluctuations, Self-similarity, and Universality in Music by Theo Geisel, noted below. This 2010s witness of a cosmos to civilization genesis by virtue of an emergent iteration might as well be cited as scripture (nature) to scribes (people), if we can only learn to read and write.

Zipf's law is just one out of many universal laws proposed to describe statistical regularities in language. Here we review and critically discuss how these laws can be statistically interpreted, fitted, and tested (falsified). The modern availability of large databases of written text allows for tests with an unprecedent statistical accuracy and also a characterization of the fluctuations around the typical behavior. We find that fluctuations are usually much larger than expected based on simplifying statistical assumptions (e.g., independence and lack of correlations between observations). These simplifications appear also in usual statistical tests so that the large fluctuations can be erroneously interpreted as a falsification of the law. Instead, here we argue that linguistic laws are only meaningful (falsifiable) if accompanied by a model for which the fluctuations can be computed (e.g., a generative model of the text). The large fluctuations we report show that the constraints imposed by linguistic laws on the creativity process of text generation are not as tight as one could expect. (Abstract)
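
As a rough illustration of the rank-frequency form of Zipf's law named in this abstract, the following minimal Python sketch counts words and estimates the exponent by a least-squares line on log-log axes. The function names and the tokenizing pattern are assumptions made for illustration, not anything taken from the paper. A plain least-squares fit is also the kind of simplified procedure the authors caution about, so it should be read as a first look rather than a test of the law.

```python
import math
import re
from collections import Counter


def rank_frequency(text):
    """Return word counts sorted from most to least frequent."""
    # Hypothetical tokenizer: lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return sorted(Counter(words).values(), reverse=True)


def zipf_exponent(counts):
    """Estimate the exponent as minus the least-squares slope of
    log(count) against log(rank). Needs at least two ranks."""
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


if __name__ == "__main__":
    sample = "the cat saw the dog and the dog saw the cat"
    counts = rank_frequency(sample)
    print(counts, zipf_exponent(counts))
```

On a real corpus one would expect an exponent near 1, with the large fluctuations around that value that the abstract describes.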

Time-series analyses have revealed the existence of long-range correlations in musical pitch and loudness fluctuations as well as in temporal fluctuations of musical rhythms. This talk investigates the statistical laws underlying these fluctuations and discusses their origins and their role in musical perception. Based on our findings one can make computer generated music sound more human. Audio examples from the Art of Fugue to stochastic music highlight the general role of long range correlations and self-similarity in music for its perception by the information processing in our brains. (Geisel abstract)

Altmann, Eduardo, et al. Beyond Word Frequency: Bursts, Lulls, and Scaling in the Temporal Distributions of Words. PLoS One. 4/11, 2009. Northwestern University scientists from Complex Systems, Linguistics, and Physics departments proceed to find manifest evidence in literary documents of universal dynamical network phenomena. In turn, might one assume cosmic mathematical nature, as Galileo advised four centuries ago, to be essentially textual, read scriptural, in kind?

Background: Zipf’s discovery that word frequency distributions obey a power law established parallels between biological and physical processes, and language, laying the groundwork for a complex systems perspective on human communication. More recent research has also identified scaling regularities in the dynamics underlying the successive occurrences of events, suggesting the possibility of similar findings for language as well. (e7678)

Conclusions/Significance: Recurrence patterns of words are well described by a stretched exponential distribution of recurrence times, an empirical scaling that cannot be anticipated from Zipf’s law. Because the use of words provides a uniquely precise and powerful lens on human thought and activity, our findings also have implications for other overt manifestations of collective human dynamics. (e7678)

Research on the distribution of time intervals between successive occurrences of events has revealed correspondences between natural phenomena on the one hand [1,2] and social activities on the other hand [3–5]. These studies consistently report bursty deviations both from random and from regular temporal distributions of events [6]. Taken together, they suggest the existence of a dynamic counterpart to the universal scaling laws in magnitude and frequency distributions [7–11]. Language, understood as an embodied system of representation and communication [12], is a particularly interesting and promising domain for further exploration, because it both epitomizes social activity, and provides a medium for conceptualizing natural and biological reality. (e7678)
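
The recurrence times discussed in these quotes can be sketched in a few lines of Python: measure the gaps, in tokens, between successive occurrences of a word. The summary statistic used here, the burstiness coefficient (sigma - mu) / (sigma + mu), is a swapped-in measure chosen for brevity and is an assumption of this sketch; it is not the stretched exponential fit reported in the paper. The function names are hypothetical.

```python
import statistics


def recurrence_times(tokens, word):
    """Gaps, counted in tokens, between successive occurrences of word."""
    positions = [i for i, token in enumerate(tokens) if token == word]
    return [later - earlier for earlier, later in zip(positions, positions[1:])]


def burstiness(intervals):
    """Burstiness coefficient (sigma - mu) / (sigma + mu).

    -1 means perfectly regular spacing, values near 0 resemble a random
    (Poisson-like) process, and positive values indicate bursts and lulls.
    Uses the population standard deviation; needs at least one interval.
    """
    mu = statistics.mean(intervals)
    sigma = statistics.pstdev(intervals)
    return (sigma - mu) / (sigma + mu)


if __name__ == "__main__":
    tokens = "a b a c c a d d d d d a".split()
    gaps = recurrence_times(tokens, "a")
    print(gaps, burstiness(gaps))
```

Comparing this value for a word in the original text and in a shuffled copy of the same tokens gives a quick sense of how far its usage departs from random placement.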

Altmann, Eduardo, et al. Generalized Entropies and the Similarity of Texts. Journal of Statistical Mechanics. 014002, 2017. MPI Physics of Complex Systems researchers proceed to reconceive literary corpora by way of universal nonlinear principles, often using the phrase Natural Language. If our speech and script indeed have a deep rooting in dynamic physical structures (see Feistel), might we ask what script is a genesis cosmos written in? And could its natural genetic code be known as Genlish or Genelish?

We show how generalized Gibbs–Shannon entropies can provide new insights on the statistical properties of texts. The universal distribution of word frequencies (Zipf's law) implies that the generalized entropies, computed at the word level, are dominated by words in a specific range of frequencies. Here we show that this is the case not only for the generalized entropies but also for the generalized (Jensen–Shannon) divergences, used to compute the similarity between different texts. This finding allows us to identify the contribution of specific words (and word frequencies) for the different generalized entropies and also to estimate the size of the databases needed to obtain a reliable estimation of the divergences. We test our results in large databases of books (Google n-gram database) and scientific papers (Web of Science). (Abstract)
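
A minimal Python sketch of a generalized entropy and the matching Jensen-Shannon type divergence between two texts may help fix ideas. It assumes the common order-alpha form H_alpha(p) = (sum_i p_i^alpha - 1) / (1 - alpha), which reduces to the Gibbs-Shannon entropy as alpha approaches 1; whether this matches the paper's exact definition and normalization is an assumption of this sketch, and the function names are hypothetical.

```python
import math
from collections import Counter


def frequencies(tokens):
    """Relative word frequencies of a token list."""
    total = len(tokens)
    return {word: count / total for word, count in Counter(tokens).items()}


def generalized_entropy(p, alpha):
    """Order-alpha entropy (assumed form); alpha == 1 gives Shannon entropy."""
    if alpha == 1:
        return -sum(x * math.log(x) for x in p.values() if x > 0)
    return (sum(x ** alpha for x in p.values()) - 1.0) / (1.0 - alpha)


def generalized_jsd(p, q, alpha):
    """Entropy of the equal-weight mixture minus the mean entropy of p and q."""
    vocabulary = set(p) | set(q)
    mixture = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocabulary}
    return (generalized_entropy(mixture, alpha)
            - 0.5 * generalized_entropy(p, alpha)
            - 0.5 * generalized_entropy(q, alpha))


if __name__ == "__main__":
    text_a = frequencies("the cat sat on the mat".split())
    text_b = frequencies("the dog sat on the log".split())
    for alpha in (0.5, 1, 2):
        print(alpha, generalized_jsd(text_a, text_b, alpha))
```

Varying alpha shifts the weight between rare and frequent words, which is the effect the abstract describes when it says the measures are dominated by words in a specific range of frequencies.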

Altmann, Eduardo, et al. On the Origin of Long-Range Correlations in Texts. Proceedings of the National Academy of Sciences. 109/11582, 2012. MPI Physics of Complex Systems and University of Bologna scientists parse the corpora of human writings and uncover a distinctive nonlinear hierarchical patterning. It is then alluded in closing (second quote) that such literatures can exemplify nature’s recurrent complexities, so as to reveal a deep affinity between languag(om)e and genome.

The complexity of human interactions with social and natural phenomena is mirrored in the way we describe our experiences through natural language. In order to retain and convey such a high dimensional information, the statistical properties of our linguistic output has to be highly correlated in time. An example are the robust observations, still largely not understood, of correlations on arbitrary long scales in literary texts. In this paper we explain how long-range correlations flow from highly structured linguistic levels down to the building blocks of a text (words, letters, etc.). By combining calculations and data analysis we show that correlations form a bursty sequence of events once we approach the semantically relevant topics of the text. The mechanisms we identify are fairly general and can be equally applied to other hierarchical settings. (Abstract)

Apart from these applications, more fundamental extensions of our results should: (i) consider the mutual information and similar entropy-related quantities, which have been widely used to quantify long-range correlations; (ii) go beyond the simplest case of the two point autocorrelation function and consider multi-point correlations or higher order entropies, which are necessary for the complete characterization of the correlations of a sequence; and (iii) consider the effect of non-stationarity on higher levels, which could cascade to lower levels and affect correlations properties. Finally, we believe that our approach may help to understand long-range correlations in any complex system for which an hierarchy of levels can be identified, such as human activities and DNA sequences. (11587)

Amancio, Diego. Probing the Topological Properties of Complex Networks Modeling Short Written Texts. PLoS One. Online February, 2015. We cite the latest array of papers by the University of Sao Paulo physicist as he continues to discern the many ways that even human literature can be treated as a dynamic complex system by statistical, condensed matter physical methods, similar to every other domain. See also his A Complex Network Approach to Stylometry (this journal, August 2015) and Network Analysis of Named Interactions in Written Texts at arXiv:1509.05281.

Amancio, Diego Raphael, et al. Identification of Literary Movements Using Complex Networks to Represent Texts. New Journal of Physics. 14/4, 2012. In a paper cited as Computational Physics, Statistical Physics and Nonlinear Systems, São Carlos University of São Paulo, Brazil, researchers find the historical corpus of literature to instantiate the characteristic format and exposition of scale-free network topologies. See also in the same journal (11/12, 2009) “The Meta Book and Size-Dependent Properties of Written Language” by Sebastian Bernhardsson, et al, for a similar sense of humankind’s transcriptions as if a single dynamic text. Search "Amancio" in arXiv for more similar papers.

The use of statistical methods to analyze large databases of text has been useful in unveiling patterns of human behavior and establishing historical links between cultures and languages. In this study, we identified literary movements by treating books published from 1590 to 1922 as complex networks, whose metrics were analyzed with multivariate techniques to generate six clusters of books. The latter correspond to time periods coinciding with relevant literary movements over the last five centuries. The most important factor contributing to the distinctions between different literary styles was the average shortest path length, in particular the asymmetry of its distribution. Furthermore, over time there has emerged a trend toward larger average shortest path lengths, which is correlated with increased syntactic complexity, and a more uniform use of the words reflected in a smaller power-law coefficient for the distribution of word frequency. Changes in literary style were also found to be driven by opposition to earlier writing styles, as revealed by the analysis performed with geometrical concepts. The approaches adopted here are generic and may be extended to analyze a number of features of languages and cultures. (Abstract)
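
The average shortest path length singled out in this abstract can be illustrated with a small Python sketch that links adjacent words and measures distances by breadth-first search. It makes simplifying assumptions that are not taken from the paper: the network is undirected and unweighted, no stopword removal or lemmatization is applied, and the average runs only over pairs of words that are connected. The function names are hypothetical.

```python
from collections import defaultdict, deque


def adjacency_network(tokens):
    """Undirected word-adjacency graph: link words that appear side by side."""
    graph = defaultdict(set)
    for left, right in zip(tokens, tokens[1:]):
        if left != right:
            graph[left].add(right)
            graph[right].add(left)
    return dict(graph)


def average_shortest_path(graph):
    """Mean breadth-first distance over all ordered pairs of connected nodes."""
    total = 0
    pairs = 0
    for source in graph:
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
        total += sum(distance.values())
        pairs += len(distance) - 1
    return total / pairs if pairs else 0.0


if __name__ == "__main__":
    tokens = "the quick fox saw the lazy dog and the dog saw the fox".split()
    print(average_shortest_path(adjacency_network(tokens)))
```

Computing this value for books from different periods would be a first step toward the kind of comparison the abstract reports, though the study itself combines several network metrics with multivariate techniques.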

Amancio, Diego, et al. Complex Network Analysis of Language Complexity. Europhysics Letters. 100/5, 2012. As the Abstract notes, University of Sao Paulo mathematicians find a deep affinity, across a wide expanse, between physical phenomena and nonlinear linguistic discourse. At once, we might muse about what this says of cosmic nature, which seems to be seeking a voice and vision through us.

Methods from statistical physics, such as those involving complex networks, have been increasingly used in the quantitative analysis of linguistic phenomena. In this paper, we represented pieces of text with different levels of simplification in co-occurrence networks and found that topological regularity correlated negatively with textual complexity. The complex networks metrics were treated with multivariate pattern recognition techniques, which allowed us to distinguish between original texts and their simplified versions. For each original text, two simplified versions were generated manually with increasing number of simplification operations. As expected, distinction was easier for the strongly simplified versions, where the most relevant metrics were node strength, shortest paths and diversity. Also, the discrimination of complex texts was improved with higher hierarchical network metrics, thus pointing to the usefulness of considering wider contexts around the concepts. Though the accuracy rate in the distinction was not as high as in methods using deep linguistic knowledge, the complex network approach is still useful for a rapid screening of texts whenever assessing complexity is essential to guarantee accessibility to readers with limited reading ability. (Abstract)

Arias-Gonzalez, Ricardo. Writing, Proofreading and Editing in Information Theory. Entropy. 20/5, 2018. In a suitable paper for this Rosetta Cosmos section, an IMDEA Nanoscience Institute, Madrid researcher parses and draws out a literary analogy across a wide range of natural, organic, computational and genetic systems. See also by the author Information Management in DNA Replication Modeled by Directional Stochastic Chains (145/185103, 2016) and Thermodynamic Framework for Information in Nanoscale Systems (147/205101, 2017) in the Journal of Chemical Physics. In regard, we seem to be just realizing a natural uniVerse which is indeed textual in kind, written in a genetic scriptome which we cosmic curators are meant to read and intentionally continue forward.

Information is a physical entity amenable to be described by an abstract theory. The concepts associated with the creation and post-processing of the information have not, however, been mathematically established, despite being broadly used in many fields of knowledge. Here, inspired by how information is managed in biomolecular systems, we introduce writing, entailing any bit string generation, and revision, as comprising proofreading and editing, in information chains. Our formalism expands the thermodynamic analysis of stochastic chains made up of material subunits to abstract strings of symbols. We introduce a non-Markovian treatment of operational rules over the symbols of the chain that parallels the physical interactions responsible for memory effects in material chains. Our theory underlies any communication system, ranging from human languages and computer science to gene evolution. (Abstract)

This work represents a round trip: from information theory to stochastic thermodynamics, non-Markovian dynamics and molecular biophysics, and back. This trip affords for the first time the introduction of writing, proofreading and editing in information theory, with a sufficient degree of conceptualization to be used in social and natural sciences, including the physics of stochastic systems and nanoscale engineering. Our formalism enables semantic roots to bit sequences, which are important in gene evolution analysis and in context-sensitive error correction algorithms for computational linguistics, currently based on statistical occurrences. (9)

Arsiwalla, Xerxes, et al. Connectomics to Semantomics: Addressing the Brain’s Big Data Challenge. Procedia Computer Science. 53/48, 2015. An eight member team of University of Pompeu Fabra, Barcelona cognitive scientists seek to integrate these cerebral and linguistic phases so to better handle and avail the flood of digital information. Of much interest is a perception of how our literature can be seen to possess a genomic quality, which then melds with a similar neural mode. See also The Global Dynamic Complexity of the Human Brain by Arsiwalla and Paul Verschure (search). As in many current papers, a seamless transference is growing across natural and social realms from cosmome to algorithome, neurome to languagome. Such a universal, independent occurrence just being found everywhere bodes well for a salutary worldwise discovery.
