II. Planetary Prodigy: An Emergent Sapiensphere Learns on Her/His Own
4. A Rosetta Cosmos Literacy
Bibliography on Laws of Language Outside Human Language.
A 2012 resource posted by the linguist Ramon Ferrer-i-Cancho, Universitat Politecnica de Catalunya (BarcelonaTech), for the “Statistical Laws of Language on the Behavior of Other Species, Genomes and Beyond.” Its topical list is Law of Brevity, Menzerath-Altmann Law, Number of Meanings versus Frequency, and Zipf’s Law for Word Frequencies. Search Ferrer-i-Cancho for more contributions.
Physicists' Papers on Natural Language from a Complex Systems Viewpoint. www.pks.mpg.de/mpi-doc/sodyn/physicist-language. A bibliography on this field posted by Eduardo Altmann and Martin Gerlach, director and member of the Dynamical Systems and Social Dynamics group, Max Planck Institute for the Physics of Complex Systems. As of August 2012, it contains over 250 citations from 1989 to 2012. A similar “Physics Papers in Natural Languages” listing on the Mendeley site can also be reached from this page.
Altmann, Eduardo and Martin Gerlach. Statistical Laws in Linguistics. arXiv:1502.03296. As another example of a universe to human textual narrative, MPI Physics of Complex Systems researchers are able to treat historic literature as a singular dynamic corpus amenable to the same nonlinear complex principles as everywhere else. The paper was presented at the 2014 Flow Machines Workshop: Creativity and Universality in Language in Paris (Google), where an array of similar works adds confirmation, such as Fluctuations, Self-similarity, and Universality in Music by Theo Geisel, noted below. This 2010s witness of a cosmos to civilization genesis by virtue of an emergent iteration might as well be cited as scripture (nature) to scribes (people), if we can only learn to read and write.
Altmann, Eduardo, et al. Beyond Word Frequency: Bursts, Lulls, and Scaling in the Temporal Distributions of Words. PLoS One. 4/11, 2009. Northwestern University scientists from Complex Systems, Linguistics, and Physics departments proceed to find manifest evidence in literary documents of universal dynamical network phenomena. In turn, might one assume cosmic mathematical nature, as Galileo advised four centuries ago, to be essentially textual, read scriptural, in kind?
Background: Zipf’s discovery that word frequency distributions obey a power law established parallels between biological and physical processes, and language, laying the groundwork for a complex systems perspective on human communication. More recent research has also identified scaling regularities in the dynamics underlying the successive occurrences of events, suggesting the possibility of similar findings for language as well. (e7678)
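As a brief illustration of the power law mentioned in this passage, the sketch below ranks word counts and estimates a Zipf exponent by ordinary least squares on log-log axes. The function names are hypothetical and the fitting method is our assumption, not the procedure used in the paper; a least-squares fit is only a rough estimate, and maximum-likelihood methods are generally preferred for serious work.

```python
import math
from collections import Counter


def rank_frequency(words):
    """Return word counts sorted from most to least frequent."""
    return sorted(Counter(words).values(), reverse=True)


def zipf_exponent(frequencies):
    """Estimate s in f(r) ~ r**(-s) by least squares on log-log axes.

    Assumes at least two ranks and strictly positive frequencies.
    """
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(freq) for freq in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The fitted slope is negative for a decaying law; report its magnitude.
    return -sxy / sxx


# Synthetic check: frequencies built to follow f(r) = 1000 / r exactly.
synthetic = [1000.0 / rank for rank in range(1, 51)]
estimated = zipf_exponent(synthetic)
```

On the synthetic frequencies the estimate comes out very close to 1, the value classically associated with Zipf's law for word frequencies.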
Altmann, Eduardo, et al. Generalized Entropies and the Similarity of Texts. Journal of Statistical Mechanics. 014002, 2017. MPI Physics of Complex Systems researchers proceed to reconceive literary corpora by way of universal nonlinear principles, often using the phrase Natural Language. If our speech and script indeed have a deep rooting in dynamic physical structures (see Feistel), might we ask what script a genesis cosmos is written in? And could its natural genetic code be known as Genlish or Genelish?
We show how generalized Gibbs–Shannon entropies can provide new insights on the statistical properties of texts. The universal distribution of word frequencies (Zipf's law) implies that the generalized entropies, computed at the word level, are dominated by words in a specific range of frequencies. Here we show that this is the case not only for the generalized entropies but also for the generalized (Jensen–Shannon) divergences, used to compute the similarity between different texts. This finding allows us to identify the contribution of specific words (and word frequencies) for the different generalized entropies and also to estimate the size of the databases needed to obtain a reliable estimation of the divergences. We test our results in large databases of books (Google n-gram database) and scientific papers (Web of Science). (Abstract)
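The quantities named in this abstract can be sketched in a few lines. The code below assumes a common order-alpha form of generalized entropy, H_alpha(p) = (sum_i p_i^alpha - 1) / (1 - alpha), which reduces to the Gibbs-Shannon entropy as alpha approaches 1, and builds a Jensen-Shannon-type divergence from it. The exact definitions in the paper may differ, and all function names here are hypothetical.

```python
import math
from collections import Counter


def word_distribution(text):
    """Relative word frequencies of a whitespace-tokenized, lowercased text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def generalized_entropy(p, alpha):
    """Order-alpha entropy (assumed form); Shannon entropy in nats at alpha = 1."""
    probs = [x for x in p.values() if x > 0]
    if abs(alpha - 1.0) < 1e-12:
        return -sum(x * math.log(x) for x in probs)
    return (sum(x ** alpha for x in probs) - 1.0) / (1.0 - alpha)


def generalized_jsd(p, q, alpha):
    """Entropy of the equal-weight mixture minus the mean entropy of p and q."""
    vocabulary = set(p) | set(q)
    mixture = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocabulary}
    return (generalized_entropy(mixture, alpha)
            - 0.5 * generalized_entropy(p, alpha)
            - 0.5 * generalized_entropy(q, alpha))


text_a = word_distribution("the cat sat on the mat")
text_b = word_distribution("the dog sat on the log")
divergence = generalized_jsd(text_a, text_b, alpha=1.0)
```

Varying `alpha` shifts the weight between frequent and rare words, which is the lever the abstract describes for identifying which frequency range dominates a given divergence.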
Altmann, Eduardo, et al. On the Origin of Long-Range Correlations in Texts. Proceedings of the National Academy of Sciences. 109/11582, 2012. MPI Physics of Complex Systems and University of Bologna scientists parse the corpora of human writings and uncover a distinctive nonlinear hierarchical patterning. It is then suggested in closing (second quote) that such literatures can exemplify nature’s recurrent complexities, so as to reveal a deep affinity between languag(om)e and genome.
The complexity of human interactions with social and natural phenomena is mirrored in the way we describe our experiences through natural language. In order to retain and convey such a high dimensional information, the statistical properties of our linguistic output has to be highly correlated in time. An example are the robust observations, still largely not understood, of correlations on arbitrary long scales in literary texts. In this paper we explain how long-range correlations flow from highly structured linguistic levels down to the building blocks of a text (words, letters, etc.). By combining calculations and data analysis we show that correlations form a bursty sequence of events once we approach the semantically relevant topics of the text. The mechanisms we identify are fairly general and can be equally applied to other hierarchical settings. (Abstract)
Amancio, Diego. Probing the Topological Properties of Complex Networks Modeling Short Written Texts. PLoS One. Online February, 2015. We cite the latest array of papers by the University of Sao Paulo physicist as he continues to discern the many ways that even human literature can be treated as a dynamic complex system by statistical, condensed matter physical methods, similar to every other domain. See also his A Complex Network Approach to Stylometry (this journal, August 2015) and Network Analysis of Named Interactions in Written Texts at arXiv:1509.05281.
Amancio, Diego Raphael, et al. Identification of Literary Movements Using Complex Networks to Represent Texts. New Journal of Physics. 14/4, 2012. In a paper cited as Computational Physics, Statistical Physics and Nonlinear Systems, São Carlos University of São Paulo, Brazil, researchers find the historical corpus of literature to instantiate the characteristic format and exposition of scale-free network topologies. See also in the same journal (11/12, 2009) “The Meta Book and Size-Dependent Properties of Written Language” by Sebastian Bernhardsson, et al, for a similar sense of humankind’s transcriptions as if a single dynamic text. Search "Amancio" in arXiv for more similar papers.
The use of statistical methods to analyze large databases of text has been useful in unveiling patterns of human behavior and establishing historical links between cultures and languages. In this study, we identified literary movements by treating books published from 1590 to 1922 as complex networks, whose metrics were analyzed with multivariate techniques to generate six clusters of books. The latter correspond to time periods coinciding with relevant literary movements over the last five centuries. The most important factor contributing to the distinctions between different literary styles was the average shortest path length, in particular the asymmetry of its distribution. Furthermore, over time there has emerged a trend toward larger average shortest path lengths, which is correlated with increased syntactic complexity, and a more uniform use of the words reflected in a smaller power-law coefficient for the distribution of word frequency. Changes in literary style were also found to be driven by opposition to earlier writing styles, as revealed by the analysis performed with geometrical concepts. The approaches adopted here are generic and may be extended to analyze a number of features of languages and cultures. (Abstract)
Amancio, Diego, et al. Complex Network Analysis of Language Complexity. Europhysics Letters. 100/5, 2012. As the Abstract notes, University of Sao Paulo mathematicians find a deep affinity, across a wide expanse, between physical phenomena and nonlinear linguistic discourse. At once, might we muse on what this says about cosmic nature, which seems to be seeking a voice and vision through us?
Methods from statistical physics, such as those involving complex networks, have been increasingly used in the quantitative analysis of linguistic phenomena. In this paper, we represented pieces of text with different levels of simplification in co-occurrence networks and found that topological regularity correlated negatively with textual complexity. The complex networks metrics were treated with multivariate pattern recognition techniques, which allowed us to distinguish between original texts and their simplified versions. For each original text, two simplified versions were generated manually with increasing number of simplification operations. As expected, distinction was easier for the strongly simplified versions, where the most relevant metrics were node strength, shortest paths and diversity. Also, the discrimination of complex texts was improved with higher hierarchical network metrics, thus pointing to the usefulness of considering wider contexts around the concepts. Though the accuracy rate in the distinction was not as high as in methods using deep linguistic knowledge, the complex network approach is still useful for a rapid screening of texts whenever assessing complexity is essential to guarantee accessibility to readers with limited reading ability. (Abstract)
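To make the co-occurrence network idea concrete, here is a minimal sketch that links words appearing within a small window of each other and then computes the average shortest path length, one of the metrics the abstract highlights. The undirected graph, the window size, and the function names are our assumptions for illustration, not the authors' construction.

```python
from collections import deque


def cooccurrence_network(words, window=1):
    """Undirected graph linking each word to the next `window` words."""
    graph = {}
    for i, word in enumerate(words):
        graph.setdefault(word, set())
        for j in range(i + 1, min(i + 1 + window, len(words))):
            other = words[j]
            if other != word:  # skip self-loops from repeated words
                graph[word].add(other)
                graph.setdefault(other, set()).add(word)
    return graph


def average_shortest_path_length(graph):
    """Mean breadth-first-search distance over all reachable ordered pairs."""
    total = 0
    pairs = 0
    for source in graph:
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
        total += sum(distance.values())
        pairs += len(distance) - 1
    return total / pairs if pairs else 0.0


network = cooccurrence_network("a b c d".split())
path_length = average_shortest_path_length(network)
```

For the four-word chain above the graph is a simple path, so the average shortest path length is 10/6. Real texts repeat words, which adds shortcuts and lowers this value.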
Arias-Gonzalez, Ricardo. Writing, Proofreading and Editing in Information Theory. Entropy. 20/5, 2018. In a suitable paper for this Rosetta Cosmos section, an IMDEA Nanoscience Institute, Madrid researcher parses and draws out a literary analogy across a wide range of natural, organic, computational and genetic systems. See also by the author Information Management in DNA Replication Modeled by Directional Stochastic Chains (145/185103, 2016) and Thermodynamic Framework for Information in Nanoscale Systems (147/205101, 2017) in the Journal of Chemical Physics. In this regard, we seem to be just realizing a natural uniVerse which is indeed textual in kind, written in a genetic scriptome which we cosmic curators are meant to read and intentionally continue forward.
Information is a physical entity amenable to be described by an abstract theory. The concepts associated with the creation and post-processing of the information have not, however, been mathematically established, despite being broadly used in many fields of knowledge. Here, inspired by how information is managed in biomolecular systems, we introduce writing, entailing any bit string generation, and revision, as comprising proofreading and editing, in information chains. Our formalism expands the thermodynamic analysis of stochastic chains made up of material subunits to abstract strings of symbols. We introduce a non-Markovian treatment of operational rules over the symbols of the chain that parallels the physical interactions responsible for memory effects in material chains. Our theory underlies any communication system, ranging from human languages and computer science to gene evolution. (Abstract)
Arsiwalla, Xerxes, et al. Connectomics to Semantomics: Addressing the Brain’s Big Data Challenge. Procedia Computer Science. 53/48, 2015. An eight-member team of University of Pompeu Fabra, Barcelona cognitive scientists seeks to integrate these cerebral and linguistic phases so as to better handle and make use of the flood of digital information. Of much interest is a perception of how our literature can be seen to possess a genomic quality, which then melds with a similar neural mode. See also The Global Dynamic Complexity of the Human Brain by Arsiwalla and Paul Verschure (search). As in many current papers, a seamless transference is growing across natural and social realms from cosmome to algorithome, neurome to languagome. Such a universal, independent occurrence just being found everywhere bodes well for a salutary worldwise discovery.
Ausloos, Marcel. Measuring Complexity with Multifractals in Texts. Chaos, Solitons & Fractals. 45/1349, 2012. The University of Liege systems linguist cleverly treats humanity’s literature as a physical materiality suffused with the same dynamic self-similarities. And by a two-way inference, may one surmise that our written corpus can thus become ensconced within nature’s universality, and moreover that a greater logos creation appears as truly textual testament in kind?
Should quality be almost a synonymous of complexity? To measure quality appears to be audacious, even very subjective. It is hereby proposed to use a multifractal approach in order to quantify quality, thus through complexity measures. A one-dimensional system is examined. It is known that (all) written texts can be one-dimensional nonlinear maps. Thus, several written texts by the same author are considered, together with their translation, into an unusual language, Esperanto, and as a baseline their corresponding shuffled versions. (Abstract)