Multilayer Network of Language: a Unified
Framework for Structural Analysis of Linguistic
Domagoj Margan, Ana Meštrović, Sanda Martinčić-Ipšić
Department of Informatics,
University of Rijeka,
Radmile Matejčić 2, 51000 Rijeka, Croatia
If we look for the most frequent word combinations, we are considering only one layer, the word layer, so to speak. Beneath it lie layers that this approach hides: syllables and letters, the subword layers. The point is that the frequency of correlations between syllables somehow affects the frequency of correlations between words.
At the level of common sense: if the last syllable of one word combines with the first syllable of the next so that the pair is hard to pronounce, then such words will occur next to each other less often, even though they seem to fit together in meaning.
Trivial, but true.
These findings reveal a variety of new and thrilling questions which will open
new paths for future research in network linguistics. Although, of course, that part is hardly likely.
Discussion and Conclusion
The presented findings show that standard network measures on isolated layers
exhibit no substantial differences across layers, only slight variations between
word and subword levels. However, if we compare the structural differences
across the examined languages, there are indications of different principles in
their organization. For instance, English is characterized by higher clustering,
with the exception of the syllabic layer. The English syllabic layer has 54
components, while the Croatian one has 17, which is reflected in the low
clustering coefficient of English syllables. This is caused by the high
flectivity of Croatian, where many words share a suffix (the last syllable),
which decreases the number of components and increases the clustering
coefficient. This observation raises a question: which properties will the
morpheme language subsystem expose once it is incorporated into a multilayer
language framework?
Even a standard distribution analysis is not sufficient to gain deeper insight
into the mutual influences between subsystems of language. The (in-/out-)degree
and strength distributions of the word-level layers overlap because they
reflect the same word frequencies from the same data source. Therefore, the
standard approach to studying the structure of linguistic networks showed no
discrepancies among layers. However, the (in-/out-)selectivity values are
potentially capable of quantifying differences, and thus of revealing the
interplay among the layers.
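As an illustration of the selectivity measure mentioned above, the following is a minimal sketch that assumes the usual definition of selectivity as node strength divided by node degree, computed separately for incoming and outgoing links. The edge-list input format and the function name `selectivity` are hypothetical choices made for this example, not taken from the paper.

```python
from collections import defaultdict


def selectivity(edges):
    """Return {node: (in_selectivity, out_selectivity)}.

    edges: iterable of (source, target, weight) triples; this input format
    is an assumption, and each (source, target) pair is expected only once.
    Selectivity is taken as strength / degree, and 0.0 when the degree is 0.
    """
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    in_str, out_str = defaultdict(float), defaultdict(float)
    nodes = set()
    for src, dst, weight in edges:
        nodes.update((src, dst))
        out_deg[src] += 1        # one more outgoing link for src
        out_str[src] += weight   # outgoing strength accumulates link weights
        in_deg[dst] += 1
        in_str[dst] += weight
    result = {}
    for node in nodes:
        e_in = in_str[node] / in_deg[node] if in_deg[node] else 0.0
        e_out = out_str[node] / out_deg[node] if out_deg[node] else 0.0
        result[node] = (e_in, e_out)
    return result


# Toy word-level layer: weights stand for co-occurrence counts.
toy_edges = [("the", "cat", 3), ("the", "dog", 1),
             ("cat", "sat", 2), ("dog", "sat", 2)]
sel = selectivity(toy_edges)
```

Two nodes with the same degree can differ in selectivity when their links carry different weights, which is why the measure can separate layers whose degree distributions coincide.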
The inter-layer degree and strength correlations suggest that the CO-SHU layers
are more related than the CO-SIN and SIN-SHU pairs, because Zipf's law is
preserved during shuffling (reflecting the utilization of the same data source).
In-distributions for syntax layers in both languages have higher values than the
corresponding out-distributions, and generally SIN is less inter-correlated than
the CO and SHU layers. The inter- and intra-layer correlations in the multilayer
language network suggest the manifestation of different governing principles in
the syntax structure of the examined languages. The interesting part is that this
is the first observable indication of differences between languages manifested
in a multilayer analysis framework, which encouraged a deeper investigation. In
addition, the selectivity distributions (regardless of side, layer or language)
are not correlated, supporting the potential of selectivity as a measure capable
of quantifying structural differences across language subsystems. Moreover,
Croatian exhibits higher correlations than English in general.
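The inter-layer degree correlation discussed above can be sketched as follows. This is a rough illustration only: it assumes that the correlation is a Pearson coefficient computed over the nodes shared by two layers, which may differ from the coefficient actually used in the study. The function names and the toy layers are hypothetical.

```python
import math


def degrees(edges):
    """Total degree of every node in an unweighted edge list of (src, dst) pairs."""
    deg = {}
    for src, dst in edges:
        deg[src] = deg.get(src, 0) + 1
        deg[dst] = deg.get(dst, 0) + 1
    return deg


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; NaN if one is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def interlayer_degree_correlation(edges_a, edges_b):
    """Correlate the degrees of the nodes that appear in both layers."""
    deg_a, deg_b = degrees(edges_a), degrees(edges_b)
    shared = sorted(set(deg_a) & set(deg_b))
    return pearson([deg_a[n] for n in shared], [deg_b[n] for n in shared])


# Toy layers over the same four words (hypothetical data).
layer_co = [("a", "b"), ("b", "c"), ("c", "d"), ("b", "d")]
layer_sin = [("a", "b"), ("b", "c"), ("b", "d")]
r = interlayer_degree_correlation(layer_co, layer_sin)
```

The same pattern applies to strength or selectivity: replace `degrees` with a function that returns the per-node quantity of interest.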
The examination of the word-level layers' overlap reveals additional insights
into the mutual interplay between the layers. The weighted overlap (WO) provides
a thorough insight into the intersection of links between network layers. It seems
that WO is more appropriate for approximating the overlaps of layers in weighted
networks than the commonly employed Jaccard measure. As expected, the CO-SIN
layers overlap more than the shuffled pairs, and Croatian syntax is better
captured through word co-occurrences than English syntax. The preserved weights
on intersected links indicate that around 10% of the co-occurrence frequencies
are not consistent with overlapped syntax dependencies. The proposed measure
of preserved weighted overlap seems adequate to quantify the similarity of
word-level layers in weighted and directed multilayer networks of language.
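To make the contrast between an unweighted and a weighted overlap concrete, here is a small sketch. The Jaccard measure on link sets is standard. The weighted variant shown is a generic weighted Jaccard (sum of minimum weights over sum of maximum weights) used purely as a stand-in; it is an assumption and should not be read as the WO definition proposed in the paper. Layers are represented as hypothetical dictionaries mapping directed links to weights.

```python
def jaccard_overlap(layer_a, layer_b):
    """Share of directed links present in both layers, ignoring weights."""
    links_a, links_b = set(layer_a), set(layer_b)
    union = links_a | links_b
    return len(links_a & links_b) / len(union) if union else 0.0


def weighted_jaccard_overlap(layer_a, layer_b):
    """Generic weighted Jaccard: sum of min weights / sum of max weights.

    Stand-in for illustration only, not the paper's WO measure.
    A link missing from a layer counts as weight 0 in that layer.
    """
    links = set(layer_a) | set(layer_b)
    shared = sum(min(layer_a.get(l, 0), layer_b.get(l, 0)) for l in links)
    total = sum(max(layer_a.get(l, 0), layer_b.get(l, 0)) for l in links)
    return shared / total if total else 0.0


# Toy layers: {(source, target): weight} (hypothetical data).
co_layer = {("a", "b"): 4, ("b", "c"): 2, ("c", "d"): 1}
sin_layer = {("a", "b"): 3, ("b", "c"): 2, ("d", "a"): 1}

j = jaccard_overlap(co_layer, sin_layer)
w = weighted_jaccard_overlap(co_layer, sin_layer)
```

In this toy case the weighted value is higher than the unweighted one because the shared links carry most of the weight, which is the kind of information a purely set-based measure discards.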
The analysis of the subword layers reveals that the syllabic layer plays an
important role in the manifestation of principles governing the construction of
the word layer, which differs between the examined languages. The graphemic
layers, on the other hand, share characteristics that reflect the high density
of the graphemic networks (almost complete graphs in both languages).
The results of the multilayer language analysis manifest different driving
principles beneath the co-occurrence, shuffled, syntactic, syllabic and graphemic
layers, which was not obvious from the analysis of isolated layers. In order
to obtain deeper insight into these relations we utilize the analysis of motifs,
which reveals a close topological structure in the syntactic and syllabic layers
of both languages. The correlations of the motifs' frequencies are more
pronounced in Croatian. The triad significance profiles (TSP) are correlated
between syntax and syllables regardless of the language, while English
additionally exhibits a correlation between the co-occurrence and syntax layers.
It seems that the observed TSP correlations reflect a property of Croatian, its
free word order, which causes different characterizations of the co-occurrence
and syntax layers. Moreover, the high flectivity of Croatian is reflected in
many suffixes realized by syllables. Therefore, the structure of layers also
reflects the morphological properties inherent to the language, which we should
examine more deeply in
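For readers unfamiliar with triad significance profiles, the sketch below assumes the commonly used construction: each triad type gets a z-score comparing its count in the real network with its counts in randomized networks, and the vector of z-scores is normalized to unit length. The input format, the use of the population standard deviation, and the toy counts are assumptions made for this example; counting the triads themselves is left out.

```python
import math
import statistics


def triad_significance_profile(real_counts, random_counts):
    """Normalized z-score vector over triad types.

    real_counts: list with one count per triad type in the real network.
    random_counts: list of such lists, one per randomized network.
    A triad type with zero spread in the random ensemble gets z = 0.
    """
    z_scores = []
    for i, n_real in enumerate(real_counts):
        sample = [counts[i] for counts in random_counts]
        mean = statistics.mean(sample)
        std = statistics.pstdev(sample)  # population std: an assumption
        z_scores.append((n_real - mean) / std if std > 0 else 0.0)
    norm = math.sqrt(sum(z * z for z in z_scores))
    return [z / norm if norm > 0 else 0.0 for z in z_scores]


# Toy example with three triad types and three randomized networks.
real = [10, 4, 7]
randomized = [[6, 5, 7], [8, 3, 9], [7, 4, 5]]
tsp = triad_significance_profile(real, randomized)
```

Because the profiles are normalized, two layers of very different size can be compared directly, for example by correlating their profiles.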
Our findings are in line with previous observations in language networks
research. For instance, Ferrer i Cancho reports that the amount of syntactically
incorrect links in co-occurrence networks can reach as high as 70%,
and elaborates: "About 90% of syntactic relationships take place at a distance
lower or equal than two, but word co-occurrence networks lack a linguistically
precise definition of link and fail in capturing the characteristic long-distance
correlations of words in sentences." This adequately explains the driving
principle of the CO-SIN relationships which we have confirmed in this research.
Still, an explanation of the linguistic grounding for the SIN-SYL relationships
remains an open challenge.
Our results strongly suggest that some properties are inherent to the
word-level layers but not to the subword layers, while others are inherent
to the word-subword relations. More precisely, it seems that syntax and syllables
exhibit influences of the same linguistic phenomena.
Conclusion. In this research we use the multilayer network framework to explore
the interactions of various language subsystems. Multilayer networks are
constructed from five variations of the same original text: three on the word
level (syntax, co-occurrence and its shuffled counterpart) and two on the
subword level (syllables and graphemes). The analysis and comparison of layers
at the word and subword levels is employed in order to determine the mechanism
of mutual interactions between different linguistic units.
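As a rough illustration of how two of the word-level layers could be derived from the same text, the sketch below builds a directed, weighted co-occurrence layer and a shuffled counterpart. It assumes that links connect immediately adjacent tokens and that shuffling permutes the whole token sequence; both are simplifying assumptions, and the function names are hypothetical.

```python
import random
from collections import Counter


def cooccurrence_layer(tokens):
    """Directed, weighted links between adjacent tokens: {(w1, w2): count}."""
    return Counter(zip(tokens, tokens[1:]))


def shuffled_layer(tokens, seed=0):
    """Co-occurrence layer of a randomly permuted copy of the token sequence.

    Shuffling keeps every word's frequency but destroys word order.
    """
    rng = random.Random(seed)
    permuted = list(tokens)
    rng.shuffle(permuted)
    return cooccurrence_layer(permuted)


tokens = "the cat sat the cat ran".split()
co = cooccurrence_layer(tokens)
shu = shuffled_layer(tokens)
```

Since the shuffled text contains exactly the same words, word frequencies are identical in both layers, and only the link structure changes.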
The presented findings corroborate that the multilayer framework can meet
the demands of expressing the complex structure of language. According to these
results, one can notice substantial differences between the network structures
of different language layers, which remain hidden during the exploration of an
isolated layer, regardless of the modeled language (e.g. Croatian or English).
Therefore, it is important to include all language layers simultaneously in
order to capture all language characteristics in a systematic exploration.
The multilayer network framework is a powerful, consistent and systematic
approach to modeling several linguistic subsystems simultaneously and to
providing a more general view of language. The word-level layers can be
represented as multiplex networks (the coupled links have 1:1 or 0:1
inter-connections), while the connections between word and subword layers are
not coupled (they have N:M inter-connections). Hence, defining a unified
theoretical model for multilayer language networks is essential for further
endeavors in the research of
These findings reveal a variety of new and thrilling questions which will open
new paths for future research in network linguistics. To conclude, we are at
the very beginning of an exciting and challenging pursuit. Hence, our future
research plans involve: exploring the relationships of other language subsystems
(e.g. morphological, phonetic), defining a theoretical model capable of
capturing all structural variations of the relationships between language
subsystems, and eventually explaining the governing principles of mutual
interactions and conceptual universalities in natural languages.