Lexical similarity

10/28/2022

Lexical similarity, both between languages and between villages within each language, is surprisingly low. It is particularly surprising that Neko and Nekgini are on average only 61 percent similar, since both belong to the Gusap-Mot language family and many Neko and Nekgini speakers report that they are actually the same language. As a resource for further research that may require dates of a known level of accuracy, the authors of the ASJP study offer a list of ASJP time depths for nearly all the world’s recognized language families and for many subfamilies.

In linguistics, lexical similarity is a measure of the degree to which the word sets of two given languages are similar. A lexical similarity of 1 (or 100%) means a total overlap between vocabularies, whereas 0 means there are no common words. There are different ways to define lexical similarity, and the results vary accordingly.

Short-text similarity deals with determining how close two texts are in meaning, by lexical or semantic criteria. Various short-text similarity approaches have been proposed, based on lexical matching, background semantic knowledge, or a combination of models. A purely lexical model does not capture the actual meaning behind the words. A paper indexed on CiteSeerX presents a new approach for measuring semantic similarity or distance between words and concepts.

Holman et al. (2011) describe a computerized alternative to glottochronology for estimating the time elapsed since parent languages diverged into daughter languages. The method, developed by the Automated Similarity Judgment Program (ASJP) consortium, differs from glottochronology in four major respects: (1) it is automated and thus more objective, (2) it applies a uniform analytical approach to a single database of worldwide languages, (3) it is based on lexical similarity as determined from Levenshtein (edit) distances rather than on cognate percentages, and (4) it provides a formula for date calculation that mathematically recognizes the lexical heterogeneity of individual languages, including parent languages just before their breakup into daughter languages. Automated judgments of lexical similarity for groups of related languages are calibrated with historical, epigraphic, and archaeological divergence dates for 52 language groups. The discrepancies between estimated and calibration dates are on average 29% as large as the estimated dates themselves, a figure that does not differ significantly among language families.
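To make the idea of similarity "as determined from Levenshtein (edit) distances" more concrete, here is a minimal Python sketch. It is not a reimplementation of the ASJP procedure, which involves further steps not described in this post. The word lists are invented toy data, and the function names (`levenshtein`, `normalized_distance`, `wordlist_similarity`) and the choice to divide by the longer word's length are assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer word's length (an assumed normalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def wordlist_similarity(list_a: dict, list_b: dict) -> float:
    """1 minus the mean normalized distance over meanings present in both lists."""
    shared = sorted(set(list_a) & set(list_b))
    if not shared:
        raise ValueError("the two word lists share no meanings")
    mean = sum(normalized_distance(list_a[m], list_b[m]) for m in shared) / len(shared)
    return 1.0 - mean


# Hypothetical word lists for two made-up language varieties (not real data).
variety_a = {"water": "wata", "stone": "ston", "hand": "han"}
variety_b = {"water": "vata", "stone": "stin", "hand": "han"}

print(round(wordlist_similarity(variety_a, variety_b), 3))  # 0.833
```

Comparing words meaning by meaning, rather than counting cognates, is what lets a measure like this run without expert judgment on each word pair.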

Holman, Eric W.; Brown, Cecil H.; Wichmann, Søren; Müller, André; Velupillai, Viveka; Hammarström, Harald; Sauppe, Sebastian; Jung, Hagen; Bakker, Dik; Brown, Pamela; Belyaev, Oleg; Urban, Matthias; Mailhammer, Robert; List, Johann-Mattis; Egorov, Dmitry (2011). Automated Dating of the World’s Language Families Based on Lexical Similarity.

Lexical similarity provides a measure of how similar two texts are, based on the intersection of their word sets, whether the texts are in the same language or in different languages.
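The word-set view of lexical similarity can be sketched in a few lines of Python. This sketch assumes one particular definition, the Jaccard index (shared words divided by all distinct words), which yields 1 for identical vocabularies and 0 when no words are shared. Other definitions exist and give different numbers. The helper names `word_set` and `lexical_similarity` are hypothetical.

```python
import re


def word_set(text: str) -> set:
    """Lowercase the text and collect its distinct word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def lexical_similarity(text_a: str, text_b: str) -> float:
    """Jaccard index of the two word sets (an assumed definition of overlap)."""
    words_a, words_b = word_set(text_a), word_set(text_b)
    union = words_a | words_b
    if not union:
        return 0.0  # neither text contains any words, so nothing can overlap
    return len(words_a & words_b) / len(union)


print(lexical_similarity("the cat sat", "the cat ran"))  # 0.5
```

Because a score like this looks only at surface forms, two texts that say the same thing in different words will still score low.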
