DC Field | Value | Language |
---|---|---|
dc.contributor.author | Oh, Yoon Mi | - |
dc.contributor.author | Pellegrino, François | - |
dc.date.issued | 2023-09-22 | - |
dc.identifier.issn | 1569-9978 | - |
dc.identifier.uri | https://aurora.ajou.ac.kr/handle/2018.oak/33731 | - |
dc.identifier.uri | https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85174411875&origin=inward | - |
dc.description.abstract | There is high hope that corpus-based approaches to language complexity will contribute to explaining linguistic diversity. Several complexity indices have consequently been proposed to compare different aspects among languages, especially in phonology and morphology. However, their robustness against changes in corpus size and content hasn’t been systematically assessed, thus impeding comparability between studies. Here, we systematically test the robustness of four complexity indices estimated from raw texts and either routinely utilized in crosslinguistic studies (Type-Token Ratio and word-level Entropy) or more recently proposed (Word Information Density and Lexical Diversity). Our results on 47 languages strongly suggest that traditional indices are more prone to fluctuation than the newer ones. Additionally, we confirm with Word Information Density the existence of a cross-linguistic trade-off between word-internal and across-word distributions of information. Finally, we implement a proof of concept suggesting that modern deep-learning language models can improve the comparability across languages with non-parallel datasets. | - |
dc.description.sponsorship | Yoon Mi Oh was funded by Ajou University (S-2019-G0001-00088). | - |
dc.language.iso | eng | - |
dc.publisher | John Benjamins Publishing Company | - |
dc.title | Towards robust complexity indices in linguistic typology: A corpus-based assessment | - |
dc.type | Article | - |
dc.citation.endPage | 829 | - |
dc.citation.number | 4 | - |
dc.citation.startPage | 789 | - |
dc.citation.title | Studies in Language | - |
dc.citation.volume | 47 | - |
dc.identifier.bibliographicCitation | Studies in Language, Vol.47 No.4, pp.789-829 | - |
dc.identifier.doi | 10.1075/sl.22034.oh | - |
dc.identifier.scopusid | 2-s2.0-85174411875 | - |
dc.identifier.url | http://www.ingentaconnect.com/content/jbp/sl | - |
dc.subject.keyword | complexity metric robustness | - |
dc.subject.keyword | complexity trade-off | - |
dc.subject.keyword | linguistic typology | - |
dc.subject.keyword | morphological complexity | - |
dc.subject.keyword | non-parallel corpus | - |
dc.type.other | Article | - |
dc.identifier.pissn | 0378-4177 | - |
dc.description.isoa | true | - |
dc.subject.subarea | Language and Linguistics | - |
dc.subject.subarea | Communication | - |
dc.subject.subarea | Linguistics and Language | - |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.