
Journal News | SSCI Journal: Journal of Quantitative Linguistics, 2022, Issues 3-4

Followed by 60,000 scholars → 语言学心得 2024-02-19



Journal of Quantitative Linguistics

Volume 29, Issue 3-4, 2022

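Journal of Quantitative Linguistics (SSCI Q3, 2021 IF: 0.761) published 12 articles in Issues 3-4 of 2022. Issue 3 carried 5 research articles, on topics including religious terminology extraction, phonetic probability estimation in etymology, text classification experiments, and fuzzy analysis. Issue 4 carried 7 research articles, on topics including the indicative/subjunctive mood alternation with adverbs of doubt, second language acquisition, text types, and the Menzerath-Altmann law. Feel free to share! (Coverage of the 2022 volume is now complete.)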

Previous issues:

Journal News | SSCI Journal: Journal of Quantitative Linguistics, 2022, Issues 1-2

Contents


ARTICLES

Issue 3

■ Revisiting Keyword Analysis in a Specialized Corpus: Religious Terminology Extraction, by Lien Hsin Yi, Pages 269–282.

■ Estimating Phonetic Probability in Etymology, by Stachowski Kamil, Pages 339–349.

■ Diachronic Distribution of Elemental Ordering in English, by Zhou Jiangping; Gao Yanmei, Pages 350–373.

■ Experiments in Text Classification: Analyzing the Sentiment of Electronic Product Reviews in Greek, by Bilianos Dimitris, Pages 374–386.

■ Derivational Suffix Productivity in Persian: A Fuzzy Analysis, by Aftabi Seyyedeh Zohreh; Ahangar Abbas Ali; Mishmast Nehi Hassan, Pages 387–411.


Issue 4

■ Why Do Parameter Values in the Zipf-Mandelbrot Distribution Sometimes Explode?, by Mačutek Ján, Pages 413–424.

■ Interactive Heatmaps as an Improved Means of Analysing Complex Socio-dialectal Patterns: German Loans in Silesian, by Fekete István; Hentschel Gerd, Pages 425–449.

■ The Indicative/subjunctive Mood Alternation with Adverbs of Doubt in Spanish, by Hirota Harunobu, Pages 450–464.

■ A Zipfian Approach to Words in Contexts: The Cases of Modern English and Chinese, by Cong Jin, Pages 465–484.

■ Dependency Distance and Its Probability Distribution: Are They the Universals for Measuring Second Language Learners’ Language Proficiency?, by Hao Yuxin; Wang Xuelin; Lin Yanni, Pages 485–509.

■ Syntactic Complexity of Different Text Types: From the Perspective of Dependency Distance Both Linearly and Hierarchically, by Chen Ruina; Deng Sirui; Liu Haitao, Pages 510–540.

■ Menzerath-Altmann Law in Consecutive and Simultaneous Interpreting: Insights into Varied Cognitive Processes and Load, by Jiang Xinlei; Jiang Yue, Pages 541–559.

Abstracts

Revisiting Keyword Analysis in a Specialized Corpus: Religious Terminology Extraction

Lien Hsin Yi, Graduate School of Education, Ming Chuan University, Taoyuan District, Taiwan

Abstract This study investigates keyword extraction using a compiled Buddhist corpus. It sets out the fundamental mode of generation and refinement of keywords with statistical measures and manual screening with specific criteria. The Buddhist Word List contains 1244 keywords with 375 Pali words in Buddhist literacy. We compared the results of applying occurring frequency, log-likelihood (LL), and odds ratio (OR) in keyword analyses, each of which resulted in different keyword rankings. Our results show that statistical measures are useful for the identification of particular keywords in specific fields and OR is more effective in identifying technical terms. We demonstrate that multilevel keyword analysis is more effective at the identification of high-frequency technical words than either of these methods used alone. Multilevel methods are recommended for the creation of future domain-specific vocabulary lists to overcome the inherent flaws of individual analytic methods.
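The two keyness statistics compared in this abstract can be sketched as follows. This is a minimal illustration that assumes the widely used two-corpus formulations of log-likelihood and odds ratio; the abstract does not give the exact formulas, and the function names and counts are hypothetical.

```python
import math

def log_likelihood(a, b, c, d):
    """Keyness log-likelihood for a word occurring a times in a study
    corpus of c tokens and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected count in the study corpus
    e2 = d * (a + b) / (c + d)  # expected count in the reference corpus
    ll = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll

def odds_ratio(a, b, c, d):
    """Odds of the word in the study corpus divided by its odds in the
    reference corpus; infinite when the word is absent from the reference."""
    if b == 0:
        return math.inf
    return (a / (c - a)) / (b / (d - b))

# Hypothetical counts: 20 hits in 1,000 study tokens, 5 in 1,000 reference tokens.
print(log_likelihood(20, 5, 1000, 1000), odds_ratio(20, 5, 1000, 1000))
```

Log-likelihood grows with the amount of evidence, so it favours frequent words, while the odds ratio is an effect size that ignores how much evidence there is. That difference is one plausible reason the two measures rank keywords differently.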


Estimating Phonetic Probability in Etymology 

Stachowski Kamil, Department of Applied Linguistics and Communication, Birkbeck, University of London, London, UK

Abstract An etymological proposition is often said to be probable or improbable from the phonetic point of view, and it is not rare for opinions to diverge on which it is. The estimation is typically purely intuitive, based on perceived similarity and no more than a handful of analogous examples. This paper proposes a method for quantifying the phonetic probability of an etymology and comparing it to the alternative hypothesis. It is intended to be used with sizable datasets, to produce a well-supported, objective verdict.


Diachronic Distribution of Elemental Ordering in English 

Zhou Jiangping; Gao Yanmei, School of Foreign Languages, China West Normal University, Nanchong, China; School of Foreign Languages, Peking University, Beijing, China

Abstract English elemental ordering in a non-canonical word order incorporates preposing, postposing and elemental reversal. This paper intends to explore how these types of elemental ordering are distributed during the last two centuries by employing the Corpus of Historical American English or COHA. The findings demonstrate that preposing has been increasing apparently but still in its inceptive phase; postposing, subsuming existential there and presentational there, generally keeps plateauing with existential there dominating presentational there; and elemental reversal experiences a trend of gradual decreasing, which is attributed to the dominance in occurrences of passive with a by phrase construction over those of inversion. This research provides us with a corpus linguistic insight of exploring elemental ordering, which distinguishes itself from other researches focusing on information status.


Experiments in Text Classification: Analyzing the Sentiment of Electronic Product Reviews in Greek

Bilianos Dimitris, Department of Italian Language and Literature, National and Kapodistrian University of Athens, Athens, Greece


Abstract Sentiment analysis, which deals with people’s sentiments as they appear in the growing amount of online social data, has been on the rise in the past few years. In its simplest form, sentiment analysis deals with the polarity of a given text, i.e., whether the opinion expressed in it is positive or negative. Sentiment analysis, or opinion mining applications on websites and the social media range from product reviews and brand reception to political issues and the stock market. The vast majority of the research in sentiment analysis has mostly dealt with English data, where there’s an abundance of readily available and annotated for sentiment corpora. With a few notable exceptions, the research in other minor languages such as Greek is lacking. This paper deals with sentiment analysis of electronic product reviews written in Greek. To this end, a small dataset of 480 positive and negative reviews is compiled and used, taken from the popular Greek e-commerce website, www.skroutz.gr. Different computational models for training and testing the dataset are evaluated, ranging from simple Naive Bayes with n-gram features to state-of-the-art BERT. The results look very promising for such a small corpus.
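The simplest model mentioned above, Naive Bayes with n-gram features, can be sketched in a few lines. This is an illustrative multinomial Naive Bayes with unigram and bigram features and add-one smoothing, not the paper's implementation; the function names and the toy English reviews are hypothetical.

```python
import math
from collections import Counter, defaultdict

def ngrams(text, n_max=2):
    """Whitespace-tokenised unigrams up to n_max-grams of a text."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        feats += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats

def train(docs):
    """docs is a list of (text, label) pairs; returns the fitted counts."""
    class_counts = Counter()
    feat_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for f in ngrams(text):
            feat_counts[label][f] += 1
            vocab.add(f)
    return class_counts, feat_counts, vocab

def predict(model, text):
    """Return the label with the highest smoothed log-posterior."""
    class_counts, feat_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)
        denom = sum(feat_counts[label].values()) + len(vocab)
        for f in ngrams(text):
            if f in vocab:  # features unseen in training are skipped
                score += math.log((feat_counts[label][f] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data standing in for the review corpus.
toy = [("great phone fast delivery", "pos"), ("very good battery", "pos"),
       ("bad screen broke quickly", "neg"), ("terrible battery very bad", "neg")]
print(predict(train(toy), "good phone"))
```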




Derivational Suffix Productivity in Persian: A Fuzzy Analysis

Aftabi Seyyedeh Zohreh; Ahangar Abbas Ali; Mishmast Nehi Hassan, Department of English Language and Literature, Faculty of Literature and Humanities, University of Sistan and Baluchestan, Zahedan, Iran; Department of Mathematics and Its Applications, University of Sistan and Baluchestan, Zahedan, Iran

Abstract The main aim of this article is to introduce a new way of dealing with the vague concept of suffix productivity in Persian. This approach, that is fuzzy set theory, gives each suffix a degree of membership from [0,1] to different productivity categories. To estimate morphological productivity of Persian suffixes, first Baayen’s proposed measures, i.e. realized productivity, expanding productivity and potential productivity were applied to Bijankhan corpus. Correspondingly, 2.6 million words in the corpus were investigated and analysed using MATLAB and Microsoft Excel software. In the next step, the results of the three productivity measures were illustrated on separate fuzzy diagrams. The findings showed that the three measures employed could give a broader view of different aspects of derivational suffix productivity in Persian. Using fuzzy set theory makes it possible for a given suffix to belong simultaneously to different categories with different degrees of membership. According to the statistics of this research, suffixes -i and -e in Persian had the highest degrees of membership among the most productive suffixes up to now. Likewise, they continue to contribute the most to the growth rate of the contemporary Persian lexicon.
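The idea that one suffix can belong to several productivity categories at once can be illustrated with triangular membership functions. This is a sketch under stated assumptions: the abstract does not say which membership functions, category names, or breakpoints were used, so everything below is hypothetical, including the normalised score of 0.7.

```python
def triangular(x, left, peak, right):
    """Triangular fuzzy membership: 0 outside (left, right), 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Hypothetical categories over a productivity score normalised to [0, 1].
CATEGORIES = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}

def memberships(score):
    """Degree of membership of one score in every category."""
    return {name: triangular(score, *abc) for name, abc in CATEGORIES.items()}

# A score of 0.7 is partly "medium" and partly "high" at the same time.
print(memberships(0.7))
```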



Why Do Parameter Values in the Zipf-Mandelbrot Distribution Sometimes Explode?

Mačutek Ján, Mathematical Institute, Slovak Academy of Sciences, Bratislava, Slovakia; Department of Mathematics, Faculty of Natural Sciences, Constantine the Philosopher University in Nitra, Nitra, Slovakia

Abstract The Zipf-Mandelbrot distribution serves as a mathematical model for ranked frequencies in many areas of scientific research, including linguistics. Many linguistic units, like e.g., words or word n-grams, follow this distribution. However, in some cases, such as for graphemes in linguistics or species abundance and diversity data in biology, the parameters of the Zipf-Mandelbrot distribution are virtually uninterpretable, as their values strongly depend on the precision of numerical methods used to estimate them (values from several tens to several hundreds are not uncommon). It is shown in the paper that these values can be explained by the convergence to the geometric distribution, which forces both parameters of the Zipf-Mandelbrot distribution to increase to infinity while their ratio converges to a constant. Some examples which illustrate this limit behaviour are presented.
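The limit behaviour described here can be checked numerically. The sketch below assumes the common parametrization in which rank r has weight proportional to (r + b)^(-a); the function name and parameter values are illustrative and not taken from the paper. When a and b grow with a fixed ratio a/b = c, the ratio of successive weights approaches the constant exp(-c), which is the defining property of a geometric distribution.

```python
import math

def zm_ratios(a, b, n):
    """Ratios w(r+1)/w(r) of unnormalised Zipf-Mandelbrot weights
    w(r) = (r + b) ** (-a) for ranks r = 1 .. n-1."""
    return [((r + b) / (r + 1 + b)) ** a for r in range(1, n)]

# Small parameters: the ratios clearly depend on the rank.
print(zm_ratios(1, 0, 5))

# Large parameters with a / b = 0.5: the ratios are nearly constant
# and close to exp(-0.5), as for a geometric distribution.
print(zm_ratios(5e5, 1e6, 5), math.exp(-0.5))
```

Because many very different (a, b) pairs with the same ratio give almost the same fitted curve, an optimizer can drift to arbitrarily large values, which is consistent with the "exploding" estimates the abstract mentions.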



Interactive Heatmaps as an Improved Means of Analysing Complex Socio-dialectal Patterns: German Loans in Silesian 

Fekete István; Hentschel Gerd, School of Linguistics and Cultural Studies, Institute of Slavic Studies, University of Oldenburg, Oldenburg, Germany

Abstract This paper presents an application of interactive cluster heatmaps in sociolinguistics, a method hitherto scarcely employed in the field. To that end, we developed a statistical workflow to illustrate the method and analyse large-scale Silesian questionnaire data. In our quantitative-linguistic study we demonstrate how heatmaps can uncover information about complex patterns of regional variation, thereby highlighting the added value of the method relative to standard statistical procedures. Specifically, we show (i) how differences in language use between two regions can be determined by deploying the heatmap method but not with traditional significance/hypothesis testing statistical procedures or summary statistics, (ii) and how differences in cohesion and tightness in clusters can be examined via heatmaps but not using cluster analysis. We conclude that heatmaps are a valuable tool for assessing why and how certain word-items group together because of the regional distribution of their usage. A major advantage of the heatmap method is that it can handle two dimensions with hundreds of instantiations and illustrate their interrelations, which would pose problems for traditional statistical techniques. Heatmaps provide a novel and accessible way of exploring large-scale sociolinguistic data in their entirety and of generating further hypotheses.


The Indicative/subjunctive Mood Alternation with Adverbs of Doubt in Spanish

Hirota Harunobu, Graduate School of Global Studies, Tokyo University of Foreign Studies, Fuchu-shi, Japan

Abstract This study aims to analyse the indicative/subjunctive mood alternation in Spanish sentences with adverbs of doubt (acaso, posiblemente, probablemente, quizá, quizás, tal vez, seguramente, a lo mejor, igual). To this end, this study statistically analysed the linguistic and social factors conditioning the mood alternation in sentences with adverbs of doubt. A total of 1278 tokens were analysed. Each datum was annotated with verb type, verb aspect, verb person, distance between the adverb and the verb, sex, age, region, and education level. To exclude confounding factors, multivariable logistic regression was conducted, and the analysis yielded significant odds ratios (ORs) for 10 items, including sex, region, education level, adverbs (posiblemente, probablemente, quizá, quizás, tal vez), aspect, and distance between the verb and the adverb. These results show that these adverbs can be divided into two groups, where posiblemente, probablemente, quizá, quizás, and tal vez are more likely to co-occur with the subjunctive than the adverbs acaso, seguramente, a lo mejor, and igual. Furthermore, this study has shown that each adverb differs in the likelihood of co-occurring with the subjunctive, and that social factors of speakers affect the mood selection. Thus, an analysis of mood alternations should include social and linguistic factors.


A Zipfian Approach to Words in Contexts: The Cases of Modern English and Chinese

Cong Jin, School of Foreign Languages, Ludong University, Yantai, China

Abstract The system-level complexity of language has been thoroughly investigated in terms of Zipf’s law, whose quantitative features have proved to reflect text/language typology. This study extends the scope of Zipf’s law from the macroscopic scale of language to specific words in contexts, with the aim of examining its potential as an indicator of word typology. The focus is confined to the high-frequency words in English and Chinese as found in the FLOB and LCMC corpora. It has been found that the log–log rank-frequency distributions of contextual words of the words in question generally abide by the linear function y = ax+b. Moreover, it has been shown that an adjusted version of parameter a can help to distinguish the words in question’s classes. The contextual information as reflected by this Zipf-based index might be more important to the emergence of word classes of Chinese, which has no real inflection as a word-class indicator. From a Zipfian approach, the findings have preliminarily approved Saussure’s systems thinking regarding linguistic signs. Meanwhile, they may also contribute to such fields as usage-based linguistics.
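The linear relation y = ax + b on log-log axes can be estimated with ordinary least squares. The following is a minimal sketch, assuming natural logarithms and a plain OLS fit; the abstract does not specify the fitting procedure or how parameter a is adjusted, and the function name and the toy frequencies are hypothetical.

```python
import math

def loglog_fit(freqs):
    """Fit log(frequency) = a * log(rank) + b by least squares.
    freqs holds positive frequencies of the context words of one target
    word; ranks are assigned by sorting in descending order."""
    ranked = sorted(freqs, reverse=True)
    xs = [math.log(r) for r in range(1, len(ranked) + 1)]
    ys = [math.log(f) for f in ranked]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx          # slope
    b = my - a * mx        # intercept
    return a, b

# Hypothetical frequencies that follow an exact Zipfian 1/rank pattern.
print(loglog_fit([1000 / r for r in range(1, 51)]))
```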


Dependency Distance and Its Probability Distribution: Are They the Universals for Measuring Second Language Learners’ Language Proficiency?

Hao Yuxin; Wang Xuelin; Lin Yanni, Institute of Chinese Language and Culture Education, Huaqiao University, Xiamen, China; College of Foreign Studies, Guangxi Normal University, Guilin, China

Abstract Previous studies have shown that dependency distance and its probability distribution can be applied as syntactic indicators of English as interlanguage. However, the universal application of these indicators has not been verified from the perspective of language typology. The issues are addressed in the present study based on a treebank of Chinese interlanguage of English and Japanese native speakers. The findings are as follows: (1) with the improvement of L2 proficiency, the MDDs of learners with different native language backgrounds gradually approach that of the target language in different patterns, and dependency distance is of universal significance as a metric to measure the development of interlanguage’s syntactic complexity; (2) Chinese interlanguage also follows the principle of least effort, and its probability distribution of dependency distance, like those of natural languages, presents a power–law distribution, which can successfully fit the Zipf-Alekseev distribution; (3) the right truncated modified Zipf-Alekseev distribution can be used to measure Chinese interlanguage proficiency, and the fitting parameters of the probability distribution of dependency distance as a metric of interlanguage proficiency are also of universal value.


Syntactic Complexity of Different Text Types: From the Perspective of Dependency Distance Both Linearly and Hierarchically

Chen Ruina; Deng Sirui; Liu Haitao, College of Foreign Languages, Guizhou University, Guiyang, China; Centre for Linguistics and Applied Linguistics, Guangdong University of Foreign Studies, Guangzhou, China; Department of Linguistics, Zhejiang University, Hangzhou, China

Abstract Dependency distance (DD) is a well-established measure of syntactic complexity. Previous studies largely focused on the linear dimension, mostly by means of mean dependency distance (MDD). In the present study, a new quantitative indicator, mean hierarchical dependency distance (MHDD), is proposed to discuss DD-related issues. Combining MHDD and MDD, the study investigates syntactic complexity of different texts, using strictly length-controlled sentences of 12 text types from the Freiburg-Brown corpus of American English. Correlations of MHDD and MDD have been identified, and possible reasons are discussed from the mathematical and theoretical perspectives. Mathematically, one is that the numerator of MHDD overlaps with the denominator of MDD, both being (n-1) where n is the number of words in the sentence. The other is that the denominator of MHDD (maximum hierarchical layer: MAXHL) and the numerator of MDD (sum of DD: SOD), are positively correlated. We believe that it is the positive correlation of SOD and MAXHL that ensures the change of MDD and MHDD in the same direction. It is also worth noting that both MAXHL and SOD seem to be minimized at their respective data spectrum, which foreshadows the dependency distance minimization (DDM) tendency on the hierarchical dimension.
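The linear measure can be made concrete with a short sketch. It follows the abstract's description of MDD as the sum of dependency distances (SOD) divided by n - 1, and assumes that the distance of a word is the absolute difference between its position and its head's position. The input encoding and function names are hypothetical, and MHDD is left out because the abstract does not define how hierarchical layers are counted.

```python
def sum_of_dependency_distances(heads):
    """heads[i] is the 1-based position of the head of word i + 1,
    with 0 marking the root, which has no dependency distance."""
    return sum(abs(h - (i + 1)) for i, h in enumerate(heads) if h != 0)

def mean_dependency_distance(heads):
    """MDD = SOD / (n - 1) for a sentence with a single root."""
    n = len(heads)
    if n < 2:
        raise ValueError("MDD needs at least two words")
    return sum_of_dependency_distances(heads) / (n - 1)

# "The boy ate apples": The -> boy, boy -> ate, ate = root, apples -> ate.
print(mean_dependency_distance([2, 3, 0, 3]))
```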




Menzerath-Altmann Law in Consecutive and Simultaneous Interpreting: Insights into Varied Cognitive Processes and Load

Jiang Xinlei; Jiang Yue, School of Foreign Studies, Xi’an Jiaotong University, Xi’an, China; Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, Xi’an, China

Abstract Notwithstanding theoretical simulations of distinctive cognitive processes and load of consecutive (CI) and simultaneous interpreting (SI), quantitative linguistic inquiry into their outputs is needed for solid empirical evidence. As a fundamental law of quantitative linguistics, Menzerath–Altmann Law (MAL) mirrors the economic processing of linguistic information and complex dynamic language system. Given its extensive validation at various linguistic levels and predictive power of its parameters in register, language and authorship differentiation, MAL is worthy of being applied to interpreting studies. We endeavour to investigate whether interpreted languages follow the MAL and reveal varied cognitive load of CI versus SI, as manifested by different MAL fitting models. Results show that (1) both CI and SI outputs follow the MAL; (2) SI processing involves more diversified structural information and shows a greater tendency of shortening the clauses of a sentence with increased sentence length, than CI processing, expressed by significantly higher a and lower b in SI models than that in CI models. Our findings suggest the disparate language representations are shaped by cognitive capacity limitations and interpreting modalities, and reveal how language system dynamically re-regulates and reorganizes the linguistic information to accommodate environmental settings from the perspective of synergetic linguistics.
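For orientation, the Menzerath-Altmann law is often written in the truncated power form below, where x is the size of a construct (here presumably sentence length in clauses) and y is the mean size of its constituents (clause length). This model form is an assumption: the abstract reports parameters a and b but does not state the fitted equation.

```latex
\[
  y = a\,x^{b}
  \qquad\Longleftrightarrow\qquad
  \ln y = \ln a + b \ln x
\]
```

With b < 0, constituents become shorter as the construct grows, and a more negative b means a steeper shortening. Under this reading, the lower b reported for SI corresponds to the stronger clause-shortening tendency described in result (2).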



About the Journal

The Journal of Quantitative Linguistics is interested in work which systematically applies or develops mathematical and/or statistical concepts and methods to theoretical understanding of language phenomena. This covers the range of synchronic and diachronic subdomains of linguistic theory, including contemporary and historical linguistics, sociolinguistics and dialectology, and cognitive, neural, and psycholinguistics as well as the various levels of analysis from phonetics through phonology, morphology, syntax, semantics, and pragmatics. The introduction of mathematical and statistical concepts and methods from the natural sciences, economics, and cognitive science is particularly encouraged, as is philosophical reflection on the relationship of quantitative linguistics as here understood to these other sciences.



Official website:

https://www.tandfonline.com/journals/njql20

Source: official website of the Journal of Quantitative Linguistics






Editor: 讷言

Reviewer: 心得小蔓
