What this is

Word on the Street tracks how research attention in linguistics, applied linguistics, and language education shifts over time. It watches a set of about 200 named constructs (methods, theories, and objects of study) and measures what share of the field's published output each one accounts for, year by year. The result is a running record: months or years from now, you can look back and see how the field actually moved.

Where the data comes from

Every count comes from OpenAlex, an open index of scholarly works whose metadata is released under a CC0 (public domain) licence. The corpus is the union of the two OpenAlex subfields that hold linguistics: Language and Linguistics (Arts and Humanities) and Linguistics and Language (Social Sciences). Both are included because the Arts and Humanities branch is several times larger: scoping to the smaller Social Sciences branch alone would silently drop most of the literature, including much of sociolinguistics, pragmatics, and language policy. In the last fully indexed year the corpus held about 75,000 works.
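
One way to express this corpus as an OpenAlex query is sketched below. It is an illustration under assumptions, not the site's actual code: the subfield IDs (1203 and 3310), the choice of the primary_topic.subfield.id filter, and the helper name corpus_count_url are all assumed here and should be checked against the OpenAlex documentation. The sketch only builds the URL; in a real call the number of works would be read from the meta.count field of the JSON response.

```python
from urllib.parse import urlencode

# Assumed subfield IDs (not stated in the text above):
# 1203 = Language and Linguistics (Arts and Humanities)
# 3310 = Linguistics and Language (Social Sciences)
SUBFIELD_IDS = ("1203", "3310")


def corpus_count_url(year: int) -> str:
    """Build a works query for one publication year across both subfields."""
    # "|" means OR inside a single filter; "," joins separate filters with AND.
    subfields = "|".join(SUBFIELD_IDS)
    filters = f"primary_topic.subfield.id:{subfields},publication_year:{year}"
    # per-page=1 keeps the response small; only the total count is of interest.
    query = urlencode({"filter": filters, "per-page": 1}, safe=":|,")
    return f"https://api.openalex.org/works?{query}"


print(corpus_count_url(2025))
```

Taking the union in a single OR filter, rather than summing two separate queries, avoids counting a work twice.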

The constructs it tracks

The construct list largely builds itself. Most of it consists of OpenAlex's own topic keyphrases for the two linguistics subfields, minus the handful that are too broad to be useful. The rest is a small hand-added set of newer coinages the index is still catching up with, such as translanguaging, willingness to communicate, raciolinguistics, and ChatGPT. OpenAlex and the field thus curate the vocabulary between them, and the list grows on its own as the index adds keyphrases. In the dataset each construct carries a source of either openalex or curated so you can tell the two apart.

How trends are measured

Share of output. Each construct is measured as its share of all linguistics output that year: papers matching the construct divided by the corpus that year. The index grows every year, so a rising raw count often just reflects more indexing, while share stays comparable across time.
Matching. A paper counts toward a construct when the construct's phrase appears in its title or abstract (OpenAlex's title_and_abstract.search). A paper can match several constructs, so the shares do not sum to one.
Year over year, on the last complete year. The headline movements compare the most recent fully indexed year (2025) against the year before it. Ranking is by relative change in share, which is what lets a construct read as "up 224%".
The current year, by rank only. 2026 appears at the top of the page, but only as a ranking of the constructs gaining fastest so far. Its percentages stay unpublished, for the reason given in the next section.
Guards against false trends. A construct must clear a minimum yearly volume and keep moving the same direction across several consecutive years before it is treated as confirmed.
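
The measures in this list can be sketched in a few lines of Python. This is a minimal illustration, not the site's implementation: the counts are made up, and the guard's thresholds (min_papers=30, min_run=3) are hypothetical, since the text only says "a minimum yearly volume" and "several consecutive years".

```python
def share(papers: int, corpus_papers: int) -> float:
    """Share of the field: matching papers divided by the corpus that year."""
    return papers / corpus_papers if corpus_papers else 0.0


def relative_change(prev_share: float, curr_share: float) -> float:
    """Relative change in share; 2.24 would be reported as 'up 224%'."""
    return (curr_share - prev_share) / prev_share


def is_confirmed(papers, shares, min_papers=30, min_run=3):
    """Hypothetical guard: every recent year has enough volume, and share
    moved in the same direction across the last `min_run` yearly steps."""
    if len(shares) < min_run + 1:
        return False
    recent_papers = papers[-(min_run + 1):]
    recent_shares = shares[-(min_run + 1):]
    if min(recent_papers) < min_papers:
        return False
    steps = [b - a for a, b in zip(recent_shares, recent_shares[1:])]
    return all(s > 0 for s in steps) or all(s < 0 for s in steps)


# Made-up counts for one construct, 2022 to 2025 (complete years only).
papers = [30, 52, 70, 243]
corpus = [60_000, 65_000, 70_000, 75_000]
shares = [share(p, c) for p, c in zip(papers, corpus)]

change = relative_change(shares[-2], shares[-1])
print(f"up {change:.0%}")           # relative change in share, 2024 to 2025
print(is_confirmed(papers, shares))
```

In this made-up series the raw count rises by about 247% between the last two years while the share rises by 224%; the gap is the growth of the corpus itself, which is the reason for measuring share.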

Why the current year is provisional

OpenAlex keeps adding records for a year well after that year ends, and a paper's abstract often arrives later still. Because a construct is matched on title and abstract text, the part of a year that is already indexed is richer in matchable text than the year as a whole will be. The visible effect is that, in the current year, almost every construct's share is inflated by a similar factor at once, so raw year-to-date growth would read as "everything rising" with no real losers. That is an artifact of indexing.

Two design choices follow. The firm percentages and the chart use only complete years, where the distortion has largely settled. The current year is summarised by rank, since the order of the fastest risers survives a distortion that lifts everything together. As OpenAlex finishes indexing, the recent years firm up.
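
A small sketch shows why rank survives where percentages do not. The shares and the common inflation factor of 1.4 are invented for illustration, and the real distortion is only approximately uniform, so the real ordering is only approximately preserved.

```python
def rank_by_growth(prev, curr):
    """Return constructs ordered by relative growth in share, plus the growth values."""
    growth = {name: curr[name] / prev[name] - 1 for name in prev}
    order = sorted(growth, key=growth.get, reverse=True)
    return order, growth


# Made-up shares: last complete year, and the current year once fully indexed.
prev = {"construct_a": 0.010, "construct_b": 0.020, "construct_c": 0.005}
settled = {"construct_a": 0.012, "construct_b": 0.018, "construct_c": 0.009}

# Assumed: partial indexing inflates every share by the same factor.
INFLATION = 1.4
partial = {name: value * INFLATION for name, value in settled.items()}

settled_order, settled_growth = rank_by_growth(prev, settled)
partial_order, partial_growth = rank_by_growth(prev, partial)

print(settled_order == partial_order)   # the ordering is unchanged
print(settled_growth["construct_b"])    # negative: a real decline
print(partial_growth["construct_b"])    # positive: the decline is masked
```

Multiplying every share by the same positive factor cannot reorder the growth values, but it can flip the sign of a decline, which is why only the ordering is published for the partial year.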

What it covers, and what it misses

This is an observatory of the indexed, mostly-English, journal-article slice of the field. The main blind spots:

Books and edited volumes are under-represented, and they carry real theoretical weight here.
Non-English scholarship (for example Francophone, Hispanophone, and Sinophone work) is under-indexed.
Conference proceedings and grey literature are patchy; the computational-language slice lives largely on preprint servers and is only partially captured.
A meaningful share of records lack abstracts, and automatic language labels are imperfect.

Every trend reported here is therefore a trend in the indexed journal literature, not in the field as a whole.

Underserved niches

The "underserved niches" on the front page flag constructs whose recent papers are cited well above the field's rate while the literature on them is still thin. Citations build with age, so a naive comparison of citation counts would mostly track citation lag. To cancel that lag, each construct's early-citation rate is divided by the field's own rate over the same cohort. The early-citation rate is the share of papers in a settled three-year cohort (2022 to 2024) that have reached at least five citations. A niche is flagged only when that ratio is high and the construct's cohort is still small.
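
The ratio can be sketched as follows. The cut-offs (min_ratio=2.0, max_papers=150) and all counts are hypothetical; the text says only that the ratio must be "high" and the cohort "small".

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    papers: int        # papers published in the 2022 to 2024 cohort
    cited_5_plus: int  # of those, papers with at least five citations

    @property
    def early_citation_rate(self) -> float:
        return self.cited_5_plus / self.papers if self.papers else 0.0


def citation_ratio(construct: Cohort, field: Cohort) -> float:
    """Construct's early-citation rate relative to the field's, same cohort."""
    return construct.early_citation_rate / field.early_citation_rate


def is_niche(construct: Cohort, field: Cohort,
             min_ratio: float = 2.0, max_papers: int = 150) -> bool:
    """Hypothetical rule: well-cited relative to the field, and still thin."""
    return (citation_ratio(construct, field) >= min_ratio
            and construct.papers <= max_papers)


field = Cohort(papers=200_000, cited_5_plus=30_000)   # made-up field rate: 15%
thin = Cohort(papers=80, cited_5_plus=36)             # 45%, small cohort
crowded = Cohort(papers=5_000, cited_5_plus=2_250)    # 45%, large cohort

print(is_niche(thin, field), is_niche(crowded, field))
```

Because both rates are taken over the same publication years, the age of the papers affects numerator and denominator alike, which is what removes the citation lag from the comparison.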

The dataset

The full series is downloadable as one long-format CSV: one row per construct per year, every year on record. The latest year in the file is the partial current year, which is provisional for the reasons given above; all earlier years are complete.

Column	Type	Meaning
year	integer	Publication year of the window.
construct_id	string	Stable identifier for the construct.
construct	string	Human-readable construct label.
source	string	openalex (a pruned topic keyphrase) or curated (a newer coinage added by hand).
papers	integer	Works that year whose title or abstract matched the construct.
corpus_papers	integer	All linguistics works indexed for that year (the denominator).
share	float	papers / corpus_papers: the construct's share of the field that year.
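
A minimal sketch of reading the file with Python's standard csv module follows. The two sample rows are invented, and the construct_id format and comma delimiter are assumptions; only the column names and types come from the table above.

```python
import csv
import io

# Two made-up rows in the documented column order.
SAMPLE = """\
year,construct_id,construct,source,papers,corpus_papers,share
2024,example-construct,Example construct,curated,70,70000,0.001
2025,example-construct,Example construct,curated,243,75000,0.00324
"""


def load_series(handle):
    """Parse the long-format CSV into typed rows, one per construct per year."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            "year": int(row["year"]),
            "construct_id": row["construct_id"],
            "construct": row["construct"],
            "source": row["source"],
            "papers": int(row["papers"]),
            "corpus_papers": int(row["corpus_papers"]),
            "share": float(row["share"]),
        })
    return rows


rows = load_series(io.StringIO(SAMPLE))
for r in rows:
    # share should equal papers / corpus_papers, up to rounding in the file.
    assert abs(r["papers"] / r["corpus_papers"] - r["share"]) < 1e-9
    print(r["year"], r["construct"], f"{r['share']:.3%}")
```

For the real file, the same function could be called on open("constructs.csv", newline="", encoding="utf-8"). When computing year-over-year changes, drop the latest year first, since it is the partial one.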

Download constructs.csv: every construct, every year on record, in one tidy file.

Licence: the underlying metadata is OpenAlex, released under CC0. This derived dataset is shared under the same terms; please credit OpenAlex and link back to this site.

How it runs

No language model is involved at any stage; every figure is computed directly from the metadata. The paper lists carry titles, authors, venues, and links only; abstract text is never stored or shown. Each title links to the paper's DOI, with a separate open-access link where a free copy exists.
