What this is
Word on the Street tracks how research attention in linguistics, applied linguistics, and language education shifts over time. It watches a set of about 200 named constructs (methods, theories, and objects of study) and measures how large a share of the field's published output each one accounts for, year by year. The site rebuilds itself automatically every week, and every run is committed to a public repository, so the record only grows: months and years from now you can see how the field actually moved, not just where it stands today.
Where the data comes from
Every count comes from OpenAlex, an open index of scholarly works whose metadata is released under a CC0 (public domain) licence. The corpus is the union of the two OpenAlex subfields that hold linguistics: Language and Linguistics (Arts and Humanities) and Linguistics and Language (Social Sciences). Both are included because the Arts and Humanities branch is several times larger, so scoping to the Social Sciences branch alone would silently drop most of the literature, including much of sociolinguistics, pragmatics, and language policy. In the last fully indexed year (2025) the corpus held about 75,000 works.
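As a rough illustration, a per-year count over the two subfields could be requested from the OpenAlex works endpoint with a URL like the one built here. The subfield IDs (1203 and 3310), the `primary_topic.subfield.id` filter, the quoted-phrase search, and the helper name `corpus_count_url` are assumptions made for this sketch, not details taken from the site's pipeline; check them against the OpenAlex documentation before relying on them.

```python
from urllib.parse import urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"

# Assumed OpenAlex subfield IDs for the two linguistics subfields:
# 1203 = Language and Linguistics, 3310 = Linguistics and Language.
SUBFIELD_IDS = ["1203", "3310"]


def corpus_count_url(phrase=None):
    """Build a URL that groups matching works by publication year.

    With no phrase, the query covers the whole corpus (the denominator).
    With a phrase, it adds a title-and-abstract search (the numerator).
    """
    # "|" is used as OR inside a single filter value (assumed syntax).
    filters = ["primary_topic.subfield.id:" + "|".join(SUBFIELD_IDS)]
    if phrase is not None:
        # Quoting is assumed to request an exact phrase match.
        filters.append(f'title_and_abstract.search:"{phrase}"')
    params = {"filter": ",".join(filters), "group_by": "publication_year"}
    return f"{OPENALEX_WORKS}?{urlencode(params)}"


print(corpus_count_url())
print(corpus_count_url("translanguaging"))
```

The sketch only builds the URLs and sends no request, so it runs offline.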
The constructs it tracks
The list of constructs is not hand-curated week to week. Most of it is OpenAlex's own topic keyphrases for the two linguistics subfields, after pruning the handful that are too broad to be useful. The remainder is a small set of newer coinages the index is still catching up with, such as translanguaging, willingness to communicate, raciolinguistics, and ChatGPT. So the vocabulary is curated by OpenAlex and the field rather than by one person's taste, and it grows on its own as the index adds keyphrases. In the dataset each construct carries a source of either openalex or curated so you can tell the two apart.
How trends are measured
- Share, not raw counts. Each construct is measured as its share of all linguistics output that year: papers matching the construct divided by the corpus that year. The index grows every year, so a rising raw count is often just more indexing. Share is what stays comparable across time.
- Matching. A paper counts toward a construct when the construct's phrase appears in its title or abstract (OpenAlex's title_and_abstract.search). A paper can match several constructs, so the shares do not sum to one.
- Year over year, on the last complete year. The headline movements compare the most recent fully indexed year (2025) against the year before it. Ranking is by relative change in share, which is what lets a construct read as "up 224%".
- The current year, by rank only. 2026 is shown as the lead, but only as an ordering of which constructs are gaining fastest so far. Its percentages are not published, for the reason given in the next section.
- Guards against false trends. A construct must clear a minimum yearly volume and keep moving in the same direction across several consecutive years before it is treated as confirmed.
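The measures in this list can be sketched in a few lines. The function names, the thresholds `min_papers` and `min_run`, and all the numbers are hypothetical, chosen only to illustrate the arithmetic; the site's actual thresholds are not stated here.

```python
def share(papers, corpus_papers):
    """A construct's share of the field's output in one year."""
    return papers / corpus_papers


def relative_change(prev_share, curr_share):
    """Relative year-over-year change in share (2.24 reads as "up 224%")."""
    return (curr_share - prev_share) / prev_share


def is_confirmed(series, min_papers=30, min_run=3):
    """Apply both guards to a list of (papers, corpus_papers), oldest first.

    min_papers and min_run are hypothetical thresholds for illustration.
    """
    recent = series[-(min_run + 1):]
    if len(recent) < min_run + 1:
        return False  # not enough years to see min_run consecutive moves
    if any(papers < min_papers for papers, _ in recent):
        return False  # below the minimum yearly volume
    shares = [share(p, c) for p, c in recent]
    steps = [b - a for a, b in zip(shares, shares[1:])]
    # Every step must point the same way: all up or all down.
    return all(s > 0 for s in steps) or all(s < 0 for s in steps)


# Invented counts for one construct over four years.
series = [(60, 40000), (80, 45000), (100, 50000), (486, 75000)]
prev, curr = share(*series[-2]), share(*series[-1])

print(f"raw count change: {486 / 100 - 1:+.0%}")
print(f"share change:     {relative_change(prev, curr):+.0%}")
print("confirmed:", is_confirmed(series))
```

In this invented series the raw count nearly quintuples while the share rises by 224%, because the corpus grew over the same period. That gap is the reason the ranking uses share.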
Why the current year is provisional
OpenAlex keeps adding records for a year well after that year ends, and a paper's abstract often arrives later still. Because a construct is matched on title and abstract text, the part of a year that is already indexed is richer in matchable text than the year as a whole will be. The visible effect is that, in the current year, almost every construct's share is inflated by a similar factor at once, so raw year-to-date growth would read as "everything rising" with no real losers. That is an artifact of indexing, not a real trend.
Two design choices follow. The firm percentages and the chart use only complete years, where the distortion has largely settled. The current year is summarised by rank, since the order of the fastest risers survives a distortion that lifts everything together, even though the size of the move does not. As OpenAlex finishes indexing, the recent years firm up, and because every weekly run is committed, you can watch that happen in the repository's history.
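A small numerical sketch shows why rank survives this kind of distortion while magnitude does not. The construct names, the shares, and the single inflation factor of 1.5 are all invented. The text says only that the factor is similar across constructs, so treating it as exactly equal is a simplifying assumption.

```python
def rank_by_growth(prev, curr):
    """Order constructs by relative growth in share, fastest riser first."""
    growth = {name: curr[name] / prev[name] - 1 for name in prev}
    order = sorted(growth, key=growth.get, reverse=True)
    return order, growth


# Invented shares for three hypothetical constructs.
prev = {"a": 0.010, "b": 0.020, "c": 0.005}
true_curr = {"a": 0.012, "b": 0.018, "c": 0.008}

# Assumption: partial indexing inflates every share by the same factor.
INFLATION = 1.5
seen_curr = {name: s * INFLATION for name, s in true_curr.items()}

true_order, true_growth = rank_by_growth(prev, true_curr)
seen_order, seen_growth = rank_by_growth(prev, seen_curr)

print("true order:", true_order)
print("seen order:", seen_order)
for name in prev:
    print(f"{name}: true {true_growth[name]:+.0%}, seen {seen_growth[name]:+.0%}")
```

Construct `b` is falling in the true figures but appears to rise in the inflated ones, which is the "everything rising" artifact. The ordering is the same in both cases.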
What it covers, and what it misses
This is an observatory of the indexed, mostly English, journal-article slice of the field. The main blind spots:
- Books and edited volumes are under-represented, and they carry real theoretical weight here.
- Non-English scholarship (for example Francophone, Hispanophone, and Sinophone work) is under-indexed.
- Conference proceedings and grey literature are patchy; the computational-language slice lives largely on preprint servers and is only partially captured.
- A meaningful share of records lack abstracts, and automatic language labels are imperfect.
So read these as trends in the indexed journal literature, not a complete census of the field.
Underserved niches
The "underserved niches" on the front page flag constructs whose recent papers are cited at well above the field's rate while the literature on them is still thin. Citations build with age, so a naive comparison would mostly track citation lag. To cancel that lag, each construct's early-citation rate is divided by the field's own rate. The early-citation rate is the share of papers in a settled three-year cohort (2022 to 2024) that have reached at least five citations. A niche is flagged only when that ratio is high and the cohort is still small. Read it as a tip about where attention may be heading, not a verdict.
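The ratio can be sketched as follows. Only the five-citation threshold comes from the description above. The cut-offs `min_ratio` and `max_cohort`, the function names, and the sample citation counts are hypothetical, since the site does not state what counts as "high" or "small".

```python
def early_citation_rate(citation_counts, threshold=5):
    """Share of cohort papers that have reached at least `threshold` citations."""
    if not citation_counts:
        return 0.0
    return sum(c >= threshold for c in citation_counts) / len(citation_counts)


def citation_ratio(construct_cites, field_rate):
    """Construct's early-citation rate relative to the field's (must be > 0)."""
    return early_citation_rate(construct_cites) / field_rate


def is_underserved(construct_cites, field_rate, min_ratio=2.0, max_cohort=50):
    """Flag a small cohort that is cited well above the field's rate.

    min_ratio and max_cohort are hypothetical cut-offs for illustration.
    """
    small = len(construct_cites) <= max_cohort
    return small and citation_ratio(construct_cites, field_rate) >= min_ratio


# Invented citation counts for a construct's 2022 to 2024 papers.
cites = [0, 2, 5, 7, 12, 1, 9, 6]
FIELD_RATE = 0.25  # invented: a quarter of field papers reach five citations

print("early-citation rate:", early_citation_rate(cites))
print("ratio to field:", citation_ratio(cites, FIELD_RATE))
print("flagged:", is_underserved(cites, FIELD_RATE))
```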
The dataset
The full series is downloadable as one long-format CSV: one row per construct per year, every year on record. The latest year in the file is the partial current year and is provisional, for the reasons given under "Why the current year is provisional"; all earlier years are complete.
| Column | Type | Meaning |
|---|---|---|
| year | integer | Publication year of the window. |
| construct_id | string | Stable identifier for the construct. |
| construct | string | Human-readable construct label. |
| source | string | openalex (a pruned topic keyphrase) or curated (a newer coinage added by hand). |
| papers | integer | Works that year whose title or abstract matched the construct. |
| corpus_papers | integer | All linguistics works indexed for that year (the denominator). |
| share | float | papers / corpus_papers: the construct's share of the field that year. |
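A minimal sketch of reading the file with Python's standard library, using the column names and types from the table above. The two sample rows are invented, and the `construct_id` format shown is a guess. To use the real file, pass the contents of constructs.csv instead of `SAMPLE`.

```python
import csv
import io

# Invented rows in the documented column layout.
SAMPLE = """year,construct_id,construct,source,papers,corpus_papers,share
2024,translanguaging,Translanguaging,curated,150,75000,0.002
2025,translanguaging,Translanguaging,curated,225,75000,0.003
"""


def load_rows(text):
    """Parse the long-format CSV into dicts with typed values."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        rows.append({
            "year": int(raw["year"]),
            "construct_id": raw["construct_id"],
            "construct": raw["construct"],
            "source": raw["source"],
            "papers": int(raw["papers"]),
            "corpus_papers": int(raw["corpus_papers"]),
            "share": float(raw["share"]),
        })
    return rows


rows = load_rows(SAMPLE)

# Sanity check: share should equal papers / corpus_papers on every row.
for row in rows:
    expected = row["papers"] / row["corpus_papers"]
    assert abs(row["share"] - expected) < 1e-9, row

# Pull one construct's series, oldest year first.
series = sorted((r["year"], r["share"]) for r in rows
                if r["construct_id"] == "translanguaging")
print(series)
```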
Construct series (CSV)
Every construct, every year on record, in one tidy file.
Download constructs.csv
Licence: the underlying metadata comes from OpenAlex and is released under CC0. This derived dataset is shared under the same terms; please credit OpenAlex and link back to this site.
How it runs
The data comes from OpenAlex and refreshes once a week. No language model is involved at any stage; every figure is computed directly from the metadata, so nothing here is generated or guessed. The paper lists carry titles, authors, venues, and links only; abstract text is never stored or shown. Each title links to the paper's DOI, with a separate open-access link where a free copy exists.
Data as of the latest weekly run.