The major accomplishment of this project is the identification of NLMs that distinguish AD and PD patients from healthy controls at the level of individual subjects, in probabilistic terms. Notably, these markers generalize across both Latin American and European samples, highlighting their robustness and cross-country validity.
All participants completed verbal fluency tasks, which involved generating words within specific categories. Unlike traditional methods that simply count total valid responses, we employed a novel approach: we extracted multiple psycholinguistic properties (e.g., frequency, granularity, phonological neighbourhood, word length, familiarity, and imageability) from each spoken word. These features were then fed into machine learning models to evaluate whether they could discriminate patients from healthy controls.
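As a minimal sketch of the feature-extraction step described above, the snippet below averages per-word properties over the words a single subject produced. The lexicon `TOY_NORMS`, its numeric values, and the function name `subject_features` are hypothetical placeholders chosen for illustration; they are not the norms or the pipeline used in the project.

```python
from statistics import mean

# Hypothetical toy lexicon: the values are illustrative, not real psycholinguistic norms.
TOY_NORMS = {
    "dog": {"log_frequency": 4.9, "imageability": 6.4},
    "cat": {"log_frequency": 4.8, "imageability": 6.5},
    "lynx": {"log_frequency": 2.1, "imageability": 5.2},
}


def subject_features(words, norms=TOY_NORMS):
    """Average each word-level property over the words one subject produced.

    Words missing from the lexicon are skipped. Returns an empty dict when
    no produced word can be looked up.
    """
    rows = []
    for word in words:
        entry = norms.get(word.lower())
        if entry is None:
            continue  # no norms available for this word
        row = dict(entry)
        row["word_length"] = len(word)  # word length in characters
        rows.append(row)
    if not rows:
        return {}
    return {key: mean(row[key] for row in rows) for key in rows[0]}
```

Each subject then contributes one feature vector (here: mean log frequency, mean imageability, mean word length), which is the kind of input a classifier could consume. Averaging is only one possible summary; other statistics, such as variability across words, could be computed in the same way.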
Our analysis revealed distinct linguistic profiles for AD and PD. In AD, NLMs achieved strong classification performance (AUC = 0.9), with key features including word frequency, granularity, and phonological neighbourhood. AD patients tended to produce high-frequency, conceptually imprecise words (e.g., “flower” instead of “rose”) with similar phoneme sequences. These language markers were also linked to atrophy in temporal regions, reduced fMRI connectivity in the default-mode network, and EEG hypoconnectivity in temporo-parietal regions within the beta band (15–30 Hz).
For PD, classification performance was also good (AUC = 0.84), and this group showed a different linguistic pattern. PD patients favoured concrete words (e.g., “piano” over “symphony”) and produced semantically closer, less varied concepts. This pattern correlated with impaired inhibition, as measured by Hayling test scores, suggesting difficulties in suppressing dominant concepts to shift to new categories (e.g., switching from “domestic animals” to “wild animals”). Additionally, the preference for concrete words was negatively correlated with cognitive status (MoCA scores), suggesting that greater cognitive impairment leads to reliance on more accessible sub-domains of semantic memory. NLMs were also correlated with aberrant connectivity in fMRI sensorimotor and salience networks, both of which are commonly disrupted in PD patients.
Finally, the language markers identified in the Latin American Spanish-speaking sample generalized well to the European Spanish-speaking sample for both AD (AUC = 0.9) and PD (AUC = 0.81) patients. This underscores the potential of NLMs as reliable tools for the detection and monitoring of neurodegenerative diseases across diverse linguistic and cultural contexts.
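To make the cross-sample evaluation concrete, the sketch below fits a scorer on one sample and computes the AUC on a different one. It is an illustration under stated assumptions: the mean-difference scorer is a deliberately simple stand-in and not the project's classifier, the function names are hypothetical, and the numbers are synthetic. The AUC is computed as the proportion of patient-control pairs in which the patient receives the higher score, with ties counted as one half.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (labels: 1 = patient, 0 = control)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_mean_difference_scorer(X, y):
    """Stand-in model: project features onto (patient mean - control mean).

    The direction is learned from the training sample only.
    """
    n_features = len(X[0])

    def class_mean(label):
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]

    mu_pos, mu_neg = class_mean(1), class_mean(0)
    direction = [p - n for p, n in zip(mu_pos, mu_neg)]
    return lambda x: sum(d * v for d, v in zip(direction, x))


# Synthetic feature vectors (illustrative only): one sample for training,
# an independent sample for testing.
train_X = [[5.0, 1.0], [4.8, 1.2], [3.0, 2.5], [3.2, 2.4]]
train_y = [1, 1, 0, 0]
test_X = [[4.9, 1.1], [3.1, 2.6], [4.7, 1.3], [2.9, 2.2]]
test_y = [1, 0, 1, 0]

score = fit_mean_difference_scorer(train_X, train_y)
cross_sample_auc = auc([score(x) for x in test_X], test_y)
```

The key design point is that nothing from the test sample is used when fitting the scorer, so the resulting AUC reflects generalization to an unseen sample rather than in-sample fit.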