The Small World of Words Project: Modeling the Mental Lexicon Based on Word Associations

Gert Storms, University of Leuven, Belgium

Simon De Deyne, University of Adelaide, Australia

Since the 1970s, researchers have put forward the idea of semantic networks, coupled with a spreading activation process, to account for a number of empirically observed phenomena, such as semantic priming and false word recognition. These ideas, however, remained mainly hypothetical constructs that were hardly testable, because there was no means of building networks with a realistic number of words. To model meaning in the mental lexicon as a semantic network, we are using a crowd-sourced project in which people generate the first word that comes to mind for an extensive list of cue words.

The Small World of Words project has been used to construct a novel word association database, which is currently the largest of its kind in English and Dutch. In contrast to a thesaurus or dictionary, this lexicon provides insight into which words, and which parts of their meaning, are central in the human mind. Making this data available enables psychologists, linguists, neuroscientists and others to test new theories about how we represent and process language.

The scope of the project has changed considerably since its conception in 2003. The current project is supported by academics across the globe and now includes 13 major world languages. In the near future, we hope to boost the further development of a true multilingual lexicon.

We invite you to support this project by participating in the studies in one of the languages displayed below. It will take just five to eight minutes of your time.

The Project’s Methodology and Current Results

The two largest databases derived from the Small World of Words project are the Dutch and the English ones. The Dutch database consists of over 5 million associations between 16,000 Dutch words, provided by approximately 100,000 volunteering participants in Flanders and the Netherlands. The English database covers over 12,000 English words, with associations collected from nearly 80,000 participants. The associations were collected online by asking participants to write down three associations for each cue word in a set of 16 cues. The current dataset contains only cue words that were presented to at least 100 participants. From this data, we derived a semantic network that has provided the foundation for studying different aspects of word meaning.
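As a rough illustration of how cue-response data of this kind can be turned into a network, the sketch below builds a directed, weighted graph in which the weight of an edge from a cue to a response is the share of that cue's responses accounted for by that response. The function name, the data layout and the toy records are assumptions made for illustration; they are not taken from the project's own pipeline.

```python
from collections import Counter, defaultdict


def build_association_network(records):
    """Build a directed, weighted network from (cue, response) pairs.

    Returns {cue: {response: strength}}, where strength is the proportion
    of all responses to the cue that were this particular response.
    """
    counts = defaultdict(Counter)
    for cue, response in records:
        counts[cue][response] += 1

    network = {}
    for cue, responses in counts.items():
        total = sum(responses.values())
        network[cue] = {r: n / total for r, n in responses.items()}
    return network


# Toy records, invented for illustration only (not project data).
toy_records = [
    ("brain", "mind"), ("brain", "head"), ("brain", "think"), ("brain", "mind"),
    ("heart", "love"), ("heart", "blood"), ("heart", "love"), ("heart", "beat"),
]
network = build_association_network(toy_records)
```

Normalizing per cue makes the outgoing weights of each cue sum to one, so each cue's row can be read as a probability distribution over responses.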

First, we showed that these networks are highly structured and contain central ‘hub’ nodes, which tend to be frequently used words that are acquired early.

Second, by collecting three associations per cue (in contrast to previous work), we obtained denser networks, which allows a distributional semantic approach. This is important in investigating semantic similarity or relatedness. Using similarities derived for each word pair, we constructed a low-dimensional space and predicted ratings of affective variables like valence, arousal, dominance, and masculinity versus femininity of words, as well as age of acquisition and concreteness. The results showed that this method yields better predictions of the empirically obtained ratings than previously presented methods based on word collocations in text corpora.

A similar distributional approach was used in a direct comparison of association-based and collocation-based predictions. We showed that the former clearly outperformed the latter in accounting for participants' relatedness judgments (Likert-scale ratings). Furthermore, in a series of experiments, we showed that similarities between completely unrelated entities, as well as between related entities, obtained in a triadic comparison task, were predicted much better by word associations than by text-based models. In these studies we showed that a random walk model implemented on the association network, which makes use of the global structure of the network, yields clearly better predictions of different semantic criterion variables than a simple model derived from direct associations only, which takes into account only the local structure of the associations.
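The contrast between local and global structure can be made concrete with a small sketch. It assumes one common way of formulating a random walk on a network, namely a sum over paths of all lengths in which longer paths are down-weighted by a decay parameter; the text above does not specify the exact formulation used in the studies, so the decay value `alpha`, the function names and the four-word toy network are all illustrative assumptions.

```python
import numpy as np


def cosine_similarity_rows(M):
    """Pairwise cosine similarity between the rows of M."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty rows
    U = M / norms
    return U @ U.T


def random_walk_matrix(P, alpha=0.75):
    """Sum of (alpha * P)^r over all path lengths r >= 0.

    For a row-stochastic P and 0 < alpha < 1 the series converges to
    inv(I - alpha * P). alpha is an assumed decay parameter.
    """
    n = P.shape[0]
    return np.linalg.inv(np.eye(n) - alpha * P)


# Toy network, invented for illustration. Rows are cues, columns are
# responses, entries are association strengths (each row sums to 1).
words = ["tiger", "lion", "cat", "animal"]
P = np.array([
    [0.0, 0.0, 1.0, 0.0],  # tiger  -> cat
    [0.0, 0.0, 0.0, 1.0],  # lion   -> animal
    [0.0, 0.0, 0.0, 1.0],  # cat    -> animal
    [0.0, 0.0, 1.0, 0.0],  # animal -> cat
])

direct = cosine_similarity_rows(P)                          # local structure only
indirect = cosine_similarity_rows(random_walk_matrix(P))    # global structure

tiger, lion = words.index("tiger"), words.index("lion")
print("direct:", direct[tiger, lion])
print("random walk:", indirect[tiger, lion])
```

In this toy case "tiger" and "lion" share no direct responses, so their direct similarity is zero, while the random walk picks up the indirect paths through "cat" and "animal" and assigns them a positive similarity.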

Image: Visualization of the first 7,000 word responses of the SWOW project. Two examples, for the nodes "brain" and "heart", are shown. The size of the nodes indicates how central they are in the network.

The semantic network has also proven valuable in other fields. The similarities derived from the network were validated in an Event-Related Potential study, in which subjects had to evaluate whether two consecutively presented words were related. The magnitude of the N400 measure was shown to be related to the network-derived similarity of the word pairs.

Finally, the potential of the association approach for cross-linguistic comparisons was piloted by studying the derived meaning of emotional terms in an American English and a Flemish Belgian population. The results showed both culturally consistent (e.g., anger) and inconsistent (e.g., shame) emotions, a finding that is in line with cross-cultural studies that used completely different methodologies.

You can find more details and current results at https://smallworldofwords.org/en/project/home, or visit the old web page for links to papers, downloads and additional visualizations.
