tools – Morphemes & Meaning Lab

(mostly) free tools we use to:

design/norm materials

The English Lexicon Project: Supported by the National Science Foundation, this project provides access to a large set of lexical characteristics, along with behavioral data from visual lexical decision and naming studies of 40,481 words and 40,481 nonwords.
The SUBTLEX project at Universiteit Gent. Frequency measures and POS weighting for American English, Dutch and Chinese derived from movie subtitles.
Lexique: Developed by Boris New and Christophe Pallier, Lexique is a database and a search tool for norming French lexical stimuli.
VIEW (Variation in English Words and Phrases): Mark Davies’ handy web interface that makes searching the British National Corpus easy.
COCA (Corpus of Contemporary American English): also by Mark Davies, with a very similar interface to the VIEW site.
ARC Nonword Database: Generates nonword materials that conform to a wide choice of properties. Maintained by Macquarie University.
Wuggy: a pseudoword generator particularly geared towards making nonwords for psycholinguistic experiments. Wuggy makes pseudowords in Basque, Dutch, English, French, German, Serbian (Cyrillic and Latin), Spanish, and Vietnamese. By the Center for Reading Research at Ghent.
Semantic Space Model Demo: A tool for calculating the number of semantic neighbours a word has as determined by the frequency distributions of words occurring in the environment of the target word in large corpora. By Scott MacDonald.
The MRC Psycholinguistic Database: Great tool for either generating or rating stimulus items based on up to 25 different properties with any of dozens of restrictions.
The LSA (Latent Semantic Analysis) interface tool: Hosted at the University of Colorado Boulder, this tool provides an easy interface for running LSA.
VALEX: a new large valency (subcategorization) lexicon for English verbs which is suitable for (statistical) natural language processing (NLP), linguistic and psycholinguistic use. The lexicon was developed by members of the Natural Language and Information Processing Group at the University of Cambridge Computer Laboratory.
The Unified Verb Index: a system which merges links and web pages from four different natural language processing projects focused on coding verb valency and argument structure.
Speech and Hearing Lab Neighborhood Database: From Washington University in St. Louis.
The Word Frequency Lists: Lists of very frequent words in a variety of corpora compiled by Rob Waring.
The Phonotactic Probability Calculator: by Mike Vitevitch at the University of Kansas.
IPhOD: The Irvine Phonotactic Online Dictionary. A large collection of English words and pseudowords suitable for research on phonological processes.

run experiments

DMDX: experiment-running software written by Jonathan Forster. DMDX has extremely reliable timing control, making it the best software available for priming experiments, where stimulus duration and ISI are of the utmost importance. Only runs on Windows.
Linger: written by Doug Rohde and based on Tcl/Tk, this tool is particularly well suited to self-paced reading experiments.
Ibex Farm: Alex Drummond’s JavaScript-based tool for running online experiments.
Prolific: A portal that allows you to post your online studies and rapidly recruit participants, and pay them a fair fee for their time.
Gorilla: software for designing and running experiments, developed by Cauldron to support a wide range of behavioural experiments. Very flexible pricing schemes and excellent tech support make it as well suited to a pilot project by a novice user as to a large-scale project. We increasingly use it to host all our online experiments.

analyse data

R: a command-line statistical analysis package. A good way to be 100% certain you know exactly how your analyses are calculated. Shravan Vasishth’s introductory stats textbook The foundations of statistics: A simulation-based approach uses R to illustrate the relevant concepts and analysis options, so it’s a great way to learn R. Harald Baayen’s textbook Analyzing Linguistic Data: A practical introduction to Statistics using R does exactly what the title suggests. (An earlier version is available as a PDF on Baayen’s website.)
Eelbrain: an open-source Python package for statistical analysis of electrical brain activity (MEG and EEG). Developed by Christian Brodbeck at the Neuroscience of Language Lab at New York University.

report results

LaTeX: For document preparation. Especially useful for linguists as it allows tree drawing, IPA fonts, semantic denotation writing, autosegmental phonological representations, perfectly aligned glosses, etc., without all the fuss and bother and ugliness of Word. Essex University’s Latex4Linguists page is a great resource for getting started and finding the right packages.
Google Documents: A great way to collaborate on projects. We use the spreadsheet app to organize subject recruitment, scheduling and running, and the document app for everything from planning projects, to maintaining a tech report on an experiment in progress, to collaborating on the final write-up.