How concordance works — Cullen Project Explorer

What is a concordance?

A concordance (or KWIC — Key Word In Context) display shows every occurrence of a search term in the corpus, centered in its surrounding context. The left context is right-aligned and the right context is left-aligned, so the search term forms a visible vertical column. This format, standard in corpus linguistics since the 1960s, lets you see collocational patterns at a glance: what words typically appear near your search term.
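The alignment described above can be sketched in a few lines of Python. This is only an illustration, not the Explorer's rendering code: the function name format_kwic, the width parameter, and the sample rows are all invented for the example.

```python
def format_kwic(left: str, keyword: str, right: str, width: int = 30) -> str:
    """Lay out one concordance line with the keyword in a fixed column."""
    # Keep only the characters nearest the keyword when a context is too long.
    left = left[-width:]
    right = right[:width]
    # Right-align the left context, left-align the right context.
    return f"{left:>{width}}  {keyword}  {right:<{width}}".rstrip()


# Invented sample rows (not taken from the corpus).
rows = [
    ("a constant", "pain", "in my side"),
    ("complains of", "pain", "of the stomach"),
    ("the", "pain", "was much abated"),
]

for left, keyword, right in rows:
    print(format_kwic(left, keyword, right, width=14))
```

Because every left context is padded to the same width, the keyword starts at the same character position on every line, which produces the vertical column.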

How the search works

The Explorer uses SQLite's FTS5 full-text search engine to find matching letters. The search is case-insensitive and matches whole words; multi-word phrases are matched as consecutive tokens. After identifying the matching letters, the system tokenizes each letter's prose text by splitting on whitespace and extracts a window of N tokens on each side of every match, where N is the selected window size.
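The letter-matching step might look roughly like the following, using Python's built-in sqlite3 module. This is a sketch under assumptions: the table name letters_fts, its columns, the helper find_letters, and the three sample letters are hypothetical, and the Explorer's actual schema and queries are not documented here. It also requires a SQLite build with FTS5 enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical FTS5 table; the real schema may differ.
conn.execute(
    "CREATE VIRTUAL TABLE letters_fts USING fts5(letter_id UNINDEXED, prose)"
)
conn.executemany(
    "INSERT INTO letters_fts (letter_id, prose) VALUES (?, ?)",
    [
        # Invented sample text, not taken from the corpus.
        ("L1", "The Pain in her side is much abated."),
        ("L2", "He is painfully weak but in good spirits."),
        ("L3", "She has no pain of the stomach."),
    ],
)


def find_letters(conn: sqlite3.Connection, phrase: str) -> list[str]:
    """Return the ids of letters containing the phrase as consecutive tokens."""
    # Double quotes make FTS5 treat the input as a single phrase;
    # embedded double quotes are escaped by doubling them.
    fts_query = '"' + phrase.replace('"', '""') + '"'
    rows = conn.execute(
        "SELECT letter_id FROM letters_fts "
        "WHERE letters_fts MATCH ? ORDER BY letter_id",
        (fts_query,),
    )
    return [row[0] for row in rows]


print(find_letters(conn, "pain"))     # ['L1', 'L3']
print(find_letters(conn, "pain in"))  # ['L1']
```

With FTS5's default tokenizer, "Pain" matches "pain" regardless of case, while "painfully" does not match because it is a different token.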

Window size

The context window (±5, ±7, or ±10 tokens) controls how much surrounding text is shown. The default of ±7 tokens is a standard concordance window — enough context to see the immediate collocational environment without overwhelming the display.
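To make the effect of the window size concrete, here is a minimal sketch of window extraction over whitespace-split tokens. The function extract_windows and the sample sentence are invented for illustration. Stripping punctuation from token edges before comparing is an assumption made so that a token such as "pain," still matches "pain"; the Explorer may handle this differently.

```python
import string


def extract_windows(text: str, phrase: str, window: int = 7):
    """Return (left, match, right) strings for each occurrence of phrase."""
    tokens = text.split()  # whitespace tokenization
    # Assumption: compare case-insensitively with edge punctuation removed.
    normalized = [t.strip(string.punctuation).lower() for t in tokens]
    target = phrase.lower().split()
    n = len(target)
    if n == 0:
        return []
    hits = []
    for i in range(len(tokens) - n + 1):
        if normalized[i:i + n] == target:
            left = tokens[max(0, i - window):i]   # up to `window` tokens before
            right = tokens[i + n:i + n + window]  # up to `window` tokens after
            hits.append(
                (" ".join(left), " ".join(tokens[i:i + n]), " ".join(right))
            )
    return hits


# Invented sample sentence (not taken from the corpus).
text = "She complains of a pain in her side, and the pain was worse at night."

for size in (5, 7, 10):
    print(f"window = {size}")
    for left, match, right in extract_windows(text, "pain", window=size):
        print(f"  {left} [{match}] {right}")
```

Near the start or end of a letter the window is simply shorter, since there are fewer than N tokens available on that side.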

Sorting

Three sort orders are available. "Chronological" orders by letter date. "By right context" sorts alphabetically on the first word after the search term — this clusters similar right-collocates together (e.g., searching for "pain" and sorting by right context groups "pain in," "pain of," "pain was" together). "By left context" does the same for the word immediately before the search term.
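The three orders can be expressed as three sort keys. The sketch below is illustrative only: the dictionary layout, the order names passed to sort_hits, and the sample hits (including their dates) are invented, and lowercasing and punctuation stripping before comparison are assumptions.

```python
import string
from datetime import date

# Invented sample hits (not taken from the corpus).
hits = [
    {"date": date(1775, 6, 1), "left": "a constant", "right": "was felt at night"},
    {"date": date(1770, 3, 2), "left": "complains of", "right": "of the stomach"},
    {"date": date(1780, 1, 15), "left": "the violent", "right": "in her side"},
]


def _clean(word: str) -> str:
    # Assumption: ignore case and edge punctuation when sorting.
    return word.strip(string.punctuation).lower()


def first_word(context: str) -> str:
    words = context.split()
    return _clean(words[0]) if words else ""


def last_word(context: str) -> str:
    words = context.split()
    return _clean(words[-1]) if words else ""


def sort_hits(hits, order: str):
    """Sort hits chronologically, by right context, or by left context."""
    if order == "chronological":
        key = lambda h: h["date"]
    elif order == "right":
        key = lambda h: first_word(h["right"])  # first word after the term
    elif order == "left":
        key = lambda h: last_word(h["left"])    # word just before the term
    else:
        raise ValueError(f"unknown sort order: {order}")
    return sorted(hits, key=key)


for hit in sort_hits(hits, "right"):
    print(hit["left"], "| pain |", hit["right"])
```

Sorting by right context puts the "in", "of", and "was" lines in that order, which is what groups identical right-collocates next to each other in a larger result set.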

Text normalization

The prose text used for the concordance has been extracted from the TEI XML with editorial apparatus (notes, annotations, page breaks) removed and recipes excluded. Where the encoding records an editorial choice (abbreviation expansions, regularized spellings, scribal substitutions), the concordance uses the expanded or regularized form. The full normalization pipeline is documented on the corpus methodology page.
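As a rough illustration of this kind of extraction, the sketch below walks a TEI fragment and keeps only the reading text. It is not the project's pipeline. The element names (note, pb, choice with abbr/expan and orig/reg, sic, del) are standard TEI, but which of them the corpus actually uses is an assumption, the sample fragment is invented, and recipe exclusion is not shown because its markup is not described here.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Elements whose content is dropped. Assumption: apparatus is encoded as
# <note> and <pb>, and the unexpanded or unregularized readings as
# <abbr>, <orig>, <sic>, and <del>.
SKIP = {"note", "pb", "abbr", "orig", "sic", "del"}


def prose_text(elem: ET.Element) -> str:
    """Collect reading text, skipping apparatus and original forms."""
    tag = elem.tag.replace(TEI_NS, "")
    parts = []
    if tag not in SKIP:
        if elem.text:
            parts.append(elem.text)
        for child in elem:
            parts.append(prose_text(child))
    # Text following an element belongs to the parent's flow, so it is
    # kept even when the element itself is skipped.
    if elem.tail:
        parts.append(elem.tail)
    return "".join(parts)


# Invented sample fragment (not taken from the corpus).
sample = (
    '<p xmlns="http://www.tei-c.org/ns/1.0">He took the '
    "<choice><abbr>Tinct.</abbr><expan>Tincture</expan></choice> for a "
    "<choice><orig>payne</orig><reg>pain</reg></choice>"
    "<note>Marginal note.</note> in his side."
    '<pb n="2"/> It abated.</p>'
)

text = " ".join(prose_text(ET.fromstring(sample)).split())
print(text)  # He took the Tincture for a pain in his side. It abated.
```

A search for "pain" would therefore find this passage even though the manuscript spelling is "payne", because only the regularized form reaches the concordance text.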