Methodology
Voice classification
Every letter in the corpus is assigned a voice label based on the relationship between the author and the patient under discussion. This page explains the classification rule.
The six voice labels
| Voice | Rule | Approx. count |
|---|---|---|
| Cullen | All outgoing letters (Cullen is the author). | ~3,000 |
| Attending physician | Incoming; author's case role is "Patient's Physician / Surgeon / Apothecary," or author has the medical-professional flag with no case role. | ~1,500 |
| Patient | Incoming; author's case role is "Patient." | ~550 |
| Family | Incoming; author's case role is "Patient's Relative / Spouse / Friend." | ~250 |
| Peer physician | Incoming; author's case role is "Other Physician / Surgeon." | ~100 |
| Excluded | Incoming; no classifiable case role and no medical-professional flag. | ~200 |
Counts are approximate. Exact figures shift as the underlying database is updated.
How the rule works
The classification uses two pieces of metadata from the Cullen Project database: the letter's direction (outgoing or incoming) and the author's case role, that is, the relationship between the letter's author and the patient discussed in the letter. Every outgoing letter is classified as `cullen` regardless of case role, since Cullen authored all outgoing correspondence.
For incoming letters, the author's case role determines the voice. A letter from someone identified as the patient's physician, surgeon, or apothecary is classified as `attending_physician`; a letter from the patient themselves as `patient`; a letter from a relative, spouse, or friend of the patient as `family`; and a letter from a physician or surgeon with no attending relationship to the patient as `peer_physician`.
When no case role is recorded but the author is flagged as a medical professional in the persons table, the letter falls back to `attending_physician`. Incoming letters with no classifiable case role and no medical-professional flag are marked `excluded` and omitted from voice-stratified analyses.
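The rule described above can be sketched as a small function. This is an illustrative sketch, not the project's actual code: the `Letter` record, its field names, and `classify_voice` are hypothetical, and the case-role strings are assumed to be stored exactly as they are quoted in the table. It also assumes that an incoming letter with a recorded but unrecognised case role is excluded even when the author carries the medical-professional flag, a case the rule as stated does not spell out.

```python
from dataclasses import dataclass
from typing import Optional

# Case-role strings copied from the table; assumed to match the database values.
ROLE_TO_VOICE = {
    "Patient's Physician / Surgeon / Apothecary": "attending_physician",
    "Patient": "patient",
    "Patient's Relative / Spouse / Friend": "family",
    "Other Physician / Surgeon": "peer_physician",
}


@dataclass
class Letter:
    """Hypothetical record holding only the metadata the rule needs."""
    direction: str                        # "outgoing" or "incoming"
    case_role: Optional[str] = None       # None when no case role is recorded
    is_medical_professional: bool = False  # flag from the persons table


def classify_voice(letter: Letter) -> str:
    # Outgoing letters are always Cullen's, whatever the case role says.
    if letter.direction == "outgoing":
        return "cullen"

    # Incoming letters: a recognised case role decides the voice.
    voice = ROLE_TO_VOICE.get(letter.case_role)
    if voice is not None:
        return voice

    # Fallback: no case role recorded, but the author is a medical professional.
    if letter.case_role is None and letter.is_medical_professional:
        return "attending_physician"

    # Everything else cannot be classified.
    return "excluded"


letters = [
    Letter("outgoing", "Patient"),
    Letter("incoming", "Patient"),
    Letter("incoming", None, is_medical_professional=True),
    Letter("incoming", None),
]
voices = [classify_voice(letter) for letter in letters]

# Voice-stratified analyses would drop the excluded letters.
analysable = [v for v in voices if v != "excluded"]
```

Checking the direction first means an outgoing letter never reaches the case-role lookup, which mirrors the "regardless of case role" wording of the rule.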
Limitations
This classification operates at the letter level. When an attending physician quotes or paraphrases a patient's words — a common practice in consultation letters — the patient's voice appears within a physician-classified letter. The current version of the Explorer cannot identify these embedded voices; a future span-level annotation pass would be needed.
Approximately 2–3% of incoming letters are excluded as unclassifiable. These are primarily letters where the author's relationship to the patient could not be determined from the available metadata.