Voice classification — Cullen Project Explorer

The six voice labels

Voice	Rule	Approx. count
Cullen	All outgoing letters (Cullen is the author).	~3,000
Attending physician	Incoming; author's case role is "Patient's Physician / Surgeon / Apothecary," or author has the medical-professional flag with no case role.	~1,500
Patient	Incoming; author's case role is "Patient."	~550
Family	Incoming; author's case role is "Patient's Relative / Spouse / Friend."	~250
Peer physician	Incoming; author's case role is "Other Physician / Surgeon."	~100
Excluded	Incoming; no classifiable case role and no medical-professional flag.	~200

Counts are approximate. Exact figures shift as the underlying database is updated.

How the rule works

The classification relies mainly on two pieces of metadata from the Cullen Project database: the letter's direction (outgoing or incoming) and the author's case role, that is, the relationship between the letter's author and the patient discussed in the letter. A third field, the medical-professional flag in the persons table, serves only as a fallback for incoming letters. Every outgoing letter is classified as cullen regardless of case role, since Cullen authored all outgoing correspondence.

For incoming letters, the author's case role determines the voice. A letter from the patient's physician, surgeon, or apothecary is classified as attending_physician; a letter from the patient themselves as patient; a letter from a relative, spouse, or friend of the patient as family; and a letter from a physician or surgeon with no relationship to the patient (case role "Other Physician / Surgeon") as peer_physician.

When no case role is recorded but the author is flagged as a medical professional in the persons table, the letter falls back to attending_physician. Incoming letters with no classifiable case role and no medical-professional flag are marked excluded and omitted from voice-stratified analyses.
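The rule described above can be sketched as a small Python function. This is an illustrative sketch, not the Explorer's actual code: the function name, argument names, and the ROLE_TO_VOICE mapping are hypothetical, and the case-role strings are copied from the table rather than from the database schema. One detail the description leaves open is a role that is recorded but not classifiable while the author carries the medical-professional flag; the sketch assumes it is treated like a missing role.

```python
from typing import Optional

# Case-role strings as quoted in the table above. How they are actually
# stored in the Cullen Project database is an assumption.
ROLE_TO_VOICE = {
    "Patient's Physician / Surgeon / Apothecary": "attending_physician",
    "Patient": "patient",
    "Patient's Relative / Spouse / Friend": "family",
    "Other Physician / Surgeon": "peer_physician",
}


def classify_voice(
    direction: str,
    case_role: Optional[str],
    is_medical_professional: bool = False,
) -> str:
    """Return one of the six voice labels for a single letter."""
    # Outgoing letters are always Cullen's, whatever the case role says.
    if direction == "outgoing":
        return "cullen"

    # Incoming letters: a classifiable case role decides the voice.
    voice = ROLE_TO_VOICE.get(case_role)
    if voice is not None:
        return voice

    # Fallback: no classifiable role, but the author is flagged as a
    # medical professional. (Assumption: an unrecognised role is handled
    # the same way as a missing one.)
    if is_medical_professional:
        return "attending_physician"

    # Neither a classifiable role nor the flag.
    return "excluded"


# Hypothetical letters as (direction, case_role, is_medical_professional).
letters = [
    ("outgoing", None, False),
    ("incoming", "Patient", False),
    ("incoming", None, True),
    ("incoming", None, False),
]
voices = [classify_voice(*letter) for letter in letters]

# Voice-stratified analyses would drop the excluded letters.
kept = [v for v in voices if v != "excluded"]
```

Keeping the role-to-voice mapping in a dictionary makes the table the single place to edit if case-role labels change, while the two special cases (outgoing direction, medical-professional fallback) stay visible as explicit branches.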

Limitations

This classification operates at the letter level. When an attending physician quotes or paraphrases a patient's words — a common practice in consultation letters — the patient's voice appears within a physician-classified letter. The current version of the Explorer cannot identify these embedded voices; a future span-level annotation pass would be needed.

Approximately 2–3% of incoming letters are excluded as unclassifiable. These are primarily letters where the author's relationship to the patient could not be determined from the available metadata.