AI BIRP note generator that runs on your Mac — Behavior / Intervention / Response / Plan drafted locally, audio never leaves the device

BIRP is the documentation format auditors love and clinicians grumble about — and there's a reason for both. By labeling Intervention as its own paragraph, BIRP makes it trivially easy for a Medicaid reviewer or a managed-care auditor to confirm that the session content matches the billed code; by separating Behavior, Response, and Plan, it gives a structured paper trail that holds up under utilization review. A BIRP generator built for the format respects that structure: the Intervention paragraph carries the weight, Behavior anchors the session start, Response shows the work landing, Plan closes the loop. TherapyDraft is the BIRP generator built around all four — and it does the work on your Mac, with no network socket open for the audio or the draft.

TL;DR

TherapyDraft is an AI BIRP note generator for US mental-health clinicians that runs entirely on the clinician's M-series Mac. Record a session, click "Draft BIRP," and in two to four minutes you have a Behavior / Intervention / Response / Plan draft sized for the median audit-friendly progress note. The Intervention paragraph is the longest by design, typically 130–200 words, because BIRP exists to make the intervention legible. The audio file, the transcript, and the draft never cross a network socket, because the macOS network-sandbox entitlement on the code paths that handle them is set to deny by design. Output pastes cleanly into SimplePractice, TherapyNotes, TheraNest, Valant, Credible, and any other EHR with a structured progress-note form. There is a 10-session free trial; paid is $39 per month or $349 per year.

Why BIRP, and why a generator built specifically for BIRP

BIRP — Behavior, Intervention, Response, Plan — is the dominant format in community mental-health centers, Medicaid-funded outpatient clinics, managed-care contexts, and any practice where progress notes are read by someone other than the clinician on a regular basis. The structural reason is straightforward: BIRP labels Intervention as its own paragraph, so a reviewer scanning a chart can confirm in seconds that the session content matches the billed CPT code. SOAP buries intervention inside Assessment or splits it across Subjective and Objective; DAP folds it into Data along with everything else. BIRP pulls it out and puts it under its own header. That is the format's whole job, and it does it well.

That structural emphasis matters for AI-assisted drafting in a way it doesn't for DAP or SOAP. The recurring failure mode of cloud BIRP scribes is the opposite of the SOAP-Objective-overflow problem: instead of dumping verbatim transcript into a field, the model generates a generic intervention paragraph that names the modality at a high level ("CBT-based intervention focused on cognitive restructuring") without grounding the intervention in what actually happened during the session. An auditor reading "CBT-based intervention focused on cognitive restructuring" learns nothing they couldn't have inferred from the billed code; the documentation adds no value. An AI SOAP generator built for private-practice mental-health work has its own structural failure mode — the Subjective/Objective split — but BIRP's failure mode is generic Intervention, and a generator tuned for BIRP has to be tuned to avoid it.
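One crude way to see the generic-Intervention failure mode is to measure how much of an Intervention paragraph's vocabulary also appears in the session transcript. The sketch below is illustrative only and is not TherapyDraft's method; the function names, the stop-word list, and the four-letter cutoff are all assumptions made for the example.

```python
import re

# Hypothetical stop-word list; short words are already dropped by the length filter.
STOPWORDS = {"with", "that", "this", "from", "were", "have", "been"}

def content_words(text: str) -> set[str]:
    """Lowercased alphabetic words of four or more letters, minus stop words."""
    return {
        w for w in re.findall(r"[a-z]+", text.lower())
        if len(w) >= 4 and w not in STOPWORDS
    }

def grounding_score(intervention: str, transcript: str) -> float:
    """Fraction of the Intervention paragraph's content words found in the transcript."""
    words = content_words(intervention)
    if not words:
        return 0.0
    return len(words & content_words(transcript)) / len(words)
```

A paragraph that only names the modality scores near zero against almost any transcript, while one that names what was practiced scores higher. Because a good note paraphrases rather than quotes, a low score is a reason to re-read the paragraph, not a verdict on it.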

TherapyDraft's prompt scaffolding for BIRP is calibrated against roughly a hundred anonymized real BIRP notes contributed by beta clinicians during the supervised testing phase, with a deliberate weighting toward CMHC, Medicaid, and managed-care practice contexts where BIRP is the house format. The model is told, in effect, that Intervention is where the chart earns its keep — name the modality, name what was actually done in this session, name the skill or technique by its proper name when one applies, and ground every claim in a paraphrase of what the transcript shows. The output reflects that — Intervention carries the documentation weight, Behavior anchors the starting state, Response shows the work landing, Plan closes the loop.
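The guidance described above can be pictured as a prompt assembled from fixed instructions, the clinician's style examples, and the transcript. The following is a minimal sketch under that assumption; the function name, the instruction wording, and the layout are hypothetical and do not reproduce TherapyDraft's actual prompt.

```python
BIRP_SECTIONS = ("Behavior", "Intervention", "Response", "Plan")

# Hypothetical instruction text paraphrasing the four requirements for Intervention.
INTERVENTION_RULES = (
    "Name the modality.",
    "Name what was actually done in this session.",
    "Name the skill or technique by its proper name when one applies.",
    "Ground every claim in a paraphrase of what the transcript shows.",
)

def build_birp_prompt(transcript: str, style_examples: list[str]) -> str:
    """Assemble one plain-text prompt: instructions, style examples, then the transcript."""
    lines = [
        "Draft a BIRP progress note with four labeled paragraphs: "
        + ", ".join(BIRP_SECTIONS) + ".",
        "Intervention carries the documentation weight. For that paragraph:",
    ]
    lines += [f"- {rule}" for rule in INTERVENTION_RULES]
    for i, example in enumerate(style_examples, start=1):
        lines += ["", f"Style example {i}:", example.strip()]
    lines += ["", "Session transcript:", transcript.strip()]
    return "\n".join(lines)
```

Putting the transcript last keeps the instructions and examples in a stable prefix, which is a common layout choice rather than a requirement.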

The BIRP-drafting workflow on a Mac

Record or import the audio. Use TherapyDraft's built-in recorder, your Mac's microphone, a USB lavalier, or an exported audio file from any session-recording setup. Telehealth recordings work; in-person recordings work; phone-call exports work. The file is stored in TherapyDraft's local Application Support directory and stays there until you delete it.
Pick BIRP and pick the voice preset. The format dropdown has SOAP, DAP, BIRP, GIRP, and a Custom mode. Choose BIRP. The voice preset toggles whether Behavior is third-person or first-person, whether Intervention names modality at the top of the paragraph or weaves it into the body, whether Response is structured by domain (cognitive / affective / behavioral / interpersonal) or written as a single paragraph, and whether Plan numbers its items or bullets them. Defaults are calibrated to the CMHC-and-managed-care BIRP standard; first-time customization is one click per toggle, and the preset is persistent.
Draft locally. whisper.cpp transcribes the audio on your Mac (real-time factor under 1.0×, that is, faster than real time, on M2 and newer chips), then Qwen 2.5 14B-Instruct (4-bit MLX) runs the BIRP-shaped prompt against the transcript and your style examples. End-to-end on a 50-minute session: 90–150 seconds on M2, 60–100 seconds on M3 or M4. The macOS network-sandbox entitlement on the recording, transcription, and inference paths is set to deny by design; there is no socket through which audio or text can leave.
Review and paste. Read the draft in TherapyDraft's editor, fix anything the model got wrong, then click the EHR-paste preset for your chart-of-record. Output is plain-text labeled paragraphs that drop cleanly into the Behavior, Intervention, Response, and Plan fields of any standard BIRP form. Custom forms with extra fields (Modality, CPT, Time-in-session, Risk Statement) work too — the Custom-form-aware preset reads your field labels and emits paragraphs to match.
Sign the note in your EHR. The note is yours: the AI authored a draft, you authored the note. The audit trail in your EHR shows you as the author of every field because you entered and edited every field while logged in. That is the correct legal posture for any audit context, and it matches how cloud BIRP scribes handle authorship in practice.
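The review-and-paste step relies on the draft being plain text with one labeled paragraph per field. As a rough illustration of how such a draft could be split into the four EHR fields, here is a small sketch; the function name and the "Label." convention at the start of a line are assumptions based on the sample output format, not a documented TherapyDraft interface.

```python
import re

BIRP_LABELS = ("Behavior", "Intervention", "Response", "Plan")

def split_birp_draft(draft: str) -> dict[str, str]:
    """Split a draft whose paragraphs start with 'Label.' into {label: body}."""
    # A label counts only at the start of a line and must be followed by a period.
    pattern = re.compile(r"^(" + "|".join(BIRP_LABELS) + r")\.\s*", re.MULTILINE)
    matches = list(pattern.finditer(draft))
    fields: dict[str, str] = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(draft)
        fields[m.group(1)] = draft[m.end():end].strip()
    missing = [label for label in BIRP_LABELS if label not in fields]
    if missing:
        raise ValueError(f"draft is missing sections: {missing}")
    return fields
```

Failing loudly on a missing section is deliberate in this sketch: a note with no Plan paragraph should stop at review rather than be pasted with an empty field.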

End-to-end time on a 50-minute session: roughly two to four minutes of local drafting plus one to three minutes of clinical review and editing in your EHR. Net displacement of typing burden: about 15–25 minutes per session, which is the same range cloud BIRP scribes report, without the audio leaving your machine and without the documentation sitting in a vendor's tenant where an audit request or someone else's discovery process could reach it.

A sample BIRP output from the same anonymized fake transcript used on the DAP and SOAP pages

Below is a draft TherapyDraft produced from the same anonymized fake 50-minute transcript that drove the sample on the DAP-generator page and the SOAP-private-practice page, with the format set to BIRP. The "client" is composite; nothing in this transcript corresponds to a real session or a real person. Names, demographics, and clinical detail have been generalized. Reusing the same transcript across DAP, SOAP, and BIRP samples is deliberate — it lets a clinician compare the three reshapings side-by-side and see that the format-awareness is real, not cosmetic. The same session content takes a different shape in each format because the formats themselves are different, not because the model is generating three templated variants of the same thing.

Behavior. Client presented on time for a regular weekly session, well-groomed and reasonably rested. Affect was reactive and somewhat constricted at intake, broadening over the session as rapport re-established; congruence with reported content was good throughout. Speech was at normal rate and prosody, with a noticeable softening when discussing the values conflict. Engagement was active. No psychomotor abnormalities observed. Client opened the session by reporting a difficult workplace interaction the prior Tuesday in which a senior colleague attributed an error to client in a meeting that included client's direct supervisor, and described an escalating pattern of shortened sleep onset, two missed gym sessions, and weekend rumination peaking Sunday afternoon. No SI / HI reported or elicited at intake or during session.

Intervention. CBT-and-ACT-integrative session focused on three labeled tasks. (1) Cognitive-defusion skills review: clinician and client revisited paced breathing and the brief written reframe practiced in prior weeks, with client reporting in-vivo use of both Sunday afternoon to interrupt rumination. Clinician reflected the partial success of the reframe ("got me out of the loop but didn't change how I feel about her") and validated the distinction between rumination interruption and emotional resolution as separate clinical targets. (2) Values-clarification revisit anchored to session 6's worksheet: clinician guided client through identifying which of client's previously articulated values the colleague interaction conflicts with, and client identified competence-at-work as the load-bearing value in conflict. Clinician used reflective listening and open-ended questions throughout this segment without offering interpretation. (3) Behavioral rehearsal preparation: clinician proposed drafting a 3–4 sentence script for a direct conversation with the colleague before next session, to be roleplayed in session with role reversal, and client agreed and named willingness to attempt the conversation in vivo within two weeks of next session. No exposure work, no formal cognitive restructuring beyond the reframe revisit, no homework assignment beyond the script draft.

Response. Client engaged actively across all three intervention segments. Client volunteered the values-clarification frame from session 6 without prompting and modulated effectively from rumination-recall mode to skills-practice mode mid-session, which clinician notes as a meaningful change from intake-period sessions. Articulation of the values conflict (competence versus avoidance) was sharper than at intake and reflects integration of session-6 work; client's framing of the reframe's partial success — distinguishing rumination interruption from emotional resolution — was unprompted and clinically apt. Client expressed willingness to draft the conversation script and attempt in-vivo conversation, with appropriate hedging. Affect modulation across the hour and the in-vivo use of session-6 skills support continued progress on the recognize-rumination-earlier treatment goal. No regression from prior session noted.

Plan. Client to draft a 3–4 sentence script for the colleague conversation prior to next session. In session, roleplay the conversation twice with role reversal. Continue weekly cadence; revisit values-clarification follow-up at session 12 per treatment plan. Clinician to send the cognitive-defusion handout discussed in session 4 as a reference for the rumination-interruption work. CPT 90834 (45-minute psychotherapy); session ran 50 minutes per recording timestamps.

Notice the shape. Behavior is anchored in observable detail at session start, at about 120 words. Intervention is the longest paragraph at about 190 words because that is what BIRP exists to do; it names the integrative modality (CBT-and-ACT), labels three discrete intervention segments, and grounds each segment in what actually happened during the session rather than reciting a generic intervention summary. Response is about 120 words and tracks the client's engagement and clinical change-markers. Plan is five short sentences: five concrete next-step items plus the CPT-and-time line, included for billing-context completeness.
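Word counts like the ones quoted above depend on the counting convention, which is why they are stated as approximations. A minimal sketch of one convention (whitespace-delimited tokens, so a hyphenated term counts once) and of the "Intervention is longest" check follows; both function names are hypothetical.

```python
def word_counts(sections: dict[str, str]) -> dict[str, int]:
    """Whitespace-delimited word count per section; hyphenated terms count once."""
    return {label: len(body.split()) for label, body in sections.items()}

def intervention_is_longest(sections: dict[str, str]) -> bool:
    """True when Intervention has strictly more words than every other section."""
    counts = word_counts(sections)
    return all(
        counts["Intervention"] > n
        for label, n in counts.items()
        if label != "Intervention"
    )
```

Run against the four paragraphs of a draft, this gives a quick sanity check that the note has the BIRP shape before it goes to clinical review.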

Compare against the DAP version of this same session — Data was 240 words and absorbed the entire narrative, Assessment was 75 words and stayed on clinical reasoning. The BIRP reshaping moves the narrative observation into Behavior, pulls the intervention work into its own labeled paragraph (which DAP buries inside Data), and uses Response to do the work that DAP's Assessment does, with a slightly different audit-readability framing. Same session content, three formats, three different things foregrounded. That is the format-awareness, not a relabeling.

What "BIRP" means here, briefly

A BIRP note is a progress note structured as four labeled paragraphs:

Behavior. What the clinician observed and what the client reported about presentation at the session start. Includes mental-status data (affect, speech, psychomotor activity, attention, thought process), the presenting concerns the client raised this session, and any contextual life events that frame the session content. Anchors the chart in the session's starting state. Length: typically 90–140 words for a 50-minute session.
Intervention. What the clinician did in the session, named at the level of modality and at the level of specific technique. Includes the labeled intervention modality (CBT, ACT, EFT, IFS, DBT, motivational interviewing, psychodynamic, integrative, exposure-based), the discrete intervention segments within the session, and what the clinician did in each segment grounded in what actually happened per the transcript. The longest paragraph by design — this is what BIRP exists to surface.
Response. How the client responded to the intervention. Includes engagement level, change markers within the session, articulation of treatment-goal-relevant content, affect modulation, and any clinical-judgment notes about whether the work landed. Tracks the gap between what the clinician did and what shifted as a result.
Plan. The next-step list. Includes homework, between-session tasks, the cadence of upcoming sessions, any referrals, focus areas for the next session, and the CPT-and-time-in-session line if the practice's documentation standard includes it. Functionally a short numbered or bulleted list.
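The four-part structure maps naturally onto a small record type. The sketch below is one possible representation, with Plan held as a list so it can be rendered numbered or bulleted; the class name, field names, and `render` method are illustrative assumptions, not part of any TherapyDraft API.

```python
from dataclasses import dataclass, field

@dataclass
class BirpNote:
    behavior: str
    intervention: str
    response: str
    plan_items: list[str] = field(default_factory=list)

    def render(self, numbered_plan: bool = True) -> str:
        """Return the note as four labeled plain-text paragraphs."""
        if numbered_plan:
            plan_lines = [f"{i}. {item}" for i, item in enumerate(self.plan_items, start=1)]
        else:
            plan_lines = [f"- {item}" for item in self.plan_items]
        return "\n\n".join([
            f"Behavior. {self.behavior}",
            f"Intervention. {self.intervention}",
            f"Response. {self.response}",
            "Plan.\n" + "\n".join(plan_lines),
        ])
```

Keeping Plan as discrete items rather than one string makes the numbered-versus-bulleted choice a rendering detail instead of something the drafting step has to get right.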

BIRP is the dominant format in community mental-health centers, Medicaid-funded outpatient clinics, managed-care contexts, and ABA-adjacent behavioral-health practices. SOAP is more common in primary-care and integrated-medical settings. DAP is dominant in private-practice cash-pay talk-therapy. SOAP for private practice is the second-most-common format among LMFTs and LCSWs whose insurance panels or supervisors prefer the medical-format default. GIRP is the close-cousin format that replaces Behavior with Goal — used in practices where every session is documented as explicitly tied back to a written treatment plan. TherapyDraft generates all four formats; this page is about BIRP specifically because it's the right answer for clinicians whose audit context wants the intervention surfaced under its own header.

Why a local BIRP generator is structurally different from a cloud BIRP generator

Every cloud BIRP scribe on the market in 2026 (Mentalyc, Upheal, Blueprint, Supanote, Freed, CliniScripts, and the in-product AI scribes shipping inside major EHRs) works the same way at the architectural layer. Audio uploads to the vendor's cloud. The vendor's cloud forwards it to an AI subprocessor for transcription and drafting. The transcript and draft come back, are stored in the vendor's tenant on that subprocessor's storage, and surface in the clinician's web UI. Every link in that chain is covered by the vendor's BAA, and every link is also a new line in the practice's subprocessor inventory and a new breach surface.

For a BIRP-using practice, the architectural property of running locally has a sharper edge than it does for a private-practice clinician using DAP. CMHC, Medicaid-funded, and managed-care documentation is explicitly designed to be read by reviewers who are not the clinician: utilization reviewers, auditors, contract compliance staff, supervisor sign-off chains. That makes the documentation higher-stakes than a private-practice DAP note that lives in the clinician's chart and rarely leaves it. The audit trail of who has had access to the underlying session content matters not only for the clinician's threat model but for the practice's compliance posture; the longer that trail is, the more parties have to be accounted for in any future breach-notification analysis.

The architectural property of running the BIRP generator locally is not better security in the conventional sense; it is the absence of a category of risk. There is no subprocessor chain because there is no cloud. There is no breach surface at the AI vendor because there is no AI vendor. There is no audio in transit because there is no transit. The Mac is the cloud, and the only thing that ever leaves it for this workflow is the note that you type or paste into your EHR yourself, after reviewing it. For a CMHC clinician whose chart is going to be read by three or four different non-clinician roles, that property collapses the who-has-touched-the-raw-audio question to one entry: you.

For practices whose threat model is satisfied by a signed BAA with a reputable cloud scribe, the cloud options are perfectly defensible. The deeper-dive comparison is in the pricing comparison page, and the architectural-vs-contractual argument is laid out in the BAA explainer. This page is for clinicians and practices who already know what they want and are searching for the tool that ships it.

Pricing and free trial

TherapyDraft is $39 per month or $349 per year for the Solo plan, which includes unlimited DAP / SOAP / BIRP / GIRP drafts, all EHR paste presets, the tamper-evident inference attestation log, and the one-shot template matching from your own example notes. Group pricing is $29 per seat per month for practices of 3+ seats — the price most relevant to CMHC and Medicaid-funded outpatient clinics where BIRP is the house format. There is a free trial of 10 sessions with no credit card required — see the free-trial page for the trial mechanics.
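For readers weighing monthly against annual billing, the arithmetic behind the Solo and group prices quoted above works out as follows. This is a worked example using only the listed prices; the function names are made up for the illustration, and taxes or promotions are not modeled.

```python
MONTHLY = 39.00     # Solo plan, billed monthly
ANNUAL = 349.00     # Solo plan, billed yearly
GROUP_SEAT = 29.00  # per seat per month, practices of 3+ seats

def annual_savings() -> tuple[float, float]:
    """Dollar and percentage savings of annual billing versus twelve monthly payments."""
    twelve_months = MONTHLY * 12          # 468.00
    saved = twelve_months - ANNUAL        # 119.00
    return saved, round(100 * saved / twelve_months, 1)

def group_monthly_cost(seats: int) -> float:
    """Monthly cost for a group; group pricing starts at three seats."""
    if seats < 3:
        raise ValueError("group pricing applies to practices of 3+ seats")
    return GROUP_SEAT * seats

print(annual_savings())        # (119.0, 25.4)
print(group_monthly_cost(3))   # 87.0
```

Annual billing comes to about $29.08 per month, roughly the same as the per-seat group rate.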

Pricing context against cloud BIRP scribes: Mentalyc starts at $19.99/mo, Supanote at $39/mo, Upheal at $29/mo, Blueprint at $0.99 per session, and Freed at $99/mo. TherapyDraft sits at the median of that range and uses the architectural-vs-contractual difference, not price, as the wedge. The full breakdown is in the pricing comparison.
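The "median" claim can be checked directly from the prices listed above. Blueprint is priced per session rather than per month, so this sketch leaves it out of the median and instead computes the session count at which its per-session pricing passes $39 per month; treating it that way is an assumption of the example.

```python
import math
from statistics import median

# Starting monthly prices as quoted in the paragraph above.
monthly_prices = {"Mentalyc": 19.99, "Upheal": 29.00, "Supanote": 39.00, "Freed": 99.00}
THERAPYDRAFT = 39.00
BLUEPRINT_PER_SESSION = 0.99

median_cloud = median(monthly_prices.values())                       # 34.0
median_with_td = median([*monthly_prices.values(), THERAPYDRAFT])    # 39.0

# Smallest monthly session count at which Blueprint costs more than $39.
breakeven_sessions = math.ceil(THERAPYDRAFT / BLUEPRINT_PER_SESSION)  # 40

print(median_cloud, median_with_td, breakeven_sessions)
```

With TherapyDraft included in the list, the median monthly price is $39; among the four monthly-priced cloud scribes alone it is $34.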