Voice notes to tasks, without the cleanup
Turn voice notes into clean tasks with quik.md. Speak a fragment on a walk, get a parsed next step in the right project, exportable as markdown.
Updated April 25, 2026 · 6 min read

Most voice memos die unsorted in a recording app. quik.md transcribes, classifies, and routes them into the right project with the next step already written. The whole capture takes a few seconds, and the result is a clean task you can act on without editing.
Why voice notes usually rot
A voice memo is the easiest thing to capture and the hardest thing to revisit. You record a fragment on a walk, get back to your desk, and the file sits in a recording app with a name like New Memo 47.m4a. Re-listening costs more than the idea is worth, so most memos quietly die.
quik.md collapses that loop. The mic is the same single capture surface as typing. You speak, it files. The next step is written before you sit down.
How the voice flow works inside quik
Four moves. First, you tap the mic in the inbox. No project picker, no tag chooser. Second, audio streams to OpenAI Whisper for transcription. Third, the transcript goes through the same parser as typed capture: classification, project routing, next-step extraction. Fourth, the item lands in the routed project (or Inbox at low confidence) as a markdown task with a clean verb.
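The routing step above can be sketched as a tiny function. This is an illustrative sketch, not quik.md's actual code: the `ParsedItem` shape, the function names, and the hard-coded `0.8` threshold (the value the FAQ cites) are all assumptions.

```typescript
// Hypothetical sketch of the post-transcription routing step.
// Field names and the threshold value are assumptions, not quik.md's schema.
type ParsedItem = {
  kind: "task" | "note";
  project: string | null; // null → the parser could not pick a project
  confidence: number;     // routing confidence in [0, 1]
  body: string;           // the next step, written as a clean verb phrase
};

const ROUTING_THRESHOLD = 0.8; // below this, the item stays in the inbox

function routeItem(item: ParsedItem): string {
  // Low confidence lands in Inbox rather than guessing a project.
  if (item.project === null || item.confidence < ROUTING_THRESHOLD) {
    return "Inbox";
  }
  return item.project;
}
```

The design choice worth noting: a wrong filing is worse than no filing, so anything under the threshold falls back to Inbox instead of a best guess.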
For the deeper version of this loop, see voice-to-task capture. For how the parser decides what to file where, see AI task routing.

A real walking memo, parsed
Raw voice transcript:
"ok onboarding feels long, we should probably cut step three, also remind me to follow up with priya on the partnership deck thursday, and that pricing anchor idea, two columns with annual toggle on by default."
quik turns that into three items:
- Note in Onboarding Revamp: "Onboarding feels long. Consider cutting step 3."
- Task in Q2 Launch, due Thursday: "Follow up with Priya on partnership deck."
- Note in Pricing: "Pricing anchor idea: two-column layout, annual toggle on by default."
One memo, three intents, three project landings. You did not open three apps and you did not file anything.
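The split itself is model work, but the output shape is worth seeing. As a toy illustration, here is the memo above represented as structured items; the `CapturedItem` type and its field names are assumptions for this sketch, not quik.md's real schema.

```typescript
// Toy representation of the walking memo after parsing.
// Field names are illustrative assumptions.
type CapturedItem = {
  kind: "task" | "note";
  project: string;
  due?: string; // due date as the model extracted it, when present
  body: string;
};

const memoItems: CapturedItem[] = [
  { kind: "note", project: "Onboarding Revamp",
    body: "Onboarding feels long. Consider cutting step 3." },
  { kind: "task", project: "Q2 Launch", due: "Thursday",
    body: "Follow up with Priya on partnership deck." },
  { kind: "note", project: "Pricing",
    body: "Pricing anchor idea: two-column layout, annual toggle on by default." },
];
```

One input stream, a list of independently routed items: that is the whole trick. Nothing forces a memo to be one thing.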
What the transcription stack actually keeps
The audio itself is streamed through Whisper and dropped after the response. quik does not persist voice files. The transcript is stored as the item content; the metadata sits in ai_runs for rate-limit accounting only. If you need legal-grade retention, this is the wrong tool — and that is on purpose.
When voice beats typing
- Walking, driving, between meetings — any context where typing breaks the thought.
- Long reflective dumps where you cannot keep up with your own brain.
- Multi-intent rambles where filing into separate apps would cost more than the ideas are worth.
When typing still wins
- Short, clean tasks with a known due date. "Email Sam Friday" is faster typed than spoken.
- Sensitive content you would rather not transcribe.
- Code snippets, URLs, or technical strings that voice models still get wrong.
Voice on free tier and offline
Free tier users get the Web Speech API path: in-browser transcription, no server call, no Whisper. Quality drops, but capture works. Items land with ai_status='queued' and run through the AI organize step the next time you go online and have Pro. The AI inbox design assumes capture must always succeed even when intelligence cannot.
Firefox does not ship SpeechRecognition. quik shows a one-shot toast there instead of failing silently. No destructive action.
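The browser check behind that fallback is simple feature detection. quik.md's real implementation is not public; this sketch only shows the check itself, written as a pure function over a window-like object so the dependency on browser globals is explicit.

```typescript
// Hypothetical feature detection for the free-tier voice path.
// Pass in `window` (or a mock); the prefixed name covers Chrome/Safari.
type SpeechSupport = "standard" | "webkit" | "none";

function detectSpeechRecognition(w: Record<string, unknown>): SpeechSupport {
  if ("SpeechRecognition" in w) return "standard";
  if ("webkitSpeechRecognition" in w) return "webkit";
  return "none"; // e.g. Firefox: show a one-shot toast instead of failing silently
}
```

The "none" branch is the important one: capture must degrade to a visible message, never to a silent failure.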
How this compares to other voice-to-task tools
| Tool | Voice path | Routing | Markdown export | Best for |
|---|---|---|---|---|
| Apple Reminders | Siri only | None | Partial | Light reminders inside Apple |
| Todoist | Mobile dictation | Limited NLP | Yes | Disciplined list-keepers |
| Otter.ai | Long meetings | None | Limited | Meeting transcripts, not tasks |
| Voice memos app | Raw audio only | None | None | Capture, no routing |
| quik.md | Whisper + AI organize | Project-aware | Native | Messy capture into clean tasks |
For more comparisons, see quik vs Apple Reminders and quik vs Todoist.
Who voice-first capture is for
- Founders who think on walks and lose half the ideas before they sit back down.
- Field workers and clinicians who cannot type between encounters.
- PMs taking back-to-back meetings who need action items without a separate transcription pass.
- Writers and researchers who reach for the mic when typing breaks the sentence.
If you live in a structured Obsidian vault and only want long-form notes, see quik vs Obsidian first — quik is a capture surface, not a knowledge graph.
Markdown export and portability
Every voice-captured item is markdown by default. Copy-as-markdown on any item, export a project as a single markdown file, or move the whole thing to a markdown-native workflow. Lock-in is a product smell, and voice is no exception.
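For a sense of what copy-as-markdown might emit, here is a minimal serializer sketch. The exact output format quik.md uses is an assumption here; this follows the common GitHub-flavored task-list convention.

```typescript
// Minimal sketch of a copy-as-markdown serializer.
// The output format is an assumption, modeled on GFM task lists.
type Item = { kind: "task" | "note"; body: string; due?: string; done?: boolean };

function toMarkdown(item: Item): string {
  if (item.kind === "note") return item.body; // notes export as plain text
  const box = item.done ? "[x]" : "[ ]";
  const due = item.due ? ` (due ${item.due})` : "";
  return `- ${box} ${item.body}${due}`;
}
```

Because the target is plain markdown, the export needs no importer on the other side; any markdown-native tool can consume it directly.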
FAQ
How does quik.md turn voice notes into tasks?
You tap the mic, speak, and stop. Audio is streamed to Whisper for transcription, then the transcript runs through the same classification path as typed input. The model decides task vs note, picks a project, and writes the next step in plain English. Audio is discarded after transcription.
Can it handle long, rambling memos with multiple ideas?
Yes. A forty-second memo with three buried todos lands as three separate items, each classified and routed on its own. quik treats the transcript as a stream of intents, not one giant blob.
Does it work offline?
Voice falls back to the browser's Web Speech API when you are offline or on free tier. The transcript queues for AI organize when you come back online. Firefox does not ship Web Speech, so it shows a one-shot toast.
Is voice a Pro feature?
Voice capture itself is not Pro-gated. Whisper-backed server transcription and the AI organize step run on Pro. Free tier still captures via Web Speech.
Where do my voice notes end up?
Inside the project the model picked, provided its confidence cleared the 0.80 routing threshold. Lower confidence lands in Inbox. Every item is markdown by default.