Voice notes to tasks, without the cleanup
Turn voice notes into clean tasks with quik.md. Speak a fragment on a walk, get a parsed next step in the right project, exportable as markdown.
Updated April 25, 2026 · 6 min read

Most voice memos die unsorted in a recording app. quik.md transcribes, classifies, and routes them into the right project with the next step already written. The whole capture takes a few seconds, and the result is a clean task you can act on without editing.
Why voice notes usually rot
A voice memo is the easiest thing to capture and the hardest thing to revisit. You record a fragment on a walk, get back to your desk, and the file sits in a recording app with a name like New Memo 47.m4a. Re-listening costs more than the idea is worth, so most memos quietly die.
quik.md collapses that loop. The mic is the same single capture surface as typing. You speak, it files. The next step is written before you sit down.
How the voice flow works inside quik
Four moves. First, you tap the mic in the inbox. No project picker, no tag chooser. Second, audio streams to OpenAI Whisper for transcription. Third, the transcript goes through the same parser as typed capture: classification, project routing, next-step extraction. Fourth, the item lands in the routed project (or Inbox at low confidence) as a markdown task with a clean verb.
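The routing step above can be sketched as a tiny function. This is an illustrative sketch, not quik.md's actual code: the `ParsedItem` shape, the function names, and the hard-coded `0.8` threshold (the value the FAQ cites) are all assumptions.

```typescript
// Hypothetical sketch of the post-transcription routing step.
// Field names and the threshold value are assumptions, not quik.md's schema.
type ParsedItem = {
  kind: "task" | "note";
  project: string | null; // null → the parser could not pick a project
  confidence: number;     // routing confidence in [0, 1]
  body: string;           // the next step, written as a clean verb phrase
};

const ROUTING_THRESHOLD = 0.8; // below this, the item stays in the inbox

function routeItem(item: ParsedItem): string {
  // Low confidence lands in Inbox rather than guessing a project.
  if (item.project === null || item.confidence < ROUTING_THRESHOLD) {
    return "Inbox";
  }
  return item.project;
}
```

The design choice worth noting: a wrong filing is worse than no filing, so anything under the threshold falls back to Inbox instead of a best guess.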
For the deeper version of this loop, see voice-to-task capture. For how the parser decides what to file where, see AI task routing.

A real walking memo, parsed
Raw voice transcript:
"ok onboarding feels long, we should probably cut step three, also remind me to follow up with priya on the partnership deck thursday, and that pricing anchor idea, two columns with annual toggle on by default."
quik turns that into three items:
- Note in Onboarding Revamp: "Onboarding feels long. Consider cutting step 3."
- Task in Q2 Launch, due Thursday: "Follow up with Priya on partnership deck."
- Note in Pricing: "Pricing anchor idea: two-column layout, annual toggle on by default."
One memo, three intents, three project landings. You did not open three apps and you did not file anything.
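The split itself is model work, but the output shape is worth seeing. As a toy illustration, here is the memo above represented as structured items; the `CapturedItem` type and its field names are assumptions for this sketch, not quik.md's real schema.

```typescript
// Toy representation of the walking memo after parsing.
// Field names are illustrative assumptions.
type CapturedItem = {
  kind: "task" | "note";
  project: string;
  due?: string; // due date as the model extracted it, when present
  body: string;
};

const memoItems: CapturedItem[] = [
  { kind: "note", project: "Onboarding Revamp",
    body: "Onboarding feels long. Consider cutting step 3." },
  { kind: "task", project: "Q2 Launch", due: "Thursday",
    body: "Follow up with Priya on partnership deck." },
  { kind: "note", project: "Pricing",
    body: "Pricing anchor idea: two-column layout, annual toggle on by default." },
];
```

One input stream, a list of independently routed items: that is the whole trick. Nothing forces a memo to be one thing.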
What the transcription stack actually keeps
The audio itself is streamed through Whisper and dropped after the response. quik does not persist voice files. The transcript is stored as the item content; the metadata sits in ai_runs for rate-limit accounting only. If you need legal-grade retention, this is the wrong tool — and that is on purpose.
When voice beats typing
- Walking, driving, between meetings — any context where typing breaks the thought.
- Long reflective dumps where you cannot keep up with your own brain.
- Multi-intent rambles where filing into separate apps would cost more than the ideas are worth.
When typing still wins
- Short, clean tasks with a known due date. "Email Sam Friday" is faster typed than spoken.
- Sensitive content you would rather not transcribe.
- Code snippets, URLs, or technical strings that voice models still get wrong.
Voice on free tier and offline
Free tier users get the Web Speech API path: in-browser transcription, no server call, no Whisper. Quality drops, but capture works. Items land with ai_status='queued' and run through the AI organize step the next time you go online and have Pro. The AI inbox design assumes capture must always succeed even when intelligence cannot.
Firefox does not ship SpeechRecognition. quik shows a one-shot toast there instead of failing silently. No destructive action.
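The browser check behind that fallback is simple feature detection. quik.md's real implementation is not public; this sketch only shows the check itself, written as a pure function over a window-like object so the dependency on browser globals is explicit.

```typescript
// Hypothetical feature detection for the free-tier voice path.
// Pass in `window` (or a mock); the prefixed name covers Chrome/Safari.
type SpeechSupport = "standard" | "webkit" | "none";

function detectSpeechRecognition(w: Record<string, unknown>): SpeechSupport {
  if ("SpeechRecognition" in w) return "standard";
  if ("webkitSpeechRecognition" in w) return "webkit";
  return "none"; // e.g. Firefox: show a one-shot toast instead of failing silently
}
```

The "none" branch is the important one: capture must degrade to a visible message, never to a silent failure.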
How this compares to other voice-to-task tools
| Tool | Voice path | Routing | Markdown export | Best for |
|---|---|---|---|---|
| Apple Reminders | Siri only | None | Partial | Light reminders inside Apple |
| Todoist | Mobile dictation | Limited NLP | Yes | Disciplined list-keepers |
| Otter.ai | Long meetings | None | Limited | Meeting transcripts, not tasks |
| Voice memos app | Raw audio only | None | None | Capture, no routing |
| quik.md | Whisper + AI organize | Project-aware | Native | Messy capture into clean tasks |
For more comparisons, see quik vs Apple Reminders and quik vs Todoist.
Who voice-first capture is for
- Founders who think on walks and lose half the ideas before they sit back down.
- Field workers and clinicians who cannot type between encounters.
- PMs taking back-to-back meetings who need action items without a separate transcription pass.
- Writers and researchers who reach for the mic when typing breaks the sentence.
If you live in a structured Obsidian vault and only want long-form notes, see quik vs Obsidian first — quik is a capture surface, not a knowledge graph.
Markdown export and portability
Every voice-captured item is markdown by default. Copy-as-markdown on any item, export a project as a single markdown file, or move the whole thing to a markdown-native workflow. Lock-in is a product smell, and voice is no exception.
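For a sense of what copy-as-markdown might emit, here is a minimal serializer sketch. The exact output format quik.md uses is an assumption here; this follows the common GitHub-flavored task-list convention.

```typescript
// Minimal sketch of a copy-as-markdown serializer.
// The output format is an assumption, modeled on GFM task lists.
type Item = { kind: "task" | "note"; body: string; due?: string; done?: boolean };

function toMarkdown(item: Item): string {
  if (item.kind === "note") return item.body; // notes export as plain text
  const box = item.done ? "[x]" : "[ ]";
  const due = item.due ? ` (due ${item.due})` : "";
  return `- ${box} ${item.body}${due}`;
}
```

Because the target is plain markdown, the export needs no importer on the other side; any markdown-native tool can consume it directly.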
FAQ
How does quik.md turn voice notes into tasks?
You tap the mic, speak, and stop. Audio is streamed to Whisper for transcription, then the transcript runs through the same classification path as typed input. The model decides task vs note, picks a project, and writes the next step in plain English. Audio is discarded after transcription.
Can it handle long, rambling memos with multiple ideas?
Yes. A forty-second memo with three buried todos lands as three separate items, each classified and routed on its own. quik treats the transcript as a stream of intents, not one giant blob.
Does it work offline?
Voice falls back to the browser's Web Speech API when you are offline or on free tier. The transcript queues for AI organize when you come back online. Firefox does not ship Web Speech, so it shows a one-shot toast.
Is voice a Pro feature?
Voice capture itself is not Pro-gated. Whisper-backed server transcription and the AI organize step run on Pro. Free tier still captures via Web Speech.
Where do my voice notes end up?
Inside the project the model picked, provided its confidence cleared the 0.80 routing threshold. Lower confidence lands in Inbox. Every item is markdown by default.