AI Task Routing: How It Works and Why Accuracy Matters
AI task routing decides which project a new thought belongs to. Here is how it works, how accurate it is in 2026, and the rules that keep it honest.
AI task routing is the decision layer between "here is a thought" and "here is a structured task in the right project". A router reads a captured input, decides whether it is a todo or a note, picks a project from the user's existing list, extracts a due date if one is hiding in the language, and flags possible duplicates. When routing works, the user barely notices it. When it fails, the failure is the single most expensive mistake an AI task manager can make.
This guide covers how AI task routing works in 2026, the accuracy numbers you should expect, and the rules that separate good routers from loud ones. For the wider picture of where routing fits inside an AI task manager, see our pillar on AI task managers.

What is AI task routing and why does it matter?
AI task routing is the invisible core of an AI task manager. Without a router, the tool is a fancy transcription layer that dumps every input into a flat inbox. With a good router, captures land in the right project with a draft next step and a due date, and the user spends almost zero time filing. The gap between those two experiences is large enough that routing accuracy is effectively the product.
The routing layer has four jobs, in order:
- Classification. Is this a todo, a note, or a question?
- Project assignment. Which existing project does this belong to, or should we create a new one?
- Field extraction. What is the title, the next step, the due date, the people involved?
- Duplicate detection. Does this match something we already captured recently?
Each job has its own failure mode, its own confidence threshold, and its own cost when it goes wrong.
How does AI task routing actually work?
AI task routing in 2026 runs a three-model pipeline: a classifier for type, an embedding model for project matching, and a large language model for extraction. The pipeline is wrapped in a rules engine that applies confidence floors and falls back to the inbox when any of the models are uncertain.
The pipeline in order:
- Pre-process. Deterministic date parsing runs first on phrases like "tomorrow", "next Friday", "in two hours". A rules engine is more predictable than asking an LLM to do date math.
- Classify. A small classifier decides todo vs note vs question. Separating observations from action items is key here: "I noticed we ship too many tickets on Fridays" is a note, not a todo, even though it mentions work.
- Embed. The input is embedded with a sentence-transformer or an OpenAI embedding model. The embedding is compared against the embeddings of the user's existing project descriptions.
- Extract. A larger LLM fills in the structured fields: title, next step, tags, involved people. The LLM is constrained to a JSON schema to prevent hallucination.
- Deduplicate. The capture's embedding is compared against recent captures. Cosine similarity above 0.88 flags a possible duplicate.
- Apply confidence floors. If project match is below 0.8, send to inbox. If the extracted due date confidence is below 0.6, drop the due date, keep the todo.
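The final confidence-floor step can be sketched as a small function. This is a minimal illustration, assuming the upstream models have already produced a project guess, extracted fields, a due-date confidence, and a max duplicate similarity; the names are hypothetical:

```python
PROJECT_FLOOR = 0.80   # below this, demote to inbox
DATE_FLOOR = 0.60      # below this, drop the date but keep the todo
DUP_THRESHOLD = 0.88   # above this, flag a possible duplicate

def apply_floors(project, project_conf, fields, date_conf, max_dup_sim):
    """Apply confidence floors to raw model outputs (illustrative sketch)."""
    result = dict(fields)
    # Below the project floor, send to the inbox instead of guessing.
    result["project"] = project if project_conf >= PROJECT_FLOOR else "inbox"
    # Below the date floor, keep the todo but drop the date.
    if date_conf < DATE_FLOOR:
        result.pop("due_date", None)
    # Above the duplicate threshold, flag but never auto-merge.
    result["possible_duplicate"] = max_dup_sim > DUP_THRESHOLD
    return result
```

The point of isolating this step is that the floors are plain numbers in one place, which makes them easy to tune against real false-positive rates instead of being buried inside prompts.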
How accurate is AI task routing in 2026?
Good AI task routers hit 85 to 92 percent project assignment accuracy in 2026 when two conditions hold: projects have clear names, and there are fewer than 30 of them. When either condition fails, accuracy falls off fast. With 50 projects and vague names like "Stuff" or "Work Tomorrow", accuracy drops into the 60s and the user stops trusting the router.
The three factors that decide routing accuracy, in order:
- Project name clarity. A project called "Q2 launch" routes better than one called "Work". The router has more to anchor on.
- Project count. Ten projects is easy. Thirty is hard. Fifty is the edge of useful. More than fifty projects means the user needs a different organization strategy before the router can help.
- Input phrasing. "follow up with Maya about the onboarding doc" routes well. "that thing we talked about" does not route at all.
What is a confidence floor and why does it matter?
A confidence floor is the minimum probability an AI router needs before taking an action. Below the floor, the router demotes the item: sends it to the inbox instead of auto-filing, drops the due date instead of guessing, holds back on creating a new project. Confidence floors are the single most important design decision in a routing system because they decide how often the user experiences an auto-file as "correct" vs "spooky".
The four floors that matter most in production:
| Decision | Typical floor | Below the floor |
|---|---|---|
| Project assignment | 0.80 | Send to inbox |
| New project creation | 0.80 | Demote to inbox, log trace |
| Due date extraction | 0.60 | Keep todo, drop the date |
| Tag assignment | 0.75 | Drop the tag |
These numbers are not arbitrary. They come from user studies where trust decay was measured as a function of false-positive rate. Once auto-filing is wrong more than one time in five, users start re-reading every card to double-check. Once that check habit forms, the AI gain is zero regardless of accuracy.
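The table above can be expressed as a policy map, which keeps every floor and fallback in one reviewable place. A minimal sketch; the action names are illustrative, not from any specific product:

```python
# (floor, fallback action) per routing decision, mirroring the table above.
FLOORS = {
    "project_assignment": (0.80, "send_to_inbox"),
    "new_project_creation": (0.80, "demote_and_log"),
    "due_date_extraction": (0.60, "drop_date_keep_todo"),
    "tag_assignment": (0.75, "drop_tag"),
}

def action_for(decision, confidence):
    """Return 'apply' at or above the floor, else the decision's fallback."""
    floor, fallback = FLOORS[decision]
    return "apply" if confidence >= floor else fallback
```

A map like this also makes the trust argument auditable: to check whether auto-filing is wrong more than one time in five, you only need to log the decision name, the confidence, and the user's correction.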
How does AI task routing handle duplicates?
AI task routing handles duplicates by embedding every capture and comparing it against recent captures using cosine similarity. Above a threshold, usually 0.88, the router flags the new item as a possible duplicate of the nearest match. It does not merge automatically. Auto-merging is too destructive for the user to trust; surfacing is enough.
The duplicate detection pipeline is short:
- Embed the new capture.
- Compute cosine similarity against embeddings of items captured in the last 30 days.
- If max similarity > 0.88, attach the match to the new item's metadata.
- Render a "you may have captured this before" hint in the UI.
- Let the user merge, keep both, or dismiss.
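The comparison step above is a nearest-neighbor scan over recent embeddings. A minimal self-contained sketch using plain cosine similarity (a real system would use the embedding model's own vectors and likely a vector index, but the logic is the same):

```python
import math

DUP_THRESHOLD = 0.88  # above this, flag as a possible duplicate

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def nearest_duplicate(new_emb, recent):
    """recent: list of (item_id, embedding) from the last 30 days.

    Returns (item_id, similarity) for the nearest match above the
    threshold, or None. Surfacing only; merging stays with the user.
    """
    best = max(((item_id, cosine(new_emb, emb)) for item_id, emb in recent),
               key=lambda pair: pair[1], default=None)
    if best is not None and best[1] > DUP_THRESHOLD:
        return best
    return None
```

Note that the function returns the match rather than acting on it, mirroring the surface-don't-merge rule: the router attaches the candidate to metadata and the UI asks the user.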
Can AI task routing work offline?
AI task routing cannot work offline in a useful way in 2026. Transcription has viable on-device models (Whisper small, Apple's on-device speech model) but project routing still needs either a cloud LLM call or a specialized on-device model that no consumer product has shipped yet. The pragmatic answer is to decouple capture from routing: let capture happen locally and instantly, and let routing happen when the network is available.
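Decoupling capture from routing can be as simple as a local append-only queue that drains when the network returns. A minimal sketch under that assumption; the file path and function names are hypothetical:

```python
import json
import time
from pathlib import Path

DEFAULT_QUEUE = Path("capture_queue.jsonl")  # illustrative local path

def capture(text, queue=DEFAULT_QUEUE):
    """Append the raw capture to a local file: instant, works offline."""
    with queue.open("a") as f:
        f.write(json.dumps({"text": text, "ts": time.time()}) + "\n")

def flush(route_fn, queue=DEFAULT_QUEUE):
    """When the network returns, drain the queue through the router."""
    if not queue.exists():
        return []
    records = [json.loads(line) for line in queue.read_text().splitlines()]
    routed = [route_fn(rec) for rec in records]
    queue.unlink()  # clear the queue only after routing succeeds
    return routed
```

The user-visible contract is that capture never blocks on the router; routing quality degrades gracefully to "later" instead of "lost".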
For a longer discussion of the capture side specifically, see our guide on voice-to-task capture. For how confidence floors cascade into the daily inbox ritual, see inbox zero with AI.
What should I look for in a router before committing to an AI task manager?
Look for these five properties, in order. If a router is missing one, expect to lose trust in it within a month.
- A visible confidence score on every auto-routed item. Hidden confidence is a smell. It usually means the system cannot distinguish its own certain decisions from its uncertain ones.
- A demotion path to the inbox. When the router is uncertain, the item belongs in the inbox, not in a random project.
- A trace of the routing decision. "Why did this land in the Launch project?" should have an answer visible to the user, not just to the developer.
- Duplicate surfacing, not auto-merge. See above.
- A published eval or at least published accuracy numbers. Vendors who will not publish accuracy numbers rarely have good ones.
The first two are non-negotiable for trust. The last three are the difference between a router you use and a router you believe.
References
- OpenAI Embeddings documentation, OpenAI.
- Sentence-Transformers library, UKP Lab, TU Darmstadt.
- Chrono.js natural language date parser, Wanasit Tanakitrungruang.
- STS-B semantic similarity benchmark, GLUE benchmark suite.
- OpenAI Whisper, Radford et al., 2022.