Skip to content
โ† stanwood.dev

Which Model?

Stop overthinking it. Match the shape of the job to the shape of the model.

Good questions to ask first:

is quality or speed the bottleneck? does it need tools? how much context must fit? what mistake is expensive?

Opinionated recommendations, not gospel. Model landscape changes fast โ€” the directory below notes what's shifted since the snapshot. Last reviewed June 2026.

Quick picks

Skip the quiz. Six common asks, six ways to narrow the field.

Write polished prose
Pick for voice writing quality over benchmark rank

Use a model whose drafts need the fewest style edits. The best writing pick is the one you rewrite least.

CheckAsk for three versions and choose the one with the strongest default taste.

Code a feature in your app
Pick for tool use repo access, tests, edit loop

Coding work needs context and verification more than a clever single answer. Favor models and apps that can inspect files, edit, and run tests.

CheckGive it a small real bug and see whether it changes the right file.

Crack hard math, logic, or research
Pick reasoning slower, pricier, more deliberate

Reasoning models are worth the extra latency when each mistake is expensive and the answer has multiple dependent steps.

CheckAsk for assumptions, not just the final answer.

Summarize a huge document or codebase
Pick long context fits the source without chopping

The model cannot reason over pages it never saw. Context window matters most when the source is long and cross-referenced.

CheckAsk it to cite where in the source each claim came from.

Run high-volume production tasks
Pick cheap and fast latency and unit cost first

For tagging, extraction, routing, and simple Q&A, the right answer is usually the cheapest model that clears your eval.

CheckRun a 50-item sample before paying flagship prices.

Self-host or keep data local
Pick open weights control beats convenience

Open-weight models trade managed polish for privacy, offline use, customization, and predictable infrastructure costs.

CheckRead the license before building a business on it.

Don't see your ask? Scroll to the full directory and compare the traits underneath the recommendation.

Decode the AI lingo

Why these picks are these picks. Eight terms that quietly decide every choice.

working memory

Context window

Everything you paste โ€” prompt, attachments, prior chat โ€” has to fit. Bigger window means more code or more pages of a contract before you start chunking.

Use bigger windows for whole repos, long contracts, or many attached PDFs. Use smaller ones when the prompt is short and latency matters.

thinks before it answers

Reasoning model

A second class of model that runs an internal "scratchpad" before responding. Slower and pricier per query, but lands harder problems โ€” math, multi-step logic, code refactors.

Best for hard math, planning, and codebase refactors. Overkill for rewriting a paragraph or classifying a support ticket.

handles more than text

Multimodal

Reads images, audio, video, or PDFs as input โ€” not just text. Native multimodal models trained on all four feel sharper than ones bolted onto text-only bases.

Choose this when the task includes screenshots, charts, PDFs, audio, or video. Text-only work does not need the extra surface area.

the unit you pay by

Tokens

Sub-words the model reads and writes โ€” roughly 0.75 word per token. Pricing is "$X per million input tokens / $Y per million output." Output is usually 3โ€“5ร— more expensive than input.

Long inputs and long outputs both cost money. Summaries are cheap; repeated analysis loops over giant files are where bills grow.

fast vs. flagship

Tier (Fast ยท Mid ยท Flagship)

Each lab ships a tiered family. Fast tiers are 5โ€“20ร— cheaper and answer in <1s โ€” built for production pipelines. Flagship tiers cost more, think harder, and only make sense when quality matters more than latency.

Start with the cheapest tier that can do the job, then move up only when you see quality failures you can name.

can you run it yourself

Open vs. closed weights

Open-weight models publish the trained parameters โ€” you can self-host, fine-tune, or run offline. Closed-weight models live behind an API only. Open trades some quality for full control and zero per-call cost.

Use open weights for privacy, offline use, fine-tuning, or predictable unit economics. Use hosted APIs when managed quality matters more.

can it call other things

Tool use / agents

Whether the model can call APIs, run code, search the web, or read files mid-conversation โ€” instead of just generating text. The foundation under every "agent" headline.

Tool access is what turns a chat answer into a workflow: search, fetch, calculate, write, verify, repeat.

how recent its world is

Knowledge cutoff

The date training data ends. Anything after โ€” a release, a news event, a library version โ€” the model has to be told or look up. Models with web search smooth this over; pure-LLM answers don't.

For current events, prices, docs, or laws, use a model with retrieval or bring the sources yourself. Guessing from memory is the trap.

Match the term to the task. The Quick Picks above are shorthand โ€” these are the dials underneath them.

All 15 model families, side-by-side

A deliberately stable snapshot as of April 2026. The labels are family-level on purpose: exact release names move faster than this page should.

Showing 15 models, sorted by their strongest practical signal.

#1

Claude Opus

Anthropic

10 elite

The flagship shape. Use it when the answer has to be deeply considered.

Reasoning
10
Coding
9
Writing
9
Speed
3
Cost Efficiency
2
Long Context
10
  • hardest reasoning tasks
  • complex code architecture
  • nuanced long-form writing
#2

DeepSeek reasoning

DeepSeek

10 elite

Open-source reasoning powerhouse. Shockingly cheap.

Reasoning
10
Coding
9
Writing
7
Speed
5
Cost Efficiency
10
Long Context
7
  • hard reasoning on a budget
  • open-source reasoning tasks
  • complex code problems
#3

Midjourney

Midjourney

10 elite

The artist. Don't ask it to write code.

Image Generation
10
Image Understanding
โ€”
Speed
6
Cost Efficiency
5
Open Source
โ€”
Ecosystem
4
  • stunning visuals
  • creative direction
  • artistic imagery
#4

OpenAI reasoning

OpenAI

10 elite

The heavyweight thinker. Bring your hard problems.

Reasoning
10
Coding
9
Writing
6
Speed
3
Cost Efficiency
3
Long Context
7
  • hard math and logic
  • complex code problems
  • deep analysis
#5

Claude Sonnet

Anthropic

9 elite

The sweet spot. Smart, fast, and surprisingly affordable.

Reasoning
9
Coding
9
Writing
9
Speed
7
Cost Efficiency
6
Long Context
9
  • thoughtful writing
  • long documents
  • careful reasoning
#6

Flux Pro

Black Forest Labs

9 elite

The new hotness in image gen. Punches up.

Image Generation
9
Image Understanding
โ€”
Speed
7
Cost Efficiency
6
Open Source
7
Ecosystem
4
  • high-quality image generation
  • photorealistic outputs
  • open-weight option
#7

Gemini Pro

Google

9 elite

The context monster. Feed it your whole codebase.

Reasoning
9
Coding
8
Writing
7
Speed
6
Cost Efficiency
5
Long Context
10
  • massive documents
  • video understanding
  • multimodal analysis
#8

OpenAI mini reasoning

OpenAI

9 elite

The smart cheap one. Reasoning without the flagship price tag.

Reasoning
9
Coding
9
Writing
5
Speed
7
Cost Efficiency
8
Long Context
7
  • reasoning on a budget
  • hard math and code
  • high-volume reasoning tasks
#9

DALL-E

OpenAI

8 strong

Easy image gen inside the OpenAI world.

Image Generation
8
Image Understanding
โ€”
Speed
7
Cost Efficiency
6
Open Source
โ€”
Ecosystem
9
  • quick image generation
  • text in images
  • ChatGPT integration
#10

GPT flagship

OpenAI

8 strong

The all-around OpenAI pick. Strong ecosystem, strong tool use.

Reasoning
8
Coding
8
Writing
7
Speed
8
Cost Efficiency
6
Long Context
7
  • all-around tasks
  • tool integrations
  • multimodal work
#11

Llama

Meta

8 strong

The open-weight workhorse. Good when control matters.

Reasoning
8
Coding
8
Writing
7
Speed
7
Cost Efficiency
9
Long Context
9
  • self-hosting
  • privacy-sensitive work
  • multimodal open-source tasks
#12

Gemini Flash

Google

7 strong

Fast, cheap, and surprisingly capable.

Reasoning
7
Coding
7
Writing
6
Speed
9
Cost Efficiency
9
Long Context
9
  • fast multimodal tasks
  • large context on a budget
  • quick prototyping
#13

Mistral Large

Mistral

7 strong

The European contender. Solid all-around.

Reasoning
7
Coding
7
Writing
7
Speed
7
Cost Efficiency
7
Long Context
6
  • European data residency
  • multilingual work
  • balanced open-source option
#14

Claude Haiku

Anthropic

6 solid

Claude's fast little sibling. Great bang for buck.

Reasoning
6
Coding
7
Writing
6
Speed
9
Cost Efficiency
9
Long Context
7
  • high-volume processing
  • quick classification
  • budget API usage
#15

GPT mini

OpenAI

5 solid

Cheap, fast, and good enough for most things.

Reasoning
5
Coding
6
Writing
5
Speed
9
Cost Efficiency
9
Long Context
6
  • high-volume tasks
  • quick classification
  • budget-friendly apps

Models retire and new ones land. Spot something missing? let me know.