Cost Per Token for AI Founders: 2026 Guide to Margins

The cheapest model on the pricing page can still wreck your margins. If you build AI products and don’t understand cost per token, you’re pricing blind.

This number is the cost of using a model to read text and generate text back. It sounds technical, but it reaches straight into your product margins, pricing, runway, and whether growth makes you richer or poorer.

So let’s strip the jargon out of it and turn it into something you can use.

The headline model price is only the sticker. Your product design sets the real bill.

How tokens turn into real spending

Tokens sound like a model issue. They’re really a finance issue. Every prompt, reply, and extra bit of context adds spend.

What a token is, in simple terms

A token is a chunk of text. It’s not always a full word.

As a rough rule, one token is about four characters of English text, or around three-quarters of a word. “Hello world” is only two words, but it can count as three tokens once the model splits it up, and the exact count varies by tokeniser. That is why short-looking text can cost more than you expect.
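To make the four-characters rule concrete, here is a minimal sketch in Python. The function name estimate_tokens and the chars_per_token parameter are made up for illustration. It is a rule of thumb, not a real tokeniser, so treat the result as a planning estimate only.

```python
import math

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (rule of thumb, not a tokeniser)."""
    if not text:
        return 0
    # Round up: a partial chunk still counts as a token.
    return math.ceil(len(text) / chars_per_token)

# "Hello world" is 11 characters, so 11 / 4 rounds up to 3.
print(estimate_tokens("Hello world"))
```

For billing-grade numbers, use the token counts your provider reports on each response rather than an estimate like this.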

The useful way to think about it is simple. The model doesn’t bill by sentence or by user. It bills by counted pieces of text.

Why input and output tokens cost different amounts

Input tokens are everything you send in. That includes the user’s message, your system prompt, earlier chat history, retrieved documents, and any uploaded context.

Output tokens are what the model writes back. On many 2026 pricing pages, output is still several times more expensive than input, because generating fresh text takes more compute. So if your product produces long answers, a cheap-looking model can become expensive fast.

A founder will often trim the prompt and miss the bigger issue. The reply is the part that keeps running the meter.

Why one small feature can use a lot of tokens

A simple chatbot feature rarely makes one call with one short prompt. In practice, it may send instructions, past messages, search results, tool output, and then ask for a final answer. That is a lot of text before the user even sees a reply.

The same problem shows up in summaries, document review, meeting notes, and AI agents. A feature that looks neat and light in the product can burn through thousands of tokens in one session.

How to calculate your true cost per token

Provider pricing pages give you the rate. Founders need the real maths behind one request, one user, and one month.

The formula every founder should use

Start with this:

Cost per request = (input tokens × input price per million tokens ÷ 1,000,000) + (output tokens × output price per million tokens ÷ 1,000,000)

If your request uses 2,000 input tokens and 800 output tokens, and your provider charges $3 per million input tokens and $15 per million output tokens, that request costs $0.018. Most providers quote in US dollars, so model it there first, then roll it into your sterling cash plan.
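The same formula can be sketched as a small Python function. The function and parameter names are illustrative. Prices are assumed to be quoted in US dollars per million tokens, as in the example above.

```python
def cost_per_request(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """Model spend for one request, with prices quoted per million tokens."""
    input_cost = input_tokens * input_price_per_m / 1_000_000
    output_cost = output_tokens * output_price_per_m / 1_000_000
    return input_cost + output_cost

# The worked example: 2,000 input tokens at $3/M and 800 output tokens at $15/M.
cost = cost_per_request(2_000, 800, 3.0, 15.0)
print(f"${cost:.3f}")  # $0.018
```

In this example the output is under a third of the tokens but two-thirds of the cost ($0.012 of $0.018), which is why reply length deserves as much attention as prompt length.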

When you build this into an investor-grade SaaS financial model, token spend stops being a guess. You can see what it does to margin, burn, and runway.

Why average usage matters more than best-case pricing

Demo numbers are usually flattering. The prompt is short, the reply is tidy, and nobody uploads a messy document or asks six follow-up questions.

Real customers behave differently. They paste large blocks of text, ask for rewrites, keep long chat threads alive, and come back with edge cases. A cheap model in the demo can still be expensive in the wild because your average usage is nowhere near the best case.

So don’t price off your cleanest workflow. Price off what customers actually do.

A quick example of monthly AI spend

Say you have 800 paying users. Each user triggers 80 AI actions a month, and each action costs $0.015 in model spend.

That puts your monthly bill at $960 before you add retries, failed calls, logging, retrieval, or other overhead around the workflow. Push usage a little higher, or let the model write longer answers, and the number moves quickly.
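Here is the same arithmetic as a short Python sketch. The overhead_rate parameter is an assumption added for illustration. It stands in for retries, failed calls and similar extras, and the 20% figure is a placeholder, not a benchmark.

```python
def monthly_model_spend(paying_users: int, actions_per_user: int,
                        cost_per_action: float, overhead_rate: float = 0.0) -> float:
    """Monthly model spend, optionally padded by a fractional overhead rate."""
    base = paying_users * actions_per_user * cost_per_action
    return base * (1 + overhead_rate)

base = monthly_model_spend(800, 80, 0.015)
padded = monthly_model_spend(800, 80, 0.015, overhead_rate=0.20)  # hypothetical 20% overhead
print(f"Base: ${base:,.2f}")             # Base: $960.00
print(f"With overhead: ${padded:,.2f}")  # With overhead: $1,152.00
```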

That is the point. A modest user base can create a real bill long before you feel “at scale”.

What drives token costs up in AI products

Model choice matters, but product decisions usually shape the bill more than founders expect.

Long prompts, chat history, and hidden context

Every extra line in your system prompt has a cost. Every previous message you resend has a cost. Every retrieved document chunk has a cost.

Teams often stuff prompts with rules, examples, and fallback instructions because it feels safer. Sometimes it helps. Sometimes it turns into a permanent tax on every single request.

If you resend 1,500 tokens of instructions each time, you’re paying for those instructions each time.
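To put a number on that, here is a rough Python sketch. It assumes the $3 per million input price and the 64,000 monthly requests (800 users × 80 actions) from the earlier examples. Both figures are illustrative, and the function name is made up.

```python
def fixed_prompt_cost(prompt_tokens: int, input_price_per_m: float,
                      requests_per_month: int) -> tuple[float, float]:
    """Cost of a prompt block that is resent on every request."""
    per_request = prompt_tokens * input_price_per_m / 1_000_000
    return per_request, per_request * requests_per_month

per_request, per_month = fixed_prompt_cost(1_500, 3.0, 64_000)
print(f"${per_request:.4f} per request")  # $0.0045 per request
print(f"${per_month:,.2f} per month")     # $288.00 per month
```

Under those assumptions the instructions alone account for $288 of a $960 monthly bill, before the user has typed a word.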

Larger models and richer outputs

The best model is not always the right model. Plenty of workflows do not need top-tier reasoning.

Classification, tagging, short rewrites, basic extraction, and first-pass summaries can often run well on cheaper models. Save premium models for harder tasks, high-value actions, or cases where quality changes the outcome.

Output length matters too. If you ask for detailed, polished, multi-paragraph responses everywhere, spend rises with it. When a short answer does the job, ask for a short answer.

Agents, tools, and repeated calls

AI agents can look efficient from the outside and be expensive underneath. One user action may trigger planning, search, tool use, review, and a final drafted response.

Each step can mean another model call. If the same data gets analysed more than once, the cost stacks again. Think of it like a taxi meter. The product may feel smooth, but the counter keeps running in the background.
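A minimal Python sketch of how the meter adds up across steps. The step names follow the list above, but every token count here is invented for illustration, as are the $3 and $15 per million prices.

```python
from dataclasses import dataclass

@dataclass
class Step:
    name: str
    input_tokens: int
    output_tokens: int

def action_cost(steps: list[Step], input_price_per_m: float,
                output_price_per_m: float) -> float:
    """Total model spend for one user action made of several model calls."""
    return sum(
        s.input_tokens * input_price_per_m / 1_000_000
        + s.output_tokens * output_price_per_m / 1_000_000
        for s in steps
    )

# Hypothetical token counts; context tends to grow as earlier results are resent.
steps = [
    Step("planning", 1_200, 200),
    Step("search", 1_800, 150),
    Step("tool use", 2_500, 300),
    Step("review", 3_000, 200),
    Step("final draft", 3_500, 800),
]
total = action_cost(steps, 3.0, 15.0)
print(f"${total:.5f} per user action")  # $0.06075 per user action
```

With these made-up numbers, one agent action costs more than three times a single 2,000-in, 800-out call at the same prices.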

How AI founders can lower cost without hurting product quality

This is where good product judgment meets good finance. Cost control is part of building a strong SaaS business, not a tidy-up job for later.

Use the right model for the job

Match model strength to task difficulty. Use lighter models for simple actions and keep premium models for work that needs better reasoning or judgement.

A common pattern is routing. Start with the cheaper model, then escalate only when confidence is low or the task is high stakes. Most users won’t care which model wrote the answer. They care that it works, it’s fast, and it’s worth the price.
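One way to sketch that routing pattern in Python. Everything here is hypothetical: the model functions are stubs that return an answer plus a confidence score between 0 and 1, and the 0.8 threshold is a placeholder. How you obtain a confidence signal in practice depends on your stack.

```python
from typing import Callable, Tuple

# A model function takes a prompt and returns (answer, confidence between 0 and 1).
ModelFn = Callable[[str], Tuple[str, float]]

def route(prompt: str, cheap_model: ModelFn, premium_model: ModelFn,
          min_confidence: float = 0.8, high_stakes: bool = False) -> Tuple[str, str]:
    """Try the cheap model first; escalate on low confidence or high stakes."""
    if high_stakes:
        answer, _ = premium_model(prompt)
        return answer, "premium"
    answer, confidence = cheap_model(prompt)
    if confidence >= min_confidence:
        return answer, "cheap"
    # The cheap call is still paid for; escalation adds a second call on top.
    answer, _ = premium_model(prompt)
    return answer, "premium"

# Stub models for illustration only.
def cheap_stub(prompt: str) -> Tuple[str, float]:
    return "cheap answer", 0.9 if len(prompt) < 40 else 0.4

def premium_stub(prompt: str) -> Tuple[str, float]:
    return "premium answer", 0.95

print(route("Tag this support ticket", cheap_stub, premium_stub))
print(route("Review this 40-page contract for unusual liability terms", cheap_stub, premium_stub))
```

Note the trade-off in the comment: an escalated request costs both calls, so routing only saves money when most requests stop at the cheaper model.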

Cut prompt length and reuse context wisely

Shorter prompts often improve both cost and clarity. Remove duplicate instructions, tighten your system messages, and summarise old chat history instead of replaying everything.

Reuse repeated context where you can. If the same document, company profile, or instruction set appears again and again, store the useful parts and avoid paying to rebuild them every time. Good prompt design is often about subtraction.
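As a simple illustration, the Python sketch below keeps only the most recent messages that fit inside a token budget. It is a sliding window, not a summariser: in a real product the dropped messages would be replaced by a stored summary, which is not shown here. The function names and the four-characters-per-token estimate are assumptions.

```python
import math
from typing import Callable, List

def rough_tokens(text: str) -> int:
    """Rule-of-thumb estimate: about four characters per token."""
    return math.ceil(len(text) / 4)

def trim_history(messages: List[str], max_tokens: int,
                 count_tokens: Callable[[str], int] = rough_tokens) -> List[str]:
    """Keep the newest messages whose combined estimate fits the budget."""
    kept: List[str] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        tokens = count_tokens(message)
        if used + tokens > max_tokens:
            break
        kept.append(message)
        used += tokens
    kept.reverse()  # restore chronological order
    return kept

history = ["a" * 400, "b" * 200, "c" * 100]  # roughly 100, 50 and 25 tokens
recent = trim_history(history, max_tokens=80)
print(len(recent))  # 2
```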

Track usage by customer, feature, and workflow

You need to know where spend comes from. That means tracking AI cost by customer, by feature, and by workflow, not as one lump in a monthly cloud bill.

When you do that, pricing gets better. Product decisions get better. You also spot loss-making behaviour early, whether that is one heavy customer on a flat plan or one flashy feature that costs more to run than customers pay for it.
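A minimal sketch of that kind of tracking in Python, assuming you log one record per model call. The field names (customer, feature, cost_usd) and the sample records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

def spend_by(events: List[dict], key: str) -> Dict[str, float]:
    """Total logged model spend grouped by one field, such as customer or feature."""
    totals: Dict[str, float] = defaultdict(float)
    for event in events:
        totals[event[key]] += event["cost_usd"]
    return dict(totals)

# Hypothetical usage log: one record per model call.
events = [
    {"customer": "acme", "feature": "summaries", "cost_usd": 0.018},
    {"customer": "acme", "feature": "chat", "cost_usd": 0.042},
    {"customer": "globex", "feature": "summaries", "cost_usd": 0.015},
]

by_customer = spend_by(events, "customer")
by_feature = spend_by(events, "feature")
print({name: round(cost, 3) for name, cost in by_customer.items()})
print({name: round(cost, 3) for name, cost in by_feature.items()})
```

The same log can be grouped by workflow or plan tier, as long as each record carries that field when the call is made.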

If one customer uses 10 times more AI than everyone else, that isn’t a small detail. It is a margin problem.

How token cost should shape pricing and fundraising decisions

This is the commercial side of the story. If token economics are fuzzy, pricing is shaky and runway planning is worse.

Build pricing around gross margin, not guesswork

Founders need to know what it costs to serve a customer before setting the price. That sounds obvious. It gets missed all the time in AI products.

Flat pricing is risky when usage varies a lot. One customer may ask for five quick outputs a week. Another may run long documents through your product every day. If both pay the same, your margin can swing wildly.
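To see the swing, here is a rough Python sketch. The $49 flat plan, the usage levels and the $0.015 cost per action are all invented for illustration, and the margin counts model spend only, not hosting, support or other cost of sales.

```python
def gross_margin(revenue: float, model_cost: float) -> float:
    """Gross margin as a fraction of revenue, counting model spend only."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    return (revenue - model_cost) / revenue

flat_price = 49.0        # hypothetical monthly plan price in USD
cost_per_action = 0.015  # hypothetical model spend per AI action

light_user = gross_margin(flat_price, 20 * cost_per_action)     # about 20 actions a month
heavy_user = gross_margin(flat_price, 3_000 * cost_per_action)  # about 3,000 actions a month
print(f"Light user margin: {light_user:.1%}")  # Light user margin: 99.4%
print(f"Heavy user margin: {heavy_user:.1%}")  # Heavy user margin: 8.2%
```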

That is why many AI SaaS companies end up with allowances, overages, credit systems, or premium usage tiers. The exact model can vary. The maths underneath it cannot.

Show investors you understand unit economics

Investors will ask what happens to margins as usage grows. If the answer is vague, confidence drops.

Clear token tracking shows that you understand your numbers, your pricing, and your path to scale. It also helps you explain whether usage growth improves revenue quality or only inflates infrastructure cost. That is where cohort-based revenue modelling for SaaS earns its keep, because it ties customer behaviour back to retention, payback, and margin.

Strong AI businesses do not only show growth. They show controlled growth.

Protect runway before costs scale faster than revenue

Token spend often rises before revenue catches up. Free trials, pilots, onboarding support, and product experiments all push usage up early.

If you wait until the monthly bill looks painful, you are late. Founders need to watch token cost alongside cash burn, forecast the impact of product changes, and stress-test what happens when usage doubles. AI bills do not wait politely for your next round.
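A simple stress test can be sketched in Python like this. It assumes flat-plan revenue that does not move with usage, and the $23,200 revenue figure (800 users at a hypothetical $29 a month) and $960 base spend are placeholders. Only model spend is counted against revenue.

```python
from typing import List, Tuple

def stress_test(monthly_revenue: float, base_ai_cost: float,
                multipliers: List[float]) -> List[Tuple[float, float, float]]:
    """For each usage multiplier, return (multiplier, AI cost, margin after AI cost)."""
    rows = []
    for multiplier in multipliers:
        ai_cost = base_ai_cost * multiplier
        margin = (monthly_revenue - ai_cost) / monthly_revenue
        rows.append((multiplier, ai_cost, margin))
    return rows

# Hypothetical inputs: revenue stays flat while usage grows.
rows = stress_test(monthly_revenue=23_200.0, base_ai_cost=960.0,
                   multipliers=[1.0, 2.0, 5.0])
for multiplier, ai_cost, margin in rows:
    print(f"{multiplier:.0f}x usage: AI cost ${ai_cost:,.0f}, margin after AI {margin:.1%}")
```

Run it with your own revenue, spend and multipliers, then feed the resulting cost lines into your cash forecast to see the effect on burn.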

Runway is not only about hiring pace. In an AI SaaS business, it is also about what every customer action costs you.

The number you can’t afford to ignore

Cost per token sounds like plumbing. It isn’t. It is one of the simplest ways to protect margin, price with confidence, and stop growth turning into a cash problem.

Founders who understand token usage build tighter products and cleaner unit economics. They raise with better answers and scale with fewer nasty surprises.

If you can explain your AI cost per request, per user, and per month, you are already building the proper way. That is the sort of finance discipline tech companies need if they want to grow, raise, and exit well.

Kishen Patel, BFP ACA · Founder, Consult EFC · ICAEW Chartered Accountant · Fractional CFO

Over 12 years across Big Four audit, investment banking and corporate advisory. Kishen works with UK SaaS and AI companies on financial strategy, fundraising and board-level CFO support. ICAEW regulated. Big Four trained. Based in London.
