Show HN: Price Per Token – LLM API Pricing Data

numlocked · 2025-07-25T17:45:53 1753465553

(I work at OpenRouter)

We have solved this problem by working with the providers to implement a prices and models API that we scrape, which is how we keep our marketplace up to date. It's been a journey; a year ago it was all happening through conversations in shared Slack channels!

The pricing landscape has become more complex as providers have introduced e.g. different prices for tokens depending on prompt length, caching, etc.

I do believe the right lens on this is actually the price per token by endpoint, not by model; there are fast/slow versions, thinking/non-thinking, etc. that can sometimes also vary by price.

The point of this comment is not to self promote, but we have put a huge amount of work into figuring all of this out, and have it all publicly available on OpenRouter (admittedly not in such a compact, pricing-focused format though!)

tekacs · 2025-07-25T18:33:38 1753468418

I tried making it compact and easy just now! Thanks so much for the effort!

https://github.com/tekacs/llm-pricing

sureglymop · 2025-07-26T10:02:13 1753524133

Hey! I am currently very close to having blown through the yearly token budget of my Kagi subscription, which I used mainly for Kagi Assistant.

Does OpenRouter offer a similar way to select any model in a user friendly way? Meaning, can I add some budget, hook it up to a desktop application and then just have a convenient selection of models that I can start using?

And also, which model would you use to not break the bank? Currently I am mostly using Gemini 2.5 Pro but perhaps there is already a cheaper and better option.

osigurdson · 2025-07-26T00:48:24 1753490904

By endpoint, do you mean price by token by API shape? Perhaps my phrasing is even more confusing but that is how I see it. I.e. there are API "shapes" for which as long as the shape of the API is the same, my application can use it interchangeably with others. Other dimensions are quality, speed, acceptable error rates, etc., which naturally influence pricing.

sophia01 · 2025-07-25T14:53:42 1753455222

But the data is... wrong? Google Gemini 2.5 Flash-Lite costs $0.10/mtok input [1] but is shown here as $0.40/mtok?

[1] https://ai.google.dev/gemini-api/docs/pricing#gemini-2.5-fla...

alexellman · 2025-07-25T15:07:53 1753456073

the data is not wrong you are reading my table wrong

edit: my bad I was wrong shouldnt have responded like this

unglaublich · 2025-07-25T18:01:01 1753466461

Ouch, bad response for someone with a business!

lippihom · 2025-07-27T23:42:44 1753659764

Unglaublich!

GaggiX · 2025-07-25T15:15:13 1753456513

The input is wrong tho

Your website reports 0.30$ for input and that wouldn't make any sense as it would be priced the same as the bigger Flash model.

alexellman · 2025-07-25T15:48:14 1753458494

ok yeah fixed that one, sorry...

Imustaskforhelp · 2025-07-25T16:23:13 1753460593

such level of condescending behaviour when you yourself are wrong is not allowed.

Put a really really bad taste in my mouth.

gompertz · 2025-07-25T17:47:31 1753465651

First poster could have approach better too. Like "Cool site! I think I may see an error on one item?". Instead of going right to a 'wrong' angle as if all the data should be discredited. I get highly triggered by this too.

arccy · 2025-07-25T18:05:11 1753466711

this overly positive attitude triggers a bunch of people too. wrong data should just be called out, especially if that's your main selling point.

mynameisvlad · 2025-07-26T00:16:44 1753489004

Nobody disagrees with that. But tact exists.

copperx · 2025-07-25T17:57:32 1753466252

But why is condescension tolerable when the person is right?

Imustaskforhelp · 2025-07-25T18:56:34 1753469794

It is not but its order of magnitudes worse if the person is wrong.

awongh · 2025-07-25T14:09:58 1753452598

This is great, but as others have mentioned the UX problem is more complicated than this:

- for other models there are providers that serve the same model with different prices

- each provider optimizes for different parameters: speed, cost, etc.

- the same model can still be different quantizations

- some providers offer batch pricing (e.g., Grok API does not)

And there are plenty of other parameters to filter over- thinking vs. non-thinking, multi-modal or not, etc. not to even mention benchmarks ranking.

https://artificialanalysis.ai gives a blended cost number which helps with sorting a bit, but a blended cost model for input/output costs are going to change depending on what you're doing.

I'm still holding my breath for a site that has a really nice comparison UI.

Someone please build it!

numlocked · 2025-07-25T17:47:30 1753465650

(I work at OpenRouter)

We have a simple model comparison tool that is not-at-all-obvious to find on the website, but hopefully can help somewhat. E.g.

https://openrouter.ai/compare/qwen/qwen3-coder/moonshotai/ki...

pzo · 2025-07-26T06:08:03 1753510083

but this is is not much user friendly unless you already know what models you want to compare. I would prefer I switch some toggles or make kind of a query what kind of models I'm looking for my use case and then sort by speed or price at the end e.g query:

I want model:

1) with audio input

2) minimum 50 tps speed

3) max price less than $1 input and less than $3 output

4) need to support english only or need to support polish etc.

sort by WER or some benchmark, dispaly charts etc.

edit:

extra bonus if I can tell how big typical my prompt will be like 20 seconds audio ant it will figure out how many tokens it will be because e.g. gemini 2.0 flash hide it very deep that its supposed to be 32 tokens per 1 second. Same hard to find how many tokens is for image input sometimes. Would be good where I can also attach some text or sample input for a query to do the calculation

Ideally write such prompt and it show me results or map this prompt to SQL that executes in your data so I can tweak SQL query on website. It doesn't have to be SQL can be some other simple but deterministic query language.

zeroCalories · 2025-07-25T17:16:46 1753463806

I think it would be very hard to make a fair comparison. Best you could do is probably make the trade-offs clear and let people make their own choices. I think it could be cool to make something like a token exchange where people put up their requirements, and then companies offer competing services that fit those requirements. Would be cool to let random people offer to their compute, but you would need to find a way to handle people lying about their capabilities or stealing data.

alexellman · 2025-07-25T14:24:27 1753453467

would a column for "provider" meaning the place you are actually making the call to solve this

svachalek · 2025-07-25T14:46:34 1753454794

Please not benchmark ranking. We've encouraged this nonsense far too long already.

pierre · 2025-07-25T13:44:24 1753451064

Main issue is that token are not equivalent across provider / models. With huge disparity inside provider beyond the tokenizer model:

- An image will take 10x token on gpt-4o-mini vs gpt-4.

- On gemini 2.5 pro output token are token except if you are using structure output, then all character are count as a token each for billing.

- ...

Having the price per token is nice, but what is really needed is to know how much a given query / answer will cost you, as not all token are equals.

alexellman · 2025-07-25T14:10:28 1753452628

yeah I am going to add an experiment that runs everyday and the cost of that will be a column on the table. It will be something like summarize this article in 200 words and every model gets the same prompt + article

bigiain · 2025-07-26T02:11:55 1753495915

For me, and I suspect a lot of other HN readers, a comparison/benchmark on a coding task would be more useful. Something small enough that you can affordably run it every day across a reasonable range of coding focused models, but non trivial enough to be representative of day to day AI assisted coding.

One other idea - for people spending $20 or $200/month for AI coding tools, a monitoring service that tracks and alerts on detected pricing changes could be something worth paying for. I'd definitely subscribe at $5/month for something like that, and I'd consider paying more, possibly even talking work into paying $20 or $30 per month.

BonoboIO · 2025-07-25T13:48:01 1753451281

On gemini 2.5 pro output token are token except if you are using structure output, then all character are count as a token each for billing.

Can you elaborate this? I don’t quite understand the difference.

rsanek · 2025-07-25T22:52:55 1753483975

I hadn't heard of this before either and can't find anything to support it on the pricing page.

https://ai.google.dev/gemini-api/docs/tokens

mythz · 2025-07-25T13:55:33 1753451733

There was a time when it was unbelievably frustrating to navigate the bunch of marketing pages required to find the cost of a newly announced model, now I just look at OpenRouter to find pricing.

CharlesW · 2025-07-25T14:06:47 1753452407

Site is down as I type this, but a shout-out to Simon Willison's LLM pricing calculator: https://www.llm-prices.com/

paradite · 2025-07-25T15:54:11 1753458851

It's actually more complex than just input and output tokens, there are more pricing rules by various providers:

- Off-peak pricing by DeepSeek

- Batch pricing by OpenAI and Anthropic

- Context window differentiated pricing by Google and Grok

- Thinking vs non-thinking token pricing by Qwen

- Input token tiered pricing by Qwen coder

I originally posted here: https://x.com/paradite_/status/1947932450212221427

criddell · 2025-07-25T14:07:01 1753452421

If you had a $2500ish budget for hardware, what types of models could you run locally? If $2500 isn't really enough, what would it take?

Are there any tutorials you can recommend for somebody interested in getting something running locally?

cogman10 · 2025-07-25T14:42:06 1753454526

This is where you'd start for local: https://ollama.com/

You can, almost, convert the number of nodes to gb of memory needed. For example, Deepseek-r1:7b needs about 7gb of memory to run locally.

Context window matters, the more context you need, the more memory you'll need.

If you are looking for AI devices at $2500, you'll probably want something like this [1]. A unified memory architecture (which will mean LPDDR5) will give you the most memory for the least amount of money to play with AI models.

[1] https://frame.work/products/desktop-diy-amd-aimax300/configu...

mark_l_watson · 2025-07-25T14:46:45 1753454805

I bought a Mac Mini M2Pro 32G 18 months ago for $1900. It is sufficient to run good up to and including 40B local models that are quantized.

When local models don’t cut it, I like Gemini 2.5 flash/pro and gemini-cli.

There are a lot of good options for commercial APIs and for running local models. I suggest choosing a good local and a good commercial API, and spend more time building things than frequently trying to evaluate all the options.

criddell · 2025-07-25T15:30:24 1753457424

Are there any particular sources you found helpful to get started?

It's been a while since I checked out Mini prices. Today, $2400 buys an M4 Pro with all the cores, 64GB RAM, and 1TB storage. That's pleasantly surprising...

mark_l_watson · 2025-07-25T15:58:35 1753459115

You can read my book on local models with Ollama free online: https://leanpub.com/ollama/read

criddell · 2025-07-25T16:33:56 1753461236

Awesome, thanks!

dstryr · 2025-07-25T14:54:42 1753455282

I would purchase [2] used 3090's as close to $600 as you can. The 3090 still remains the price-performance king.

yieldcrv · 2025-07-25T15:51:46 1753458706

the local side of things with an $7,000 - $10,000 machine (512gb fast memory, cpu and disk) can almost reach parity with regard to text input and output and 'reasoning', but lags far behind for multimodal anything: audio input, voice output, image input, image output, document input.

there are no out the box solutions to run a fleet of models simultaneously or containerized either

so the closed source solutions in the cloud are light years ahead and its been this way for 15 months now, no signs of stopping

omneity · 2025-07-25T15:58:07 1753459087

Would running vLLM in docker work for you, or do you have other requirements?

yieldcrv · 2025-07-25T16:15:30 1753460130

its not an image and audio model, so I believe it wouldn't work for me by itself

would probably need multiple models running in distinct containers, with another process coordinating them

redox99 · 2025-07-25T17:36:38 1753464998

Kimi and deepseek are the only models that don't feel like a large downgrade from the typical providers.

skeezyboy · 2025-07-25T15:57:00 1753459020

you can run ollama stuff with just a decent cpu for some of them

NitpickLawyer · 2025-07-25T14:08:49 1753452529

> The only place I am aware of is going to these provider's individual website pages to check the price per token.

Openrouter is a good alternative. Added bonus that you can also see where the open models come in, and can make an educated guess on the true cost / size of a model, and how likely it is it's currently subsidised.

danenania · 2025-07-25T16:04:35 1753459475

OpenRouter also has an endpoint for listing models (with pricing info) in its api: https://openrouter.ai/docs/overview/models

A limitation though, at least the last time I checked, is that you only get a single provider returned per model. That’s fine for the major commercial models that all have the same pricing on each provider, but makes it hard to rely on for open source models, which tend to have many providers offering them at different price points (sometimes very different price points—like 5x or 10x difference).

bananapub · 2025-07-25T13:27:36 1753450056

surprising that you didn't find any of the existing ones, including our own simonw's: https://www.llm-prices.com

jjani · 2025-07-25T15:41:40 1753458100

Another one, with many more models: https://www.helicone.ai/llm-cost

alexellman · 2025-07-25T13:35:35 1753450535

I searched around on Google and couldn't find anything

xnx · 2025-07-25T13:46:22 1753451182

https://www.google.com/search?q=llm+price+comparison

callbacked · 2025-07-25T13:29:38 1753450178

Awesome list, any chance of adding OpenRouter? Looking at their website seems like it would be a pain to scrape all of that due to the site's layout.

murshudoff · 2025-07-25T13:37:40 1753450660

https://openrouter.ai/docs/api-reference/list-available-mode... OpenRouter has an endpoint to get models and their pricing

alexellman · 2025-07-25T13:32:55 1753450375

Yeah I am going to be adding more sources like that and Groq but just wanted to start with the basics and see if it resonated

nisegami · 2025-07-25T13:38:48 1753450728

How consistent is the tokenization across different model families? It always served as a mental hangup for me when comparing LLM inference pricing.

alexellman · 2025-07-25T13:56:18 1753451778

They all tokenize a little differently so they are not exactly 1-1. However I plan on addressing this by having each model complete a test task and getting the actual price from each api + token count to make a real 1-1 comparison.

esafak · 2025-07-25T14:48:14 1753454894

And please timestamp the benchmarks, and rerun them periodically, so vendors can't quietly cost optimize the model when no-one's looking.

nisegami · 2025-07-25T13:57:40 1753451860

Ah, that's a great idea and would be a welcome addition to the site.

aaronharnly · 2025-07-25T13:50:05 1753451405

Can you gather historical information as well? I did a bit of spelunking of the Wayback Machine to gather a partial dataset for OpenAI, but mine is incomplete. Future planning is well-informed by understanding the trends — my rough calculation was that within a model family, prices drop by about 40-80% per 12 months.

alexellman · 2025-07-25T13:58:23 1753451903

Yeah I am planning on setting up automatic scraping and just having my own database. Maybe could add historical data beyond as well but just gonna save all my own data for now

can16358p · 2025-07-25T13:41:49 1753450909

Does anyone know why o1-pro is more expensive than o3-pro?

infecto · 2025-07-25T13:45:30 1753451130

That’s been the case for a lot of the models. As they release new models those models often include optimizations that bring runtime costs down. I don’t have the data to back it up but it’s felt like chip cycles. There is a new model that is better but more expensive. Then further iterations on that model being costs down.

uponasmile · 2025-07-25T13:34:13 1753450453

Well done. The UX is solid. Clean, intuitive, and the use of color makes everything instantly clear

alexellman · 2025-07-25T13:36:08 1753450568

thanks I appreciate that

tekacs · 2025-07-25T18:07:58 1753466878

I've run into this a ton of times and these websites all kinda suck. Someone mentioned the OpenRouter /models endpoint in a sibling comment here, so I quickly threw this together just now. Please feel free to PR!

https://github.com/tekacs/llm-pricing

  llm-pricing

  Model                                     | Input | Output | Cache Read | Cache Write
  ------------------------------------------+-------+--------+------------+------------
  anthropic/claude-opus-4                   | 15.00 | 75.00  | 1.50       | 18.75      
  anthropic/claude-sonnet-4                 | 3.00  | 15.00  | 0.30       | 3.75       
  google/gemini-2.5-pro                     | 1.25  | 10.00  | N/A        | N/A        
  x-ai/grok-4                               | 3.00  | 15.00  | 0.75       | N/A        
  openai/gpt-4o                             | 2.50  | 10.00  | N/A        | N/A        
  ...

---

  llm-pricing calc 10000 200 -c 9500 opus-4 4.1

  Cost calculation: 10000 input + 200 output (9500 cached, 5m TTL)
  
  Model                      | Input     | Output    | Cache Read | Cache Write | Total    
  ---------------------------+-----------+-----------+------------+-------------+----------
  anthropic/claude-opus-4    | $0.007500 | $0.015000 | $0.014250  | $0.178125   | $0.214875
  openai/gpt-4.1             | $0.001000 | $0.001600 | $0.004750  | $0.000000   | $0.007350
  openai/gpt-4.1-mini        | $0.000200 | $0.000320 | $0.000950  | $0.000000   | $0.001470
  openai/gpt-4.1-nano        | $0.000050 | $0.000080 | $0.000237  | $0.000000   | $0.000367
  thudm/glm-4.1v-9b-thinking | $0.000018 | $0.000028 | $0.000333  | $0.000000   | $0.000378

---

  llm-pricing opus-4 -v

  === ANTHROPIC ===

  Model: anthropic/claude-opus-4
    Name: Anthropic: Claude Opus 4
    Description: Claude Opus 4 is benchmarked as the world's best coding model, at time of release, 
    bringing sustained performance on complex, long-running tasks and agent workflows. It sets new 
    benchmarks in software engineering, achieving leading results on SWE-bench (72.5%) and 
    Terminal-bench (43.2%).
    Pricing:
      Input: $15.00 per 1M tokens
      Output: $75.00 per 1M tokens
      Cache Read: $1.50 per 1M tokens
      Cache Write: $18.75 per 1M tokens
      Per Request: $0
      Image: $0.024
    Context Length: 200000 tokens
    Modality: text+image->text
    Tokenizer: Claude
    Max Completion Tokens: 32000
    Moderated: true

tekacs · 2025-07-26T03:17:25 1753499845

Cache pricing tweaked & fixed since the above.

jjani · 2025-07-26T08:46:25 1753519585

This looks cool, would like to try it out but cargo version is old (doesn't match readme) and it's not clear what platforms the binaries support.

tekacs · 2025-07-26T22:28:44 1753568924

Pushed a new version to cargo, for now. Github Releases is giving me headaches. >.<

tekacs · 2025-07-26T20:43:28 1753562608

Will fix!

ashwindharne · 2025-07-25T14:29:01 1753453741

KV caching is priced and managed quite differently between providers as well. Seeing as it becomes a huge chunk of the actual tokens used, wondering if there's an easy way to compare across providers.

lucasoshiro · 2025-07-25T13:58:37 1753451917

"OpenAI, Anthropic, Google and more", where "and more" = 0. Where's Gemma, DeepSeek, etc?

The UI, however, is really clean and straight to the point. I like the interface, but miss the content

hopelite · 2025-07-25T14:07:37 1753452457

That was my first thought too.

Mistral, Llama, Kimi, Qwen…?

v5v3 · 2025-07-25T15:19:09 1753456749

Same.

Not a site if value unless it contains whole of the market.

kb_geek · 2025-07-25T13:55:18 1753451718

Nice! It will be good to also pull in leaderboard rankings and/or benchmarks for each of these models, so we understand capability perhaps from lmsys (not sure if there is a better source)

Fripplebubby · 2025-07-25T15:49:00 1753458540

Maybe I am blinded by my own use case, but I find the caching pricing and strategy (since different providers use a different implementation of caching as well as different pricing) to be a major factor rather than just the "raw" per token cost, and that is missing here, as well as on the Simon Willison site [1]. Do most people just not care / not use caching that much that it matters?

[1] https://llm-prices.com/

MattSayar · 2025-07-25T16:19:26 1753460366

I know at least a couple LLM providers will do some caching for you automatically now, which muddies the waters a bit. [0]

[0] https://developers.googleblog.com/en/gemini-2-5-models-now-s...

l5870uoo9y · 2025-07-25T14:01:17 1753452077

It appears that GPT-4.1 is missing, but nano and mini are there.

nikvdp · 2025-07-25T16:22:12 1753460532

there's also http://llmprices.dev. similar, but with a searchbox for quick filtering

james2doyle · 2025-07-25T16:02:05 1753459325

Nice. I think I prefer https://models.dev/ as it seems more complete

fronty · 2025-07-25T16:46:26 1753461986

We are working on a similar problem, https://apiraces.com, to personalize the cost calculation of your llm api use case,

We have uploaded mostly the openrouter api models, but trying to do it in a useful way to personalize calculation and comparison. If someone would like to test or have a demo, we will be glad for any feedback.

techbuilder4242 · 2025-07-25T17:14:41 1753463681

Cool!

jalopy · 2025-07-25T14:02:02 1753452122

Super valuable resource - thanks!

What tools / experiments out there exist to exercise these cheaper models to output more tokens / use more CoT tokens to achieve the quality of more expensive models?

eg, Gemini 2.5 flash / pro ratio is 1 1/3 for input, 1/8 for output... Surely there's a way to ask Flash to critique it's work more thoroughly to get to Pro level performance and still save money?

binarymax · 2025-07-25T15:09:45 1753456185

Does anyone have an API that maintains a list of all model versions for a provider? I hand-update OpenAI into a JSON file that I use for cost reporting in my apps (and in an npm package called llm-primitives).

Here's the current version:

    const pricesPerMillion = {
        "o1-2024-12-17": { input: 15.00, output: 60.00 },
        "o1-mini-2024-09-12": { input: 1.10, output: 4.40 },
        "o3-mini-2025-01-31": { input: 1.10, output: 4.40 },
        "gpt-4.5-preview-2025-02-27": { input: 75.00, output: 150.00 },
        "gpt-4o": { input: 5.00, output: 15.00 },
        "gpt-4o-2024-08-06": { input: 2.50, output: 10.00 },
        "gpt-4o-2024-05-13": { input: 5.00, output: 15.00 },
        "gpt-4o-mini": { input: 0.15, output: 0.60 },
        "gpt-4o-mini-2024-07-18": { input: 0.15, output: 0.60 },
        "gpt-4-0613": { input: 30.00, output: 60.00 },
        "gpt-4-turbo-2024-04-09": { input: 10.00, output: 30.00 },
        "gpt-3.5-turbo": { input: 0.003, output: 0.006 },
        "gpt-4.1": { input: 2.00, output: 8.00 },
        "gpt-4.1-2025-04-14": { input: 2.00, output: 8.00 },
        "gpt-4.1-mini": { input: 0.40, output: 1.60 },
        "gpt-4.1-mini-2025-04-14": { input: 0.40, output: 1.60 },
        "gpt-4.1-nano": { input: 0.10, output: 0.40 },
        "gpt-4.1-nano-2025-04-14": { input: 0.10, output: 0.40 },
        "gpt-4o-audio-preview-2024-12-17": { input: 2.50, output: 10.00 },
        "gpt-4o-realtime-preview-2024-12-17": { input: 5.00, output: 20.00 },
        "gpt-4o-mini-audio-preview-2024-12-17": { input: 0.15, output: 0.60 },
        "gpt-4o-mini-realtime-preview-2024-12-17": { input: 0.60, output: 2.40 },
        "o1-pro-2025-03-19": { input: 150.00, output: 600.00 },
        "o3-pro-2025-06-10": { input: 20.00, output: 80.00 },
        "o3-2025-04-16": { input: 2.00, output: 8.00 },
        "o4-mini-2025-04-16": { input: 1.10, output: 4.40 },
        "codex-mini-latest": { input: 1.50, output: 6.00 },
        "gpt-4o-mini-search-preview-2025-03-11": { input: 0.15, output: 0.60 },
        "gpt-4o-search-preview-2025-03-11": { input: 2.50, output: 10.00 },
        "computer-use-preview-2025-03-11": { input: 3.00, output: 12.00 }
    };

I would love to replace this with an API call.

urbandw311er · 2025-07-25T15:12:07 1753456327

Check out the source code to the vercel AI SDK. I’ve noticed that they broker calls out to various LLMs and then seem to return the cost as part of the response. So I’m thinking that this data could well be in there somewhere. Away from my desk right now so can’t check.

stogot · 2025-07-25T15:15:29 1753456529

I do this with other tools

1. Pull some large tech company’s open source’ tools’ JS file 2. Extract an internal JSON blob that contains otherwise difficult information 3. Parse it and use what I need from within it for my tool

zerocool0101 · 2025-07-25T15:25:36 1753457136

this is a snippet of the structure of the JSON file for this website if it helps: { "provider_id": 6, "provider": 7, "input_price_per_1m_tokens": 8, "output_price_per_1m_tokens": 9, "response_time_ms": 10, "actual_cost_usd": 11, "input_cost_per_word_usd": 12, "output_cost_per_word_usd": 13, "has_tiered_pricing": 14 }, "anthropic:claude-opus-4", "Anthropic Claude Opus 4", 15, 75, 1443.85168793832, 0.045, 0.00006, 0.0003, false, { "provider_id": 16, "provider": 17, "input_price_per_1m_tokens": 18, "output_price_per_1m_tokens": 8, "response_time_ms": 19, "actual_cost_usd": 20, "input_cost_per_word_usd": 21, "output_cost_per_word_usd": 12, "has_tiered_pricing": 14 }, "anthropic:claude-sonnet-4", "Anthropic Claude Sonnet 4", 3, 1568.72692800385, 0.009, 0.000012, { "provider_id": 23, "provider": 24, "input_price_per_1m_tokens": 25, "output_price_per_1m_tokens": 26, "response_time_ms": 27, "actual_cost_usd": 28, "input_cost_per_word_usd": 29, "output_cost_per_word_usd": 30, "has_tiered_pricing": 14 }, "anthropic:claude-haiku-3.5", "Anthropic Claude Haiku 3.5", 0.8, 4, 2141.1094386851, 0.0024, 0.0000032, 0.000016, { "provider_id": 32, "provider": 33, "input_price_per_1m_tokens": 8, "output_price_per_1m_tokens": 9, "response_time_ms": 34, "actual_cost_usd": 11, "input_cost_per_word_usd": 12, "output_cost_per_word_usd": 13, "has_tiered_pricing": 14 }, "anthropic:claude-opus-3", "Anthropic Claude Opus 3", 2538.34107347902, { "provider_id": 36, "provider": 37, "input_price_per_1m_tokens": 18, "output_price_per_1m_tokens": 8, "response_time_ms": 38, "actual_cost_usd": 20, "input_cost_per_word_usd": 21, "output_cost_per_word_usd": 12, "has_tiered_pricing": 14 }, "anthropic:claude-sonnet-3.7", "Anthropic Claude Sonnet 3.7", 2513.9738537193, { "provider_id": 40, "provider": 41, "input_price_per_1m_tokens": 42, "output_price_per_1m_tokens": 43, "response_time_ms": 44, "actual_cost_usd": 45, "input_cost_per_word_usd": 46, "output_cost_per_word_usd": 47, "has_tiered_pricing": 14 }, "anthropic:claude-haiku-3", "Anthropic Claude Haiku 3", 0.25, 1.25, 2874.71054013884, 0.00075, 0.000001, 0.000005, { "provider_id": 49, "provider": 50, "input_price_per_1m_tokens": 51, "output_price_per_1m_tokens": 52, "response_time_ms": 53, "actual_cost_usd": 54, "input_cost_per_word_usd": 55, "output_cost_per_word_usd": 56, "has_tiered_pricing": 14 }, "open-ai:open-ai-gpt-4.1-mini", "Open AI Open AI GPT-4.1-mini", 0.4, 1.6, 2903.77470624506, 0.001, 0.0000016, 0.0000064, { "provider_id": 58, "provider": 59, "input_price_per_1m_tokens": 60, "output_price_per_1m_tokens": 51, "response_time_ms": 61, "actual_cost_usd": 62, "input_cost_per_word_usd": 63, "output_cost_per_word_usd": 55, "has_tiered_pricing": 14 }, "open-ai:open-ai-gpt-4.1-nano", "Open AI Open AI GPT-4.1-nano", 0.1, 2650.13976342621, 0.00025, 4e-7, { "provider_id": 65, "provider": 66, "input_price_per_1m_tokens": 9, "output_price_per_1m_tokens": 67, "response_time_ms": 68, "actual_cost_usd": 69, "input_cost_per_word_usd": 13, "output_cost_per_word_usd": 70, "has_tiered_pricing": 14 },

antimatter15 · 2025-07-25T16:41:07 1753461667

The `ccusage` npm package pulls prices and other information from LiteLLM which has a lot of diferent models: https://raw.githubusercontent.com/BerriAI/litellm/main/model...

d4rkp4ttern · 2025-07-27T10:58:46 1753613926

There’s also this: https://models.dev/

StratusBen · 2025-07-25T13:56:29 1753451789

The http://ec2instances.info/ of the LLM era ;)

eugene3306 · 2025-07-25T14:39:46 1753454386

what's point of comparing token prices? especially for thinking models.

Just now I was testing the new Qwen3-thinking model. I've run the same prompt five times. The costs I got, sorted: 0.0143, 0.0288, 0.0321, 0.0389, 0.048 . And this is for single model.

Also, in my experience, sonnet-4 is cheaper than gemini-2.5-pro, despite token costs being higher.

eugene3306 · 2025-07-25T14:41:35 1753454495

I think the proper way of estimating the cost is the cost of entire run of a test. Like in aider's leaderboard.

manishsharan · 2025-07-25T16:14:18 1753460058

Is there a reason why you have not added DeepSeek and Qwen and Meta ?

You should also aggregate prices from Vertex and AWS Bedrock .

jacob019 · 2025-07-25T14:00:32 1753452032

Love it! It's going on my toolbar. I face the same problem, constantly trying to hunt down the latest pricing which is often changing. I think it's great that you want to add more models and features, but maybe keep the landing page simple with a default filter that just shows the current content.

alexellman · 2025-07-25T14:03:56 1753452236

Yeah want to keep it really simple. Appreciate it!

ieuanking · 2025-07-27T17:01:59 1753635719

my friend and I built something similar https://app.ubik.studio/all-models

iambateman · 2025-07-25T15:34:39 1753457679

This is cool! Two requests:

- Filter by model "power" or price class. I want compare the mini models, the medium models, etc.

- I'd like to see a "blended" cost which does 80% input + 20% output, so I can quickly compare the overall cost.

Great work on this!

alexellman · 2025-07-25T15:38:21 1753457901

thanks for the feedback!

antoineMoPa · 2025-07-25T13:38:34 1753450714

It would be fun to compare with inference providers (groq/vertex ai, etc.).

alexellman · 2025-07-25T13:53:44 1753451624

yes going to add that

ssalka · 2025-07-25T22:42:42 1753483362

I'd love to see this data joined with common benchmarks, in order to see which models get you the most "bang for your buck", i.e. benchmark score / token cost

cchance · 2025-07-27T17:06:20 1753635980

Seems odd to not have r1, qwen etc, groq, etc

OutOfHere · 2025-07-25T17:51:18 1753465878

It doesn't even list the price for GPT-4.1 (full model). This means it's not thorough and it doesn't try. What an immediate disappointment.

alienbaby · 2025-07-25T14:39:59 1753454399

I'd like to be able to compare prices to determine things like;

Should I use copilot pro in agent mode with sonnet 4, or is it cheaper to use claude with sonnet 4 directly?

generalizations · 2025-07-25T18:03:21 1753466601

This is awesome! I wonder how possible it is to incorporate benchmarks - maybe as a filter? Since not all tokens are as useful as others. Heh.

julianozen · 2025-07-25T17:34:09 1753464849

Keeping this up to date would be a good use for an agent. Companies might even pay for something like this

croes · 2025-07-25T18:04:31 1753466671

Or a page scraper

Fanofilm · 2025-07-25T15:29:32 1753457372

They should add grok. I use grok.

alexellman · 2025-07-25T15:37:15 1753457835

I just added grok

forrestthewoods · 2025-07-25T17:03:58 1753463038

Neat. Would love to see this plotted on a Pareto curve to show quality of said tokens.

dgrin91 · 2025-07-25T14:50:24 1753455024

Cool site. Would be interesting to add a time dimension to track prices over time

intellectronica · 2025-07-25T17:07:37 1753463257

Should read "Up to date prices for Closed American LLM APIs"

hagope · 2025-07-25T17:44:50 1753465490

this is great, I've always wanted something like this, do you think you can add other model metadata, like api name (`gemini-2.5-pro`), context length, modalities, etc

cahaya · 2025-07-25T13:40:12 1753450812

Nice! Missing a cost calculator with input and output fields.

alexellman · 2025-07-25T13:55:07 1753451707

Can add for the future

jimbo808 · 2025-07-25T16:18:49 1753460329

Are we really at a point already where we're treating tokens as a commodity? I certainly would not consider a token generated by Claude or Gemini to be of similar value to a token by Copilot, for example.

sshah_24 · 2025-07-25T14:17:35 1753453055

can we not just self host, expose things through VPN, and something that needs sharing with the world, then tunnel through some cloud server to keep the internal servers secure?

I am newly to this hobby, but would like to know more about what experienced person things and do.

amelius · 2025-07-25T15:38:42 1753457922

Ok, that's price per token, but tells me nothing about the IQ of the models.

BartjeD · 2025-07-25T14:25:23 1753453523

Mistral is missing

peterspath · 2025-07-25T13:53:04 1753451584

I am missing Grok

alexellman · 2025-07-25T15:37:23 1753457843

added

krashidov · 2025-07-25T18:23:08 1753467788

Where is Claude 3.5 Sonnet? Arguably the best model still lol

DrJid · 2025-07-25T13:51:13 1753451473

This is actually really awesome To see. Opened my eyes a bit. Ignore the haters.

2025-07-25T12:39:41 1753447181

[deleted]

dust42 · 2025-07-25T13:36:59 1753450619

tldr; low effort website that only contains 26 Google, OpenAI and Anthropic models and only input and output prices but no info about prompt cache and prompt cache prices. For a list of 473 models of 60+ providers with input, output, context, prompt caching and usage: https://openrouter.ai/models (no affiliation)

jvanderbot · 2025-07-25T13:38:30 1753450710

You know, I immediately found this useful and bookmarked it. I'm not sure why "Simple" and "focused" means "bad"

sam-cop-vimes · 2025-07-25T14:23:47 1753453427

Don't use it then. No need to shit on the effort someone has put into. Clearly it is a starting point which they seem keen to iterate on. If you can't say something kind, best not to say anything.

dust42 · 2025-07-25T16:36:27 1753461387

Look, it takes two prompts and two minutes to create this page:

https://claude.ai/share/20b36bd3-d817-4228-bc33-aa7c4910bc2b (the preview seems to only work in Chrome, for Firefox you have to download the html).

Plus maybe half an hour to verify and correct the prices and another few minutes for domain and hosting.

The author posted it himself, so why not spend an hour or two more and have a decent list with at least half a dozen providers and 100 models? In its current state it is just a mockup.

It is since 3 hours on the top of the front page, if the author had added one per minute to the .json, then it would now be 200 models.

ramon156 · 2025-07-25T13:44:13 1753451053

So would adding cache prices make it less "low effort blog spam"?

dust42 · 2025-07-25T13:50:12 1753451412

Well, openrouter lists currently 473 models of dozens of providers. Three providers and 26 models is 2 minutes of Google search. I stand by my word to call that low effort.

alexellman · 2025-07-25T13:54:48 1753451688

I wanted to gauge interest before adding 500 models

I plan on adding cache prices and making the list more comprehensive

dust42 · 2025-07-25T14:01:18 1753452078

As user murshudoff mentioned elsewhere in the discussion, openrouter has an endpoint to get the prices. Takes 1 minutes to get them.

alexellman · 2025-07-25T14:31:40 1753453900

then use OpenRouter, totally fine by me. Thought a dedicated website just for this would be useful.

alexc05 · 2025-07-25T14:45:35 1753454735

open router doesn't have that really slick graph though :)