Omniracle

Why are open-weight AI models like DeepSeek V4 so much cheaper than proprietary models from Anthropic and OpenAI?

Open-weight models like DeepSeek V4 slash costs in several concrete ways that proprietary services like ChatGPT or Claude just can’t match. It’s not magic — it’s a mix of licensing, architecture, market competition, and provider business models. Here’s the breakdown.

1. No licensing fees and you can run it yourself

Proprietary models live behind a paywall: you pay either per token or a subscription, and the company sets the price. Open-weight models let you skip the license tax entirely. You’re free to fine-tune, adapt, or run the model on your own hardware under permissive licenses (like MIT or Apache 2.0) [1][2][4]. One source puts it bluntly: “there’s no licensing software, but you still have to run the model” [3]. That means you trade a recurring bill for your own compute – and if you already have hardware, the marginal cost drops to almost nothing.

2. The architecture does more with less compute

DeepSeek V4 uses a Mixture of Experts (MoE) design where only a fraction of the 1.6‑trillion‑parameter brain ever lights up per request. In practice, you pay for the 13‑billion or 49‑billion active parameters, not the whole monster [22]. It also packs other clever tricks:

  • Compressed long‑context memory that dodges the full price of classic attention [18][19].
  • Manifold‑constrained hyper‑connections that reshape how information flows between blocks [21].

All of that adds up to “unprecedented efficiency” [20], meaning each token costs far fewer GPU cycles compared to dense models like GPT‑4o. That efficiency directly feeds into the sticker price.

3. Open‑weight competition drives prices into the floor

When a model is open‑weight, any provider can host it. That creates a race to the bottom on pricing. DeepSeek V4 Pro Max, for example, is tracked across six different API providers, and five of them cluster around the same blended price of $2.17 per million tokens [11]. No single company sits on a monopoly; you can always switch to a cheaper host.

The results are eye‑popping:

  • DeepSeek V4 Flash can cost as little as $0.14 per million input tokens (cache miss) and $0.28 per million output tokens [7][8].
  • With caching, input tokens crash to $0.0028 per million [9].
  • Real‑world users report spending only $0.30 per day for development use [5]. Overall, DeepSeek V4 pricing is described as 20–50× cheaper than OpenAI [12].

4. Proprietary AI companies have fat margins (or are trying to get there)

The other side of the coin is that OpenAI and Anthropic aren’t aiming for rock‑bottom prices. They’re for‑profit businesses with high margin goals:

  • OpenAI’s API gross margin hovered around 75% in mid‑2024 and was still at ~55% after a price cut [13]. Its compute margin jumped to roughly 70% by late 2025 [14].
  • Anthropic had negative 94% gross margins in 2024 and expects to reach 77% only by 2028 [15].
  • Even GitHub Copilot was reportedly losing $20+ per user per month at a $10 subscription, just from the compute bill [16].

Those numbers show why proprietary APIs aren’t priced at bare‑bones cost — they need to cover massive infrastructure and eventually turn a profit. Open‑weight models don’t carry that baggage when you self‑host, and even when you use a third‑party service, the provider’s margin is typically slimmer because they compete fiercely.

5. Inference technology keeps getting cheaper — and open models reap the rewards

The entire industry is slashing inference costs, and open‑weight models benefit the most because you can pick optimized hardware and runtime stacks. Some data points:

  • Switching from a closed‑source model to an open‑source one on NVIDIA Blackwell cut one healthcare company’s costs by 90% (10× reduction) [24].
  • Running an open‑source model dropped from $2 per hour to $0.65 per hour for one user [25].
  • Overall LLM inference cost for the same quality declines by 10× every year [26].
  • The cheapest model at a given performance tier is often an open‑weight variant served by a provider like Together.ai (e.g., $0.06 per million tokens for Llama 3.2 3B) [27].
  • Industry guides even list “use open‑source models” as a top‑5 method to reduce inference spend [28].

This rapid improvement pulls the floor out from under proprietary pricing, widening the gap even further.


In a nutshell: Open‑weight models are cheaper because they remove licensing fees, use super‑efficient architectures, spark provider warfare, and are carried by the same rapid inference‑cost decline that proprietary APIs try to monetize. Meanwhile, the big AI labs are chasing margins that keep their prices far above raw compute cost.