AI Models

Explore thousands of AI models across various platforms

Major Models Price Comparison

Top Companies

Models	OpenRouter	OpenAI Official	SiliconFlow	ModelScope	Anthropic Official	DeepSeek Official	NVIDIA NIM	Google AI Studio
GPT-4oFeatured	$2.50/1M Tokens	$2.50/1M Tokens	—	—	—	—	—	—
o1-previewFeatured	$15.00/1M Tokens	$15.00/1M Tokens	—	—	—	—	—	—
Claude 3.5 SonnetFeatured	$3.00/1M Tokens	—	—	—	$3.00/1M Tokens	—	—	—
Claude 3 OpusFeatured	$15.00/1M Tokens	—	—	—	$15.00/1M Tokens	—	—	—
DeepSeek V3Featured	$0.15/1M Tokens	—	FREE10 RMB Gift	—	—	$0.14/1M Tokens	—	—
DeepSeek R1Featured	—	—	FREE10 RMB Gift	—	—	$0.55/1M Tokens	FREEFully Free	—
Llama 3.1 405BFeatured	$0.80/1M Tokens	—	—	—	—	—	FREEFully Free	—
Gemini 1.5 ProFeatured	$3.50/1M Tokens	—	—	—	—	—	—	FREEFree Tier
Gemini 1.5 FlashFeatured	$0.07/1M Tokens	—	—	—	—	—	—	FREEFree Tier
Qwen 2.5 72BFeatured	—	—	$0.60/1M Tokens	FREEDaily 2000	—	—	—	—
Kimi V1Featured	—	—	—	FREEDaily 2000	—	—	—	—
GLM-4Featured	—	—	—	FREEDaily 2000	—	—	—	—
GPT-4 TurboFeatured	$10.00/1M Tokens	$10.00/1M Tokens	—	—	—	—	—	—
Ernie 4.0Featured	—	—	—	—	—	—	—	—

Browse All Models

2M ContextMultimodalVideo Analysis

Input Price

$3.50/1M

nvidia

📝Text

Latency

1092ms

Context

1000K

frequency_penaltyinclude_reasoninglogit_bias

Input Price

$0.50/1M

Qwen: Qwen3.7 Plus

qwen

📝Text

Qwen3.7-Plus is a cost-effective model in Alibaba's Qwen3.7 series. It supports text and image input with text output, building on the series' text capabilities with a comprehensive upgrade to its...

Latency

898ms

Context

1000K

include_reasoninglogprobsmax_tokens

Input Price

$0.32/1M

MiniMax: MiniMax M3

minimax

📝Text

MiniMax-M3 is a multimodal foundation model from MiniMax. It supports text, image, and video inputs with text output, a 1M-token context window, and is suited for long-horizon agentic work, coding,...

Latency

595ms

Context

1049K

frequency_penaltyinclude_reasoninglogit_bias

Input Price

$0.30/1M

StepFun: Step 3.7 Flash

stepfun

📝Text

Step 3.7 Flash is StepFun's latest high-efficiency multimodal Mixture-of-Experts model. It pairs a 196B-parameter language backbone with a vision encoder for native image and video understanding, activating roughly 11B parameters...

Latency

1014ms

Context

256K

frequency_penaltyinclude_reasoninglogit_bias

Input Price

$0.20/1M

Anthropic: Claude Opus 4.8 (Fast)

anthropic

📝Text

Fast-mode variant of [Opus 4.8](/anthropic/claude-opus-4.8) - identical capabilities with higher output speed at 2x pricing relative to regular Opus 4.8. Learn more in Anthropic's docs: https://platform.claude.com/docs/en/build-with-claude/fast-mode

Latency

1473ms

Context

1000K

include_reasoningmax_tokensreasoning

Input Price

$10.00/1M

Anthropic: Claude Opus 4.8

anthropic

📝Text

Claude Opus 4.8 is Anthropic's most capable generally available model in the Opus family. It supports text, image, and file inputs with text output, with reasoning support and a 1M-token...

Latency

1418ms

Context

1000K

include_reasoningmax_tokensreasoning

Input Price

$5.00/1M

Qwen: Qwen3.7 Max

qwen

📝Text

Qwen3.7-Max is the flagship model in Alibaba's Qwen3.7 series. It supports text input and output and is designed for agent-centric workloads, with particular strengths in coding, office and productivity tasks,...

Latency

1162ms

Context

1000K

include_reasoninglogprobsmax_tokens

Input Price

$1.25/1M

xAI: Grok Build 0.1

x-ai

📝Text

Grok Build 0.1 is xAI’s fast coding model trained specifically for agentic software engineering workflows. It supports text and image inputs with text output, and is optimized for interactive coding...

Latency

960ms

Context

256K

frequency_penaltyinclude_reasoninglogprobs

Input Price

$1.00/1M

Google: Gemini 3.5 Flash

google

📝Text

Gemini 3.5 Flash is Google's high-efficiency multimodal model, bringing near-Pro level coding and reasoning at Flash-tier cost and speed. It is highly optimized for coding proficiency and parallel agentic execution...

Latency

1057ms

Context

1049K

include_reasoningmax_tokensreasoning

Input Price

$1.50/1M

Anthropic: Claude Opus 4.7 (Fast)

anthropic

📝Text

Fast-mode variant of [Opus 4.7](/anthropic/claude-opus-4.7) - identical capabilities with higher output speed at premium 6x pricing. Learn more in Anthropic's docs: https://platform.claude.com/docs/en/build-with-claude/fast-mode

Latency

1437ms

Context

1000K

include_reasoningmax_tokensreasoning

Input Price

$30.00/1M

Perceptron: Perceptron Mk1

perceptron

📝Text

Perceptron Mk1 (Mark One) is Perceptron's highest-quality vision-language model for video and embodied reasoning.** It accepts image and video inputs paired with natural language queries, and produces detailed visual understanding...

Latency

1462ms

Context

33K

frequency_penaltyinclude_reasoningmax_tokens

Input Price

$0.15/1M

inclusionAI: Ring-2.6-1T

inclusionai

📝Text

Ring-2.6-1T is a 1T-parameter-scale thinking model with 63B active parameters, built for real-world agent workflows that require both strong capability and operational efficiency. It is optimized for coding agents, tool...

Latency

1255ms

Context

262K

frequency_penaltyinclude_reasoningmax_tokens

Input Price

$0.07/1M

Google: Gemini 3.1 Flash Lite

google

📝Text

Gemini 3.1 Flash Lite is Google’s GA high-efficiency multimodal model optimized for low-latency, high-volume workloads. It supports text, image, video, audio, and PDF inputs, and is designed for lightweight agentic...

Latency

1013ms

Context

1049K

include_reasoningmax_tokensreasoning

Input Price

$0.25/1M

OpenAI: GPT Chat Latest

openai

📝Text

GPT Chat Latest points to OpenAI's stable API alias `chat-latest` that always resolves to the latest Instant chat model used in ChatGPT. As OpenAI rolls out new Instant model updates...

Latency

1345ms

Context

400K

frequency_penaltylogit_biaslogprobs

Input Price

$5.00/1M

xAI: Grok 4.3

x-ai

📝Text

Grok 4.3 is a reasoning model from xAI. It accepts text and image inputs with text output, and is suited for agentic workflows, instruction-following tasks, and applications requiring high factual...

Latency

567ms

Context

1000K

frequency_penaltyinclude_reasoninglogprobs

Input Price

$1.25/1M

IBM: Granite 4.1 8B

ibm-granite

📝Text

Granite 4.1 8B is a dense, decoder-only 8-billion-parameter language model from IBM, part of the Granite 4.1 family. It supports a 131K-token context window and is designed for enterprise tasks...

Latency

1054ms

Context

131K

frequency_penaltylogprobsmax_tokens

Input Price

$0.05/1M

Mistral: Mistral Medium 3.5

mistralai

📝Text

Mistral Medium 3.5 is a dense 128B instruction-following model from Mistral AI. It supports text and image inputs with text output, and is designed for agentic workflows, coding, and complex...

Latency

536ms

Context

262K

frequency_penaltyinclude_reasoningmax_tokens

Input Price

$1.50/1M

Owl Alpha

openrouter

📝Text

Owl Alpha is a high-performance foundation model designed for agentic workloads. Natively supports tool use, and long-context tasks, with strong performance in code generation, automated workflows, and complex instruction execution....

Latency

1093ms

Context

1049K

frequency_penaltylogit_biasmax_tokens

Input Price

$0.00/1M

NVIDIA: Nemotron 3 Nano Omni (free)

nvidia

📝Text

NVIDIA Nemotron™ 3 Nano Omni is a 30B-A3B open multimodal model designed to function as a perception and context sub-agent in enterprise agent systems. It accepts text, image, video, and...

Latency

1261ms

Context

256K

include_reasoningmax_tokensreasoning

Input Price

$0.00/1M

Poolside: Laguna XS.2 (free)

poolside

📝Text

Laguna XS.2 is the second-generation model in the XS size class from [Poolside](https://poolside.ai/), their efficient coding agent series. It combines tool calling and reasoning capabilities with a compact footprint, offering...

Latency

1457ms

Context

262K

include_reasoningmax_tokensreasoning

Input Price

$0.00/1M

Poolside: Laguna XS.2

poolside

📝Text

Latency

892ms

Context

262K

include_reasoningmax_tokensreasoning

Input Price

$0.10/1M

Poolside: Laguna M.1 (free)

poolside

📝Text

Laguna M.1 is the flagship coding agent model from [Poolside](https://poolside.ai/), optimized for complex software engineering tasks. Designed for agentic coding workflows, it supports tool calling and reasoning, with a 256K...

Latency

634ms

Context

262K

include_reasoningmax_tokensreasoning

Input Price

$0.00/1M

openrouter

📝Text

The Pareto Router maintains a tiered shortlist of strong coding models, ranked by [Artificial Analysis](https://artificialanalysis.ai/) coding percentiles. Set min_coding_score between 0 and 1 on the [pareto-router plugin](https://openrouter.ai/docs/guides/routing/routers/pareto-router#the-min_coding_score-parameter) to control how...

Latency

598ms

Context

2000K

Input Price

$0.00/1M

MoonshotAI: Kimi K2.6

moonshotai

📝Text

Kimi K2.6 is Moonshot AI's next-generation multimodal model, designed for long-horizon coding, coding-driven UI/UX generation, and multi-agent orchestration. It handles complex end-to-end coding tasks across Python, Rust, and Go, and...

Latency

843ms

Context

262K

frequency_penaltyinclude_reasoninglogit_bias

Input Price

$0.66/1M

Anthropic: Claude Opus 4.7

anthropic

📝Text

Opus 4.7 is the next generation of Anthropic's Opus family, built for long-running, asynchronous agents. Building on the coding and agentic strengths of Opus 4.6, it delivers stronger performance on...

Latency

814ms

Context

1000K

include_reasoningmax_tokensreasoning

Input Price

$5.00/1M

Anthropic: Claude Opus 4.6 (Fast)

anthropic

📝Text

Fast-mode variant of [Opus 4.6](/anthropic/claude-opus-4.6) - identical capabilities with higher output speed at premium 6x pricing. Learn more in Anthropic's docs: https://platform.claude.com/docs/en/build-with-claude/fast-mode

Latency

785ms

Context

1000K

include_reasoningmax_tokensreasoning

Input Price

$30.00/1M

Z.ai: GLM 5.1

z-ai

📝Text

GLM-5.1 delivers a major leap in coding capability, with particularly significant gains in handling long-horizon tasks. Unlike previous models built around minute-level interactions, GLM-5.1 can work independently and continuously on...

Latency

1071ms

Context

203K

frequency_penaltyinclude_reasoninglogit_bias

Input Price

$0.98/1M

Google: Gemma 4 26B A4B (free)

google

📝Text

Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Despite 25.2B total parameters, only 3.8B activate per token during inference — delivering near-31B quality at...

Latency

1150ms

Context

262K

frequency_penaltyinclude_reasoningmax_tokens

Input Price

$0.00/1M

Google: Gemma 4 26B A4B

google

📝Text

Latency

1210ms

Context

262K

frequency_penaltyinclude_reasoninglogit_bias

Input Price

$0.06/1M

Google: Gemma 4 31B (free)

google

📝Text

Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. Features a 256K token context window, configurable thinking/reasoning mode, native function...

Latency

1478ms

Context

262K

include_reasoningmax_tokensmin_p

Input Price

$0.00/1M

nvidia

📝Text

Latency

1420ms

Context

1000K

frequency_penaltyinclude_reasoninglogit_bias

Input Price

$0.08/1M

ByteDance Seed: Seed-2.0-Lite

bytedance-seed

📝Text

Seed-2.0-Lite is a versatile, cost‑efficient enterprise workhorse that delivers strong multimodal and agent capabilities while offering noticeably lower latency, making it a practical default choice for most production workloads across...

Latency

1029ms

Context

262K

frequency_penaltyinclude_reasoningmax_tokens

Input Price

$0.25/1M

Qwen: Qwen3.5-9B

qwen

📝Text

Qwen3.5-9B is a multimodal foundation model from the Qwen3.5 family, designed to deliver strong reasoning, coding, and visual understanding in an efficient 9B-parameter architecture. It uses a unified vision-language design...

Latency

1316ms

Context

262K

frequency_penaltyinclude_reasoninglogit_bias

Input Price

$0.10/1M

OpenAI: GPT-5.4 Pro

openai

📝Text

GPT-5.4 Pro is OpenAI's most advanced model, building on GPT-5.4's unified architecture with enhanced reasoning capabilities for complex, high-stakes tasks. It features a 1M+ token context window (922K input, 128K...

Latency

1314ms

Context

1050K

include_reasoningmax_completion_tokensmax_tokens

Input Price

$30.00/1M

OpenAI: GPT-5.4

openai

📝Text

GPT-5.4 is OpenAI’s latest frontier model, unifying the Codex and GPT lines into a single system. It features a 1M+ token context window (922K input, 128K output) with support for...

Latency

1130ms

Context

1050K

include_reasoningmax_completion_tokensmax_tokens

Input Price

$2.50/1M

Inception: Mercury 2

inception

📝Text

Mercury 2 is an extremely fast reasoning LLM, and the first reasoning diffusion LLM (dLLM). Instead of generating tokens sequentially, Mercury 2 produces and refines multiple tokens in parallel, achieving...

Latency

779ms

Context

128K

include_reasoningmax_tokensreasoning

Input Price

$0.25/1M

OpenAI: GPT-5.3 Chat

openai

📝Text

GPT-5.3 Chat is an update to ChatGPT's most-used model that makes everyday conversations smoother, more useful, and more directly helpful. It delivers more accurate answers with better contextualization and significantly...

Latency

811ms

Context

128K

max_completion_tokensmax_tokensresponse_format

Input Price

$1.75/1M

Google: Gemini 3.1 Flash Lite Preview

google

📝Text

Gemini 3.1 Flash Lite Preview is Google's high-efficiency model optimized for high-volume use cases. It outperforms Gemini 2.5 Flash Lite on overall quality and approaches Gemini 2.5 Flash performance across...

Latency

753ms

Context

1049K

include_reasoningmax_tokensreasoning

Input Price

$0.25/1M

ByteDance Seed: Seed-2.0-Mini

bytedance-seed

📝Text

Seed-2.0-mini targets latency-sensitive, high-concurrency, and cost-sensitive scenarios, emphasizing fast response and flexible inference deployment. It delivers performance comparable to ByteDance-Seed-1.6, supports 256k context, four reasoning effort modes (minimal/low/medium/high), multimodal understanding,...

Latency

1225ms

Context

262K

frequency_penaltyinclude_reasoningmax_tokens

Input Price

$0.10/1M

Qwen: Qwen3.5-35B-A3B

qwen

📝Text

The Qwen3.5 Series 35B-A3B is a native vision-language model designed with a hybrid architecture that integrates linear attention mechanisms and a sparse mixture-of-experts model, achieving higher inference efficiency. Its overall...

Latency

564ms

Context

262K

frequency_penaltyinclude_reasoninglogit_bias

Input Price

$0.14/1M

Qwen: Qwen3.5-27B

qwen

📝Text

The Qwen3.5 27B native vision-language Dense model incorporates a linear attention mechanism, delivering fast response times while balancing inference speed and performance. Its overall capabilities are comparable to those of...

Latency

938ms

Context

262K

frequency_penaltyinclude_reasoninglogit_bias

Input Price

$0.20/1M

Qwen: Qwen3.5-122B-A10B

qwen

📝Text

The Qwen3.5 122B-A10B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. In terms of...

Latency

693ms

Context

262K

frequency_penaltyinclude_reasoninglogit_bias

Input Price

$0.26/1M

Qwen: Qwen3.5-Flash

qwen

📝Text

The Qwen3.5 native vision-language Flash models are built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. Compared to the...

Latency

1451ms

Context

1000K

include_reasoningmax_tokenspresence_penalty

Input Price

$0.07/1M

LiquidAI: LFM2-24B-A2B

liquid

📝Text

LFM2-24B-A2B is the largest model in the LFM2 family of hybrid architectures designed for efficient on-device deployment. Built as a 24B parameter Mixture-of-Experts model with only 2B active parameters per...

Latency

978ms

Context

128K

frequency_penaltylogit_biasmax_tokens

Input Price

$0.03/1M

Google: Gemini 3.1 Pro Preview Custom Tools

google

📝Text

Gemini 3.1 Pro Preview Custom Tools is a variant of Gemini 3.1 Pro that improves tool selection behavior by preventing overuse of a general bash tool when more efficient third-party...

Latency

844ms

Context

1049K

include_reasoningmax_tokensreasoning

Input Price

$2.00/1M

OpenAI: GPT-5.3-Codex

openai

📝Text

GPT-5.3-Codex is OpenAI’s most advanced agentic coding model, combining the frontier software engineering performance of GPT-5.2-Codex with the broader reasoning and professional knowledge capabilities of GPT-5.2. It achieves state-of-the-art results...

Latency

1472ms

Context

400K

include_reasoningmax_completion_tokensmax_tokens

Input Price

$1.75/1M

AionLabs: Aion-2.0

aion-labs

📝Text

Aion-2.0 is a variant of DeepSeek V3.2 optimized for immersive roleplaying and storytelling. It is particularly strong at introducing tension, crises, and conflict into stories, making narratives feel more engaging....

Latency

945ms

Context

131K

include_reasoningmax_tokensreasoning

Input Price

$0.80/1M

Google: Gemini 3.1 Pro Preview

google

📝Text

Gemini 3.1 Pro Preview is Google’s frontier reasoning model, delivering enhanced software engineering performance, improved agentic reliability, and more efficient token usage across complex workflows. Building on the multimodal foundation...

Latency

750ms

Context

1049K

include_reasoningmax_tokensreasoning

Input Price

$2.00/1M

Anthropic: Claude Sonnet 4.6

anthropic

📝Text

Sonnet 4.6 is Anthropic's most capable Sonnet-class model yet, with frontier performance across coding, agents, and professional work. It excels at iterative development, complex codebase navigation, end-to-end project management with...

Latency

1352ms

Context

1000K

include_reasoningmax_completion_tokensmax_tokens

Input Price

$3.00/1M

Qwen: Qwen3.5 Plus 2026-02-15

qwen

📝Text

The Qwen3.5 native vision-language series Plus models are built on a hybrid architecture that integrates linear attention mechanisms with sparse mixture-of-experts models, achieving higher inference efficiency. In a variety of...

Latency

802ms

Context

1000K

include_reasoninglogprobsmax_tokens

Input Price

$0.26/1M

Qwen: Qwen3.5 397B A17B

qwen

📝Text

The Qwen3.5 series 397B-A17B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. It delivers...

Latency

1034ms

Context

256K

frequency_penaltyinclude_reasoninglogit_bias

Input Price

$0.39/1M

MiniMax: MiniMax M2.5

minimax

📝Text

MiniMax-M2.5 is a SOTA large language model designed for real-world productivity. Trained in a diverse range of complex real-world digital working environments, M2.5 builds upon the coding expertise of M2.1...

Latency

1499ms

Context

205K

frequency_penaltyinclude_reasoninglogit_bias

Input Price

$0.12/1M

Z.ai: GLM 5

z-ai

📝Text

GLM-5 is Z.ai’s flagship open-source foundation model engineered for complex systems design and long-horizon agent workflows. Built for expert developers, it delivers production-grade performance on large-scale programming tasks, rivaling leading...

Latency

1035ms

Context

203K

frequency_penaltyinclude_reasoninglogit_bias

Input Price

$0.60/1M

Qwen: Qwen3 Max Thinking

qwen

📝Text

Qwen3-Max-Thinking is the flagship reasoning model in the Qwen3 series, designed for high-stakes cognitive tasks that require deep, multi-step reasoning. By significantly scaling model capacity and reinforcement learning compute, it...

Latency

1359ms

Context

262K

include_reasoninglogprobsmax_tokens

Input Price

$0.78/1M

Anthropic: Claude Opus 4.6

anthropic

📝Text

Opus 4.6 is Anthropic’s strongest model for coding and long-running professional tasks. It is built for agents that operate across entire workflows rather than single prompts, making it especially effective...

Latency

1104ms

Context

1000K

include_reasoningmax_completion_tokensmax_tokens

Input Price

$5.00/1M

Qwen: Qwen3 Coder Next

qwen

📝Text

Qwen3-Coder-Next is an open-weight causal language model optimized for coding agents and local development workflows. It uses a sparse MoE design with 80B total parameters and only 3B activated per...

Latency

1478ms

Context

262K

frequency_penaltylogit_biaslogprobs

Input Price

$0.11/1M

Free Models Router

openrouter

📝Text

The simplest way to get free inference. openrouter/free is a router that selects free models at random from the models available on OpenRouter. The router smartly filters for models that...

Latency

1292ms

Context

200K

frequency_penaltyinclude_reasoningmax_tokens

Input Price

$0.00/1M

StepFun: Step 3.5 Flash

stepfun

📝Text

Step 3.5 Flash is StepFun's most capable open-source foundation model. Built on a sparse Mixture of Experts (MoE) architecture, it selectively activates only 11B of its 196B parameters per token....

Latency

1034ms

Context

262K

frequency_penaltyinclude_reasoninglogit_bias

Input Price

$0.09/1M

MoonshotAI: Kimi K2.5

moonshotai

📝Text

Kimi K2.5 is Moonshot AI's native multimodal model, delivering state-of-the-art visual coding capability and a self-directed agent swarm paradigm. Built on Kimi K2 with continued pretraining over approximately 15T mixed...

Latency

561ms

Context

262K

frequency_penaltyinclude_reasoninglogit_bias

Input Price

$0.38/1M

Upstage: Solar Pro 3

upstage

📝Text

Solar Pro 3 is Upstage's powerful Mixture-of-Experts (MoE) language model. With 102B total parameters and 12B active parameters per forward pass, it delivers exceptional performance while maintaining computational efficiency. Optimized...

Latency

895ms

Context

128K

include_reasoningmax_tokensreasoning

Input Price

$0.15/1M

MiniMax: MiniMax M2-her

minimax

📝Text

MiniMax M2-her is a dialogue-first large language model built for immersive roleplay, character-driven chat, and expressive multi-turn conversations. Designed to stay consistent in tone and personality, it supports rich message...

Latency

996ms

Context

66K

max_tokenstemperaturetop_p

Input Price

$0.30/1M

Writer: Palmyra X5

writer

📝Text

Palmyra X5 is Writer's most advanced model, purpose-built for building and scaling AI agents across the enterprise. It delivers industry-leading speed and efficiency on context windows up to 1 million...

Latency

822ms

Context

1040K

max_tokensstoptemperature

Input Price

$0.60/1M

LiquidAI: LFM2.5-1.2B-Thinking (free)

liquid

📝Text

LFM2.5-1.2B-Thinking is a lightweight reasoning-focused model optimized for agentic tasks, data extraction, and RAG—while still running comfortably on edge devices. It supports long context (up to 32K tokens) and is...

Latency

898ms

Context

33K

frequency_penaltyinclude_reasoningmax_tokens

Input Price

$0.00/1M

LiquidAI: LFM2.5-1.2B-Instruct (free)

liquid

📝Text

LFM2.5-1.2B-Instruct is a compact, high-performance instruction-tuned model built for fast on-device AI. It delivers strong chat quality in a 1.2B parameter footprint, with efficient edge inference and broad runtime support.

Latency

583ms

Context

33K

frequency_penaltymax_tokensmin_p

Input Price

$0.00/1M

OpenAI: GPT Audio

openai

📝Text

The gpt-audio model is OpenAI's first generally available audio model. The new snapshot features an upgraded decoder for more natural sounding voices and maintains better voice consistency. Audio is priced...

Latency

1209ms

Context

128K

frequency_penaltylogit_biaslogprobs

Input Price

$2.50/1M

OpenAI: GPT Audio Mini

openai

📝Text

A cost-efficient version of GPT Audio. The new snapshot features an upgraded decoder for more natural sounding voices and maintains better voice consistency. Input is priced at $0.60 per million...

Latency

686ms

Context

128K

frequency_penaltylogit_biaslogprobs

Input Price

$0.60/1M

Z.ai: GLM 4.7 Flash

z-ai

📝Text

As a 30B-class SOTA model, GLM-4.7-Flash offers a new option that balances performance and efficiency. It is further optimized for agentic coding use cases, strengthening coding capabilities, long-horizon task planning,...

Latency

654ms

Context

203K

frequency_penaltyinclude_reasoninglogit_bias

Input Price

$0.06/1M

OpenAI: GPT-5.2-Codex

openai

📝Text

GPT-5.2-Codex is an upgraded version of GPT-5.1-Codex optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks....

Latency

820ms

Context

400K

include_reasoningmax_completion_tokensmax_tokens

Input Price

$1.75/1M

ByteDance Seed: Seed 1.6 Flash

bytedance-seed

📝Text

Seed 1.6 Flash is an ultra-fast multimodal deep thinking model by ByteDance Seed, supporting both text and visual understanding. It features a 256k context window and can generate outputs of...

Latency

1241ms

Context

262K

frequency_penaltyinclude_reasoningmax_tokens

Input Price

$0.07/1M

ByteDance Seed: Seed 1.6

bytedance-seed

📝Text

Seed 1.6 is a general-purpose model released by the ByteDance Seed team. It incorporates multimodal capabilities and adaptive deep thinking with a 256K context window.

Latency

1034ms

Context

262K

frequency_penaltyinclude_reasoningmax_tokens

Input Price

$0.25/1M

MiniMax: MiniMax M2.1

minimax

📝Text

MiniMax-M2.1 is a lightweight, state-of-the-art large language model optimized for coding, agentic workflows, and modern application development. With only 10 billion activated parameters, it delivers a major jump in real-world...

Latency

762ms

Context

205K

frequency_penaltyinclude_reasoninglogit_bias

Input Price

$0.29/1M

Z.ai: GLM 4.7

z-ai

📝Text

GLM-4.7 is Z.ai’s latest flagship model, featuring upgrades in two key areas: enhanced programming capabilities and more stable multi-step reasoning/execution. It demonstrates significant improvements in executing complex agent tasks while...

Latency

1363ms

Context

203K

frequency_penaltyinclude_reasoninglogit_bias

Input Price

$0.40/1M

Google: Gemini 3 Flash Preview

google

📝Text

Gemini 3 Flash Preview is a high speed, high value thinking model designed for agentic workflows, multi turn chat, and coding assistance. It delivers near Pro level reasoning and tool...

Latency

1343ms

Context

1049K

include_reasoningmax_tokensreasoning

Input Price

$0.50/1M

NVIDIA: Nemotron 3 Nano 30B A3B (free)

nvidia

📝Text

NVIDIA Nemotron 3 Nano 30B A3B is a small language MoE model with highest compute efficiency and accuracy for developers to build specialized agentic AI systems. The model is fully...

Latency

801ms

Context

256K

include_reasoningmax_tokensreasoning

Input Price

$0.00/1M

NVIDIA: Nemotron 3 Nano 30B A3B

nvidia

📝Text

NVIDIA Nemotron 3 Nano 30B A3B is a small language MoE model with highest compute efficiency and accuracy for developers to build specialized agentic AI systems. The model is fully...

Latency

1001ms

Context

262K

frequency_penaltyinclude_reasoninglogit_bias

Input Price

$0.05/1M

OpenAI: GPT-5.2 Chat

openai

📝Text

GPT-5.2 Chat (AKA Instant) is the fast, lightweight member of the 5.2 family, optimized for low-latency chat while retaining strong general intelligence. It uses adaptive reasoning to selectively “think” on...

Latency

1361ms

Context

128K

max_completion_tokensmax_tokensresponse_format

Input Price

$1.75/1M

OpenAI: GPT-5.2 Pro

openai

📝Text

GPT-5.2 Pro is OpenAI’s most advanced model, offering major improvements in agentic coding and long context performance over GPT-5 Pro. It is optimized for complex tasks that require step-by-step reasoning,...

Latency

622ms

Context

400K

include_reasoningmax_tokensreasoning

Input Price

$21.00/1M

OpenAI: GPT-5.2

openai

📝Text

GPT-5.2 is the latest frontier-grade model in the GPT-5 series, offering stronger agentic and long context perfomance compared to GPT-5.1. It uses adaptive reasoning to allocate computation dynamically, responding quickly...

Latency

1357ms

Context

400K

include_reasoningmax_completion_tokensmax_tokens

Input Price

$1.75/1M

Mistral: Devstral 2 2512

mistralai

📝Text

Devstral 2 is a state-of-the-art open-source model by Mistral AI specializing in agentic coding. It is a 123B-parameter dense transformer model supporting a 256K context window. Devstral 2 supports exploring...

Latency

1419ms

Context

262K

frequency_penaltymax_tokenspresence_penalty

Input Price

$0.40/1M

Relace: Relace Search

relace

📝Text

The relace-search model uses 4-12 `view_file` and `grep` tools in parallel to explore a codebase and return relevant files to the user request. In contrast to RAG, relace-search performs agentic...

Latency

1315ms

Context

256K

max_tokensresponse_formatseed

Input Price

$1.00/1M

Z.ai: GLM 4.6V

z-ai

📝Text

GLM-4.6V is a large multimodal model designed for high-fidelity visual understanding and long-context reasoning across images, documents, and mixed media. It supports up to 128K tokens, processes complex page layouts...

Latency

995ms

Context

131K

frequency_penaltyinclude_reasoningmax_tokens

Input Price

$0.30/1M

qwen

📝Text

Latency

1441ms

Context

262K

frequency_penaltylogit_biaslogprobs

Input Price

$0.09/1M

Qwen: Qwen Plus 0728 (thinking)

qwen

📝Text

Qwen Plus 0728, based on the Qwen3 foundation model, is a 1 million context hybrid reasoning model with a balanced performance, speed, and cost combination.

Latency

649ms

Context

1000K

include_reasoningmax_tokenspresence_penalty

Input Price

$0.26/1M

Qwen: Qwen Plus 0728

qwen

📝Text

Qwen Plus 0728, based on the Qwen3 foundation model, is a 1 million context hybrid reasoning model with a balanced performance, speed, and cost combination.

Latency

535ms

Context

1000K

logprobsmax_tokenspresence_penalty

Input Price

$0.26/1M

NVIDIA: Nemotron Nano 9B V2 (free)

nvidia

📝Text

NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and...

Latency

1355ms

Context

128K

include_reasoningmax_tokensreasoning

Input Price

$0.00/1M

MoonshotAI: Kimi K2 0905

moonshotai

📝Text

Kimi K2 0905 is the September update of [Kimi K2 0711](moonshotai/kimi-k2). It is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32...

Latency

917ms

Context

262K

frequency_penaltylogit_biasmax_tokens

Input Price

$0.60/1M

Qwen: Qwen3 30B A3B Thinking 2507

qwen

📝Text

Qwen3-30B-A3B-Thinking-2507 is a 30B parameter Mixture-of-Experts reasoning model optimized for complex tasks requiring extended multi-step thinking. The model is designed specifically for “thinking mode,” where internal reasoning traces are separated...

Latency

1396ms

Context

131K

frequency_penaltyinclude_reasoninglogit_bias

Input Price

$0.08/1M

Nous: Hermes 4 70B

nousresearch

📝Text

Hermes 4 70B is a hybrid reasoning model from Nous Research, built on Meta-Llama-3.1-70B. It introduces the same hybrid mode as the larger 405B release, allowing the model to either...

Latency

609ms

Context

131K

frequency_penaltyinclude_reasoningmax_tokens

Input Price

$0.13/1M

Nous: Hermes 4 405B

nousresearch

📝Text

Hermes 4 is a large-scale reasoning model built on Meta-Llama-3.1-405B and released by Nous Research. It introduces a hybrid reasoning mode, where the model can choose to deliberate internally with...

Latency

851ms

Context

131K

frequency_penaltyinclude_reasoningmax_tokens

Input Price

$1.00/1M

DeepSeek: DeepSeek V3.1

deepseek

📝Text

DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes via prompt templates. It extends the DeepSeek-V3 base with a two-phase long-context...

Latency

647ms

Context

164K

frequency_penaltyinclude_reasoninglogit_bias

Input Price

$0.21/1M

Mistral: Mistral Medium 3.1

mistralai

📝Text

Mistral Medium 3.1 is an updated version of Mistral Medium 3, which is a high-performance enterprise-grade language model designed to deliver frontier-level capabilities at significantly reduced operational cost. It balances...

Latency

1187ms

Context

131K

frequency_penaltymax_tokenspresence_penalty

Input Price

$0.40/1M

Z.ai: GLM 4.5V

z-ai

📝Text

GLM-4.5V is a vision-language foundation model for multimodal agent applications. Built on a Mixture-of-Experts (MoE) architecture with 106B parameters and 12B activated parameters, it achieves state-of-the-art results in video understanding,...

Latency

506ms

Context

66K

frequency_penaltyinclude_reasoningmax_tokens

Input Price

$0.60/1M

AI21: Jamba Large 1.7

ai21

📝Text

Jamba Large 1.7 is the latest model in the Jamba open family, offering improvements in grounding, instruction-following, and overall efficiency. Built on a hybrid SSM-Transformer architecture with a 256K context...

Latency

1265ms

Context

256K

max_tokensresponse_formatstop

Input Price

$2.00/1M

OpenAI: GPT-5 Chat

openai

📝Text

GPT-5 Chat is designed for advanced, natural, multimodal, and context-aware conversations for enterprise applications.

Latency

1160ms

Context

128K

max_tokensresponse_formatseed

Input Price

$1.25/1M

OpenAI: GPT-5

openai

📝Text

GPT-5 is OpenAI’s most advanced model, offering major improvements in reasoning, code quality, and user experience. It is optimized for complex tasks that require step-by-step reasoning, instruction following, and accuracy...

Latency

1443ms

Context

400K

include_reasoningmax_completion_tokensmax_tokens

Input Price

$1.25/1M

OpenAI: GPT-5 Mini

openai

📝Text

GPT-5 Mini is a compact version of GPT-5, designed to handle lighter-weight reasoning tasks. It provides the same instruction-following and safety-tuning benefits as GPT-5, but with reduced latency and cost....

Latency

655ms

Context

400K

include_reasoningmax_completion_tokensmax_tokens

Input Price

$0.25/1M

OpenAI: GPT-5 Nano

openai

📝Text

GPT-5-Nano is the smallest and fastest variant in the GPT-5 system, optimized for developer tools, rapid interactions, and ultra-low latency environments. While limited in reasoning depth compared to its larger...

Latency

1094ms

Context

400K

include_reasoningmax_completion_tokensmax_tokens

Input Price

$0.05/1M

OpenAI: gpt-oss-120b (free)

openai

📝Text

gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized...

Latency

1239ms

Context

131K

include_reasoningmax_tokensmin_p

Input Price

$0.00/1M

OpenAI: gpt-oss-120b

openai

📝Text

Latency

1341ms

Context

131K

frequency_penaltyinclude_reasoninglogit_bias

Input Price

$0.03/1M

OpenAI: gpt-oss-20b (free)

openai

📝Text

gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for...

Latency

1293ms

Context

131K

frequency_penaltyinclude_reasoningmax_tokens

Input Price

$0.00/1M

qwen

📝Text

Latency

655ms

Context

1049K

frequency_penaltylogit_biaslogprobs

Input Price

$0.22/1M

ByteDance: UI-TARS 7B

bytedance

📝Text

UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Built by ByteDance, it builds upon the UI-TARS framework with reinforcement...

Latency

1147ms

Context

128K

frequency_penaltylogit_biaslogprobs

Input Price

$0.10/1M

google

📝Text

Latency

1059ms

Context

1049K

include_reasoningmax_tokensreasoning

Input Price

$1.25/1M

DeepSeek: R1 0528

deepseek

📝Text

May 28th update to the [original DeepSeek R1](/deepseek/deepseek-r1) Performance on par with [OpenAI o1](/openai/o1), but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active...

Latency

717ms

Context

164K

frequency_penaltyinclude_reasoninglogit_bias

Input Price

$0.50/1M

Anthropic: Claude Opus 4

anthropic

📝Text

Claude Opus 4 is benchmarked as the world’s best coding model, at time of release, bringing sustained performance on complex, long-running tasks and agent workflows. It sets new benchmarks in...

Latency

1462ms

Context

200K

include_reasoningmax_tokensreasoning

Input Price

$15.00/1M

Anthropic: Claude Sonnet 4

anthropic

📝Text

Claude Sonnet 4 significantly enhances the capabilities of its predecessor, Sonnet 3.7, excelling in both coding and reasoning tasks with improved precision and controllability. Achieving state-of-the-art performance on SWE-bench (72.7%),...

Latency

635ms

Context

1000K

include_reasoningmax_tokensreasoning

Input Price

$3.00/1M

Google: Gemma 3n 4B

google

📝Text

Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as phones, laptops, and tablets. It supports multimodal inputs—including text, visual data, and audio—enabling diverse tasks...

Latency

1023ms

Context

33K

frequency_penaltylogit_biasmax_tokens

Input Price

$0.06/1M

Mistral: Mistral Medium 3

mistralai

📝Text

Mistral Medium 3 is a high-performance enterprise-grade language model designed to deliver frontier-level capabilities at significantly reduced operational cost. It balances state-of-the-art reasoning and multimodal performance with 8× lower cost...

Latency

917ms

Context

131K

frequency_penaltymax_tokenspresence_penalty

Input Price

$0.40/1M

google

📝Text

Latency

1471ms

Context

131K

frequency_penaltylogit_biasmax_tokens

Input Price

$0.05/1M

Cohere: Command A

cohere

📝Text

Command A is an open-weights 111B parameter model with a 256k context window focused on delivering great performance across agentic, multilingual, and coding use cases. Compared to other leading proprietary...

Latency

1354ms

Context

256K

frequency_penaltymax_tokenspresence_penalty

Input Price

$2.50/1M

OpenAI: GPT-4o-mini Search Preview

openai

📝Text

GPT-4o mini Search Preview is a specialized model for web search in Chat Completions. It is trained to understand and execute web search queries.

Latency

1230ms

Context

128K

max_tokensresponse_formatstructured_outputs

Input Price

$0.15/1M

OpenAI: GPT-4o Search Preview

openai

📝Text

GPT-4o Search Previewis a specialized model for web search in Chat Completions. It is trained to understand and execute web search queries.

Latency

1219ms

Context

128K

max_tokensresponse_formatstructured_outputs

Input Price

$2.50/1M

Reka Flash 3

rekaai

📝Text

Reka Flash 3 is a general-purpose, instruction-tuned large language model with 21 billion parameters, developed by Reka. It excels at general chat, coding tasks, instruction-following, and function calling. Featuring a...

Latency

1107ms

Context

66K

frequency_penaltyinclude_reasoningmax_tokens

Input Price

$0.10/1M

inflection

📝Text

Inflection 3 Productivity is optimized for following instructions. It is better for tasks requiring JSON output or precise adherence to provided guidelines. It has access to recent news. For emotional...

Latency

1185ms

Context

max_tokensstoptemperature

Input Price

$2.50/1M

nousresearch

📝Text

Latency

1445ms

Context

131K

frequency_penaltylogit_biasmax_tokens

Input Price

$1.00/1M

openai

📝Text

Latency

809ms

Context

128K

frequency_penaltylogit_biaslogprobs

Input Price

$0.15/1M

Google: Gemma 2 27B

google

📝Text

Gemma 2 27B by Google is an open model built from the same research and technology used to create the [Gemini models](/models?q=gemini). Gemma models are well-suited for a variety of...

Latency

1142ms

Context

frequency_penaltymax_tokenspresence_penalty

Input Price

$0.65/1M

OpenAI: GPT-4o (2024-05-13)

openai

📝Text

GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as...

Latency

782ms

Context

128K

frequency_penaltylogit_biaslogprobs

Input Price

$5.00/1M

OpenAI: GPT-4o

openai

📝Text

Latency

601ms

Context

128K

frequency_penaltylogit_biaslogprobs

Input Price

$2.50/1M

openai

📝Text

GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural language or code, and is optimized for chat and traditional completion tasks. Training data up to Sep 2021.

Latency

830ms

Context

frequency_penaltylogit_biaslogprobs

Input Price

$1.00/1M

Auto Router

openrouter

📝Text

Your prompt will be processed by a meta-model and routed to one of dozens of models (see below), optimizing for the best possible output. To see which model was used,...

Latency

678ms

Context

2000K

frequency_penaltyinclude_reasoninglogit_bias

Input Price

$0.00/1M

OpenAI: GPT-3.5 Turbo Instruct

openai

📝Text

This model is a variant of GPT-3.5 Turbo tuned for instructional prompts and omitting chat-related optimizations. Training data: up to Sep 2021.

Latency

1443ms

Context

frequency_penaltylogit_biaslogprobs

Input Price

$1.50/1M

OpenAI: GPT-3.5 Turbo 16k

openai

📝Text

This model offers four times the context length of gpt-3.5-turbo, allowing it to support approximately 20 pages of text in a single request at a higher cost. Training data: up...

Latency

1014ms

Context

16K

frequency_penaltylogit_biaslogprobs

Input Price

$3.00/1M

Mancer: Weaver (alpha)

mancer

📝Text

An attempt to recreate Claude-style verbosity, but don't expect the same level of coherence or memory. Meant for use in roleplay/narrative situations.

Latency

833ms

Context

frequency_penaltylogit_biaslogprobs

Input Price

$0.75/1M

ReMM SLERP 13B

undi95

📝Text

A recreation trial of the original MythoMax-L2-B13 but with updated models. #merge

Latency

673ms

Context

frequency_penaltylogit_biaslogprobs

Input Price

$0.45/1M

MythoMax 13B

gryphe

📝Text

One of the highest performing and most popular fine-tunes of Llama 2 13B, with rich descriptions and roleplay. #merge

Latency

870ms

Context

frequency_penaltylogit_biaslogprobs

Input Price

$0.06/1M

OpenAI: GPT-3.5 Turbo

openai

📝Text

GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural language or code, and is optimized for chat and traditional completion tasks. Training data up to Sep 2021.

Latency

1034ms

Context

16K

frequency_penaltylogit_biaslogprobs

Input Price

$0.50/1M

OpenAI: GPT-4

openai

📝Text

OpenAI's flagship model, GPT-4 is a large-scale multimodal language model capable of solving difficult problems with greater accuracy than previous models due to its broader general knowledge and advanced reasoning...

Latency

1287ms

Context

frequency_penaltylogit_biaslogprobs

Input Price

$30.00/1M

Unknown Model

Unknown Provider

google

🎨Image

Latency

610ms

Context

66K

include_reasoningmax_tokensreasoning

Input Price

$2.00/1M

OpenAI: GPT-5 Image Mini

openai

🎨Image

GPT-5 Image Mini combines OpenAI's advanced language capabilities, powered by [GPT-5 Mini](https://openrouter.ai/openai/gpt-5-mini), with GPT Image 1 Mini for efficient image generation. This natively multimodal model features superior instruction following, text...

Latency

823ms

Context

400K

frequency_penaltyinclude_reasoninglogit_bias

Input Price

$2.50/1M

OpenAI: GPT-5 Image

openai

🎨Image

[GPT-5](https://openrouter.ai/openai/gpt-5) Image combines OpenAI's GPT-5 model with state-of-the-art image generation capabilities. It offers major improvements in reasoning, code quality, and user experience while incorporating GPT Image 1's superior instruction following,...

Latency

505ms

Context

400K

frequency_penaltyinclude_reasoninglogit_bias

Input Price

$10.00/1M

Google: Nano Banana (Gemini 2.5 Flash Image)

google

🎨Image

Gemini 2.5 Flash Image, a.k.a. "Nano Banana," is now generally available. It is a state of the art image generation model with contextual understanding. It is capable of image generation,...

Latency

1395ms

Context

33K

max_tokensresponse_formatseed

Input Price

$0.30/1M