Available Models
Browse and compare the models available through the LLM Gateway.
57 models
| Model | Input Modalities | Output Modalities | Input Price ($/1M tokens) | Output Price ($/1M tokens) | Cache Read ($/1M tokens) | Cache Write ($/1M tokens) | Context (tokens) | Max Output (tokens) |
|---|---|---|---|---|---|---|---|---|
| Claude Haiku 4.5: Anthropic's fastest and most cost-efficient model, delivering performance comparable to Sonnet 4 across coding, computer use, and agentic tasks. | Text, Image | Text | $1 | $5 | $0.1 | $1.25 | 200K | 64K |
| Claude Opus 4.5: Anthropic's frontier reasoning model, designed for complex software engineering, agentic workflows, and long-running computer tasks. It delivers strong multimodal performance, excels on coding and reasoning benchmarks, and offers improved resistance to prompt injection attacks. Developers can adjust inference effort to the task, trading off response speed, reasoning depth, and token usage. It also supports advanced tool use, extended context management, and multi-agent workflows, making it well suited to research, code debugging, planning, and browser or spreadsheet tasks. | Text, Image | Text | $5 | $25 | $0.5 | $6.25 | 200K | 64K |
| Claude Opus 4.6: Anthropic's new-generation Opus model, with a 1M-token context window and a maximum output of 128K tokens, double the previous generation's 64K cap. It introduces an adaptive thinking mode that adjusts reasoning depth to task complexity, along with a new top effort setting, "max". | Text, Image | Text | $5 | $25 | $0.5 | $6.25 | 1M | 128K |
| Claude Opus 4.7: A significant improvement over Opus 4.6 in advanced software engineering, with clear progress on the most challenging tasks. It can take on the most complex coding work, which previously required close supervision, carries out long-running tasks rigorously and consistently, follows instructions precisely, and devises ways to verify its own output before reporting back. | Text, Image | Text | $5 | $25 | $0.5 | $6.25 | 1M | 128K |
| Claude Sonnet 4.5: The world's leading coding model and Anthropic's most capable model for building complex agents and using computers, with significant improvements in reasoning and mathematics. | Text, Image | Text | $3 | $15 | $0.3 | $3.75 | 200K | 64K |
| Claude Sonnet 4.6: Anthropic's most powerful Sonnet model to date, achieving frontier-level performance in programming, AI agents, and professional work. It excels at iterative development, navigating complex codebases, end-to-end project management with memory, professional document writing, and reliable computer use for web-based question answering and workflow automation. | Text, Image | Text | $3 | $15 | $0.3 | $3.75 | 1M | 64K |
| DeepSeek V3.2: The official release of DeepSeek V3.2, with significantly enhanced agentic and reasoning capabilities. It reaches GPT-5-level performance on mainstream benchmarks and supports tool use in thinking mode. The experimental Speciale variant, introduced alongside it, achieved gold-medal-level results in multiple international competitions. The model is now fully available. | Text | Text | $0.286 | $0.429 | $0.029 | - | 128K | 8K |
| DeepSeek V3.2 Thinking: The thinking mode of DeepSeek-V3.2, and DeepSeek's first model to integrate reasoning into tool use. | Text | Text | $0.286 | $0.429 | $0.029 | - | 128K | 64K |
| DeepSeek V4 Pro: A high-performance open-source model from DeepSeek with top-tier reasoning and agent capabilities, support for ultra-long context, compatibility with Huawei Ascend chips, and exceptional cost-effectiveness. | Text | Text | $1.714 | $3.429 | $0.143 | - | 1M | 384K |
| DeepSeek V4 Flash: The lightweight model in the DeepSeek V4 series, focused on cost-effectiveness and high throughput. It suits general dialogue and basic text tasks while supporting a 1M-token context and efficient inference. | Text | Text | $0.143 | $0.286 | $0.029 | - | 1M | 384K |
| GLM 4.7: A Zhipu flagship model enhanced for agentic coding, with stronger coding ability, long-horizon task planning, and tool coordination; it leads open-source models on multiple public benchmarks. Its general capabilities have also improved, with more concise and natural responses and more immersive writing. | Text, Image | Text | $0.571 | $2.286 | $0.114 | - | 200K | 128K |
| GLM 5: Zhipu's next-generation flagship foundation model, built for agentic engineering and designed to deliver reliable productivity in complex systems engineering and long-horizon agent tasks. It achieves state-of-the-art coding and agent performance among open-source models, and in real-world programming its user experience approaches that of Claude Opus 4.5, making it a strong foundation for general-purpose agent assistants. | Text | Text | $0.857 | $3.142 | $0.214 | - | 200K | 128K |
| GLM-5.1: Zhipu's latest flagship model. | Text | Text | $0.857 | $1.143 | $0.186 | - | 200K | 128K |
| GPT Image 1.5: OpenAI's latest image generation model, with better instruction following and prompt adherence. | Text | Image | $7 | $10 | - | - | 64K | 16K |
| GPT-4.1: A high-performance multimodal model released by OpenAI on April 15, 2025, positioned as a comprehensive upgrade to GPT-4o. It excels at advanced code generation, long-context understanding, fast inference, and multimodal processing; its key strengths are a 1M-token context window and lower cost, and it is optimized for software development, complex instruction execution, and long-document analysis. | Text, Image | Text | $2 | $8 | $0.5 | - | 1M | 32K |
| GPT-4.1 Mini: A lightweight multimodal model from OpenAI that balances strong reasoning, a 1M-token context window, and very low cost, with low latency and high throughput. It suits large-scale, high-concurrency services, long-document processing, everyday conversation, and lightweight multimodal applications. | Text, Image | Text | $0.4 | $1.6 | $0.1 | - | 1M | 32K |
| GPT-4.1 Nano: The smallest and most cost-efficient model in OpenAI's GPT-4.1 family, designed for very low-cost, high-throughput tasks such as classification, extraction, simple reasoning, lightweight coding assistance, data transformation, and large-scale automation. It supports long-context text processing and image understanding, but its reasoning and coding ability are generally weaker than those of GPT-4.1 Mini and the full GPT-4.1. | Text, Image | Text | $0.1 | $0.4 | $0.025 | - | 1M | 32K |
| GPT-5: An OpenAI model for general-purpose reasoning, coding, instruction following, tool use, and production chat workloads; exact speed, reasoning depth, and cost depend on the selected variant. | Text, Image | Text | $1.25 | $10 | $0.125 | - | 400K | 128K |
| GPT-5 Chat Latest: A flagship multimodal conversational model from OpenAI with strong logical reasoning, very low hallucination rates, long-context comprehension, and strong multimodal analysis, balancing response speed with professional-quality output. It suits everyday conversation, complex problem solving, long-document analysis, creative writing, and multimodal interactive applications. | Text, Image | Text | $1.25 | $10 | $0.125 | - | 400K | 128K |
| GPT-5 Codex: A specialized model released by OpenAI on September 15, 2025, built on GPT-5 and deeply optimized for agentic coding and software engineering. Trained on real-world development scenarios, it delivers both fast interactive responses and long-running independent task execution (more than 7 hours of continuous work). It excels at code generation, review, debugging, refactoring, multi-language development, and system-level interaction, and it is integrated with IDE, CLI, cloud, and GitHub workflows, serving as an AI coding teammate for professional developers. | Text | Text | $1.25 | $10 | $0.125 | - | 400K | 128K |
| GPT-5 Mini: An OpenAI model for general-purpose reasoning, coding, instruction following, tool use, and production chat workloads; exact speed, reasoning depth, and cost depend on the selected variant. | Text, Image | Text | $0.25 | $2 | $0.025 | - | 400K | 128K |
| GPT-5 Nano: An OpenAI model for general-purpose reasoning, coding, instruction following, tool use, and production chat workloads; exact speed, reasoning depth, and cost depend on the selected variant. | Text, Image | Text | $0.05 | $0.4 | $0.005 | - | 400K | 128K |
| GPT-5 Pro: An OpenAI model for general-purpose reasoning, coding, instruction following, tool use, and production chat workloads; exact speed, reasoning depth, and cost depend on the selected variant. | Text, Image | Text | $15 | $120 | - | - | 400K | 128K |
| GPT-5.1: An OpenAI model for general-purpose reasoning, coding, instruction following, tool use, and production chat workloads; exact speed, reasoning depth, and cost depend on the selected variant. | Text, Image | Text | $1.25 | $10 | $0.125 | - | 400K | 128K |
| GPT-5.1 Chat Latest: An OpenAI model for general-purpose reasoning, coding, instruction following, tool use, and production chat workloads; exact speed, reasoning depth, and cost depend on the selected variant. | Text, Image | Text | $1.25 | $10 | $0.125 | - | 400K | 128K |
| GPT-5.2: OpenAI's best model for coding and intelligence tasks across industries. | Text, Image | Text | $1.75 | $14 | $0.175 | - | 400K | 128K |
| GPT-5.2 Chat Latest: An OpenAI model for general-purpose reasoning, coding, instruction following, tool use, and production chat workloads; exact speed, reasoning depth, and cost depend on the selected variant. | Text, Image | Text | $1.75 | $14 | $0.175 | - | 400K | 128K |
| GPT-5.2 Codex: An OpenAI model for general-purpose reasoning, coding, instruction following, tool use, and production chat workloads; exact speed, reasoning depth, and cost depend on the selected variant. | Text | Text | $1.75 | $14 | $0.175 | - | 400K | 128K |
| GPT-5.2 Pro: An OpenAI model for general-purpose reasoning, coding, instruction following, tool use, and production chat workloads; exact speed, reasoning depth, and cost depend on the selected variant. | Text, Image | Text | $21 | $168 | - | - | 400K | 272K |
| GPT-5.3 Codex: Aims to redefine the role of AI in programming and general productivity through improved performance, broader capability generalization, and enhanced safety. | Text, Image | Text | $1.75 | $14 | $0.175 | - | 400K | 128K |
| GPT-5.4: OpenAI's frontier model for complex professional work. | Text, Image | Text | $2.5 | $15 | $0.25 | - | 1M | 128K |
| GPT-5.4 Pro: Uses more compute to think more deeply and deliver consistently better answers. It is available only through the Responses API, which supports multi-turn model interactions before a response is generated and will enable additional advanced API capabilities in the future. | Text, Image | Text | $30 | $180 | - | - | 1.1M | 128K |
| GPT-5.5: OpenAI's flagship multimodal model (codename "Spud"), released on April 23, 2026. As the first complete ground-up retrain since GPT-4.5, it is positioned as an agentic model for real work, featuring native computer control, autonomous agents, a context window of 1M+ tokens, deep reasoning, and advanced coding and scientific capabilities. It is optimized for complex professional workflows, autonomous programming, long-document analysis, and enterprise AI applications. | Text, Image | Text | $5 | $30 | $0.5 | - | 1M | 128K |
| Gemini 2.5 Flash: A multimodal model launched by Google DeepMind in April 2025, featuring very fast responses, a 1M-token context window, controllable reasoning (Thinking Budget), and strong cost-performance. It is the flagship of the Gemini 2.5 family for large-scale real-time applications. | Text | Text, Image | $0.3 | $2.5 | $0.03 | $1 | 1M | 64K |
| Gemini 2.5 Flash Lite: An ultra-lightweight multimodal model launched by Google DeepMind in June 2025, with very low cost, very low latency, and high throughput. It supports a 1M-token context window and controllable reasoning (Thinking Budget), and is the cost-effective option in the Gemini 2.5 family for large-scale, high-concurrency, lightweight real-time applications (e.g., classification, translation, data processing). | Text, Image, Audio, Video | Text | $0.1 | $0.4 | $0.01 | $0.08333 | 1M | 64K |
| Gemini 2.5 Pro: Google's most advanced model at its release, excelling at coding and complex prompts. With "Deep Think," it can reason before responding, improving performance and accuracy. It performs strongly across multiple benchmarks, ranked first on the LMArena leaderboard for reasoning and code generation, and supports multimodal input, including text, images, audio, video, and code. | Text, Image, Audio, Video | Text | $2.5 | $15 | $0.25 | $4.5 | 1M | 65.5K |
| Gemini 3 Flash Preview: A preview multimodal model released by Google DeepMind in December 2025, featuring very fast inference, near-Pro-level intelligence, a 1M-token context window, multimodal inputs (text, images, audio, video), and configurable thinking levels. It is designed for high-concurrency real-time scenarios such as agentic workflows, interactive development, long-document analysis, and coding. | Text, Image, Audio, Video | Text | $0.5 | $3 | $0.05 | $1 | 1M | 64K |
| Gemini 3 Pro: Gemini 3 Pro Preview is part of Google's most intelligent model family to date, built on advanced reasoning capabilities and designed to turn ideas into reality through agentic workflows, autonomous coding, and complex multimodal tasks. | Text, Image, Audio, Video | Text | $2 | $12 | $0.2 | $4.5 | 1M | 64K |
| Gemini 3.1 Pro: Part of Gemini 3.1, Google's most intelligent model family to date, built on advanced reasoning capabilities and designed to turn ideas into reality through agentic workflows, autonomous coding, and complex multimodal tasks. Gemini 3.1 Pro Preview is best suited to complex tasks that require broad world knowledge and advanced cross-modal reasoning. | Text, Image, Audio, Video | Text | $2 | $12 | $0.2 | $4.5 | 1M | 64K |
| GLM 5 Turbo: A foundation model deeply optimized for OpenClaw's Lobster scenarios. From the training stage onward, it was optimized around the core requirements of Lobster tasks, strengthening tool calling, instruction following, scheduled and persistent task handling, and long-chain execution so that it can carry out work reliably even in complex, dynamic, long-horizon tasks. | Text | Text | $1 | $3.714 | $0.257 | - | 200K | 128K |
| Grok 4.1 Fast Non Reasoning: Grok-4-1-fast-non-reasoning is an xAI model optimized for maximum speed in generating responses and carrying out agentic tasks. Unlike its reasoning counterpart, it skips "thinking tokens," giving immediate, pattern-matched answers to simple, straightforward queries. | Text, Image | Text | $0.2 | $0.5 | $0.05 | - | 2M | 64K |
| Grok 4.1 Fast Reasoning: Grok 4.1 excels at creative, emotional, and collaborative interactions, capturing subtle intent more sharply and delivering a more engaging conversational experience. It maintains a highly consistent personality while retaining the intelligence and reliable performance of its predecessor. | Text, Image | Text | $0.2 | $0.5 | $0.05 | - | 2M | 64K |
| Kimi K2.5: Kimi's smartest model to date, with open-source state-of-the-art performance in agents, coding, visual understanding, and a wide range of general tasks. It is also Kimi's most versatile model so far: its natively multimodal architecture accepts both visual and text inputs and supports thinking and non-thinking modes as well as dialogue and agent tasks. | Text, Image | Text | $0.571 | $3 | $0.1 | - | 256K | 32K |
| Kimi K2: A breakthrough mixture-of-experts model designed for outstanding performance on frontier knowledge, reasoning, and coding tasks, built for autonomous action and intelligent problem-solving. | Text | Text | $0.571 | $2.286 | $0.143 | - | 262.1K | 32K |
| Llama 4 Maverick: Meta's latest mixture-of-experts model. | Text, Image | Text | $0.5 | $0.75 | $0.125 | $0.5 | 1M | 65.5K |
| MiniMax M2.1: A lightweight, cutting-edge language model optimized for coding, agentic workflows, and modern application development. With only 10 billion active parameters, it delivers a major leap in real-world capability while maintaining excellent latency, scalability, and cost efficiency. | Text | Text | $0.3 | $1.2 | $0.03 | $0.375 | 204.8K | 1.3M |
| MiniMax M2.5: Matches or surpasses the industry state of the art in productivity scenarios such as coding, tool use, search, and office work. | Text | Text | $0.3 | $1.2 | $0.06 | $0.375 | 204.8K | 131.1K |
| Mistral Large: Mistral's flagship model for complex tasks. | Text | Text | $2 | $6 | $0.5 | $2 | 131.1K | 8.2K |
| Nano Banana: Gemini 2.5 Flash Image (codename "Nano Banana") is an image generation and editing model launched by Google DeepMind in August 2025. It offers very fast text-to-image generation, precise natural-language editing, multi-image fusion, character consistency, and real-world physical reasoning, serving as the professional imaging model in the Gemini 2.5 family for creative design, e-commerce, and content creation. | Text | Text, Image | $0.3 | $2.5 | $0.03 | $1 | 32K | 32K |
| Nano Banana 2: High-quality image generation and conversational editing at a mainstream price point and low latency. | Text, Image | Text, Image | $0.5 | $60 | - | - | 131.1K | 32.8K |
| Nano Banana Pro: Gemini 3 Pro Image Preview (Nano Banana Pro) is the next-generation image generation and editing model in Google's Gemini family and an upgrade to Gemini 2.5 Flash Image (Nano Banana). Combining a multimodal Transformer with diffusion-based modeling, it natively supports 2K (2048×2048) and 4K output, with significant improvements in image quality, text rendering, and physical reasoning. | Text, Image | Text, Image | $2 | $120 | - | - | 65.5K | 32.8K |
| QwQ Plus: A reasoning model trained on Qwen2.5-32B, whose reasoning ability is substantially improved through reinforcement learning. Its core math and coding metrics (AIME 24/25, LiveCodeBench) and some general metrics (IFEval, LiveBench, etc.) reach the level of the full DeepSeek-R1, and all metrics substantially surpass DeepSeek-R1-Distill-Qwen-32B, which is also based on Qwen2.5-32B. | Text | Text | $0.229 | $0.571 | - | - | 128K | 7K |
| Qwen 3.5 397B A17B: The 397B-A17B native vision-language model in the Qwen3.5 series. Its hybrid architecture combines linear attention with a sparse mixture-of-experts design for higher inference efficiency. It delivers performance comparable to today's leading frontier models across language understanding, logical reasoning, code generation, agent tasks, image understanding, video understanding, and graphical user interface (GUI) tasks, and its strong coding and agent capabilities generalize well across agent scenarios. | Text, Image, Video | Text | $0.429 | $2.571 | - | - | 256K | 64K |
| Qwen 3.5 Flash: The Flash model in the Qwen3.5 native vision-language series. Its hybrid architecture combines linear attention with a sparse mixture-of-experts design for higher inference efficiency. It makes a major leap over the Qwen3 series in both text-only and multimodal performance and responds quickly, balancing inference speed and quality. | Text, Image | Text | $0.171 | $1.714 | $0.017 | $0.214 | 1M | 64K |
| Qwen 3.5 Plus: The Plus model in the Qwen3.5 native vision-language series adopts a hybrid architecture that combines linear attention with a sparse mixture-of-experts design, achieving higher inference efficiency. Across a wide range of benchmarks, the 3.5 series performs comparably to today's leading frontier models while delivering a significant leap over the Qwen3 series in both text-only and multimodal capabilities. | Text, Image, Video | Text | $0.57 | $3.426 | $0.06 | $0.714 | 1M | 64K |
| Qwen3 Max Preview: The preview version of the Max model in the Qwen3 series. Compared with the Qwen2.5 series, it substantially improves general capabilities, including Chinese and English text understanding, complex instruction following, open-ended subjective tasks, multilingual ability, and tool use, and it produces fewer knowledge hallucinations. | Text | Text | $2.143 | $8.571 | $0.429 | - | 256K | 64K |
| Step 3.5 Flash: StepFun's strongest open-source foundation model to date. Its agent and mathematical capabilities approach those of closed-source models, enabling it to handle complex, long-chain tasks effectively. | Text | Text | $0.1 | $0.3 | - | - | 256K | 128K |
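
Reading the price columns as USD per 1M tokens, the cost of a request is each token count multiplied by its rate, with the total divided by 1,000,000. The Python sketch below shows that arithmetic, along with a simplified check against the Context and Max Output columns. It works under stated assumptions. The names `ModelPricing`, `estimate_cost`, and `fits_limits` are hypothetical and are not part of any LLM Gateway API. The sketch assumes uncached input, cache reads, and cache writes are billed as separate token counts; some providers include cached tokens in the total input count, so you would subtract them first. It also treats the context window as covering prompt plus output. Confirm these details against the gateway's actual billing before relying on the numbers.

```python
from dataclasses import dataclass
from typing import Optional

PER_MILLION = 1_000_000  # price columns are read as USD per 1M tokens (assumption)


@dataclass(frozen=True)
class ModelPricing:
    """One table row (hypothetical type): prices in USD per 1M tokens, limits in tokens."""
    name: str
    input_price: float
    output_price: float
    cache_read: Optional[float]   # None where the table shows "-"
    cache_write: Optional[float]  # None where the table shows "-"
    context: int
    max_output: int


def estimate_cost(m: ModelPricing, input_tokens: int, output_tokens: int,
                  cache_read_tokens: int = 0, cache_write_tokens: int = 0) -> float:
    """Estimated USD cost of one request.

    Assumption: input_tokens counts only uncached input, and cache reads and
    cache writes are billed as separate token counts at their own rates.
    """
    if cache_read_tokens and m.cache_read is None:
        raise ValueError(f"{m.name} lists no cache-read price")
    if cache_write_tokens and m.cache_write is None:
        raise ValueError(f"{m.name} lists no cache-write price")
    # Sum of tokens x ($ per 1M tokens); divide by one million once at the end.
    scaled = (
        input_tokens * m.input_price
        + output_tokens * m.output_price
        + cache_read_tokens * (m.cache_read or 0.0)
        + cache_write_tokens * (m.cache_write or 0.0)
    )
    return scaled / PER_MILLION


def fits_limits(m: ModelPricing, prompt_tokens: int, max_tokens: int) -> bool:
    """Simplified check: requested output within Max Output, and
    prompt plus output within the Context window."""
    return max_tokens <= m.max_output and prompt_tokens + max_tokens <= m.context


# Two rows copied from the table; "K" is read as thousands (200K = 200_000).
HAIKU_45 = ModelPricing("Claude Haiku 4.5", 1.0, 5.0, 0.1, 1.25, 200_000, 64_000)
DEEPSEEK_V32 = ModelPricing("DeepSeek V3.2", 0.286, 0.429, 0.029, None, 128_000, 8_000)

if __name__ == "__main__":
    cost = estimate_cost(HAIKU_45, input_tokens=10_000, output_tokens=2_000,
                         cache_read_tokens=50_000)
    print(f"{HAIKU_45.name}: ${cost:.4f}")                                  # $0.0250
    print(fits_limits(HAIKU_45, prompt_tokens=150_000, max_tokens=64_000))  # False
```

Run as a script, it prints `Claude Haiku 4.5: $0.0250` and `False`. The first line comes from 10,000 × $1 + 2,000 × $5 + 50,000 × $0.1 = $25,000 per million tokens, or $0.025. The second is `False` because a 150K-token prompt plus a 64K-token output exceeds the 200K context. Rows that show "-" for a cache price raise an error when cache tokens are passed, rather than silently billing them at zero.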