Free LLM API resources
This lists various services that provide free access or credits towards API-based LLM usage.
[!NOTE]
Please don’t abuse these services, else we might lose them.
[!WARNING]
This list explicitly excludes any services that are not legitimate (eg reverse engineers an existing chatbot)
GitHub-Repo-A list of free LLM inference resources accessible via API.
Free Providers
OpenRouter
Limits:
20 requests/minute
50 requests/day
Up to 1000 requests/day with $10 lifetime topup
Models share a common quota.
- Gemma 3 12B Instruct
- Gemma 3 27B Instruct
- Gemma 3 4B Instruct
- Hermes 3 Llama 3.1 405B
- Llama 3.1 405B Instruct
- Llama 3.2 3B Instruct
- Llama 3.3 70B Instruct
- Mistral 7B Instruct
- Mistral Small 3.1 24B Instruct
- Qwen 2.5 VL 7B Instruct
- alibaba/tongyi-deepresearch-30b-a3b:free
- allenai/olmo-3-32b-think:free
- allenai/olmo-3.1-32b-think:free
- arcee-ai/trinity-mini:free
- cognitivecomputations/dolphin-mistral-24b-venice-edition:free
- deepseek/deepseek-r1-0528:free
- google/gemma-3n-e2b-it:free
- google/gemma-3n-e4b-it:free
- kwaipilot/kat-coder-pro:free
- mistralai/devstral-2512:free
- moonshotai/kimi-k2:free
- nex-agi/deepseek-v3.1-nex-n1:free
- nvidia/nemotron-3-nano-30b-a3b:free
- nvidia/nemotron-nano-12b-v2-vl:free
- nvidia/nemotron-nano-9b-v2:free
- openai/gpt-oss-120b:free
- openai/gpt-oss-20b:free
- qwen/qwen3-4b:free
- qwen/qwen3-coder:free
- tngtech/deepseek-r1t-chimera:free
- tngtech/deepseek-r1t2-chimera:free
- tngtech/tng-r1t-chimera:free
- xiaomi/mimo-v2-flash:free
- z-ai/glm-4.5-air:free
Google AI Studio
Data is used for training when used outside of the UK/CH/EEA/EU.
| Model Name | Model Limits |
|---|---|
| Gemini 3 Flash | 250,000 tokens/minute 20 requests/day 5 requests/minute |
| Gemini 2.5 Flash | 250,000 tokens/minute 20 requests/day 5 requests/minute |
| Gemini 2.5 Flash-Lite | 250,000 tokens/minute 20 requests/day 10 requests/minute |
| Gemma 3 27B Instruct | 15,000 tokens/minute 14,400 requests/day 30 requests/minute |
| Gemma 3 12B Instruct | 15,000 tokens/minute 14,400 requests/day 30 requests/minute |
| Gemma 3 4B Instruct | 15,000 tokens/minute 14,400 requests/day 30 requests/minute |
| Gemma 3 1B Instruct | 15,000 tokens/minute 14,400 requests/day 30 requests/minute |
NVIDIA NIM
Phone number verification required. Models tend to be context window limited.
Limits: 40 requests/minute
Mistral (La Plateforme)
- Free tier (Experiment plan) requires opting into data training
- Requires phone number verification.
Limits (per-model): 1 request/second, 500,000 tokens/minute, 1,000,000,000 tokens/month
Mistral (Codestral)
- Currently free to use
- Monthly subscription based
- Requires phone number verification
Limits: 30 requests/minute, 2,000 requests/day
- Codestral
HuggingFace Inference Providers
HuggingFace Serverless Inference limited to models smaller than 10GB. Some popular models are supported even if they exceed 10GB.
Limits: $0.10/month in credits
- Various open models across supported providers
Vercel AI Gateway
Routes to various supported providers.
Limits: $5/month
Cerebras
| Model Name | Model Limits |
|---|---|
| gpt-oss-120b | 30 requests/minute 60,000 tokens/minute 900 requests/hour 1,000,000 tokens/hour 14,400 requests/day 1,000,000 tokens/day |
| Qwen 3 235B A22B Instruct | 30 requests/minute 60,000 tokens/minute 900 requests/hour 1,000,000 tokens/hour 14,400 requests/day 1,000,000 tokens/day |
| Llama 3.3 70B | 30 requests/minute 64,000 tokens/minute 900 requests/hour 1,000,000 tokens/hour 14,400 requests/day 1,000,000 tokens/day |
| Qwen 3 32B | 30 requests/minute 64,000 tokens/minute 900 requests/hour 1,000,000 tokens/hour 14,400 requests/day 1,000,000 tokens/day |
| Llama 3.1 8B | 30 requests/minute 60,000 tokens/minute 900 requests/hour 1,000,000 tokens/hour 14,400 requests/day 1,000,000 tokens/day |
| Z.ai GLM-4.6 | 10 requests/minute 60,000 tokens/minute 100 requests/hour 100,000 tokens/hour 100 requests/day 1,000,000 tokens/day |
Groq
| Model Name | Model Limits |
|---|---|
| Allam 2 7B | 7,000 requests/day 6,000 tokens/minute |
| Llama 3.1 8B | 14,400 requests/day 6,000 tokens/minute |
| Llama 3.3 70B | 1,000 requests/day 12,000 tokens/minute |
| Llama 4 Maverick 17B 128E Instruct | 1,000 requests/day 6,000 tokens/minute |
| Llama 4 Scout Instruct | 1,000 requests/day 30,000 tokens/minute |
| Whisper Large v3 | 7,200 audio-seconds/minute 2,000 requests/day |
| Whisper Large v3 Turbo | 7,200 audio-seconds/minute 2,000 requests/day |
| canopylabs/orpheus-arabic-saudi | |
| canopylabs/orpheus-v1-english | |
| groq/compound | 250 requests/day 70,000 tokens/minute |
| groq/compound-mini | 250 requests/day 70,000 tokens/minute |
| meta-llama/llama-guard-4-12b | 14,400 requests/day 15,000 tokens/minute |
| meta-llama/llama-prompt-guard-2-22m | |
| meta-llama/llama-prompt-guard-2-86m | |
| moonshotai/kimi-k2-instruct | 1,000 requests/day 10,000 tokens/minute |
| moonshotai/kimi-k2-instruct-0905 | 1,000 requests/day 10,000 tokens/minute |
| openai/gpt-oss-120b | 1,000 requests/day 8,000 tokens/minute |
| openai/gpt-oss-20b | 1,000 requests/day 8,000 tokens/minute |
| openai/gpt-oss-safeguard-20b | 1,000 requests/day 8,000 tokens/minute |
| qwen/qwen3-32b | 1,000 requests/day 6,000 tokens/minute |
Cohere
Limits:
20 requests/minute
1,000 requests/month
Models share a common monthly quota.
- c4ai-aya-expanse-32b
- c4ai-aya-expanse-8b
- c4ai-aya-vision-32b
- c4ai-aya-vision-8b
- command-a-03-2025
- command-a-reasoning-08-2025
- command-a-translate-08-2025
- command-a-vision-07-2025
- command-r-08-2024
- command-r-plus-08-2024
- command-r7b-12-2024
- command-r7b-arabic-02-2025
GitHub Models
Extremely restrictive input/output token limits.
Limits: Dependent on Copilot subscription tier (Free/Pro/Pro+/Business/Enterprise)
- AI21 Jamba 1.5 Large
- Codestral 25.01
- Cohere Command A
- Cohere Command R 08-2024
- Cohere Command R+ 08-2024
- DeepSeek-R1
- DeepSeek-R1-0528
- DeepSeek-V3-0324
- Grok 3
- Grok 3 Mini
- Llama 4 Maverick 17B 128E Instruct FP8
- Llama 4 Scout 17B 16E Instruct
- Llama-3.2-11B-Vision-Instruct
- Llama-3.2-90B-Vision-Instruct
- Llama-3.3-70B-Instruct
- MAI-DS-R1
- Meta-Llama-3.1-405B-Instruct
- Meta-Llama-3.1-8B-Instruct
- Ministral 3B
- Mistral Medium 3 (25.05)
- Mistral Small 3.1
- OpenAI GPT-4.1
- OpenAI GPT-4.1-mini
- OpenAI GPT-4.1-nano
- OpenAI GPT-4o
- OpenAI GPT-4o mini
- OpenAI Text Embedding 3 (large)
- OpenAI Text Embedding 3 (small)
- OpenAI gpt-5
- OpenAI gpt-5-chat (preview)
- OpenAI gpt-5-mini
- OpenAI gpt-5-nano
- OpenAI o1
- OpenAI o1-mini
- OpenAI o1-preview
- OpenAI o3
- OpenAI o3-mini
- OpenAI o4-mini
- Phi-4
- Phi-4-mini-instruct
- Phi-4-mini-reasoning
- Phi-4-multimodal-instruct
- Phi-4-reasoning
Cloudflare Workers AI
Limits: 10,000 neurons/day
- @cf/aisingapore/gemma-sea-lion-v4-27b-it
- @cf/ibm-granite/granite-4.0-h-micro
- @cf/openai/gpt-oss-120b
- @cf/openai/gpt-oss-20b
- @cf/qwen/qwen3-30b-a3b-fp8
- DeepSeek R1 Distill Qwen 32B
- Deepseek Coder 6.7B Base (AWQ)
- Deepseek Coder 6.7B Instruct (AWQ)
- Deepseek Math 7B Instruct
- Discolm German 7B v1 (AWQ)
- Falcom 7B Instruct
- Gemma 2B Instruct (LoRA)
- Gemma 3 12B Instruct
- Gemma 7B Instruct
- Gemma 7B Instruct (LoRA)
- Hermes 2 Pro Mistral 7B
- Llama 2 13B Chat (AWQ)
- Llama 2 7B Chat (FP16)
- Llama 2 7B Chat (INT8)
- Llama 2 7B Chat (LoRA)
- Llama 3 8B Instruct
- Llama 3 8B Instruct (AWQ)
- Llama 3.1 8B Instruct (AWQ)
- Llama 3.1 8B Instruct (FP8)
- Llama 3.2 11B Vision Instruct
- Llama 3.2 1B Instruct
- Llama 3.2 3B Instruct
- Llama 3.3 70B Instruct (FP8)
- Llama 4 Scout Instruct
- Llama Guard 3 8B
- LlamaGuard 7B (AWQ)
- Mistral 7B Instruct v0.1
- Mistral 7B Instruct v0.1 (AWQ)
- Mistral 7B Instruct v0.2
- Mistral 7B Instruct v0.2 (LoRA)
- Mistral Small 3.1 24B Instruct
- Neural Chat 7B v3.1 (AWQ)
- OpenChat 3.5 0106
- OpenHermes 2.5 Mistral 7B (AWQ)
- Phi-2
- Qwen 1.5 0.5B Chat
- Qwen 1.5 1.8B Chat
- Qwen 1.5 14B Chat (AWQ)
- Qwen 1.5 7B Chat (AWQ)
- Qwen 2.5 Coder 32B Instruct
- Qwen QwQ 32B
- SQLCoder 7B 2
- Starling LM 7B Beta
- TinyLlama 1.1B Chat v1.0
- Una Cybertron 7B v2 (BF16)
- Zephyr 7B Beta (AWQ)
Google Cloud Vertex AI
Very stringent payment verification for Google Cloud.
| Model Name | Model Limits |
|---|---|
| Llama 3.2 90B Vision Instruct | 30 requests/minute Free during preview |
| Llama 3.1 70B Instruct | 60 requests/minute Free during preview |
| Llama 3.1 8B Instruct | 60 requests/minute Free during preview |
Providers with trial credits
Fireworks
Credits: $1
Models: Various open models
Baseten
Credits: $30
Models: Any supported model - pay by compute time
Nebius
Credits: $1
Models: Various open models
Novita
Credits: $0.5 for 1 year
Models: Various open models
AI21
Credits: $10 for 3 months
Models: Jamba family of models
Upstage
Credits: $10 for 3 months
Models: Solar Pro/Mini
NLP Cloud
Credits: $15
Requirements: Phone number verification
Models: Various open models
Alibaba Cloud (International) Model Studio
Credits: 1 million tokens/model
Models: Various open and proprietary Qwen models
Modal
Credits: $5/month upon sign up, $30/month with payment method added
Models: Any supported model - pay by compute time
Inference.net
Credits: $1, $25 on responding to email survey
Models: Various open models
Hyperbolic
Credits: $1
Models:
- DeepSeek V3
- DeepSeek V3 0324
- Llama 3 70B Instruct
- Llama 3.1 405B Base
- Llama 3.1 405B Instruct
- Llama 3.1 70B Instruct
- Llama 3.1 8B Instruct
- Llama 3.2 3B Instruct
- Llama 3.3 70B Instruct
- Pixtral 12B (2409)
- Qwen QwQ 32B
- Qwen2.5 72B Instruct
- Qwen2.5 Coder 32B Instruct
- Qwen2.5 VL 72B Instruct
- Qwen2.5 VL 7B Instruct
- deepseek-ai/deepseek-r1-0528
- openai/gpt-oss-120b
- openai/gpt-oss-120b-turbo
- openai/gpt-oss-20b
- qwen/qwen3-235b-a22b
- qwen/qwen3-235b-a22b-instruct-2507
- qwen/qwen3-coder-480b-a35b-instruct
- qwen/qwen3-next-80b-a3b-instruct
- qwen/qwen3-next-80b-a3b-thinking
SambaNova Cloud
Credits: $5 for 3 months
Models:
- E5-Mistral-7B-Instruct
- Llama 3.1 8B
- Llama 3.3 70B
- Llama 3.3 70B
- Llama-4-Maverick-17B-128E-Instruct
- Qwen/Qwen3-235B
- Qwen/Qwen3-32B
- Whisper-Large-v3
- deepseek-ai/DeepSeek-R1-0528
- deepseek-ai/DeepSeek-R1-Distill-Llama-70B
- deepseek-ai/DeepSeek-V3-0324
- deepseek-ai/DeepSeek-V3.1
- deepseek-ai/DeepSeek-V3.1-Terminus
- openai/gpt-oss-120b
- tbd
Scaleway Generative APIs
Credits: 1,000,000 free tokens
Models:
- BGE-Multilingual-Gemma2
- DeepSeek R1 Distill Llama 70B
- Gemma 3 27B Instruct
- Llama 3.1 8B Instruct
- Llama 3.3 70B Instruct
- Mistral Nemo 2407
- Pixtral 12B (2409)
- Whisper Large v3
- gpt-oss-120b
- holo2-30b-a3b
- mistral-small-3.2-24b-instruct-2506
- qwen3-235b-a22b-instruct-2507
- qwen3-coder-30b-a3b-instruct
- qwen3-embedding-8b
- voxtral-small-24b-2507