Free-LLM-API-Resources

This article introduces a list of free LLM inference resources accessible via API.

Free LLM API resources

This section lists various services that provide free access or credits towards API-based LLM usage.

Note: Please don't abuse these services, or we might lose them.

Warning: This list explicitly excludes any services that are not legitimate (e.g., services that reverse-engineer an existing chatbot).


Free Providers

OpenRouter

Limits:

20 requests/minute
50 requests/day
Up to 1,000 requests/day with a $10 lifetime top-up

Models share a common quota.
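To stay inside a shared quota like this one, a client can throttle itself before the server rejects a request. The following is a minimal sketch, assuming a local sliding-window count is acceptable; `SlidingWindowLimiter` is a hypothetical helper, not part of any OpenRouter SDK, and the 20/minute and 50/day figures are the free-tier limits listed above.

```python
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical client-side limiter enforcing several
    (max_requests, window_seconds) rules over one shared request history."""

    def __init__(self, rules):
        self.rules = rules      # list of (max_requests, window_seconds)
        self.history = deque()  # timestamps of accepted requests, oldest first

    def try_acquire(self, now):
        """Accept and record a request at time `now` (seconds) if every rule allows it."""
        longest = max(window for _, window in self.rules)
        # Forget timestamps that have left even the longest window.
        while self.history and now - self.history[0] >= longest:
            self.history.popleft()
        for max_requests, window in self.rules:
            recent = sum(1 for t in self.history if now - t < window)
            if recent >= max_requests:
                return False
        self.history.append(now)
        return True


# Free-tier figures from the list: 20 requests/minute and 50 requests/day.
openrouter_free = SlidingWindowLimiter([(20, 60), (50, 86_400)])
```

In real use, `now` would come from `time.monotonic()`. Because all models draw from the same quota, a single limiter instance should guard every OpenRouter call rather than one instance per model.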

Google AI Studio

Data is used for training when the service is accessed from outside the UK/CH/EEA/EU.

Limits (per-model):

  • Gemini 3 Flash: 250,000 tokens/minute, 20 requests/day, 5 requests/minute
  • Gemini 2.5 Flash: 250,000 tokens/minute, 20 requests/day, 5 requests/minute
  • Gemini 2.5 Flash-Lite: 250,000 tokens/minute, 20 requests/day, 10 requests/minute
  • Gemma 3 27B Instruct: 15,000 tokens/minute, 14,400 requests/day, 30 requests/minute
  • Gemma 3 12B Instruct: 15,000 tokens/minute, 14,400 requests/day, 30 requests/minute
  • Gemma 3 4B Instruct: 15,000 tokens/minute, 14,400 requests/day, 30 requests/minute
  • Gemma 3 1B Instruct: 15,000 tokens/minute, 14,400 requests/day, 30 requests/minute
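Because these limits apply per model, a client that exhausts one model's daily requests can fall back to another. A minimal sketch, assuming only requests/day is tracked (per-minute and token limits are ignored) and that the counters are reset elsewhere when the daily quota resets; `ModelQuota` and `pick_model` are hypothetical names, and the strings are the display names from the list above, not API model IDs.

```python
from dataclasses import dataclass


@dataclass
class ModelQuota:
    name: str
    requests_per_day: int
    used_today: int = 0

    def remaining(self):
        return self.requests_per_day - self.used_today


def pick_model(quotas):
    """Return the first model (in preference order) with daily requests left
    and count one request against it; return None when all are exhausted."""
    for quota in quotas:
        if quota.remaining() > 0:
            quota.used_today += 1
            return quota.name
    return None


# Daily request limits from the list above; the preference order is illustrative.
preferences = [
    ModelQuota("Gemini 2.5 Flash", 20),
    ModelQuota("Gemini 2.5 Flash-Lite", 20),
    ModelQuota("Gemma 3 27B Instruct", 14_400),
]
```

Ordering the list from most to least capable model means the high-volume Gemma quota only absorbs traffic once the small Gemini allowances are spent.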

NVIDIA NIM

Requires phone number verification. Models tend to have limited context windows.

Limits: 40 requests/minute
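At 40 requests/minute, a client sending more than one request every 1.5 seconds on average will eventually be throttled. A common remedy is to retry with exponential backoff. The sketch below assumes the rejection arrives as HTTP 429 (Too Many Requests), the standard status code for rate limiting; `send` is a hypothetical callable returning `(status, body)`, not part of any NVIDIA API.

```python
import random
import time


def call_with_backoff(send, max_retries=5, base_delay=1.5, sleep=time.sleep):
    """Call the hypothetical `send()` and retry while it returns HTTP 429,
    doubling the wait each time and adding random jitter."""
    status, body = send()
    for attempt in range(max_retries):
        if status != 429:
            break
        # 1.5 s is one request slot at 40 requests/minute; double it per retry.
        delay = base_delay * 2 ** attempt
        # Jitter spreads out retries from clients that were throttled together.
        sleep(delay + random.uniform(0, delay / 2))
        status, body = send()
    return status, body
```

The `sleep` parameter is injectable so the retry schedule can be tested without actually waiting.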

Mistral (La Plateforme)

  • Free tier (Experiment plan) requires opting into data training
  • Requires phone number verification

Limits (per-model): 1 request/second, 500,000 tokens/minute, 1,000,000,000 tokens/month
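These three limits interact: the tokens/minute ceiling is far higher than the monthly cap can sustain. The arithmetic below uses only the listed figures, plus an assumed 30-day month for the daily average.

```python
TOKENS_PER_MINUTE = 500_000
TOKENS_PER_MONTH = 1_000_000_000
REQUESTS_PER_SECOND = 1

# Minutes of continuous use at the per-minute ceiling before the monthly cap is hit.
minutes_to_cap = TOKENS_PER_MONTH / TOKENS_PER_MINUTE  # 2,000 minutes
hours_to_cap = minutes_to_cap / 60                      # about 33.3 hours

# At 1 request/second (60 requests/minute), filling the per-minute token
# budget requires requests averaging this many tokens each.
tokens_per_request_at_ceiling = TOKENS_PER_MINUTE / (REQUESTS_PER_SECOND * 60)  # about 8,333

# Even spread of the monthly cap, assuming a 30-day month.
daily_average = TOKENS_PER_MONTH / 30  # about 33.3 million tokens/day
```

Running flat out would use the whole month's allowance in roughly 33 hours of continuous use, so the monthly cap is the limit to plan around.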

Mistral (Codestral)

  • Currently free to use
  • Monthly subscription based
  • Requires phone number verification

Limits: 30 requests/minute, 2,000 requests/day

  • Codestral

HuggingFace Inference Providers

HuggingFace Serverless Inference is limited to models smaller than 10GB, although some popular models are supported even if they exceed 10GB.

Limits: $0.10/month in credits

  • Various open models across supported providers

Vercel AI Gateway

Routes to various supported providers.

Limits: $5/month

Cerebras

Limits (per-model):

  • gpt-oss-120b: 30 requests/minute, 60,000 tokens/minute, 900 requests/hour, 1,000,000 tokens/hour, 14,400 requests/day, 1,000,000 tokens/day
  • Qwen 3 235B A22B Instruct: 30 requests/minute, 60,000 tokens/minute, 900 requests/hour, 1,000,000 tokens/hour, 14,400 requests/day, 1,000,000 tokens/day
  • Llama 3.3 70B: 30 requests/minute, 64,000 tokens/minute, 900 requests/hour, 1,000,000 tokens/hour, 14,400 requests/day, 1,000,000 tokens/day
  • Qwen 3 32B: 30 requests/minute, 64,000 tokens/minute, 900 requests/hour, 1,000,000 tokens/hour, 14,400 requests/day, 1,000,000 tokens/day
  • Llama 3.1 8B: 30 requests/minute, 60,000 tokens/minute, 900 requests/hour, 1,000,000 tokens/hour, 14,400 requests/day, 1,000,000 tokens/day
  • Z.ai GLM-4.6: 10 requests/minute, 60,000 tokens/minute, 100 requests/hour, 100,000 tokens/hour, 100 requests/day, 1,000,000 tokens/day
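With several windows per model, only the tightest one matters over a full day. This sketch converts each listed cap to a per-day figure and keeps the smallest; it assumes fixed windows with no carry-over and ignores whether the provider actually uses rolling windows. `effective_daily_cap` is a hypothetical helper.

```python
SECONDS = {"minute": 60, "hour": 3_600, "day": 86_400}


def effective_daily_cap(limits):
    """Given {window: cap} for one quantity (requests or tokens), return
    (largest amount usable in one day, window that imposes it)."""
    best = None
    for window, cap in limits.items():
        per_day = cap * SECONDS["day"] // SECONDS[window]
        if best is None or per_day < best[0]:
            best = (per_day, window)
    return best


# gpt-oss-120b limits as listed above.
requests = {"minute": 30, "hour": 900, "day": 14_400}
tokens = {"minute": 60_000, "hour": 1_000_000, "day": 1_000_000}

# Spending all daily requests leaves this many tokens per request on average.
avg_tokens_per_request = tokens["day"] / requests["day"]  # about 69.4
# Minutes at the full per-minute token rate before the daily token cap is reached.
minutes_at_full_speed = tokens["day"] / tokens["minute"]  # about 16.7
```

For gpt-oss-120b both daily caps bind, and the token cap is by far the tighter one: it supports only about 69 tokens per request if all 14,400 daily requests are used.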

Groq

Limits (per-model):

  • Allam 2 7B: 7,000 requests/day, 6,000 tokens/minute
  • Llama 3.1 8B: 14,400 requests/day, 6,000 tokens/minute
  • Llama 3.3 70B: 1,000 requests/day, 12,000 tokens/minute
  • Llama 4 Maverick 17B 128E Instruct: 1,000 requests/day, 6,000 tokens/minute
  • Llama 4 Scout Instruct: 1,000 requests/day, 30,000 tokens/minute
  • Whisper Large v3: 7,200 audio-seconds/minute, 2,000 requests/day
  • Whisper Large v3 Turbo: 7,200 audio-seconds/minute, 2,000 requests/day
  • canopylabs/orpheus-arabic-saudi
  • canopylabs/orpheus-v1-english
  • groq/compound: 250 requests/day, 70,000 tokens/minute
  • groq/compound-mini: 250 requests/day, 70,000 tokens/minute
  • meta-llama/llama-guard-4-12b: 14,400 requests/day, 15,000 tokens/minute
  • meta-llama/llama-prompt-guard-2-22m
  • meta-llama/llama-prompt-guard-2-86m
  • moonshotai/kimi-k2-instruct: 1,000 requests/day, 10,000 tokens/minute
  • moonshotai/kimi-k2-instruct-0905: 1,000 requests/day, 10,000 tokens/minute
  • openai/gpt-oss-120b: 1,000 requests/day, 8,000 tokens/minute
  • openai/gpt-oss-20b: 1,000 requests/day, 8,000 tokens/minute
  • openai/gpt-oss-safeguard-20b: 1,000 requests/day, 8,000 tokens/minute
  • qwen/qwen3-32b: 1,000 requests/day, 6,000 tokens/minute
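Several of these models pair a modest tokens/minute limit with a large requests/day limit, so request size decides which limit binds. A minimal sketch, assuming the tokens/minute limit counts input and output tokens together and that every request is the same (hypothetical) size; both function names are hypothetical.

```python
def min_interval_seconds(tokens_per_request, tokens_per_minute):
    """Smallest average spacing between equally sized requests that keeps
    a steady stream under a tokens/minute limit."""
    if tokens_per_request > tokens_per_minute:
        raise ValueError("a single request exceeds the per-minute token limit")
    return 60 * tokens_per_request / tokens_per_minute


def requests_per_day_used(tokens_per_request, tokens_per_minute, requests_per_day):
    """Requests that fit in 24 hours: the lower of the paced rate and the daily cap."""
    paced = int(86_400 / min_interval_seconds(tokens_per_request, tokens_per_minute))
    return min(paced, requests_per_day)
```

With 2,000-token requests, Llama 3.3 70B (12,000 tokens/minute, 1,000 requests/day) allows one request every 10 seconds, so its daily cap of 1,000 binds. Llama 3.1 8B (6,000 tokens/minute, 14,400 requests/day) allows one every 20 seconds, so only 4,320 of its 14,400 daily requests are reachable.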

Cohere

Limits:

20 requests/minute
1,000 requests/month

Models share a common monthly quota.

  • c4ai-aya-expanse-32b
  • c4ai-aya-expanse-8b
  • c4ai-aya-vision-32b
  • c4ai-aya-vision-8b
  • command-a-03-2025
  • command-a-reasoning-08-2025
  • command-a-translate-08-2025
  • command-a-vision-07-2025
  • command-r-08-2024
  • command-r-plus-08-2024
  • command-r7b-12-2024
  • command-r7b-arabic-02-2025

GitHub Models

Extremely restrictive input/output token limits.

Limits: Dependent on Copilot subscription tier (Free/Pro/Pro+/Business/Enterprise)

  • AI21 Jamba 1.5 Large
  • Codestral 25.01
  • Cohere Command A
  • Cohere Command R 08-2024
  • Cohere Command R+ 08-2024
  • DeepSeek-R1
  • DeepSeek-R1-0528
  • DeepSeek-V3-0324
  • Grok 3
  • Grok 3 Mini
  • Llama 4 Maverick 17B 128E Instruct FP8
  • Llama 4 Scout 17B 16E Instruct
  • Llama-3.2-11B-Vision-Instruct
  • Llama-3.2-90B-Vision-Instruct
  • Llama-3.3-70B-Instruct
  • MAI-DS-R1
  • Meta-Llama-3.1-405B-Instruct
  • Meta-Llama-3.1-8B-Instruct
  • Ministral 3B
  • Mistral Medium 3 (25.05)
  • Mistral Small 3.1
  • OpenAI GPT-4.1
  • OpenAI GPT-4.1-mini
  • OpenAI GPT-4.1-nano
  • OpenAI GPT-4o
  • OpenAI GPT-4o mini
  • OpenAI Text Embedding 3 (large)
  • OpenAI Text Embedding 3 (small)
  • OpenAI gpt-5
  • OpenAI gpt-5-chat (preview)
  • OpenAI gpt-5-mini
  • OpenAI gpt-5-nano
  • OpenAI o1
  • OpenAI o1-mini
  • OpenAI o1-preview
  • OpenAI o3
  • OpenAI o3-mini
  • OpenAI o4-mini
  • Phi-4
  • Phi-4-mini-instruct
  • Phi-4-mini-reasoning
  • Phi-4-multimodal-instruct
  • Phi-4-reasoning

Cloudflare Workers AI

Limits: 10,000 neurons/day

  • @cf/aisingapore/gemma-sea-lion-v4-27b-it
  • @cf/ibm-granite/granite-4.0-h-micro
  • @cf/openai/gpt-oss-120b
  • @cf/openai/gpt-oss-20b
  • @cf/qwen/qwen3-30b-a3b-fp8
  • DeepSeek R1 Distill Qwen 32B
  • Deepseek Coder 6.7B Base (AWQ)
  • Deepseek Coder 6.7B Instruct (AWQ)
  • Deepseek Math 7B Instruct
  • Discolm German 7B v1 (AWQ)
  • Falcon 7B Instruct
  • Gemma 2B Instruct (LoRA)
  • Gemma 3 12B Instruct
  • Gemma 7B Instruct
  • Gemma 7B Instruct (LoRA)
  • Hermes 2 Pro Mistral 7B
  • Llama 2 13B Chat (AWQ)
  • Llama 2 7B Chat (FP16)
  • Llama 2 7B Chat (INT8)
  • Llama 2 7B Chat (LoRA)
  • Llama 3 8B Instruct
  • Llama 3 8B Instruct (AWQ)
  • Llama 3.1 8B Instruct (AWQ)
  • Llama 3.1 8B Instruct (FP8)
  • Llama 3.2 11B Vision Instruct
  • Llama 3.2 1B Instruct
  • Llama 3.2 3B Instruct
  • Llama 3.3 70B Instruct (FP8)
  • Llama 4 Scout Instruct
  • Llama Guard 3 8B
  • LlamaGuard 7B (AWQ)
  • Mistral 7B Instruct v0.1
  • Mistral 7B Instruct v0.1 (AWQ)
  • Mistral 7B Instruct v0.2
  • Mistral 7B Instruct v0.2 (LoRA)
  • Mistral Small 3.1 24B Instruct
  • Neural Chat 7B v3.1 (AWQ)
  • OpenChat 3.5 0106
  • OpenHermes 2.5 Mistral 7B (AWQ)
  • Phi-2
  • Qwen 1.5 0.5B Chat
  • Qwen 1.5 1.8B Chat
  • Qwen 1.5 14B Chat (AWQ)
  • Qwen 1.5 7B Chat (AWQ)
  • Qwen 2.5 Coder 32B Instruct
  • Qwen QwQ 32B
  • SQLCoder 7B 2
  • Starling LM 7B Beta
  • TinyLlama 1.1B Chat v1.0
  • Una Cybertron 7B v2 (BF16)
  • Zephyr 7B Beta (AWQ)

Google Cloud Vertex AI

Google Cloud applies very stringent payment verification.

Limits (per-model):

  • Llama 3.2 90B Vision Instruct: 30 requests/minute, free during preview
  • Llama 3.1 70B Instruct: 60 requests/minute, free during preview
  • Llama 3.1 8B Instruct: 60 requests/minute, free during preview

Providers with trial credits

Fireworks

Credits: $1

Models: Various open models
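Trial credits are dollar amounts, so how far they go depends on per-token pricing, which this list does not give. The sketch below estimates how many identical requests a one-off credit covers; the prices in the example are hypothetical placeholders, not any provider's actual rates, and `requests_affordable` is a hypothetical helper. `Decimal` is used so that cent-level prices divide exactly.

```python
from decimal import Decimal


def requests_affordable(credit_usd, input_tokens, output_tokens,
                        usd_per_m_input, usd_per_m_output):
    """Number of identical requests a one-off credit covers at flat
    per-million-token prices. Pass amounts as strings to keep Decimal exact."""
    cost_per_request = (input_tokens * Decimal(usd_per_m_input)
                        + output_tokens * Decimal(usd_per_m_output)) / Decimal(1_000_000)
    return int(Decimal(credit_usd) // cost_per_request)


# Hypothetical prices: $0.25 per million input tokens, $0.50 per million output tokens.
example = requests_affordable("1", 1_000, 500, "0.25", "0.50")
```

Under those placeholder prices, a $1 credit covers 2,000 requests of 1,000 input and 500 output tokens each; substitute the provider's published prices for a real estimate.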

Baseten

Credits: $30

Models: Any supported model - pay by compute time

Nebius

Credits: $1

Models: Various open models

Novita

Credits: $0.50 for 1 year

Models: Various open models

AI21

Credits: $10 for 3 months

Models: Jamba family of models

Upstage

Credits: $10 for 3 months

Models: Solar Pro/Mini

NLP Cloud

Credits: $15

Requirements: Phone number verification

Models: Various open models

Alibaba Cloud (International) Model Studio

Credits: 1 million tokens/model

Models: Various open and proprietary Qwen models

Credits: $5/month upon sign up, $30/month with payment method added

Models: Any supported model - pay by compute time

Inference.net

Credits: $1, or $25 upon responding to an email survey

Models: Various open models

Hyperbolic

Credits: $1

Models:

  • DeepSeek V3
  • DeepSeek V3 0324
  • Llama 3 70B Instruct
  • Llama 3.1 405B Base
  • Llama 3.1 405B Instruct
  • Llama 3.1 70B Instruct
  • Llama 3.1 8B Instruct
  • Llama 3.2 3B Instruct
  • Llama 3.3 70B Instruct
  • Pixtral 12B (2409)
  • Qwen QwQ 32B
  • Qwen2.5 72B Instruct
  • Qwen2.5 Coder 32B Instruct
  • Qwen2.5 VL 72B Instruct
  • Qwen2.5 VL 7B Instruct
  • deepseek-ai/deepseek-r1-0528
  • openai/gpt-oss-120b
  • openai/gpt-oss-120b-turbo
  • openai/gpt-oss-20b
  • qwen/qwen3-235b-a22b
  • qwen/qwen3-235b-a22b-instruct-2507
  • qwen/qwen3-coder-480b-a35b-instruct
  • qwen/qwen3-next-80b-a3b-instruct
  • qwen/qwen3-next-80b-a3b-thinking

SambaNova Cloud

Credits: $5 for 3 months

Models:

  • E5-Mistral-7B-Instruct
  • Llama 3.1 8B
  • Llama 3.3 70B
  • Llama-4-Maverick-17B-128E-Instruct
  • Qwen/Qwen3-235B
  • Qwen/Qwen3-32B
  • Whisper-Large-v3
  • deepseek-ai/DeepSeek-R1-0528
  • deepseek-ai/DeepSeek-R1-Distill-Llama-70B
  • deepseek-ai/DeepSeek-V3-0324
  • deepseek-ai/DeepSeek-V3.1
  • deepseek-ai/DeepSeek-V3.1-Terminus
  • openai/gpt-oss-120b
  • tbd

Scaleway Generative APIs

Credits: 1,000,000 free tokens

Models:

  • BGE-Multilingual-Gemma2
  • DeepSeek R1 Distill Llama 70B
  • Gemma 3 27B Instruct
  • Llama 3.1 8B Instruct
  • Llama 3.3 70B Instruct
  • Mistral Nemo 2407
  • Pixtral 12B (2409)
  • Whisper Large v3
  • gpt-oss-120b
  • holo2-30b-a3b
  • mistral-small-3.2-24b-instruct-2506
  • qwen3-235b-a22b-instruct-2507
  • qwen3-coder-30b-a3b-instruct
  • qwen3-embedding-8b
  • voxtral-small-24b-2507