# v1.76.3-stable - Performance, Video Generation & CloudZero Integration

> **Warning:** This release has a known issue where startup leads to Out of Memory errors when deploying on Kubernetes. We recommend waiting before upgrading to this version.
## Deploy this version

### Docker

```shell
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:v1.76.3
```

### Pip

```shell
pip install litellm==1.76.3
```
## Key Highlights

- Major Performance Improvements - +400 RPS per instance when using the right combination of worker count and CPU cores
- Video Generation Support - Added Google AI Studio and Vertex AI Veo video generation through LiteLLM passthrough routes
- CloudZero Integration - New cost tracking integration for exporting LiteLLM usage and spend data to CloudZero
## Major Changes

- Performance Optimization: LiteLLM Proxy now achieves +400 RPS when using the correct number of CPU cores - PR #14153, PR #14242. By default, LiteLLM now uses `num_workers = os.cpu_count()` for optimal performance. Override options:
  - Set the environment variable `DEFAULT_NUM_WORKERS_LITELLM_PROXY=1`
  - Or start the LiteLLM Proxy with `litellm --num_workers 1`
- Security Fix: Fixed `memory_usage_in_mem_cache` cache endpoint vulnerability - PR #14229
 
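The override precedence described above can be pictured with a small sketch. This mirrors the release note, not LiteLLM's exact internals; `resolve_num_workers` is a hypothetical helper for illustration:

```python
import os

def resolve_num_workers(cli_value=None):
    """Hypothetical helper mirroring the precedence described above:
    the --num_workers CLI flag wins, then the
    DEFAULT_NUM_WORKERS_LITELLM_PROXY env var, then os.cpu_count()
    (the new default in this release)."""
    if cli_value is not None:                      # litellm --num_workers N
        return int(cli_value)
    env_value = os.environ.get("DEFAULT_NUM_WORKERS_LITELLM_PROXY")
    if env_value is not None:                      # env var override
        return int(env_value)
    return os.cpu_count() or 1                     # default: one worker per core
```

For most deployments the new default is what you want; pinning to a single worker is mainly useful for local development or memory-constrained pods.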
## Performance Improvements

This release includes significant performance optimizations. On our internal benchmarks, a single instance gained +400 RPS when using the right combination of worker count and CPU cores.

- +400 RPS Performance Boost - LiteLLM Proxy now uses the correct number of CPU cores for optimal performance - PR #14153
- Default CPU Workers - Changed the `DEFAULT_NUM_WORKERS_LITELLM_PROXY` default to the number of CPUs - PR #14242
## New Models / Updated Models

### New Model Support
| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features | 
|---|---|---|---|---|---|
| OpenRouter | openrouter/openai/gpt-4.1 | 1M | $2.00 | $8.00 | Chat completions with vision | 
| OpenRouter | openrouter/openai/gpt-4.1-mini | 1M | $0.40 | $1.60 | Efficient chat completions | 
| OpenRouter | openrouter/openai/gpt-4.1-nano | 1M | $0.10 | $0.40 | Ultra-efficient chat | 
| Vertex AI | vertex_ai/openai/gpt-oss-20b-maas | 131K | $0.075 | $0.30 | Reasoning support | 
| Vertex AI | vertex_ai/openai/gpt-oss-120b-maas | 131K | $0.15 | $0.60 | Advanced reasoning | 
| Gemini | gemini/veo-3.0-generate-preview | 1K | - | $0.75/sec | Video generation | 
| Gemini | gemini/veo-3.0-fast-generate-preview | 1K | - | $0.40/sec | Fast video generation | 
| Gemini | gemini/veo-2.0-generate-001 | 1K | - | $0.35/sec | Video generation | 
| Volcengine | doubao-embedding-large | 4K | Free | Free | 2048-dim embeddings | 
| Together AI | together_ai/deepseek-ai/DeepSeek-V3.1 | 128K | $0.60 | $1.70 | Reasoning support | 
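Note that the Veo models above are priced per second of generated video rather than per token. A back-of-the-envelope cost estimate, using the table's prices (illustrative arithmetic only):

```python
# Per-second output prices from the table above ($/sec of generated video).
VEO_PRICE_PER_SEC = {
    "gemini/veo-3.0-generate-preview": 0.75,
    "gemini/veo-3.0-fast-generate-preview": 0.40,
    "gemini/veo-2.0-generate-001": 0.35,
}

def estimated_video_cost(model: str, seconds: float) -> float:
    """Rough cost estimate: per-second price times clip length."""
    return round(VEO_PRICE_PER_SEC[model] * seconds, 4)

# An 8-second Veo 3.0 clip: 8 * $0.75 = $6.00.
```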
### Features

- Google Gemini
- OpenRouter
  - Added GPT-4.1 model family - PR #14101
- Groq
  - Added support for the `reasoning_effort` parameter - PR #14207
- X.AI
  - Fixed XAI cost calculation - PR #14127
- Vertex AI
- VLLM
  - Handle output parsing for Responses API output - PR #14121
- Ollama
  - Added unified 'thinking' param support via `reasoning_content` - PR #14121
- Anthropic
  - Added supported text field to the Anthropic citation response - PR #14126
- OCI Provider
  - Handle assistant messages with both content and tool_calls - PR #14171
- Bedrock
- Databricks
  - Added support for the Anthropic citation API in Databricks - PR #14077
### Bug Fixes

### New Provider Support

- Volcengine
  - Added Volcengine embedding module with handler and transformation logic - PR #14028
## LLM API Endpoints

### Features

- Images API
- Responses API
- Bedrock Passthrough
  - Support `AWS_BEDROCK_RUNTIME_ENDPOINT` on Bedrock passthrough - PR #14156
- Google AI Studio Passthrough
  - Allow using Veo video generation through LiteLLM passthrough routes - PR #14228
- General

### Bugs

- General
## Spend Tracking, Budgets and Rate Limiting

### Features

- Added header support for spend_logs_metadata - PR #14186
- LiteLLM passthrough cost tracking for chat completion - PR #14256
 
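As a sketch of how header-based spend_logs_metadata might be attached to a proxy request: the helper below serializes the metadata into a header. The header name `x-litellm-spend-logs-metadata` and the JSON encoding are assumptions for illustration only; check the LiteLLM docs and PR #14186 for the exact contract.

```python
import json

def spend_logs_metadata_headers(metadata: dict) -> dict:
    """Hypothetical helper: serialize spend_logs_metadata into a request
    header for the LiteLLM Proxy. The header name and JSON encoding here
    are assumptions, not the documented API."""
    return {"x-litellm-spend-logs-metadata": json.dumps(metadata)}
```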
### Bug Fixes

## Management Endpoints / UI

### Features

- UI Improvements
  - Fixed Logs page screen size - PR #14135
  - Added Create Organization tooltip on success - PR #14132
  - Changed "Back to Keys" to "Back to Logs" - PR #14134
  - Added client-side pagination on the All Models table - PR #14136
  - Improved Model Filters UI - PR #14131
  - Removed table filter on the user info page - PR #14169
  - Added team name badge on the User Details page - PR #14003
  - Fixed Logs page parameter passing error - PR #14193
- Authentication & Authorization

### Bugs

- General
  - Validate the store model in DB setting - PR #14269
## Logging / Guardrail Integrations

### Features

- Datadog
  - Ensure `apm_id` is set on DD LLM Observability traces - PR #14272
- Braintrust
  - Fix logging when OTEL is enabled - PR #14122
- OTEL
  - Optional metrics and logs following semantic conventions - PR #14179
- Slack Alerting
  - Added alert type to the Slack alert message for easier handling - PR #14176

### Guardrails

- Added guardrail support to the Anthropic API endpoint - PR #14107
### New Integration

- CloudZero - New cost tracking integration for exporting LiteLLM usage and spend data to CloudZero
## Performance / Loadbalancing / Reliability improvements

### Features

- Performance
- Monitoring
  - Added missing Prometheus metrics - PR #14139
- Timeout
  - Stream Timeout Control - Allow using the `x-litellm-stream-timeout` header for per-request stream timeouts - PR #14147
- Routing
  - Fixed x-litellm-tags not routing with the Responses API - PR #14289
 
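The new stream timeout is set per request by attaching the header from PR #14147 to a call against the proxy. A minimal sketch (the endpoint, API key, and helper name are placeholders; the timeout value is assumed to be in seconds):

```python
def stream_request_headers(api_key: str, stream_timeout_s: int) -> dict:
    """Hypothetical helper: build headers for a streaming request to the
    LiteLLM Proxy with a per-request stream timeout via the
    x-litellm-stream-timeout header (value assumed to be seconds)."""
    return {
        "Authorization": f"Bearer {api_key}",
        "x-litellm-stream-timeout": str(stream_timeout_s),
    }

# Usage with any HTTP client, e.g.:
#   requests.post("http://localhost:4000/v1/chat/completions",
#                 headers=stream_request_headers("sk-1234", 30),
#                 json={"model": "...", "messages": [...], "stream": True},
#                 stream=True)
```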
### Bugs

- Security
  - Fixed `memory_usage_in_mem_cache` cache endpoint vulnerability - PR #14229
## General Proxy Improvements

### Features

- SCIM Support
- Kubernetes
  - Added optional PodDisruptionBudget for the LiteLLM proxy - PR #14093
- Error Handling
  - Add model name to Azure error messages - PR #14294
## New Contributors

- @iabhi4 made their first contribution in PR #14093
- @zainhas made their first contribution in PR #14087
- @LifeDJIK made their first contribution in PR #14146
- @retanoj made their first contribution in PR #14133
- @zhxlp made their first contribution in PR #14193
- @kayoch1n made their first contribution in PR #14191
- @kutsushitaneko made their first contribution in PR #14171
- @mjmendo made their first contribution in PR #14176
- @HarshavardhanK made their first contribution in PR #14213
- @eycjur made their first contribution in PR #14207
- @22mSqRi made their first contribution in PR #14241
- @onlylhf made their first contribution in PR #14028
- @btpemercier made their first contribution in PR #11319
- @tremlin made their first contribution in PR #14287
- @TobiMayr made their first contribution in PR #14262
- @Eitan1112 made their first contribution in PR #14252

