Required for release
Documentation improvement
EPP refactor focused on extension
- Support filter and scorer registration at build time #718
- EPP should allow configurable metrics collection #703
- Refactor Ext-proc server logic to better reflect EPP Layers #733
Production hardening of reference implementation
EPP
- Support setting fallback endpoints in EPP picking #414
- Support endpoint subsetting #415
- EPP HA deployment #692
- Hitless Rollout Investigation #557
LoRA Syncer issues
- A metric in LoRA syncer tracking loaded adapters #600
- lora-syncer tool's error handling needs improvement #584
Algorithm development improvement
- Implement Lightweight Scheduler Simulation Tests for Inference Gateway #709
- Benchmark Test Harness #732
Prefix-aware routing
Queuing/Criticality Enforcement
Stretch
BBR improvements
Conformance testing
Concrete example of a multi-workload InferencePool in use
InferenceModel extensible routing
- Not just LoRA (e.g., RAG, system prompts; leave room for future expansion, such as activation engineering)