Enhance LLM capabilities, multi-engine support, and HF NER pipeline features by baixiac · Pull Request #42 · CogStack/CogStack-ModelServe

baixiac · 2026-06-23T14:18:12Z

feat: add token usage reporting as an option
feat: make sure the prompt and output do not exceed the context window
feat: support LLM tools and json schema for structured output and handle forwarded prefix
feat: add bulk processing support for HF NER models
feat: support output formatting for Ollama generate and chat endpoints
feat: add caching for system prompts and support multi-tool calls
feat: add optional speculative decoding and disable the thinking mode by default
feat: add the option of using the SGLang engine for serving
feat: make generation timeout configurable and support per-request chat template overriding
feat: generate prettier label names if missing
feat: support model packaging with custom quantisation
feat: add cap on the training context and the option to train on the classification head only
feat: support Viterbi decoding with IOBES tagging and label ID and pretty name pairs for HF NER models
feat: add TTFT and TPOT metrics for generative endpoints
feat: handle zero-length tokens and improve label rebalancing
feat: improve the label assignment process
feat: make the LoRA adapter optional for supervised trainings
feat: tune supervised training args based on the available device
feat: add regex support for model parameter freezing
feat: log the total number of tokens used for training
feat: make NER confidence score threshold configurable
fix: fix cross-request KV cache pollution in prefix caching
fix: ignore background labels during metrics computation
fix: fix unknown-concept collection for IOB/IOBES and sanity check with no TPs
fix: fix labelling and improve dynamic batching size calculation
fix: fix entity-level metrics collection
fix: ensure safe tokenizer max model length
perf: avoid CPU oversubscription
perf: enable the instrumentor on generative routes and add curl as a GPU image util
chore: upgrade mlflow to 2.22
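
The TTFT and TPOT metrics listed above are commonly defined as time to first token and time per output token. The snippet below is a minimal sketch of those usual definitions, assuming the server records an arrival timestamp for each generated token. The names `LatencyMetrics` and `compute_latency_metrics` are hypothetical and are not taken from this PR; the actual implementation may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LatencyMetrics:
    ttft: float  # seconds from request start to the first generated token
    tpot: float  # mean seconds per generated token after the first one


def compute_latency_metrics(request_start: float, token_times: List[float]) -> LatencyMetrics:
    """Derive TTFT and TPOT from a request start time and per-token arrival times.

    Hypothetical helper for illustration only; timestamps are in seconds
    and are assumed to be monotonically non-decreasing.
    """
    if not token_times:
        raise ValueError("at least one token timestamp is required")

    # TTFT: delay between sending the request and receiving the first token.
    ttft = token_times[0] - request_start

    # TPOT: average gap between consecutive tokens, excluding the first token.
    if len(token_times) == 1:
        tpot = 0.0
    else:
        tpot = (token_times[-1] - token_times[0]) / (len(token_times) - 1)

    return LatencyMetrics(ttft=ttft, tpot=tpot)
```

For example, `compute_latency_metrics(0.0, [0.5, 0.6, 0.7, 0.8])` gives a TTFT of 0.5 s and a TPOT of roughly 0.1 s.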



baixiac merged commit 326d96f into main Jun 23, 2026
9 checks passed

baixiac deleted the v3 branch June 23, 2026 15:18

baixiac added a commit that referenced this pull request Jun 23, 2026

Merge pull request #42 from CogStack/v3

2d6c476

Enhance LLM capabilities, multi-engine support, and HF NER pipeline features

baixiac added a commit that referenced this pull request Jun 23, 2026

Merge pull request #42 from CogStack/v3

2e2c6b0

Enhance LLM capabilities, multi-engine support, and HF NER pipeline features

baixiac merged 1 commit into main from v3
