Enhance LLM capabilities, multi-engine support, and HF NER pipeline features #42

Merged
baixiac merged 1 commit into main from v3 on Jun 23, 2026

@baixiac baixiac commented Jun 23, 2026

Member

feat: add token usage reporting as an option
feat: make sure the prompt and output do not exceed the context window
feat: support LLM tools and JSON schema for structured output and handle forwarded prefix
feat: add bulk processing support for HF NER models
feat: support output formatting for Ollama generate and chat endpoints
feat: add caching for system prompts and support multi-tool calls
feat: add optional speculative decoding and disable the thinking mode by default
feat: add the option of using the SGLang engine for serving
feat: make generation timeout configurable and support per-request chat template overriding
feat: generate prettier label names if missing
feat: support model packaging with custom quantisation
feat: add cap on the training context and the option to train on the classification head only
feat: support Viterbi decoding with IOBES tagging and label ID and pretty name pairs for HF NER models
feat: add TTFT and TPOT metrics for generative endpoints
feat: handle zero-length tokens and improve label rebalancing
feat: improve the label assignment process
feat: make the LoRA adapter optional for supervised training
feat: tune supervised training args based on the available device
feat: add regex support for model parameter freezing
feat: log the total number of tokens used for training
feat: make NER confidence score threshold configurable
fix: fix cross-request KV cache pollution in prefix caching
fix: ignore background labels during metrics computation
fix: fix unknown-concept collection for IOB/IOBES and sanity check with no TPs
fix: fix labelling and improve dynamic batching size calculation
fix: fix entity-level metrics collection
fix: ensure safe tokenizer max model length
perf: avoid CPU oversubscription
perf: enable the instrumentor on generative routes and add curl as a GPU image util
chore: upgrade mlflow to 2.22
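
For readers unfamiliar with the TTFT and TPOT metrics mentioned in the list above, here is a minimal sketch of how they are commonly defined. It is not taken from this PR's code: `LatencyMetrics` and `measure_stream` are hypothetical names, and the sketch assumes that one streamed chunk corresponds to one output token. TTFT is taken as the delay between sending the request and receiving the first token; TPOT is the average gap between consecutive tokens after the first.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class LatencyMetrics:
    """Hypothetical container for the two latency metrics (seconds)."""
    ttft_s: Optional[float]  # time to first token; None if nothing was generated
    tpot_s: Optional[float]  # time per output token; None if fewer than 2 tokens
    output_tokens: int


def measure_stream(
    tokens: Iterable[str],
    start: float,
    clock: Callable[[], float] = time.perf_counter,
) -> LatencyMetrics:
    """Consume a token stream and derive TTFT and TPOT from arrival times.

    Assumption: each item yielded by `tokens` is exactly one output token.
    `start` is the clock reading taken when the request was sent.
    """
    first = last = None
    count = 0
    for _ in tokens:
        now = clock()
        if first is None:
            first = now  # arrival time of the first token
        last = now       # arrival time of the most recent token
        count += 1

    ttft = None if first is None else first - start
    # TPOT averages the inter-token gaps, so the first token is excluded.
    tpot = (last - first) / (count - 1) if count > 1 else None
    return LatencyMetrics(ttft_s=ttft, tpot_s=tpot, output_tokens=count)
```

In use, `start` would be captured with `time.perf_counter()` just before the request is issued, and `tokens` would be the streaming response iterator. The injectable `clock` only exists to make the calculation easy to test with fixed timestamps.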

@baixiac baixiac merged commit 326d96f into main Jun 23, 2026
9 checks passed
@baixiac baixiac deleted the v3 branch June 23, 2026 15:18
baixiac added a commit that referenced this pull request Jun 23, 2026
Enhance LLM capabilities, multi-engine support, and HF NER pipeline features
baixiac added a commit that referenced this pull request Jun 23, 2026
Enhance LLM capabilities, multi-engine support, and HF NER pipeline features