baixiac added a commit that referenced this pull request on Jun 23, 2026: Enhance LLM capabilities, multi-engine support, and HF NER pipeline features
feat: add token usage reporting as an option
feat: make sure the prompt and output do not exceed the context window
feat: support LLM tools and JSON schema for structured output and handle forwarded prefix
feat: add bulk processing support for HF NER models
feat: support output formatting for Ollama generate and chat endpoints
feat: add caching for system prompts and support multi-tool calls
feat: add optional speculative decoding and disable the thinking mode by default
feat: add the option of using the SGLang engine for serving
feat: make generation timeout configurable and support per-request chat template overriding
feat: generate prettier label names if missing
feat: support model packaging with custom quantisation
feat: add a cap on the training context and the option to train on the classification head only
feat: support Viterbi decoding with IOBES tagging and label ID and pretty name pairs for HF NER models
feat: add TTFT and TPOT metrics for generative endpoints
feat: handle zero-length tokens and improve label rebalancing
feat: improve the label assignment process
feat: make the LoRA adapter optional for supervised training
feat: tune supervised training args based on the available device
feat: add regex support for model parameter freezing
feat: log the total number of tokens used for training
feat: make NER confidence score threshold configurable
fix: fix cross-request KV cache pollution in prefix caching
fix: ignore background labels during metrics computation
fix: fix unknown-concept collection for IOB/IOBES and sanity check with no TPs
fix: fix labelling and improve dynamic batching size calculation
fix: fix entity-level metrics collection
fix: ensure safe tokenizer max model length
perf: avoid CPU oversubscription
perf: enable the instrumentor on generative routes and add curl as a GPU image util
chore: upgrade mlflow to 2.22