Replace manual JSON parsing with Instructor for structured output reliability#339
Conversation
…iability
- Add llm_structured() and llm_astructured() in utils.py using instructor.from_litellm()
- Add Pydantic models for all 11 structured outputs in page_index.py
- Replace all extract_json() calls with typed Instructor calls
- Remove 'Directly return the final JSON structure' from all prompts
- Fixes silent KeyError crashes when extract_json() returned {} on malformed LLM output
|
@pegahmansourian Thanks for this — moving to structured output (instructor + Pydantic) is a solid direction and it removes a whole class of manual-parsing bugs. A few things to address before we can take it:
Minor/practical: instructor isn't in requirements.txt (install will break), and the branch now conflicts with main — will need a rebase. |
Problem
The codebase uses manual JSON prompt instructions across 11 locations in
page_index.py, with a customextract_json()parser inutils.pythathandles malformed output through multiple fallback strategies.
When the LLM returns slightly malformed JSON,
extract_json()silentlyreturns
{}, causing downstreamKeyErrorcrashes like:Solution
Replace manual JSON parsing with Instructor,
which is patched directly onto the existing LiteLLM client already used in the codebase.
Changes
utils.pysync_instructor_clientandasync_instructor_clientpatched viainstructor.from_litellm()llm_structured()for sync structured callsllm_astructured()for async structured callsextract_json()andget_json_content()can be removed once all callers are updatedpage_index.pyextract_json()calls withllm_structured()/llm_astructured()"Directly return the final JSON structure"from all prompts — Instructor handles this automaticallyBenefits
{}returnsLiteral["yes", "no"]fields are always validinstructoris the only new dependencyNew dependency
This PR adds one new dependency:
instructorAdd to
requirements.txt:instructorOr install directly: