Add ask_vlm method for cloud VLM alert verification by srnangi · Pull Request #442 · groundlight/python-sdk

srnangi · 2026-06-18T00:11:05Z

What

Adds Groundlight.ask_vlm(media, query, model_id) for VLM-based alert verification. Calls POST /v1/vlm-verifications on the Groundlight cloud (AWS Bedrock) and returns a VLMVerificationResult with verdict (YES/NO/UNSURE), confidence, reasoning, and token cost fields.

Pairs with the janzu PR (zuuul#6519). No local inference — VLM runs entirely in the cloud.

How

media accepts 1–8 images (single or list). Accepts numpy BGR arrays, PIL Images, bytes, BytesIO/BufferedReader, or filename strings — encoded via the existing parse_supported_image_types utility.
POSTs multipart/form-data: media parts as image/jpeg files; query and model_id as form fields (not URL params, so the prompt never leaks into access logs).
model_id is a friendly alias (e.g. "gpt-5.4", "claude-sonnet-4.5"). The server maps it to the real Bedrock model ID and returns 400 on an unknown alias. Defaults to the server-configured default.
Exports VLMVerificationResult from the package root.

Usage

gl = Groundlight()

# Single image
result = gl.ask_vlm(frame, query="Is there a fire in this image?")
if result.verdict == "YES":
    emit_alert()

# Full frame + cropped ROI — describe each in the query
result = gl.ask_vlm(
    media=[full_frame, roi_crop],
    query="Image 1 is the full camera frame; image 2 is the cropped region "
          "a detector flagged. Is there really a fire?",
)
print(result.confidence, result.reasoning, result.total_cost_usd)

Changes

src/groundlight/client.py: ask_vlm method + VLMVerificationResult dataclass
src/groundlight/__init__.py: exports VLMVerificationResult
test/unit/test_ask_vlm.py: 5 unit tests (mocked HTTP)

Bug fix included

sanitize_endpoint_url strips the trailing slash from self.endpoint, so the original code produced .../device-apiv1/vlm-verifications. Fixed to /v1/vlm-verifications. Regression test included.

Testing

Since the endpoint isn't deployed yet, tests are mocked — no live server. 5 tests covering what's meaningful in that context:

Result fields correctly unpacked from server response JSON
Numpy image encoded to JPEG and sent as multipart media part
query/model_id sent as form fields, not URL params (prompt must not appear in logs)
More than 8 media items raises ValueError before any network call
URL contains correct path (/device-api/v1/vlm-verifications)

🤖 Generated with Claude Code

Add Groundlight.ask_vlm(images, query, model_id) which verifies one or two images against a natural-language query by calling POST /v1/vlm-queries. Returns a VLMVerificationResult dataclass with verdict (YES/NO/UNSURE), confidence, reasoning, and token cost. - Accepts a single image or [full_frame, roi] for the dual-image strategy, reusing parse_supported_image_types for encoding. - Moves the requests import to module level. - Exports VLMVerificationResult from the package. - Unit tests with mocked HTTP. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

- POST query and model_id as multipart form fields (data=) instead of query-string params, matching the updated endpoint and keeping long prompts out of URLs and access logs. - model_id is now a friendly alias (e.g. "gpt-5.4", "claude-sonnet-4.5") resolved server-side, not a raw Bedrock model ID. - Tests updated to assert form-field transport. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Drop the gpt-5.4 example (OpenAI models on Bedrock are text-only and cannot do image verification); use claude-sonnet-4.5 / nova-pro instead. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Match the generalized endpoint: param images -> media, multipart field 'media', guard raised from 2 to 8. The query should describe each media item (server makes no frame/ROI assumption). Docstring + tests updated. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Endpoint renamed server-side from vlm-queries to vlm-verifications. Update the SDK POST path and test fixtures accordingly. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

sanitize_endpoint_url() strips the trailing slash from self.endpoint, so joining without "/" produced ".../device-apiv1/vlm-verifications" instead of ".../device-api/v1/vlm-verifications". Added test_url_has_correct_path to pin the correct URL shape. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… tests - model_id docstring now lists all current supported aliases with a note that the server is the source of truth (400 on unknown alias) - documents that corrupted bytes are validated server-side -> HTTPError 400 - rewrites test_ask_vlm.py as module-level functions matching repo convention - adds test_timeout_passed_to_requests: verifies timeout kwarg forwarded - adds test_corrupted_image_bytes_raises_http_error: server 400 -> HTTPError Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Drop tests that only verify kwarg passthroughs or mock server-side behavior (timeout forwarding, corrupted-image 400, dual-image loop, model_id omission). Keep the five that catch real issues or verify non-obvious invariants: result parsing, image encoding, form-field vs URL security property, >8 guard, and the URL path bug. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…suppression - Extract MAX_VLM_MEDIA_ITEMS = 8 constant (fixes PLR2004 magic value in client.py) - test_ask_vlm: use optional numpy import + @skipif(MISSING_NUMPY) so test collection doesn't fail in envs without numpy (fixes ModuleNotFoundError) - Replace _FAKE_JPEG bytes for tests that don't exercise encoding (removes numpy dependency from 4 of 5 tests entirely) - Remove magic 400 assertion from result-parsing test (fixes PLR2004 in test) - Add pylint: disable comments on VLMVerificationResult (too-many-instance-attributes) and ask_vlm (too-many-locals) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot

Pull request overview

Adds a new SDK surface for VLM-based alert verification against the Groundlight cloud by introducing Groundlight.ask_vlm(...), returning a structured VLMVerificationResult, and exporting that result type from the package root. The PR also adds mocked unit tests and includes a regression test for correct URL path construction.

Changes:

Add Groundlight.ask_vlm(media, query, model_id) that uploads 1–8 images as multipart/form-data to /v1/vlm-verifications and parses the response into VLMVerificationResult.
Export VLMVerificationResult from groundlight.__init__.
Add unit tests for request construction (multipart fields/files, URL path) and response parsing.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File	Description
`src/groundlight/client.py`	Adds `VLMVerificationResult`, `MAX_VLM_MEDIA_ITEMS`, and the new `ask_vlm` method using `requests` multipart upload.
`src/groundlight/__init__.py`	Re-exports `VLMVerificationResult` from the package root.
`test/unit/test_ask_vlm.py`	Adds mocked unit tests validating request/response handling and URL path behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- Fix type annotation: List now typed as List[Union[...]] to match all supported per-item types, not just List[np.ndarray] - Add empty-list guard: raise ValueError before any network call - Fix URL version doubling: strip trailing /vN from endpoint before appending /v1/vlm-verifications (sanitize_endpoint_url allows /v1 paths) - Use time.time_ns() for request ID to avoid ms-precision collisions - Clarify docstring: bytes/streams accept any common format (JPEG/PNG/WEBP), server normalises to JPEG server-side Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

- Corrects the accepted filename types (JPEG/PNG only; WEBP filenames unsupported by bytestream_from_filename) and clarifies that raw bytes/BytesIO/BufferedReader are sent as-is (server normalises via ensure_jpeg_format regardless of declared content-type). - Adds test_url_no_version_duplication_for_versioned_endpoint to guard against the /v1/v1/ double-version regression. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The CLI auto-registers all public Groundlight methods but raises KeyError if a method is missing from _COMMAND_GROUPS. Add ask_vlm to _COMMAND_GROUPS and _GROUP_ORDER under a new "VLM Verification" panel. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

timmarkhuff · 2026-06-25T16:55:44Z

+MAX_VLM_MEDIA_ITEMS = 8
+
+
+@dataclass


This should be a Pydantic model. That matches the pattern we have established in this SDK.

Consistency aside, I think there will also be a problem with the CLI if we use a dataclass. The CLI has special logic to pretty print objects, and I don't believe it has any handling for dataclasses.

We shouldn't be hand writing a class like this, should we? Most classes like this are autogenerated by the API spec.

Fixed: converted VLMVerificationResult from a dataclass to a pydantic BaseModel, matching the established SDK pattern and enabling proper CLI pretty-printing via model_dump_json.

🤖 Addressed by Claude Code

Acknowledged — the ideal end-state is for VLMVerificationResult to be generated from the OpenAPI spec (alongside VlmVerification in zuuul). The janzu endpoint is not deployed yet, so wiring up spec generation is a follow-on task. For now the hand-written Pydantic BaseModel gives correct CLI behaviour and the same interface, and can be replaced by the generated model once the spec is stable.

🤖 Addressed by Claude Code

timmarkhuff · 2026-06-25T16:58:52Z

+    id: str
+    query: str
+    model_id: str
+    verdict: str  # "YES" | "NO" | "UNSURE"


Should we call this label or result instead of verdict? That would match established patterns better. But maybe this is meant to be intentionally different?

It feels like we should be making use of existing objects, like Result. Right now, this class has verdict and confidence as two separate, seemingly unrelated concepts, but shouldn't they be grouped together as a Result?

Not changed — VLM verification outputs YES/NO/UNSURE which is intentionally different from image-query binary/multiclass labels. Grouping verdict+confidence into a nested Result felt like premature reuse: the existing Result objects are generated from the detector image-query spec and carry detector-specific semantics. Keeping them flat and top-level on VLMVerificationResult avoids that coupling. Happy to revisit if there is a strong preference for nesting, or once the spec is auto-generated.

🤖 Addressed by Claude Code

timmarkhuff · 2026-06-25T17:01:09Z

+    created_at: str
+    input_tokens: Optional[int] = None
+    output_tokens: Optional[int] = None
+    total_cost_usd: Optional[float] = None


I've heard it argued that float is not great for representing currency due to rounding issues. int or Decimal might be better.

Left as float — total_cost_usd is an informational/display field echoed back from the server; the SDK performs no arithmetic on it, so Decimal's precision guarantees are not needed here. If the server ever returns this value in a way that requires client-side summing or billing math, switching to Decimal would be the right call.

🤖 Addressed by Claude Code

timmarkhuff · 2026-06-25T17:03:33Z

+    def ask_vlm(  # pylint: disable=too-many-locals
+        self,
+        media: Union[
+            "np.ndarray",


I don't think we should be double quoting the type annotations here. Elsewhere in this same file we do not.

Fixed: removed the string forward-reference quotes from np.ndarray and Image.Image in the ask_vlm signature — consistent with every other method in client.py that uses the same optional_imports (lines 717, 881, etc.).

🤖 Addressed by Claude Code

timmarkhuff

I think there are some consistency issues with this PR that I have noted in my comments. We'll want Brandon to weigh in on these too.

…pe annotations - VLMVerificationResult is now a pydantic BaseModel (matching SDK patterns and enabling proper CLI pretty-printing via model_dump_json). - Remove double-quoted forward refs ("np.ndarray", "Image.Image") in ask_vlm signature — consistent with every other method in client.py that uses the same Image/np imports. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

srnangi · 2026-06-25T20:15:19Z

I think there are some consistency issues with this PR that I have noted in my comments. We'll want Brandon to weigh in on these too.

Thanks @timmarkhuff for the comments. My claude agent was on auto-reply mode for this PR, so it replied on my behalf without me looking at it, sorry about that. I'll take a look at your comments again carefully.

buildci and others added 7 commits June 24, 2026 00:52

Automatically reformatting code

9a5e3e1

Update ask_vlm model_id docstring examples to vision-capable aliases

2b20fce

Drop the gpt-5.4 example (OpenAI models on Bedrock are text-only and cannot do image verification); use claude-sonnet-4.5 / nova-pro instead. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

ask_vlm: point at renamed /v1/vlm-verifications endpoint

00789e0

Endpoint renamed server-side from vlm-queries to vlm-verifications. Update the SDK POST path and test fixtures accordingly. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

srnangi force-pushed the feature/vlm-verification-sdk branch from b2b0755 to 263808d Compare June 24, 2026 08:41

srnangi marked this pull request as ready for review June 24, 2026 21:30

srnangi commented Jun 24, 2026

View reviewed changes

Comment thread src/groundlight/client.py Outdated

srnangi commented Jun 24, 2026

View reviewed changes

Comment thread test/unit/test_ask_vlm.py Outdated

buildci and others added 3 commits June 24, 2026 14:53

srnangi requested a review from Copilot June 25, 2026 00:10

Copilot started reviewing on behalf of srnangi June 25, 2026 00:10 View session

Copilot AI reviewed Jun 25, 2026

View reviewed changes

Comment thread src/groundlight/client.py

Comment thread src/groundlight/client.py

Comment thread src/groundlight/client.py Outdated

Comment thread src/groundlight/client.py Outdated

Comment thread src/groundlight/client.py Outdated

srnangi requested review from brandon-wada, Copilot and paulina-positronix June 25, 2026 00:27

Copilot started reviewing on behalf of srnangi June 25, 2026 00:27 View session

srnangi requested review from robotrapta and sunildkumar June 25, 2026 00:27

Copilot AI reviewed Jun 25, 2026

View reviewed changes

Comment thread src/groundlight/client.py Outdated

Comment thread test/unit/test_ask_vlm.py

buildci and others added 2 commits June 24, 2026 17:38

timmarkhuff reviewed Jun 25, 2026

View reviewed changes

timmarkhuff requested changes Jun 25, 2026

View reviewed changes

buildci and others added 2 commits June 25, 2026 12:28

Automatically reformatting code

2616477

		MAX_VLM_MEDIA_ITEMS = 8


		@dataclass

Uh oh!

Conversation

srnangi commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

How

Usage

Changes

Bug fix included

Testing

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

timmarkhuff Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

timmarkhuff left a comment

Choose a reason for hiding this comment

Uh oh!

srnangi commented Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

srnangi commented Jun 18, 2026 •

edited

Loading

timmarkhuff Jun 25, 2026 •

edited

Loading