Skip to content

protocol: regenerate root onto LSP 3.18 with union-aware jsonrpc2 transport#59

Open
zchee wants to merge 39 commits into
mainfrom
3.18
Open

protocol: regenerate root onto LSP 3.18 with union-aware jsonrpc2 transport#59
zchee wants to merge 39 commits into
mainfrom
3.18

Conversation

@zchee

@zchee zchee commented Jun 10, 2026

Copy link
Copy Markdown
Member

Summary

Converge the repository root onto a single, spec-faithful, generated LSP 3.18 type set, replacing the hand-maintained ~3.15-era package that had drifted from the spec and carried bare interface{} in domain types. The public go.lsp.dev/protocol import path stays stable while the shapes now track the meta-model.

This is 19 commits; they cluster into four areas.

1. Generated LSP 3.18 root (headline — 7eac687)

  • Hand-written types replaced by the genlsp-generated 3.18 type set.
  • Union (or) types are sealed Go interfaces, decoded through a union-aware jsonrpc2.Codec installed via WithCodec.
  • Transport rewired onto the go.lsp.dev/jsonrpc2 fastest branch (pinned via a local replace while that branch settles before its main merge).
  • Server (75) and Client (21) dispatch interfaces hand-ported from the generated method registry; $/cancelRequest handled by CancelHandler. CancelParams.ID and ProgressToken are now Integer|String unions.
  • Dependencies trimmed to library essentials: log/slog replaces go.uber.org/zap; segmentio/encoding/json and go.lsp.dev/pkg/xcontext removed; go.lsp.dev/uri folded into a parity port (File/New/Parse/From, Filename; DocumentURI aliases URI).

2. The generator (internal/genlsp/*)

The metaModel.json → Go type generator that produces the root package: sealed-interface unions, go-json-experiment codecs, jsontext.Value for LSPAny. make generate is idempotent (byte-identical re-emit).

3. JSON hot-path optimization + correctness fixes

  • Zero-alloc union dispatch (e4181b7): replaced per-predicate map[string]jsontext.Value reparsing with a hand-rolled top-level byte scanner over the resident jsontext.Value, plus a fused objectHasAndKnownGuard so each arm guard scans the object once. Measured on linux/amd64 (Xeon 8481C, GOMAXPROCS=8, count=10, p=0.000): decode geomean −17.9% ns/op, −31.6% allocs/op, −33.8% B/op; best case workspace_symbol −69.5% ns / −88.0% allocs. Encode unchanged.
  • Escaped-key decode fix (00c0212): the scanner compared raw member-key bytes against Go literals without decoding JSON escapes, so e.g. {"\u006bind":"create"} misdispatched and silently dropped fields. Escape-free keys keep the byte-for-byte zero-alloc fast path; backslash-bearing keys decode via jsontext.AppendUnquote.
  • CancelHandler clobber fix (dddd2ed): the fastest-transport CancelHandler forwarded an already-cancelled reply-time context, so every successful response through NewServer/NewClient was clobbered into ErrRequestCancelled. Reply-time substitution dropped; genuine cancellations still surfaced by the pre-dispatch ctx.Err() gate.
  • Optional-enum pointer policy (eeb4cc7): per the omitzero policy, optional enums whose zero value is never a valid member drop their *T (22 fields, e.g. Diagnostic.Severity); enums that declare a zero member (CodeActionKind.Empty, TextDocumentSyncKind.None) keep the pointer.

4. Tooling / CI

  • CircleCI → GitHub Actions (6e4feac): test job with Codecov coverage + gotestsum artifacts; lint job fails on any uncommitted diff.
  • go.mod tool directive replaces the separate tools/ module (b9720cb, feedd86); golangci-lint config ported to the v2 schema (90230c4).

Verification

  • make generate is idempotent (empty git diff on re-run).
  • golangci-lint clean; full suite passes under -race.
  • Production-path integration test exercises NewServer/NewClient end to end.
  • Registry-exhaustiveness check + FuzzUnionDispatchOracle / FuzzRoundTrip (10M+ execs on amd64); the dispatch scanner is gated by a differential oracle retaining the prior map-based predicates.
  • Benchmark spine retained under .bench/ (amd64 baseline/after, benchstat, RESULTS.md, provenance).

Notes

  • The jsonrpc2 dependency is pinned via a local replace to its fastest branch; this should be repointed once that branch merges to main.
  • This is a large, intentional rewrite of the root package; the public import path is unchanged but type shapes track the 3.18 meta-model, so downstream callers using removed hand-written types will need updates.

zchee added 12 commits June 4, 2026 09:42
Consolidate CI on GitHub Actions to drop the external CircleCI
dependency. The new workflow runs a test job that produces coverage
uploaded to Codecov plus gotestsum artifacts, and a lint job that runs
make lint and fails on any uncommitted diff.
Now that go.mod tracks tool dependencies via the tool directive, drop
the tools/bin bootstrap and the per-tool install rules. Targets invoke
binaries with `go tool` (gotestsum, gofumpt, golangci-lint,
goimports-rereviser) and `make tools` installs them via `go install
tool`, removing the separate tools module wiring.
golangci-lint is now provided through the go tool directive at v2, whose
configuration schema differs from v1. Port the linter set and settings
to the version: "2" layout so `go tool golangci-lint run` parses the
config instead of erroring on the legacy keys.
Satisfy gofumpt -extra, which requires the leading ctx argument on its
own line when the call spans multiple lines. Pure formatting; no
behavior change.
Converge the repository root onto a single, spec-faithful, generated
source of truth: the genlsp-generated LSP 3.18 type set plus a transport
layer rewired onto the go.lsp.dev/jsonrpc2 fastest branch. The prior
root was a hand-maintained ~3.15-era package that drifted from the spec
and carried bare interface{} in domain types; regenerating it keeps the
public go.lsp.dev/protocol import stable while the shapes track the
meta-model.

Union ("or") types are sealed Go interfaces decoded through a union-aware
jsonrpc2.Codec installed via WithCodec. The Server (75) and Client (21)
dispatch interfaces are hand-ported from the generated method registry,
with every method routed and $/cancelRequest handled by CancelHandler.
CancelParams.ID and ProgressToken are now unions (Integer|String).

Dependencies are trimmed to the library essentials: log/slog replaces
go.uber.org/zap; segmentio/encoding/json and go.lsp.dev/pkg/xcontext are
removed; go.lsp.dev/uri is folded into a parity port (File/New/Parse/From
and Filename, with DocumentURI an alias of URI). jsonrpc2 is pinned via a
local replace while the fastest branch settles before its main merge.

Generation is idempotent (make generate), golangci-lint is clean, and the
suite passes under -race, including a production-path integration test and
a registry-exhaustiveness check.
The vendored jsonrpc2 fastest CancelHandler cancels each request's
per-request context at reply time as cleanup, then forwards that
already-cancelled context to the reply. protocol.CancelHandler's wrapper
substituted ErrRequestCancelled whenever the reply-time context was
cancelled, so on the fastest transport every successful response was
clobbered into an error -- breaking every request issued through the
public NewServer/NewClient constructors.

Drop the reply-time substitution and detach only; genuine cancellations
are surfaced by the dispatchers' pre-dispatch ctx.Err() gate and by
jsonrpc2.CancelHandler. Re-enable TestIntegrationRoundTrip, the
production-path test that documented and was blocked by this defect, so
CI exercises NewServer/NewClient end to end.

Also guard NewServer/NewClient against a nil logger via the existing nop
logger, and correct the ErrContentModified message (it was a copy of the
cancellation text).
Per the omitzero policy (json-omitempty-omitzero.md), an optional enum
whose zero value is never a valid member needs no pointer: ",omitzero" on
a plain value omits exactly the unset case. Generalize the generator's
optionalStructDropsPointer into optionalDropsPointer, extending the
zero-meaningful check from structs to enums (via enumHasZeroMember) and
named scalar aliases.

22 optional enum/alias fields convert from *T to T (e.g. Diagnostic.
Severity). Enums that declare a zero member keep their pointer so a
present zero stays distinguishable from absent: CodeActionKind defines
Empty == "" and TextDocumentSyncKind defines None == 0, so CodeAction.Kind
and friends remain *T. Raw bool/int/string primitives also keep their
pointers (absent vs explicit 0/false/"").

Regeneration stays idempotent and round-trip is preserved. Adds
TestEnumPointerPolicy locking both halves of the policy.
aliasToken's reference branch bound s only to discard it with
"_ = s"; the comma-ok check needs no value, so use the blank form.

fieldDocText and indent each built a string with "+" inside
WriteString, allocating a throwaway per line; write the parts
directly with successive WriteString calls instead.

Both edits are generator-internal and output-neutral: make generate
regenerates a byte-identical tree (empty git diff), so the emitted
protocol package is unchanged. Verified with go build, the -race
suite, and golangci-lint (0 issues).
@zchee zchee force-pushed the 3.18 branch 3 times, most recently from 50e369c to 18e35c9 Compare June 10, 2026 06:34
zchee added 5 commits June 10, 2026 22:06
Union arm discrimination decoded each candidate object into a
map[string]jsontext.Value per predicate call, so a single arm guard
(objectHasKeys && objectKeysKnown) reparsed the same object twice and a
multi-arm union many times. Replace the map helpers in union_runtime.go
with a hand-rolled zero-alloc top-level byte scanner over the already
resident jsontext.Value, and emit a fused objectHasAndKnownGuard from
the generator so each arm guard scans the object once, not twice.

Backward API compatibility is intentionally not preserved; the runtime
helpers are unexported and the change is internal to dispatch.

Measured on linux/amd64 (Xeon 8481C, taskset -c 0-7, GOMAXPROCS=8,
count=10, p=0.000): decode geomean -17.9% ns/op, -31.6% allocs/op,
-33.8% B/op. Best case workspace_symbol -69.5% ns / -88.0% allocs
(1078->129); didchange -57.3% / -82.7% (387->67). Encode is unchanged.

Correctness is gated by a differential oracle that retains the prior
map-based predicates and asserts the scanner and the fused guards agree
on every well-formed input, plus FuzzUnionDispatchOracle and
FuzzRoundTrip (10M+ execs on amd64). A fuzz-found panic on the malformed
input {" is fixed and pinned by a seed and the malformed-input test.

Adds protocol_bench_test.go and testdata/corpus/ as the measurement
spine; .bench/ records the amd64 baseline and after results.
The zero-alloc union-dispatch scanner compared the raw object-member key
bytes (and the "kind" discriminator value) against Go string literals
without decoding JSON escapes. A valid payload that spells a member name
or discriminator with a \uXXXX escape — e.g. {"\u006bind":"create",...},
where \u006b is 'k' so the key is "kind" — failed every literal compare,
so the arm was misdispatched and fields were silently dropped.

Add keyEquals and decode escapes in unquoteJSONString: an escape-free
key keeps the byte-for-byte fast path (the common case stays
zero-alloc), and a key containing a backslash is decoded via
jsontext.AppendUnquote before comparison, matching what the prior
map-based decode produced. Route all five key-compare sites and the
kind-value path through it.

The differential oracle missed this because its inputs had no escaped
keys; the map oracle (json.Unmarshal) decodes them, so feeding escaped
keys makes the oracle diverge from a raw-compare scanner. Add escaped
key/value inputs to the oracle's static set and the fuzz seeds; verified
by breaking keyEquals and watching the oracle fail on exactly those
inputs. Fast-path allocs/op unchanged; amd64 suite + race + fuzz green.

Found by the final-gate code review of the byte-scanner change.
The FuzzUnionDispatchOracle seed block was labeled "JSON-escaped
key/value seeds" but held plain keys, so the fuzzer did not start from
genuinely escaped inputs. Replace them with real \uXXXX-escaped keys so
the seed corpus matches its intent and exercises the unescape path from
the first iteration.

Non-blocking LOW finding from the final-gate review of 5379c4c.
Stabilize benchmark ordering, add dispatch-focused benchmark cases,
and lock backend semantics that optimization probes must preserve.
Generate token-level codecs for measured hot structs and replace
selected optional pointer fields with presence-preserving values.
Keep third-party JSON backends as semantic probes instead of defaults.

Rejected: Promote Segmentio, Goccy, or Sonic without semantic parity.
Rejected: Keep objectKindCase and SymbolInformationSlice encoder probes after regressions.
zchee added 5 commits June 10, 2026 22:08
The LSP meta-model embeds JSDoc-style annotations inside documentation
prose: @SInCE version stamps, {@link}/{@linkcode} cross-references, and
@deprecated tags that duplicate the structured deprecated field. These
render as non-idiomatic godoc that adds no value to Go consumers.

Sanitize the doc text at emit time in genlsp's fieldDocText, the single
choke point every generated comment flows through:

  - @SInCE lines drop entirely; trailing changelog prose is preserved.
    An inline @SInCE stops at the version digits so a sentence-ending
    period stays attached.
  - {@link Target text} unwraps to its display text, or to the bare
    target when no display text is present; {@linkcode} likewise.
  - @deprecated prose drops in favour of the structured Deprecated:
    directive. Every prose tag was already backed by that field with
    identical text, so drop is equivalent to convert with no loss and
    no duplicate directive.

The structured Since:/Deprecated: lines are intentionally retained; the
request targeted the @-prefixed prose tags. @sample is left untouched as
out of scope. The hand-written errors.go is edited directly since the
generator does not emit it.

Most of the .go churn is regenerated output (+334/-827); the durable
change is in internal/genlsp (emit.go sanitizer + TestSanitizeDoc).
The earlier sweep unwrapped {@link Target text} to its plain display
text. Bracket the target symbol instead so godoc and pkg.go.dev render
a real doc link: {@link CodeLensRequest} becomes [CodeLensRequest] and
{@link Uri.scheme scheme} becomes [Uri.scheme].

The target is the first token inside the braces; trailing display text
is discarded since Go doc links cannot carry alternate link text.
Targets that match an exported identifier in the package resolve to a
clickable link; the rest render as harmless literal bracketed text.

Change is in genlsp's unwrapLink (now bracket-based) plus the .go churn
from regeneration; go vet stays clean (no malformed-link diagnostics).
Anti-slop pass over the branch found no dead code (staticcheck U1000
clean, go vet clean); the genlsp doc-sanitizer added this session
carried three fixable linter nits:

  - inlineSinceRe: use d shorthand (regexpSimplify) and note that it
    diverges from sinceRe on purpose -- stopping at the last version
    digit is what keeps a sentence-terminating period attached.
  - fix "favour" -> "favor" misspelling in the sanitizeDoc comment.
  - generator stale-cleanup test: exec.CommandContext(t.Context())
    instead of exec.Command (noctx).

Remaining golangci findings are pre-existing branch code (hugeParam in
the build-time generator, ptrToRefParam on decoder output params,
cyclop, a generated-coupled rename) and are left untouched: applying
them is churn or ripples into generated output without a behavior or
perf justification.
NewNull and NewNullable had zero references in generated code,
hand-written code, and tests. Nullable values are obtained by
decoding the wire (UnmarshalJSON) or as the absent zero value;
nothing constructs the null or value states programmatically.

Removing them narrows the public API to the decode-oriented use
the package actually exercises. Consumers that later need to build
a Nullable to send can reintroduce a constructor deliberately. The
unexported set/null/value fields and the remaining IsZero/IsNull/
Get/MarshalJSON/UnmarshalJSON surface stay; staticcheck and go vet
remain clean and the generated files are unaffected.
Drop jsontext duplicate-name tracking and UTF-8 validation on the wire
path (LSP peers are trusted: duplicates decode last-wins, invalid UTF-8
mangles to U+FFFD) and convert Optional, Nullable, and DiagnosticTags
to MarshalJSONTo/UnmarshalJSONFrom so the fresh-decoder-per-field cost
disappears. Settle the pre-existing lint debt that surfaced once the
Makefile lint rule pointed at the renamed .golangci.yaml again.

amd64 (n=10): geomean -4.8% ns/op; token-codec encode categories -25%
(completion_list -26.0%, publish_diagnostics -27.3%, workspace_symbol
-25.1%); decode hot categories -5..-8%; allocs byte-identical across
all 39 benchmarks. semantic_tokens and initialize_request regress ~3%
(alignment-class, allocs unchanged); both paths are replaced by the
M2/M3 byte-level milestones, so the wireOptions half is kept.

Rejected: reverting the wireOptions half over those ~3% regressions.
@zchee zchee force-pushed the 3.18 branch 2 times, most recently from e653075 to 5a26dbd Compare June 10, 2026 13:49
zchee added 2 commits June 10, 2026 22:50
Generate byte-level decoders and direct append encoders for the LSP
benchmark spine so hot payloads avoid reflection/jsontext overhead while
retaining generated ownership and raw-value validation contracts.

Benchmark current:
  same host, CPU set, Go version, benchmark names, and count after this
  generated-codec implementation plus default.pgo.

Benchstat excerpt:
  sec/op geomean: 17.85 us -> 7.208 us (-59.63%)
  B/s geomean: 126.3 MiB/s -> 312.8 MiB/s (+147.70%)
  B/op geomean: 4.298 KiB -> 4.875 KiB (+13.42%)
  allocs/op geomean: 14.50 -> 5.726 (-60.51%)

Validation:
  go test -count=1 ./...
  git diff --cached --check

Tradeoff:
  Decode/initialize_request remains +14.26% vs baseline.
  G010 raw LSPAny validation costs +2.20% vs M6/default.pgo,
  but preserves Marshal's invalid JSON error contract.
Emit protocol generator outputs with .gen.go names so generated
sources are easy to identify. Stale cleanup continues to remove old
marker-bearing files safely.

Generated protocol contents are byte-identical across the rename. Only
the generator naming policy and filename-sensitive tests change.
zchee added 7 commits June 11, 2026 01:01
Refactor append encoders to satisfy complexity checks while
preserving canonical JSON output.

Document intentional decode out-params where boxed union fast paths
assign interface slots in place.
Broaden generated MarshalJSONTo and UnmarshalJSONFrom coverage from the
small hot-path set to all eligible generated LSP structures, synthetic
literal and and-composed structs, and generated named slices.

Share embedded JSON-field dominance between the encoder and byte decoder
with one flattenJSONFields helper. This prevents generated paths from
emitting different promoted fields, including CreateFile/DeleteFile/
RenameFile kind members.

Benchmark command:
  go test -run '^$' \
    -bench 'Benchmark(Decode|Encode)$' -benchmem -count=5 .

Environment:
  go1.26.4 darwin/arm64, Apple M3 Max
  base/current HEAD: 8ea4699

Benchstat excerpt:
  Decode/initialize_request  7.705us -> 2.514us  -67.37%
  Decode/initialize_result   4.250us -> 2.130us  -49.88%
  Encode/initialize_request  3.542us -> 2.905us  -17.98%
  Encode/initialize_result   1.912us -> 1.512us  -20.92%
  sec/op geomean             4.389us -> 3.811us  -13.16%
  B/s geomean                513.7Mi -> 591.5Mi  +15.15%
  B/op geomean               4.875Ki -> 4.956Ki  +1.65%
  allocs/op geomean          5.726 -> 6.074      +6.07%

Allocation caveat:
  Encode/initialize_request allocs/op increased 5 -> 18 while
  latency improved 3.542us -> 2.905us. Encode/initialize_result
  allocs/op increased 11 -> 13 while latency improved
  1.912us -> 1.512us.

Verification:
  make generate idempotence: hash_equal=true
  git diff --check: passed
  go test -count=1 -timeout=3m ./...: passed
  gopls check changed files: passed
  go vet ./...: passed
  FuzzByteDecodeOracle -fuzztime=8s: passed, 1087885 execs
  post-fix review: APPROVE, architect CLEAR

Constraint: generator and generated outputs are committed together.
Constraint: benchmarks describe the integrated generated tree.
Rejected: do not split generated output into a separate commit.
Rejected: a split commit would not match the measured state.
Pass renderedField by pointer while preserving value storage in the
flattened JSON-field list, avoiding the gocritic hugeParam warning.
Preallocate generated zero-helper checks from the known embedded fields
and local fields to satisfy prealloc without changing generated output.

Verification:
  make lint: passed, 0 issues
  go test -count=1 ./internal/genlsp .: passed
  make generate: hash_equal=true
  git diff --check: passed
@codecov

codecov Bot commented Jun 10, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 68.42690% with 568 lines in your changes missing coverage. Please review.
✅ Project coverage is 17.6%. Comparing base (3c0d433) to head (908373c).
⚠️ Report is 7 commits behind head on main.

Files with missing lines Patch % Lines
codec_helpers.go 61.9% 172 Missing ⚠️
append_encode.go 75.5% 125 Missing ⚠️
client.go 7.7% 107 Missing ⚠️
decode_runtime.go 85.5% 74 Missing ⚠️
decode_boxed_runtime.go 47.1% 65 Missing ⚠️
context.go 0.0% 8 Missing ⚠️
append_encode_runtime.go 86.3% 6 Missing ⚠️
clone.go 50.0% 6 Missing ⚠️
codec.go 76.1% 5 Missing ⚠️
@@           Coverage Diff            @@
##            main     #59      +/-   ##
========================================
+ Coverage   16.1%   17.6%    +1.4%     
========================================
  Files         17      32      +15     
  Lines       1760   49163   +47403     
========================================
+ Hits         285    8677    +8392     
- Misses      1474   40486   +39012     
+ Partials       1       0       -1     
Flag Coverage Δ
Linux-ARM64 17.6% <68.4%> (?)
Linux-X64 20.2% <73.2%> (?)
Files with missing lines Coverage Δ
decoders.gen.go 10.9% <ø> (ø)
encoders.gen.go 9.7% <ø> (ø)
handler.go 53.1% <ø> (+53.1%) ⬆️
internal/genlsp/cmd/genlsp/main.go 0.0% <ø> (ø)
internal/genlsp/emit.go 90.9% <ø> (ø)
internal/genlsp/emit_bytedec.go 96.9% <ø> (ø)
internal/genlsp/generate.go 72.2% <ø> (ø)
internal/genlsp/load.go 66.6% <ø> (ø)
internal/genlsp/model.go 55.2% <ø> (ø)
internal/genlsp/name.go 91.1% <ø> (ø)
... and 22 more

@github-advanced-security

Copy link
Copy Markdown

You are seeing this message because GitHub Code Scanning has recently been set up for this repository, or this pull request contains the workflow file for the Code Scanning tool.

What Enabling Code Scanning Means:

  • The 'Security' tab will display more code scanning analysis results (e.g., for the default branch).
  • Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results.
  • You will be able to see the analysis results for the pull request's branch on this overview once the scans have completed and the checks have passed.

For more information about GitHub Code Scanning, check out the documentation.

Generated URI and DocumentURI fields now use the shared URI type so
protocol values follow the same parsing and filename behavior as the
standalone URI package. Keep protocol.URI as the local union bridge
where marker methods require a package-local receiver.

Validation:
- make generate
- go test -v -race -count=1 -shuffle=on ./...
Generated protocol URI fields now use go.lsp.dev/uri for shared parsing
and canonicalization behavior. Keep protocol.URI only as the local
sealed-union bridge required for marker methods.

Constraint: go.lsp.dev/uri@18c7d8285663
zchee added 2 commits June 11, 2026 14:34
Emit static JSON member names through one Go literal helper so the
raw-literal output requested for generated encoders and decoders
survives make generate. Use strconv.CanBackquote to keep future
non-backquotable names valid.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants