Skip to content

apstndb/execspansql

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

100 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

execspansql

Yet another gcloud spanner databases execute-sql replacement for better composability.

Features

  • (Almost) Compatible interface with gcloud spanner databases execute-sql
    • Incompatibilities
      • Doesn't support gcloud config
  • Can receive results of some queries which gcloud can't execute
    • Query with query parameters
    • Large result sets over 10MB
  • Embedded jq
  • Configurable gRPC logging (off, metadata, payload with payload caveat)
  • (Experimental) CSV output
  • (Experimental) Check whether the query can be executed as a partition query or not.

This tool is still pre-release quality and none of guarantees.

Usage

You can use released binaries.

Usage: execspansql --sql=STRING --sql-file=STRING --project=STRING --instance=STRING <database> [flags]

Yet another gcloud spanner databases execute-sql replacement

Arguments:
  <database>    ID of the database.

Flags:
  -h, --help                       Show context-sensitive help.
      --sql=STRING                 SQL query text; exclusive with --sql-file.
      --sql-file=STRING            File name contains SQL query; exclusive with
                                   --sql
  -p, --project=STRING             ID of the project ($CLOUDSDK_CORE_PROJECT).
  -i, --instance=STRING            ID of the instance
                                   ($CLOUDSDK_SPANNER_INSTANCE).
      --query-mode="NORMAL"        Query mode.
      --format="json"              Output format.
      --redact-rows                Redact result rows from output
  -c, --compact-output             Compact JSON output (--compact-output of jq)
      --filter=STRING              jq filter
  -r, --raw-output                 (--raw-output of jq)
      --filter-file=STRING         (--from-file of jq)
      --jq-input-mode="eager"      How query rows are passed to jq (json/yaml
                                   only): eager (full ResultSet), lazy (JQValue
                                   root).
      --param=PARAM,...            [name]=[type or literal]; legacy [name]:[...]
                                   also accepted
      --param-file=STRING          YAML or JSON file of query parameters (name
                                   to type/literal string)
      --log-grpc="off"             gRPC logging mode: off, metadata, or payload
                                   (payload may include request and response
                                   payloads in logs)
      --experimental-trace-project=STRING
                                   Export traces to Cloud Trace in the given
                                   project.
      --experimental-trace-stdout
                                   Export spans to stderr as pretty JSON (local
                                   debugging).
      --experimental-trace-otlp    Export spans via OTLP/gRPC to a local
                                   OpenTelemetry collector.
      --experimental-trace-otlp-endpoint="localhost:4317"
                                   OTLP/gRPC endpoint used with
                                   --experimental-trace-otlp.
      --enable-partitioned-dml     Execute DML statement using Partitioned DML
      --timeout=10m                Maximum time to wait for the SQL query to
                                   complete
      --try-partition-query        (Experimental) Check whether the query can be
                                   executed as partition query or not

Timestamp Bound
  --strong                   Perform a strong query.
  --read-timestamp=STRING    Perform a query at the given timestamp.
                             (micro-seconds precision)

Local build requires Go 1.25.

$ go install github.com/apstndb/execspansql@latest

You can use container image in GitHub Container Registry.

$ docker run --rm -t -v "${HOME}/.config/gcloud/application_default_credentials.json:/home/nonroot/.config/gcloud/application_default_credentials.json:ro" \
    ghcr.io/apstndb/execspansql/execspansql:latest -p ${SPANNER_PROJECT} -i ${SPANNER_INSTANCE} ${SPANNER_DATABASE} --sql 'SELECT 1'
# or use specific version
$ docker run --rm -t -v "${HOME}/.config/gcloud/application_default_credentials.json:/home/nonroot/.config/gcloud/application_default_credentials.json:ro" \
    ghcr.io/apstndb/execspansql/execspansql:vX.Y.Z -p ${SPANNER_PROJECT} -i ${SPANNER_INSTANCE} ${SPANNER_DATABASE} --sql 'SELECT 1'

Notable features

There are examples omitting some required options.

Parameter support

Many Cloud Spanner clients don't support parameter. Without modifications, query which have parameters are impossible to execute and query whose parameters' types are STRUCT or ARRAY are impossible to show query plans.

execspansql supports query parameters via repeated --param name=value flags (legacy name:value is also accepted) or a --param-file (YAML or JSON). When both are given, --param overrides entries from the file.

PLAN with complex typed parameters

You can use type syntax to plan a query.

$ execspansql ${DATABASE_ID} --query-mode=PLAN \
              --sql='SELECT * FROM UNNEST(@arr) WITH OFFSET' \
              --param='arr=ARRAY<STRUCT<STRING>>'
$ execspansql ${DATABASE_ID} --query-mode=PLAN \
              --sql='SELECT @str.*' \
              --param='str=STRUCT<FirstName STRING, LastName STRING>'

Parameters from a file

--param-file accepts YAML or JSON (.json extension selects JSON; otherwise YAML). Values are the same type/literal strings used with --param.

# params.yaml
arr: ARRAY<STRING>
names: '[STRUCT<FirstName STRING, LastName STRING>("John", "Doe"), ("Mary", "Sue")]'
$ execspansql ${DATABASE_ID} --query-mode=PROFILE \
              --sql='SELECT * FROM UNNEST(@arr) WITH OFFSET' \
              --param-file=params.yaml

PROFILE with complex typed parameterized values

You can use subset of literal syntax to execute a query.

Note: It only emulates literals and doesn't emulate coercion.

$ execspansql ${DATABASE_ID} --query-mode=PROFILE \
              --sql='SELECT * FROM UNNEST(@arr) WITH OFFSET' \
              --param='arr=[STRUCT<pk INT64, col STRING>(1, "foo"), (42, "foobar")]'
$ execspansql ${DATABASE_ID} --query-mode=PROFILE \
              --sql='SELECT * FROM Singers WHERE STRUCT<FirstName STRING, LastName STRING>(FirstName, LastName) IN UNNEST(@names)' \
              --param='names=[STRUCT<FirstName STRING, LastName STRING>("John", "Doe"), ("Mary", "Sue")]'

Embedded jq

execspansql can process output using embedded wader/gojq (jq-compatible; includes JQValue for lazy inputs) using --filter flag.

--jq-input-mode controls how results are passed to jq (json/yaml only):

Mode Input Typical filter
eager (default) Full ResultSet object ., .stats.queryPlan
lazy JQValue root (metadata / rows Iter / stats) .rows[], .stats.queryPlan

In lazy mode, metadata is populated after the first row is read from Spanner (or after a zero-row result). Prefer .rows[] to stream rows. Bare .rows is a lazy iterator: reuse it in one object literal (for example {a: .rows, b: .rows}) may not duplicate rows because jq can evaluate the subexpression once; use {a: [.rows[]], b: [.rows[]]} when you need two row arrays. After .stats drains the iterator, captured .rows values replay from materialized rows.

--jq-input-mode=lazy emits rows incrementally, but rows are cached internally after first materialization and reused, so it is not a strict constant-memory mode for large result sets.

Output expands top-level gojq.Iter to one JSON/YAML document per row (JSONL-style). Nested Iter values inside objects are expanded to arrays on encode.

Example: Extract QueryPlan

[rendertree] command takes QueryPlan, and it can be extracted by jq filter.

$ execspansql ${DATABASE_ID} --query-mode=PROFILE --format=json \
              --sql='SELECT * FROM Singers@{FORCE_INDEX=SingersByFirstLastName}' \
              --filter='.stats.queryPlan' \
  | rendertree --mode=PROFILE 
+-----+----------------------------------------------------------------------------+------+-------+------------+
| ID  | Operator                                                                   | Rows | Exec. | Latency    |
+-----+----------------------------------------------------------------------------+------+-------+------------+
|   0 | Distributed Union                                                          |    5 |     1 | 0.47 msecs |
|  *1 | +- Distributed Cross Apply                                                 |    5 |     1 | 0.44 msecs |
|   2 |    +- Create Batch                                                         |      |       |            |
|   3 |    |  +- Local Distributed Union                                           |    5 |     1 | 0.21 msecs |
|   4 |    |     +- Compute Struct                                                 |    5 |     1 | 0.19 msecs |
|   5 |    |        +- Index Scan (Full scan: true, Index: SingersByFirstLastName) |    5 |     1 | 0.18 msecs |
|  13 |    +- [Map] Serialize Result                                               |    5 |     1 | 0.13 msecs |
|  14 |       +- Cross Apply                                                       |    5 |     1 | 0.12 msecs |
|  15 |          +- Batch Scan (Batch: $v2)                                        |    5 |     1 | 0.01 msecs |
|  19 |          +- [Map] Local Distributed Union                                  |    5 |     5 |  0.1 msecs |
| *20 |             +- FilterScan                                                  |    5 |     5 | 0.09 msecs |
|  21 |                +- Table Scan (Table: Singers)                              |    5 |     5 | 0.08 msecs |
+-----+----------------------------------------------------------------------------+------+-------+------------+
Predicates(identified by ID):
  1: Split Range: ($SingerId' = $SingerId)
 20: Seek Condition: ($SingerId' = $batched_SingerId)

Example: Complex jq filter

plan.jq render query plan tree in pure jq.

$ execspansql ${DATABASE_ID} --query-mode=PROFILE --format=json \
  --sql='SELECT * FROM Singers@{FORCE_INDEX=SingersByFirstLastName}' \
  --filter-file=examples/plan.jq --raw-output
 *0 Distributed Union
 *1   Distributed Cross Apply
  2     Create Batch
  3       Local Distributed Union
  4         Compute Struct
  5           Index Scan (Full scan: true, Index: SingersByFirstLastName)
 13     [Map] Serialize Result
 14       Cross Apply
 15         Batch Scan (Batch: $v2)
 19         [Map] Local Distributed Union
*20           FilterScan
 21             Table Scan (Table: Singers)
Predicates:
  0: Split Range: true
  1: Split Range: ($SingerId' = $SingerId)
 20: Seek Condition: ($SingerId' = $batched_SingerId)

Examples from document

Querying data with a STRUCT object

$ execspansql ${DATABASE_ID} --query-mode=NORMAL \
    --sql='SELECT SingerId FROM SINGERS
           WHERE (FirstName, LastName) = @singerinfo' \
    --param='singerinfo=STRUCT<FirstName STRING, LastName STRING>("Elena", "Campbell")'

Querying data with a STRUCT object)

$ execspansql ${DATABASE_ID} --query-mode=NORMAL \
    --sql='SELECT SingerId FROM SINGERS
           WHERE STRUCT<FirstName STRING, LastName STRING>(FirstName, LastName)
           IN UNNEST(@names)' \
    --param='names=[STRUCT<FirstName STRING, LastName STRING>("Elena", "Campbell"), ("Hannah", "Harris")]'

Accessing STRUCT field values

$ execspansql ${DATABASE_ID} --query-mode=NORMAL \
    --sql='SELECT SingerId
           FROM Singers
           WHERE FirstName = @name.FirstName' \
    --param='name=STRUCT<FirstName STRING, LastName STRING>("Elena", "Campbell")'
$ execspansql ${DATABASE_ID} --query-mode=NORMAL \
    --sql='SELECT SingerId, @songinfo.SongName
           FROM Singers
           WHERE STRUCT<FirstName STRING, LastName STRING>(FirstName, LastName) IN UNNEST(@songinfo.ArtistNames)' \
     --param='songinfo=STRUCT<SongName STRING, ArtistNames ARRAY<STRUCT<FirstName STRING, LastName STRING>>>("Imagination", [("Elena", "Campbell"), ("Hannah", "Harris")])'

(Experimental) OpenTelemetry tracing

Export Spanner client spans and PROFILE query plans via OpenTelemetry (spannerotel + the Spanner client's native OTel instrumentation).

Plan node spans (spannerotel/plantotrace) appear only with --query-mode=PROFILE (or equivalent stats that include a query plan). NORMAL mode still records Spanner client spans, but not per-plan-node children.

Exactly one trace export flag may be set: --experimental-trace-project, --experimental-trace-stdout, or --experimental-trace-otlp.

Cloud Trace

$ execspansql $DATABASE_ID --sql "SELECT * FROM Singers@{FORCE_INDEX=SingersByFirstLastName}" \
    --query-mode=PROFILE --experimental-trace-project=$PROJECT_ID

Local collector (OTLP/gRPC)

Send spans to a local OpenTelemetry Collector, Jaeger, Grafana Tempo, etc.:

# Example: collector listening on localhost:4317
$ execspansql $DATABASE_ID --query-mode=PROFILE --sql 'SELECT 1' --experimental-trace-otlp

# Custom endpoint
$ execspansql $DATABASE_ID --query-mode=PROFILE --sql 'SELECT 1' \
    --experimental-trace-otlp --experimental-trace-otlp-endpoint=127.0.0.1:4317

stderr JSON (no collector)

Pretty-printed span JSON to stderr:

$ execspansql $DATABASE_ID --query-mode=PROFILE --sql 'SELECT 1' --experimental-trace-stdout

Note: --experimental-trace-stdout writes to stderr, not stdout. Note: --log-grpc=payload can log request and response payloads (including bound parameters and row values) and should only be used in trusted environments.

trace.png

(Experimental) --try-partition-query

Check whether the query can be executed as partition query or not.

By default this checks against the current schema using a strong read. Pass --read-timestamp to check partitionability against a historical schema within the database version retention window.

--try-partition-query is a query-routing check and rejects jq-related options (--filter, --filter-file, --raw-output, --compact-output) and --jq-input-mode=lazy.

$ execspansql ${DATABASE_ID} --sql='SELECT * FROM Singers JOIN Albums USING(SingerId)' --try-partition-query
success

$ execspansql ${DATABASE_ID} --sql='SELECT * FROM Singers JOIN Concerts USING(SingerId)' --try-partition-query
2023/08/31 16:43:33 rpc error: code = InvalidArgument desc = Query is not root partitionable since it does not have a DistributedUnion at the root. Please run EXPLAIN for query plan details.
exit status 1

Limitations

  • --format=experimental_csv does not run the jq pipeline; --filter, --filter-file, --raw-output, --compact-output, and --jq-input-mode=lazy are rejected.
  • --raw-output and --compact-output are supported only when --format=json.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors