Pure-Go ODBC bindings for the IBM DB2 CLI shared library using purego. Registers all required SQL functions (SQLConnect, SQLExecDirect, SQLFetch, SQLGetData, etc.) at runtime without CGo. Supports both macOS (dylib) and Linux (so) library paths. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Defines ChangeEvent, OpType, CSN (commit sequence number), and Version. CSN is a variable-length big-endian byte sequence (10 bytes on DB2 ≤11.x, 16 bytes on DB2 12.1+). Provides SQL hex literal encoding, comparison, and null-CSN sentinel for watermark tracking. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
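A minimal sketch of how such a CSN type might look, for illustration only. The type name `CSN` comes from the commit message, but the method names (`IsNull`, `Compare`, `SQLHexLiteral`) and the choice to left-pad the shorter value when lengths differ are assumptions of this sketch, not the PR's actual API.

```go
package main

import (
	"bytes"
	"encoding/hex"
	"strings"
)

// CSN is a hypothetical stand-in for the connector's commit sequence number:
// a big-endian byte sequence (10 bytes on DB2 <=11.x, 16 bytes on DB2 12.1+).
type CSN []byte

// IsNull reports whether the CSN is empty or all zeros (the null sentinel
// assumed by this sketch).
func (c CSN) IsNull() bool {
	for _, b := range c {
		if b != 0 {
			return false
		}
	}
	return true
}

// Compare returns -1, 0 or 1. The shorter value is left-padded with zeros so
// that CSNs of different lengths compare by magnitude (an assumption here).
func (c CSN) Compare(other CSN) int {
	n := len(c)
	if len(other) > n {
		n = len(other)
	}
	return bytes.Compare(leftPad(c, n), leftPad(other, n))
}

// leftPad copies b into the low-order end of an n-byte zeroed slice.
func leftPad(b []byte, n int) []byte {
	out := make([]byte, n)
	copy(out[n-len(b):], b)
	return out
}

// SQLHexLiteral renders the CSN as a DB2 hexadecimal string constant,
// suitable for a predicate such as IBMSNAP_COMMITSEQ > X'...'.
func (c CSN) SQLHexLiteral() string {
	return "X'" + strings.ToUpper(hex.EncodeToString(c)) + "'"
}
```

Because the encoding is big-endian, a plain byte-wise comparison already gives numeric order for equal-length values; the padding only matters if a stored watermark and a freshly read CSN differ in length.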
Reads all rows from configured tables inside a single read-only DB2 RR (Repeatable Read) transaction, ensuring cross-table consistency. Supports configurable batch size, parallelism, isolation level, and table filter functions. Captures the starting CSN from ASNCDC.IBMSNAP_REGISTER before the first row so streaming can resume from the exact snapshot point. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Commits Review LGTM
I see the integration tests are currently being skipped with the following reason:
I haven't looked into it yet, so I'm not sure whether it's possible, but it'd be good to get these running in the CI pipeline.
Commits
Review Large, well-structured enterprise connector with thorough internal validation (identifier validation against SQL injection, sanitized error logging, mutex/atomic discipline, idempotent close path), unit + integration test coverage, and proper registration ( LGTM
…ering Polls ASNCDC change tables for rows beyond the current watermark CSN. Merges events from all registered tables, sorts by (CSN, IBMSNAP_INTENTSEQ) for total order, and pairs D+I opcode pairs into UPDATE events. Auto-detects IBMSNAP_COMMITSEQ byte length (10 for DB2 ≤11.x, 16 for DB2 12.1+). Supports configurable poll batch size, backoff interval, and table filter. Column metadata is cached per change table and invalidated on schema changes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
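The merge, sort and pairing step described in this commit could be sketched roughly as follows. The `changeRow` and `event` types and the `mergeAndPair` function are hypothetical names invented for this illustration, and the pairing rule (a delete immediately followed by an insert on the same table within the same commit) is an assumption about how the D+I pairs are recognized.

```go
package main

import (
	"bytes"
	"sort"
)

// changeRow is a hypothetical, simplified row read from an ASNCDC change table.
type changeRow struct {
	Table     string
	CommitSeq []byte // IBMSNAP_COMMITSEQ
	IntentSeq []byte // IBMSNAP_INTENTSEQ
	Op        string // "I" or "D" in this sketch
	Data      map[string]any
}

// event is a hypothetical merged output event.
type event struct {
	Op     string // "c", "u" or "d"
	Table  string
	Before map[string]any
	After  map[string]any
}

// mergeAndPair sorts rows from all tables by (CommitSeq, IntentSeq) and folds
// an adjacent delete+insert on the same table and commit into one update.
func mergeAndPair(rows []changeRow) []event {
	sort.SliceStable(rows, func(i, j int) bool {
		if c := bytes.Compare(rows[i].CommitSeq, rows[j].CommitSeq); c != 0 {
			return c < 0
		}
		return bytes.Compare(rows[i].IntentSeq, rows[j].IntentSeq) < 0
	})

	var out []event
	for i := 0; i < len(rows); i++ {
		r := rows[i]
		if r.Op == "D" && i+1 < len(rows) {
			next := rows[i+1]
			if next.Op == "I" && next.Table == r.Table &&
				bytes.Equal(next.CommitSeq, r.CommitSeq) {
				out = append(out, event{Op: "u", Table: r.Table, Before: r.Data, After: next.Data})
				i++ // the insert half of the pair has been consumed
				continue
			}
		}
		switch r.Op {
		case "I":
			out = append(out, event{Op: "c", Table: r.Table, After: r.Data})
		case "D":
			out = append(out, event{Op: "d", Table: r.Table, Before: r.Data})
		}
		// Other opcodes are out of scope for this sketch and are skipped.
	}
	return out
}
```

Sorting on the composite key before pairing is what makes the adjacency check meaningful: within one commit, the intent sequence places the delete half directly before its insert half.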
Implements the Debezium/Netflix DBLog open-window algorithm for exactly-once incremental snapshotting alongside a live CDC stream. Each chunk is bracketed by (lowCSN, highCSN] watermarks; any CDC event arriving within the window evicts the corresponding snapshot row, guaranteeing each row is delivered exactly once as op="r" or superseded by its CDC event. Includes chunk-level checkpoint persistence so a connector restart resumes from the last fully-emitted primary key. Supports NULL PK fast-fail, per-chunk progress logging, and a Logger interface for observability. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
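As a rough illustration of the open-window idea, the sketch below buffers one chunk, then evicts any buffered row whose primary key is touched by a CDC event with a CSN in (lowCSN, highCSN]. The `snapshotWindow` type and its methods are hypothetical names for this example, primary keys are simplified to strings, and CSNs are assumed to have equal length so that byte-wise comparison is numeric.

```go
package main

import "bytes"

// snapshotWindow is a hypothetical buffer for one incremental-snapshot chunk.
type snapshotWindow struct {
	low, high []byte                    // watermarks bracketing the chunk SELECT
	rows      map[string]map[string]any // buffered chunk rows keyed by primary key
	order     []string                  // primary keys in SELECT order
}

// newSnapshotWindow opens a window at the low watermark.
func newSnapshotWindow(low []byte) *snapshotWindow {
	return &snapshotWindow{low: low, rows: map[string]map[string]any{}}
}

// addRow buffers one row returned by the chunk SELECT.
func (w *snapshotWindow) addRow(pk string, row map[string]any) {
	w.rows[pk] = row
	w.order = append(w.order, pk)
}

// closeAt records the high watermark captured after the chunk SELECT.
func (w *snapshotWindow) closeAt(high []byte) { w.high = high }

// observe is called for each CDC event after closeAt. It evicts the buffered
// row and returns true when the event's CSN lies in (low, high] and the key
// is present in the window.
func (w *snapshotWindow) observe(pk string, csn []byte) bool {
	if bytes.Compare(csn, w.low) <= 0 || bytes.Compare(csn, w.high) > 0 {
		return false
	}
	if _, ok := w.rows[pk]; !ok {
		return false
	}
	delete(w.rows, pk)
	return true
}

// drain returns the surviving rows in SELECT order; these are the rows that
// would be emitted as snapshot reads (op="r").
func (w *snapshotWindow) drain() []map[string]any {
	var out []map[string]any
	for _, pk := range w.order {
		if row, ok := w.rows[pk]; ok {
			out = append(out, row)
		}
	}
	return out
}
```

The half-open interval matters: an event exactly at the low watermark was already visible to the chunk SELECT, so it must not evict the row, while an event at the high watermark may not have been.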
Implements driver.Driver, driver.Conn, driver.Stmt, and driver.Rows on top of the purego db2cli bindings, making the IBM DB2 CLI accessible via the standard database/sql interface. Includes DSN credential sanitization in error messages to prevent password leaks in logs. Supports context-aware Prepare/Exec/Query paths. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
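The DSN sanitization mentioned here might look something like the sketch below. The function name `sanitizeDSN` and the `***` placeholder are assumptions for illustration; the PR's actual helper may differ. The pattern assumes the semicolon-separated `KEY=value` DSN format used elsewhere in this PR.

```go
package main

import "regexp"

// pwdPattern matches a PWD=... keyword in a semicolon-separated DB2 CLI
// connection string, case-insensitively, up to the next semicolon.
var pwdPattern = regexp.MustCompile(`(?i)\b(PWD)\s*=\s*[^;]*`)

// sanitizeDSN is a hypothetical helper that masks the password value in any
// string that may embed a DSN, such as an error message.
func sanitizeDSN(s string) string {
	return pwdPattern.ReplaceAllString(s, "${1}=***")
}
```

A password that itself contains a semicolon would only be partly masked by this pattern, so a real implementation would need to handle whatever quoting rules the driver accepts.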
Commits Review The connector logic has clearly received careful attention to concurrency/lifecycle (shutdown signaller reset on reconnect, single Two issues raised inline:
Neither blocks merge; both are doc/comment fixes.
Full-featured IBM DB2 CDC connector built on SQL Replication (ASNCDC). Supports initial full-table snapshot, live CDC streaming, and incremental on-demand snapshots via a signal table (compatible with Debezium's execute-snapshot protocol). Key capabilities: - Snapshot modes: initial, always, never, no_data - Configurable snapshot isolation (DB2 RR or CS), batch size, parallelism - Per-chunk checkpoint persistence for crash-safe incremental snapshot resume - Signal table: execute-snapshot, stop-snapshot, pause/resume-snapshot - Table regex include/exclude filters - Debezium-envelope output (op c/u/d/r, before/after, source metadata) - Schema-change events with column-metadata cache invalidation - Heartbeat events for consumer lag detection - checkpoint_cache support (Redpanda Connect cache resource or DB2 table) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Shared test helpers (db2test package) manage the DB2 Docker container lifecycle for integration tests, including ASNCDC pre-warm and per-test table cleanup. TestMain in the integration suite shares a single container across all tests to avoid 8-minute Docker init overhead. The macOS runner script builds a native test binary that uses the local IBM clidriver dylib instead of a Linux container, enabling fast local iteration on Apple Silicon. Integration tests cover: snapshot+streaming, multi-chunk incremental snapshot, concurrent INSERT/UPDATE/DELETE during snapshot, and resume-after-crash checkpoint recovery. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds db2_cdc to the enterprise component bundle and registers it in the component info CSV (enterprise, not cloud-safe). Wires into the integration test packages list. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Generated component documentation covering all configuration fields, snapshot modes, signal table protocol, checkpoint cache options, and example configurations. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
{
    name: "empty tables rejected",
    configYAML: `
dsn: "DATABASE=SAMPLE;HOSTNAME=db2host;PORT=50000;PROTOCOL=TCPIP;UID=db2inst1;PWD=secret"
schema: "DB2ADMIN"
tables: []
`,
    errContains: "either tables or table_include_regex must be specified",
},
These test cases (and "neither tables nor include_regex rejected" at lines 355-362) expect newDB2CDCInput to return an error containing "either tables or table_include_regex must be specified", but newDB2CDCInput in input_db2_cdc.go does not perform that validation: an empty tables list (with no table_include_regex) passes through silently. Both test cases will fail with "expected an error, got nil".
This also contradicts the tables field description which explicitly says: "When empty, all CDC-registered tables in the schema are discovered dynamically from ASNCDC.IBMSNAP_REGISTER" — so empty tables is an intentional auto-discovery feature.
Either remove these two test cases (matching the documented behavior) or add the missing validation in newDB2CDCInput and update the field description accordingly.
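If the second option were chosen, the missing check could be as small as the sketch below. The `tableSelection` struct and `validateTableSelection` function are hypothetical; only the error text is taken from the test cases. Note that adding this check would disable the documented auto-discovery behaviour for an empty tables list.

```go
package main

import "errors"

// tableSelection is a hypothetical subset of the parsed connector config.
type tableSelection struct {
	Tables            []string
	TableIncludeRegex string
}

// errNoTables carries the message the test cases expect.
var errNoTables = errors.New("either tables or table_include_regex must be specified")

// validateTableSelection rejects a config that names no tables and supplies
// no include regex.
func validateTableSelection(s tableSelection) error {
	if len(s.Tables) == 0 && s.TableIncludeRegex == "" {
		return errNoTables
	}
	return nil
}
```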
Summary
A new enterprise db2_cdc input connector for IBM DB2 LUW. Uses CGO-free DB2 CLI bindings via github.com/ebitengine/purego: the IBM DB2 shared library (libdb2.so.1 / libdb2.dylib / db2cli.dll) is loaded at runtime via dlopen/LoadLibrary, so no C toolchain or IBM SDK is needed at build time.
What it does
… op, before, after, source, ts_ms) for drop-in compatibility with existing Debezium DB2 connector consumers
Architecture (layered, 6 commits)
Output format
Each message is a Debezium-compatible JSON envelope:
{
  "op": "u",
  "before": {"EMP_ID": 7, "SALARY": 90000},
  "after": {"EMP_ID": 7, "SALARY": 95000},
  "source": {
    "connector": "db2",
    "name": "db2_cdc",
    "schema": "DB2INST1",
    "table": "EMPLOYEES",
    "snapshot": "false",
    "commit_lsn": "CSN:000000000000C350",
    "ts_ms": 1747526400000
  },
  "ts_ms": 1747526400000
}
op values: c (insert), u (update with before populated), d (delete), r (snapshot read), heartbeat, schema_change.
Message metadata keys: db2_op, db2_table, db2_schema, db2_csn, db2_intent_seq, db2_snapshot, db2_connector.
Key features
Snapshot modes
- initial
- never
- when_needed
- initial_only
- no_data
Incremental snapshot (Debezium parity)
Implements the Netflix DBLog / Debezium open-window deduplication algorithm for DB2:
- lowCSN = MAX(SYNCHPOINT) before chunk SELECT
- SELECT * FROM table WHERE pk > lastKey ORDER BY pk FETCH FIRST N ROWS
- highCSN = MAX(SYNCHPOINT) after chunk SELECT
- CDC events in (lowCSN, highCSN] evict their rows from the window
- Remaining rows are emitted as op=r; checkpoint saves LastEmittedPK after each chunk
Correctness: composite (CSN, IntentSeq) watermark
Unlike Debezium DB2, polling uses (IBMSNAP_COMMITSEQ, IBMSNAP_INTENTSEQ) as a composite watermark. This prevents event loss when a single transaction writes more rows than poll_batch_size, a gap Debezium does not close.
Signal table
Supports Debezium-compatible execute-snapshot signals plus extended lifecycle controls:
- execute-snapshot
- stop-snapshot
- pause-snapshot
- resume-snapshot
Multi-schema streaming
One connector polls change tables across all schemas in a single query; changeTables keyed as SCHEMA.TABLE.
Observability
- db2_cdc_lag_ms gauge: milliseconds behind the latest DB2 commit timestamp
- incremental snapshot chunk: table=… chunkRows=… evicted=… csnLow=… csnHigh=…
- schema:table:csn:intentSeq
Debezium gap matrix
- (CSN, IntentSeq) pagination
- schemas: [...])
- initial_only mode
- before populated
Files
- internal/impl/db2/db2cli/
- internal/impl/db2/driver.go
- internal/impl/db2/replication/types.go
- internal/impl/db2/replication/stream.go
- internal/impl/db2/replication/snapshot.go
- internal/impl/db2/replication/incremental.go
- internal/impl/db2/input_db2_cdc.go
- internal/impl/db2/db2test/db2test.go
- internal/impl/db2/integration_test.go
- scripts/run-db2-macos-integration-tests.sh
- public/components/db2/package.go
- docs/modules/components/pages/inputs/db2_cdc.adoc
Test plan
Unit tests (always run, no Docker required):
Integration tests — macOS (IBM clidriver, ~8 min first run, ~30s subsequent):
Integration tests — Linux:
Requires Docker and icr.io/db2_community/db2:latest (IBM license accepted via LICENSE=accept). Container runs with --privileged for DB2's IPC/shared-memory subsystem.
Integration test coverage:
- TestIntegrationDB2CDCSnapshotAndStreaming: full CDC round-trip (insert/update/delete)
- TestIntegrationDB2CDCMultiSchema: two schemas in one connector instance
- TestIntegrationDB2CDCInitialSnapshot: keyset-paginated full snapshot
- TestIntegrationDB2CDCIncrementalSnapshotMultiChunk: 20 rows, chunk=4, verifies no duplicates
- TestIntegrationDB2CDCIncrementalSnapshotWithConcurrentInsert: new rows outside window pass through
- TestIntegrationDB2CDCIncrementalSnapshotWithConcurrentUpdate: CDC dedup evicts in-window updates
- TestIntegrationDB2CDCIncrementalSnapshotWithConcurrentDelete: CDC dedup evicts in-window deletes
- TestIntegrationDB2CDCIncrementalSnapshotResumeAfterRestart: crash at chunk 2, resume, no duplicates