10 Engineering Decisions Behind Building a Self-Hosted Crypto Payment Gateway
Every engineering decision in XPay Labs was a tradeoff. Here are the 10 choices that most affected our architecture — why we made them, what we ruled out, and how they impact your deployment. These lessons apply whether you use XPay Labs or build your own payment infrastructure.
Distroless Java Over .NET or Go
Java gets a bad reputation for being heavy, but modern Java (21+) with Project Loom, virtual threads, and the Distroless base image flips the script. Our production image is smaller than most Alpine-based Go binaries because we strip everything — no package manager, no shell, no curl. Just the JVM, our bytecode, and the dependencies. The virtual thread model handles 10,000+ concurrent invoice scans without breaking a sweat, and the JIT compiler optimizes hot paths (like transaction signature verification) to near-native speed after warmup. We lose about 100ms on cold start compared to a native binary, but the gateway runs 24/7, so warmup happens once.
BIP-39 HD Wallets Over Per-Merchant Address Pools
BitPay assigns each merchant a static wallet address. Customers send payments there, and BitPay matches incoming transactions to invoices internally. This creates privacy leaks (everyone sees the same address on-chain) and reconciliation headaches. We chose hierarchical deterministic (HD) wallet derivation: a BIP-39 mnemonic produces the master seed, and each invoice gets a unique address derived from that seed along a BIP-32 path selected by the invoice ID. The gateway scans all derived addresses on-chain. There is no address pool to manage and no per-invoice private key database to leak, and each customer gets a fresh address that no other invoice shares. The derivation is pure computation, so it adds zero key-storage cost per invoice.
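HD derivation itself is specified by BIP-32, with the BIP-39 mnemonic supplying the seed bytes. As a rough sketch of the "pure computation" involved, the Java below derives a master key from a seed and then one hardened child key per invoice index. It assumes invoice IDs map to integer indexes and uses a single derivation level; the class and method names are hypothetical and this is not XPay Labs' actual code. Turning a child key into an on-chain address needs elliptic-curve and chain-specific encoding steps that are omitted here.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Sketch of BIP-32 hardened private-key derivation; not a complete wallet. */
final class HdKeySketch {
    // Order of the secp256k1 curve, used for modular addition of private keys.
    private static final BigInteger N = new BigInteger(
        "FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141", 16);
    private static final int HARDENED = 0x80000000;

    /** Extended private key: 32-byte key plus 32-byte chain code. */
    record ExtKey(byte[] key, byte[] chainCode) {}

    private static byte[] hmacSha512(byte[] key, byte[] data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA512");
            mac.init(new SecretKeySpec(key, "HmacSHA512"));
            return mac.doFinal(data);
        } catch (java.security.GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA512 unavailable", e);
        }
    }

    /** Master key: HMAC-SHA512 keyed with "Bitcoin seed" over the seed bytes. */
    static ExtKey master(byte[] seed) {
        byte[] i = hmacSha512("Bitcoin seed".getBytes(StandardCharsets.US_ASCII), seed);
        return new ExtKey(Arrays.copyOfRange(i, 0, 32), Arrays.copyOfRange(i, 32, 64));
    }

    /** Hardened child: HMAC-SHA512(chainCode, 0x00 || parentKey || index + 2^31). */
    static ExtKey hardenedChild(ExtKey parent, int index) {
        if (index < 0) throw new IllegalArgumentException("index must be in [0, 2^31)");
        byte[] data = ByteBuffer.allocate(37)
            .put((byte) 0).put(parent.key()).putInt(index | HARDENED).array();
        byte[] i = hmacSha512(parent.chainCode(), data);
        // Child key = (left half + parent key) mod n. The rare invalid-key
        // cases described in BIP-32 are ignored in this sketch.
        BigInteger child = new BigInteger(1, Arrays.copyOfRange(i, 0, 32))
            .add(new BigInteger(1, parent.key())).mod(N);
        return new ExtKey(to32Bytes(child), Arrays.copyOfRange(i, 32, 64));
    }

    /** Hypothetical mapping: one hardened child directly under the master per invoice. */
    static ExtKey forInvoice(byte[] seed, int invoiceIndex) {
        return hardenedChild(master(seed), invoiceIndex);
    }

    // Left-pads or trims BigInteger's sign byte so the result is exactly 32 bytes.
    private static byte[] to32Bytes(BigInteger v) {
        byte[] raw = v.toByteArray();
        byte[] out = new byte[32];
        int len = Math.min(raw.length, 32);
        System.arraycopy(raw, raw.length - len, out, 32 - len, len);
        return out;
    }
}
```

The same seed and index always yield the same key, which is why nothing per-invoice has to be stored beyond the index itself.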
HMAC-SHA256 Webhooks Over IPN Callbacks
IPN works, but it is a dated pattern: BitPay sends a POST to your callback URL with a transaction ID, and you have to call back to BitPay to verify it. This adds latency and a dependency on BitPay availability. Our approach: every webhook payload carries an HMAC-SHA256 signature in the X-Signature header. Your server recomputes the HMAC with your secret and compares it with the header value. If it matches, the payload is authentic, and no round-trip verification is needed. We also include a Unix timestamp and reject signatures older than 5 minutes, which limits replay attacks. The webhook body contains the full payment context (amount, chain, tx_id, confirmations), so your server can process it immediately without additional API calls.
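On the receiving side, the check could look roughly like the sketch below. The signed-string layout (timestamp, a dot, then the raw body) and the hex encoding of the signature are assumptions for illustration; the text above only specifies HMAC-SHA256, the X-Signature header, and the 5-minute window. Class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.time.Instant;
import java.util.HexFormat;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class WebhookVerifier {
    private static final long MAX_AGE_SECONDS = 300; // 5-minute replay window

    /** Hex HMAC-SHA256 over "<timestamp>.<body>" (layout is an assumption). */
    static String sign(String secret, long timestamp, String body) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            byte[] digest = mac.doFinal((timestamp + "." + body).getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest);
        } catch (java.security.GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable", e);
        }
    }

    /** Returns true only if the timestamp is fresh and the signature matches. */
    static boolean verify(String secret, long timestamp, String body,
                          String signatureHeader, Instant now) {
        if (signatureHeader == null) return false;
        // Reject stale timestamps, and also ones too far in the future.
        if (Math.abs(now.getEpochSecond() - timestamp) > MAX_AGE_SECONDS) return false;
        byte[] expected = sign(secret, timestamp, body).getBytes(StandardCharsets.US_ASCII);
        byte[] actual = signatureHeader.getBytes(StandardCharsets.US_ASCII);
        // MessageDigest.isEqual compares in constant time, avoiding timing leaks.
        return MessageDigest.isEqual(expected, actual);
    }
}
```

Verify against the raw request bytes as received; re-serializing parsed JSON before hashing can change whitespace or key order and break the match.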
Event-Driven Block Scanning Over Polling
Polling-based architectures query the blockchain every N seconds. If N=5, detection latency averages 2.5s. If N=30, it averages 15s. We went event-driven: for EVM chains, we subscribe to newHeads via WebSocket and filter logs for our tracked addresses. For TRON, we use the gRPC event streaming endpoint. For SUI, we subscribe to transaction effects. New blocks trigger immediate scanning for relevant transactions. The result: sub-second payment detection on all chains with less CPU usage than polling, because the gateway only wakes when there is actually new data. On a Hetzner CX22, scanning TRON + ETH + BNB + SUI simultaneously uses under 1% CPU at idle.
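For the EVM case, a minimal sketch of the subscription using the JDK's built-in WebSocket client might look like this. The eth_subscribe call with "newHeads" is the standard Ethereum JSON-RPC pub/sub method; the class names, the request id, and whatever endpoint URL you pass in are assumptions, and reconnect handling, log filtering, and the TRON and SUI streams are left out.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.WebSocket;
import java.util.concurrent.CompletionStage;
import java.util.function.Consumer;

final class NewHeadsSubscriber {
    /** JSON-RPC request that asks the node to push each new block header. */
    static String subscribeRequest(int id) {
        return "{\"jsonrpc\":\"2.0\",\"id\":" + id
            + ",\"method\":\"eth_subscribe\",\"params\":[\"newHeads\"]}";
    }

    /** Buffers text frames and hands each complete message to the callback. */
    static final class HeadListener implements WebSocket.Listener {
        private final StringBuilder buffer = new StringBuilder();
        private final Consumer<String> onMessage;

        HeadListener(Consumer<String> onMessage) {
            this.onMessage = onMessage;
        }

        @Override
        public void onOpen(WebSocket webSocket) {
            webSocket.sendText(subscribeRequest(1), true);
            webSocket.request(1); // ask for the first incoming message
        }

        @Override
        public CompletionStage<?> onText(WebSocket webSocket, CharSequence data, boolean last) {
            buffer.append(data);
            if (last) {
                // A full message arrived: the scanner wakes only now, not on a timer.
                onMessage.accept(buffer.toString());
                buffer.setLength(0);
            }
            webSocket.request(1); // ask for the next message
            return null;          // data may be reclaimed immediately
        }
    }

    /** Opens the connection; blocks until the WebSocket handshake completes. */
    static WebSocket connect(String wsUrl, Consumer<String> onMessage) {
        return HttpClient.newHttpClient()
            .newWebSocketBuilder()
            .buildAsync(URI.create(wsUrl), new HeadListener(onMessage))
            .join();
    }
}
```

The callback receives raw JSON notifications; a real scanner would parse the block number out of each one and then fetch logs for its tracked addresses.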
Multi-Chain Normalization Layer
TRON uses TRC-20 events, where tokens such as USDT use 6-decimal precision, with a 19-block confirmation threshold. EVM chains use ERC-20 Transfer logs, where precision is set per token (18 decimals is the common default), with a 12-block threshold. SUI uses Move objects with immediate finality. Without normalization, integrations would need chain-specific code. Our abstraction layer maps all three to a unified PaymentEvent: {amount, currency, chain, tx_id, from, to, confirmations, status}. The integration code on your server handles one event structure regardless of which chain the customer used. New chains are added by implementing a ChainScanner interface, typically 200-400 lines of code per chain.
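A sketch of what such a normalization layer could look like follows. PaymentEvent carries the fields listed above and ChainScanner is the interface named in the text, but the method set, the RawTransfer type, the status strings, and the "head minus transaction block plus one" confirmation count are all assumptions made for illustration.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

final class Normalization {
    /** Unified event with the fields listed in the text. */
    record PaymentEvent(BigDecimal amount, String currency, String chain, String txId,
                        String from, String to, long confirmations, String status) {}

    /** Hypothetical chain-agnostic view of a raw token transfer read from a block. */
    record RawTransfer(BigInteger rawUnits, int decimals, String currency, String txId,
                       String from, String to, long blockNumber) {}

    /** One implementation per chain; this method set is an assumption. */
    interface ChainScanner {
        String chain();

        long requiredConfirmations();

        default PaymentEvent normalize(RawTransfer t, long headBlock) {
            // Scale integer base units by the token's decimals (6, 18, ...).
            BigDecimal amount = new BigDecimal(t.rawUnits(), t.decimals());
            // Assumed convention: a transfer in the head block has 1 confirmation.
            long confirmations = Math.max(0, headBlock - t.blockNumber() + 1);
            String status = confirmations >= requiredConfirmations() ? "confirmed" : "pending";
            return new PaymentEvent(amount, t.currency(), chain(), t.txId(),
                                    t.from(), t.to(), confirmations, status);
        }
    }

    /** Convenience factory for a scanner that only differs by name and threshold. */
    static ChainScanner scanner(String chain, long requiredConfirmations) {
        return new ChainScanner() {
            public String chain() { return chain; }
            public long requiredConfirmations() { return requiredConfirmations; }
        };
    }
}
```

With this shape, 1500000 base units at 6 decimals and 1500000000000000000 base units at 18 decimals both normalize to an amount of 1.5, so downstream code never sees chain-specific scaling.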
In-Memory Invoice Index Over Database
A common approach is "store invoice in Postgres, query by address." For a payment gateway scanning thousands of addresses per block, this becomes a bottleneck fast. We keep all active (unexpired, unpaid) invoices in an in-memory B-tree indexed by deposit address. The block scanner does a single in-memory lookup per transaction — O(log n), measured at under 50 microseconds even at 10,000 invoices. Persistence to SQLite happens asynchronously for crash recovery. When the node restarts, it loads pending invoices from SQLite back into memory. This design also means the entire active state fits in ~2 MB of RAM for 10,000 invoices.
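The lookup path can be sketched in a few lines. The JDK has no B-tree, so this example swaps in ConcurrentSkipListMap, which is also a sorted structure with O(log n) lookups and is safe to share between the scanner and API threads. The Invoice fields and method names are hypothetical, and the asynchronous SQLite persistence is not shown.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentSkipListMap;

/** In-memory index of active invoices keyed by deposit address. */
final class ActiveInvoiceIndex {
    /** Hypothetical minimal invoice shape. */
    record Invoice(String id, String depositAddress, long expiresAtEpochSecond) {}

    // Sorted, thread-safe map standing in for the B-tree described in the text.
    private final ConcurrentSkipListMap<String, Invoice> byAddress = new ConcurrentSkipListMap<>();

    void add(Invoice invoice) {
        byAddress.put(invoice.depositAddress(), invoice);
    }

    /** Called by the block scanner once per transfer: one O(log n) lookup. */
    Optional<Invoice> match(String toAddress, long nowEpochSecond) {
        Invoice invoice = byAddress.get(toAddress);
        if (invoice == null || invoice.expiresAtEpochSecond() <= nowEpochSecond) {
            return Optional.empty();
        }
        return Optional.of(invoice);
    }

    /** Paid invoices leave the active set so the index stays small. */
    Optional<Invoice> markPaid(String depositAddress) {
        return Optional.ofNullable(byAddress.remove(depositAddress));
    }

    /** Drops invoices whose expiry time has passed. */
    void evictExpired(long nowEpochSecond) {
        byAddress.values().removeIf(inv -> inv.expiresAtEpochSecond() <= nowEpochSecond);
    }

    int size() {
        return byAddress.size();
    }
}
```

On restart, the same add method can be used to reload pending invoices from the persistent store.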
RESTful API With Stripe Conventions
Every payment processor reinvents API design. We chose to emulate Stripe because it is the API every developer already knows. POST /v1/payments creates an invoice. GET /v1/payments/pay_xxx retrieves it. Idempotency-Key headers prevent double charges. Expandable response objects reduce N+1 queries. Cursor-based pagination scales without offset drift. Error responses follow RFC 7807 Problem Details. The result: the average integration takes 45 minutes for a developer who has used Stripe before. Compared to BitPay's IPN flow or BTCPay Server's Greenfield API, the learning curve is dramatically shorter.
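As an illustration of the idempotency convention, the sketch below builds a POST /v1/payments request with the JDK HTTP client. Only the path and the Idempotency-Key header come from the text above; the bearer-token authentication, the base URL you pass in, and the JSON body fields are assumptions, so check the actual API reference before relying on them.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class CreatePaymentRequest {
    /**
     * Builds (but does not send) a create-payment request.
     * Reuse the same idempotencyKey when retrying the same logical payment,
     * so a retry after a timeout cannot create a second invoice.
     */
    static HttpRequest build(String baseUrl, String apiKey,
                             String idempotencyKey, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/v1/payments"))
            .header("Authorization", "Bearer " + apiKey) // auth scheme is an assumption
            .header("Content-Type", "application/json")
            .header("Idempotency-Key", idempotencyKey)
            .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
            .build();
    }
}
```

A caller would typically generate the key once per order, for example with UUID.randomUUID(), store it alongside the order, and pass the request to HttpClient.send.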
Project Loom Virtual Threads Over Reactive Programming
Reactive programming (WebFlux, RxJava) is the traditional Java answer to high-concurrency I/O. It works, but it splits your codebase into two worlds: reactive and imperative. Virtual threads eliminate this split. Each incoming webhook, block scan, or API request runs on its own virtual thread. When the thread calls RPC.getBlock() or db.save(), the JVM parks it and resumes another — automatically. The code reads as straight-line synchronous Java. No CompletableFuture chains, no .subscribe() callbacks, no debugging stack traces that span 20 lambdas. For a payment gateway where correctness is critical, readable code is a security feature.
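A small sketch of the style, using the Java 21 virtual-thread executor: each block fetch is ordinary blocking code, and the JVM parks the virtual thread while it waits. The fetchBlock method here is a stand-in that sleeps instead of calling a real RPC node, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class VirtualThreadScan {
    /** Stand-in for a blocking RPC call; a real one would hit the node. */
    static String fetchBlock(long height) {
        try {
            Thread.sleep(50); // the virtual thread is parked here, not an OS thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return "block-" + height;
    }

    /** Fetches all heights concurrently, one virtual thread per fetch. */
    static List<String> fetchAll(List<Long> heights) {
        // close() at the end of try-with-resources waits for submitted tasks.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<String>> pending = new ArrayList<>();
            for (long height : heights) {
                pending.add(executor.submit(() -> fetchBlock(height)));
            }
            List<String> blocks = new ArrayList<>();
            for (Future<String> future : pending) {
                blocks.add(future.get()); // results come back in submission order
            }
            return blocks;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

The fetches overlap, so the total wait stays close to one fetch rather than growing with the number of heights, yet the code contains no callbacks or reactive operators.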
Configurable Confirmation Strategy Per Chain
A confirmed Bitcoin transaction might still be reversed by a deep reorg. TRON and EVM chains rarely reorg beyond a few blocks. SUI has instant finality. We looked at historical reorg data for each chain and set conservative defaults: 19 blocks for TRON (~57 seconds), 12 for EVM chains (~2.5 minutes on ETH, ~24 seconds on Polygon), and immediate for SUI. Operators can override these — lower for faster detection at the cost of reorg risk, higher for bulletproof finality. We also expose a reorg-handling mode: on detecting a block reorganization, the gateway re-scans affected blocks and fires a payment.reorged event so your server can re-verify affected invoices.
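The defaults and the override mechanism can be modeled as a small per-chain table. In the sketch below, the confirmation counts come from the text, while the nominal block times (3 s for TRON, 12 s for Ethereum, 2 s for Polygon) are assumptions chosen to match the quoted wait times. The class, method names, and the convention that a transaction in the head block has one confirmation are also hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class ConfirmationPolicy {
    /** Required confirmations plus an assumed nominal block time in seconds. */
    record ChainRule(int requiredConfirmations, long blockSeconds) {}

    private final Map<String, ChainRule> rules = new HashMap<>();

    ConfirmationPolicy() {
        rules.put("tron", new ChainRule(19, 3));      // 19 x 3 s = 57 s
        rules.put("ethereum", new ChainRule(12, 12)); // 12 x 12 s = 144 s, about 2.5 min
        rules.put("polygon", new ChainRule(12, 2));   // 12 x 2 s = 24 s
        rules.put("sui", new ChainRule(0, 0));        // immediate finality
    }

    private ChainRule rule(String chain) {
        ChainRule rule = rules.get(chain);
        if (rule == null) throw new IllegalArgumentException("unknown chain: " + chain);
        return rule;
    }

    /** Operator override: lower is faster, higher is safer against reorgs. */
    void override(String chain, int requiredConfirmations) {
        rules.put(chain, new ChainRule(requiredConfirmations, rule(chain).blockSeconds()));
    }

    /** Assumed convention: a transaction in the head block has 1 confirmation. */
    static long confirmations(long headBlock, long txBlock) {
        return Math.max(0, headBlock - txBlock + 1);
    }

    boolean isFinal(String chain, long headBlock, long txBlock) {
        return confirmations(headBlock, txBlock) >= rule(chain).requiredConfirmations();
    }

    /** Rough time to finality: required confirmations times block time. */
    long expectedWaitSeconds(String chain) {
        ChainRule r = rule(chain);
        return r.requiredConfirmations() * r.blockSeconds();
    }
}
```

After a reorg, recomputing isFinal against the new head for every affected transaction is one way a re-scan could decide which invoices need re-verification.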
Single Binary Deployment Over Microservices
Microservices are great for organizations with dedicated DevOps teams. For a self-hosted gateway meant to run on a single VPS, they are overhead. We ship everything in one binary: REST API server, block scanner engine, webhook dispatcher, admin CLI, and Prometheus metrics exporter. The process model relies on the JVM's threading: virtual threads for API and webhook handling, platform threads for CPU-intensive signature verification. Resource contention is managed by a shared thread pool with configurable limits. If you need to scale, you run multiple containers behind a load balancer. But for 95% of merchants, a single CX22 handles the full workload with resources to spare.
Built Different. Deploy in Minutes.
XPay Labs packs all 10 engineering decisions into a single Docker container. Deploy on your own VPS and see the difference engineering discipline makes.
