📄 Case Study · March 2026

spaceduckling.com: Space Duck Running on Space Duck

Space Duck is not just a platform we built; it is the infrastructure we operate on. The spaceduckling.com production deployment runs the same Lambda, the same Cognito stack, and the same Peck Protocol that every external operator uses. This document covers the architecture choices, performance profile at Lambda v61, database growth to date, peck protocol results, and the hard lessons we learned along the way.

Platform: Galaxy 1.1 Beta · Lambda version: v61 · AWS region: us-east-1 · Status: Live (active beta) · Period covered: Launch → March 2026

| Metric | Value |
|---|---|
| Lambda deploy version | v61 |
| Eggs registered (DynamoDB eggs table) | 21 |
| Audit log entries since launch | 1,331 |
| Total peck attempts | 14 (11 failed · 3 succeeded) |

1. Problem Statement

Before Space Duck, there was no lightweight, production-quality way to give AI agents verifiable identities that humans could audit. Agents running in production pipelines had no trust anchor: an agent claiming to be "analytics-bot" had no cryptographic way to prove it, and the operators running them had no audit trail they could hand to a compliance team.

The core question was: can we build identity infrastructure for agents that is as easy to use as registering a domain name, without requiring the operator to run their own PKI?

Requirements we set for ourselves

  • Hatch a duckling and issue a T2 Birth Certificate in under 3 minutes from a fresh account.
  • Every agent action (peck, heartbeat, bond) must leave an immutable audit trail.
  • No persistent servers: serverless-first to eliminate OS-level patching risk.
  • Operators should not need an engineering team to deploy.
  • Cost to run the platform at early-access scale: under $30/month total AWS spend.
"If we can't eat our own cooking, we have no business serving it. spaceduckling.com is the operator, the test bed, and the live demo, all at once."

2. Architecture Choices

Why single Lambda?

Splitting each /beak/* route into its own Lambda function would have given better cold-start isolation, but with 28 routes it would have meant 28 deploy artifacts and made IAM policy management error-prone at early-access scale. A single handler with an internal router gives us one deploy artifact, one log group to watch, and one policy to audit.

The trade-off: a bug in any route affects all routes. Mitigated by a comprehensive test suite that runs against a live Lambda before any deploy is promoted.
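In code, the single-handler pattern looks roughly like the sketch below. The route paths echo this document, but the dispatch table, function names, and event shape are illustrative assumptions, not the actual spaceduckling.com handler.

```python
# Minimal sketch of a single-Lambda internal router (illustrative only;
# not the actual spaceduckling.com handler).
import json

def status_route(event):
    return {"status": "ok"}

def version_route(event):
    return {"lambda_version": "v61"}

# One dispatch table means one deploy artifact and one IAM policy to audit.
ROUTES = {
    ("GET", "/beak/system/status"): status_route,
    ("GET", "/beak/system/version"): version_route,
}

def lambda_handler(event, context):
    """Dispatch an API Gateway REST proxy event to the matching route."""
    key = (event["httpMethod"], event["path"])
    route = ROUTES.get(key)
    if route is None:
        return {"statusCode": 404, "body": json.dumps({"error": "unknown route"})}
    return {"statusCode": 200, "body": json.dumps(route(event))}
```

The trade-off described above shows up directly here: every route shares this one function, so the test suite has to exercise the whole dispatch table before a deploy.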

Why DynamoDB over RDS?

The Space Duck data model is event-driven: each operation generates a new record, nothing is mutated after write (except the platform state table). DynamoDB's write-once, read-many pattern aligns exactly. RDS would have added VPC setup, maintenance windows, and per-connection overhead β€” none of which we wanted at this stage.
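The write-once property can be enforced at the database layer with a conditional PutItem. This is a minimal sketch assuming the audit_log table from this case study; the key schema and attribute names are invented for illustration.

```python
# Sketch: building an append-only audit_log put request. The table name
# comes from this document; key and attribute names are assumptions.
import time
import uuid

def build_audit_put(actor: str, action: str) -> dict:
    """Request params for a DynamoDB PutItem that can only create, never overwrite."""
    return {
        "TableName": "audit_log",
        "Item": {
            "event_id": {"S": str(uuid.uuid4())},
            "ts": {"N": str(int(time.time() * 1000))},
            "actor": {"S": actor},
            "action": {"S": action},
        },
        # Refuse the write if an item with this key already exists:
        # the log is write-once by construction.
        "ConditionExpression": "attribute_not_exists(event_id)",
    }

# With boto3 these params would be passed straight through, e.g.:
#   boto3.client("dynamodb").put_item(**build_audit_put("op_primary", "peck"))
```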

Why API Gateway REST API over HTTP API?

REST API gives us per-route throttling, usage plans, and API key management out of the box. HTTP API is cheaper but doesn't support per-route stage variables or API key rate limiting, which we use to gate operator access tiers.
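Per-route throttling through a usage plan looks roughly like the sketch below. The request shape matches the API Gateway `create_usage_plan` call, but the API ID, tier name, and limits are invented for illustration.

```python
# Sketch: per-route throttling via a REST API usage plan. The tier name
# and limits here are invented, not the real spaceduckling.com values.
def build_usage_plan(api_id: str, stage: str) -> dict:
    """Params for apigateway.create_usage_plan with per-route overrides."""
    return {
        "name": "operator-standard",  # assumed tier name
        "throttle": {"rateLimit": 10.0, "burstLimit": 20},  # plan-wide default
        "apiStages": [{
            "apiId": api_id,
            "stage": stage,
            # Per-route overrides, keyed as "<resource path>/<HTTP method>":
            "throttle": {
                "/beak/peck/POST": {"rateLimit": 2.0, "burstLimit": 5},
                "/beak/hatch/POST": {"rateLimit": 1.0, "burstLimit": 2},
            },
        }],
    }

# With boto3:
#   boto3.client("apigateway").create_usage_plan(**build_usage_plan("a1b2c3", "prod"))
```

This per-route override map is exactly the capability HTTP API lacks, which is why the cheaper option was ruled out.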

3. Lambda v61 Performance Profile

Cold start profile

Lambda v61 runs on a 512 MB memory configuration. Python cold start at this size is approximately 400–800ms for the first invocation after a 15+ minute idle period. All subsequent invocations within the warm period execute in under 100ms for simple routes (status, version) and 80–220ms for DynamoDB read routes.

| Route | Warm p50 | Warm p95 | Cold start | Assessment |
|---|---|---|---|---|
| /beak/system/status | 42ms | 68ms | ~420ms | Nominal |
| /beak/peck | 95ms | 180ms | ~640ms | Nominal |
| /beak/hatch | 145ms | 280ms | ~720ms | Nominal |
| /beak/cert/issue | 210ms | 380ms | ~780ms | Nominal |
| /beak/auth/login | 310ms | 520ms | ~840ms | Cognito RTT adds latency |
| /beak/audit/export | 480ms | 1,200ms | ~900ms | SES queuing; async recommended |

Lambda timeout is set to 30 seconds. No invocations have hit the timeout in production. Longest recorded invocation: 4.2 seconds during a DynamoDB warm-up after a cold region start.

4. DynamoDB Growth

At launch (Lambda v1), the database was empty. As of v61 / March 2026, the state is:

| Table | Item count | Purpose | Growth rate |
|---|---|---|---|
| eggs | 21 | Registered ducklings (operators + agents) | ~1–2/week |
| audit_log | 1,331 | Immutable event log: every write operation | ~40–80/day (active periods) |
| certs | 4 | Issued Birth Certificates | On-demand |
| connections | 11 | Agent–operator bonds | On-demand |
| platform_state | 1 | Single-row mutable state (lambda_version, etc.) | Each deploy |

DynamoDB On-Demand billing has kept costs under $0.50/month at this scale. Point-in-Time Recovery is enabled with a 35-day window on all tables.
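Enabling PITR across the five tables is a one-time operation per table. Below is a sketch of the `update_continuous_backups` calls, assuming boto3 and the table names listed above; note the 35-day window is fixed by DynamoDB, not a parameter.

```python
# Sketch: enabling Point-in-Time Recovery on every table listed above.
# The table names come from this document; the helper itself is illustrative.
TABLES = ["eggs", "audit_log", "certs", "connections", "platform_state"]

def pitr_requests(tables: list[str]) -> list[dict]:
    """Params for dynamodb.update_continuous_backups, one call per table."""
    return [
        {
            "TableName": t,
            "PointInTimeRecoverySpecification": {"PointInTimeRecoveryEnabled": True},
        }
        for t in tables
    ]

# With boto3:
#   ddb = boto3.client("dynamodb")
#   for req in pitr_requests(TABLES):
#       ddb.update_continuous_backups(**req)
```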

5. Peck Protocol Results

14 peck attempts have been logged since launch. 3 succeeded; 11 failed. The failure rate reflects intentional stress-testing of edge cases during development, not production stability.

  • ✅ Peck #001 · SD-Alpha-01 → op_primary: First successful peck on Galaxy 1.1 launch day · T1 tier · 95ms warm
  • ✅ Peck #007 · SD-Beta-01 → op_primary: T2 certified agent · first cross-tier peck · 110ms warm
  • ✅ Peck #012 · SD-Alpha-01 → op_secondary: Multi-operator peck · T1 tier · confirmed receipt verified
  • ❌ Pecks #002–#006, #008–#011, #013–#014 · 11 failures: ERR_TIER_INSUFFICIENT (4), ERR_BEAK_KEY_INVALID (3), expired token (2), Lambda cold start timeout edge case (1), concurrent write conflict (1)

Failure analysis and fixes

  • ERR_TIER_INSUFFICIENT: Agent attempted to peck before phone verification completed. Fixed: frontend now gates the peck button on T1 minimum.
  • ERR_BEAK_KEY_INVALID: Key transmitted with a trailing whitespace character. Fixed: SDK now trims Beak Keys on all comparison paths.
  • Expired token: Long-running test script didn't handle JWT refresh. Fixed: Python SDK now auto-refreshes on 401.
  • Lambda cold start timeout: Rare race condition during DynamoDB cold start where, after a long idle period, the client timed out before the first Lambda invocation completed. Mitigated: client timeout raised to 10s; Provisioned Concurrency planned for Galaxy 1.2.
  • Concurrent write conflict: Two simultaneous peck requests from the same agent hit the same DynamoDB condition expression. Fixed: conditional writes use exponential backoff retry.
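The backoff fix in the last bullet can be sketched as a generic wrapper. `ConditionalWriteConflict` stands in for the real botocore exception, and the attempt count and delays are illustrative, not the production values.

```python
# Sketch: exponential backoff around a conditional DynamoDB write.
# ConditionalWriteConflict is a stand-in for the botocore client error;
# max_attempts and base_delay are illustrative defaults.
import time

class ConditionalWriteConflict(Exception):
    pass

def with_backoff(op, max_attempts: int = 5, base_delay: float = 0.05):
    """Run op(); on a conditional-write conflict, retry with doubling delays."""
    for attempt in range(max_attempts):
        try:
            return op()
        except ConditionalWriteConflict:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the conflict to the caller
            time.sleep(base_delay * (2 ** attempt))
```

The doubling delay spreads out two simultaneous pecks from the same agent so that the second attempt sees the first one's committed write instead of colliding with it.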

6. Lessons Learned

  1. Single Lambda is viable at beta scale, but plan the split early. At 28 routes, the routing table is manageable. At 50+ routes (Galaxy 1.2 trajectory), a Lambda per domain (auth, certs, agents, peck) will be needed to maintain testability.
  2. DynamoDB is the right choice for event-driven trust data. No schema migrations, no downtime for table updates. The audit_log table has never needed modification since v1.
  3. Turnstile adds meaningful bot protection with zero UX cost. We haven't seen a single spam hatch registration since Turnstile was enabled in v23.
  4. Test against a live Lambda before every deploy. Our synthetic test suite runs a real hatch → cert → peck sequence against prod before any Lambda version is promoted. This caught 4 regressions that would have broken the live platform.
  5. Peck failures during development are a feature, not a bug. The 11 failures surfaced real edge cases, and every one resulted in a concrete Lambda fix; without that testing, a real customer would have hit the bug in production.
  6. Cold starts matter for first-impression UX. The first action a new user takes, hatching, hits a cold Lambda if the platform has been idle. Provisioned Concurrency for the hatch route is on the Galaxy 1.2 roadmap.
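The hatch → cert → peck sequence from lesson 4 might be structured like the sketch below, with the HTTP call injected so the same sequence can run against production or a stub. The payload fields are invented; only the endpoint paths come from this document.

```python
# Sketch of the pre-deploy smoke sequence (hatch -> cert -> peck) from
# lesson 4. `call` is injected so the sequence can run against prod or a
# test stub; the payload fields are invented for illustration.
def smoke_sequence(call) -> list[str]:
    """Run the synthetic hatch -> cert -> peck flow; return paths exercised."""
    steps = [
        ("/beak/system/status", None),
        ("/beak/hatch", {"name": "smoke-duckling"}),
        ("/beak/cert/issue", {"tier": "T2"}),
        ("/beak/peck", {"target": "op_primary"}),
    ]
    exercised = []
    for path, payload in steps:
        resp = call(path, payload)
        if not resp.get("ok"):
            # Fail the whole promotion if any step of the flow breaks.
            raise RuntimeError(f"smoke step failed: {path}")
        exercised.append(path)
    return exercised
```

Injecting `call` is what makes the suite runnable against the live Lambda in CI while remaining unit-testable offline.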

7. Source Data (Markdown)

This case study is available as a clean Markdown document for download, sharing, or PDF conversion via any Markdown-to-PDF tool.

# Case Study: spaceduckling.com
**Space Duck Running on Space Duck: Galaxy 1.1 Beta / Lambda v61 / March 2026**

## Summary

Space Duck self-hosts on spaceduckling.com using the same Lambda, Cognito stack, and Peck Protocol
available to all operators. This document covers architecture choices, performance data, database
growth, peck protocol results, and lessons learned from launch to Lambda v61.

## Key Metrics

| Metric | Value |
|---|---|
| Lambda version | v61 |
| DynamoDB eggs (ducklings) | 21 |
| Audit log entries | 1,331 |
| Total peck attempts | 14 (3 succeeded, 11 failed) |
| Monthly AWS cost | <$30 |

## Architecture

- **Single Lambda handler** (512 MB, 30s timeout): routes all 28 /beak/* endpoints
- **DynamoDB On-Demand**: 5 tables, all with PITR enabled (35-day window)
- **API Gateway REST API**: per-route throttling, usage plans, Turnstile validation
- **Cognito User Pool**: password policy, Advanced Security, no stored passwords in Lambda
- **SES**: out of sandbox since v54, DKIM+SPF+DMARC configured

## Performance (Lambda v61 warm)

| Route | p50 | p95 |
|---|---|---|
| /beak/system/status | 42ms | 68ms |
| /beak/peck | 95ms | 180ms |
| /beak/hatch | 145ms | 280ms |
| /beak/cert/issue | 210ms | 380ms |
| /beak/auth/login | 310ms | 520ms |

Cold start: 400–800ms depending on route. Provisioned Concurrency planned for Galaxy 1.2.

## Peck Protocol Results

- **Successful:** 3 pecks (T1 and T2 tier, multi-operator confirmed)
- **Failed:** 11 (tier insufficient × 4, invalid key × 3, expired token × 2, cold-start race × 1, write conflict × 1)
- All failures resulted in Lambda fixes before promotion.

## Lessons

1. Single Lambda is viable at <50 routes. Plan split at 50+.
2. DynamoDB event model is ideal for audit-first data.
3. Turnstile eliminated spam hatches with zero UX cost.
4. Synthetic peck tests before every deploy caught 4 regressions.
5. Cold starts matter for first-impression routes (hatch, login).
6. Every failure is a test. 11 peck failures = 11 platform improvements.

---
*Space Duck · Galaxy 1.1 Beta · March 2026 · https://spaceduckling.com*