Research Lab

SDF keeps learning so your team does not have to run every experiment.

Too little control gives teams electric speed, but quality can crash. Too much control protects the process, but slows momentum. Software Dark Factory tests the middle ground: agentic speed with enough structure to keep work reviewable, tested, maintainable, scalable, security-aware, owned by the team, and reusable as future delivery context.


Applied product research, not a benchmark lab, certification lab, correctness guarantee, or measured-savings claim.

Purpose

The lab is not research for its own sake.

SDF was shaped by thousands of hours of real agentic delivery work, not theory. The lab keeps that learning loop active as tools, providers, models, reasoning modes, and reviewer surfaces change.

Models will change. The delivery standard should not. SDF is not built to make teams dependent on one AI provider; Codex, Claude Code, Cursor, Copilot, and future agents should be swappable execution surfaces around the local SDF loop and repo-local Front Door.

The goal is the useful middle: enough structure to keep agentic work reviewable and safe to absorb, without slowing teams back down to pre-agent speed or letting quality crash under unmanaged speed.

Software delivery is the first proving ground. The broader research question is how organisations operate when agents can discover, decide, execute, hand off, and improve work across functions without losing human review or useful context.

Speed without a quality crash

The work tests how to keep quality, best practices, tests, maintainability, and ownership visible when agents increase output volume.

Control without lost momentum

SDF looks for the minimum useful governance that helps reviewers trust the work without adding process for its own sake.

SDF loop over model bet

SDF governs the delivery loop around the agent: receiver-owned guidance, verification, evidence, review surface, and the human-controlled boundary for approval, merge, deployment, and judgement.

What we test

Real workflows, real failure modes, reviewed before they reach customers.

The lab dogfoods SDF across the SDF CLI, this GTM app, and Explore while testing local and cloud agents, providers, models, reasoning modes, PR shapes, verification boundaries, reviewer surfaces, and receiver-safe Front Door refreshes.

Do not govern the model. Govern the delivery loop: agreed standards shape the work while it is being built, then the PR or MR arrives scoped, verified, explainable, reviewable, and backed by evidence.

What makes agents swappable

Which Front Door instructions, team playbooks, acceptance criteria, evidence records, and review surfaces keep delivery standards stable when the execution surface changes.

What improves review confidence

Which evidence, acceptance criteria, risk notes, verification records, and handoff details help a human reviewer make a better decision.

What damages quality

Where agentic speed creates fragile changes, unclear ownership, weak tests, maintainability drag, provider coupling, or security-sensitive blind spots.

What usage data can show

AI usage is not impressive by itself. Declared run context is recorded only when genuinely available, unknown values remain unknown, and token, usage, or cost data require deliberate integrations before they can support economics analysis.
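
As a rough illustration of "unknown values remain unknown", the sketch below models declared run context as a record whose fields default to None instead of a guessed value. Every name here (RunContext, unknown_fields, supports_economics_analysis, and the field names) is hypothetical and chosen for illustration; none of it is SDF's actual schema or API.

```python
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass(frozen=True)
class RunContext:
    """Hypothetical declared run context; not SDF's real schema."""

    provider: Optional[str] = None
    model: Optional[str] = None
    reasoning_mode: Optional[str] = None
    # Assumption: token and cost figures only exist once a deliberate
    # usage integration supplies them, so they default to None.
    total_tokens: Optional[int] = None
    cost_usd: Optional[float] = None


def unknown_fields(ctx: RunContext) -> List[str]:
    """Return the names of fields that were never declared."""
    return sorted(name for name, value in asdict(ctx).items() if value is None)


def supports_economics_analysis(ctx: RunContext) -> bool:
    """Economics analysis needs both token and cost data to be present."""
    return ctx.total_tokens is not None and ctx.cost_usd is not None


# Only provider and model were declared for this run; the rest stay unknown.
ctx = RunContext(provider="example-provider", model="example-model")
print(unknown_fields(ctx))
print(supports_economics_analysis(ctx))
```

The design choice worth noting is that a missing value is kept as None and never back-filled with a default such as 0, so a later reader cannot mistake "not recorded" for "recorded as zero".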

What should become product

Reviewed lessons are turned into a better SDF path. Receiver-safe refreshes show that useful guidance can move through the named dogfood repositories without turning that learning into automatic enforcement.

What shared context should unlock

The research question is how governed evidence can travel with the work as it moves from engineering into product, marketing, operations, customer support, customer success, portfolio systems, CI/CD, quality tools, and future agent runs.

Customer benefit

A better SDF loop, not a bigger promise.

Customers do not need to become experts in every provider, model, agent surface, PR shape, or reasoning setting before trying governed agentic delivery.

SDF keeps testing the moving landscape, then packages the useful lessons into clearer intake, better evidence, stronger review surfaces, safer verification boundaries, and more practical handoff guidance.

The Front Door is the repo-local operating layer around changing agents. SDF and customer playbooks shape the work while it is being built; the reviewer mostly needs to know whether the change is scoped, verified, explainable, and reviewable, not which model typed the diff.

The current product records evidence for governed changes and keeps that context close to the work. The direction is a human-readable and agent-readable handoff layer that other teams and tools can reference downstream.

The goal is not a bigger audit trail. The goal is useful context that follows the work, so teams do not have to reconstruct intent from Slack, meetings, old PRs, or memory.

This does not certify provider or model quality, guarantee correctness, prove security, claim measured savings, or replace customer approval, merge, deployment, and production ownership.

Faster delivery remains the point.

Governance should protect useful speed, not bury it under ceremony.

Engineering quality still matters.

Testing, maintainability, scalability, sustainability, security awareness, and clear ownership stay part of the delivery standard.

Humans keep the decisions.

SDF prepares evidence and workflow structure. Your team keeps scope, review, merge, deployment, and production control.