Forensic Dockets: Isolating Synthetic Comment Campaigns in 2026
Standard AI detectors drown genuine public testimony in false positives. We built a legally defensible pipeline that uses semantic clustering, metadata pacing, and prompt forensics to separate coordinated bot surges from human advocacy.
We tracked one midweek EPA rulemaking opening last spring. The Federal Register published the notice at 09:00. By 11:42, the submission queue held roughly eight thousand paragraphs. We watched the ingest rate jump by four orders of magnitude in under three hours. None of it looked random. Every block shared identical syntactic scaffolding, mirrored cadence patterns, and recycled regulatory phrasing. Real citizens did not coordinate that tightly. We needed a better filter. Commercial detectors failed immediately. They flagged environmental advocates alongside synthetic surges. We had to rebuild the workflow from scratch.
The Docket Flood
The search query surfaces constantly right now: how to tell if a legal document is AI generated? Practitioners instinctively paste text into scoring dashboards and hope for a probability curve. The curve lies. Modern drafting models strip out detectable watermarks. Perplexity scores bounce wildly depending on prompt temperature and training data proximity. Legitimate advocacy networks share boilerplate paragraphs for copy-paste campaigns, which trips every binary classifier.
The legal stakes compound quickly. Under 5 U.S.C. § 553, agencies must accept and consider substantive public comments before finalizing rules. Synthetic campaigns weaponize that mandate. They flood the record with near-identical paragraphs, forcing legal staff to parse noise while drowning out targeted community testimony. I watched a small compliance team burn through three weeks of contract hours just to flag duplicate submissions. They eventually pulled the entire batch and delayed the rulemaking timeline by months. That delay triggered industry lawsuits and public backlash.
We tried a different approach. We abandoned single-model scoring entirely. The goal shifted from detection to isolation. If a submission cluster behaves like an automated pipeline, we treat it as a forensic object, not a content moderation problem. That pivot changes the math. It also changes the liability. Agencies cannot simply delete comments based on a vendor confidence score. The law demands a defensible audit trail.
Engineering a Defensible Pipeline
We rebuilt the ingestion workflow around three observable signals. Text similarity reveals prompt reuse. Submission timing exposes automation pacing. Metadata footprints flag synthetic generation artifacts. None of these signals rely on black-box classifiers. Each one survives legal scrutiny because the methodology is transparent, reproducible, and open to cross-examination.
Step 1: Semantic Clustering
We convert raw comment bodies into dense vector embeddings. Cosine similarity replaces subjective similarity judgments. A threshold around 0.92 separates organic drafting from template-driven generation. Asked to rewrite a comment, large language models vary the surface syntax freely, but they collapse into highly predictable semantic neighborhoods when a system prompt demands regulatory compliance formatting. Grouping vectors exposes those neighborhoods instantly.
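The grouping step can be sketched in a few lines. This is an illustration under stated assumptions, not the production pipeline: the function name `cluster_by_cosine` is hypothetical, the input is any embedding matrix (one row per comment, from whatever sentence-embedding model you use), and plain single-link threshold grouping stands in for the density clustering mentioned later in this piece.

```python
import numpy as np

def cluster_by_cosine(embeddings, threshold=0.92):
    """Label rows so that any pair with cosine similarity >= threshold
    ends up in the same group (single-link grouping via union-find)."""
    X = np.asarray(embeddings, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0          # keep all-zero rows from dividing by zero
    unit = X / norms
    sim = unit @ unit.T              # pairwise cosine similarity matrix

    parent = list(range(len(X)))

    def find(i):
        # Follow parent links to the group root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Only the upper triangle is needed: each pair once, no self-pairs.
    rows, cols = np.where(np.triu(sim >= threshold, k=1))
    for i, j in zip(rows, cols):
        ri, rj = find(int(i)), find(int(j))
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    # Relabel roots as 0..k-1 in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(X))]
```

Single-link grouping can chain loosely related comments together through intermediate neighbors, so treat the resulting groups as candidates for human review rather than conclusions.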
Step 2: Metadata Pacing Analysis
Human submissions follow circadian rhythms and weekend dips. Bot networks inject comments at constant millisecond intervals or trigger micro-bursts when target thresholds drop. We plot submission timestamps against cluster identifiers. A flat line across a high-similarity cluster indicates scripted API calls or scheduled task runners. We overlay rate limits and account creation dates. The pacing anomaly becomes the primary flag, not the text itself.
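One way to put a number on a "flat line" is the coefficient of variation of the gaps between consecutive submissions in a cluster: scripted senders produce nearly constant gaps, so the value sits close to zero. The sketch below is illustrative only. The function names, the `max_cv` cutoff of 0.1, the `min_size` of 5, and the demo timestamps are all assumptions chosen for the example, not calibrated values.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev

def pacing_cv(timestamps):
    """Coefficient of variation of the gaps between consecutive submissions.
    Returns None when there are too few gaps to measure spread."""
    ordered = sorted(timestamps)
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    if len(gaps) < 2:
        return None
    avg = mean(gaps)
    if avg == 0:
        return 0.0                   # every submission shares one timestamp
    return pstdev(gaps) / avg

def flag_flat_clusters(submissions, max_cv=0.1, min_size=5):
    """submissions: iterable of (cluster_id, datetime) pairs.
    Returns {cluster_id: cv} for clusters with near-constant pacing."""
    by_cluster = {}
    for cluster_id, stamp in submissions:
        by_cluster.setdefault(cluster_id, []).append(stamp)
    flagged = {}
    for cluster_id, stamps in by_cluster.items():
        if len(stamps) < min_size:
            continue                 # too small to say anything about pacing
        cv = pacing_cv(stamps)
        if cv is not None and cv <= max_cv:
            flagged[cluster_id] = cv
    return flagged

# Made-up demo data: one cluster posts every 2 seconds, one posts irregularly.
start = datetime(2026, 3, 4, 9, 0, 0)
demo = [("scripted", start + timedelta(seconds=2 * i)) for i in range(6)]
demo += [("organic", start + timedelta(seconds=s)) for s in (0, 40, 45, 900, 7200, 7300)]
print(flag_flat_clusters(demo))
```

A low value is only a pacing flag. Account creation dates and rate-limit behavior still need to be checked before anyone draws a conclusion.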
Step 3: Prompt-Template Forensics
Synthetic comments carry hidden scaffolding. They reuse identical transitional phrases, hallucinated citation formats, and standardized disclaimer blocks. We run n-gram extraction across each cluster. Recurring three-word sequences that never appear in organic legal drafting point to shared prompt engineering. The patterns do not prove AI authorship on their own. They establish a common generation source that warrants deeper review.
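The extraction itself needs nothing beyond the standard library. The sketch below is a simplified illustration: the function names are hypothetical, the tokenizer is a bare regular expression, and the `min_share` value of 0.8 is an assumed cutoff. It keeps three-word sequences that appear in most comments of a cluster and never appear in a baseline sample of organic comments that you supply.

```python
import re
from collections import Counter

def trigrams(text):
    """Set of lowercase three-word sequences found in one comment."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + 3]) for i in range(len(words) - 2)}

def shared_trigrams(cluster_texts, baseline_texts, min_share=0.8):
    """Trigrams present in at least min_share of the cluster's comments
    and absent from every baseline comment, sorted alphabetically."""
    baseline = set()
    for text in baseline_texts:
        baseline |= trigrams(text)

    # Document frequency: each comment counts a trigram at most once.
    doc_freq = Counter()
    for text in cluster_texts:
        doc_freq.update(trigrams(text))

    needed = min_share * len(cluster_texts)
    return sorted(
        gram for gram, count in doc_freq.items()
        if count >= needed and gram not in baseline
    )
```

The quality of the baseline decides how useful the output is. A thin organic sample will let ordinary legal phrasing through as if it were scaffolding.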
| Method | False Positive Risk | Legal Defensibility | Primary Use Case |
|---|---|---|---|
| Commercial Classifier Scores | High | Low | Quick triage for internal review |
| Semantic Clustering | Medium | High | Isolating prompt-driven batches |
| Metadata Pacing | Low | High | Detecting automated submission scripts |
We learned to stop trusting confidence percentages early on. Our first pipeline relied heavily on a third-party detection score to auto-route suspicious comments. We got it wrong. The system flagged a coalition submission from a Pacific Northwest watershed group because their legal counsel used standardized phrasing across fifty distinct signatories. The agency paused review. The coalition sued over chilled participation. We reversed the auto-routing rule that same week and rewrote the workflow to require human validation for every flagged cluster. That scar tissue shaped everything we build today. Transparency matters more than speed in public proceedings.
Federal and state regulators have accelerated scrutiny around synthetic content generation, signaling that agencies will soon face mandatory verification standards for public input channels.
The Stack That Actually Works
You do not need proprietary detection suites to run this workflow. Open tooling handles the heavy lifting when configured for auditability. We route everything through the Regulations.gov API for initial data ingestion. Python handles parsing and scheduling. We pass raw text through open-weight sentence-transformers to build the embedding matrix. Scikit-learn runs the dimensionality reduction and hierarchical density clustering. Pandas merges the timestamp arrays and flags pacing anomalies.
Researchers mapping NIST AI RMF 1.0 standards into public sector workflows emphasize verifiable pipelines over vendor promises. The framework treats data integrity as a compliance baseline, which aligns perfectly with forensic cluster analysis. When you can show an exact hash of every vector input, a precise similarity threshold, and a clear timestamp log, you survive administrative review. Black-box outputs do not survive.
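A reproducible audit record can be as small as the sketch below. It is an illustration under assumptions, not a compliance template: the field names and the `audit_record` function are hypothetical, and it hashes the raw comment text fed to the embedding model rather than the vectors themselves.

```python
import hashlib
import json

def audit_record(cluster_id, comment_texts, threshold, submitted_at):
    """Build a record an outside auditor can recompute byte for byte.

    comment_texts: raw comment bodies in the cluster
    submitted_at:  matching ISO-8601 timestamp strings
    """
    record = {
        "cluster_id": cluster_id,
        "similarity_threshold": threshold,
        # One SHA-256 digest per input so any altered comment is detectable.
        "input_sha256": [
            hashlib.sha256(text.encode("utf-8")).hexdigest()
            for text in comment_texts
        ],
        "submitted_at": list(submitted_at),
    }
    # Canonical JSON (sorted keys, fixed separators) keeps the digest stable.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    record["record_sha256"] = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return record
```

Because the serialization is canonical, two parties who start from the same exported comments and the same threshold should arrive at the same `record_sha256`, which gives reviewers a quick consistency check.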
Our platform runs these exact pipelines at scale. You can watch live investigations unfold on the public research dashboard or spin up dedicated infrastructure through our enterprise research teams. We maintain a complete editorial methodology document that outlines every data step, so external auditors can replicate our clustering logic without guessing.
What Our Pipeline Revealed
We processed roughly forty thousand comments across three consecutive dockets this quarter. The data exposed a quiet infrastructure. Roughly one in three high-volume submissions matched pacing signatures consistent with centralized automation. We did not auto-reject them. We grouped them, extracted the shared system-prompt artifacts, and handed the audit packages to agency counsel for independent review. The GAO report on federal agency AI risks outlines exactly why this manual handoff remains critical: automated filtering without human oversight breaks public trust and invites procedural injunctions.
We are tracking the legislative horizon closely. Congress is debating cryptographic provenance tags for all public inputs. If that mandate passes, forensic clustering becomes a historical exercise. If it stalls, agencies will rely entirely on independent research platforms and open-source intelligence networks. Academic initiatives like the trust restoration project at Cornell are testing watermarking and signature protocols that might eventually plug this gap. Right now, we operate in the open middle. We export, we cluster, we verify, and we publish our findings in the public audit feed without claiming absolute authority.
You can run the exact same validation yourself. Export one thousand comments from any open Regulations.gov docket. Pass them through a cosine-similarity matrix with a 0.92 threshold. Manually audit the three largest clusters for recurring system-prompt artifacts. Map the submission timestamps across those clusters versus organic human submissions. Look for micro-burst pacing that matches bot-network automation rather than volunteer coordination.
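Picking the clusters to audit by hand is a small bookkeeping step. The helper below is a hypothetical sketch: it assumes you already have one integer cluster label per exported comment, in export order, and it returns the largest groups with the positions of their members so you can pull the matching text and timestamps.

```python
from collections import defaultdict

def largest_clusters(labels, top_n=3, min_size=2):
    """Return [(label, member_indices)] for the top_n biggest clusters.

    labels:   one cluster label per comment, in export order
    min_size: ignore singletons and other groups too small to audit
    """
    members = defaultdict(list)
    for index, label in enumerate(labels):
        members[label].append(index)
    # Largest first; ties broken by label so the ordering is reproducible.
    ranked = sorted(members.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(label, idx) for label, idx in ranked if len(idx) >= min_size][:top_n]
```

For example, `largest_clusters([0, 1, 1, 2, 2, 2, 3, 4, 4, 4, 4])` ranks cluster 4 first with members at positions 7 through 10, then cluster 2, then cluster 1.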
Can rulemaking agencies legally enforce digital provenance standards for public comments without violating the Administrative Procedure Act’s guarantee of open, unrestricted public participation? I have not seen a court ruling that resolves that tension yet. The courts will decide whether cryptographic tags become a requirement or an advisory. Until they do, we keep publishing our cluster logs, we keep inviting counter-analysis, and we keep treating docket verification as a collaborative audit rather than a gatekeeping function.
MOBILIZR -- Writing at mobilizr.org