Introduction
An experiment in AI-assisted grant evaluation: ten ENS SPP2 applications scored by frontier models against a weighted rubric, surfacing nine recurring failure modes across the cohort. Methodology and per-application reports are published on the ENS governance forum.
Details
The evaluation problem
ENS DAO's second Service Provider Program (SPP2) grant cycle had a familiar problem: inconsistent evaluation, scattered feedback, and no structured way for stewards or applicants to understand what "good" looked like. Cycles closed, projects were funded, and the same gaps showed up in the next round's applications.
What we tested
We ran frontier models (Claude Code, Opus 4.6, and Sonnet 4.5) against ten SPP2 applications using a weighted rubric. We used two prompt variations: a pure scoring pass, and an enhanced pass that also included GitHub repository analysis. The models produced per-application scores, funding recommendations, and an aggregated report of recurring weaknesses across the cohort.
The point wasn't to replace human judgment. It was to test whether automated screening could absorb the operational burden of consistent first-pass review, leaving stewards to focus on the questions that actually require taste.
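As a rough illustration of what scoring against a weighted rubric involves, here is a minimal Python sketch. The criterion names, weights, and thresholds are hypothetical placeholders, not the values from the published rubric, and the recommendation labels only loosely mirror the signals described in this writeup.

```python
# Hypothetical criteria and weights (percent); the published rubric defines the real ones.
RUBRIC_WEIGHTS = {
    "budget_clarity": 25,
    "measurable_kpis": 25,
    "scope_definition": 20,
    "ens_specificity": 15,
    "team_evidence": 15,
}


def weighted_score(scores, weights=RUBRIC_WEIGHTS):
    """Combine per-criterion scores (0-10) into a single weighted score (0-10)."""
    missing = set(weights) - set(scores)
    if missing:
        # Refuse to score an application that skipped a criterion.
        raise ValueError(f"missing criteria: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(scores[name] * w for name, w in weights.items()) / total_weight


def recommendation(score, strong=8.0, conditional=6.0):
    """Map a weighted score to a funding signal. Thresholds are assumptions."""
    if score >= strong:
        return "strong fund"
    if score >= conditional:
        return "fund with guardrails"
    return "do not fund"


# Invented example scores for a single application.
example = {
    "budget_clarity": 8,
    "measurable_kpis": 6,
    "scope_definition": 7,
    "ens_specificity": 9,
    "team_evidence": 5,
}
total = weighted_score(example)
print(total, recommendation(total))
```

Keeping the weights in one explicit table means the same definition can be handed to the model in the prompt and published to applicants, so both sides are working from identical criteria.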
What we found
The models converged on a small set of failure modes that showed up in nearly every application:
- Missing budget breakdowns
- Vague KPIs without measurable targets
- Undefined scope presented as funded deliverables
- Generic tooling repackaged as ENS-specific
- No quarterly milestone structure
- GitHub evidence mismatching claimed team size
- Overlapping scope without differentiation
- Open-source compliance treated as an afterthought
- COI disclosures lacking structural resolution
Only two of ten applications received a strong-fund signal. Half required additional guardrails. None of these patterns is hidden in any single application, but spotting the same nine across a batch is exactly the kind of work where a consistent pass at scale outperforms case-by-case human review.
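The cross-cohort aggregation step can be sketched in a few lines of Python. This is an assumed approach, not the pipeline used in the study: it presumes each per-application report has already been reduced to a list of failure-mode tags, and both the tag names and the sample data below are invented for illustration.

```python
from collections import Counter


def recurring_failure_modes(flags_by_application, min_share=0.5):
    """Return (mode, count) pairs for failure modes flagged in at least
    `min_share` of applications, most frequent first."""
    n = len(flags_by_application)
    if n == 0:
        return []
    # Count each mode at most once per application.
    counts = Counter(
        mode for flags in flags_by_application.values() for mode in set(flags)
    )
    # Sort by descending count, then by name so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(mode, count) for mode, count in ranked if count / n >= min_share]


# Invented placeholder data, not results from the SPP2 evaluation.
sample = {
    "app_a": ["missing_budget_breakdown", "vague_kpis"],
    "app_b": ["missing_budget_breakdown", "vague_kpis", "no_milestones"],
    "app_c": ["missing_budget_breakdown"],
    "app_d": ["vague_kpis", "missing_budget_breakdown"],
}
for mode, count in recurring_failure_modes(sample):
    print(f"{mode}: {count}/{len(sample)}")
```

Counting per application rather than per mention keeps one verbose report from inflating a pattern that only appears once in the cohort.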
What this enables
The highest-leverage intervention is the cheapest: publish the evaluation rubric before the application window opens. Applicants self-correct, stewards receive cleaner submissions to weigh, and the per-application review burden drops without changing the decision-making layer.
The full writeup is published as a HackMD artifact, including the rubric and per-application reports, with discussion on the ENS governance forum. This research is one piece of a larger thesis at Lighthouse: empirical accountability is a precondition for decentralized organisations to make decisions worth defending.
Topic
AI-Augmented Governance
AI-assisted methods for grant evaluation, discourse mapping, and operational governance workflows that surface structure at scale.