Research-backed prompt & issue evaluation

See the toolkit in action.

Three evaluators drawn directly from the Prompt Engineering Toolkit. Paste a prompt, a GitHub issue, or a URL. Get back a calibrated, evidence-anchored evaluation in the same style the framework teaches. No API key required.

01 / Description

Prompt Evaluator

Score any prompt across the four PPEP dimensions — Product, Process, Performance, Epistemics — calibrated against nine reference anchors spanning 1/10 to 10/10.

Open evaluator 02 / Discernment

Issue Evaluator

Decide whether a GitHub implementation plan issue is safe to hand to an AI coding agent. Eight-section weighted rubric, severity findings, automatic rewrite suggestions.

Open evaluator 03 / Discernment

UI/UX URL Evaluator

Audit a live URL against Nielsen's heuristics, WCAG 2.2 AA, and a 20-dimension scoring system. Produces severity findings with suggested remediation.

Open evaluator

How this works The companion app is a thin proxy in front of Anthropic's API. Each request uses the canonical system prompt from the toolkit's prompts/ folder, so the app always matches what the repository teaches. Your evaluations are not logged or stored.

See the toolkit in action.

Evaluators

Prompt Evaluator

Issue Evaluator

UI/UX URL Evaluator