---
title: Agent Platform Eval Flywheel — Invoked Exchange skill
description: Measures and improves the quality of AI models and agents on Google Cloud using the Eval Quality Flywheel methodology. Use when evaluating an agent or model, building an eval dataset, picking or writing evaluation metrics, analyzing failures, comparing results before and after a fix, or when guidance is needed on Agent Platform eval methodology — including dataset schema, LLM-as-judge scoring, and common failure causes. For fine-tuning, use agent-platform-tuning. For general production deployment, use agent-platform-deploy.
doc_version: "1.0"
last_updated: "2026-06-26T21:12:05.525Z"
canonical: https://invoked.ai/skills/agent-platform-eval-flywheel
---

# Agent Platform Eval Flywheel

Measures and improves the quality of AI models and agents on Google Cloud using the Eval Quality Flywheel methodology. Use when evaluating an agent or model, building an eval dataset, picking or writing evaluation metrics, analyzing failures, comparing results before and after a fix, or when guidance is needed on Agent Platform eval methodology — including dataset schema, LLM-as-judge scoring, and common failure causes. For fine-tuning, use agent-platform-tuning. For general production deployment, use agent-platform-deploy.

- Shared by: google/skills
- Add to Invoked: https://invoked.ai/skills/agent-platform-eval-flywheel

## Community usage

_Building — published once enough workspaces have run this skill._

## Sitemap

See the full [Exchange sitemap](https://invoked.ai/sitemap.md).