---
title: GKE Inference — Invoked Exchange skill
description: Deploys and optimizes AI/ML inference workloads on GKE, using GPUs, TPUs, and model servers. Use when deploying GKE inference servers, configuring GKE GPU resources for inference, or deploying LLMs on GKE. Don't use for generic batch jobs or HPC task queues (use gke-batch-hpc instead).
doc_version: "1.0"
last_updated: "2026-06-29T08:43:35.324Z"
canonical: https://invoked.ai/skills/gke-inference
---

# GKE Inference

Deploys and optimizes AI/ML inference workloads on GKE using GPUs, TPUs, and model servers. Use when deploying GKE inference servers, configuring GKE GPU resources for inference, or deploying LLMs on GKE. Don't use for generic batch jobs or HPC task queues (use `gke-batch-hpc` instead).

- Shared by: google/skills
- Add to Invoked: https://invoked.ai/skills/gke-inference

## Community usage

_In progress: usage data will be published once enough workspaces have run this skill._

## Sitemap

See the full [Exchange sitemap](https://invoked.ai/sitemap.md).
