AAWEA.ORG

Reliability

Explore pre-built workflows for Reliability. Fill in details to get professional output in seconds.

Reliability Skills

Server Management
Community
Reliability

Server management principles and decision-making. Process management, monitoring strategy, and scaling decisions. Teaches thinking, not commands.

Service Mesh Expert
Community
Reliability

Expert service mesh architect specializing in Istio, Linkerd, and cloud-native networking patterns. Masters traffic management, security policies, observability integration, and multi-cluster mesh con

Tool Use Guardian
Community
Reliability

FREE — Intelligent tool-call reliability wrapper. Monitors, retries, fixes, and learns from tool failures. Auto-recovers from truncated JSON, timeouts, rate limits, and mid-chain failures.

Datadog Automation
Community
Reliability

Automate Datadog tasks via Rube MCP (Composio): query metrics, search logs, manage monitors/dashboards, create events and downtimes. Always search tools first for current schemas.

Distributed Tracing
Community
Reliability

Implement distributed tracing with Jaeger and Tempo for request flow visibility across microservices.

PagerDuty Automation
Community
Reliability

Automate PagerDuty tasks via Rube MCP (Composio): manage incidents, services, schedules, escalation policies, and on-call rotations. Always search tools first for current schemas.

Sentry Automation
Community
Reliability

Automate Sentry tasks via Rube MCP (Composio): manage issues/events, configure alerts, track releases, monitor projects and teams. Always search tools first for current schemas.

Application Performance Performance Optimization
Community
Reliability

Optimize end-to-end application performance with profiling, observability, and backend/frontend tuning. Use when coordinating performance optimization across the stack.

Distributed Debugging Debug Trace
Community
Reliability

You are a debugging expert specializing in setting up comprehensive debugging environments, distributed tracing, and diagnostic tools. Configure debugging workflows, implement tracing solutions, and establish troubleshooting practices for development and production environments.

Incident Responder
Community
Reliability

Expert SRE incident responder specializing in rapid problem resolution, modern observability, and comprehensive incident management.

Observability Engineer
Community
Reliability

Build production-ready monitoring, logging, and tracing systems. Implements comprehensive observability strategies, SLI/SLO management, and incident response workflows.

On Call Handoff Patterns
Community
Reliability

Effective patterns for on-call shift transitions, ensuring continuity, context transfer, and reliable incident response across shifts.

Postmortem Writing
Community
Reliability

Comprehensive guide to writing effective, blameless postmortems that drive organizational learning and prevent incident recurrence.

SLO Implementation
Community
Reliability

Framework for defining and implementing Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets.
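As a rough illustration of how an SLI, an SLO target, and an error budget relate, here is a minimal sketch. It is not taken from the skill itself: the function name `error_budget`, its parameters, and the request-based availability SLI are all assumptions chosen for the example.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarize a request-based availability SLO (hypothetical helper).

    slo_target is the fraction of requests that must succeed, e.g. 0.999.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")

    # SLI: the measured fraction of successful requests.
    sli = (total_requests - failed_requests) / total_requests
    # Error budget: how many failures the SLO tolerates over this window.
    allowed_failures = (1.0 - slo_target) * total_requests
    # Budget left after subtracting the failures already observed.
    remaining = allowed_failures - failed_requests

    return {
        "sli": sli,
        "allowed_failures": allowed_failures,
        "remaining_budget": remaining,
        "slo_met": sli >= slo_target,
    }


report = error_budget(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
```

With a 99.9% target over one million requests, roughly 1,000 failures are tolerated, so 400 observed failures leave about 600 in the budget. Real SLO tooling typically evaluates this over rolling time windows rather than a single count.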
