Auditing Six Snowflake Accounts in Three Minutes
Credit burn, failed queries, login anomalies, dynamic table costs. Six countries. One scan.
The problem
Bill manages data engineering infrastructure across six Snowflake accounts in six countries: the US, UK, Germany, Australia, Canada, and Mexico. Every morning, he needs to know:
- How much compute cost accrued overnight
- Which warehouses consumed the most
- Which users had high failure rates
- What the most expensive queries were
- Whether anyone had failed login attempts
- What dynamic table refreshes are costing — and whether they're running on the right warehouses
- Whether Snowflake's built-in AI features (Cortex) are burning credits silently
Before the automation, this required logging into each account's web UI, navigating to Snowflake's built-in monitoring tables (ACCOUNT_USAGE views), running queries manually, copying results into notes, and mentally assembling the picture. Forty-five minutes on a good day. On a day with anomalies: longer.
The solution
A Python script that connects to all six accounts via SSO, runs six queries against each account's SNOWFLAKE.ACCOUNT_USAGE views in parallel, and returns a structured, machine-readable report. The AI reads the result and produces a color-coded briefing.
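The sweep described above can be sketched roughly as follows. The account identifiers, the placeholder user, and the `AUDIT_QUERIES` dict are illustrative, not the production script; the SSO flow uses the Snowflake connector's `externalbrowser` authenticator, which triggers a browser popup and then reuses cached tokens.

```python
# Sketch of the multi-account sweep, assuming the shape described in the
# text. Account names, user, and query contents are placeholders.
from concurrent.futures import ThreadPoolExecutor

ACCOUNTS = ["kas_us", "kas_uk", "kas_de", "kas_au", "kas_ca", "kas_mx"]

AUDIT_QUERIES = {
    # Illustrative subset; the real script runs the full audit query set.
    "credit_burn": "SELECT 1",
}

def sweep_account(account: str) -> dict:
    """Connect to one account via SSO and run the audit queries."""
    # Lazy import so the module loads even without the connector installed.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account=account,
        user="bill@example.com",          # placeholder user
        authenticator="externalbrowser",  # SSO popup; cached token afterward
    )
    try:
        results = {}
        for name, sql in AUDIT_QUERIES.items():
            cur = conn.cursor()
            cur.execute(sql)
            results[name] = cur.fetchall()
        return {"account": account, "results": results}
    finally:
        conn.close()

def sweep_all(accounts=ACCOUNTS) -> list[dict]:
    # One thread per account: the work is I/O-bound, so threads suffice.
    with ThreadPoolExecutor(max_workers=len(accounts)) as pool:
        return list(pool.map(sweep_account, accounts))
```

Threads rather than processes are the natural choice here: each worker spends its time waiting on the network, not on the CPU.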
The queries
Credit burn — total credits and cost by warehouse, last 24 hours. Cost calculated at the contract rate per credit.
Failure hotspots — users with more than 5 failed queries in 24 hours, ranked by failure count.
Expensive queries — top 3 queries by amount of data read, with user, warehouse, duration, and a 150-character preview of the SQL.
Login failures — failed login attempts by user, last 24 hours. Security flag.
Dynamic table refreshes — refresh count, warehouse used, cloud credits consumed, total runtime. Flagged RED if running on an ad-hoc or oversized warehouse.
Cortex AI costs — AI_SERVICES credit consumption, broken down by user and model.
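As one concrete example, the credit-burn query could look like the sketch below, built on the `WAREHOUSE_METERING_HISTORY` view. The per-credit rate here is the one implied by the arithmetic in the example briefing further down ($2.68/credit); the actual contract rate isn't stated, so treat it as a placeholder.

```python
# Credit burn by warehouse, last 24 hours — a sketch of one of the six
# audit queries. COST_PER_CREDIT is inferred from the sample briefing,
# not a confirmed contract rate.
COST_PER_CREDIT = 2.68

CREDIT_BURN_SQL = f"""
SELECT
    warehouse_name,
    SUM(credits_used)                     AS credits,
    SUM(credits_used) * {COST_PER_CREDIT} AS cost_usd
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD('hour', -24, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY credits DESC
"""
```

The other five queries follow the same pattern against their respective ACCOUNT_USAGE views (QUERY_HISTORY, LOGIN_HISTORY, DYNAMIC_TABLE_REFRESH_HISTORY, and the metering views for Cortex).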
The triage system
Results are flagged by severity:
- RED — Security events (failed logins), dynamic tables on wrong warehouses, AI costs over threshold. Act now.
- YELLOW — Elevated costs, high refresh counts, users with unusual failure rates. Investigate today.
- CLEAN — Normal operations. Acknowledge and move on.
The thresholds aren't arbitrary. They're calibrated from real incidents — like the time a dynamic table refresh was silently running on a $200/hour ad-hoc warehouse instead of the $2/hour dedicated refresh warehouse. That was caught on day one of the audit system. It had been refreshing every 10 minutes, 24/7, with no alert.
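The triage pass can be sketched as a simple severity function. The field names, the warehouse allow-list, and the exact threshold values below are illustrative; only the $50/day AI threshold and the security-first ordering come from the text.

```python
# A sketch of the triage pass. Field names and the dedicated-warehouse
# allow-list are hypothetical; thresholds follow the text where stated.
AI_COST_RED_USD = 50.0                  # "AI costs over $50/day = RED"
DEDICATED_REFRESH_WH = {"REFRESH_WH"}   # hypothetical allow-list

def triage(report: dict) -> str:
    """Return RED / YELLOW / CLEAN for one account's audit results."""
    if report.get("failed_logins", 0) > 0:
        return "RED"  # security events always win
    if report.get("ai_cost_usd", 0.0) > AI_COST_RED_USD:
        return "RED"
    # Dynamic tables must refresh on a dedicated warehouse.
    for wh in report.get("dynamic_table_warehouses", []):
        if wh not in DEDICATED_REFRESH_WH:
            return "RED"  # ad-hoc or oversized warehouse
    if report.get("users_over_failure_threshold", 0) > 0:
        return "YELLOW"
    return "CLEAN"
```

The ordering matters: a single account report can trip several flags, and the function returns the most severe one first.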
The briefing format
### Briefing — 2026-04-10 (6 accounts)

| Account | Credits | Cost    | Top Flag           |
|---------|---------|---------|--------------------|
| kas_us  | 42.3    | $113.36 | Expensive query    |
| kas_uk  | 8.1     | $21.71  | Clean              |
| kas_de  | 3.2     | $8.58   | Clean              |
| kas_au  | 12.7    | $34.04  | Failed logins (3)  |
| kas_ca  | 5.4     | $14.47  | Clean              |
| kas_mx  | 1.1     | $2.95   | Clean              |

Action items:
1. Investigate kas_au failed logins
2. Review kas_us expensive query (4.2 TB scan)
Three minutes to read. Everything that needs attention is flagged. Everything clean is acknowledged. No dashboard hopping. No mental assembly.
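Rendering the briefing from the sweep's JSON is a small formatting step. The row field names below are assumptions about the JSON shape, not the production schema.

```python
# Formatting the sweep's JSON into the markdown briefing. The row keys
# ("account", "credits", "cost", "flag") are illustrative.
def format_briefing(date: str, rows: list[dict]) -> str:
    lines = [
        f"### Briefing — {date} ({len(rows)} accounts)",
        "",
        "| Account | Credits | Cost | Top Flag |",
        "|---------|---------|------|----------|",
    ]
    for r in rows:
        lines.append(
            f"| {r['account']} | {r['credits']} | ${r['cost']:.2f} | {r['flag']} |"
        )
    return "\n".join(lines)
```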
The architecture
The multi-account sweep is a single Python script using the Snowflake connector with Single Sign-On authentication (the "Log in with Google" pattern, where one login gets you into multiple systems). Each account connection may trigger a browser popup for auth; cached tokens allow most to connect silently after the first run.
Some accounts require role overrides — the default role doesn't have access to ACCOUNT_USAGE views. The script handles this per-account:
```python
ROLE_OVERRIDES = {
    "kas_au": "DOADMIN",
    "kas_ca": "SYSADMIN",
}
```
Results come back as JSON. The AI reads the JSON, applies the triage thresholds, and formats the briefing. If the Python sweep fails (network issues, SSO timeout), the system falls back to querying only the MCP-connected account and notes the limitation.
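The fallback described above fits a simple try/except pattern. `run_sweep` and `query_mcp_account` are hypothetical stand-ins for the real functions, injected here so the pattern is visible on its own.

```python
# The degradation pattern: full sweep if possible, otherwise the single
# MCP-connected account, with the limitation noted in the result.
# run_sweep and query_mcp_account are hypothetical stand-ins.
def audit_with_fallback(run_sweep, query_mcp_account) -> dict:
    try:
        return {"mode": "full", "data": run_sweep()}
    except Exception as exc:  # network issues, SSO timeout, etc.
        return {
            "mode": "partial",
            "data": query_mcp_account(),
            "note": f"Sweep failed ({exc}); only the MCP account was audited.",
        }
```

The key design choice is that a partial result still gets delivered, with the limitation stated, rather than the whole briefing failing.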
The human/AI split
The human defined
- What matters: credits, failures, expensive queries, login security, dynamic table costs, AI costs.
- What the thresholds are: dynamic tables on ad-hoc warehouses = RED. AI costs over $50/day = RED.
- The triage priority: security first, then blocked people, then cost leaks, then stale work.
- Which accounts to scan and which roles to use.
The AI built
- The multi-account sweep script with SSO auth and role overrides.
- The six ACCOUNT_USAGE queries, tuned for the 24-hour window.
- The briefing format and color-coded flag system.
- The fallback logic for partial failures.
- The skill command (/briefing) that runs the whole thing every morning.
What you can steal from this
- ACCOUNT_USAGE is free. Snowflake's built-in views give you everything — credit burn, query history, login events, metering — at no additional cost. You're already paying for it.
- Multi-account means multi-connect. There's no cross-account query in Snowflake. You have to connect to each one. Automate the connection, not the query.
- Thresholds from incidents, not guesses. Every RED flag in this system exists because something actually went wrong once. The dynamic table threshold exists because it caught a real cost leak. Calibrate from reality.
- The manager's job is decisions, not gathering. Forty-five minutes of dashboard-checking was replaced by three minutes of reading a briefing. The information is the same. The time cost is not.
- Build the briefing for the human, not the dashboard. Dashboards show everything. Briefings show what matters. The AI filters. The human decides.
Disclosure: This page was generated by Claude (Anthropic) under Bill's direction. The Snowflake audit system described here is in daily production use across six accounts.