Data Agent
AI-powered natural language to SQL query engine with RBAC-filtered schema access.
Architecture
Intent Classification
| Type | Example | Handler |
|---|---|---|
| Data | "How many open findings?" | Data Agent (SQL) |
| Document | "Summarize last year's IT audit" | RAG Pipeline |
| Hybrid | "Show findings similar to FR-042" | Both |
| Conversational | "What does CCCER stand for?" | Direct LLM |
Schema Registry
The Schema Registry provides RBAC-filtered DDL:
- Each role sees only authorized tables
- Column-level filtering for sensitive data
- Schema cached in Redis (5-min TTL) per role+org
SQL Security (5 Layers)
| Layer | Check |
|---|---|
| 1. Syntax validation | Valid SQL syntax |
| 2. Whitelist check | Only SELECT statements |
| 3. Table access | Only authorized tables |
| 4. Injection prevention | No semicolons, comments, or DDL |
| 5. Read-only enforcement | SET default_transaction_read_only = ON |
Caching
- Results cached in Redis
- 5-minute TTL
- Scoped by role + org_id
- Cache invalidated on data changes
Example
User: "Show me the top 5 business units by number of open findings"
Generated SQL:
SELECT bu.name, COUNT(f.id) as finding_count
FROM finding f
JOIN engagement e ON f.engagement_id = e.id
JOIN audit_universe_item bu ON e.universe_item_id = bu.id
WHERE f.status != 'closed'
GROUP BY bu.name
ORDER BY finding_count DESC
LIMIT 5;
Result: Chart + table showing the data.