Historical Retrieval (RAG)
AIIA uses Retrieval-Augmented Generation (RAG) to search and retrieve relevant historical audit data. When generating suggestions, the AI accesses past engagements, findings, and workpapers to provide context-aware recommendations.
What RAG Does
RAG enhances AI responses by:
- Embedding your audit data into a vector database (pgvector)
- Searching for semantically similar historical records when a query is made
- Augmenting the AI prompt with relevant context
- Generating more accurate, context-aware responses
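The four steps above can be sketched in a few lines of Python. This is an illustration only, not AIIA's implementation: `toy_embed` is a stand-in word-count embedding (a real deployment would call the configured embedding model), the in-memory `index` list stands in for pgvector, and the record text is made up.

```python
import math
from collections import Counter

def toy_embed(text: str) -> Counter:
    """Stand-in embedding: a bag-of-words count vector (hypothetical)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# 1. Embed: index historical records (illustrative text only).
records = [
    "vendor invoices approved without segregation of duties",
    "user access reviews not performed quarterly",
    "backup restoration tests missing for payroll system",
]
index = [(text, toy_embed(text)) for text in records]

def retrieve(query: str, k: int = 2) -> list[str]:
    """2. Search: return the k records most similar to the query."""
    q = toy_embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(query: str) -> str:
    """3. Augment: prepend the retrieved context to the user's question."""
    context = "\n".join(f"- {text}" for text in retrieve(query))
    return f"Context from past audits:\n{context}\n\nQuestion: {query}"

# 4. Generate: the augmented prompt would then be sent to the language model.
prompt = build_prompt("quarterly user access reviews overdue")
```

A word-count vector only matches exact words; a real embedding model is what makes the search semantic rather than keyword-based.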
Use Cases
| Feature | RAG Contribution |
|---|---|
| Recurring Finding Detection | Searches past findings for similar patterns |
| Test Procedure Suggestions | Retrieves test procedures used in similar engagements |
| Risk Assessment | Finds historical risk assessments for comparable areas |
| Narrative Drafting | References past report language for consistency |
| KRI Insights | Correlates current KRI values with historical trends |
Permission-Aware Retrieval
Security
RAG retrieval enforces role-based access control (RBAC) on every query:
- The AI only retrieves records the requesting user has permission to view
- User roles (user_roles) are passed to the RAG pipeline
- Cross-organization data is never accessible
- Engagement-level access controls are enforced
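One way to picture these rules is as a filter applied to candidate records before they are ranked and handed to the AI. The sketch below is hypothetical: the `Record` and `Requester` shapes, the `audit_director` role, and the rule that it grants organization-wide visibility are all assumptions made for illustration, not AIIA's actual permission model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    record_id: str
    org_id: str
    engagement_id: str
    similarity: float  # pretend score from the vector search

@dataclass(frozen=True)
class Requester:
    org_id: str
    user_roles: frozenset[str]
    engagement_ids: frozenset[str]  # engagements the user may view

# Hypothetical role that can see every engagement in its own organization.
ORG_WIDE_ROLES = frozenset({"audit_director"})

def can_view(user: Requester, rec: Record) -> bool:
    if rec.org_id != user.org_id:          # never cross organizations
        return False
    if user.user_roles & ORG_WIDE_ROLES:   # role-based access
        return True
    return rec.engagement_id in user.engagement_ids  # engagement-level access

def permitted_top_k(user: Requester, candidates: list[Record], k: int = 3) -> list[Record]:
    """Drop records the user cannot view, then rank what remains."""
    visible = [r for r in candidates if can_view(user, r)]
    return sorted(visible, key=lambda r: r.similarity, reverse=True)[:k]

candidates = [
    Record("F-1", "org-a", "eng-1", 0.91),
    Record("F-2", "org-b", "eng-9", 0.99),  # other organization
    Record("F-3", "org-a", "eng-2", 0.80),  # engagement not assigned to the auditor
    Record("F-4", "org-a", "eng-1", 0.75),
]
auditor = Requester("org-a", frozenset({"auditor"}), frozenset({"eng-1"}))
visible_ids = [r.record_id for r in permitted_top_k(auditor, candidates)]
```

Filtering before ranking means a highly similar but forbidden record (here `F-2`) never reaches the prompt, so it cannot leak through the generated text.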
How Data Is Indexed
The RAG system indexes:
- Finding descriptions and CCCER components
- Workpaper narratives and conclusions
- Test procedure descriptions and results
- Engagement objectives and scope
- Risk and control descriptions
Data is embedded using the configured embedding model (set in Administration → AI Models).
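Before a record can be embedded, its fields have to be flattened into a single piece of text. The sketch below shows one plausible way to do that for a finding. It assumes CCCER stands for Condition, Criteria, Cause, Effect, and Recommendation; the field names and the sample finding are hypothetical and do not describe AIIA's schema.

```python
# Assumed expansion of CCCER; the field names are hypothetical.
CCCER_FIELDS = ("condition", "criteria", "cause", "effect", "recommendation")

def finding_to_text(finding: dict[str, str]) -> str:
    """Join a finding's description and CCCER fields into one labeled string."""
    parts = [f"Description: {finding['description']}"]
    for field in CCCER_FIELDS:
        value = finding.get(field, "").strip()
        if value:  # skip components that were left blank
            parts.append(f"{field.capitalize()}: {value}")
    return "\n".join(parts)

sample = {
    "description": "Terminated users retained system access",
    "condition": "12 of 40 sampled leavers still had active accounts",
    "criteria": "Access must be revoked within 24 hours of termination",
    "cause": "",
    "effect": "Risk of unauthorized transactions",
}
text = finding_to_text(sample)
```

Labeling each component keeps the structure visible to the embedding model, which may help a query about causes match the cause text rather than the recommendation.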
Citations
Every RAG-enhanced response includes citations:
- References to specific findings, workpapers, or engagements
- Source document identifiers
- Relevance scores
- Links to the original records (respecting permissions)
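A citation carrying these four pieces of information could be modeled as below. This is a sketch under stated assumptions: the `Citation` fields, the 0 to 1 relevance scale, the identifiers, and the link paths are invented for illustration and are not AIIA's response format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Citation:
    source_type: str   # e.g. "finding", "workpaper", "engagement"
    source_id: str     # source document identifier
    relevance: float   # relevance score, assumed here to lie in [0, 1]
    link: str          # link to the original record

def format_citations(citations: list[Citation]) -> list[str]:
    """Render citations as numbered lines, most relevant first."""
    ordered = sorted(citations, key=lambda c: c.relevance, reverse=True)
    return [
        f"[{i}] {c.source_type} {c.source_id} (relevance {c.relevance:.2f}) {c.link}"
        for i, c in enumerate(ordered, start=1)
    ]

lines = format_citations([
    Citation("workpaper", "WP-2023-014", 0.72, "/workpapers/WP-2023-014"),
    Citation("finding", "F-2022-031", 0.88, "/findings/F-2022-031"),
])
```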
Configuration
RAG is configured through:
- Embedding Model — set in Administration → AI Models (type: embedding)
- Vector Database — PostgreSQL with pgvector extension
- Indexing Schedule — data is re-indexed periodically via background worker
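For orientation, a similarity search against pgvector typically looks like the query built below. The `<=>` operator is pgvector's cosine-distance operator, and a vector can be passed as a text literal such as `[0.1,0.2]` cast to `vector`. Everything else is an assumption: the `audit_embeddings` table, its columns, and the psycopg-style `%s` placeholders are hypothetical and do not describe AIIA's schema.

```python
def to_vector_literal(values: list[float]) -> str:
    """Format a Python list as a pgvector text literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"

# Hypothetical table and column names; `<=>` is pgvector cosine distance.
SEARCH_SQL = """
SELECT record_id, 1 - (embedding <=> %s::vector) AS similarity
FROM audit_embeddings
WHERE org_id = %s
ORDER BY embedding <=> %s::vector
LIMIT %s
"""

def build_search(query_vec: list[float], org_id: str, k: int = 5) -> tuple[str, tuple]:
    """Return the SQL text and its parameters; the caller executes them."""
    literal = to_vector_literal(query_vec)
    return SEARCH_SQL, (literal, org_id, literal, k)

sql, params = build_search([0.1, 0.2, 3], "org-a", k=2)
```

Passing the vector and the organization as parameters, rather than formatting them into the SQL string, keeps the query safe from injection.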
On-Premises Support
For air-gapped deployments:
- Use Ollama with a local embedding model
- All data stays within your infrastructure
- No external API calls required
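As a rough illustration of what a local embedding call involves, the sketch below posts text to an Ollama server on the same machine using only the Python standard library. It assumes Ollama is listening on its default port 11434 and that an embedding model has already been pulled; `nomic-embed-text` is used here only as an example model name. AIIA makes this call itself once the model is configured, so this code is not required for setup.

```python
import json
import urllib.request

# Ollama's default local address; nothing leaves the host.
OLLAMA_URL = "http://localhost:11434/api/embeddings"

def build_request(text: str, model: str = "nomic-embed-text") -> urllib.request.Request:
    """Build the POST request for Ollama's embeddings endpoint."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def embed(text: str, model: str = "nomic-embed-text") -> list[float]:
    """Send the request and return the embedding vector from the response."""
    with urllib.request.urlopen(build_request(text, model), timeout=30) as resp:
        return json.load(resp)["embedding"]
```

Calling `embed("segregation of duties")` requires a running Ollama instance; `build_request` can be inspected without one.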