Case Study
Cutting Loan-Underwriting Cycle Time 70% at an Indian NBFC
A mid-tier Indian NBFC asked us to compress the underwriting cycle for unsecured personal and SME loans. The baseline cycle from application to disbursement was 4–6 days; the target was under one day. The constraint: nothing we shipped could compromise the firm’s standing under RBI’s IT Governance directions or DPDPA 2023.
Eighteen weeks after kickoff, the system was live. Median cycle time landed at 22 hours, and the 90th percentile at 38 hours. This is what we built and what we learned.
The original cycle
Applications came in via the mobile app or a branch DSA (direct selling agent). KYC document processing and the bureau pull were automated. Underwriting then required a credit officer to:
1. Read the application and bureau report.
2. Pull additional documents (bank statements, GST returns for SME applicants, payslips for salaried applicants) and review them.
3. Run policy rules against the file: DTI, employment continuity, banking behaviour, business vintage.
4. Write a credit note explaining the decision.
5. Route for sanction (one or two levels, depending on ticket size).
Steps 2 and 4 consumed the most time. Step 2 was bottlenecked on manual document review. Step 4 was bottlenecked on the credit officer’s writing throughput.
What we built
A multi-agent system with three primary agents and a coordinator, each with a narrow scope and an explicit audit trail.
- Document agent. Ingested bank statements, ITRs, GST returns, payslips. Extracted structured features (average balance, salary credit pattern, return-filing regularity, GST turnover trend). Used vision-enabled models for scanned documents, with rule-based parsing for digital PDFs.
- Policy agent. Ran the firm’s written credit policy against the extracted features and the bureau pull. Returned a structured decision with policy citations.
- Narrative agent. Drafted the credit note. Always cited the underlying evidence. Never invented a fact.
- Coordinator. Orchestrated the flow, managed retries, handed off to a human credit officer for review.
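To make the policy agent's output contract concrete ("a structured decision with policy citations"), here is a minimal sketch of a deterministic rule check that returns a recommendation plus a cited result for every rule it evaluated. Everything specific in it is hypothetical: the `Features` fields, the clause ids such as `POL-3.1`, the thresholds, and the "one breach means refer" logic are placeholders, not the firm's credit policy. It shows the shape of a policy-cited decision, not how the agent was actually implemented.

```python
from dataclasses import dataclass, field


# Hypothetical extracted features; names are illustrative, not the firm's schema.
@dataclass
class Features:
    monthly_income: float        # Rs, from salary credits or GST turnover
    monthly_obligations: float   # Rs, existing EMIs plus the proposed loan's EMI
    months_employed: int         # employment continuity
    bounced_emis_6m: int         # banking behaviour: EMI bounces in the last 6 months


@dataclass
class RuleResult:
    clause: str      # citation into the written credit policy (hypothetical ids)
    passed: bool
    detail: str      # evidence the credit officer can check


@dataclass
class PolicyDecision:
    recommendation: str                              # "approve" | "refer" | "decline"
    results: list[RuleResult] = field(default_factory=list)


# Hypothetical thresholds; real values live in the firm's credit policy.
MAX_DTI = 0.50
MIN_MONTHS_EMPLOYED = 12


def evaluate(f: Features) -> PolicyDecision:
    """Evaluate every rule, keep a citation for each, and derive an advisory recommendation."""
    dti = f.monthly_obligations / f.monthly_income if f.monthly_income > 0 else float("inf")
    results = [
        RuleResult("POL-3.1 DTI", dti <= MAX_DTI, f"DTI {dti:.2f} vs max {MAX_DTI:.2f}"),
        RuleResult("POL-3.4 Employment", f.months_employed >= MIN_MONTHS_EMPLOYED,
                   f"{f.months_employed} months vs min {MIN_MONTHS_EMPLOYED}"),
        RuleResult("POL-4.2 Banking", f.bounced_emis_6m == 0,
                   f"{f.bounced_emis_6m} EMI bounces in 6 months"),
    ]
    failures = [r for r in results if not r.passed]
    if not failures:
        recommendation = "approve"
    elif len(failures) == 1:
        recommendation = "refer"    # single breach: flag the clause for the officer
    else:
        recommendation = "decline"
    return PolicyDecision(recommendation, results)
```

Returning every rule result, not just the failures, is what lets the officer (and later the compliance team) see which clauses were checked and passed, not only which ones tripped.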
Every output was advisory. The credit officer remained the decision-maker of record. The system pre-filled the decision and the note; the officer reviewed, edited, and sanctioned.
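The coordinator's role (run the agents in order, retry transient failures, and stop at a human) can be sketched as follows. `coordinate`, `sanction`, the retry parameters, and the stub agent callables are hypothetical names for illustration. The structural point is that the pipeline's terminal state is a draft awaiting officer review, and the record stores the officer's call as the decision of record, with the system's recommendation and any override kept alongside it.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Draft:
    application_id: str
    recommendation: str                       # advisory only
    credit_note: str
    status: str = "pending_officer_review"    # the coordinator never sanctions


def with_retries(fn: Callable[[], Any], attempts: int = 3, backoff_s: float = 0.1) -> Any:
    """Call fn, retrying with exponential backoff; re-raise after the last attempt."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(backoff_s * 2 ** attempt)


def coordinate(app_id: str,
               extract: Callable[[str], dict],
               decide: Callable[[dict], str],
               draft_note: Callable[[dict, str], str]) -> Draft:
    """Run document -> policy -> narrative agents in order, then hand off for human review."""
    features = with_retries(lambda: extract(app_id))
    recommendation = with_retries(lambda: decide(features))
    note = with_retries(lambda: draft_note(features, recommendation))
    return Draft(app_id, recommendation, note)


def sanction(draft: Draft, officer_id: str, decision: str,
             edited_note: Optional[str] = None) -> dict:
    """Record the officer's decision as the decision of record; keep the system's advice for audit."""
    return {
        "application_id": draft.application_id,
        "system_recommendation": draft.recommendation,
        "decision_of_record": decision,
        "officer_id": officer_id,
        "overridden": decision != draft.recommendation,
        "credit_note": edited_note if edited_note is not None else draft.credit_note,
    }
```

Capturing `overridden` explicitly makes human overrides a first-class, queryable signal rather than something reconstructed later from diffs.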
Where the four weeks of evals went
We refused to ship until we had run the system in shadow mode against 12,000 historical applications with known outcomes. The eval harness measured:
- Document extraction accuracy. Field-level accuracy against ground truth on a stratified sample. We were ruthless on bank statement parsing — the long tail of formats from Indian banks, cooperative banks, and small finance banks is genuinely hard.
- Policy decision concordance. Did the policy agent agree with the credit officer’s recorded decision? Where it disagreed, why? Some disagreements were the model being wrong. Some were the model catching policy violations the human had overlooked. Both were valuable.
- Narrative quality. Blind review by senior credit officers comparing model-drafted notes to human-drafted notes. We did not ship until the model notes were preferred in at least 70% of blind comparisons.
- Latency and cost. The per-application inference cost target was Rs 12. We hit it through model-tier routing: small models for structured extraction, larger models only for narrative generation, and prompt caching on the policy document.
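The first three measures reduce to simple computations once shadow-mode outputs are aligned with the historical ground truth. A minimal sketch, assuming exact-match field comparison (a real harness would likely need tolerances for amounts and normalisation for dates) and omitting the stratified sampling; the function names and data shapes are illustrative, not the harness we ran.

```python
from collections import Counter


def field_accuracy(predicted: list[dict], truth: list[dict]) -> dict[str, float]:
    """Per-field exact-match accuracy over aligned (prediction, ground truth) pairs."""
    if len(predicted) != len(truth):
        raise ValueError("predictions and ground truth must be aligned")
    hits: Counter = Counter()
    totals: Counter = Counter()
    for pred, gold in zip(predicted, truth):
        for name, expected in gold.items():
            totals[name] += 1
            hits[name] += int(pred.get(name) == expected)
    return {name: hits[name] / totals[name] for name in totals}


def concordance(model: list[str], officer: list[str]) -> tuple[float, list[int]]:
    """Agreement rate with the officer's recorded decision, plus indices of disagreements to triage."""
    if not officer or len(model) != len(officer):
        raise ValueError("need equal-length, non-empty decision lists")
    disagreements = [i for i, (m, o) in enumerate(zip(model, officer)) if m != o]
    return 1 - len(disagreements) / len(officer), disagreements


def ship_gate(model_preferred: list[bool], threshold: float = 0.70) -> bool:
    """True when blind reviewers preferred the model-drafted note in at least `threshold` of comparisons."""
    return sum(model_preferred) / len(model_preferred) >= threshold
```

Returning the disagreement indices rather than only the rate matters: each one is a file to triage as either a model error or a policy breach the officer missed.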
Governance architecture
Because this is BFSI in India, governance was not a checklist. It was the architecture.
- Data residency. All inference inside Indian regions, on the firm’s VPC. No prompts or responses traversed a foreign network.
- Consent and purpose. Customer data was processed under existing loan-application consents. We did not retrain any model on customer data; the system was retrieval and policy-grounded only.
- Audit log. Every agent invocation logged in append-only storage — prompts, retrieved context, tool calls, structured outputs, human overrides. Tamper-evident. Queryable by the compliance team without engineering involvement.
- Model risk classification. Tier 2 under the firm’s MRM policy (decision support, human-in-the-loop). Quarterly performance review, annual independent validation, and change control via the firm’s existing model-risk committee.
- Adverse action. Where the system recommended a decline, the rationale was structured and policy-cited, so it could be used directly in the written statement of rejection reasons the regulator requires lenders to give applicants.
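Tamper evidence in an append-only log is commonly achieved by hash-chaining: each entry stores the hash of its predecessor, so editing any past entry breaks verification from that point on. We have not described how the firm's log was built; this is a minimal in-memory sketch of the technique, with a hypothetical event schema (events must be JSON-serialisable).

```python
import hashlib
import json
import time


class AuditLog:
    """In-memory hash-chained log: each entry commits to the previous entry's hash."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.entries: list[dict] = []

    @staticmethod
    def _digest(body: dict) -> str:
        # Canonical JSON (sorted keys) so the same body always hashes the same way.
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def append(self, event: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        body = {"ts": time.time(), "prev": prev, "event": event}
        digest = self._digest(body)
        self.entries.append({**body, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited, reordered, or deleted entry makes this False."""
        prev = self.GENESIS
        for entry in self.entries:
            body = {"ts": entry["ts"], "prev": entry["prev"], "event": entry["event"]}
            if entry["prev"] != prev or self._digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

A chain on its own only detects edits by someone who cannot also recompute every later hash. In practice the latest hash is usually anchored somewhere the writer cannot modify, such as WORM storage or a separate system.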
What we got right
The discipline of shadow-mode evals on real historical data was non-negotiable. It caught issues that no demo would have surfaced. Bank-statement parsing was harder than we expected: we built a dedicated test set covering 30 bank formats and used it to evaluate four document-AI vendors before settling on the final architecture.
We kept the credit officer in the seat. The system never autonomously approved a loan. This was both a regulatory choice and an organisational one — credit risk staff had to trust the system before they would use it well, and the only way to earn that trust was to give them the final call.
What we got wrong, and fixed
The first version of the narrative agent over-explained. Credit officers wanted terse notes that flagged the two or three things they should check. We had built notes that read like an essay. Rewriting the prompt and the eval rubric to reward terseness cut both token cost and reviewer time.
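A terseness rubric can be backed by a cheap automated check on each draft. The sketch below is a hypothetical lint, not the rubric we used: it assumes notes list their flags as `- ` bullets and cite evidence with ids like `[E1]`, both invented conventions here, and rejects drafts with more than three flags, uncited flags, or citations to evidence that was never retrieved.

```python
import re

# Hypothetical citation convention: evidence referenced as [E1], [E2], ...
CITATION = re.compile(r"\[(E\d+)\]")


def check_note(note: str, evidence_ids: set[str], max_flags: int = 3) -> list[str]:
    """Lint a drafted credit note; an empty result means it passes."""
    problems = []
    # Treat each "- " bullet as one flag for the officer to check.
    flags = [ln.strip() for ln in note.splitlines() if ln.strip().startswith("- ")]
    if len(flags) > max_flags:
        problems.append(f"{len(flags)} flags; limit is {max_flags}")
    for flag in flags:
        cited = CITATION.findall(flag)
        if not cited:
            problems.append(f"uncited flag: {flag}")
        for eid in cited:
            if eid not in evidence_ids:
                problems.append(f"unknown evidence id {eid}: {flag}")
    return problems
```

A check like this cannot judge whether the flags are the right ones; that still needs human review. It only keeps length and citation discipline from drifting between prompt revisions.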
We under-invested in GST extraction in the first release. Mid-engagement we discovered that the firm’s SME book was growing faster than its salaried book, and SME applications depend far more on GST returns than on bank statements. We rebuilt the GST extraction in week 12 and recovered the timeline.
What changed
| Metric | Before | After |
|---|---|---|
| Cycle time (median) | 4 days | 22 hours |
| Cycle time (P90) | 6 days | 38 hours |
| Credit officer throughput | 14 files/day | 38 files/day |
| Cost per underwritten file | Rs 240 | Rs 90 |
| Disbursement-stage rejection | 8.1% | 5.9% |
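The headline percentages follow directly from the table. Assuming the "before" figures are 24-hour calendar days (the table does not say whether they are business days):

```latex
\[
\begin{aligned}
\text{Median cycle time:}\quad & 1 - \frac{22\ \text{h}}{4 \times 24\ \text{h}} = 1 - \frac{22}{96} \approx 77\%\ \text{reduction} \\
\text{P90 cycle time:}\quad & 1 - \frac{38\ \text{h}}{6 \times 24\ \text{h}} = 1 - \frac{38}{144} \approx 74\%\ \text{reduction} \\
\text{Officer throughput:}\quad & \frac{38}{14} \approx 2.7\times \\
\text{Cost per file:}\quad & 1 - \frac{90}{240} = 62.5\%\ \text{reduction} \\
\text{Disbursement-stage rejection:}\quad & 8.1\% - 5.9\% = 2.2\ \text{pp};\quad \frac{2.2}{8.1} \approx 27\%\ \text{relative reduction}
\end{aligned}
\]
```

Both cycle-time reductions exceed the 70% in the title.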
The drop in disbursement-stage rejections was the most interesting result: the policy agent caught issues at the underwriting stage that officers had been missing under time pressure. The system did not just go faster; it went better.
What we will publish next
A separate post on the eval harness itself — what it covered, how we ran it, what we would change. Subscribe via RSS if you want it when it lands.
Names and identifying details are anonymised at the client’s request. Engagement metrics are accurate.