Essay
DPDP Act and Generative AI: What Indian Enterprises Must Implement
The Digital Personal Data Protection Act, 2023 (DPDPA) is now in force, and most Indian enterprises building generative AI have spent the last eighteen months getting their compliance approach wrong. Not because the Act is unclear (in several respects it is clearer than GDPR), but because the people writing the privacy policy and the people building the RAG pipeline are rarely in the same room.
This is a practical mapping of what the DPDPA requires, expressed as the architectural decisions you need to make in your LLM stack.
The four obligations that matter for AI
The Act creates many obligations. Four of them collide directly with how generative AI systems are built.
- Consent must be free, specific, informed, unconditional and unambiguous, and it covers only the purpose stated. A user consenting to personal data being used for “service delivery” has not consented to it being used to train a foundation model. In practice, each purpose needs its own consent.
- Purpose limitation is binding. Personal data collected for one purpose cannot be used for another. Embedding a customer’s profile into a vector store for “internal search” does not authorise re-using those embeddings to fine-tune a sales-assistance model.
- Cross-border transfer can be restricted. The central government may, by notification, bar transfers of personal data to specified countries, and stricter sectoral rules continue to apply on top of the Act. Sending embeddings of personal data to a foreign LLM API should be treated as a cross-border transfer.
- The right to erasure is real. A data principal can request deletion. Your AI stack must be able to honour that — including deletion from vector stores, training datasets, fine-tuning corpora, and agent memory.
Each of these has an architectural consequence.
Consent ledger
If you are using personal data anywhere in your AI pipeline (training, embedding, retrieval, agent memory), you need a consent ledger. Not a column in a CRM: a purpose-tagged, timestamped record of which principal consented to which use, with cryptographic non-repudiation. When a principal withdraws consent, every downstream system must be able to find the data that depended on that consent and reconcile.
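A minimal sketch of such a ledger in Python, under stated assumptions: the names (`ConsentLedger`, `ConsentEvent`, the purpose strings) are hypothetical, and storage is an in-memory list rather than a durable append-only store. Chaining each event to the SHA-256 hash of the previous one makes after-the-fact edits detectable. That gives tamper-evidence, not full non-repudiation; for that you would additionally sign each event with an asymmetric key (for example Ed25519), which is omitted here to keep the sketch standard-library only.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first event


@dataclass(frozen=True)
class ConsentEvent:
    principal_id: str
    purpose: str      # hypothetical tags, e.g. "service_delivery", "model_training"
    action: str       # "grant" or "withdraw"
    timestamp: str    # ISO-8601, UTC
    prev_hash: str    # hash of the previous event, linking the chain
    event_hash: str   # SHA-256 over all fields above


def _digest(principal_id, purpose, action, timestamp, prev_hash) -> str:
    payload = {"principal_id": principal_id, "purpose": purpose, "action": action,
               "timestamp": timestamp, "prev_hash": prev_hash}
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()


class ConsentLedger:
    """Append-only, purpose-tagged consent log (in memory, for illustration only)."""

    def __init__(self) -> None:
        self.events = []

    def record(self, principal_id: str, purpose: str, action: str) -> ConsentEvent:
        if action not in ("grant", "withdraw"):
            raise ValueError(f"unknown action: {action}")
        prev = self.events[-1].event_hash if self.events else GENESIS_HASH
        ts = datetime.now(timezone.utc).isoformat()
        event = ConsentEvent(principal_id, purpose, action, ts, prev,
                             _digest(principal_id, purpose, action, ts, prev))
        self.events.append(event)
        return event

    def has_consent(self, principal_id: str, purpose: str) -> bool:
        """Latest event for (principal, purpose) wins; no event means no consent."""
        granted = False
        for e in self.events:
            if e.principal_id == principal_id and e.purpose == purpose:
                granted = e.action == "grant"
        return granted

    def verify(self) -> bool:
        """Recompute the chain; editing any past event breaks it."""
        prev = GENESIS_HASH
        for e in self.events:
            expected = _digest(e.principal_id, e.purpose, e.action, e.timestamp, e.prev_hash)
            if e.prev_hash != prev or e.event_hash != expected:
                return False
            prev = e.event_hash
        return True
```

Withdrawal is recorded as a new event rather than by deleting the grant, so the ledger keeps the history a regulator will ask for while `has_consent` still reflects the current state.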
Architecturally, this means consent is a first-class object in your data plane. Every embedding, every training record, every retrieved chunk carries a reference to the consent record that authorises its presence. When a consent is withdrawn, you can compute the blast radius.
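Computing the blast radius then reduces to a reverse index from consent record to derived artifacts. The sketch below assumes every artifact is registered with the `consent_id` that authorised it at creation time; `Artifact`, `ConsentIndex` and the store names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Artifact:
    artifact_id: str
    kind: str        # "embedding", "training_record", "memory_entry", ...
    store: str       # subsystem holding it, e.g. "vector_store"
    consent_id: str  # consent record that authorised its creation


class ConsentIndex:
    """Reverse index: consent record -> every artifact derived under it."""

    def __init__(self) -> None:
        self._by_consent: Dict[str, List[Artifact]] = defaultdict(list)

    def register(self, artifact: Artifact) -> None:
        # Call this in the same write path that creates the artifact.
        self._by_consent[artifact.consent_id].append(artifact)

    def blast_radius(self, consent_id: str) -> Dict[str, List[str]]:
        """Artifacts affected by withdrawing `consent_id`, grouped by store."""
        radius: Dict[str, List[str]] = defaultdict(list)
        for a in self._by_consent.get(consent_id, []):
            radius[a.store].append(a.artifact_id)
        return dict(radius)
```

A real deployment would persist this index next to the ledger and update it in the same transaction that writes the artifact; otherwise the index drifts from what the stores actually hold.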
Purpose-bound vector stores
The convenient pattern — “embed all customer data into one vector store, query it for any internal use case” — is a DPDPA violation waiting to be enforced. Different purposes need different stores, or one store with hard purpose-tagging at the chunk level and access control that enforces it at query time.
This costs storage. It spares you the regulator’s attention.
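If you take the single-store route, enforcement has to happen inside the query path, not in the calling application. A minimal sketch, assuming each chunk carries the set of purposes its source consent covers; `PurposeBoundStore` and the brute-force cosine ranking are illustrative stand-ins for a real vector database's metadata filtering. In production the `purpose` argument should be derived from the caller's authenticated service identity, not supplied freely.

```python
import math
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    text: str
    vector: Tuple[float, ...]
    purposes: FrozenSet[str]  # purposes this chunk's source consent covers


def _cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class PurposeBoundStore:
    def __init__(self) -> None:
        self._chunks: List[Chunk] = []

    def add(self, chunk: Chunk) -> None:
        # Hard tagging: an untagged chunk is rejected at write time.
        if not chunk.purposes:
            raise ValueError("refusing to store an untagged chunk")
        self._chunks.append(chunk)

    def query(self, vector, purpose: str, k: int = 3) -> List[Chunk]:
        # Filter BEFORE ranking: out-of-purpose chunks are never scored,
        # so they cannot surface through similarity scores or top-k results.
        eligible = [c for c in self._chunks if purpose in c.purposes]
        eligible.sort(key=lambda c: _cosine(vector, c.vector), reverse=True)
        return eligible[:k]
```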
Cross-border discipline
Sending Indian personal data to a foreign-hosted LLM is a cross-border transfer. The simplest architectural response is also the safest: route any inference involving personal data through India-hosted endpoints. Several major cloud AI platforms now offer inference in Indian regions, although model availability varies by region. Where the model you need is not available in India, the choice is between an India-hosted open-weights deployment and a careful legal opinion on the destination jurisdiction.
Treat embeddings of personal data as personal data. This trips up engineering teams who treat the embedding as a one-way hash. It is not: published embedding-inversion attacks can recover much of the original text from the vector alone, which is enough for the embedding to carry the personal data across the border with it.
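The routing rule can be made explicit and fail-closed. In this sketch the endpoint URLs are placeholders, and the `contains_personal_data` / `derived_from_personal_data` flags are assumed to be set upstream by your data classification or lineage tracking. The point it illustrates is that an embedding derived from personal data is routed exactly like the raw text.

```python
from dataclasses import dataclass

# Placeholder endpoints: substitute your provider's actual India-region and default URLs.
INDIA_ENDPOINT = "https://llm.india-region.example/v1"
DEFAULT_ENDPOINT = "https://llm.global.example/v1"  # possibly foreign-hosted


@dataclass(frozen=True)
class InferenceRequest:
    contains_personal_data: bool
    # True for embeddings (or other derived features) computed from personal data.
    derived_from_personal_data: bool = False


def route(req: InferenceRequest, india_available: bool = True) -> str:
    # Derived data is treated exactly like the raw personal data it came from.
    personal = req.contains_personal_data or req.derived_from_personal_data
    if not personal:
        return DEFAULT_ENDPOINT
    if india_available:
        return INDIA_ENDPOINT
    # Fail closed: never fall back silently to a foreign region for personal data.
    raise PermissionError("personal data needs an India-hosted endpoint "
                          "or a documented transfer assessment")
```

Failing closed is the design choice that matters: an outage of the India-hosted endpoint becomes an availability incident rather than an unlogged cross-border transfer.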
Erasure across the stack
When a principal exercises the right to erasure, your obligation extends to every place their data has been processed. For a typical enterprise AI stack, that means:
- The transactional store the data came from.
- The vector store(s) where it was embedded.
- The training corpora it was added to.
- The fine-tuned model weights it influenced, where reproducible.
- The agent memory and conversation logs it appears in.
- The audit logs — which generally cannot be deleted but must be access-restricted.
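An erasure orchestrator makes that list executable: one handler per subsystem, a recorded outcome for each, and a completion timestamp only when every step succeeded. This is a sketch with hypothetical names; the handlers are plain callables you would wire to your actual stores, and audit logs go through an access-restriction step instead of deletion. Fine-tuned weights do not fit the handler model and are addressed by the retraining approach in the following paragraph.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, Optional

# A handler deletes one principal's data from one subsystem and returns a record count.
# A restrictor marks that principal's data as access-restricted (used for audit logs).
Handler = Callable[[str], int]
Restrictor = Callable[[str], object]


@dataclass
class ErasureReport:
    principal_id: str
    requested_at: datetime
    results: Dict[str, str] = field(default_factory=dict)  # subsystem -> outcome
    completed_at: Optional[datetime] = None

    @property
    def complete(self) -> bool:
        return self.completed_at is not None


def erase_principal(principal_id: str,
                    handlers: Dict[str, Handler],
                    restrictors: Dict[str, Restrictor]) -> ErasureReport:
    report = ErasureReport(principal_id, datetime.now(timezone.utc))
    all_ok = True
    for name, delete in handlers.items():
        try:
            report.results[name] = f"deleted {delete(principal_id)}"
        except Exception as exc:  # record the failure and keep going
            report.results[name] = f"failed: {exc}"
            all_ok = False
    for name, restrict in restrictors.items():
        try:
            restrict(principal_id)
            report.results[name] = "access-restricted"
        except Exception as exc:
            report.results[name] = f"failed: {exc}"
            all_ok = False
    # Only stamp completion when every subsystem succeeded; partial erasure stays open.
    if all_ok:
        report.completed_at = datetime.now(timezone.utc)
    return report
```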
Not all of these are trivially reversible. A model fine-tuned on data including the principal’s records does not “forget” that data when you delete the source. The defensible posture is to design for reproducibility — fine-tuning datasets are versioned, training is reproducible, and retraining without the deleted records is operationally feasible within a defined timeline.
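Versioned datasets are what make that retraining commitment checkable. A minimal sketch, assuming each training record is keyed by an ID and mapped to its owning principal (`owners` is a hypothetical lineage table): each version stores a content hash over its sorted records plus the hash of its parent, so you can show which version excluded which principal. The retraining step itself (fixed seed, pinned code and hyperparameters) is not shown. GPU training is not always bit-for-bit deterministic, so the realistic target is retrainability from a known manifest rather than identical weights.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetVersion:
    version: int
    record_ids: tuple   # sorted, so the same records always hash the same
    content_hash: str
    parent_hash: str    # lineage back to the version this one was derived from


def _hash_records(records: dict, ids) -> str:
    h = hashlib.sha256()
    for rid in ids:
        h.update(json.dumps({"id": rid, "data": records[rid]}, sort_keys=True).encode())
    return h.hexdigest()


def make_version(records: dict, version: int = 1, parent_hash: str = "") -> DatasetVersion:
    ids = tuple(sorted(records))
    return DatasetVersion(version, ids, _hash_records(records, ids), parent_hash)


def exclude_principal(records: dict, owners: dict, principal_id: str, prev: DatasetVersion):
    """Derive the next dataset version without one principal's records.

    Raises KeyError if a record has no known owner: a lineage gap should fail loudly.
    """
    kept = {rid: rec for rid, rec in records.items() if owners[rid] != principal_id}
    return kept, make_version(kept, prev.version + 1, prev.content_hash)
```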
The audit trail your regulator will ask for
The Data Protection Board of India has not yet built up a published enforcement record. What it will ask for is, however, reasonably predictable from the Act and from comparable regulators elsewhere. At minimum:
- Consent records with purpose tags and timestamps.
- Data-flow diagrams showing where personal data enters each AI subsystem.
- Cross-border transfer logs.
- Erasure request handling logs with completion timelines.
- Incident response playbooks for personal-data breaches involving AI subsystems.
These are not artifacts you produce after a notice arrives. They are systems you stand up before you ship.
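Erasure completion timelines are the easiest of these to check automatically. The sketch assumes erasure events are written as JSON lines with hypothetical `request_id`, `requested_at` and `completed_at` fields, and takes the SLA as a parameter: set it from the timeline that applies to you under the Act's rules and your own policy, not from this example.

```python
import json
from datetime import datetime, timedelta


def overdue_erasures(log_lines, sla_days: int, now: datetime) -> list:
    """Return request IDs that missed the SLA, whether completed late or still open."""
    overdue = []
    sla = timedelta(days=sla_days)
    for line in log_lines:
        entry = json.loads(line)
        requested = datetime.fromisoformat(entry["requested_at"])
        completed = entry.get("completed_at")
        # Open requests are measured against `now`, so they go overdue on their own.
        finished = datetime.fromisoformat(completed) if completed else now
        if finished - requested > sla:
            overdue.append(entry["request_id"])
    return overdue
```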
What we recommend
Treat DPDPA as a design constraint on your AI architecture, not a privacy-policy revision. The cost of getting it right at design time is small. The cost of retrofitting consent, purpose, and erasure into a deployed generative AI stack is large — and the cost of not retrofitting it is, increasingly, a regulator-shaped problem.
If you would like a DPDPA-and-AI gap assessment, we run them as a fixed-scope engagement.