Understanding Privacy and Data Protection in Legal AI: A Guide for In-House Counsel

Zero Data Retention is critical for in-house legal teams using AI. Learn what it means, why it matters, and how to evaluate legal AI vendors securely.

The In-House Counsel’s Guide to AI and Privacy and Data Protection 

Artificial intelligence is rapidly transforming the legal landscape, offering powerful tools for legal document automation, AI contract review and streamlining in-house workflows. For legal professionals, the central question is no longer if they should adopt legal AI, but how to do so securely and responsibly.

This guide is designed for in-house legal teams navigating the complexities of AI, privacy and data protection. We will demystify critical concepts like “Zero Data Retention (ZDR)” and “training on your data,” providing the clarity you need to evaluate legal AI tools with confidence.

What does "training on your data" actually mean?

"Training on your data" is the process by which an AI provider uses the information you upload, such as prompts, contracts, emails, or internal documents, to improve its AI models. The training may be carried out by the AI provider itself or by the provider of the underlying large language models (LLMs). While this is one way to make AI outputs more relevant over time, it means your confidential information is absorbed and generalized into a system that other customers also use. For legal teams, this raises concerns about confidentiality and privilege.

When lawyers hear that an AI system is “training on data”, it is natural to picture documents being stored, remembered or later reproduced. In reality, however, training refers to updating a model’s internal parameters (often called weights) so that it behaves differently in the future. When you input text into an LLM, that information is used to generate a response in the moment, but it does not automatically become part of the model’s long-term knowledge. Even in systems that do incorporate user data into training, the goal is not to store or replay documents verbatim, but to learn broad statistical patterns across very large datasets. There is a recognized phenomenon known as memorization, where highly distinctive or repeatedly seen text can sometimes be recalled, but this is uncommon and not how these systems are designed to operate. Importantly, a model producing a plausible-sounding answer about a topic is not evidence that it has seen proprietary information; it reflects the model’s ability to predict likely language.
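
For readers who want to see the distinction concretely, the sketch below uses a deliberately tiny, hypothetical one-parameter model. The `TinyModel` class and its method names are illustrative assumptions, not any vendor's code. It shows that using a model to produce an output leaves its weight untouched, while an explicit training step changes it. Real LLMs operate at a vastly larger scale, but the two operations are separate in the same way.

```python
# Toy one-parameter "model": prediction = weight * x.
# Hypothetical illustration only; real LLMs have billions of parameters.


class TinyModel:
    def __init__(self, weight: float) -> None:
        self.weight = weight

    def infer(self, x: float) -> float:
        """Inference: read the weight to produce an output. Nothing is updated."""
        return self.weight * x

    def train_step(self, x: float, target: float, lr: float = 0.1) -> None:
        """Training: nudge the weight to reduce squared error on one example."""
        error = self.infer(x) - target
        gradient = 2 * error * x
        self.weight -= lr * gradient


model = TinyModel(weight=1.0)

model.infer(3.0)                        # using the model on an input
weight_after_inference = model.weight   # still the original weight

model.train_step(x=3.0, target=6.0)     # explicitly training on an example
weight_after_training = model.weight    # the weight has now moved
```

In this sketch, a "no-training" commitment corresponds to a promise that `train_step` is never called with customer inputs, even though `infer` is called on them every time the product is used.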

At Wordsmith, we remove this risk entirely by ensuring that customer data is never used to train any underlying models or included in any pre-training corpus. Your data is used solely to serve you, not to improve models for anyone else.

Wordsmith’s approach: We make a clear and simple promise: we do not train on your data. Instead, our team of legal and software engineers builds tools and bakes legal intelligence into our product. Your information is yours alone and is never used to train our models for other customers. Even within your organisation, we have built a user-first environment, meaning that content such as playbooks and templates is shared with colleagues only if you choose to share it.

What is the difference between "Zero Training" and "Zero Retention"?

This is a critical distinction that is often misunderstood.

  • Zero Training: This is a promise that an AI provider will not use your data to train their models (or underlying models). This is the most important security principle for protecting your confidentiality.

  • Zero Retention: This term is often used to mean that data is discarded immediately after processing. However, for a platform to be useful, it must retain some information. For example, it needs to store your documents in a repository so you can access them later.

Many vendors use these terms interchangeably, but it is essential to understand the nuance. The most important question to ask in addition to "Do you retain my data?" is "Do you train your models on my data?"

How does Wordsmith handle my data?

Wordsmith is designed to give you the full power of AI without compromising your privacy. Here is how we handle your information:

  1. We Do Not Train on Your Data: Your data is never used to improve our AI models for other users. The underlying LLMs will also not train on your data.

  2. Data is Retained for Your Use: When you upload a document or link your Google Drive, we store that information securely so that you can access and use it within your Wordsmith account. This data is used for your enablement and is not accessed by Wordsmith for any other purpose.

  3. You Are in Control: You can delete your data from Wordsmith at any time. Our systems are designed to give you full control over your information. When you delete that data, it is deleted from our systems.

  4. Secure and Compliant: All your data is encrypted in transit and at rest and is stored on secure AWS servers in the EU, ensuring compliance with GDPR. We are also SOC 2 Type 2 certified, demonstrating our commitment to enterprise-grade security.

Why a ‘No-Training’ Policy is Non-Negotiable for In-House Legal Teams

For corporate legal departments, the sensitivity of the data they handle cannot be overstated. It includes unannounced financial results, M&A strategies, litigation tactics, advice and trade secrets. Entrusting this information to an AI platform that doesn’t guarantee your data will not be used for training is not just a security risk; it is a professional and ethical minefield.

1. Upholding Privilege

The cornerstone of the legal profession is the sanctity of privilege. Disclosing privileged communications to a third party for purposes other than providing legal services can waive that privilege. When an AI platform uses your data to train its models, it is acting as a third party using your data for its own benefit, creating a credible risk of waiver. A strict “no-training” policy ensures the AI platform acts as a service provider working on your behalf, not a third-party beneficiary of your data.

2. Adhering to Professional Ethics and Confidentiality

Legal professionals are bound by strict ethical and regulatory duties of confidentiality. These duties, enforced by regulators such as the Solicitors Regulation Authority (SRA) and bar associations, require lawyers to take reasonable steps to prevent the inadvertent or unauthorized disclosure of client information. Using AI contract review software that trains on your data could be seen as a failure to take such reasonable steps.

3. Protecting Competitive Intelligence and Strategic Advantage

In-house counsel are custodians of their company’s most valuable strategic information. From negotiation playbooks to patent filings, the data handled by legal teams is a treasure trove for competitors. An AI platform that trains on customer data, even in aggregated or anonymized form, could inadvertently leak strategic insights. A true “no-training” model is the only way to guarantee that your company’s confidential strategies remain yours alone.

4. Ensuring Regulatory Compliance Across Industries

Modern businesses operate under a patchwork of data protection regulations. While GDPR is the best known in Europe, there are state-level privacy laws throughout the United States and data protection laws in most other jurisdictions. In addition, in-house teams at publicly listed, financial services, construction, pharmaceutical and energy companies, for example, must navigate a host of industry-specific rules.

  • Publicly Listed Companies: Regulations mandate strict controls over the processing and storage of material non-public information (MNPI).

  • Financial Services: Conduct, prudential, anti-money laundering (AML) and sanctions regimes impose strict controls over customer data, transaction monitoring, risk management and disclosures.

  • Healthcare & Pharma: HIPAA and other regulations impose stringent requirements on the handling of Protected Health Information (PHI).

  • Energy: Critical infrastructure data and environmental compliance information require robust security measures.

A “no-training” policy simplifies compliance by ensuring your sensitive data is not being used for purposes other than to advise the business.

How to Evaluate a Legal AI Vendor’s Data Policy: A Checklist for In-House Counsel

Not all AI vendors have the same approach to privacy and data protection. As an in-house professional, it’s helpful to look beyond marketing slogans and understand the underlying policies. Here is a checklist to guide your vendor due diligence:

  1. Scrutinize the Vendor’s Data Policy

  • Does the vendor provide a clear, public commitment that they will not train on your data?

  • Are there any exceptions or caveats? Look for vague language like “except where needed to comply with law or combat misuse,” which could create loopholes.

  2. Investigate the Technical Implementation

  • Is “no-training” the default setting, or is it an optional feature that requires configuration?

  • Where is the data processed and is it encrypted both in transit and at rest? For EU-based companies, ensuring EU data residency is critical.

  3. Verify Compliance and Audit Capabilities

  • Does the vendor have independent, third-party security certifications like SOC 2 Type 2?

  • Are they compliant with relevant regulations like GDPR or CPRA?

  4. Understand the Vendor’s Philosophy

  • Is the platform AI-native and built with security as a core principle, or is AI layered on top of an older tech stack?

  • Does the vendor have in-house legal and security experts who understand the unique challenges of legal data?
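
One way to keep due diligence consistent across vendors is to record the answers in a structured form. The sketch below is a minimal illustration that assumes hypothetical field names mirroring some of the checklist questions above. It is not a complete assessment framework and does not replace legal judgment.

```python
from dataclasses import dataclass


@dataclass
class VendorAnswers:
    """Hypothetical record of a vendor's answers to the checklist questions."""
    public_no_training_commitment: bool
    vague_exceptions_in_policy: bool
    no_training_by_default: bool
    encrypted_in_transit_and_at_rest: bool
    independent_security_certification: bool  # e.g. SOC 2 Type 2
    in_house_legal_and_security_experts: bool


def open_items(answers: VendorAnswers) -> list[str]:
    """Return the checklist points that still need follow-up with the vendor."""
    items = []
    if not answers.public_no_training_commitment:
        items.append("No clear public no-training commitment")
    if answers.vague_exceptions_in_policy:
        items.append("Policy contains vague exceptions or caveats")
    if not answers.no_training_by_default:
        items.append("No-training is not the default setting")
    if not answers.encrypted_in_transit_and_at_rest:
        items.append("Encryption in transit and at rest not confirmed")
    if not answers.independent_security_certification:
        items.append("No independent third-party security certification")
    if not answers.in_house_legal_and_security_experts:
        items.append("No in-house legal and security expertise")
    return items


# Example: an otherwise strong vendor with two points to raise.
example = VendorAnswers(
    public_no_training_commitment=True,
    vague_exceptions_in_policy=True,
    no_training_by_default=False,
    encrypted_in_transit_and_at_rest=True,
    independent_security_certification=True,
    in_house_legal_and_security_experts=True,
)
follow_ups = open_items(example)
```

Here `follow_ups` lists the two points to raise with the vendor: the caveats in its policy and the fact that no-training must be switched on rather than being the default.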

A New Risk Vector: Your Outside Counsel’s AI Tools

While you can control the tools your in-house team adopts, you also need to be aware of the tools your outside counsel may be using. Many law firms are adopting AI tools designed for legal research and analysis. However, these tools may not offer the same “no-training” guarantees as a purpose-built in-house platform like Wordsmith.ai.

This creates a new risk vector for in-house teams. You may have a strict policy against training on your data, but your law firm could be inadvertently exposing your confidential information by using a less secure AI tool.

Questions to Ask Your Outside Counsel:

  • “What AI tools is your firm using on our matters?”

  • “Does your firm have a policy against training AI models on client data?”

  • “What contractual guarantees do you have from your AI vendors that our data will not be used for training?”

  • “How do you ensure that our confidential information is not being exposed to other clients through your use of AI?”

By asking these questions, you can ensure that your data is protected, no matter who is working on it. It also reinforces the value of using a purpose-built in-house platform where you have direct control over data policies.

The Wordsmith Approach: Comprehensive Security for the Modern In-House Team

At Wordsmith, we believe that in-house teams should not have to choose between innovation and security. Our platform was built from the ground up with a deep understanding of the legal profession’s demands. Our approach to privacy and data protection is a core part of our commitment to your security.

Our Commitment:

  • A Clear ‘No-Training’ Promise: We make a clear promise to our customers that your data is never used to train our platform or the underlying LLMs.

  • AI-Native Architecture: Wordsmith is an AI-native platform, not a legacy CLM with an AI layer bolted on. This allows us to enforce security at every level of the stack. We take a multi-provider approach, switching between leading models from providers such as Anthropic, OpenAI, and Google to ensure reliability and avoid vendor lock-in, all while enforcing our strict “no-training” policy.

  • Comprehensive Compliance: We are SOC 2 Type 2 certified and GDPR compliant, with all data hosted within the European Union by default on AWS infrastructure (or in the US if you would prefer). Our platform provides detailed audit logs to meet your compliance needs.

  • Expert-Led Design: Our team includes legal and security experts who have engineered our platform to meet the specific workflows and security needs of in-house teams.

Conclusion: Empowering Legal Teams to Innovate Securely

Zero Data Retention (ZDR) is more than a security feature; it is a business enabler. By eliminating the risks of data leakage and privilege waiver, ZDR allows in-house legal teams to fully leverage the power of legal AI tools. It transforms legal from a cost center wary of new technology into a strategic partner that drives the business forward.

When evaluating in-house counsel software and other legal AI solutions, consider making Zero Data Retention a key part of your evaluation. Understanding a vendor’s approach to data handling is crucial for unlocking the transformative potential of AI while upholding your duty as a guardian of your company’s most sensitive information.

Frequently Asked Questions (FAQ)

What is the difference between “Zero Training” and “Zero Retention”?
“Zero Training” means your data is not used to train AI models. “Zero Retention” is often used to imply that data is immediately discarded, but for a platform to be useful, it must retain data for your use. The most important promise is “Zero Training.”

How does Wordsmith protect privilege?
By guaranteeing that we will not train our models on your data, we ensure that your confidential legal information is not being used for any purpose other than your direct instruction. This helps maintain the integrity of privileged communications.

Can I delete my data from Wordsmith?
Yes. You are in full control of your data and can delete it from our platform at any time.

Where is my data stored?
All your data is encrypted in transit and at rest and is stored on secure AWS servers in the EU. We are also SOC 2 Type 2 certified, demonstrating our commitment to enterprise-grade security.

These measures form part of our wider privacy and data protection program and our Privacy Charter, which is our commitment to protect the data we process and to comply with privacy and data protection laws.

Is Wordsmith SOC 2 certified?
Yes, we are SOC 2 Type 2 certified, which is an independent verification of our enterprise-grade security controls.

Copyright © 2026 Wordsmith AI. All rights reserved. WORDSMITH is a registered trade mark of Wordsmith Law LLP and is used under licence.
