AI Privacy Audit: Which Chatbots Collect Your Data and How to Opt Out

We ranked the major AI chatbots by data collection. Meta AI grabs 32 of 35 possible data types. Here's what each service collects and how to protect yourself.

Every AI chatbot you use is collecting data about you. The question is how much, what they do with it, and whether you can stop them.

Surfshark analyzed the data collection practices of the top AI chatbots, examining 35 possible data types. The results vary wildly - from Meta AI’s aggressive 32-type collection to ChatGPT’s comparatively restrained 10.

Here’s what each major service collects and how to limit your exposure.

The Rankings: Who Collects the Most

| Chatbot | Data Types Collected | Tracks Users | Third-Party Ads |
| --- | --- | --- | --- |
| Meta AI | 32 of 35 | Yes | Yes |
| Google Gemini | 22 of 35 | No | No |
| Microsoft Copilot | ~15 | Yes | No |
| DeepSeek | 11 | No | No |
| ChatGPT | 10 | No | No |
| Perplexity | ~10 | No | No |
| Claude | Minimal | No | No |

Meta AI stands alone in collecting financial information, health and fitness data, and even sensitive categories like racial/ethnic data and sexual orientation. It’s also the only major chatbot that uses your data for third-party advertising - 24 different data types can be shared with advertisers.

What Each Service Actually Collects

ChatGPT (OpenAI)

Collects: Contact info, conversation content, identifiers, usage data, diagnostics.

Training: Free-tier conversations may be used to improve models unless you opt out. ChatGPT Team, Enterprise, and API plans never use your data for training.

Retention: Chat history stored until you delete it. Deleted content removed from systems within 30 days.

Human review: Authorized staff can review conversations for model improvement, bug fixes, and safety investigations.

Encryption: TLS 1.2+ in transit, AES-256 at rest. No end-to-end encryption.

Google Gemini

Collects: Precise location, contacts, user content, search history, browsing history, identifiers, usage data.

Training: Consumer Gemini uses your content to improve services. Enterprise versions don’t.

Retention: Human-reviewed conversations kept for up to three years - even if you delete your activity.

The catch: Google explicitly warns users not to enter confidential information they wouldn’t want a human reviewer to see.

Claude (Anthropic)

Collects: Account info, conversation content, usage data.

Training: As of late 2024, Anthropic updated its policy to let users opt in to data training. The default depends on when you signed up: older accounts weren't automatically enrolled, while newer accounts should check their settings.

Retention: Standard 30-day retention. If you opt into training, retention extends to five years.

Selling data: Explicitly does not sell user data to third parties.

Microsoft Copilot

Collects: Varies by product. Consumer Copilot tracks browsing history, app usage, document interactions, and communication patterns across Microsoft services.

Training: Microsoft 365 Copilot doesn’t use prompts or responses for foundation model training. Consumer Copilot is murkier.

The expansion: In 2026, Microsoft expanded Copilot’s data collection to include activity across Windows, Edge, Bing, and other services. This is often enabled by default.

Encryption: Enterprise-grade for business products. Consumer protections are less clear.

Meta AI

Collects: Almost everything - 32 of 35 data categories including location, financial info, health data, contacts, browsing history, and “sensitive information” like racial/ethnic data.

Training: All your public Facebook and Instagram content trains Meta AI. This includes photos, comments, and posts from anyone over 18.

Opt-out reality: U.S. users have no real opt-out option. EU users could previously object under GDPR, but Meta removed that option in May 2025.

You can’t turn it off: There’s no switch to disable Meta AI on Facebook, WhatsApp, or Instagram. It’s embedded in search bars and messaging.

Perplexity

Collects: Search queries, account details, device info, usage data.

Training: Free and Pro users have AI Data Retention enabled by default, but can opt out in settings.

Email protection: Perplexity explicitly states email service information is never used for AI training.

Enterprise: Enterprise data is never used for training.

DeepSeek

Collects: User input including chat history, device info, usage data - 11 data types total.

Where it goes: Data is stored on servers in the People’s Republic of China and retained “as long as necessary.”

The concern: Subject to Chinese data laws, which can compel disclosure to government authorities.

How to Opt Out

ChatGPT

  1. Go to Settings > Data Controls
  2. Turn off “Improve the model for everyone”
  3. Alternative: Use temporary chat mode for sensitive conversations

Google Gemini

  1. Go to Gemini Activity settings
  2. Select “Turn off” > “Turn off and delete activity”
  3. Note: Data is still retained for 72 hours “for stability”

Claude

  1. Go to Settings > Privacy
  2. Check whether “Allow use of data to improve Claude” is enabled
  3. Turn it off if you don’t want training participation

Microsoft Copilot

  1. Visit your Microsoft account privacy page
  2. Navigate to Privacy settings
  3. Turn off “Model Training on Text” and “Model Training on Voice”
  4. For Copilot’s expanded data collection: dig through Windows privacy settings

Meta AI

U.S. users: You cannot opt out. Your only option is making your Facebook account private - Meta says it won’t use private account data.

EU users: The GDPR objection form was removed in May 2025. You’re now included.

Perplexity

  1. Go to Settings
  2. Find and disable AI Data Retention
  3. Note: This doesn’t affect Enterprise accounts (which never train)

The Most Private Options

If privacy is your priority, here are your best choices:

For a commercial chatbot: Claude offers the strongest default privacy protections. Older accounts aren't enrolled in training by default, and the service doesn't track users or show third-party ads.

For maximum privacy: Duck.ai strips your IP address and identifying metadata before sending prompts to underlying models (OpenAI, Anthropic, or Meta). You get AI capabilities without direct tracking.

For complete control: Run a local model. Ollama with Llama 3 or similar means your data never leaves your machine. Zero collection, zero training, zero third parties.
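As a rough illustration of what "never leaves your machine" means in practice: Ollama serves a REST API on localhost (port 11434 by default), so every prompt is an HTTP request to your own computer. This is a minimal sketch, assuming Ollama is installed and a `llama3` model has been pulled locally; the endpoint and field names follow Ollama's documented `/api/generate` API.

```python
import json
import urllib.request

# Ollama listens on localhost only by default - prompts never cross the network.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def ask_local(prompt: str) -> str:
    """Send the prompt to the local model and return its reply."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    # Safe to paste sensitive text here: the only recipient is your own machine.
    print(ask_local("Summarize: quarterly revenue was up 12% year over year."))
```

Because the traffic terminates at localhost, there is no provider-side retention, human review, or training pipeline to opt out of.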

What You Can Do

  1. Check your settings now. Most services have opt-outs buried in menus. Find them.

  2. Use incognito/temporary modes. Claude, Perplexity, and Gemini all offer temporary chat options that aren’t saved or trained on.

  3. Don’t put sensitive info in chatbots. Assume human reviewers will see it. Google says this explicitly.

  4. Consider local alternatives. For sensitive work, a local LLM eliminates the privacy question entirely.

  5. Watch for policy changes. Anthropic, Meta, and others have all changed their policies in the past year. What’s opt-in today might be opt-out tomorrow.

The AI industry has made data collection the default. But you don’t have to accept it. A few minutes in settings can significantly reduce what these companies know about you.