What to Ask Before Giving an AI Tool Access to Your Code
“There’s hesitancy around giving LLM access to our entire repo.”
A founder said this to me recently during a product demo, and he’s not alone. Every week I talk to engineering leaders and CTOs who are evaluating AI-powered tools that need code access. They’re excited about the potential, but worried about the security implications.
Those concerns are valid. Your codebase contains business logic, security implementations, proprietary algorithms, and sometimes secrets that shouldn’t leave your infrastructure. Giving an external service read access is a real decision that deserves real scrutiny.
Here’s what I’d ask before granting that access to any vendor.
Question 1: What Exact Access Do You Need?
Not all code access is equal. Some tools need:
- Read access to repository metadata (commits, PRs, branches)
- Read access to diffs and patches
- Read access to full file contents
- Write access (comments, labels, checks)
Work out the least access the tool actually needs. If a tool only needs to analyze diffs, it shouldn’t require access to your entire file tree. If it only needs commit messages, it shouldn’t need to read source code.
Ask: Can you function with a subset of repositories? Can I control which repos you access? What happens if I revoke access to certain repos?
The answer should be granular control, not all-or-nothing.
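If you’re on GitHub, you can check part of this yourself: for classic OAuth tokens, the API echoes the token’s granted scopes back in the `X-OAuth-Scopes` response header. Here’s a minimal sketch (the token value is a placeholder, and GitHub Apps use a different permissions model):

```python
import requests

# Placeholder value: use the token the vendor's integration actually holds.
GITHUB_TOKEN = "ghp_example"

resp = requests.get(
    "https://api.github.com/user",
    headers={"Authorization": f"token {GITHUB_TOKEN}"},
)
resp.raise_for_status()

# For classic OAuth tokens, GitHub reports the granted scopes on every API response.
print("Granted scopes:", resp.headers.get("X-OAuth-Scopes", ""))

# A tool that only analyzes diffs and commit messages shouldn't be holding
# a broad scope like "repo" when something narrower would do.
```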
Question 2: What Gets Sent to LLM Providers?
This is the big one. The tool has access to your code, but what actually gets transmitted to external AI providers?
Some tools send complete file contents to LLMs. Others send snippets. Some send obfuscated or anonymized fragments. Some process everything locally and only send summaries.
The right answer depends on your risk tolerance, but you should know exactly what the data flow looks like.
Questions to ask:
- Do you send raw source code to LLM APIs?
- How do you chunk or fragment code before transmission?
- Do you obfuscate variable names, function names, or identifiable patterns?
- Is there any pre-processing to remove secrets or sensitive patterns?
- What’s the largest context window you send in a single API call?
A good vendor should be able to explain their data flow in detail, not just say “we take security seriously.”
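To make “snippets, not whole files” concrete, here’s a hedged sketch of one pre-processing approach: keep only the changed hunks of a unified diff and cap the payload size before anything is sent to an LLM API. The filtering logic and size cap are illustrative, not a description of any specific vendor’s pipeline.

```python
MAX_PAYLOAD_CHARS = 12_000  # illustrative cap, not any provider's real limit

def minimal_diff_payload(unified_diff: str) -> str:
    """Keep file headers, hunk headers, and changed lines from a unified diff,
    dropping unchanged context, and cap the size before transmission."""
    kept = []
    for line in unified_diff.splitlines():
        # Changed lines and headers carry the signal; unchanged context lines
        # are dropped so less of your code ever leaves your infrastructure.
        if line.startswith(("+++", "---", "@@", "+", "-")):
            kept.append(line)
    return "\n".join(kept)[:MAX_PAYLOAD_CHARS]
```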
Question 3: Which LLM Providers Do You Use?
Different AI providers have different data policies. OpenAI’s API terms differ from Anthropic’s, which differ again from the terms governing an open-source model you might self-host.
Key questions:
- Which LLM providers power your tool?
- What are their data retention policies?
- Do they use customer data for model training? (Most enterprise APIs do not, but verify.)
- Can you use a specific provider if our security policy requires it?
- Do you support self-hosted or on-premise LLM options?
For enterprise customers with strict data residency requirements, the ability to use specific providers or self-hosted models can be a dealbreaker.
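Provider choice should be configuration, not a code change on the vendor’s side. Here’s a hypothetical sketch of what customer-side control over providers could look like; the field names and provider list are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class LLMPolicy:
    """Hypothetical customer-side policy for which model providers a tool may call."""
    allowed_providers: tuple[str, ...]  # only providers your security team has approved
    allow_self_hosted: bool             # permit an on-prem or VPC-hosted model
    allow_training_on_data: bool        # should be False under most enterprise terms

# Example: restrict the tool to a self-hosted model plus one approved API provider.
policy = LLMPolicy(
    allowed_providers=("self-hosted", "anthropic"),
    allow_self_hosted=True,
    allow_training_on_data=False,
)
```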
Question 4: What Compliance Certifications Do You Have?
Compliance frameworks exist to systematize these questions. SOC 2 Type II is the baseline for most B2B SaaS. It means an independent auditor has verified that security controls are in place and operating effectively over time.
Beyond SOC 2, depending on your industry:
- HIPAA for healthcare data
- GDPR compliance for EU data
- ISO 27001 for international standards
- FedRAMP for government work
Ask for the actual reports, not just badges on a website. A SOC 2 Type II report is a real document you can review: read at least the summary of controls and any exceptions noted.
Question 5: How Do You Handle Secrets?
Every codebase has secrets: API keys, database credentials, tokens, internal URLs. Some are properly managed in secret stores. Many are accidentally committed somewhere in git history.
A responsible AI tool should:
- Not require access to secret management systems
- Scan for and exclude potential secrets before anything is transmitted to an LLM
- Document what happens if a secret is accidentally processed
Ask specifically: If someone commits an API key to our repo, what stops it from reaching an LLM API?
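The kind of last-line-of-defense check worth asking about looks something like the sketch below: pattern-based redaction run over anything headed to an LLM API. Real scanners (gitleaks, trufflehog, GitHub secret scanning) use far larger rule sets plus entropy checks; these three patterns are only illustrative.

```python
import re

# Illustrative patterns only; production secret scanners ship hundreds of rules.
SECRET_PATTERNS = [
    re.compile(r"ghp_[A-Za-z0-9]{36}"),                      # GitHub classic personal access token
    re.compile(r"AKIA[0-9A-Z]{16}"),                          # AWS access key ID
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),  # private key headers
]

def redact_secrets(text: str) -> str:
    """Replace anything that looks like a credential before it leaves your infrastructure."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```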
Question 6: What’s Your Breach Response Plan?
No system is perfectly secure. What matters is what happens when something goes wrong.
Questions to ask:
- How would you detect if someone accessed our code inappropriately?
- What’s your incident notification timeline?
- Do you have cyber insurance?
- Can you provide references from customers who’ve been through your security review process?
The willingness to discuss breach scenarios honestly is itself a signal. Vendors who wave away security questions are often the ones you should worry about most.
Question 7: Can We Audit Access?
Trust but verify. You should be able to see what the tool is actually doing with your code.
Ask about:
- Access logs showing which repositories and files the tool touched
- Audit trails for any data transmitted to external services
- Ability to export or review what data the tool processed
- Retention policies for logs and processed data
If you can’t audit it, you can’t verify the answers to any other question.
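What counts as a usable audit trail varies, but you want to be able to answer: what was read, when, and what left the boundary. Here’s a hypothetical record shape to compare against whatever a vendor actually exports; the fields are my suggestion, not any tool’s real schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class AccessLogRecord:
    """Hypothetical shape of an access-log entry worth asking a vendor for."""
    timestamp: datetime          # when the access happened
    repository: str              # which repo was touched
    path: str                    # which file, diff, or ref was read
    action: str                  # e.g. "read_diff", "read_file", "post_comment"
    sent_to_llm: bool            # did any of this content go to an external API?
    llm_provider: Optional[str]  # which provider received it, if any
    bytes_transmitted: int       # how much data crossed the boundary
```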
Question 8: What Happens When We Offboard?
The relationship won’t last forever. When you stop using the tool:
- Is all processed data deleted?
- What’s the timeline for deletion?
- Can you get a certificate of data destruction?
- What persists in their systems after you leave?
Document this before you start, not when you’re trying to leave.
A Framework for Evaluation
Different teams have different risk tolerances. A two-person startup building a consumer app has different constraints than a financial services company with regulatory requirements.
Here’s a simple framework:
Low sensitivity (public repos, open source): Basic SOC 2, clear data policies, reputable LLM providers. Move fast.
Medium sensitivity (private repos, proprietary but not regulated): SOC 2 Type II verified, detailed data flow documentation, obfuscation of code before LLM transmission, audit logging.
High sensitivity (regulated industries, security-critical systems): Everything above plus on-premise options, specific LLM provider requirements, legal review of contracts, reference calls with similar customers.
The Trust Decision
Ultimately, giving an AI tool access to your code is a trust decision. You’re trusting that the vendor will handle your data responsibly, that their security controls work, and that the benefit is worth the risk.
Vendors earn that trust through transparency, not marketing claims.
The vendors who are doing security well will have good answers to these questions. They’ll have the documentation ready. They’ll welcome the scrutiny.
The ones who don’t? You probably don’t want them reading your code anyway.
If you have security questions about how Changebot handles code access, let’s chat. We’re SOC 2 Type II certified and happy to walk through our security architecture in detail.