LLMs in CI/CD Security


AI coding assistants are now part of normal software delivery, but the security model around CI/CD has not changed: every artifact deployed to production should be reviewed, built in a controlled environment, scanned, signed, and traceable. The new risk is not that an LLM is magically malicious. The risk is that AI-assisted changes can be accepted faster than teams can reason about their impact. A pull request generated with help from an LLM may include a vulnerable dependency, an over-permissive IAM policy, a risky GitHub Actions permission, a copied secret, or a Kubernetes manifest that weakens workload isolation. Treat that code as untrusted input until the pipeline proves otherwise.

This article focuses on a practical security workflow for CI/CD teams, not on generic AI fear. The goal is to keep developer speed while adding specific controls around AI-assisted code. If your platform also runs Kubernetes, pair this guidance with the Secpros guide to <a href="/blog/kubernetes-secrets-management-beyond-base64/">Kubernetes secrets management</a> and the article on <a href="/blog/kubernetes-gitops-admission-provenance/">GitOps admission and provenance</a>.

## Start with a threat model for AI-assisted changes

The useful question is not, "Did an LLM write this?" The useful question is, "What trust boundary did this change cross?" A comment-only change has a different risk profile from a Terraform change that grants cloud permissions, a workflow file that receives secrets, or a Dockerfile that downloads binaries during build. CI/CD security should classify pull requests by impact and then apply checks based on that impact.

For AI-assisted code, include at least four threat categories. First, insecure generated patterns: SQL queries without parameterization, weak crypto choices, shell injection, unsafe deserialization, or broad CORS configuration. Second, supply-chain risk: new packages, changed lockfiles, container base image changes, curl-piped installers, or unpinned GitHub Actions. Third, pipeline abuse: expanded workflow permissions, pull_request_target misuse, secret exposure to forked PRs, or new deployment jobs. Fourth, infrastructure drift: overly broad IAM roles, public storage buckets, privileged Kubernetes pods, or admission bypass annotations.

OWASP's LLM Top 10 is useful here because it frames prompt injection, insecure output handling, supply-chain concerns, and excessive agency as engineering risks rather than abstract AI concerns. In CI/CD terms, insecure output handling means accepting generated code without validation. Excessive agency means allowing generated scripts or agents to run with credentials that can deploy, delete, or exfiltrate without human review.

## Pipeline controls that actually reduce risk

A strong AI-aware pipeline is layered. No single scanner understands the full context, so combine fast checks on every pull request with deeper checks before deployment. At minimum, run secret detection, dependency and container scanning, infrastructure-as-code policy checks, static analysis for application code, and provenance verification for released artifacts.

Secret detection should run before code lands in the default branch. Tools such as Gitleaks can scan commits and pull requests for common credential formats. This does not replace proper secret handling, but it catches accidental leaks early. In GitHub Actions, make sure secrets are not available to untrusted fork workflows, avoid printing environment variables, and prefer short-lived cloud credentials through OIDC instead of long-lived static keys.

Dependency and container scanning should focus on the exact artifact being built. Trivy can scan dependency manifests, container images, filesystem contents, and infrastructure definitions. The important detail is timing: scan both the source tree in the pull request and the final built image, because the final image may include operating system packages and transitive dependencies that are not visible in the source diff. Prioritize critical vulnerabilities by exploitability and exposure, but require an explicit waiver before shipping with known high-impact findings.
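
One way to enforce the waiver rule is to post-process the scanner report instead of relying only on its exit code. The sketch below assumes a report produced with Trivy's JSON output format, where each entry in `Results` may carry a `Vulnerabilities` list. The function name, the waiver set, and the sample CVE identifiers are hypothetical.

```python
from __future__ import annotations

import json

BLOCKING_SEVERITIES = {"HIGH", "CRITICAL"}


def blocking_findings(report: dict, waived_ids: set[str]) -> list[dict]:
    """Return HIGH/CRITICAL vulnerabilities that have no explicit waiver."""
    findings = []
    for result in report.get("Results") or []:
        # "Vulnerabilities" can be missing or null when a target is clean.
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") not in BLOCKING_SEVERITIES:
                continue
            if vuln.get("VulnerabilityID") in waived_ids:
                continue
            findings.append(
                {
                    "target": result.get("Target"),
                    "id": vuln.get("VulnerabilityID"),
                    "package": vuln.get("PkgName"),
                    "severity": vuln.get("Severity"),
                }
            )
    return findings


# Hypothetical report fragment shaped like Trivy's JSON output.
SAMPLE_REPORT = json.loads("""
{
  "Results": [
    {"Target": "app/requirements.txt", "Vulnerabilities": [
      {"VulnerabilityID": "CVE-0000-0001", "PkgName": "examplelib", "Severity": "CRITICAL"},
      {"VulnerabilityID": "CVE-0000-0002", "PkgName": "otherlib", "Severity": "LOW"}
    ]},
    {"Target": "alpine (os)", "Vulnerabilities": null}
  ]
}
""")
```

Running the same function against the pull-request scan and the final image scan keeps the blocking rule identical at both points in the pipeline.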

Policy-as-code should cover the parts AI tools often touch incorrectly: Terraform, Kubernetes YAML, Helm values, and CI workflow permissions. Checkov, Conftest with Open Policy Agent, or cloud-native policy engines can block known-bad patterns such as 0.0.0.0/0 admin ingress, wildcard IAM actions, privileged pods, hostPath mounts, mutable image tags in production, and GitHub Actions workflows with unnecessary write-all permissions. A policy check is most valuable when the failure message tells the developer exactly how to fix the issue.
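
To illustrate what an actionable failure message looks like, here is a minimal sketch of a policy check over an already parsed Kubernetes pod spec. It is not a replacement for Checkov or Conftest; the function name and the message wording are assumptions, and it covers only three of the patterns listed above.

```python
from __future__ import annotations


def pod_policy_violations(pod_spec: dict) -> list[str]:
    """Check a parsed pod spec and return messages that include a fix."""
    messages = []
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        security = container.get("securityContext") or {}
        if security.get("privileged") is True:
            messages.append(
                f"{name}: container is privileged. "
                "Fix: remove securityContext.privileged or set it to false."
            )
        image = container.get("image", "")
        # A tag can be repointed later; only a digest identifies one image.
        if "@sha256:" not in image:
            messages.append(
                f"{name}: image '{image}' is referenced by a mutable tag. "
                "Fix: pin it by digest, for example repo/app@sha256:<digest>."
            )
    for volume in pod_spec.get("volumes", []):
        if "hostPath" in volume:
            messages.append(
                f"volume {volume.get('name', '<unnamed>')}: hostPath mount. "
                "Fix: use a PersistentVolumeClaim, ConfigMap, or emptyDir."
            )
    return messages
```

Each message names the offending object and the change that clears the failure, so a developer does not need to open the policy source to understand the block.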

## Practical workflow: review an AI-assisted pipeline change

Use this workflow when a pull request modifies application code, dependency manifests, Dockerfiles, Terraform, Kubernetes manifests, Helm charts, or CI files and the author says they used an AI assistant. You can also apply it to unknown-origin changes, because the controls are source-agnostic.

### Step 1: Label the change by blast radius

Classify the pull request before reading every line. Low risk: documentation, tests, or isolated refactors without dependency or deployment changes. Medium risk: application code paths, package updates, Dockerfile changes, or Helm values. High risk: CI workflow changes, cloud IAM, secrets handling, admission controls, network exposure, or production deployment logic. High-risk changes require human approval from someone who owns the affected system, even if all automated checks pass.
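
The classification can be automated as a first pass from the list of changed files. This is a minimal sketch: the path patterns are hypothetical and need to match your repository layout, and anything unrecognized defaults to medium risk rather than low.

```python
from __future__ import annotations

from fnmatch import fnmatchcase

# Hypothetical patterns; "*" also matches "/" in fnmatch-style globs.
HIGH_RISK = [".github/workflows/*", "*.tf", "iam/*", "deploy/*", "admission/*"]
LOW_RISK = ["docs/*", "*.md", "tests/*"]

ORDER = {"low": 0, "medium": 1, "high": 2}


def classify_path(path: str) -> str:
    """Map one changed file to a risk level."""
    if any(fnmatchcase(path, pattern) for pattern in HIGH_RISK):
        return "high"
    if any(fnmatchcase(path, pattern) for pattern in LOW_RISK):
        return "low"
    # Application code, manifests, Dockerfiles, Helm values, and unknowns.
    return "medium"


def classify_pull_request(changed_paths: list[str]) -> str:
    """A pull request is as risky as its riskiest changed file."""
    levels = [classify_path(path) for path in changed_paths]
    return max(levels, key=ORDER.__getitem__, default="low")
```

A label produced this way is a routing hint for reviewers, not a verdict: a "low" result still goes through the automated checks, and a "high" result triggers the owner approval described above.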

### Step 2: Inspect the diff for generated-risk patterns

Look for patterns LLMs commonly produce because they are convenient examples: catch-all exception handling, disabled TLS verification, hard-coded tokens, overly broad regex allowlists, permissive CORS, temporary admin credentials, shell commands assembled from user input, and infrastructure examples copied with placeholder defaults. In CI files, check permissions blocks, event triggers, artifact upload/download steps, and whether secrets can reach jobs triggered by untrusted input.
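
A small script can pre-flag some of these patterns in the added lines of a unified diff so the reviewer knows where to look first. The regular expressions below are illustrative assumptions, not a complete rule set, and they will produce both false positives and misses.

```python
from __future__ import annotations

import re

# (pattern, label) pairs; hypothetical heuristics for a first-pass review.
RISK_PATTERNS = [
    (re.compile(r"verify\s*=\s*False"), "TLS verification disabled"),
    (re.compile(r"\bexcept\s*(Exception\s*)?:"), "catch-all exception handler"),
    (re.compile(r"Access-Control-Allow-Origin['\"]?\s*[:,=]\s*['\"]?\*"),
     "wildcard CORS origin"),
    (re.compile(r"(token|secret|password|api_key)\s*[:=]\s*['\"][^'\"]{8,}['\"]",
                re.IGNORECASE),
     "possible hard-coded credential"),
    (re.compile(r"permissions:\s*write-all"), "workflow grants write-all"),
    (re.compile(r"\bpull_request_target\b"), "pull_request_target trigger"),
]


def flag_added_lines(diff_text: str) -> list[tuple[int, str]]:
    """Return (1-based diff line number, label) for risky added lines."""
    findings = []
    for number, line in enumerate(diff_text.splitlines(), start=1):
        # Only added lines matter; "+++" is the file header, not content.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for pattern, label in RISK_PATTERNS:
            if pattern.search(line):
                findings.append((number, label))
    return findings


SAMPLE_DIFF = """\
--- a/client.py
+++ b/client.py
@@ -1,3 +1,6 @@
 import requests
+API_TOKEN = "abcd1234efgh5678"
+resp = requests.get(url, verify=False)
-resp = requests.get(url)
+try:
+    resp.raise_for_status()
+except Exception:
+    pass
"""
```

Flags like these only direct attention; the decision about whether a match is a real problem stays with the human reviewer.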

### Step 3: Run automated checks on the pull request

A baseline GitHub Actions job can run `gitleaks` for secrets, `trivy fs` for dependencies and IaC, `trivy image` after the container build, and a policy tool such as `conftest` or `checkov` for Terraform and Kubernetes manifests. Fail fast on leaked credentials, critical policy violations, and workflow permission escalation. Report lower-severity findings as annotations so developers see the debt without blocking unrelated fixes.

### Step 4: Verify provenance before deploy

After merge, build artifacts in CI rather than on a developer laptop. Generate an SBOM, sign the container image with Sigstore Cosign, and attach provenance that identifies the repository, commit, workflow, and builder. SLSA gives a vocabulary for this: the higher your required assurance level, the more you should care about a hardened build service, non-forgeable provenance, and isolation between builds. Admission control in production should verify signatures and reject unsigned or incorrectly signed images.
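
As a rough illustration of the verification step, the sketch below builds a `cosign verify` command for an image signed keylessly from a GitHub Actions workflow. It assumes Cosign 2.x and its `--certificate-identity` and `--certificate-oidc-issuer` flags; the helper name, repository, and workflow path are hypothetical, and the helper only constructs the command rather than running it.

```python
from __future__ import annotations

import re

GITHUB_OIDC_ISSUER = "https://token.actions.githubusercontent.com"
# Require a digest reference so the verified image cannot be swapped later.
DIGEST_REF = re.compile(r"^[^@\s]+@sha256:[0-9a-f]{64}$")


def cosign_verify_command(
    image_ref: str,
    repository: str,
    workflow_path: str,
    git_ref: str = "refs/heads/main",
) -> list[str]:
    """Build the argument list for verifying a keyless GitHub Actions signature."""
    if not DIGEST_REF.match(image_ref):
        raise ValueError("verify images by digest (repo@sha256:...), not by tag")
    # The signing certificate identifies the workflow that produced the image.
    identity = f"https://github.com/{repository}/{workflow_path}@{git_ref}"
    return [
        "cosign", "verify",
        "--certificate-identity", identity,
        "--certificate-oidc-issuer", GITHUB_OIDC_ISSUER,
        image_ref,
    ]
```

Passing the result to `subprocess.run(..., check=True)` in a deploy job would fail the job when the signature is missing or was produced by a different workflow. An admission controller should enforce the same identity check inside the cluster, since a deploy-job check alone can be bypassed.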

### Step 5: Record exceptions like production risk

AI-assisted delivery creates pressure to accept automated suggestions quickly, so make waivers explicit. If a vulnerability, policy violation, or unsigned artifact is allowed through, require a ticket, owner, expiry date, and reason. Expiring exceptions prevent "temporary" bypasses from becoming permanent gaps. This is especially important for generated infrastructure code, where a broad permission may remain unnoticed until an incident.
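
A waiver record can be validated in the pipeline so that incomplete or expired exceptions stop working on their own. The sketch below is one possible shape: the field names follow the list above, while the `Waiver` class and the 90-day maximum lifetime are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Waiver:
    finding_id: str  # e.g. a CVE or policy rule identifier
    ticket: str
    owner: str
    reason: str
    expires: date


def waiver_problems(waiver: Waiver, today: date, max_days: int = 90) -> list[str]:
    """Return reasons the waiver is not acceptable; empty means valid."""
    problems = []
    for field_name in ("ticket", "owner", "reason"):
        if not getattr(waiver, field_name).strip():
            problems.append(f"missing {field_name}")
    if waiver.expires < today:
        problems.append("expired")
    elif (waiver.expires - today).days > max_days:
        # An expiry far in the future is a permanent bypass in disguise.
        problems.append(f"expiry is more than {max_days} days away")
    return problems
```

A gate would honor a waiver only when `waiver_problems` returns an empty list, which means an expired entry starts blocking again without anyone having to remember to remove it.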

## Example GitHub Actions security gate

The example below is intentionally small. It shows the shape of a pull-request gate, not a complete enterprise pipeline. The important ideas are least-privilege workflow permissions, secret scanning, filesystem/IaC scanning, and separate policy checks for infrastructure code.

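```yaml
name: ai-assisted-security-gate

on:
  pull_request:
    branches: [main]

# Least privilege: read the code, write only code-scanning results.
permissions:
  contents: read
  security-events: write

jobs:
  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history so secret scanning sees every commit

      - name: Detect committed secrets
        uses: gitleaks/gitleaks-action@v2
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}  # needed to scan pull requests

      - name: Scan filesystem and IaC
        uses: aquasecurity/trivy-action@master  # pin to a commit SHA in production
        with:
          scan-type: fs
          scan-ref: .
          scanners: vuln,secret,misconfig  # misconfig enables the IaC checks
          format: sarif
          output: trivy-results.sarif
          severity: HIGH,CRITICAL
          exit-code: '1'

      - name: Upload Trivy SARIF
        uses: github/codeql-action/upload-sarif@v3
        if: always()  # upload results even when the scan step fails
        with:
          sarif_file: trivy-results.sarif

      - name: Check Terraform and Kubernetes policy
        run: |
          pip install checkov  # not assumed to be preinstalled on the runner
          checkov -d . --quiet
```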

For a production version, pin action versions to immutable SHAs, add language-specific SAST, build and scan the final image, sign the image, and enforce an admission rule that accepts only signed images from your trusted CI identity. OpenSSF Scorecard can also help identify repository-level supply-chain weaknesses such as dangerous workflows, unpinned dependencies, or missing branch protection.
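
Pinning is easy to check mechanically. The following sketch scans workflow text for `uses:` references that are not pinned to a full 40-character commit SHA. It is a line-based heuristic rather than a YAML parser, the function name is hypothetical, and the pinned action in the sample is a made-up example.

```python
from __future__ import annotations

import re

USES_LINE = re.compile(r"^\s*(?:-\s*)?uses:\s*([^\s#]+)")
FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def unpinned_actions(workflow_text: str) -> list[str]:
    """Return action references that use a tag or branch instead of a SHA."""
    unpinned = []
    for line in workflow_text.splitlines():
        match = USES_LINE.match(line)
        if not match:
            continue
        target = match.group(1).strip("'\"")
        # Local actions and docker:// references have no Git ref to pin.
        if target.startswith("./") or target.startswith("docker://"):
            continue
        _, _, ref = target.partition("@")
        if not FULL_SHA.match(ref):
            unpinned.append(target)
    return unpinned


SAMPLE_WORKFLOW = """\
steps:
  - uses: actions/checkout@v4
  - name: Scan
    uses: aquasecurity/trivy-action@master
  - uses: example/pinned-action@0123456789abcdef0123456789abcdef01234567  # v1.2.3
  - uses: ./local/action
"""
```

For the sample above, the function reports the checkout and Trivy references and accepts the SHA-pinned and local ones. OpenSSF Scorecard performs a more thorough version of this check across the whole repository.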

## What not to claim about LLM security

Avoid marketing claims such as "AI-generated code is always less secure" or "AI will replace security review." The evidence is more nuanced and changes quickly. The defensible position is operational: AI-assisted code should go through the same or stronger controls as human-written code because it can introduce the same classes of defects at higher speed. Security teams should measure concrete outcomes: number of secret leaks blocked, policy violations prevented, unsigned artifacts rejected, time to remediate vulnerable dependencies, and percentage of production images with verified provenance.

## Sources

External references used for this article: <a href="https://owasp.org/www-project-top-10-for-large-language-model-applications/">OWASP Top 10 for LLM Applications</a>, <a href="https://slsa.dev/">SLSA</a>, <a href="https://github.com/ossf/scorecard">OpenSSF Scorecard</a>, <a href="https://docs.sigstore.dev/cosign/overview/">Sigstore Cosign</a>, <a href="https://aquasecurity.github.io/trivy/">Trivy</a>, and <a href="https://gitleaks.io/">Gitleaks</a>.

Author: Pawel Bedynski

DevOps Engineer & Kubernetes Consultant. Building cloud-native infrastructure on GCP since 2019. 80+ production clusters deployed.
