
Betterleaks

Category: SAST
License: Free (Open-Source, MIT)
Suphi Cankurt
AppSec Enthusiast
Updated March 19, 2026
Key Takeaways
  • Built by Zachary Rice (zricethezav), the original creator of Gitleaks (25k+ stars) — now Head of Secrets Scanning at Aikido Security.
  • Token Efficiency Filter uses BPE tokenization (cl100k_base) instead of Shannon entropy, achieving 98.6% recall on CredData vs 70.4% with entropy.
  • CEL-based secrets validation fires HTTP requests against detected credentials to confirm whether they are actually live and active.
  • Drop-in replacement for Gitleaks — backwards-compatible configuration files and CLI flags.
  • Supports parallelized git scanning, recursive decoding (base64, hex, percent-encoding, unicode), and archive scanning (zip, tar).

Betterleaks is an open-source secrets scanner that detects and validates hardcoded credentials in git repositories, directories, and archives using BPE tokenization instead of traditional entropy-based filtering. Built by Zachary Rice (zricethezav) — the same developer who created Gitleaks (25,000+ GitHub stars) — Betterleaks achieves 98.6% recall on the CredData benchmark, compared to 70.4% with entropy-based detection. Rice is currently Head of Secrets Scanning at Aikido Security.

Betterleaks CLI scan output showing detected Slack webhook and Stripe API key secrets with redacted values

The project was created on February 3, 2026 and reached v1.1.1 by March 17, 2026. It is written in Go, licensed under MIT, and designed as a drop-in replacement for Gitleaks with backwards-compatible configuration files and CLI flags.

Overview

Betterleaks targets the same problem as Gitleaks — finding hardcoded secrets in git repositories — but improves detection accuracy, secrets validation, and scanning performance.

The main difference between Betterleaks and Gitleaks is the detection engine. Where Gitleaks relies on Shannon entropy to distinguish random strings from real secrets, Betterleaks uses BPE tokenization with the cl100k_base model (the same tokenizer used by GPT-4). On the CredData benchmark dataset, Betterleaks achieves 98.6% recall compared to Gitleaks’ 70.4% with entropy-based filtering. On large codebases, that gap translates to significantly fewer missed secrets.

Betterleaks also adds CEL-based secrets validation. When Betterleaks finds a potential credential, it can fire an HTTP request to the target service and check whether the credential is actually live. A finding goes from “possible leak” to “confirmed active secret,” which changes how teams prioritize remediation.

Because it is backwards-compatible with Gitleaks configuration files and CLI flags, migrating is straightforward. Existing .gitleaks.toml files work without modification.

Betterleaks scan time comparison benchmark showing 4-5x faster scanning than Gitleaks on Rails, Ruby, and GitLab repositories

The benchmark above (from the Betterleaks repository) compares scan times on three real-world repositories. With RE2 and 8 git workers enabled, Betterleaks scans the Rails repo in 5.8s vs Gitleaks’ 24.5s (4.2x faster), the Ruby repo in 10.3s vs 55.2s (5.4x faster), and the GitLab repo in 2m13s vs 11m28s (5.2x faster).

Token Efficiency Filter
Uses BPE tokenization (cl100k_base) instead of Shannon entropy for secret detection. Achieves 98.6% recall on CredData, compared to 70.4% with entropy-based filtering.
CEL Secrets Validation
Fires HTTP requests against detected credentials using CEL expressions to verify whether leaked secrets are still active and exploitable.
Parallelized Git Scanning
Distributes git history scanning across multiple workers via the --git-workers flag, reducing scan times on large repositories.
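
To make the idea of distributing history across workers concrete, here is a minimal worker-pool sketch in Go. It is not Betterleaks' implementation: the `commit` type, the `scanCommits` function, and the `DEMO_` pattern are all hypothetical stand-ins chosen for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"sync"
)

// commit is a hypothetical stand-in for one unit of git history.
type commit struct {
	ID    string
	Patch string
}

// demoRule is an illustrative pattern, not a rule shipped with Betterleaks.
var demoRule = regexp.MustCompile(`DEMO_[A-Z0-9]{16}`)

// scanCommits fans commits out to `workers` goroutines and returns the IDs
// of commits whose patch matches demoRule, sorted for stable output.
func scanCommits(commits []commit, workers int) []string {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan commit)
	var (
		mu   sync.Mutex
		hits []string
		wg   sync.WaitGroup
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for c := range jobs {
				if demoRule.MatchString(c.Patch) {
					mu.Lock()
					hits = append(hits, c.ID)
					mu.Unlock()
				}
			}
		}()
	}
	for _, c := range commits {
		jobs <- c
	}
	close(jobs)
	wg.Wait()
	sort.Strings(hits)
	return hits
}

func main() {
	commits := []commit{
		{"a1", "+ token = DEMO_ABCDEFGH12345678"},
		{"b2", "+ fmt.Println(\"hello\")"},
		{"c3", "- key: DEMO_ZZZZZZZZ00000000"},
	}
	fmt.Println(scanCommits(commits, 4)) // prints [a1 c3]
}
```

Because each commit is matched independently, adding workers shortens wall-clock time until the machine runs out of cores or I/O bandwidth, which is presumably why the worker count is exposed as a flag.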

Key Features

Feature | Details
CLI commands | git (scan repos), dir (scan directories), stdin (pipe input)
Configuration | TOML format (.betterleaks.toml or .gitleaks.toml), backwards-compatible with Gitleaks
Detection engine | BPE tokenization (cl100k_base) + regex rules; 98.6% recall on CredData
Secrets validation | CEL expressions fire HTTP requests to verify if leaked credentials are still active
Output formats | JSON, CSV, JUnit, SARIF, custom Go templates
Installation | Homebrew, Docker, DNF (Fedora), from source
Regex engines | Go stdlib or RE2 (switchable); RE2 guarantees linear-time matching
Recursive decoding | base64, hex, percent-encoding, unicode escapes; configurable depth (default 5)
Archive support | zip, tar, and nested archives via --max-archive-depth
Git scanning | Parallelized via --git-workers; scans GitLab repo 5.2x faster than Gitleaks
Composite rules | Multi-part patterns with proximity matching to reduce false positives
Redaction | --redact flag with configurable percentage (0-100%) for logs and stdout
Baseline support | --baseline-path to ignore known findings and track only new secrets
Language | Pure Go (no CGO); deploys anywhere without native library dependencies
License | MIT (no commercial restrictions)
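
The Redaction row mentions a configurable percentage. As a rough sketch of what percentage-based masking could look like, the Go function below hides the trailing part of a secret. The text here does not describe Betterleaks' exact masking rule, so keeping the leading characters and appending "..." are assumptions, and `redact` is a hypothetical name.

```go
package main

import (
	"fmt"
	"math"
)

// redact hides roughly `percent` of a secret, keeping the leading characters.
// Keeping the prefix and using "..." as the marker are assumptions made for
// this sketch, not documented Betterleaks behavior.
func redact(secret string, percent int) string {
	if percent <= 0 {
		return secret
	}
	if percent > 100 {
		percent = 100
	}
	runes := []rune(secret)
	keep := int(math.Round(float64(len(runes)) * float64(100-percent) / 100))
	return string(runes[:keep]) + "..."
}

func main() {
	for _, pct := range []int{0, 50, 100} {
		fmt.Printf("%3d%% -> %s\n", pct, redact("sk_live_abcdefgh", pct))
	}
}
```

Partial redaction keeps enough of the value to recognize which credential leaked while keeping the full secret out of CI logs.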

Token Efficiency Filter

Traditional entropy-based detection measures the randomness of a string to decide whether it might be a secret. The problem: many real secrets do not have high enough entropy to pass the threshold, and many non-secrets (like UUIDs or hashes) have high entropy but are not credentials.

Betterleaks replaces entropy with BPE tokenization. It uses the cl100k_base encoding, the same tokenizer used by GPT-4, to measure how efficiently a string compresses into tokens. Real secrets tend to tokenize inefficiently (many tokens per character) because they are genuinely random, while structured strings (variable names, UUIDs, file paths) tokenize efficiently.

On the CredData benchmark, Betterleaks’ Token Efficiency Filter produces 98.6% recall versus 70.4% with Shannon entropy. In practice, I found this means fewer missed secrets without a proportional increase in false positives.
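
For reference, the entropy baseline that the Token Efficiency Filter replaces is easy to state in code. The sketch below computes Shannon entropy in bits per character; it illustrates the older approach only and does not reproduce Betterleaks' BPE-based filter. The sample strings are made up for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// shannonEntropy returns the entropy of s in bits per character.
func shannonEntropy(s string) float64 {
	if s == "" {
		return 0
	}
	counts := map[rune]int{}
	n := 0
	for _, r := range s {
		counts[r]++
		n++
	}
	var h float64
	for _, c := range counts {
		p := float64(c) / float64(n)
		h -= p * math.Log2(p)
	}
	return h
}

func main() {
	samples := []string{
		"aaaaaaaaaaaaaaaa",                     // one repeated symbol: entropy 0
		"correct-horse-battery",                // passphrase-like secret
		"123e4567-e89b-12d3-a456-426614174000", // UUID, not a credential
	}
	for _, s := range samples {
		fmt.Printf("%-38s %.2f bits/char\n", s, shannonEntropy(s))
	}
}
```

Any fixed cutoff on this score has to trade off low-entropy secrets against high-entropy identifiers, which is the weakness described above.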

Secrets validation via CEL

Detecting a secret is useful, but knowing whether it still works is what drives remediation urgency.

Betterleaks uses CEL (Common Expression Language) expressions to define validation logic per rule. When a rule matches, the CEL expression can fire an HTTP request to the target API and check the response. If the credential returns a valid response, the finding is marked as confirmed-active.

This is similar to what TruffleHog, an open-source secrets detection tool, does with its built-in verifiers. The main difference between Betterleaks and TruffleHog validation is configurability: Betterleaks makes the validation logic user-configurable via CEL expressions, while TruffleHog’s verifiers are hardcoded per detector.
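
The underlying check is simple to sketch in plain Go: send the candidate credential to an endpoint that requires authentication and look at the response. This is not Betterleaks' CEL syntax, which is not shown in this article. The bearer-token scheme, the "HTTP 200 means live" rule, and the names `isLive` and `newFakeAPI` are assumptions, and the example runs against a local fake server rather than a real service.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// client uses a timeout so a slow endpoint cannot stall a scan.
var client = &http.Client{Timeout: 5 * time.Second}

// isLive sends the candidate token to an authenticated endpoint and treats
// HTTP 200 as "still active". Endpoint and header scheme are hypothetical.
func isLive(endpoint, token string) (bool, error) {
	req, err := http.NewRequest(http.MethodGet, endpoint, nil)
	if err != nil {
		return false, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	resp, err := client.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK, nil
}

// newFakeAPI starts a local test server that accepts exactly one token.
func newFakeAPI(validToken string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") == "Bearer "+validToken {
			w.WriteHeader(http.StatusOK)
			return
		}
		w.WriteHeader(http.StatusUnauthorized)
	}))
}

func main() {
	api := newFakeAPI("live-token")
	defer api.Close()
	for _, tok := range []string{"live-token", "revoked-token"} {
		live, err := isLive(api.URL, tok)
		fmt.Printf("%-14s live=%v err=%v\n", tok, live, err)
	}
}
```

Expressing this logic per rule in a configuration language means the request shape and the success condition can differ per service without recompiling the scanner.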

Composite and multi-part rules

Like Gitleaks, Betterleaks supports composite rules: a primary pattern combined with auxiliary patterns that must appear within a specified proximity. This reduces false positives for patterns that only matter when they appear near related identifiers (e.g., an API key near a specific service name).
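
A minimal Go sketch of proximity matching follows. Both patterns and the "acme" service name are hypothetical, and the byte-distance definition of proximity is an assumption; the actual rule syntax is not shown in this article.

```go
package main

import (
	"fmt"
	"regexp"
)

// Both patterns are illustrative, not rules shipped with Betterleaks.
var (
	primary   = regexp.MustCompile(`\b[a-f0-9]{32}\b`)       // candidate key
	auxiliary = regexp.MustCompile(`(?i)acme[_-]?(api|key)`) // hypothetical service identifier
)

// gap returns the distance in bytes between two match spans, or 0 if they overlap.
func gap(p, a []int) int {
	switch {
	case a[1] <= p[0]:
		return p[0] - a[1]
	case a[0] >= p[1]:
		return a[0] - p[1]
	default:
		return 0
	}
}

// findComposite returns primary matches that have an auxiliary match
// within `window` bytes of them.
func findComposite(text string, window int) []string {
	aux := auxiliary.FindAllStringIndex(text, -1)
	var out []string
	for _, p := range primary.FindAllStringIndex(text, -1) {
		for _, a := range aux {
			if gap(p, a) <= window {
				out = append(out, text[p[0]:p[1]])
				break
			}
		}
	}
	return out
}

func main() {
	withContext := `ACME_API_KEY = "0123456789abcdef0123456789abcdef"`
	withoutContext := `checksum = "0123456789abcdef0123456789abcdef"`
	fmt.Println(len(findComposite(withContext, 40)))    // prints 1
	fmt.Println(len(findComposite(withoutContext, 40))) // prints 0
}
```

The same 32-character hex string is reported only when the service identifier is nearby, so a bare checksum no longer triggers the rule.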

Recursive decoding

Secrets are not always stored in plaintext. Betterleaks recursively decodes base64, hex, percent-encoding, and unicode escape sequences before applying detection rules. This catches secrets that developers have obfuscated or that build tools have encoded along the way.
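
The sketch below shows the general shape of depth-limited recursive decoding in Go. It is a naive illustration rather than Betterleaks' decoder: the order in which encodings are tried, the printable-ASCII check, and the helper names are all assumptions, and unicode escapes are left out for brevity.

```go
package main

import (
	"encoding/base64"
	"encoding/hex"
	"fmt"
	"net/url"
	"strings"
)

// printable reports whether b is non-empty printable ASCII.
func printable(b []byte) bool {
	if len(b) == 0 {
		return false
	}
	for _, c := range b {
		if c < 0x20 || c > 0x7e {
			return false
		}
	}
	return true
}

// decodeOnce tries percent-decoding, then hex, then base64, and returns the
// first result that looks like text.
func decodeOnce(s string) (string, bool) {
	if strings.Contains(s, "%") {
		if d, err := url.PathUnescape(s); err == nil && d != s {
			return d, true
		}
	}
	if len(s) >= 8 && len(s)%2 == 0 {
		if b, err := hex.DecodeString(s); err == nil && printable(b) {
			return string(b), true
		}
	}
	if len(s) >= 8 {
		if b, err := base64.StdEncoding.DecodeString(s); err == nil && printable(b) {
			return string(b), true
		}
	}
	return "", false
}

// decodeAll returns the input plus each successively decoded layer,
// stopping after maxDepth decoding steps.
func decodeAll(s string, maxDepth int) []string {
	layers := []string{s}
	for i := 0; i < maxDepth; i++ {
		d, ok := decodeOnce(layers[len(layers)-1])
		if !ok {
			break
		}
		layers = append(layers, d)
	}
	return layers
}

// wrap hex-encodes s and then base64-encodes the result (demo helper).
func wrap(s string) string {
	return base64.StdEncoding.EncodeToString([]byte(hex.EncodeToString([]byte(s))))
}

func main() {
	for i, layer := range decodeAll(wrap("token=s3cr3t-value"), 5) {
		fmt.Printf("layer %d: %s\n", i, layer)
	}
}
```

Detection rules would then run against every layer, and the depth limit keeps adversarially nested input from causing unbounded work.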

Archive scanning

Betterleaks scans inside compressed archives (zip, tar, etc.), so secrets buried in vendored dependencies or bundled artifacts do not slip through.
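
As an illustration of depth-limited archive traversal, the Go sketch below scans an in-memory zip and descends into nested zip entries. It is a simplified stand-in, not Betterleaks' code: the `DEMO_` pattern, the `!` path separator, and the function names are hypothetical, and it omits the size limits a real scanner needs to guard against zip bombs.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"regexp"
	"strings"
)

// demoRule is an illustrative pattern, not a rule shipped with Betterleaks.
var demoRule = regexp.MustCompile(`DEMO_[A-Z0-9]{16}`)

// scanZip scans every entry of a zip held in memory and descends into nested
// .zip entries while depth < maxDepth. It returns the paths of matching entries.
func scanZip(data []byte, prefix string, depth, maxDepth int) ([]string, error) {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return nil, err
	}
	var hits []string
	for _, f := range zr.File {
		rc, err := f.Open()
		if err != nil {
			return nil, err
		}
		body, err := io.ReadAll(rc)
		rc.Close()
		if err != nil {
			return nil, err
		}
		path := prefix + f.Name
		if strings.HasSuffix(f.Name, ".zip") {
			if depth < maxDepth {
				nested, err := scanZip(body, path+"!", depth+1, maxDepth)
				if err != nil {
					return nil, err
				}
				hits = append(hits, nested...)
			}
			continue
		}
		if demoRule.Match(body) {
			hits = append(hits, path)
		}
	}
	return hits, nil
}

// makeZip builds a small in-memory zip from name -> content pairs (demo helper).
func makeZip(files map[string][]byte) []byte {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for name, content := range files {
		w, err := zw.Create(name)
		if err != nil {
			panic(err)
		}
		if _, err := w.Write(content); err != nil {
			panic(err)
		}
	}
	if err := zw.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	inner := makeZip(map[string][]byte{"config.env": []byte("KEY=DEMO_ABCDEFGH12345678")})
	outer := makeZip(map[string][]byte{"README.md": []byte("nothing here"), "vendor.zip": inner})
	hits, err := scanZip(outer, "", 0, 2)
	fmt.Println(hits, err) // prints [vendor.zip!config.env] <nil>
}
```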

Regex engine switching

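You can switch between Go’s standard library regex engine and RE2. RE2 provides guaranteed linear-time matching (a guarantee Go’s standard regexp package also makes), so the switch is mostly about throughput when scanning large files with complex patterns; the benchmark figures above were measured with RE2 enabled.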

Use Cases

Best for
Teams already using Gitleaks that want better detection accuracy and live secrets validation without changing their workflow. Also a strong choice for greenfield secret scanning setups that need verified findings out of the box.

CI/CD pipeline scanning. Run Betterleaks in your CI pipeline to block pull requests that introduce secrets. The --git-workers flag keeps scan times manageable even on large repositories. SARIF output feeds directly into GitHub Advanced Security.

Pre-commit hook. Install Betterleaks as a pre-commit hook to catch secrets before they reach version control. Same workflow as Gitleaks — existing pre-commit configurations work with minimal changes.

Incident response. When you discover a leaked credential, use CEL-based validation to quickly determine whether the secret is still active. This tells you whether rotation is urgent or can be scheduled.

Legacy codebase audits. Recursive decoding and archive scanning help surface secrets that are base64-encoded, hex-encoded, or buried inside zip files — common patterns in older codebases.

Getting Started

Betterleaks CLI help output showing available commands and key flags

1. Install: Run brew install betterleaks on macOS, or pull the Docker image with docker pull ghcr.io/betterleaks/betterleaks:latest. On Fedora, use dnf install betterleaks. You can also build from source with Go.
2. Scan a repository: Run betterleaks git /path/to/repo to scan git history for secrets. Use betterleaks dir /path/to/dir for non-git directories. Add --git-workers 4 for parallelized scanning and -v for verbose output.
3. Migrate from Gitleaks: Drop your existing .gitleaks.toml into the repository root. Betterleaks reads it natively. CLI flags are backwards-compatible, so you can swap gitleaks for betterleaks in your scripts.
4. Review findings: Use --report-path results.json --report-format json to save findings. Validated secrets are marked as confirmed-active. Upload SARIF output to GitHub Advanced Security with --report-format sarif.
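
If you drive the scanner from Go tooling rather than a shell script, the scan and report steps above can be wrapped roughly as follows. The sketch uses only the subcommand and flags named in those steps; the `scanArgs` helper is hypothetical, and exit-code behavior is not covered here, so the error is simply printed.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
)

// scanArgs builds the argument list for `betterleaks git`, using only
// flags mentioned in the Getting Started steps.
func scanArgs(repo string, workers int, reportPath, format string) []string {
	return []string{
		"git", repo,
		"--git-workers", strconv.Itoa(workers),
		"--report-path", reportPath,
		"--report-format", format,
	}
}

func main() {
	args := scanArgs(".", 4, "results.sarif", "sarif")
	fmt.Println("betterleaks", args)

	// Skip the run when the binary is not installed.
	if _, err := exec.LookPath("betterleaks"); err != nil {
		fmt.Println("betterleaks not found on PATH; skipping run")
		return
	}
	cmd := exec.Command("betterleaks", args...)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		// How findings affect the exit code is an open assumption here.
		fmt.Println("betterleaks exited with:", err)
	}
}
```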

Strengths & Limitations

Strengths:

  • BPE tokenization is a measurably better approach to secret detection than Shannon entropy (98.6% vs 70.4% recall on CredData).
  • CEL-based validation is user-configurable, unlike hardcoded verification in other tools.
  • Drop-in Gitleaks replacement — no migration friction.
  • Parallelized git scanning reduces wall-clock time on large repos.
  • Recursive decoding catches encoded and obfuscated secrets.
  • MIT license with no commercial restrictions.

Limitations:

  • Very new project (created February 2026). Betterleaks’ rule library is smaller than those of mature tools like Gitleaks or TruffleHog.
  • 468 GitHub stars — small community compared to Gitleaks (25k+) or TruffleHog (25k+). Ecosystem integrations (GitHub Actions, pre-commit hooks) are still catching up.
  • No managed cloud platform — this is a CLI tool. Teams wanting dashboards, team management, or hosted scanning should look at GitGuardian or TruffleHog’s commercial offering.
  • CEL validation requires writing expressions per rule. Out-of-the-box coverage for common services is still growing.

Comparison with alternatives

GitHub star history comparing secrets scanners — Gitleaks, TruffleHog, Betterleaks, Nosey Parker, and detect-secrets over time

Feature | Betterleaks | Gitleaks | TruffleHog | GitGuardian
Detection method | BPE tokenization + regex | Entropy + regex | 800+ detectors | Pattern matching + ML
Secrets validation | CEL expressions (configurable) | No | Built-in verifiers (hardcoded) | Yes (commercial)
License | MIT | MIT | AGPL-3.0 | Freemium
Scan targets | Git, directories, stdin, archives | Git, directories, stdin | Git, Slack, S3, Docker, etc. | Git, CI/CD (commercial)
Parallelized scanning | Yes (--git-workers) | No | Yes | Yes
Recursive decoding | Yes (base64, hex, etc.) | Yes (v8.26+) | Limited | Yes
GitHub Stars | 468 | 25,500 | 25,100 | N/A

Betterleaks is better for teams that prioritize detection accuracy and configurable validation logic, especially those already using Gitleaks who want a zero-friction upgrade. TruffleHog is better for organizations that need scanning beyond git repositories — including Slack, S3, and Docker images — thanks to its broader scan target support. GitGuardian, a commercial secrets detection platform, is better for enterprises that need dashboards, team management, and hosted scanning out of the box.

Frequently Asked Questions

What is Betterleaks?
Betterleaks is an open-source secrets scanner created by Zachary Rice, the original author of Gitleaks. It is a drop-in replacement for Gitleaks that adds BPE-based token efficiency filtering, CEL-based secrets validation, parallelized git scanning, and recursive decoding. Licensed under MIT.
How does Betterleaks compare to Gitleaks?
Betterleaks is backwards-compatible with Gitleaks configurations and CLI flags, so migration is straightforward. Key improvements include token efficiency filtering (BPE tokenization instead of Shannon entropy for better recall), live secrets validation via CEL expressions, parallelized git scanning, and recursive decoding for base64, hex, and other encodings.
What is the Token Efficiency Filter in Betterleaks?
The Token Efficiency Filter replaces Shannon entropy with BPE tokenization using the cl100k_base model (the same tokenizer used by GPT-4). On the CredData benchmark, this achieves 98.6% recall compared to 70.4% with traditional entropy-based filtering, significantly reducing missed secrets.
Can Betterleaks validate if leaked secrets are still active?
Yes. Betterleaks uses CEL (Common Expression Language) expressions to fire HTTP requests against detected credentials. If a credential returns a valid response, Betterleaks marks the finding as confirmed-active. This helps teams prioritize remediation by focusing on secrets that are actually exploitable.
Is Betterleaks free?
Yes. Betterleaks is completely free and open-source under the MIT license. You can use it in personal projects, commercial codebases, and CI/CD pipelines without licensing fees.