Introduction
In software development, it’s not uncommon to accidentally commit sensitive files to a Git repository: configuration files containing API keys, environment variables, or other secrets. Once committed, these files become part of the Git history. Once that history is pushed to a remote repository, anyone with access to the remote can retrieve them, which poses a real security risk.
In this post, I’ll walk you through the process of completely removing a sensitive file from Git history.
The Problem
During the development of a private project, a .env.local file containing sensitive configuration was accidentally committed to the repository. This file needed to be removed not just from the current state of the repository, but from its entire Git history.
The Solution
Step 1: Remove the File from Current Working Directory
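First, we need to stop Git from tracking the file (git rm --cached leaves it on disk) and then delete it from the working directory. If you still need the values, copy them somewhere outside the repository first: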
git rm --cached .env.local
rm .env.local
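Before rewriting anything, it can help to check whether the file ever lived at more than one path, since git filter-repo’s --path matches the exact path you give it. The following is a minimal sketch, assuming it is run inside the repository. The function name paths_named is hypothetical.

```shell
# Hypothetical helper: print every distinct path, across the history of all
# refs, whose last component is exactly the given file name.
paths_named() {
    name=$1
    # --name-only lists the files changed by each commit; --format= hides the
    # commit headers so only path lines (and blank separators) remain.
    git log --all --name-only --format= |
        awk -v n="$name" '{ k = split($0, parts, "/"); if (parts[k] == n) print }' |
        LC_ALL=C sort -u   # C locale gives a stable, byte-wise ordering
}
```

Running paths_named .env.local should print only .env.local for this post’s scenario. If other paths show up, each one would need its own --path argument when running git filter-repo.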
Step 2: Remove from Git History
I used git-filter-repo (installed separately, as it does not ship with Git), a more modern and safer alternative to git filter-branch, to remove the file from the entire Git history:
git filter-repo --invert-paths --path .env.local --force
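This command:
- Scans through all commits in the repository
- Removes every occurrence of the .env.local file (--invert-paths means "keep everything except the given --path")
- Rewrites the Git history so that no commit contains this file
- Leaves all other files and commit contents intact, although every rewritten commit gets a new hash
The --force flag is needed because git-filter-repo refuses to rewrite a repository that is not a fresh clone.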
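After the rewrite, it is worth confirming that no ref still reaches a commit that touches the file. The following is a minimal sketch, assuming it is run in the rewritten repository. The function name assert_path_purged is hypothetical.

```shell
# Hypothetical helper: succeed only if no commit reachable from any ref
# adds, modifies, or deletes the given path; report offenders otherwise.
assert_path_purged() {
    remaining=$(git log --all --format='%h' -- "$1")
    if [ -n "$remaining" ]; then
        echo "still present in: $remaining" >&2
        return 1
    fi
    echo "no commits reference $1"
}
```

Usage: assert_path_purged .env.local. A non-zero exit status means some branch or tag still carries the file.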
Step 3: Prevent Future Accidents
To prevent similar accidents in the future, we added the file to .gitignore:
echo ".env.local" >> .gitignore
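Running the echo command twice appends a duplicate line, and nothing in it confirms that the rule actually matches. The sketch below is an idempotent variant that then asks Git which rule applies. It assumes it is run from the repository root, and the function name ignore_path is hypothetical. git check-ignore does not report tracked files, so it only confirms the rule once the file is untracked, as it is after Step 1.

```shell
# Hypothetical helper: add a pattern to .gitignore only if that exact line is
# not already there, then confirm Git now treats the path as ignored.
ignore_path() {
    # -x matches whole lines, -F treats the pattern literally,
    # -e protects patterns that start with "-"
    grep -qxF -e "$1" .gitignore 2>/dev/null || echo "$1" >> .gitignore
    # exits 0 when the path is ignored; -v prints the matching source:line:pattern
    git check-ignore -v "$1"
}
```

Usage: ignore_path .env.local && git add .gitignore && git commit -m "Ignore .env.local". The .gitignore change needs to be committed before it protects anyone else’s clone.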
Step 4: Push Changes to Remote
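Since we’ve rewritten the Git history, we need to force push the changes to the remote repository. git-filter-repo removes the origin remote after rewriting, as a safeguard against accidentally pushing to the original repository, so we add it back first: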
git remote add origin git@github.com:user/upstream.git
git push origin --force --all
git push origin --force --tags
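To check the result from the server’s point of view rather than from your local clone, you can mirror-clone the remote into a temporary directory and search its history. This is a minimal sketch, and the function name remote_is_clean is hypothetical. On GitHub, a mirror clone may also fetch pull-request refs (refs/pull/*), which a force push cannot rewrite. Hits there may need GitHub support to clear.

```shell
# Hypothetical helper: mirror-clone a remote into a throwaway directory and
# fail if any ref it advertises still reaches a commit touching the path.
remote_is_clean() {
    url=$1
    path=$2
    tmpdir=$(mktemp -d) || return 1
    # --mirror fetches every ref the remote advertises, not just branches
    git clone -q --mirror "$url" "$tmpdir/check.git" || { rm -rf "$tmpdir"; return 1; }
    hits=$(git --git-dir="$tmpdir/check.git" log --all --format='%h' -- "$path")
    rm -rf "$tmpdir"
    if [ -n "$hits" ]; then
        echo "remote still has $path in: $hits" >&2
        return 1
    fi
}
```

Usage: remote_is_clean git@github.com:user/upstream.git .env.local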
Important Notes
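Force Push Warning: Force pushing rewrites the remote history, which can cause issues for other developers. Anyone with an existing clone needs to reset their branch to the rewritten one, or simply re-clone. They must not push branches based on the old history, or the file will come back. Note that git reset --hard discards uncommitted changes to tracked files:
git fetch origin
git reset --hard origin/main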
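Because git reset --hard silently throws away uncommitted edits to tracked files, collaborators may prefer a slightly more defensive version of those two commands. This is a minimal sketch, assuming the shared branch is main by default. The function name sync_to_rewritten is hypothetical.

```shell
# Hypothetical helper for collaborators: refuse to run if tracked files have
# uncommitted changes, otherwise move the branch onto the rewritten remote one.
sync_to_rewritten() {
    branch=${1:-main}
    # -uno ignores untracked files, which reset --hard leaves alone anyway
    if [ -n "$(git status --porcelain --untracked-files=no)" ]; then
        echo "uncommitted changes to tracked files; commit or stash them first" >&2
        return 1
    fi
    git fetch origin
    git reset --hard "origin/$branch"
}
```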
Backup First: Always make a backup of your repository before performing history rewrites.
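One way to take that backup is a mirror clone, which copies every ref (branches, tags, and remote-tracking branches) into a bare repository. This is a minimal sketch, and the function name backup_repo is hypothetical. The backup still contains the secret, so it should be stored securely and deleted once the rewrite has been verified.

```shell
# Hypothetical helper: copy all refs of a repository into a bare backup.
backup_repo() {
    src=$1
    dest=$2
    # --mirror maps every ref in the source verbatim into the new bare repo
    git clone --mirror "$src" "$dest"
}
```

Usage: backup_repo . ../myproject-backup.git (the destination path is only an example).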
Consider Alternatives: Rather than keeping sensitive values in files that live inside the repository at all, consider:
- Using environment variables
- Implementing a secrets management system
- Using configuration files that are gitignored by default
Best Practices for Sensitive Data
Prevention is Better than Cure:
- Add sensitive files to .gitignore before starting development
- Use environment variables for sensitive data
- Consider using a secrets management service
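A local pre-commit hook can act as a second line of defense when .gitignore is missing or bypassed with git add -f. This is a minimal sketch, and the function name and the file-name pattern are assumptions to adapt. As written, it also blocks files such as .env.example, which some projects commit on purpose. Staged deletions are allowed, so the git rm --cached from Step 1 still goes through.

```shell
# Hypothetical check: fail if any added, copied, modified, or renamed staged
# file is named .env or .env.<anything>, at any directory depth.
check_staged_for_secrets() {
    # --diff-filter=ACMR skips deletions, so untracking a secret is allowed
    blocked=$(git diff --cached --name-only --diff-filter=ACMR |
        grep -E '(^|/)\.env(\..*)?$' || true)
    if [ -n "$blocked" ]; then
        echo "Refusing to commit possible secrets:" >&2
        echo "$blocked" >&2
        return 1
    fi
}
```

To use it as a hook, save the function in .git/hooks/pre-commit (starting with #!/bin/sh), add a final line check_staged_for_secrets || exit 1, and make the file executable with chmod +x.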
Regular Audits:
- Periodically check your repository for sensitive data
- Use tools like git-secrets or trufflehog to scan for sensitive information
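To see why scanning history matters (and not just the current files), the sketch below searches the tree of every commit on every ref for a regular expression. It is a rough, hypothetical stand-in for dedicated scanners like git-secrets or trufflehog, which use far better detection rules. The function name and the example pattern are assumptions, and it can be slow on large histories.

```shell
# Hypothetical helper: grep the snapshot of every commit reachable from any
# ref; output lines look like <commit>:<path>:<line>:<text>.
scan_history() {
    pattern=$1
    git rev-list --all | while read -r commit; do
        # -I skips binary files; "|| true" because git grep exits 1 on no match
        git grep -I -n -E -e "$pattern" "$commit" -- || true
    done
}
```

Usage: scan_history '(API|SECRET)_KEY *='. A value deleted in a later commit still shows up here under the older commit IDs.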
Team Education:
- Ensure all team members understand the importance of not committing sensitive data
- Document procedures for handling sensitive information
Conclusion
While removing sensitive files from Git history is possible, it’s always better to prevent such situations in the first place. By implementing proper security practices and educating team members, you can avoid the need for history rewriting and maintain a secure codebase.
Remember: once sensitive data has been pushed, especially to a public repository, removing it from history does not undo the exposure; it may already have been cloned, forked, or cached. Always rotate any exposed credentials or keys immediately.