Introduction

In software development, it’s not uncommon to accidentally commit sensitive files to a Git repository. These might be configuration files containing API keys, environment variables, or other secrets. Once committed, a file becomes part of the Git history, and once pushed, that history is available to everyone with access to the remote repository, which poses a security risk.

In this post, I’ll walk you through the process of completely removing a sensitive file from Git history.

The Problem

During the development of a private project, a .env.local file containing sensitive configuration was accidentally committed to the repository. This file needed to be removed not just from the current state of the repository, but from the entire Git history.

The Solution

Step 1: Remove the File from Current Working Directory

First, we need to remove the file from Git’s tracking and the working directory:

git rm --cached .env.local
rm .env.local
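
To confirm that Git has actually stopped tracking the file, a small check along these lines can help. This is only a minimal sketch: the helper name is_tracked is my own invention and not something Git provides; it simply wraps git ls-files.

```shell
# is_tracked PATH: hypothetical helper, not a Git built-in.
# Exits 0 if PATH is in the index (tracked), non-zero otherwise.
is_tracked() {
  git ls-files --error-unmatch -- "$1" >/dev/null 2>&1
}

# Example use after `git rm --cached .env.local`:
# is_tracked .env.local && echo "still tracked" || echo "no longer tracked"
```

Note that this only looks at the index, so it says nothing about older commits; that is what Step 2 deals with.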

Step 2: Remove from Git History

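I used git-filter-repo, a faster and safer replacement for git filter-branch (which the Git documentation itself now recommends against), to remove the file from the entire Git history. It is a separate tool that has to be installed first. By default it refuses to run on anything other than a fresh clone; the --force flag overrides that check: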

git filter-repo --invert-paths --path .env.local --force

This command:

  • Walks through every commit in the repository
  • Removes every occurrence of the .env.local file
  • Rewrites the history without that file, which gives each affected commit and all of its descendants a new hash
  • Leaves all other files, commit messages, and authorship information untouched
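
After the rewrite it is worth verifying that the file is really gone from every branch and tag. The following is a rough sketch of such a check; path_in_history is a hypothetical helper name, and it relies only on git log with path limiting.

```shell
# path_in_history PATH: hypothetical helper, not part of Git or git-filter-repo.
# Exits 0 if any commit reachable from any ref touches PATH.
# --full-history disables history simplification, so side branches are not skipped.
path_in_history() {
  [ -n "$(git log --all --full-history --format=%H -- "$1")" ]
}

# Example use after running git filter-repo:
# path_in_history .env.local && echo "still in history" || echo "history is clean"
```

This only inspects the local repository. Copies of the old commits can still exist on the remote and in other clones until those are updated too.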

Step 3: Prevent Future Accidents

To prevent similar accidents in the future, I added the file to .gitignore:

echo ".env.local" >> .gitignore
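
Running the echo command more than once appends a duplicate line each time. If that matters, a guarded variant could look like the sketch below. The helper name ignore_once is hypothetical, and it assumes .gitignore sits in the current directory and ends with a newline.

```shell
# ignore_once PATTERN: hypothetical helper.
# Appends PATTERN to ./.gitignore unless that exact line is already present.
ignore_once() {
  grep -qxF -- "$1" .gitignore 2>/dev/null || printf '%s\n' "$1" >> .gitignore
}

# Example use:
# ignore_once .env.local
# git check-ignore -q .env.local && echo "ignored"
```

The git check-ignore call in the comment is a quick way to confirm that Git really treats the path as ignored.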

Step 4: Push Changes to Remote

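Since we’ve rewritten the Git history, we need to force push the changes to the remote repository. git filter-repo also removes the origin remote as a safety measure, so it has to be added back first: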

git remote add origin git@github.com:user/upstream.git
git push origin --force --all
git push origin --force --tags
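
If the steps are repeated, git remote add fails because origin already exists. A small guard like the following avoids that. It is a sketch only: ensure_origin is a hypothetical helper name, and the URL in the example is the placeholder from above.

```shell
# ensure_origin URL: hypothetical helper.
# Adds URL as the remote "origin" only if no remote with that name exists yet.
ensure_origin() {
  git remote get-url origin >/dev/null 2>&1 || git remote add origin "$1"
}

# Example use before pushing:
# ensure_origin git@github.com:user/upstream.git
```

An existing origin is left as it is, even when it points somewhere else, so it is still worth checking the output of git remote -v before force pushing.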

Important Notes

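  1. Force Push Warning: Force pushing replaces the remote history, so existing clones no longer match it. Other developers will need to either re-clone or reset their local branch, which discards any local commits and uncommitted changes: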

    git fetch origin
    git reset --hard origin/main
    
  2. Backup First: Always make a backup of your repository before performing history rewrites.

  3. Consider Alternatives: To keep secrets out of the repository in the first place, you might consider:

    • Using environment variables
    • Implementing a secrets management system
    • Using configuration files that are gitignored by default
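
For the backup mentioned in the notes above, one option is a mirror clone taken before running git filter-repo. The sketch below assumes that approach; backup_repo is a hypothetical helper name and the paths in the example are placeholders.

```shell
# backup_repo SRC DEST: hypothetical helper.
# Creates a bare mirror clone of the repository at SRC in DEST,
# copying every ref (branches, tags, and so on).
backup_repo() {
  git clone --quiet --mirror -- "$1" "$2"
}

# Example use:
# backup_repo . ../my-project-backup.git
```

A mirror clone contains committed history only. Untracked files, such as a local .env.local you still need, have to be copied separately. Since the backup still holds the sensitive file, delete it once the cleanup is verified.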

Best Practices for Sensitive Data

  1. Prevention is Better than Cure:

    • Add sensitive files to .gitignore before starting development
    • Use environment variables for sensitive data
    • Consider using a secrets management service
  2. Regular Audits:

    • Periodically check your repository for sensitive data
    • Use tools like git-secrets or trufflehog to scan for sensitive information
  3. Team Education:

    • Ensure all team members understand the importance of not committing sensitive data
    • Document procedures for handling sensitive information
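
As one way to put the prevention advice into practice, a pre-commit hook can refuse commits that include env files. The following is a minimal sketch under my own assumptions, not a replacement for git-secrets or trufflehog: check_staged_env is a hypothetical helper, and the pattern matches any file named .env or starting with ".env." in any directory.

```shell
# check_staged_env: hypothetical helper meant to be called from .git/hooks/pre-commit.
# Lists staged .env-style files on stderr and returns non-zero if there are any.
check_staged_env() {
  # Names of staged files that were added, copied, modified, or renamed.
  matches=$(git diff --cached --name-only --diff-filter=ACMR \
    | grep -E '(^|/)\.env(\.[^/]*)?$' || true)
  if [ -n "$matches" ]; then
    printf 'Refusing to commit env files:\n%s\n' "$matches" >&2
    return 1
  fi
  return 0
}

# Example use at the end of an executable .git/hooks/pre-commit script:
# check_staged_env || exit 1
```

The pattern also blocks files such as .env.example, so it would need loosening if the project commits a template on purpose. Hooks are local to each clone, which means every team member has to install them.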

Conclusion

While removing sensitive files from Git history is possible, it’s always better to prevent such situations in the first place. By implementing proper security practices and educating team members, you can avoid the need for history rewriting and maintain a secure codebase.

Remember: Once sensitive data is pushed to a public repository, even if you remove it, it might have already been exposed. Always rotate any exposed credentials or keys immediately.

Additional Resources