Tom Cannaerts

Why you shouldn’t expose your git repository through your web server

Git is a widely used version control system and is an essential part of a lot of (semi)automatic deployment strategies. While it offers a lot of benefits, not setting up your deployments and server correctly can lead to your source code being publicly available through your web server. This can give an attacker useful information to gain further access to your website and the data it contains.

Downloading the repository

If the .git folder is directly accessible under your document root and the web server does not prevent access to it, it can be downloaded by anyone. If your server happens to have directory listings enabled, anyone can download it with wget using a line as simple as this:

# wget --mirror --no-parent http://<your-target>/.git/

Once they have downloaded the repository, the source code can be restored to the last version after cleaning up some index.html files that wget created.

# cd <your-target>
# find . -name "index.html*" -exec rm {} \;
# git checkout -- .

When directory listings aren’t enabled, but the .git folder is still accessible through the web server, it might still be possible to extract the repository. Each git repository contains a number of files that are almost always there, like .git/HEAD or .git/refs/heads/master. HEAD names the current branch, and the branch file under .git/refs/heads holds the SHA-1 ID of an object under the .git/objects folder. After downloading that object, you can use the git command to examine it and find references to other objects, and so on. It’s hard to do manually, but there are tools out there that can automate that for you.
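To make the object lookup concrete: git stores each loose object in a directory named after the first two hex characters of its ID, with the remaining 38 characters as the file name. Below is a minimal sketch of that mapping. Note that object_path is a hypothetical helper name, not a git command, and that objects stored in packfiles are laid out differently and are not covered here.

```shell
#!/bin/sh
# Map a 40-character SHA-1 object ID to the path of its loose object file.
# object_path is a hypothetical helper for illustration, not part of git.
object_path() {
    id=$1
    # Reject anything that is not exactly 40 lowercase hex characters.
    case $id in
        *[!0-9a-f]*) return 1 ;;
    esac
    [ "${#id}" -eq 40 ] || return 1
    dir=$(printf '%s' "$id" | cut -c1-2)
    file=$(printf '%s' "$id" | cut -c3-40)
    printf '.git/objects/%s/%s\n' "$dir" "$file"
}
```

Once such a file is in place in a local copy, git cat-file -p <id> prints the object, including the IDs of the trees and blobs it references.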

Inside the repository

So now that they have the repository checked out, they can start looking at the code in the hopes of finding other attack vectors. These could be design or logic flaws in the code, but also checked-in configuration files with API endpoints and keys, database credentials or even database dumps with production data (yes, it happens). Combine that with a publicly available phpMyAdmin or a remotely accessible MySQL server (as you might typically find on shared hosting), and they basically control all your data. In the light of GDPR, that would be a really bad thing if your site is collecting personal information of your visitors.

Also keep in mind that the repository does not only hold the current state, but also the history of the codebase. This means that they can go back in time and restore files that have already been modified or deleted. Removing a file from the working tree doesn’t remove it from git’s history.
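To see what this means for your own repository, the sketch below lists files that were deleted at some point and prints the last committed content of one of them. The function names are hypothetical helpers; the git commands they wrap are standard. It assumes the file has not been re-added since it was deleted.

```shell
#!/bin/sh
# List every path that was deleted at some point in the history of a repository.
# $1 is the path to a local clone (helper names are illustrative only).
list_deleted_files() {
    git -C "$1" log --diff-filter=D --name-only --pretty=format: | sed '/^$/d' | sort -u
}

# Print the last committed content of a deleted file.
# $1 = repository path, $2 = file path relative to the repository root.
show_deleted_file() {
    # The most recent commit touching the path is the one that deleted it
    # (assuming the file was not re-added afterwards) ...
    commit=$(git -C "$1" rev-list -n 1 HEAD -- "$2") || return 1
    [ -n "$commit" ] || return 1
    # ... so its parent commit still contains the file.
    git -C "$1" show "$commit^:$2"
}
```

Running list_deleted_files on a project is a quick way to check whether a credentials file that was "removed" is still recoverable.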

So how big is this problem?

Let’s start by just Googling that. Searching for the terms “index of” and “/.git” immediately yields some results. This means that the owners of these websites did not only manage to get their web server configuration wrong, but also somehow managed to get the .git folder indexed in Google. The real number is actually higher.

So I put it to the test and checked a number of .be and .nl websites by checking if either .git/ or .git/refs/heads/master could be retrieved. For .be, approximately 0.4% of the tested websites (+/- 300.000) were vulnerable and for .nl this was 0.2% (of 1.160.000 tested websites). Although this might seem small, extrapolated to all registered domains this would still mean that approximately 6.000 .be and 11.600 .nl websites expose their git repository.
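A check along these lines is easy to run against a site you own. The following is only a sketch, not the script used for the survey above: the function names and the example URL are assumptions, and it only tests the .git/refs/heads/master file.

```shell
#!/bin/sh
# Succeeds if the given text looks like the content of a git ref file,
# i.e. exactly one 40-character hexadecimal SHA-1 object ID.
looks_like_ref() {
    [ "${#1}" -eq 40 ] || return 1
    case $1 in
        *[!0-9a-f]*) return 1 ;;
    esac
    return 0
}

# Fetch .git/refs/heads/master from a site you own and report the result.
# $1 is the base URL, e.g. https://www.example.com (placeholder).
check_git_exposed() {
    body=$(curl -fsS --max-time 10 "$1/.git/refs/heads/master" 2>/dev/null) || body=
    if looks_like_ref "$body"; then
        echo "EXPOSED: $1"
    else
        echo "ok: $1"
    fi
}
```

A clean result does not prove the folder is protected: the repository may use a different branch name or store its refs in packed form, so checking the web server configuration itself remains necessary.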

Skimming through the list of websites, there’s a broad variety in the types of websites that are affected, but the majority of affected sites are company/business websites. For me this is not really a surprise, since those companies (or the agencies that built their website) are more likely to be using git in the first place. The list includes websites of restaurants, e-commerce webshops, famous people, community websites, but also websites handling more sensitive information like insurance brokers or other financial services. There was also a website of a reasonably sized city, the website of a political party and a number of government-operated (mostly informational) websites. Lead by example… I guess not!

Recommendations on protecting yourself from this

There are a number of things that can be done to protect you from this kind of attack, and most of them aren’t even hard to implement. Implementing the recommendations that aren’t specific to git will probably prevent a number of other attacks as well.

Don’t put the .git folder online

If you’re using git as a deployment strategy by pushing your changes directly to the server or by pulling your changes periodically from an upstream repository, push/pull to a folder outside the document root and use hooks to do a git checkout-index or git archive (piped to tar) to create a copy of the repository content without the actual git structures in it.

# git archive master | tar -x -C /path/to/documentroot

# git checkout-index -a -f --prefix=/path/to/documentroot/
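Tied into a hook, the git archive variant could look roughly like the sketch below. It assumes a repository on the server that lives outside the document root and receives pushes; DOCROOT, DEPLOY_BRANCH and the function names are placeholders to adapt, not a prescribed setup.

```shell
#!/bin/sh
# Sketch of a post-receive hook for a repository kept OUTSIDE the document root.
# DOCROOT and DEPLOY_BRANCH are placeholders to adapt.
DOCROOT=/path/to/documentroot
DEPLOY_BRANCH=master

# Succeeds only for the ref that should trigger a deployment.
should_deploy() {
    [ "$1" = "refs/heads/$DEPLOY_BRANCH" ]
}

# git feeds the hook one "<old-rev> <new-rev> <ref-name>" line per updated ref.
handle_push() {
    while read -r oldrev newrev refname; do
        if should_deploy "$refname"; then
            # Export the pushed commit without any .git metadata.
            git archive "$newrev" | tar -x -C "$DOCROOT"
        fi
    done
}

# In the actual hook file, end the script with a call to: handle_push
```

Keep in mind that extracting an archive over an existing folder does not delete files that were removed from the repository, so stale files can stay behind in the document root.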

Don’t make the root of the repository the document root of the project

Add an additional level to your repository that will contain all the files that need to be in the document root and set that folder as the document root in your web server config. Some frameworks (e.g. Symfony) will already do this, but you can easily do this yourself. This also allows you to put other useful things in your repository without exposing them, like documentation, provisioning information, Vagrant or Docker files, …

project
  - .git
    - <...>
  - documentation
  - htdocs <-- set this as the document root of your web server
    - index.php
  - README.md
  - Vagrantfile

Don’t put your config files in git

Passwords, API keys and other sensitive information do not belong in git. Look into the concept of .dist files if you want to provide some sort of template people can copy when setting up the project. Note that removing them from the files after you have checked them in will not remove them from the repository. If you’re in this situation, either start over with a clean repository and import the current state of the old one (you’ll lose your commit history though), or look into the more dangerous git filter-branch command to remove the data from the history.
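As an illustration of the .dist approach, the sketch below creates the real config file from its committed template and makes sure git ignores it. The helper name init_config and the file name config.php are assumptions for the example, and it assumes the project directory is the root of the repository.

```shell
#!/bin/sh
# Create the real config file from its committed .dist template if it is missing.
# $1 = project directory, $2 = config file name (e.g. config.php, a placeholder).
init_config() {
    target="$1/$2"
    template="$target.dist"
    [ -f "$template" ] || { echo "missing template: $template" >&2; return 1; }
    # Never overwrite a config that may already hold real credentials.
    [ -f "$target" ] || cp "$template" "$target"
    # Make sure git ignores the real file; the template stays tracked.
    grep -qxF "/$2" "$1/.gitignore" 2>/dev/null || printf '/%s\n' "$2" >> "$1/.gitignore"
}
```

Calling init_config /path/to/project config.php after cloning leaves config.php.dist under version control while the file with the actual secrets only exists locally.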

Prevent your .git folder from being downloaded through your web server config

There are probably only a few valid use cases for making anything that starts with a dot accessible through your web server, so prevent this altogether and add exceptions when needed. If you use Let’s Encrypt, only allow the .well-known/acme-challenge folder, not the rest.
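The intended behaviour can be written down as a small model, which is handy as a checklist when testing your server afterwards. This is not web server configuration, only a sketch of the rule described above; is_blocked_path is a hypothetical helper name.

```shell
#!/bin/sh
# Model of the "deny dot-paths, allow ACME challenges" rule for URL paths.
# Returns success when the request path should be blocked.
is_blocked_path() {
    case $1 in
        /.well-known/acme-challenge/*) return 1 ;;  # explicit exception
        */.*) return 0 ;;                           # any path segment starting with a dot
        *) return 1 ;;
    esac
}
```

For example, /.git/HEAD and /app/.env should be blocked, while /.well-known/acme-challenge/<token> and /index.php should still be served.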

Use a web application firewall

A web application firewall (WAF) is a filter that will analyze requests before they are handed over to the web server for normal processing. This could be a dedicated appliance, but might also just be a module in your web server software (like mod_security for Apache). They are generally a bit more difficult to set up and require complete access to the hosting environment, but can offer protection from a broad range of attacks like SQL injections and XSS.

Use standard web server configuration

If using a WAF isn’t possible, you can also prevent this in the configuration files of the web server itself.

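For Apache, this can be done in the server config. The FilesMatch block also works in .htaccess, but DirectoryMatch is only allowed in the server or virtual host config.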

<FilesMatch "^\.(.*)$">
    Require all denied
</FilesMatch>
<DirectoryMatch "/\.(.*)">
    Require all denied
</DirectoryMatch>

Or with mod_rewrite in .htaccess

RewriteEngine on
RewriteRule (^|/)\. - [F]

For Nginx

location ~ /\. {
    deny all;
}

For other web servers, please consult the documentation of your specific product to see how to accomplish this.

Turn off directory listings

When a user hits a URL that maps to a directory on disk and no default document is provided (e.g. index.php), in most cases your web server should not return a directory listing, exposing all the files in the directory. Again, prevent it by default and add exceptions for the cases where you do want this.

Apache

Options -Indexes

Nginx

location / {
    autoindex off;
}
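After changing the configuration, you can verify the result on a site you own. The sketch below relies on the fact that Apache and Nginx by default title their generated listings "Index of <path>"; the function names and URL are placeholders, and a customised listing template would not be detected.

```shell
#!/bin/sh
# Succeeds if the HTML on stdin looks like an auto-generated directory listing.
# By default, Apache and Nginx both title such pages "Index of <path>".
is_directory_listing() {
    grep -qi '<title>Index of /'
}

# Check a directory URL on a site you own ($1 is a placeholder URL).
check_listing() {
    if curl -fsS --max-time 10 "$1" 2>/dev/null | is_directory_listing; then
        echo "listing enabled: $1"
    else
        echo "no listing: $1"
    fi
}
```

Point check_listing at a directory without an index file, such as an uploads or assets folder, to see whether listings are really off.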

Conclusion

Don’t expose your repository through your web server. Even if the repository itself does not contain sensitive information like credentials or API keys, it gives an attacker insight into your application. This information can be used to further gain access to the application and/or the data behind it. Preventing it can be done in various ways and isn’t hard to do.

Hi there! My name is Tom. I’m a Linux System Administrator and PHP Developer. My job is to keep PHP websites running as smoothly as possible. Being both on the ‘dev’ and ‘ops’ side gives me a broad skillset across the entire stack.
