Google Dorking – What We Did

This week we looked at how to ‘Google Dork’. This is a hacking technique that uses Google’s advanced search operators to find security holes in websites and networks. We had already done a more basic session on using these operators to do things like only search a certain website or only search for certain file types, so we could use that knowledge to do some Google Dorking.

First, we learnt about robots.txt files. These are special files that are put on a website to tell search engine crawlers (e.g. Google, Bing) which parts of the site they should and should not crawl, and so what can end up in their search results. For example, this is what you get when you look at https://www.bbc.co.uk/robots.txt:

# v.4.8.6

User-agent: *

Sitemap: https://www.bbc.co.uk/sitemap.xml

Sitemap: https://www.bbc.co.uk/sitemaps/https-index-uk-archive.xml

Sitemap: https://www.bbc.co.uk/sitemaps/https-index-uk-news.xml

Disallow: /cbbc/search$

Disallow: /cbbc/search/

Disallow: /cbbc/search?

Disallow: /cbeebies/search$

Disallow: /cbeebies/search/

Disallow: /cbeebies/search?

Disallow: /chwilio/

Disallow: /chwilio$

Disallow: /chwilio?

Disallow: /iplayer/bigscreen/

Disallow: /iplayer/cbbc/episodes/

Disallow: /iplayer/cbbc/search

Disallow: /iplayer/cbeebies/episodes/

Disallow: /iplayer/cbeebies/search

Disallow: /iplayer/search

Disallow: /indepthtoolkit/smallprox$

Disallow: /indepthtoolkit/smallprox/

Disallow: /modules/musicnav/language/

Disallow: /news/0

Disallow: /radio/aod/

Disallow: /radio/aod$

Disallow: /radio/player/

Disallow: /radio/player$

Disallow: /search/

Disallow: /search$

Disallow: /search?

Disallow: /ugc$

Disallow: /ugc/

Disallow: /ugcsupport$

Disallow: /ugcsupport/

Disallow: /ugcstatic$

Disallow: /ugcstatic/

We can see in the robots.txt file above that there are a lot of directories that the BBC doesn’t want to be visible on search engines, such as /ugc, /ugcsupport and /ugcstatic.
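To see how a well-behaved crawler would read rules like these, here is a minimal sketch using Python's built-in urllib.robotparser. It only uses three of the rules from the listing above, and we stuck to plain path prefixes to stay within what the standard-library parser handles. The function name allowed_for_crawlers and the test URLs are our own, made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# A few rules copied from the BBC listing above (plain path prefixes only).
ROBOTS_EXCERPT = """\
User-agent: *
Disallow: /search/
Disallow: /ugc/
Disallow: /radio/player/
"""


def allowed_for_crawlers(robots_text, url, user_agent="*"):
    """Return True if the rules in robots_text let user_agent crawl url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs, chosen only to show one blocked and one allowed path.
    for url in ("https://www.bbc.co.uk/ugc/upload", "https://www.bbc.co.uk/news"):
        print(url, allowed_for_crawlers(ROBOTS_EXCERPT, url))
```

Note that this only tells us what a polite crawler would skip. Nothing stops a person (or a rude crawler) from visiting those paths anyway.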

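We then went on to look at a website with a password-protected section. When we looked in the robots.txt file for that website, we found the hidden folder holding all of the supposedly password-protected content, which we could look through freely. This works because robots.txt is itself public: listing a folder there hides it from search engines but tells anyone who reads the file exactly where it is.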
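Reading a robots.txt file by eye is easy enough, but the same idea can be sketched in a few lines of Python. This is only an illustration: the function disallowed_paths and the example file below are made up, not the site we looked at in the session.

```python
def disallowed_paths(robots_text):
    """Collect every non-empty Disallow path from the text of a robots.txt file."""
    paths = []
    for line in robots_text.splitlines():
        # Anything after a '#' is a comment, so drop it.
        line = line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        # Field names are case-insensitive; an empty Disallow blocks nothing.
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# A made-up robots.txt, purely for illustration.
EXAMPLE = """\
# example file
User-agent: *
Disallow: /members-only/
Disallow: /search
"""

if __name__ == "__main__":
    for path in disallowed_paths(EXAMPLE):
        print(path)
```

Anything that turns up in a list like this should be protected by a proper login on the server, not just left out of search results.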

Next, we moved on to using Google Dorking to try and find some hidden websites. First of all we tried to find passwords that people might have left online using the search term username password email filetype:xls. This term searches for Excel spreadsheets (XLS files) that contain the words ‘username’, ‘password’ and ‘email’. Luckily, we didn’t find anything of use when searching that!

Then we found that we could find servers containing lots of MP3 and MP4 files using the search term queen mp3 intitle:"index of". This term finds pages that mention ‘queen’ and ‘mp3’ and have ‘index of’ in their title. Many web servers give that title to automatically generated directory listings, so we’ll be targeting file servers that have a simple web interface for viewing files. This got us lots of results.

Finally, we moved on to looking for devices that shouldn’t be on the internet at all, like printers. We were able to find printers on Google by searching intitle:"hp laserjet" inurl:info_configuration.htm. This will find HP LaserJet printer configuration pages. We were all very surprised by how many printers came up, because printers are meant to only be accessible from the internal network, so some big mistakes must be happening for them to end up on Google! Most of the ones we found were thankfully password protected.
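All three searches from this session follow the same pattern: some plain words plus operators like filetype:, intitle: and inurl:. As a rough sketch, they could be put together in Python like this. The helper names build_dork and search_url are our own, and we are assuming the usual https://www.google.com/search?q=... address for a Google search.

```python
from urllib.parse import urlencode


def build_dork(*words, filetype=None, intitle=None, inurl=None):
    """Join plain search words with the operators used in this session."""
    parts = list(words)
    if filetype:
        parts.append(f"filetype:{filetype}")
    if intitle:
        # Quotes keep a multi-word title together as one phrase.
        parts.append(f'intitle:"{intitle}"')
    if inurl:
        parts.append(f"inurl:{inurl}")
    return " ".join(parts)


def search_url(query):
    """Turn a query into a search address (the URL pattern is an assumption)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    queries = [
        build_dork("username", "password", "email", filetype="xls"),
        build_dork("queen", "mp3", intitle="index of"),
        build_dork(intitle="hp laserjet", inurl="info_configuration.htm"),
    ]
    for query in queries:
        print(query)
        print(search_url(query))
```

This only builds the search text. It is meant to be pasted into a browser and used on things you are allowed to look at, just as we did in the session.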

This session showed how easy it is for people to find pages that you might be hiding on your website, and also how to find devices that you might not want to be accessed by the public.