The deep web, or invisible web, is all web-accessible content that is not indexed by the major search engines: millions of pages of information that cannot be found using Google.
The principal category of deep web sources consists of sites whose pages are generated dynamically in response to queries, while the underlying information is not stored in web page format. Examples include:
- company databases (e.g. Belgian business register),
- business opportunities (e.g. India’s EProcure),
- professional registers (e.g. GMC’s Medical Register),
- official registers (e.g. ICO’s Data protection register),
- land registries (e.g. Irish Republic’s Property Registration Authority Land Direct),
- telephone directories (e.g. British Telecom’s Phone Book),
- registers of bankruptcies and insolvencies (e.g. UK Individual Insolvency Register),
- planning registers (e.g. Camden Council, London),
- court and tribunal reports (e.g. England and Wales Care Standards Tribunal).
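To illustrate why such sources stay out of search indexes, here is a minimal sketch of a query-driven page. The register contents, table layout, and function names are all hypothetical and are not taken from any of the services listed above. The point is only that the HTML does not exist until someone submits a query, so a crawler that follows links has nothing to find.

```python
import sqlite3
from html import escape


def build_register() -> sqlite3.Connection:
    """Create a tiny in-memory register (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE companies (name TEXT, number TEXT)")
    conn.executemany(
        "INSERT INTO companies VALUES (?, ?)",
        [("Example Widgets Ltd", "000001"), ("Sample Trading Ltd", "000002")],
    )
    return conn


def render_results(conn: sqlite3.Connection, query: str) -> str:
    """Build a results page on demand from the rows matching the query."""
    rows = conn.execute(
        "SELECT name, number FROM companies WHERE name LIKE ? ORDER BY name",
        (f"%{query}%",),
    ).fetchall()
    items = "".join(
        f"<li>{escape(name)} ({escape(number)})</li>" for name, number in rows
    )
    return (
        f"<html><body><h1>Results for {escape(query)}</h1>"
        f"<ul>{items}</ul></body></html>"
    )


if __name__ == "__main__":
    register = build_register()
    # The page below is created only because this query was submitted.
    print(render_results(register, "Widgets"))
```

The information lives in database rows, not in stored pages, which matches the description above: a search engine would have to submit every possible query to see every possible result page.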
Other categories of deep web source include:
- websites in non-standard domains (e.g. domains such as “.chan” and “.geek”, which are hosted by OpenNIC),
- websites in the publicly accessible dark web (e.g. “.onion” sites accessed through the Tor network),
- sites requiring an intermediate step to access the pages (e.g. login, CAPTCHA),
- sites where the owners have restricted access using the robots.txt system or the “noindex” meta tag (e.g. www.aylesburyvale.gov.uk, www.claimsregulation.gov.uk),
- sites whose contents search engines are forbidden by law to index (e.g. search results removed under data protection law in Europe),
- sites that ISPs are legally required to block (e.g. “UK High Court orders ISPs to block Switch hacking sites”),
- information that major search engines do not index for privacy reasons (e.g. “Personal information that Google will remove”),
- non-current pages in web archives (e.g. this web site),
- “orphan pages” – pages in private networks or pages that are not linked from other pages.
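The robots.txt and “noindex” restrictions mentioned in the list above can be checked programmatically. The following is a minimal sketch using only Python's standard library. The robots.txt rules, the page markup, the "ExampleBot" user agent, and the example.org URLs are invented for illustration and do not reflect the actual configuration of any site named above.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for illustration.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /planning/
"""

# Hypothetical page markup, invented for illustration.
SAMPLE_PAGE = (
    '<html><head><meta name="robots" content="noindex, nofollow">'
    "</head><body></body></html>"
)


def crawler_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules allow the agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class _RobotsMetaFinder(HTMLParser):
    """Records whether a <meta name="robots"> tag contains "noindex"."""

    def __init__(self) -> None:
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            content = attributes.get("content", "")
            directives = [d.strip().lower() for d in content.split(",")]
            if "noindex" in directives:
                self.noindex = True


def page_requests_noindex(html: str) -> bool:
    """Return True if the page asks search engines not to index it."""
    finder = _RobotsMetaFinder()
    finder.feed(html)
    return finder.noindex


if __name__ == "__main__":
    print(crawler_may_fetch(SAMPLE_ROBOTS_TXT, "ExampleBot",
                            "https://example.org/planning/123"))
    print(page_requests_noindex(SAMPLE_PAGE))
```

Both mechanisms are requests, not access controls: the pages remain publicly reachable by anyone who knows the address, which is why they count as deep web content.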
For more information please contact Tony Hay: