A Web crawler is a computer program that automatically browses the World Wide Web in a methodical way. Web crawlers are also called ants, bots, worms, or Web spiders. The process of scanning the WWW is called Web crawling or spidering.
What do Web crawlers do?
Search engines use Web crawling to provide up-to-date data to their users. Essentially, a Web crawler creates a copy of every page it visits for later processing by a search engine, which then indexes the downloaded pages to provide fast searches.
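To illustrate the indexing step, here is a minimal sketch of an inverted index in Python. It assumes the crawler has already stored the text of each downloaded page in a dictionary keyed by URL; the names `build_index` and `search` and the sample pages are hypothetical, and real search engines also rank the results, which is not shown here.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase the text and keep runs of letters and digits as words.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of URLs whose downloaded copy contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical downloaded pages, standing in for the crawler's copies.
pages = {
    "http://example.com/a": "Web crawlers download pages",
    "http://example.com/b": "Search engines index pages",
}
index = build_index(pages)
print(sorted(search(index, "pages")))        # both URLs
print(sorted(search(index, "index pages")))  # only http://example.com/b
```

A query only has to look up each of its words in the index instead of rescanning every downloaded page, which is what keeps searches fast.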
Web crawlers are also used to automate tasks on websites, such as checking links or validating HTML code.
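As an example of such a task, the sketch below checks the links on a single page. It uses Python's standard `html.parser` module to collect `<a href>` links and a hypothetical helper, `url_is_alive`, that sends an HTTP HEAD request. Some servers do not handle HEAD requests properly, so a real link checker might fall back to GET. The checker is passed in as a parameter so the example can also be tried offline.

```python
import urllib.request
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the absolute URL of every <a href="..."> in a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def url_is_alive(url: str, timeout: float = 5.0) -> bool:
    """Hypothetical helper: send a HEAD request, treat any error as broken."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):  # URLError/HTTPError are OSError subclasses
        return False


def find_broken_links(html: str, base_url: str,
                      is_alive: Callable[[str], bool] = url_is_alive) -> list[str]:
    """Return the links on the page that is_alive reports as broken."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return [link for link in parser.links if not is_alive(link)]


# Offline usage: a fake checker that only knows one good URL.
page = '<p><a href="/ok">ok</a> <a href="missing.html">gone</a></p>'
known_good = {"http://example.com/ok"}
broken = find_broken_links(page, "http://example.com/docs/", known_good.__contains__)
print(broken)  # ['http://example.com/docs/missing.html']
```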
A Web crawler usually starts with a list of URLs to visit, called the seeds. As the crawler visits these URLs, it identifies all the hyperlinks in each page and adds them to the list of URLs to visit, called the crawl frontier. URLs from the frontier are then recursively visited according to a set of policies.
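The seed and frontier loop can be sketched as a breadth-first search. In this minimal sketch, `crawl` and `get_links` are hypothetical names: `get_links(url)` stands in for downloading a page and extracting its hyperlinks, and the only policies applied are "never visit a URL twice" and a `max_pages` limit. Politeness rules, such as honoring robots.txt or waiting between requests to the same site, are left out.

```python
from collections import deque
from typing import Callable, Iterable


def crawl(seeds: Iterable[str],
          get_links: Callable[[str], Iterable[str]],
          max_pages: int = 100) -> list[str]:
    """Visit pages breadth-first, starting from the seed URLs."""
    frontier = deque()    # the crawl frontier: URLs waiting to be visited
    seen = set()          # every URL ever added to the frontier
    for url in seeds:
        if url not in seen:
            seen.add(url)
            frontier.append(url)

    visited: list[str] = []  # pages actually fetched, in order
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()     # FIFO order makes the crawl breadth-first
        visited.append(url)
        for link in get_links(url):  # hypothetical: download + extract links
            if link not in seen:     # never queue the same URL twice
                seen.add(link)
                frontier.append(link)
    return visited


# A toy, hypothetical link graph standing in for real pages.
web = {"a": ["b", "c"], "b": ["a", "d"], "c": ["d"], "d": []}
print(crawl(["a"], lambda url: web.get(url, [])))               # ['a', 'b', 'c', 'd']
print(crawl(["a"], lambda url: web.get(url, []), max_pages=2))  # ['a', 'b']
```

Using a FIFO queue makes the crawl breadth-first; replacing it with a priority queue would let a crawler visit the URLs it considers most important first.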
Here is a picture I made to show the architecture of a Web crawler: