Module Web.Crawler

Description

This module implements a generic web crawler.

Features:

Fully asynchronous operation (Several hundred simultaneous requests)

Supports the /robots.txt exclusion standard

Extensible

URI Queues

Allow/Deny rules

Configurable

Number of concurrent fetchers

Bits per second (bandwidth throttling)

Number of concurrent fetchers per host

Delay between fetches from the same host

Supports HTTP and HTTPS