The Internet Archive discovers and captures web pages through many different web crawls.
At any given time several distinct crawls are running, some for months, and some every day or longer.
View the web archive through the .
The seeds for this crawl came from:
251 million Domains that had at least one link from a different domain in the Wayback Machine, across all time
~ 300 million Domains that we had in the Wayback, across all time
55,945,067 Domains from https://archive.org/details/wide00016
This crawl was run with a Heritrix setting of "maxHops=0" (URLs including their embeds)
The WARC files associated with this crawl are not currently available to the general public.