
Since you can only crawl and index content from domains you own, you need to verify each domain you want to crawl.

Add domains

  1. Open the Domains page in the Crawler dashboard.
  2. Click Add new domain.
  3. In the App ID field, enter your Algolia application ID, which you can find in the Algolia dashboard. The Crawler creates and updates indices in this application.
  4. In the Domains and subdomains field, enter the exact domains or subdomains you want to crawl, for example, algolia.com, www.algolia.com, or support.algolia.com.
  5. Click Add domain.
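
Because each domain or subdomain has to be entered exactly, it can help to list the distinct hostnames your pages actually use before filling in the form. The following is a minimal Python sketch, assuming you already have a list of page URLs from your site. The function name `hostnames_to_register` is hypothetical and not part of any Algolia tooling.

```python
from urllib.parse import urlsplit


def hostnames_to_register(urls):
    """Return the sorted, distinct hostnames found in a list of URLs.

    Hypothetical helper: it only inspects the URLs you pass in and does
    not talk to Algolia in any way.
    """
    hosts = set()
    for url in urls:
        host = urlsplit(url).hostname  # lowercased, without the port
        if host:
            hosts.add(host)
    return sorted(hosts)
```

For example, passing URLs from `www.algolia.com` and `support.algolia.com` yields both hostnames, each of which would need its own entry in the Domains and subdomains field.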

Verify the domain

To verify that you or your organization owns the domain you want to crawl, you need to add a verification code to your site’s robots.txt file. Each Algolia application has a unique verification code.

  1. On the Domain page, click Copy verification code to clipboard.
  2. Add the verification code to the robots.txt file of each site you want to crawl.

    # Algolia-Crawler-Verif: XXXX
    
    User-Agent: *
    Allow: /
    # ...
    
  3. After confirming that the updated robots.txt file is online, click Verify now.
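
Before clicking Verify now, you may want to confirm that the published robots.txt file really contains the verification comment. The sketch below is one way to do that in Python. It assumes the comment appears on its own line in the form shown above (`# Algolia-Crawler-Verif: XXXX`) and that the site is served over HTTPS. How the Crawler itself parses the file is not described here, so treat the matching logic as an approximation. The function names are hypothetical.

```python
from urllib.request import urlopen

# Prefix of the verification comment, as shown in the robots.txt example.
VERIF_PREFIX = "# Algolia-Crawler-Verif:"


def has_verification_code(robots_txt, code):
    """Return True if the robots.txt text contains the comment for `code`."""
    for line in robots_txt.splitlines():
        line = line.strip()
        if line.startswith(VERIF_PREFIX):
            # Compare the value after the prefix, ignoring extra spaces.
            if line[len(VERIF_PREFIX):].strip() == code:
                return True
    return False


def fetch_robots_txt(domain, timeout=10):
    """Download https://<domain>/robots.txt and return it as text.

    Assumes the site is reachable over HTTPS.
    """
    with urlopen(f"https://{domain}/robots.txt", timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

A call such as `has_verification_code(fetch_robots_txt("www.algolia.com"), "XXXX")`, with your own domain and code substituted, returns `True` only when the deployed file contains the expected line.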

Create a new crawler

After verifying your domain, you can create a new crawler.

  1. Open the Crawlers page in the Crawler dashboard.
  2. Click New Crawler and enter the following information:

    • Your crawler name. Enter a descriptive name for your crawler.
    • App ID. Enter the same Algolia application ID you specified when adding a domain. The indices and extracted records will be added to this application.
    • Start URL. Enter a URL as the starting point for the crawler. The best starting URL is your sitemap.xml. If your site doesn’t have a sitemap, enter the URL with the most links to other pages.
    • Crawler template. If you want to configure a new crawler for one of the supported static site generators, select that configuration template. Otherwise, select the default template.
  3. Click Create to finish the configuration of your crawler and run a test crawl.
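
If you plan to use your sitemap.xml as the start URL, a quick look at what it lists can show which pages the crawler has a chance to discover. The following Python sketch parses a sitemap document that follows the sitemaps.org schema and returns the URLs it contains. The function name `sitemap_urls` is hypothetical, and the sketch does not follow nested sitemap index files.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml):
    """Return the <loc> values listed in a sitemap document.

    `sitemap_xml` is the sitemap content as a string or bytes.
    For a sitemap index, the returned values are the child sitemap URLs.
    """
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]
```

An empty result suggests that the sitemap is missing entries or uses a different namespace, in which case the page with the most links to other pages may be a better start URL.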

Run the test crawl

The initial crawl visits up to 100 URLs to test whether the crawler can access your site, find links, extract content, and upload records to an Algolia index. If the test succeeds, you can inspect the status, extracted content, and found links for each URL. You can review the records that the crawler created during this crawl in the Algolia dashboard.
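
To get a feel for what a crawl capped at 100 URLs covers, the idea can be sketched as a breadth-first traversal that stops once the limit is reached. This is only an illustration and not the Crawler's actual algorithm: the function `bounded_crawl` and the `get_links` callback are hypothetical, and the real visiting order may differ.

```python
from collections import deque


def bounded_crawl(start_url, get_links, max_urls=100):
    """Visit pages breadth-first from start_url, stopping after max_urls pages.

    `get_links(url)` is a caller-supplied function that returns the links
    found on a page. Returns the visited URLs in visiting order.
    """
    visited = []
    seen = {start_url}
    queue = deque([start_url])
    while queue and len(visited) < max_urls:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:  # queue each URL at most once
                seen.add(link)
                queue.append(link)
    return visited
```

With a cap like this, pages that are many links away from the start URL may not be reached during the test crawl, which is one reason a sitemap makes a good starting point.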

For a summary of your test crawl, go to the Overview page.

Figure: Crawler overview after a crawl has finished

Next steps

The default configuration for a test crawl often falls short of what’s needed. For instance, you might need to set up scheduled automatic crawls, choose specific URLs to include or exclude, and determine what information to extract from each page. To learn more, see Configure a crawler.
