Web content crawler and content inventory software
The first step to auditing your content is to crawl your sites. Adding a site is quick and easy: just enter your URL and give your site a name. Content Auditor will crawl your site and send you an email when it’s done.
- enter URL and site name
- you’ll get an email when the crawl is done
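For readers curious about what a crawl involves, here is a minimal Python sketch of a same-site, breadth-first crawler. It is not Content Auditor’s implementation. The names `crawl` and `LinkExtractor`, the `max_pages` cap, and the injected `fetch` function (which stands in for real HTTP requests) are all hypothetical, and the email notification step is left out.

```python
from __future__ import annotations

from collections import deque
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url: str, fetch: Callable[[str], Optional[str]], max_pages: int = 1000) -> list[str]:
    """Breadth-first crawl of the pages on start_url's host (hypothetical sketch).

    `fetch` returns a page's HTML, or None if the page could not be retrieved.
    Returns the URLs that were fetched successfully, in crawl order.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    inventory: list[str] = []
    while queue and len(inventory) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # broken link or fetch error: skip it
        inventory.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments so each page is queued once.
            absolute, _ = urldefrag(urljoin(url, href))
            # Stay on the same host; links to other sites are not followed.
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return inventory


if __name__ == "__main__":
    # A fake three-page site so the sketch runs without network access.
    fake_site = {
        "https://example.com/": '<a href="/about">About</a> <a href="https://other.org/">Out</a>',
        "https://example.com/about": '<a href="/">Home</a> <a href="/team#jobs">Team</a>',
        "https://example.com/team": "<p>No links here</p>",
    }
    print(crawl("https://example.com/", fake_site.get))
    # → ['https://example.com/', 'https://example.com/about', 'https://example.com/team']
```

Passing `fetch` in as a parameter keeps the crawl logic separate from networking, so the same function can be driven by a real HTTP client or, as here, by a dictionary of fake pages.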
Crawl from a subdirectory
There are different reasons you might want to crawl a URL that includes a subdirectory. In some cases you may want to crawl only the content within that subdirectory; in others, your site’s homepage might redirect to a subdirectory. In either case, you’ll need to tell our crawler how to behave: should it stay within that subdirectory, or crawl the entire site from the root? If Content Auditor sees a subdirectory in your URL, it will prompt you to choose.
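The subdirectory choice comes down to a scope check applied to every discovered link. The Python sketch below is one way to express it, assuming the start URL’s path names the subdirectory (for example `/docs/`, not `/docs/index.html`). The names `in_scope` and `stay_in_subdirectory` are hypothetical, not part of Content Auditor.

```python
from urllib.parse import urlparse


def in_scope(url: str, start_url: str, stay_in_subdirectory: bool) -> bool:
    """Return True if `url` should be crawled for a crawl started at `start_url`."""
    start = urlparse(start_url)
    target = urlparse(url)
    if target.netloc != start.netloc:
        return False  # never leave the site
    if not stay_in_subdirectory:
        return True  # crawl the entire site from the root
    # Treat the start path as a directory, so "/docs" covers "/docs/..." but not "/docs-old".
    base = start.path.rstrip("/")
    return target.path == base or target.path.startswith(base + "/")
```

Matching on whole path segments, rather than a plain string prefix, avoids accidentally pulling in sibling directories that merely share a name prefix.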
Crawl from a sitemap
In some cases you may already have a sitemap that you want to base your crawl on. If Content Auditor sees a URL ending with “sitemap.xml”, it will automatically change its behaviour to follow the map instead of doing a full crawl.
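Following a sitemap means reading the page list from the XML file instead of discovering pages through links. Here is a minimal Python sketch, assuming a standard sitemaps.org `<urlset>` file. The function names are hypothetical, and the suffix check simply mirrors the “sitemap.xml” rule described in the paragraph.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org 0.9 protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def is_sitemap_url(url: str) -> bool:
    """A URL ending in "sitemap.xml" switches the crawl to sitemap mode."""
    return url.endswith("sitemap.xml")


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return the <loc> URLs listed in a sitemap document, in file order."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]
```

A sitemap index file (`<sitemapindex>`) uses the same `<loc>` element, so on one of those this sketch would return the URLs of the child sitemaps rather than pages; a fuller version would fetch and parse each child in turn.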
Exclude pages with a blacklist
Sometimes you want the crawler to avoid certain pages. For example, a previous crawl may have shown that when the crawler visits your search page, it follows links to thousands of search result pages that skew your crawl results. The blacklist solves problems like this: it is a list of URLs that you want the crawler to ignore. Enter a full URL to ignore a specific page, or a partial URL to ignore an entire section of your site. The screenshot below shows a setup where the crawler would ignore both the “/search” page and the “/blog” section.
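The difference between full and partial entries can be sketched as follows. This Python function is a hypothetical illustration, not Content Auditor’s matching logic. It assumes that entries starting with a scheme such as `https://` are full URLs matched exactly, and that everything else is a path prefix matched on whole path segments, so “/blog” ignores “/blog/2023/hello” but not “/blogroll”.

```python
from __future__ import annotations

from urllib.parse import urlparse


def is_blacklisted(url: str, blacklist: list[str]) -> bool:
    """Return True if the crawler should skip `url` (hypothetical matching rules).

    Entries with a scheme (e.g. "https://example.com/contact") are full URLs and
    must match exactly; other entries (e.g. "/blog") are path prefixes.
    """
    path = urlparse(url).path or "/"
    for entry in blacklist:
        if urlparse(entry).scheme:
            # Full URL: ignore only this specific page.
            if url == entry:
                return True
        else:
            # Partial URL: ignore the section itself and everything under it.
            section = entry.rstrip("/")
            if path == section or path.startswith(section + "/"):
                return True
    return False
```

Because query strings are not part of the path, “/search” in this sketch also catches result pages such as “/search?q=shoes”, which is the scenario described above. Exact matching of full URLs is deliberately strict here; a real crawler would likely normalise trailing slashes and letter case first.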