README - Test Site for Web Crawler Validation
=============================================

INSTALLATION
------------
No installation required. This is a static test site.

USAGE
-----
1. Point your web crawler at index.html as the start URL
2. Configure crawl parameters (depth, delay, etc.)
3. Monitor the crawler's requests and logs against the expected behavior below
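The depth parameter is the one most often misread. Below is a minimal sketch of how a max_depth limit typically combines with a "seen" set during a breadth-first crawl. It assumes the start page is depth 0; crawlers differ on whether the start page counts as depth 0 or 1, so check yours before interpreting the max_depth=2 and max_depth=3 results in the Depth Testing scenario. The function name `crawl_bfs` and the link graph are hypothetical stand-ins, not this site's actual structure.

```python
from collections import deque


def crawl_bfs(links: dict[str, list[str]], start: str, max_depth: int) -> list[str]:
    """Visit pages breadth-first, never following links beyond max_depth.

    `links` maps each page to the pages it links to; it is a hypothetical
    in-memory stand-in for fetching a page and extracting its links.
    Assumption: the start page is depth 0.
    """
    seen = {start}              # every page ever queued, so duplicate links are skipped
    order = []
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        order.append(page)
        if depth >= max_depth:
            continue            # at the limit: record the page but do not follow its links
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return order


# Hypothetical link graph; the real test site may be laid out differently.
site = {
    "index.html": ["about.html", "contact.html", "about.html"],
    "about.html": ["resources.html", "index.html"],
    "contact.html": ["resources.html"],
    "resources.html": ["docs/guide.html", "docs/api.html"],
}

# In this graph the docs/ pages sit at depth 3, so they only appear in the second run.
print(crawl_bfs(site, "index.html", max_depth=2))
print(crawl_bfs(site, "index.html", max_depth=3))
```

Checking membership when a page is queued, rather than when it is dequeued, keeps the same page from entering the queue twice even when several pages at one depth link to it.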

EXPECTED BEHAVIOR
-----------------
A properly functioning crawler should:
- Respect robots.txt rules
- Not visit the /private/ directory
- Wait at least 1 second between requests when crawling as MunicipalCrawler
- Detect and skip duplicate URLs
- Handle both relative and absolute links
- Download and process .txt files
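Python's standard library module `urllib.robotparser` can check the first three rules. The sketch below parses a hypothetical robots.txt written to match the rules described above (the site's actual file may differ), and `http://localhost:8000` is an assumed address for the served test site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt modeled on the rules this README describes;
# substitute the site's real file when testing.
ROBOTS_TXT = """\
User-agent: MunicipalCrawler
Crawl-delay: 1
Disallow: /private/

User-agent: *
Disallow: /private/
"""

BASE = "http://localhost:8000"  # assumed address of the served test site
AGENT = "MunicipalCrawler"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ["/index.html", "/private/secret.html"]:
    allowed = parser.can_fetch(AGENT, BASE + path)
    print(f"{path}: {'allowed' if allowed else 'disallowed'}")

# crawl_delay() returns None when no Crawl-delay applies to the agent.
print("crawl delay for", AGENT, "=", parser.crawl_delay(AGENT))
```

Against a live server, replace the `parse()` call with `parser.set_url(BASE + "/robots.txt")` followed by `parser.read()`. `crawl_delay()` requires Python 3.6 or later.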

TEST SCENARIOS
--------------
1. Depth Testing
   - With max_depth=2, pages under docs/ should not be visited
   - With max_depth=3, all pages should be visited, including docs/

2. Duplicate Detection
   - A page reachable through multiple links should be visited only once
   - Check the crawler logs to confirm that duplicate URLs were skipped

3. Robots.txt Compliance
   - The crawler should fetch and parse robots.txt
   - It should not access the /private/ section
   - It should honor the specified crawl delay

4. File Handling
   - Should download .txt files
   - Should parse .html files for links
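File handling and link resolution meet in one place: each fetched .html page is parsed for links, and every href is resolved against that page's URL. The sketch below uses the standard library's `html.parser` and `urllib.parse.urljoin`. The `process` dispatcher and the example URLs are hypothetical, and inferring the file type from the URL path is an assumption; a production crawler would normally consult the Content-Type response header instead.

```python
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def process(url: str, body: str):
    """Hypothetical dispatcher: parse HTML for links, keep .txt content as-is.

    Assumption: the file type is inferred from the URL path, and directory
    URLs ending in "/" are treated as HTML index pages.
    """
    path = urlsplit(url).path
    suffix = PurePosixPath(path).suffix.lower()
    if path.endswith("/") or suffix in (".html", ".htm"):
        extractor = LinkExtractor()
        extractor.feed(body)
        extractor.close()
        # urljoin leaves absolute hrefs unchanged and resolves relative ones
        # against the page URL; urldefrag drops "#fragment" parts.
        links = [urldefrag(urljoin(url, href))[0] for href in extractor.hrefs]
        return "html", links
    if suffix == ".txt":
        return "text", body
    return "skip", None


page = (
    '<a href="notes.txt">Notes</a> '
    '<a href="../index.html#top">Home</a> '
    '<a href="http://localhost:8000/contact.html">Contact</a>'
)
print(process("http://localhost:8000/docs/guide.html", page))
print(process("http://localhost:8000/docs/notes.txt", "plain text"))
```

The first call resolves the relative `notes.txt` and `../index.html#top` links to absolute URLs and passes the absolute contact link through unchanged.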

TROUBLESHOOTING
---------------
- If the crawler accesses /private/, check its robots.txt parsing
- If it visits duplicate pages, verify its URL normalization
- If requests arrive too quickly, check its crawl delay implementation
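The last two checks can be exercised in isolation. Below is a minimal sketch of a URL normalizer for duplicate detection and a crawl-delay limiter. The names `normalize_url` and `CrawlDelay` are hypothetical, and the normalization rules (lowercase host, drop default ports and fragments, empty path becomes "/") are a common subset, not a complete canonicalization. The delay value would come from the Crawl-delay that robots.txt sets for the agent in use.

```python
import time
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url: str) -> str:
    """Canonical form used as the duplicate-detection key."""
    parts = urlsplit(url)                    # urlsplit already lowercases the scheme
    host = parts.hostname or ""              # lowercased, userinfo and brackets stripped
    if ":" in host:
        host = f"[{host}]"                   # re-bracket IPv6 literals
    port = parts.port
    if port is not None and port != DEFAULT_PORTS.get(parts.scheme):
        host = f"{host}:{port}"              # keep only non-default ports
    path = parts.path or "/"                 # "http://h" and "http://h/" are the same page
    # Fragments never reach the server, so they are dropped.
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


class CrawlDelay:
    """Enforce a minimum gap, in seconds, between consecutive requests."""

    def __init__(self, delay: float, clock=time.monotonic, sleep=time.sleep) -> None:
        self.delay = delay
        self.clock = clock      # injectable so the limiter can be tested without sleeping
        self.sleep = sleep
        self.last = None        # time of the previous request, if any

    def wait(self) -> None:
        """Call immediately before each request."""
        now = self.clock()
        if self.last is not None:
            remaining = self.delay - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now


seen = set()
for url in ["HTTP://LocalHost:8000/index.html", "http://localhost:8000/index.html#top"]:
    key = normalize_url(url)
    print(key, "duplicate" if key in seen else "new")
    seen.add(key)
```

In a crawl loop, `limiter = CrawlDelay(1.0)` would be created once and `limiter.wait()` called before every request. Using `time.monotonic` rather than wall-clock time keeps the gap correct even if the system clock is adjusted mid-crawl.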

For support, see contact.html or visit the FAQ.