For this Project 4 Crawl milestone, your project must maintain the functionality from the Project v3.1 Tests assignment and add a web crawler that can add a single web page to the inverted index.

Prerequisites

You must complete the following assignments before beginning to work on this one:

Untitled

Functionality

Your main method must be placed in a class named Driver and must process the following additional command-line arguments:

These are in addition to the command-line arguments from the previous Project v3.1 Tests assignment.

The command-line flag/value pairs may be provided in any order or not at all. Do not convert paths to absolute form when processing command-line input!
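As a rough illustration of order-independent flag handling, the sketch below collects flag/value pairs into a map and builds paths exactly as the user typed them. The class name `FlagSketch`, its methods, and the flag names in the usage note are hypothetical and are not part of the assignment; your own argument-handling code may look quite different.

```java
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper; not a class required by the assignment. */
class FlagSketch {
    /** Treats a dash followed by a non-digit as a flag (an assumed convention). */
    static boolean isFlag(String arg) {
        return arg != null
                && arg.length() > 1
                && arg.charAt(0) == '-'
                && !Character.isDigit(arg.charAt(1));
    }

    /**
     * Collects flag/value pairs regardless of order. A flag with no value
     * maps to null; a value with no preceding flag is ignored.
     */
    static Map<String, String> parse(String[] args) {
        Map<String, String> flags = new HashMap<>();
        String current = null;
        for (String arg : args) {
            if (isFlag(arg)) {
                current = arg;
                flags.put(current, null);
            } else if (current != null) {
                flags.put(current, arg);
                current = null;
            }
        }
        return flags;
    }

    /** Builds the path as given; deliberately no toAbsolutePath() call. */
    static Path toPath(String value) {
        return Path.of(value);
    }
}
```

With this sketch, `FlagSketch.parse(new String[] {"-second", "out.json", "-first"})` and the same arguments in the opposite order produce the same map, and `FlagSketch.toPath("out.json")` stays a relative path.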

Output user-friendly error messages in the case of exceptions or invalid input. Under no circumstances should your main() method output a stack trace to the user!
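One way to keep stack traces away from the user is to translate each exception into a single-line message at the point where main() would otherwise let it escape. The sketch below is a minimal example under that assumption; the `FriendlyErrors` class, the `Task` interface, and the message wording are all hypothetical.

```java
import java.io.IOException;
import java.net.URISyntaxException;

/** Hypothetical helper; not a class required by the assignment. */
class FriendlyErrors {
    /** Hypothetical stand-in for a unit of work that main() dispatches. */
    interface Task {
        void run() throws Exception;
    }

    /** Builds a one-line message and never includes the stack trace. */
    static String describe(String action, Exception e) {
        String reason;
        if (e instanceof URISyntaxException) {
            reason = "the URL is not valid";
        } else if (e instanceof IOException) {
            reason = "the resource could not be read";
        } else if (e instanceof NumberFormatException) {
            reason = "a number was expected";
        } else {
            reason = "an unexpected error occurred";
        }
        return "Unable to " + action + ": " + reason + ".";
    }

    /** Runs the task; on failure prints the message and returns false. */
    static boolean attempt(String action, Task task) {
        try {
            task.run();
            return true;
        } catch (Exception e) {
            System.err.println(describe(action, e));
            return false;
        }
    }
}
```

A call such as `FriendlyErrors.attempt("crawl the seed page", () -> { /* work that may throw */ })` prints one readable line on failure and lets main() continue or exit cleanly.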

HTML Processing

Web pages must be requested using sockets and HTTP/S from the web server as follows:
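For orientation only, the following sketch shows what a bare-bones HTTP/S GET over a socket can look like in Java. It is not the required procedure: the `SocketFetchSketch` class and its method names are hypothetical, and the sketch assumes a `Connection: close` request and does not handle redirects, content types, or chunked transfer encoding.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.net.ssl.SSLSocketFactory;

/** Hypothetical helper; not a class required by the assignment. */
class SocketFetchSketch {
    /** Builds a minimal GET request for the URI's path and query. */
    static String buildRequest(URI uri) {
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        if (uri.getRawQuery() != null) {
            path += "?" + uri.getRawQuery();
        }
        // Include the port in the Host header only when one was given.
        String host = uri.getHost() + (uri.getPort() < 0 ? "" : ":" + uri.getPort());
        return "GET " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Connection: close\r\n\r\n";
    }

    /** Opens a TLS socket for https and a plain socket otherwise. */
    static Socket open(URI uri) throws IOException {
        boolean https = "https".equalsIgnoreCase(uri.getScheme());
        int port = uri.getPort() < 0 ? (https ? 443 : 80) : uri.getPort();
        return https
                ? SSLSocketFactory.getDefault().createSocket(uri.getHost(), port)
                : new Socket(uri.getHost(), port);
    }

    /** Sends the request and returns every response line (headers, then body). */
    static List<String> fetchLines(URI uri) throws IOException {
        try (
                Socket socket = open(uri);
                PrintWriter out = new PrintWriter(socket.getOutputStream(), false, StandardCharsets.UTF_8);
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))
        ) {
            out.print(buildRequest(uri));
            out.flush();
            List<String> lines = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
            return lines;
        }
    }
}
```

In this sketch the status line and headers come first in the returned list, followed by an empty line and then the body, so a caller can inspect the headers before deciding what to do with the remaining lines.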

For efficiency (and to avoid being blocked or rate-limited by the web server), do not download unnecessary content, and download necessary content from the web server exactly once. Specifically: