For this Project 1 Index milestone, your project must also store and output an in-memory inverted index of the processed file(s) alongside the previously computed word counts.

Prerequisites

You must complete the following assignments before beginning to work on this one:

Untitled

Functionality

Your main method must be placed in a class named Driver and must process the following additional command-line arguments:

These are in addition to the command-line arguments from the previous release of the project.

The command-line flag/value pairs may be provided in any order or not at all. Do not convert paths to absolute form when processing command-line input!
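As a rough illustration of order-independent flag handling, the sketch below collects flag/value pairs into a map and keeps any path value in the form the user typed it. The class name FlagParsingSketch and the flags -in and -out are placeholders invented for this example; they are not the flags required by this project, and the rule that a flag is any argument starting with a dash is an assumption.

```java
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; the name and the flag-detection rule are assumptions.
class FlagParsingSketch {
    // Collects flag/value pairs regardless of the order they were given in.
    // A flag with no following value is stored with a null value.
    static Map<String, String> parse(String[] args) {
        Map<String, String> flags = new HashMap<>();
        for (int i = 0; i < args.length; i++) {
            if (args[i].startsWith("-")) {
                boolean hasValue = i + 1 < args.length && !args[i + 1].startsWith("-");
                flags.put(args[i], hasValue ? args[++i] : null);
            }
        }
        return flags;
    }

    public static void main(String[] args) {
        // "-in" and "-out" are placeholder flags used only for this demo.
        String[] demo = {"-out", "index.json", "-in", "input/mammals.txt"};
        Map<String, String> flags = parse(demo);

        // Path.of keeps a relative path relative; nothing here calls
        // toAbsolutePath(), so the user's input is preserved as typed.
        Path input = Path.of(flags.get("-in"));
        System.out.println(input.isAbsolute()); // prints false
    }
}
```

Because the pairs land in a map, the rest of the program can ask for each flag by name and fall back to a default when it is absent.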

Output user-friendly error messages in the case of exceptions or invalid input. Under no circumstances should your main() method output a stack trace to the user!
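One way to meet this requirement is to catch exceptions close to where they occur and turn them into a short message. The sketch below is only an illustration: the class ErrorHandlingSketch and its describe method are hypothetical names, and the wording of the message is an assumption rather than required output.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical example class; not part of the required project structure.
class ErrorHandlingSketch {
    // Attempts a file operation and returns a readable message either way.
    static String describe(Path path) {
        try {
            long size = Files.size(path);
            return "Found " + path + " (" + size + " bytes).";
        } catch (IOException e) {
            // No e.printStackTrace() here: report what went wrong in plain words.
            return "Unable to read the file: " + path;
        }
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("No file was provided.");
            return;
        }
        System.out.println(describe(Path.of(args[0])));
    }
}
```

Note that main does not declare a throws clause, so an unhandled checked exception cannot escape to the user as a stack trace.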

Text Processing

The input files should be cleaned, parsed, and stemmed as before. However, your code must now also create an in-memory inverted index data structure alongside the word counts. The inverted index must store a mapping from each word to the document location(s) where the word was found, and to the numeric position(s) at which the word occurs in each of those documents. Positions start at 1. This will require nesting multiple built-in data structures.

Each file should only be opened once; the word counts and the inverted index should be built at the same time.
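The nesting and the single-pass requirement described above can be sketched as follows. This is a minimal illustration under several assumptions: the class InvertedIndexSketch and its methods are hypothetical names, sorted collections (TreeMap and TreeSet) are one possible choice of built-in structures, and the lowercase-and-split step is only a stand-in for the cleaning, parsing, and stemming from the previous milestone.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical class; names and collection choices are assumptions.
class InvertedIndexSketch {
    // word -> location -> sorted positions (positions start at 1)
    final TreeMap<String, TreeMap<String, TreeSet<Integer>>> index = new TreeMap<>();
    // location -> number of words found at that location
    final TreeMap<String, Integer> counts = new TreeMap<>();

    // Records one occurrence, creating the inner structures on demand.
    void add(String word, String location, int position) {
        index.computeIfAbsent(word, w -> new TreeMap<>())
             .computeIfAbsent(location, l -> new TreeSet<>())
             .add(position);
    }

    // Reads the source once, updating the index and the count together.
    void process(String location, BufferedReader reader) throws IOException {
        int position = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            // Stand-in for the real clean/parse/stem step from the earlier milestone.
            for (String token : line.toLowerCase().split("[^a-z]+")) {
                if (token.isEmpty()) {
                    continue;
                }
                position++;
                add(token, location, position);
            }
        }
        // Whether empty files should be recorded is not decided here.
        counts.put(location, position);
    }

    public static void main(String[] args) throws IOException {
        InvertedIndexSketch sketch = new InvertedIndexSketch();
        // A real run would open the file once, e.g. with Files.newBufferedReader.
        String text = "Platypus and capybara\nplatypus";
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            sketch.process("input/mammals.txt", reader);
        }
        System.out.println(sketch.index.get("platypus")); // prints {input/mammals.txt=[1, 4]}
        System.out.println(sketch.counts);                // prints {input/mammals.txt=4}
    }
}
```

The position counter continues across lines, so positions are relative to the whole file rather than to each line, and the final counter value doubles as the word count for that file.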

For example, suppose we have the following inverted index:

{
  "capybara": {
    "input/mammals.txt": [
      11
    ]
  },
  "platypus": {
    "input/dangerous/venomous.txt": [
      2
    ],
    "input/mammals.txt": [
      3,
      8
    ]
  }
}