For this Project 2 Search milestone, your project must maintain the functionality from the Project 1 Index project, as well as process multi-line multi-word query text files, conduct an exact search of each multi-word query line, rank the results using term frequency, and produce the results in a pretty JSON format.
You must complete the following assignments before beginning to work on this one:
<aside> It is also strongly recommended that you wait until you have completed Project v1.2 Review before starting.
</aside>
Your main method must be placed in a class named Driver and must process the following additional command-line arguments:
-query [path] where the flag -query indicates the next argument [path] is a path to a query file. This will trigger an exact search for each multi-word query line in the query file.
If this flag is not provided, then no search should be performed. See the subsections below for details.
-results [path] where the flag -results indicates the next argument [path] is the path to use for the search results output file.
If the [path] argument is not provided, use results.json as the default output filename. If the -results flag is not provided, your code should still calculate the search results but should not produce an output file of those results.
See the “Output Format” section below for details on the pretty JSON output format required.
These are in addition to the command-line arguments from the previous Project v1.1 Tests assignment.
The command-line flag/value pairs may be provided in any order or not at all. Do not convert paths to absolute form when processing command-line input!
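The flag handling described above can be sketched roughly as follows. This is only an illustration, not a required design: the class name FlagSketch and its methods are hypothetical, and the sketch assumes that any argument starting with "-" is a flag, so a path value that itself begins with "-" would be misread.

```java
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper; the class and method names are illustrative only. */
class FlagSketch {
    /**
     * Maps each flag to the value that follows it, or to null if no value follows.
     * Assumption: every argument starting with "-" is a flag, never a value.
     */
    static Map<String, String> parse(String[] args) {
        Map<String, String> flags = new HashMap<>();
        for (int i = 0; i < args.length; i++) {
            if (args[i].startsWith("-")) {
                boolean hasValue = i + 1 < args.length && !args[i + 1].startsWith("-");
                flags.put(args[i], hasValue ? args[i + 1] : null);
                if (hasValue) {
                    i++; // skip the value that was just consumed
                }
            }
        }
        return flags;
    }

    /**
     * Returns the results output path: the given value, results.json if the
     * flag has no value, or null if the -results flag is absent entirely.
     */
    static Path resultsPath(Map<String, String> flags) {
        if (!flags.containsKey("-results")) {
            return null; // search may still run, but no file is written
        }
        String value = flags.get("-results");
        // Path.of keeps the path relative; it is never converted to absolute form.
        return Path.of(value == null ? "results.json" : value);
    }
}
```

Because the flags are stored in a map, the order in which they appear on the command line does not matter, and a missing flag is simply absent from the map.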
Output user-friendly error messages in the case of exceptions or invalid input. Under no circumstance should your main() method output a stack trace to the user!
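One possible way to satisfy this requirement is to catch the exception close to where it occurs and print a single readable line instead. The sketch below assumes a hypothetical SafeReadSketch class and message wording; neither is required by the assignment.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper; the class name and message text are illustrative only. */
class SafeReadSketch {
    /** Returns the lines of the file, or an empty list after reporting a short message. */
    static List<String> readQueries(Path path) {
        try {
            return Files.readAllLines(path, StandardCharsets.UTF_8);
        }
        catch (IOException e) {
            // One readable line for the user; e.printStackTrace() is deliberately not called.
            System.err.println("Unable to read the query file: " + path);
            return List.of();
        }
    }
}
```

A missing file, a directory passed in place of a file, or a file that is not valid UTF-8 all surface as an IOException here, so each case produces the same short message.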
Search queries will be provided in a multi-line text file with one multi-word search query per line. When processing this file, your query parsing code must normalize, stem, and optimize the queries as follows:
Clean and parse each query line. Apply the same transformations to each line of query words that you used when populating your inverted index. This includes cleaning the line of any non-alphabetic characters, converting the remaining characters to lowercase, splitting the cleaned line into words by whitespace, and stemming each word. For example, the query line:
Observers observing 99 HIDDEN capybaras!
…should be processed into the cleaned, parsed, and stemmed words [observ, observ, hidden, capybara] after this step.
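The clean, lowercase, split, and stem steps above might look something like the sketch below. The class QueryCleanSketch is hypothetical, and the stemmer is passed in as a plain function because the stemming library is not shown here. The regular expression treats only ASCII letters as alphabetic, which is an assumption; your index-building code may clean text differently, and the query code should match it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

/** Hypothetical helper; the class name and the stemmer parameter are illustrative only. */
class QueryCleanSketch {
    /** Cleans, lowercases, splits, and stems one query line, keeping duplicates and order. */
    static List<String> parse(String line, UnaryOperator<String> stemmer) {
        // Assumption: remove everything except ASCII letters and whitespace.
        String cleaned = line.replaceAll("[^\\p{Alpha}\\s]+", "").toLowerCase(Locale.ROOT);

        List<String> stems = new ArrayList<>();
        for (String word : cleaned.split("\\s+")) {
            if (!word.isEmpty()) { // split can yield an empty first token
                stems.add(stemmer.apply(word));
            }
        }
        return stems;
    }
}
```

With an identity function as a stand-in stemmer, QueryCleanSketch.parse("Observers observing 99 HIDDEN capybaras!", w -> w) returns [observers, observing, hidden, capybaras]. Supplying the same stemmer used for the inverted index would reduce these to the stems shown in the example above.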