For this homework assignment, you will create a class that cleans and parses text into stemmed words using the SnowballStemmer class. Use the UTF_8 character set and try-with-resources when writing your files. Do not use the java.io.File class.
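As a rough illustration of those file-handling requirements, the sketch below writes and reads a text file using UTF_8, try-with-resources, and the java.nio.file classes instead of java.io.File. The class name FileSketch and its method names are hypothetical and not part of the assignment. Whether your solution needs reading, writing, or both depends on the full assignment description.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper class; all names here are illustrative only. */
class FileSketch {
    /** Writes one item per line; the writer is closed automatically. */
    static void writeLines(Path output, List<String> lines) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                writer.write(line);
                writer.newLine();
            }
        }
    }

    /** Reads a file line by line; the reader is closed automatically. */
    static List<String> readLines(Path input) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    /** Writes the lines to a temporary file, reads them back, then cleans up. */
    static List<String> demoRoundTrip(List<String> lines) {
        try {
            Path temp = Files.createTempFile("stems", ".txt");
            try {
                writeLines(temp, lines);
                return readLines(temp);
            } finally {
                Files.deleteIfExists(temp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Path and Files cover everything java.io.File would be used for here, and the try-with-resources blocks guarantee the reader and writer are closed even if an exception is thrown.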

Motivation

Before we can index and store data for our search engine, we need to figure out how to process text into a consistent form (such as converting to lowercase). Many search engines also stem words (converting words like "practicing" and "practiced" to a common root "practic") to help return relevant results no matter the form of the word used within the text.

We will rely on the Snowball stemmer algorithm found within the Apache OpenNLP library for stemming in this class.
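One possible shape for the clean-then-stem pipeline is sketched below. It is not the required design: the class name StemSketch, its method names, and the exact cleaning rules (strip everything except ASCII letters and whitespace, then lowercase) are assumptions made for illustration. To keep the sketch runnable without OpenNLP on the classpath, the stemmer is passed in as a function. Assuming the OpenNLP class opennlp.tools.stemmer.snowball.SnowballStemmer, a real stemmer could be plugged in as shown in the comment.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;

/** Hypothetical helper class; all names and cleaning rules are illustrative only. */
class StemSketch {
    /** Matches runs of characters that are neither ASCII letters nor whitespace. */
    private static final Pattern NOT_LETTER_OR_SPACE = Pattern.compile("[^\\p{Alpha}\\s]+");

    /** Splits accented characters, drops non-letters, and lowercases the rest. */
    static String clean(String text) {
        // NFD separates base letters from combining marks, so "é" becomes "e" plus a mark
        String normalized = Normalizer.normalize(text, Normalizer.Form.NFD);
        String lettersOnly = NOT_LETTER_OR_SPACE.matcher(normalized).replaceAll("");
        return lettersOnly.toLowerCase();
    }

    /**
     * Cleans a line, splits it on whitespace, and stems each word.
     *
     * Assumption: with OpenNLP available, the stemmer argument could be
     *   SnowballStemmer s = new SnowballStemmer(SnowballStemmer.ALGORITHM.ENGLISH);
     *   word -> s.stem(word).toString()
     */
    static List<String> stemLine(String line, UnaryOperator<String> stemmer) {
        List<String> stems = new ArrayList<>();
        for (String word : clean(line).split("\\s+")) {
            if (!word.isEmpty()) { // leading whitespace yields an empty first token
                stems.add(stemmer.apply(word));
            }
        }
        return stems;
    }
}
```

Passing the stemmer as a parameter also makes it easy to reuse one stemmer object across many lines rather than creating a new one for each call.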

<aside> This homework assignment is directly useful for your project. Consider copying this class into your project repository when done!

</aside>

Hints

Below are some hints that may help with this homework assignment:

These hints are optional. There may be multiple approaches to solving this homework.