For this homework, you will use the `HttpsFetcher` code from lecture to create an `HtmlFetcher` class that efficiently downloads only HTML content from web servers using sockets and the HTTP/S protocol.
Your search engine project, starting with Project 4 Crawl, will index web pages instead of text files. Before doing that, it must be able to download HTML web pages over a socket connection to a web server. For efficiency, content should not be downloaded unless certain conditions are met first.
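As a reminder of how the request side of that socket conversation looks, the sketch below builds a minimal HTTP/1.1 `GET` request for a `URI`. This is only an illustration (the class and method names here, like `RequestBuilder.buildRequest`, are hypothetical and not part of the homework template); the lecture `HttpsFetcher` code already handles this step for you.

```java
import java.net.URI;

public class RequestBuilder {
    /**
     * Hypothetical helper: builds a minimal HTTP/1.1 GET request for a URI.
     * The "Connection: close" header asks the server to close the socket
     * after responding, so the client can read until end-of-stream.
     */
    public static String buildRequest(URI uri) {
        // Use "/" when the URI has no path component.
        String path = (uri.getPath() == null || uri.getPath().isEmpty())
                ? "/" : uri.getPath();

        return "GET " + path + " HTTP/1.1\r\n"
                + "Host: " + uri.getHost() + "\r\n"
                + "Connection: close\r\n"
                + "\r\n"; // blank line ends the request headers
    }
}
```

Note that HTTP requires `\r\n` line endings and a blank line to terminate the headers; forgetting either is a common source of hung socket reads.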
<aside> <img src="/icons/info-alternate_gray.svg" alt="/icons/info-alternate_gray.svg" width="40px" /> Before using this homework with Project 4 Crawl, you need to make some modifications. Specifically, you eventually need to make your HtmlFetcher, HtmlCleaner, and LinkFinder classes work together so that you fetch HTML, then parse links after stripping block elements, but before stripping tags and entities. (This is not required for the homework, however.)
</aside>
Below are some hints that may help with this homework assignment:

- Avoid calling `HttpsFetcher.fetch(URI uri)` directly in your implementation. Instead, set up the sockets and get the headers. Based on those headers, decide how to proceed.

These hints are optional. There may be multiple approaches to solving this homework.