# 819. Most Common Word

Reference: LeetCode
Difficulty: Easy

## Problem

Given a paragraph and a list of banned words, return the most frequent word that is not in the list of banned words. It is guaranteed there is at least one word that isn’t banned, and that the answer is unique.

Words in the list of banned words are given in lowercase, and free of punctuation. Words in the paragraph are not case sensitive. The answer is in lowercase.

Note:

• 1 <= paragraph.length <= 1000.
• 0 <= banned.length <= 100.
• 1 <= banned[i].length <= 10.
• The answer is unique, and written in lowercase (even if its occurrences in paragraph may have uppercase symbols, and even if it is a proper noun.)
• Paragraph consists only of letters, spaces, and the punctuation symbols !?',;. (the last symbol in that list is a period).
• There are no hyphens or hyphenated words.
• Words only consist of letters, never apostrophes or other punctuation symbols.

Example:

## Analysis

### Hash Set + Hash Map

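Note: str.split("\\s+") is equivalent to str.split("\\s+", 0). A limit of 0 means the pattern is applied as many times as possible and trailing empty strings "" are dropped from the result. Leading empty strings are kept, so trim() can be skipped only when the string cannot start with whitespace. Trailing whitespace, such as a final period that has been replaced by a space, needs no trim().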
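A quick way to check how split() treats empty strings is to run it on inputs with leading and trailing whitespace. The following is a minimal sketch; the class name SplitDemo and the helper splitOnWhitespace are hypothetical and only serve the illustration.

```java
import java.util.Arrays;

final class SplitDemo {
    // One-argument split() behaves like split(regex, 0).
    static String[] splitOnWhitespace(String s) {
        return s.split("\\s+");
    }

    public static void main(String[] args) {
        // Trailing whitespace: the trailing empty string is dropped.
        System.out.println(Arrays.toString(splitOnWhitespace("a b  ")));        // [a, b]
        // Leading whitespace: the leading empty string is kept.
        System.out.println(Arrays.toString(splitOnWhitespace("  a b")));        // [, a, b]
        // trim() removes the leading whitespace before splitting.
        System.out.println(Arrays.toString(splitOnWhitespace("  a b".trim()))); // [a, b]
    }
}
```

If the input may begin with a separator, either trim() first or skip empty tokens while counting.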

Original code:

Here is the preprocess function:

Or:

replace() vs. replaceAll():

• replace(char oldChar, char newChar)
• replace(CharSequence target, CharSequence replacement)
• replaceAll(String regex, String replacement)

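Notice that all three replace every occurrence; the "All" in replaceAll does not mean it is the only one that does. The real difference is that replaceAll() treats its first argument as a regular expression, while both replace() overloads treat it literally.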
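The difference is easiest to see with a period, which is a literal character for replace() but a wildcard for replaceAll(). A minimal sketch follows; the class ReplaceDemo and its method names are hypothetical.

```java
final class ReplaceDemo {
    // replace(char, char): literal character, every occurrence.
    static String replaceChar(String s) {
        return s.replace('.', ' ');
    }

    // replace(CharSequence, CharSequence): literal text, every occurrence.
    static String replaceLiteral(String s) {
        return s.replace(".", " ");
    }

    // replaceAll(regex, replacement): "[.]" is a regex matching a literal period.
    static String replaceEscapedRegex(String s) {
        return s.replaceAll("[.]", " ");
    }

    // replaceAll with an unescaped ".": the regex matches ANY character.
    static String replaceAnyChar(String s) {
        return s.replaceAll(".", " ");
    }

    public static void main(String[] args) {
        System.out.println(replaceChar("a.b.c"));          // a b c
        System.out.println(replaceLiteral("a.b.c"));       // a b c
        System.out.println(replaceEscapedRegex("a.b.c"));  // a b c
        // Every character is replaced, leaving five spaces.
        System.out.println("[" + replaceAnyChar("a.b.c") + "]"); // [     ]
    }
}
```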

A succinct version:

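• \\w matches a single word character (a letter, a digit, or _); \\w+ matches a run of one or more of them.
• \\W matches any single character that is not a word character; \\W+ matches a run of one or more of them.
• The two classes are complements, so splitting on \\W+ leaves only the words.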
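To show how \\W+ combines with the hash set and hash map idea, here is a minimal sketch under the problem's stated constraints (words contain only letters). It is not the original listing; the class name MostCommonWordSketch and the variable names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class MostCommonWordSketch {
    static String mostCommonWord(String paragraph, String[] banned) {
        // Hash set for O(1) banned-word lookups.
        Set<String> bannedSet = new HashSet<>(Arrays.asList(banned));
        // Hash map from word to its number of occurrences.
        Map<String, Integer> counts = new HashMap<>();

        String best = "";
        int bestCount = 0;
        // Lowercase first, then split on runs of non-word characters.
        for (String word : paragraph.toLowerCase().split("\\W+")) {
            // A leading separator yields an empty first token; skip it.
            if (word.isEmpty() || bannedSet.contains(word)) {
                continue;
            }
            int count = counts.merge(word, 1, Integer::sum);
            if (count > bestCount) {
                bestCount = count;
                best = word;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String paragraph = "Bob hit a ball, the hit BALL flew far after it was hit.";
        System.out.println(mostCommonWord(paragraph, new String[]{"hit"})); // ball
    }
}
```

Tracking the best word inside the loop avoids a second pass over the map; since the answer is guaranteed to be unique, ties need no special handling.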

A more succinct version:

For more rules about split(), see 271. Encode and Decode Strings.

Junhao Wang
a software engineering cat