Reference: LeetCode
Difficulty: Hard

## Problem

Median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value. So the median is the mean of the two middle value.

For example,

• [2,3,4], the median is 3

• [2,3], the median is (2 + 3) / 2 = 2.5

Design a data structure that supports the following two operations:

• void addNum(int num) - Add a integer number from the data stream to the data structure.
• double findMedian() - Return the median of all elements so far.

Example:

Follow up: (Reference: Solutions to follow-ups)

1. If all integer numbers from the stream are between 0 and 100, how would you optimize it?
• We can maintain an integer array of length 100 to store the count of each number along with a total count. Then, we can iterate over the array to find the middle value to get our median. Time and space complexity would be O(100) = O(1).
1. If 99% of all integer numbers from the stream are between 0 and 100, how would you optimize it?
• In this case, we can keep a counter for lessThanHundred and greaterThanHundred. As we know the solution will be definitely in 0-100 we don’t need to know those number which are >100 or <0, only count of them will be enough.

## Analysis

Reference: Solution Post

### Simple Sorting

Every time we add in a new value, we need to re-sort the array.

Time: addNum() takes $O(N\log{N})$ 462 ms
Space: $O(N)$

Linear Search:

When a new number comes, we have to add it to the list while maintaining the sorted nature of the list. This is achieved easily by finding the correct place to insert the incoming number.

Time: $O(N)$ 265 ms
Space: $O(N)$

Pop quiz Can we use binary search to improve efficiency? Sure!

Since the list is always sorted, we can use binary search.

Binary Search: (improvement)

Once the position is found, we need to shift all higher elements by one space to make room for the number, which takes $O(N)$.

Note: Here we use lower bound binary search since we need to find the position where the value is >= num.

Time: $O(N)$ 76 ms

• Binary search takes $O(\log{N})$ time.
• Insertion can take up to $O(N)$ time.

Space: $O(N)$

### Two Heaps

Idea: We only need a consistent way to access the median elements. Keeping the entire input sorted is not a requirement.

As it turns out there are two data structures for the job: heaps and self-balancing binary search trees.

Our goal is to maintain two heaps (balancing) in the following way:

• A max-heap to store the smaller half of the input numbers.
• A min-heap to store the larger half of the input numbers.
• Constraint: The max-heap is allowed to store, at most, one more element than the min-heap does.

Steps: Add the new number to max-heap. Since the max-heap received a new element, we must do a balancing step for the min-heap.

• Size Check (maxPQ > minPQ + 1): If the constraint is not satisfied, then remove the largest element from the max-heap and offer it to the min-heap.
• Value Check (maxPQ.peek() > minPQ.peek()): Check the max value and min value from two heaps where maxVal <= minVal; otherwise, remove the largest element from the max-heap and offer it to the min-heap.
• Size Check (maxPQ < minPQ): If the min-heap then holds one more elements than the max-heap after the previous operation. We fix that by removing the smallest element from the min-heap and offering it to the max-heap.

Time: $O(5\log{N}) = O(\log{N})$
Space: $O(N)$

### Multiset and Two Pointers

Reference: LeetCode Solution

It is based on self-balancing trees.

Comment Junhao Wang
Hi, I was a master student at USC studying Computer Science.