350. Intersection of Two Arrays II

Reference: LeetCode
Difficulty: Easy

Problem

Given two arrays, write a function to compute their intersection.

Note:

Each element in the result should appear as many times as it shows in both arrays.
The result can be in any order.

Example:

Input: nums1 = [1,2,2,1], nums2 = [2,2]
Output: [2,2]

Input: nums1 = [4,9,5], nums2 = [9,4,9,8,4]
Output: [4,9]

Follow up:

What if the given array is already sorted? How would you optimize your algorithm?

Binary search
Two pointers

What if nums1‘s size is small compared to nums2‘s size? Which algorithm is better? (Reference)

The size of nums1 is $M$ ($M < N$).
Two Pointers: The worst case runtime is $O(N)$.
Binary Search: The worst case runtime is $O(M\log{N})$.
So the binary search method is better.

What if elements of nums2 are stored on disk, and the memory is limited such that you cannot load all elements into the memory at once?

If only nums2 cannot fit in memory, put all elements of nums1 in a hash map. Read chunks of array that fit into the memory and calculate the intersections. It’s better to use the smaller array (nums1) to construct the counter hash map.
If both nums1 and nums2 are so large that neither fit into the memory, sort them individually, then read two elements from each array at a time and calculate their intersections.

Analysis

Hash Map

Let nums2 be the smaller array ($M > N$) and convert it to a hash map. We use this hash map to count the occurrence of elements.

Note:

public int[] intersect(int[] nums1, int[] nums2) {
  List<Integer> result = new ArrayList<>();
  Map<Integer, Integer> map = new HashMap<>();
  
  // Optimization for space
  if (nums1.length < nums2.length) {
    int[] temp = nums1;
    nums1 = nums2;
    nums2 = temp;
  }
  
  // Init Map<val2, count> for nums2 (which has smaller size)
  for (int val2 : nums2) { // O(N)
    map.put(val2, map.getOrDefault(val2, 0) + 1);
  }
  
  // For each val1 in nums1
  for (int val1 : nums1) { // O(M)
    if (map.containsKey(val1)) {
      int count = map.get(val1);
      if (count > 0) {
        result.add(val1);
        map.put(val1, count - 1);
      }
    }
  }
  
  // To array
  int[] output = new int[result.size()];
  for (int i = 0; i < result.size(); ++i) {
    output[i] = result.get(i);
  }
  return output;
}

Time: $O(M + N)$
Space: $O(N)$ where $M > N$.

Binary Search

Assume the arrays nums1 and nums2 are sorted, otherwise it takes $O(M\log{M})$ and $O(N\log{N})$ time to sort them.

Let the size of nums1 is smaller than nums2 ($M < N$). For each element in nums1, it takes $O(M)$ and for each step we do a lower bound binary search in nums2 so it takes $O(\log{N})$. In total, the runtime should be $O(M\log{N})$ if the arrays are pre-sorted.

Note:

Write an extra boundary checking function since we check boundary usually.
Consider the following cases:
- Case 1:
  - nums1: [1 1 1 2 3]
  - nums2: [1 2 3 4 5 6 7]
- Case 2:
  - nums1: [1 2 3]
  - nums2: [1 1 1 2 3 4]
- Case 3:
  - nums1: [1 2 3]
  - nums2: [1 2 3]
So for each iteration in nums1 we need to do the following checking:
- Continue adding to result list when nums1[idx1] == nums2[idx2].
- In each iteration we are inspecting curr in nums1. As idx1 increases, we make sure nums1[idx1] == curr. (Case 3)
When nums1[idx1] != nums2[idx2], we need to update idx1 until nums1[idx1] != curr since we have done with curr. (Case 1)

public int[] intersect(int[] nums1, int[] nums2) {
  List<Integer> result = new ArrayList<>();
  
  // Sorting
  Arrays.sort(nums1); // O(MlogM)
  Arrays.sort(nums2); // O(NlogN)
  
  // Optimization (let nums2 be the larger one)
  if (nums1.length > nums2.length) {
    int[] temp = nums1;
    nums1 = nums2;
    nums2 = temp;
  }
  
  // For each val1 in nums1
  int idx1 = 0;
  while (idx1 < nums1.length) { // O(MlogN) or O(NlogM)
    // current val
    int curr = nums1[idx1];
    // lower bound binary search
    int idx2 = lowerBoundBS(nums2, curr);
    // watch out for the case: if using nums1[idx1] == nums2[idx2]
    // 1 2 3
    // 1 2 3
    while (inBound(nums1, idx1) && inBound(nums2, idx2) && nums1[idx1] == curr && nums1[idx1] == nums2[idx2]) {
      result.add(nums1[idx1]); // add to result
      idx1 += 1;
      idx2 += 1;
    }
    // move idx1 to a value not equal to curr
    while (inBound(nums1, idx1) && nums1[idx1] == curr) {
      idx1 += 1;
    }
  }
  
  // To array
  int[] output = new int[result.size()];
  for (int i = 0; i < result.size(); ++i) {
    output[i] = result.get(i);
  }
  return output;
}

private boolean inBound(int[] nums, int index) {
  return index >= 0 && index < nums.length;
}

private int lowerBoundBS(int[] nums, int target) {
  int n = nums.length;
  int lo = 0, hi = n - 1;
  while (lo <= hi) {
    int mid = lo + (hi - lo) / 2;
    if (nums[mid] >= target) {
      hi = mid - 1;
    } else {
      lo = mid + 1;
    }
  }
  // check
  if (inBound(nums, lo) && nums[lo] == target) {
    return lo;
  } else {
    return -1;
  }
}

Time: $O(M\log{N})$
Space: $O(1)$

Two Pointers

Based on the solution in 349. Intersection of Two Arrays, we change the hash set to a list.

public int[] intersect(int[] nums1, int[] nums2) {
  // Assume they are sorted
  Arrays.sort(nums1);
  Arrays.sort(nums2);

  List<Integer> result = new ArrayList<>();

  int n1 = nums1.length, n2 = nums2.length;
  int i1 = 0, i2 = 0;
  while (i1 < n1 && i2 < n2) {
    while (i1 < n1 && i2 < n2 && nums1[i1] < nums2[i2]) ++i1;
    while (i1 < n1 && i2 < n2 && nums1[i1] > nums2[i2]) ++i2;

    // critical: consider nums1[i1] < nums2[i2] --> if it is, we should continue to the next loop
    if (i1 < n1 && i2 < n2 && nums1[i1] == nums2[i2]) {
      result.add(nums1[i1]); // we use hash set --> no need to handle duplicates
      ++i1;
      ++i2; // update indices should be inside the if logic
    }
  }

  int[] output = new int[result.size()];
  int i = 0;
  for (int val : result) {
    output[i] = val;
    ++i;
  }

  return output;
}

Time: $O(M)$ or $O(N)$
Space: $O(1)$

As for the time complexity, consider:

Case 1:
- [1 2 3 4 5 6 7 8] $O(M)$
- [9 10]
Case 2:
- [1 2 3] $O(M)$
- [4 5 6 7 8 9 10]

So we say the time complexity is $O(M)$ OR $O(N)$. In the worst case, it is $O(\text{max}(M, N))$