Algorithms 101: Optimizing Longest Consecutive Sequence with Hash-Based Algorithm

Optimizing Longest Consecutive Sequence with Hash-Based Algorithm

Finding the longest consecutive sequence in an array is a classic problem in computer science, often posed in technical interviews and algorithmic challenges. One efficient way to solve this problem is by using a hash-based algorithm that leverages hash sets for fast lookups. In this article, we explore the workings of this algorithm, its advantages, and how it compares to alternative approaches.

Problem Statement

Given an unsorted array of integers, determine the length of the longest sequence of consecutive integers.

Example:

Input: nums = [2, 20, 4, 10, 3, 4, 5]
Output: 4

Explanation: The longest consecutive sequence is [2, 3, 4, 5].

Hash-Based Algorithm

This algorithm efficiently solves the problem in (O(n)) time using the following steps:

Key Idea

Use a hash set to store all unique elements from the input array. Then, iterate through the array and identify the starting points of consecutive sequences (numbers that do not have a predecessor). For each starting point, compute the length of the sequence.

Algorithm Steps

Convert Array to Hash Set: This allows for (O(1)) lookups to check if a number exists in the set.
Find Sequence Start Points: For each number, check if it has no predecessor (i.e., (n – 1) is not in the set).
Expand the Sequence: For each start point, count the length of the sequence by checking subsequent numbers (n + 1, n + 2, \dots) in the set.
Update the Maximum Length: Track the longest sequence length encountered during the process.

Code Implementation

class Solution:
    def longestConsecutive(self, nums: List[int]) -> int:
        ans = 0
        set_n = set(nums)  # Convert array to hash set

        for n in nums:
            if n - 1 not in set_n:  # Check if n is the start of a sequence
                nxt = n + 1
                while nxt in set_n:  # Expand the sequence
                    nxt += 1
                ans = max(ans, nxt - n)  # Update maximum length

        return ans

Example Walkthrough

Input: nums = [2, 20, 4, 10, 3, 4, 5]

Hash Set Conversion:
set_n = {2, 20, 4, 10, 3, 5}
Finding Start Points:
- For n = 2: Start of a sequence (no predecessor 1 in the set).
- For n = 20: Start of a sequence (no predecessor 19 in the set).
- For n = 4, 10, 3, 5: Skip as they have predecessors in the set.
Expanding the Sequence:
- Starting at 2: Expand to [2, 3, 4, 5] (length = 4).
- Starting at 20: Sequence [20] (length = 1).
Result: Longest sequence is [2, 3, 4, 5] with length 4.

Algorithm Complexity

Time Complexity: (O(n))
- Building the hash set takes (O(n)).
- Each number is processed at most twice: once when checking if it’s a sequence start and once when expanding a sequence.
Space Complexity: (O(n)) for the hash set.

Comparison with Other Approaches

1. Sorting-Based Algorithm

Idea: Sort the array and find the longest consecutive subarray.
Time Complexity: (O(n \log n)) due to sorting.
Space Complexity: (O(1)) (in-place sorting) or (O(n)) (additional storage for sorted array).

Code:

def longestConsecutive(nums: List[int]) -> int:
  if not nums: return 0
  nums.sort()
  longest, current = 1, 1

  for i in range(1, len(nums)):
      if nums[i] == nums[i - 1]:
          continue
      elif nums[i] == nums[i - 1] + 1:
          current += 1
      else:
          longest = max(longest, current)
          current = 1

  return max(longest, current)

2. Union-Find Algorithm

Idea: Treat each number as a node in a graph. Use union-find to group numbers into connected components and find the largest component size.
Time Complexity: (O(n \alpha(n))) (nearly linear with path compression).
Space Complexity: (O(n)).
Use Case: Suitable for multi-dimensional or graph-based problems.

Advantages of the Hash-Based Algorithm

Efficiency: Achieves (O(n)) time complexity, making it optimal for this problem.
Simplicity: Straightforward to implement without additional data structures like heaps or union-find.
Avoids Sorting: Works directly with the unsorted array, saving the overhead of sorting.

Conclusion

The hash-based algorithm is one of the most efficient solutions for finding the longest consecutive sequence in an array. By leveraging the fast lookup properties of hash sets and focusing only on sequence start points, it avoids unnecessary computations and achieves optimal performance. Compared to sorting or union-find approaches, it strikes an excellent balance between simplicity and efficiency, making it the preferred choice for most use cases.

Post Views: 65

CODEBITWAVE