Optimizing Longest Consecutive Sequence with Hash-Based Algorithm
Finding the longest consecutive sequence in an array is a classic problem in computer science, often posed in technical interviews and algorithmic challenges. One efficient way to solve this problem is by using a hash-based algorithm that leverages hash sets for fast lookups. In this article, we explore the workings of this algorithm, its advantages, and how it compares to alternative approaches.
Problem Statement
Given an unsorted array of integers, determine the length of the longest sequence of consecutive integers.
Example:
Input: nums = [2, 20, 4, 10, 3, 4, 5]
Output: 4
Explanation: The longest consecutive sequence is [2, 3, 4, 5]
.
Hash-Based Algorithm
This algorithm efficiently solves the problem in (O(n)) time using the following steps:
Key Idea
Use a hash set to store all unique elements from the input array. Then, iterate through the array and identify the starting points of consecutive sequences (numbers that do not have a predecessor). For each starting point, compute the length of the sequence.
Algorithm Steps
- Convert Array to Hash Set: This allows for (O(1)) lookups to check if a number exists in the set.
- Find Sequence Start Points: For each number, check if it has no predecessor (i.e., (n – 1) is not in the set).
- Expand the Sequence: For each start point, count the length of the sequence by checking subsequent numbers (n + 1, n + 2, \dots) in the set.
- Update the Maximum Length: Track the longest sequence length encountered during the process.
Code Implementation
class Solution:
def longestConsecutive(self, nums: List[int]) -> int:
ans = 0
set_n = set(nums) # Convert array to hash set
for n in nums:
if n - 1 not in set_n: # Check if n is the start of a sequence
nxt = n + 1
while nxt in set_n: # Expand the sequence
nxt += 1
ans = max(ans, nxt - n) # Update maximum length
return ans
Example Walkthrough
Input: nums = [2, 20, 4, 10, 3, 4, 5]
-
Hash Set Conversion:
set_n = {2, 20, 4, 10, 3, 5}
-
Finding Start Points:
- For
n = 2
: Start of a sequence (no predecessor1
in the set). - For
n = 20
: Start of a sequence (no predecessor19
in the set). - For
n = 4, 10, 3, 5
: Skip as they have predecessors in the set.
- For
-
Expanding the Sequence:
- Starting at
2
: Expand to[2, 3, 4, 5]
(length = 4). - Starting at
20
: Sequence[20]
(length = 1).
- Starting at
-
Result: Longest sequence is
[2, 3, 4, 5]
with length4
.
Algorithm Complexity
- Time Complexity: (O(n))
- Building the hash set takes (O(n)).
- Each number is processed at most twice: once when checking if it’s a sequence start and once when expanding a sequence.
- Space Complexity: (O(n)) for the hash set.
Comparison with Other Approaches
1. Sorting-Based Algorithm
- Idea: Sort the array and find the longest consecutive subarray.
- Time Complexity: (O(n \log n)) due to sorting.
- Space Complexity: (O(1)) (in-place sorting) or (O(n)) (additional storage for sorted array).
-
Code:
def longestConsecutive(nums: List[int]) -> int: if not nums: return 0 nums.sort() longest, current = 1, 1 for i in range(1, len(nums)): if nums[i] == nums[i - 1]: continue elif nums[i] == nums[i - 1] + 1: current += 1 else: longest = max(longest, current) current = 1 return max(longest, current)
2. Union-Find Algorithm
- Idea: Treat each number as a node in a graph. Use union-find to group numbers into connected components and find the largest component size.
- Time Complexity: (O(n \alpha(n))) (nearly linear with path compression).
- Space Complexity: (O(n)).
- Use Case: Suitable for multi-dimensional or graph-based problems.
Advantages of the Hash-Based Algorithm
- Efficiency: Achieves (O(n)) time complexity, making it optimal for this problem.
- Simplicity: Straightforward to implement without additional data structures like heaps or union-find.
- Avoids Sorting: Works directly with the unsorted array, saving the overhead of sorting.
Conclusion
The hash-based algorithm is one of the most efficient solutions for finding the longest consecutive sequence in an array. By leveraging the fast lookup properties of hash sets and focusing only on sequence start points, it avoids unnecessary computations and achieves optimal performance. Compared to sorting or union-find approaches, it strikes an excellent balance between simplicity and efficiency, making it the preferred choice for most use cases.
Leave a Reply