Python is known for its simplicity and elegance, making it an excellent language for both beginners and experienced programmers. One of the best ways to learn Python is by solving coding problems and understanding real-world examples. In this blog, we’ll explore a classic problem, "Group Anagrams," and solve it step-by-step using Python.
Problem: Group Anagrams
Problem Statement:
Given an array of strings, group the anagrams together. An anagram is a word or phrase formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly once.
For example, given the input ["eat", "tea", "tan", "ate", "nat", "bat"]
, the function should return:
[
["eat", "tea", "ate"],
["tan", "nat"],
["bat"]
]
Solution Overview
To solve this problem, we’ll use a hash map (dictionary in Python) to group words that are anagrams of each other. The key insight is that two anagrams, when sorted, will have the same sequence of characters. However, instead of sorting, we’ll count the frequency of each character to create a unique signature for each group of anagrams. This method is more efficient than sorting, especially for longer strings.
Python Code Implementation
Let’s dive into the Python code for solving the Group Anagrams problem.
from typing import List
from collections import defaultdict
class Solution:
def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
res = defaultdict(list)
for s in strs:
# Initialize a list of 26 zeros for counting each character (a-z)
count = [0] * 26
for c in s:
# Update the count of the current character in the list
count[ord(c) - ord("a")] += 1
# Convert the count list to a tuple (which can be used as a key in the dictionary)
res[tuple(count)].append(s)
# Return the values of the dictionary as the grouped anagrams
return list(res.values())
Code Explanation
Step-by-Step Breakdown:
-
Initialization:
res = defaultdict(list)
English:
We start by initializing adefaultdict
calledres
, which will store our grouped anagrams. Thedefaultdict
is a type of dictionary that automatically initializes a default value for non-existent keys. Here, the default value is an empty list.Chinese:
我们首先初始化一个名为res
的defaultdict
,它将存储我们分组的异位词。defaultdict
是一种字典类型,它会自动为不存在的键初始化一个默认值。在这里,默认值是一个空列表。 -
Counting Character Frequencies:
count = [0] * 26
English:
For each strings
in the input list, we create a listcount
of size 26 (representing each letter from ‘a’ to ‘z’). This list will store the frequency of each character in the string.Chinese:
对于输入列表中的每个字符串s
,我们创建一个大小为 26 的列表count
(表示从 ‘a’ 到 ‘z’ 的每个字母)。这个列表将存储字符串中每个字符的频率。for c in s: count[ord(c) - ord("a")] += 1
English:
We iterate over each characterc
in the strings
. Theord(c) - ord("a")
expression calculates the index of the characterc
in the alphabet (e.g., ‘a’ -> 0, ‘b’ -> 1, …, ‘z’ -> 25). We increment the corresponding index in thecount
list.Chinese:
我们遍历字符串s
中的每个字符c
。ord(c) - ord("a")
表达式计算字符c
在字母表中的索引(例如 ‘a’ -> 0,’b’ -> 1,…,’z’ -> 25)。我们递增count
列表中相应索引的值。 -
Grouping Anagrams:
res[tuple(count)].append(s)
English:
After processing the string, we convert thecount
list to a tuple and use it as a key in the dictionaryres
. We append the strings
to the list of anagrams associated with this key. Since anagrams will have the same character frequency distribution, they will share the same tuple key.Chinese:
处理完字符串后,我们将count
列表转换为一个元组,并将其用作字典res
中的键。我们将字符串s
添加到与此键相关的异位词列表中。由于异位词具有相同的字符频率分布,因此它们将共享相同的元组键。 -
Returning the Result:
return list(res.values())
English:
Finally, we return the values of the dictionaryres
, which are lists of grouped anagrams.Chinese:
最后,我们返回字典res
的值,这些值是分组异位词的列表。
Why This Approach Works
English:
This approach works efficiently because:
- Counting instead of Sorting: Counting character frequencies takes linear time relative to the length of the string, whereas sorting would take
O(N log N)
time. This makes the algorithm more efficient for larger inputs. - Hashable Keys: Using tuples as keys in a dictionary ensures that all anagrams with the same character frequency distribution are grouped together.
Chinese:
这种方法之所以有效,是因为:
- 计数代替排序: 计算字符频率所需的时间与字符串长度成线性关系,而排序则需要
O(N log N)
时间。对于较大的输入,这使得算法更为高效。 - 可哈希键: 使用元组作为字典中的键,确保具有相同字符频率分布的所有异位词被分组在一起。
Python Tips from This Example
1. defaultdict
:
- English: A
defaultdict
is a powerful tool in Python for handling cases where you need a dictionary with default values. - Chinese:
defaultdict
是 Python 中处理需要带有默认值的字典的强大工具。
2. ord()
Function:
- English: The
ord()
function returns the ASCII value of a character, which can be useful for calculating positions in the alphabet. - Chinese:
ord()
函数返回字符的 ASCII 值,这在计算字母表中的位置时非常有用。
3. Tuples as Dictionary Keys:
- English: Tuples can be used as dictionary keys because they are immutable and hashable, making them ideal for storing complex, compound keys.
- Chinese: 元组可以用作字典键,因为它们是不可变且可哈希的,非常适合存储复杂的复合键。
Conclusion
English:
By working through the Group Anagrams problem, you’ve learned how to use some powerful features of Python, such as defaultdict
, the ord()
function, and tuples. Understanding and applying these concepts will help you write more efficient and readable Python code. As you continue learning Python, try solving more problems, breaking them down step-by-step, and implementing them using the language’s rich set of tools.
Chinese:
通过解决异位词分组问题,您已经学会了如何使用 Python 的一些强大功能,如 defaultdict
、ord()
函数和元组。理解和应用这些概念将帮助您编写更高效、更具可读性的 Python 代码。在继续学习 Python 的过程中,尝试解决更多问题,逐步分析并使用该语言丰富的工具集来实现它们。
Happy coding!
编程愉快!
Leave a Reply