Python 101: Learning Python Through Code Examples

Python is known for its simplicity and elegance, making it an excellent language for both beginners and experienced programmers. One of the best ways to learn Python is by solving coding problems and understanding real-world examples. In this blog, we’ll explore a classic problem, "Group Anagrams," and solve it step-by-step using Python.

Problem: Group Anagrams

Problem Statement:
Given an array of strings, group the anagrams together. An anagram is a word or phrase formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly once.

For example, given the input ["eat", "tea", "tan", "ate", "nat", "bat"], the function should return:

[
  ["eat", "tea", "ate"],
  ["tan", "nat"],
  ["bat"]
]

Solution Overview

To solve this problem, we’ll use a hash map (dictionary in Python) to group words that are anagrams of each other. The key insight is that two anagrams, when sorted, will have the same sequence of characters. However, instead of sorting, we’ll count the frequency of each character to create a unique signature for each group of anagrams. This method is more efficient than sorting, especially for longer strings.

Python Code Implementation

Let’s dive into the Python code for solving the Group Anagrams problem.

from typing import List
from collections import defaultdict

class Solution:
    def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
        res = defaultdict(list)
        for s in strs:
            # Initialize a list of 26 zeros for counting each character (a-z)
            count = [0] * 26
            for c in s:
                # Update the count of the current character in the list
                count[ord(c) - ord("a")] += 1

            # Convert the count list to a tuple (which can be used as a key in the dictionary)
            res[tuple(count)].append(s)

        # Return the values of the dictionary as the grouped anagrams
        return list(res.values())

Code Explanation

Step-by-Step Breakdown:

  1. Initialization:

    res = defaultdict(list)

    English:
    We start by initializing a defaultdict called res, which will store our grouped anagrams. The defaultdict is a type of dictionary that automatically initializes a default value for non-existent keys. Here, the default value is an empty list.

    Chinese:
    我们首先初始化一个名为 resdefaultdict,它将存储我们分组的异位词。defaultdict 是一种字典类型,它会自动为不存在的键初始化一个默认值。在这里,默认值是一个空列表。

  2. Counting Character Frequencies:

    count = [0] * 26

    English:
    For each string s in the input list, we create a list count of size 26 (representing each letter from ‘a’ to ‘z’). This list will store the frequency of each character in the string.

    Chinese:
    对于输入列表中的每个字符串 s,我们创建一个大小为 26 的列表 count(表示从 ‘a’ 到 ‘z’ 的每个字母)。这个列表将存储字符串中每个字符的频率。

    for c in s:
       count[ord(c) - ord("a")] += 1

    English:
    We iterate over each character c in the string s. The ord(c) - ord("a") expression calculates the index of the character c in the alphabet (e.g., ‘a’ -> 0, ‘b’ -> 1, …, ‘z’ -> 25). We increment the corresponding index in the count list.

    Chinese:
    我们遍历字符串 s 中的每个字符 cord(c) - ord("a") 表达式计算字符 c 在字母表中的索引(例如 ‘a’ -> 0,’b’ -> 1,…,’z’ -> 25)。我们递增 count 列表中相应索引的值。

  3. Grouping Anagrams:

    res[tuple(count)].append(s)

    English:
    After processing the string, we convert the count list to a tuple and use it as a key in the dictionary res. We append the string s to the list of anagrams associated with this key. Since anagrams will have the same character frequency distribution, they will share the same tuple key.

    Chinese:
    处理完字符串后,我们将 count 列表转换为一个元组,并将其用作字典 res 中的键。我们将字符串 s 添加到与此键相关的异位词列表中。由于异位词具有相同的字符频率分布,因此它们将共享相同的元组键。

  4. Returning the Result:

    return list(res.values())

    English:
    Finally, we return the values of the dictionary res, which are lists of grouped anagrams.

    Chinese:
    最后,我们返回字典 res 的值,这些值是分组异位词的列表。

Why This Approach Works

English:
This approach works efficiently because:

  • Counting instead of Sorting: Counting character frequencies takes linear time relative to the length of the string, whereas sorting would take O(N log N) time. This makes the algorithm more efficient for larger inputs.
  • Hashable Keys: Using tuples as keys in a dictionary ensures that all anagrams with the same character frequency distribution are grouped together.

Chinese:
这种方法之所以有效,是因为:

  • 计数代替排序: 计算字符频率所需的时间与字符串长度成线性关系,而排序则需要 O(N log N) 时间。对于较大的输入,这使得算法更为高效。
  • 可哈希键: 使用元组作为字典中的键,确保具有相同字符频率分布的所有异位词被分组在一起。

Python Tips from This Example

1. defaultdict:

  • English: A defaultdict is a powerful tool in Python for handling cases where you need a dictionary with default values.
  • Chinese: defaultdict 是 Python 中处理需要带有默认值的字典的强大工具。

2. ord() Function:

  • English: The ord() function returns the ASCII value of a character, which can be useful for calculating positions in the alphabet.
  • Chinese: ord() 函数返回字符的 ASCII 值,这在计算字母表中的位置时非常有用。

3. Tuples as Dictionary Keys:

  • English: Tuples can be used as dictionary keys because they are immutable and hashable, making them ideal for storing complex, compound keys.
  • Chinese: 元组可以用作字典键,因为它们是不可变且可哈希的,非常适合存储复杂的复合键。

Conclusion

English:
By working through the Group Anagrams problem, you’ve learned how to use some powerful features of Python, such as defaultdict, the ord() function, and tuples. Understanding and applying these concepts will help you write more efficient and readable Python code. As you continue learning Python, try solving more problems, breaking them down step-by-step, and implementing them using the language’s rich set of tools.

Chinese:
通过解决异位词分组问题,您已经学会了如何使用 Python 的一些强大功能,如 defaultdictord() 函数和元组。理解和应用这些概念将帮助您编写更高效、更具可读性的 Python 代码。在继续学习 Python 的过程中,尝试解决更多问题,逐步分析并使用该语言丰富的工具集来实现它们。

Happy coding!
编程愉快!

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *