Leetcode SQL: 1661. Average Time of Process per Machine

1661. Average Time of Process per Machine

SQL Code:

SELECT machine_id, 
       ROUND(AVG(end_time - start_time), 3) AS processing_time
FROM (
    SELECT machine_id, 
           process_id,
           MAX(CASE WHEN activity_type = 'end' THEN timestamp END) AS end_time,
           MAX(CASE WHEN activity_type = 'start' THEN timestamp END) AS start_time
    FROM Activity
    GROUP BY machine_id, process_id
) AS process_times
GROUP BY machine_id;

Explanation:

Subquery:
- We use a subquery to calculate the start and end times for each machine_id and process_id pair.
- The MAX(CASE WHEN activity_type = 'end' THEN timestamp END) retrieves the end timestamp for each process, and MAX(CASE WHEN activity_type = 'start' THEN timestamp END) retrieves the start timestamp.
- The GROUP BY machine_id, process_id ensures that we group the rows by both machine_id and process_id, which are unique pairs as described in the schema.
Subquery Explanation in Chinese:
- 我们使用子查询来计算每个 machine_id 和 process_id 对应的开始和结束时间。
- MAX(CASE WHEN activity_type = 'end' THEN timestamp END) 获取每个过程的 end 时间戳，MAX(CASE WHEN activity_type = 'start' THEN timestamp END) 获取 start 时间戳。
- GROUP BY machine_id, process_id 确保我们按 machine_id 和 process_id 分组，这些是唯一的组合。
SELECT machine_id, ROUND(AVG(end_time – start_time), 3) AS processing_time:
- After obtaining the start and end times for each process, we calculate the difference (end_time - start_time) for each process to get the process duration.
- The AVG() function is used to calculate the average processing time for each machine.
- We use ROUND() to round the result to 3 decimal places, as required by the problem.
- Finally, we group the results by machine_id to get the average processing time for each machine.
Explanation in Chinese:
- 获取每个过程的开始和结束时间后，我们计算每个过程的持续时间 (end_time - start_time)。
- 使用 AVG() 函数来计算每台机器的平均处理时间。
- 使用 ROUND() 将结果保留到小数点后 3 位。
- 最后，按 machine_id 分组，获取每台机器的平均处理时间。
GROUP BY machine_id:
- The final GROUP BY machine_id ensures that the average processing time is calculated for each machine individually.
Explanation in Chinese:
- 最后的 GROUP BY machine_id 确保我们为每台机器分别计算平均处理时间。

Key Concepts:

Subquery with Aggregation:
In the subquery, we use MAX() with a CASE WHEN clause to extract the start and end timestamps from the Activity table. This ensures that we get both timestamps for each process_id and machine_id.

带聚合的子查询:
在子查询中，我们使用 MAX() 和 CASE WHEN 子句从 Activity 表中提取 start 和 end 时间戳。这确保我们获取每个 process_id 和 machine_id 的两个时间戳。
Time Difference:
The time difference (end_time - start_time) gives us the duration for each process, and averaging this difference provides the average processing time for each machine.

时间差:
时间差 (end_time - start_time) 提供每个过程的持续时间，计算此差值的平均值可以得到每台机器的平均处理时间。
ROUND():
We use the ROUND() function to round the result to 3 decimal places, as required.

ROUND() 函数:
使用 ROUND() 函数将结果四舍五入到小数点后 3 位。

Warnings:

Consistency in Time Units:
Ensure that the timestamp values in the Activity table are consistent in terms of time units (e.g., seconds). Since the question specifies timestamp as a float representing seconds, the calculation should proceed without issues.

时间单位的一致性:
确保 Activity 表中的 timestamp 值的时间单位一致（例如，秒）。问题中指定 timestamp 是表示秒的浮点数，因此计算应该不会有问题。

Summary:

This query calculates the average time each machine takes to complete a process by calculating the time difference between end and start timestamps for each process and then averaging these values for each machine_id. The result is rounded to 3 decimal places, as required.

中文总结:
该查询通过计算每个过程的 end 和 start 时间戳之间的时间差来计算每台机器完成一个过程的平均时间，然后为每个 machine_id 取这些值的平均值。结果保留到小数点后三位。

Tips:

Indexing for Performance:
Consider adding indexes on machine_id and process_id to optimize the performance of this query, especially if the Activity table is large.

性能优化的索引:
考虑在 machine_id 和 process_id 列上创建索引，以优化查询性能，特别是在 Activity 表很大的情况下。

CODEBITWAVE