PROJECT 07 Advanced
IoT Sensor Time-Series Anomaly Detection
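Resample IoT sensor time-series data, smooth it with moving averages, and detect anomalies with a dynamic threshold.
Tags: resample, moving average, dynamic threshold, EWMA, anomaly flagging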
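Project Background
A smart-manufacturing plant has deployed 200+ temperature/pressure sensors. The sensors sample at inconsistent rates, and their readings contain noise spikes. A real-time monitoring pipeline is needed to detect abnormal readings automatically and prevent equipment failures.
Simulated Dataset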
timestamp,sensor_id,temperature,pressure,vibration
2024-07-01 08:00:00,S001,72.5,1.02,0.15
2024-07-01 08:00:03,S001,72.8,1.03,0.14
2024-07-01 08:00:07,S001,73.1,1.01,0.16
2024-07-01 08:01:00,S001,72.9,1.02,0.15
2024-07-01 08:01:30,S001,95.2,1.80,0.85
2024-07-01 08:02:00,S001,73.0,1.03,0.14
2024-07-01 08:02:15,S001,73.2,1.02,0.15
2024-07-01 08:03:00,S001,150.0,2.50,1.20
2024-07-01 08:04:00,S001,73.1,1.01,0.16
Code Exercise Area
Write your Pandas code in the editor below. You can jot down notes or write pseudocode; the reference solution follows.
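One possible starting point for the exercise, offered as a sketch rather than as part of the reference solution: it inlines the nine sample rows from the simulated dataset (so no `sensor_data.csv` file is assumed) and measures the gap between consecutive readings of each sensor. The `gap_s` column name is made up for illustration.

```python
import io

import pandas as pd

# The nine sample rows from the simulated dataset, inlined so the sketch runs without a file.
SAMPLE_CSV = """timestamp,sensor_id,temperature,pressure,vibration
2024-07-01 08:00:00,S001,72.5,1.02,0.15
2024-07-01 08:00:03,S001,72.8,1.03,0.14
2024-07-01 08:00:07,S001,73.1,1.01,0.16
2024-07-01 08:01:00,S001,72.9,1.02,0.15
2024-07-01 08:01:30,S001,95.2,1.80,0.85
2024-07-01 08:02:00,S001,73.0,1.03,0.14
2024-07-01 08:02:15,S001,73.2,1.02,0.15
2024-07-01 08:03:00,S001,150.0,2.50,1.20
2024-07-01 08:04:00,S001,73.1,1.01,0.16
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV), parse_dates=['timestamp'])

# Seconds elapsed since the previous reading of the same sensor (NaN for the first reading).
df['gap_s'] = df.groupby('sensor_id')['timestamp'].diff().dt.total_seconds()

print(df[['timestamp', 'temperature', 'gap_s']])
```

The gaps range from a few seconds to a full minute, which is why the readings need to be resampled onto a fixed grid before any rolling statistics are computed.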
Reference Solution
reference_solution.py
import pandas as pd
import numpy as np
df = pd.read_csv('sensor_data.csv')
df['timestamp'] = pd.to_datetime(df['timestamp'])
df = df.set_index('timestamp')
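# 1. Resample each sensor to a fixed frequency (1 minute)
# Selecting the numeric columns keeps the string sensor_id column out of mean().
value_cols = ['temperature', 'pressure', 'vibration']
df_resampled = df.groupby('sensor_id')[value_cols].resample('1min').mean().dropna()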
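# 2. Moving-average smoothing, computed per sensor so a window never mixes different sensors
temp = df_resampled.groupby(level='sensor_id')['temperature']
df_resampled['temp_ma5'] = temp.transform(lambda s: s.rolling(window=5, min_periods=1).mean())
df_resampled['temp_ewma'] = temp.transform(lambda s: s.ewm(span=5).mean())
# 3. Dynamic threshold (rolling mean +/- 3 * rolling std) over the previous 10 readings
# shift(1) keeps each reading out of its own baseline: with the reading included, no point in a
# 10-sample window can lie more than 9/sqrt(10) ≈ 2.85 sample std from the window mean,
# so the 3-sigma test in step 4 could never fire.
df_resampled['rolling_mean'] = temp.transform(lambda s: s.rolling(window=10).mean().shift(1))
df_resampled['rolling_std'] = temp.transform(lambda s: s.rolling(window=10).std().shift(1))
df_resampled['upper_bound'] = df_resampled['rolling_mean'] + 3 * df_resampled['rolling_std']
df_resampled['lower_bound'] = df_resampled['rolling_mean'] - 3 * df_resampled['rolling_std']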
# 4. Anomaly flagging (rows without a full baseline compare as False)
df_resampled['is_anomaly'] = (
(df_resampled['temperature'] > df_resampled['upper_bound']) |
(df_resampled['temperature'] < df_resampled['lower_bound'])
).fillna(False)
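# 5. Sustained-anomaly detection (alert only on N consecutive anomalies, here N = 3)
# A new run starts whenever the flag flips within a sensor; each sensor's first row also starts a run,
# so a run never spans two sensors.
run_start = df_resampled.groupby(level='sensor_id')['is_anomaly'].transform(lambda s: s != s.shift())
df_resampled['anomaly_group'] = run_start.cumsum()
anomaly_groups = df_resampled[df_resampled['is_anomaly']].groupby('anomaly_group')
sustained_anomalies = anomaly_groups.filter(lambda x: len(x) >= 3)
Business Interpretation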
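The key to IoT anomaly detection is telling 'true anomalies' apart from 'sensor noise'. Moving-average smoothing suppresses high-frequency noise, and a dynamic threshold adapts to changing operating conditions better than a fixed one. The 'alert only after N consecutive anomalies' policy can substantially lower the false-alarm rate and keeps operations staff from developing alert fatigue.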
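The 'N consecutive anomalies' policy can be illustrated in isolation with a small run-length sketch. The flag values below are made up for illustration (they are not derived from the sensor data), and `MIN_RUN`, `run_id`, `run_length`, and `alert` are hypothetical names; N = 3 is an assumed setting that would need tuning per deployment.

```python
import pandas as pd

# Hypothetical per-minute anomaly flags: one isolated flag, one run of three, one run of two.
flags = pd.Series(
    [False, True, False, False, True, True, True, False, True, True],
    index=pd.date_range('2024-07-01 08:00', periods=10, freq='1min'),
    name='is_anomaly',
)

MIN_RUN = 3  # assumed N: consecutive anomalous readings required before alerting

# A new run id starts every time the flag value changes.
run_id = (flags != flags.shift()).cumsum()

# Length of the run each reading belongs to, broadcast back onto every row.
run_length = flags.groupby(run_id).transform('size')

# Alert only on anomalous readings that sit inside a sufficiently long run.
alert = flags & (run_length >= MIN_RUN)

print(pd.DataFrame({'is_anomaly': flags, 'run_length': run_length, 'alert': alert}))
```

In this made-up series six readings are flagged, but only the three-reading run raises an alert; the isolated flag and the two-reading run are suppressed.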