PROJECT 03 · Advanced
RFM Model and Clustering-Based User Value Segmentation
RFM Customer Segmentation
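Build an RFM model and use KMeans clustering to segment users by value and identify high-value customer groups.
RFM modeling · KMeans clustering · StandardScaler · PCA dimensionality reduction · Silhouette coefficient
Project Background
An online education platform needs to run fine-grained operations for its 5000+ registered users. Combining the traditional RFM model (Recency, Frequency, Monetary) with machine-learning clustering can identify user value tiers more precisely.
Simulated Dataset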
user_id,signup_date,last_purchase,purchase_count,total_spent,course_completions
U1001,2023-01-15,2024-06-20,12,3580.00,8
U1002,2023-03-22,2024-01-05,3,890.00,1
U1003,2023-06-10,2024-07-01,25,8900.00,20
U1004,2023-02-28,2023-08-15,1,299.00,0
U1005,2023-09-01,2024-06-30,18,5200.00,15
U1006,2023-04-18,2024-03-10,7,2100.00,5
U1007,2023-11-05,2024-07-10,30,12000.00,28
U1008,2023-07-20,2023-12-01,2,598.00,1
Code Exercise Area
Write your Pandas code in the editor below. You can take notes or write pseudocode; the reference solution is further down.
pandas_exercise.py
Reference Solution
reference_solution.py
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
df = pd.read_csv('user_data.csv')
analysis_date = pd.Timestamp('2024-07-15')
# 1. Build the RFM metrics
df['last_purchase'] = pd.to_datetime(df['last_purchase'])
df['recency'] = (analysis_date - df['last_purchase']).dt.days
df['frequency'] = df['purchase_count']
df['monetary'] = df['total_spent']
rfm = df[['user_id', 'recency', 'frequency', 'monetary']].copy()
# 2. Standardize the features
scaler = StandardScaler()
rfm_scaled = scaler.fit_transform(rfm[['recency', 'frequency', 'monetary']])
# 3. Choose K with the elbow method: record the inertia for K = 2..7
inertias = []
for k in range(2, 8):
    kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
    kmeans.fit(rfm_scaled)
    inertias.append(kmeans.inertia_)
# 4. Cluster (assuming K=4)
kmeans = KMeans(n_clusters=4, random_state=42, n_init=10)
rfm['cluster'] = kmeans.fit_predict(rfm_scaled)
# 5. Reduce to two dimensions with PCA for visualization
pca = PCA(n_components=2)
rfm_2d = pca.fit_transform(rfm_scaled)
rfm['pca1'], rfm['pca2'] = rfm_2d[:, 0], rfm_2d[:, 1]
# 6. Silhouette coefficient of the clustering
score = silhouette_score(rfm_scaled, rfm['cluster'])
print(f"Silhouette coefficient: {score:.3f}")
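# 7. Name the user segments
# Average only the R/F/M columns; user_id is a string and cannot be averaged
cluster_stats = rfm.groupby('cluster')[['recency', 'frequency', 'monetary']].mean()
# Name clusters by their mean R/F/M: high value, potential, regular, churned
Business Interpretation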
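The RFM model is a classic approach to user segmentation. Recency reflects how active a user is, Frequency reflects loyalty, and Monetary reflects contribution. Combined with KMeans clustering, it can discover user groups automatically and avoid the subjectivity of manual binning. High-value users (recent, frequent, high-spending) deserve priority retention effort, while churned users (long inactive, low frequency) call for a win-back strategy.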
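The reference solution stops at computing per-cluster means and leaves the naming step as a comment. One possible way to finish it is sketched below. The rank-sum rule, the helper name_clusters, and the hand-assigned cluster ids are assumptions made for illustration, not part of the original solution; the four tier names come from the solution's own comment, and the sketch assumes exactly four clusters.

```python
import pandas as pd

# Tier names from the reference solution's comment, ordered best to worst.
TIERS = ["High value", "Potential", "Regular", "Churned"]


def name_clusters(rfm: pd.DataFrame) -> pd.Series:
    """Return a Series mapping each cluster id to a tier name (hypothetical helper)."""
    stats = rfm.groupby("cluster")[["recency", "frequency", "monetary"]].mean()
    if len(stats) != len(TIERS):
        raise ValueError(f"expected {len(TIERS)} clusters, got {len(stats)}")
    # Rank clusters on each dimension so that a higher rank is always better:
    # fewer days since the last purchase, more purchases, more spend.
    score = (
        stats["recency"].rank(ascending=False)
        + stats["frequency"].rank()
        + stats["monetary"].rank()
    )
    best_first = score.sort_values(ascending=False).index
    return pd.Series(TIERS, index=best_first, name="segment")


# The eight sample rows from the simulated dataset, with hand-assigned
# (hypothetical) cluster ids standing in for the KMeans output.
rfm = pd.DataFrame({
    "user_id": ["U1001", "U1002", "U1003", "U1004",
                "U1005", "U1006", "U1007", "U1008"],
    "last_purchase": pd.to_datetime([
        "2024-06-20", "2024-01-05", "2024-07-01", "2023-08-15",
        "2024-06-30", "2024-03-10", "2024-07-10", "2023-12-01",
    ]),
    "frequency": [12, 3, 25, 1, 18, 7, 30, 2],
    "monetary": [3580.0, 890.0, 8900.0, 299.0, 5200.0, 2100.0, 12000.0, 598.0],
    "cluster": [0, 3, 2, 1, 0, 3, 2, 1],
})
rfm["recency"] = (pd.Timestamp("2024-07-15") - rfm["last_purchase"]).dt.days

tier_by_cluster = name_clusters(rfm)
rfm["segment"] = rfm["cluster"].map(tier_by_cluster)
print(rfm[["user_id", "cluster", "segment"]])
```

A rank sum weights R, F, and M equally and ignores how far apart the clusters are, so it is worth checking the resulting names against cluster_stats before acting on them. Two clusters with the same score would be ordered arbitrarily.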