feat: add proactive login expiry monitoring

Add a login_reminder module that checks credential expiry every 6 hours and sends webhook alerts 24h and 6h before expiry. Change expire_time from 30 days to 4 days to match WeChat's actual credential validity.

Made-with: Cursor
fix(docker): resolve proxy pool configuration not loading in Docker deployment
2026-03-31 14:26:03 +08:00 · 2026-03-29 20:34:08 +08:00 · 2026-03-25 10:08:42 +08:00 · 2026-03-24 23:38:44 +08:00 · 2026-03-24 20:02:45 +08:00 · 2026-03-24 13:45:37 +08:00
14 changed files with 912 additions and 59 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -67,3 +67,7 @@ data/

 # SaaS edition (managed in a separate repository)
 saas/
+
+# Personal docs and scripts (not committed)
+docs/
+scripts/
--- a/CONTENT_TYPES.md
+++ b/CONTENT_TYPES.md
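@@ -0,0 +1,382 @@
+# WeChat Official Account Article Content Types and Detection Strategies
+
+This document describes the content types and unavailable states of WeChat Official Account articles, and how each one is detected and handled.
+
+---
+
+## I. Article Content Types
+
+WeChat Official Accounts use the `item_show_type` parameter to distinguish content types. It is usually defined in the page's inline JavaScript.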
+
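+### item_show_type values
+
+| Value | Type | Description |
+|----|------|------|
+| `0` | Standard rich text | The most common text-and-image article |
+| `7` | Audio/video share | Dynamic Vue app; content is loaded by JS |
+| `8` | Image-text post | Xiaohongshu-style multiple images with short text |
+| `10` | Short post | Plain text or a reposted message; no `js_content` container |
+| Other | Unknown | Other special content, to be documented |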
+
+---
+
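+### 1. Standard Rich-Text Articles
+
+**item_show_type**: `0` (or undefined)
+
+**Characteristics**:
+- Contains `<div id="js_content">` or `<div class="rich_media_content">`
+- Mixed text and images
+- HTML size: usually > 100KB
+
+**Extraction strategy**:
+- Extract the full HTML of the `js_content` region
+- Extract all image URLs in order (`data-src` or `src` attribute)
+- Generate plain text (`plain_content`) for RSS readers
+- Route image URLs through the proxy service (to get around hotlink protection)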
+
+---
+
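+### 1.5. Audio Share Articles
+
+**item_show_type**: `7`
+
+**Characteristics**:
+- Dynamic Vue app (uses the `common_share_audio` module)
+- **No traditional `js_content` container**
+- `og:image` and `og:description` are usually empty
+- The HTML contains `window.item_show_type = '7'`
+- Content is loaded dynamically by JavaScript; the actual audio is not visible in the static HTML
+
+**Typical accounts**:
+- Podcast episodes (e.g. "马刺进步报告")
+- Audio show shares
+- Audio content from WeChat Channels
+
+**Extraction strategy**: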
+```python
+# Detection logic
+if get_item_show_type(html) == '7':
+    # This is an audio share page
+    return _extract_audio_share_content(html)
+```
+
+**Extractable content**:
+- ✅ Title (from `og:title` or `window.msg_title`)
+- ✅ Author (from `og:article:author` or `var nickname`)
+- ✅ Cover image (from `og:image`, if present)
+- ❌ Audio URL (requires JavaScript execution)
+- ❌ Duration
+- ❌ Audio player
+
+**Rendering in RSS**:
+```html
+<div style="background:#f6f6f6;padding:20px;border-radius:8px">
+  <p>🎵 音频内容 / Audio Content</p>
+  <p>这是微信音频分享文章，内容通过JavaScript动态加载，无法直接提取。</p>
+  <p>请在微信中查看完整内容</p>
+</div>
+```
+
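+**Known limitations**:
+- The real audio URL cannot be extracted (the JS must run in a browser environment)
+- Only basic metadata is available: title, author and cover image
+- RSS readers show a placeholder that directs users to the original article in WeChat
+
+**Possible future improvements**:
+- Execute the JavaScript with a headless browser (Playwright/Puppeteer)
+- Reverse-engineer the WeChat audio API
+- Show richer metadata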
+
+---
+
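+### 2. Image-Only Articles
+
+**item_show_type**: `0`
+
+**Characteristics**:
+- Has a `<div id="js_content">` container
+- The content area contains only `<img>` tags and **no text at all**
+- HTML size: 2-3MB (normal)
+
+**Handling strategy**:
+- Extract the HTML and image list as usual
+- Set `plain_content` to the placeholder `[纯图片文章，共 X 张图片]` ("[Image-only article, X images]")
+
+**Note**:
+- Audio detection must be strict, or these articles will be misclassified as audio articles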
+
+---
+
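+### 3. Image-Text Posts
+
+**item_show_type**: `8`
+
+**Characteristics**:
+- Xiaohongshu-style multiple images with short text
+- Uses a dedicated mixed image/text layout
+- Usually created on a phone
+
+**Detection**: `is_image_text_message(html)` → `get_item_show_type(html) == '8'`
+
+**Extraction**: `_extract_image_text_content(html)`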
+
+---
+
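+### 4. Short Posts
+
+**item_show_type**: `10`
+
+**Characteristics**:
+- Plain text, no `js_content` div
+- Short "Moments"-style text or reposted content
+- Simple HTML structure; the content sits in a dedicated container
+
+**Detection**: `is_short_content_message(html)` → `get_item_show_type(html) == '10'`
+
+**Extraction**: `_extract_short_content(html)`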
+
+---
+
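+### 5. Audio Articles (work in progress)
+
+**item_show_type**: `0`
+
+**Characteristics**:
+- Contains an audio player component
+- May contain an `<mpvoice>` tag or an `<mp-common-mpaudio>` tag
+- May also contain images (mixed images and audio)
+
+**Detection**: `is_audio_message(html)`
+- Matches a real `<mpvoice>` tag
+- Matches an `<mp-common-mpaudio>` tag
+- Matches a `<div id="js_editor_audio_xxx">` container
+
+**Important**:
+- Match HTML tags with strict regular expressions
+- Do not match strings such as `voice_encode_fileid` in JS code (doing so misclassifies image-only articles)
+
+**Current status**:
+- Basic detection is implemented
+- Content extraction still needs work (mixed image + audio case)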
+
+---
+
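+## II. Unavailable Article States
+
+### 1. Verification Page (retryable) ⚠️
+
+**Characteristics**:
+- HTML size: 1.5-2MB (large)
+- Contains the full verification widget code
+- Key markers: `"环境异常"` (abnormal environment) + `"完成验证后即可继续访问"` (complete verification to continue) + `"去验证"` (verify)
+
+**Cause**:
+- The proxy IP has been flagged by WeChat's risk control
+- Or the server IP is sending requests too frequently
+
+**Handling**:
+- ❌ **Do not** mark as permanently unavailable
+- ✅ **Do** mark as retryable (`failed`)
+- ✅ Switch proxies or wait for a cooldown, then retry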
+
+---
+
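+### 2. Temporarily Unavailable Page (permanent failure) ❌
+
+**Characteristics**:
+- Very small HTML: typically < 1KB (the check in `get_unavailable_reason` accepts pages under 2,000 characters)
+- `<title>该内容暂时无法查看</title>` (this content cannot be viewed at the moment)
+- The page contains a single notice
+
+**Handling**:
+- ✅ Mark as permanently unavailable (`permanent_fail`)
+- Reason: `"暂时无法查看"`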
+
+---
+
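+### 3. Hidden by the Author's Privacy Settings (permanent failure) ❌
+
+**Characteristics**:
+- HTML size: 10-20KB
+- Empty Vue app: `<div id="app"></div>`
+- Empty `<title></title>`
+- No article content container
+- The page shows "根据作者隐私设置，无法查看该内容" (cannot be viewed due to the author's privacy settings), loaded dynamically by JS
+
+**Cause**:
+- The author restricted access to the article
+- Usually members-only content
+
+**Handling**:
+- ✅ Mark as permanently unavailable (`permanent_fail`)
+- Reason: `"根据作者隐私设置不可查看"` (if the notice text does appear in the static HTML, the static-marker check records `"作者隐私设置不可见"` instead)
+
+**Note**:
+- The error message for this page is not in the static HTML
+- Detect it through the combination of empty Vue app + no content container + empty title (the check also requires the page to be under 200,000 characters)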
+
+---
+
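+### 4. Deleted by the Publisher (permanent failure) ❌
+
+**Markers**:
+- `"该内容已被发布者删除"` (deleted by the publisher)
+- `"内容已删除"` (content deleted)
+
+**Handling**:
+- ✅ Mark as permanently unavailable
+- Reason: `"已被发布者删除"`
+
+---
+
+### 5. Policy Violation (permanent failure) ❌
+
+**Markers**:
+- `"此内容因违规无法查看"` (unavailable due to a violation)
+- `"涉嫌违反相关法律法规和政策"` (suspected violation of laws, regulations and policies)
+- `"此内容发送失败无法查看"` (failed to send, cannot be viewed)
+- `"接相关投诉，此内容违反"` (following complaints, this content violates...)
+
+**Handling**:
+- ✅ Mark as permanently unavailable
+- Reason: `"因违规无法查看"`, `"涉嫌违规被限制"`, `"发送失败无法查看"` or `"因投诉违规被限制"`, depending on which marker matched
+
+---
+
+### 6. Debunked by a Third Party (permanent failure) ❌
+
+**Markers**:
+- `"该文章已被第三方辟谣"` (debunked by a third party)
+
+**Handling**:
+- ✅ Mark as permanently unavailable
+- Reason: `"已被第三方辟谣"`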
+
+---
+
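+## III. Extraction Flow
+
+```
+Fetch HTML
+   ↓
+Check whether unavailable (is_article_unavailable)
+   ├─ Yes → mark permanent_fail + reason
+   └─ No → continue
+       ↓
+   Check for a content container (has_article_content)
+   ├─ No → mark failed (retryable)
+   └─ Yes → continue
+       ↓
+   Extract content by type
+   ├─ Audio share (type=7) → _extract_audio_share_content()
+   ├─ Image-text post (type=8) → _extract_image_text_content()
+   ├─ Short post (type=10) → _extract_short_content()
+   ├─ Audio article → _extract_audio_content()
+   └─ Standard article → extract_content()
+       ↓
+   Extract images (extract_images_in_order)
+       ↓
+   Generate plain text (html_to_text)
+       ↓
+   Check for an image-only article
+   ├─ Yes → plain_content = "[纯图片文章，共 X 张图片]"
+   └─ No → keep the generated plain text
+       ↓
+   Return the result
+```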
+
+---
+
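+## IV. Key Functions
+
+### 1. `get_unavailable_reason(html) -> str | None`
+
+Checks whether the article is permanently unavailable.
+
+**Return value**:
+- `None`: the article is available, or the failure is retryable
+- `str`: the reason it is unavailable
+
+**Check order**:
+1. Excluded first: verification pages
+2. Static markers: deleted, violation, debunked, etc.
+3. Special pages: the "暂时无法查看" (temporarily unavailable) page and the privacy-settings page
+
+---
+
+### 2. `is_audio_message(html) -> bool`
+
+Checks whether the page is an audio article.
+
+**Key points**:
+- ✅ Match a real `<mpvoice>` tag
+- ✅ Match `<mp-common-mpaudio>` tags with a regex
+- ✅ Match `<div id="js_editor_audio_xxx">` containers with a regex
+- ❌ Do not use a bare `in` check for names such as `mp-common-mpaudio` or `js_editor_audio` (it also matches JS code)
+
+---
+
+### 3. `has_article_content(html) -> bool`
+
+Quickly checks whether the HTML contains an article content container.
+
+**Container markers**:
+- `id="js_content"`
+- `class="rich_media_content"`
+- `id="page-content"` (government/institutional accounts)
+- Or a special-type match (image-text post, short post, audio article, or an `item_show_type=7` share page)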
+
+---
+
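+## V. Proxy and Anti-Scraping Strategy
+
+1. **Proxy pool rotation**:
+   - Uses SOCKS5 proxies
+   - A failed proxy cools down for 120 seconds
+   - Falls back to a direct connection when all proxies have failed
+
+2. **TLS fingerprint impersonation**:
+   - Uses the `curl_cffi` library
+   - Impersonates Chrome 120: `impersonate="chrome120"`
+
+3. **Request headers**:
+   - `Referer: https://mp.weixin.qq.com/`
+   - Adds a `Cookie` (WeChat token) when needed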
+
+---
+
+## VI. Database Fields
+
+### Key fields of the `articles` table
+
+| Field | Type | Description |
+|------|------|------|
+| `status` | `VARCHAR` | Article status: `pending` (waiting) / `fetched` (fetched) / `failed` (failed, retryable) / `permanent_fail` (permanently unavailable) |
+| `fetch_retry_count` | `INTEGER` | Retry count (at most 3) |
+| `content` | `TEXT` | HTML content |
+| `plain_content` | `TEXT` | Plain-text content (used for RSS) |
+| `unavailable_reason` | `VARCHAR` | Reason the article is unavailable (only set for `permanent_fail`) |
+
+---
+
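+## VII. Contributing
+
+If you find a new article type or error page, please open an Issue or PR:
+
+1. **Provide details**:
+   - Article URLs (at least 3 samples)
+   - The complete HTML source
+   - The expected extraction result
+
+2. **Follow the code conventions**:
+   - Use strict regular expressions (to avoid misclassification)
+   - Add explanatory comments
+
+3. **Test thoroughly**:
+   - Check that normal articles are unaffected
+   - Check that the new type is detected correctly
+
+---
+
+**Last updated**: 2026-03-24  
+**Maintainer**: WeChat RSS API project team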
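The proxy rotation rules described in CONTENT_TYPES.md (a 120-second cooldown after a failure, and a direct connection once every proxy is cooling down) can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the `ProxyPool` class, its method names, the round-robin order and the injectable clock are all hypothetical.

```python
import time
from typing import Callable, Dict, List, Optional


class ProxyPool:
    """Hypothetical round-robin proxy pool with a per-proxy failure cooldown."""

    def __init__(self, proxies: List[str], cooldown: float = 120.0,
                 clock: Callable[[], float] = time.monotonic):
        self.proxies = list(proxies)
        self.cooldown = cooldown
        self.clock = clock
        self._failed_at: Dict[str, float] = {}  # proxy URL -> time of last failure
        self._next = 0  # round-robin cursor

    def _available(self, proxy: str) -> bool:
        failed = self._failed_at.get(proxy)
        return failed is None or self.clock() - failed >= self.cooldown

    def pick(self) -> Optional[str]:
        """Return the next proxy that is not cooling down, or None for a direct connection."""
        for _ in range(len(self.proxies)):
            proxy = self.proxies[self._next]
            self._next = (self._next + 1) % len(self.proxies)
            if self._available(proxy):
                return proxy
        return None  # every proxy is cooling down (or the pool is empty)

    def mark_failed(self, proxy: str) -> None:
        self._failed_at[proxy] = self.clock()

    def mark_ok(self, proxy: str) -> None:
        self._failed_at.pop(proxy, None)
```

A caller would pass the returned URL as the request's proxy and treat `None` as "connect directly"; clearing the failure record with `mark_ok()` after a success is an assumption about how recovery could work, not something the document specifies.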
--- a/12
+++ b/12
@@ -21,15 +21,14 @@ FROM python:3.11-slim

 LABEL maintainer="tmwgsicp"
 LABEL description="WeChat Official Account Article Download API with RSS Support"
-LABEL version="1.0.0"
+LABEL version="1.0.5"

 WORKDIR /app

 # Install runtime dependencies (curl for healthcheck)
 RUN apt-get update && apt-get install -y --no-install-recommends \
    curl \
-    && rm -rf /var/lib/apt/lists/* \
-    && useradd -m -u 1000 appuser
+    && rm -rf /var/lib/apt/lists/*

 # Copy wheels from builder and install
 COPY --from=builder /app/wheels /wheels
@@ -38,11 +37,8 @@ RUN pip install --no-cache-dir /wheels/* && rm -rf /wheels
 # Copy application code
 COPY . .

-# Create data directory for SQLite and set permissions
-RUN mkdir -p /app/data && chown -R appuser:appuser /app
-
-# Switch to non-root user
-USER appuser
+# Create data directory
+RUN mkdir -p /app/data

 # Environment variables with sensible defaults
 ENV PYTHONUNBUFFERED=1 \
--- a/README.md
+++ b/README.md
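@@ -27,7 +27,7 @@
 - **Account search** — search Official Accounts by name and get their FakeID
 - **QR-code login** — log in to the WeChat Official Accounts Platform by scanning a QR code; credentials are saved automatically and remain valid for 4 days
 - **Image proxy** — proxies WeChat CDN images to get around hotlink protection
-- **Webhook notifications** — automatic pushes for events such as login expiry and verification challenges (supports WeCom group bots)
+- **Webhook notifications** — login expiry reminders (warnings 24h/6h in advance plus an expiry notice), verification challenges and other events are pushed automatically (supports WeCom group bots)
 - **API docs** — auto-generated Swagger UI / ReDoc for trying every endpoint online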

 <div align="center">
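@@ -361,7 +361,9 @@ cp env.example .env
 | `WECHAT_TOKEN` | WeChat token (filled in automatically after login) | - |
 | `WECHAT_COOKIE` | WeChat cookie (filled in automatically after login) | - |
 | `WECHAT_FAKEID` | Official Account FakeID (filled in automatically after login) | - |
-| `WEBHOOK_URL` | Webhook notification URL (optional) | empty |
+| `WECHAT_EXPIRE_TIME` | Credential expiry time as a Unix timestamp in milliseconds (filled in automatically after login) | - |
+| `WEBHOOK_URL` | Webhook notification URL (supports WeCom group bots) | empty |
+| `WEBHOOK_NOTIFICATION_INTERVAL` | Minimum interval between notifications for the same event (seconds) | 300 |
 | `RATE_LIMIT_GLOBAL` | Global request limit per minute | 10 |
 | `RATE_LIMIT_PER_IP` | Per-IP request limit per minute | 5 |
 | `RATE_LIMIT_ARTICLE_INTERVAL` | Minimum interval between article requests (seconds) | 3 |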
@@ -493,6 +495,7 @@ PROXY_URLS=socks5://myuser:mypass@vps1-ip:1080,socks5://myuser:mypass@vps2-ip:10
 │   ├── rate_limiter.py   # Rate limiter
 │   ├── rss_store.py      # RSS data storage (SQLite)
 │   ├── rss_poller.py     # Background RSS poller
+│   ├── login_reminder.py # Login expiry reminder (proactive checks)
 │   ├── content_processor.py  # Content processing and image proxying
 │   ├── image_proxy.py    # Image URL proxy helper
 │   ├── article_fetcher.py    # Concurrent batch article fetching
@@ -502,6 +505,21 @@ PROXY_URLS=socks5://myuser:mypass@vps1-ip:1080,socks5://myuser:mypass@vps2-ip:10

 ---

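+## Content Types and Fetching Strategy
+
+This project supports several WeChat Official Account content types, including standard rich text, image-only articles, image-text posts, short posts and audio articles.
+
+See **[CONTENT_TYPES.md](CONTENT_TYPES.md)** for details.
+
+**Contents**:
+- All supported content types and their `item_show_type` values
+- Detection of unavailable states (deleted, violation, privacy, verification page, etc.)
+- Anti-scraping strategy and proxy configuration
+- Key functions
+- Contribution guidelines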
+
+---
+
 ## 常见问题

 <details>
@@ -525,9 +543,14 @@ PROXY_URLS=socks5://myuser:mypass@vps1-ip:1080,socks5://myuser:mypass@vps2-ip:10
 </details>

 <details>
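-<summary><b>How long does the token last?</b></summary>
+<summary><b>How long does the token last, and how can I know before it expires?</b></summary>

-The cookie login stays valid for about 4 days; after it expires you must scan the QR code to log in again. Set `WEBHOOK_URL` to be notified when it expires.
+The cookie login stays valid for about 4 days. The system:
+1. Shows the expiry time in the frontend (the `/api/admin/status` endpoint returns `expireTime` and `isExpired` fields)
+2. **Checks proactively in the background every 6 hours** and sends Webhook warnings 24h / 6h before expiry
+3. Sends a Webhook notification once the credentials are found to have expired
+
+Set `WEBHOOK_URL` (WeCom group bots are supported) to receive these reminders, so that expired credentials do not cause RSS polling or search to fail.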
 </details>

 <details>
--- a/app.py
+++ b/app.py
@@ -10,6 +10,9 @@
 """

 from contextlib import asynccontextmanager
+from dotenv import load_dotenv
+
+load_dotenv()

 from fastapi import FastAPI
 from fastapi.staticfiles import StaticFiles
@@ -56,7 +59,14 @@ async def lifespan(app: FastAPI):

    init_db()
    await rss_poller.start()
+    
+    # Start the login expiry reminder (checks credential validity and sends webhook alerts)
+    from utils.login_reminder import login_reminder
+    await login_reminder.start()
+    
    yield
+    
+    await login_reminder.stop()
    await rss_poller.stop()


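The thresholds in `LoginReminder` (24h warning, 6h critical) and the value written by `biz_login` (a Unix timestamp in milliseconds; 4 days is 345,600,000 ms) combine into a simple level check. A minimal sketch, assuming alerts are re-sent only when the level changes, which is what `_last_warning_level` appears to be for; `expiry_level` and `ReminderState` are hypothetical names, not the module's API.

```python
import time
from typing import Optional

WARNING_THRESHOLD = 24 * 3600   # seconds before expiry: first warning
CRITICAL_THRESHOLD = 6 * 3600   # seconds before expiry: critical warning


def expiry_level(expire_time_ms: int, now: Optional[float] = None) -> Optional[str]:
    """Map a millisecond expiry timestamp to 'expired', 'critical', 'warning' or None."""
    now = time.time() if now is None else now
    remaining = expire_time_ms / 1000 - now  # seconds left
    if remaining <= 0:
        return "expired"
    if remaining <= CRITICAL_THRESHOLD:
        return "critical"
    if remaining <= WARNING_THRESHOLD:
        return "warning"
    return None


class ReminderState:
    """Notify only when the level changes, so periodic checks do not repeat the same alert."""

    def __init__(self) -> None:
        self.last_level: Optional[str] = None

    def should_notify(self, level: Optional[str]) -> bool:
        changed = level != self.last_level
        self.last_level = level
        return changed and level is not None
```

With a 6-hour poll, a level is noticed at the first check after its threshold is crossed, so an alert can lag by up to one check interval.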
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -6,6 +6,11 @@
 #   2. Edit .env and set SITE_URL to your actual URL
 #   3. Run: docker-compose up -d
 #   4. Visit http://localhost:5000/login.html to scan QR code
+#
+# Note for NAS users (Synology/QNAP):
+#   If you encounter permission issues, run on NAS:
+#   - chmod -R 777 ./data
+#   - Credentials are automatically saved to ./data directory

 services:
  wechat-api:
@@ -17,10 +22,10 @@ services:
    ports:
      - "5000:5000"
    volumes:
-      # Persist SQLite database
+      # Persist SQLite database and credentials
      - ./data:/app/data
-      # Config file (writable - login saves credentials here)
-      - ./.env:/app/.env
+      # Config file (read-only - credentials saved to data/)
+      - ./.env:/app/.env:ro
    environment:
      - TZ=Asia/Shanghai
    healthcheck:
--- a/routes/login.py
+++ b/routes/login.py
@@ -497,8 +497,8 @@ async def biz_login(request: Request):
                import traceback
                traceback.print_exc()
        
-        # Compute the expiry time (30 days from now)
-        expire_time = int((time.time() + 30 * 24 * 3600) * 1000)
+        # Compute the expiry time (4 days from now, matching WeChat's actual validity), in Unix milliseconds
+        expire_time = int((time.time() + 4 * 24 * 3600) * 1000)
        
        # Save the credentials
        auth_manager.save_credentials(
--- a/routes/rss.py
+++ b/routes/rss.py
@@ -11,6 +11,7 @@ RSS subscription routes

 import csv
 import io
+import os
 import time
 import logging
 from datetime import datetime, timezone
@@ -28,6 +29,23 @@ from utils.image_proxy import proxy_image_url

 logger = logging.getLogger(__name__)

+
+def get_base_url(request: Request) -> str:
+    """
+    Return the service's base URL, preferring the SITE_URL environment variable
+    and supporting reverse proxies (via X-Forwarded-Proto and X-Forwarded-Host).
+    """
+    # Prefer the configured SITE_URL
+    site_url = os.getenv("SITE_URL", "").strip()
+    if site_url:
+        return site_url.rstrip("/")
+    
+    # Check reverse-proxy headers
+    proto = request.headers.get("X-Forwarded-Proto", "http")
+    host = request.headers.get("X-Forwarded-Host") or request.headers.get("Host", "localhost:5000")
+    
+    return f"{proto}://{host}"
+
 router = APIRouter()


@@ -118,7 +136,7 @@ async def get_subscriptions(request: Request):
    Returns each subscription's basic info, cached article count and RSS URL.
    """
    subs = rss_store.list_subscriptions()
-    base_url = str(request.base_url).rstrip("/")
+    base_url = get_base_url(request)

    items = []
    for s in subs:
@@ -195,7 +213,7 @@ async def get_aggregated_rss_feed(

    articles = rss_store.get_all_articles(limit=limit) if subs else []

-    base_url = str(request.base_url).rstrip("/")
+    base_url = get_base_url(request)
    xml = _build_aggregated_rss_xml(articles, nickname_map, base_url)
    return Response(
        content=xml,
@@ -218,7 +236,7 @@ async def export_subscriptions(
    - **opml**: standard OPML format, importable directly into RSS readers
    """
    subs = rss_store.list_subscriptions()
-    base_url = str(request.base_url).rstrip("/")
+    base_url = get_base_url(request)

    if format == "opml":
        return _build_opml_response(subs, base_url)
@@ -448,7 +466,7 @@ async def get_rss_feed(fakeid: str, request: Request,
        raise HTTPException(status_code=404, detail="未找到该订阅，请先添加订阅")

    articles = rss_store.get_articles(fakeid, limit=limit)
-    base_url = str(request.base_url).rstrip("/")
+    base_url = get_base_url(request)
    xml = _build_rss_xml(fakeid, sub, articles, base_url)

    return Response(
--- a/routes/search.py
+++ b/routes/search.py
@@ -8,6 +8,7 @@
 Search routes - FastAPI version
 """

+import os
 from fastapi import APIRouter, Query, Request
 from pydantic import BaseModel
 from typing import Optional, List
@@ -18,6 +19,21 @@ from utils.image_proxy import proxy_image_url

 router = APIRouter()

+
+def get_base_url(request: Request) -> str:
+    """
+    Return the service's base URL, preferring the SITE_URL environment variable
+    and supporting reverse proxies (via X-Forwarded-Proto and X-Forwarded-Host).
+    """
+    site_url = os.getenv("SITE_URL", "").strip()
+    if site_url:
+        return site_url.rstrip("/")
+    
+    proto = request.headers.get("X-Forwarded-Proto", "http")
+    host = request.headers.get("X-Forwarded-Host") or request.headers.get("Host", "localhost:5000")
+    
+    return f"{proto}://{host}"
+
 class Account(BaseModel):
    """Official Account model"""
    id: str
@@ -80,7 +96,7 @@ async def search_accounts(query: str = Query(..., description="公众号名称
                accounts = result.get("list", [])
                
                # Get base_url for the image proxy
-                base_url = str(request.base_url).rstrip("/") if request else ""
+                base_url = get_base_url(request) if request else ""
                
                # Format the response data
                formatted_accounts = []
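`get_base_url` is defined identically in `routes/rss.py` and `routes/search.py`. Below is a sketch of a single shared variant that also tolerates comma-separated forwarded headers: some proxy chains append values (e.g. `X-Forwarded-Proto: https, http`), and the first entry is the client-facing hop. The name `resolve_base_url` and taking a plain header mapping instead of a `Request` are assumptions made to keep it framework-free; Starlette's `request.headers` is a case-insensitive mapping and could be passed in directly.

```python
import os
from typing import Mapping, Optional


def resolve_base_url(headers: Mapping[str, str], site_url: Optional[str] = None) -> str:
    """Build the public base URL: SITE_URL first, then X-Forwarded-*, then Host."""
    if site_url is None:
        site_url = os.getenv("SITE_URL", "")
    site_url = site_url.strip()
    if site_url:
        return site_url.rstrip("/")

    def first(name: str) -> str:
        # Proxies may append values; keep only the first (client-facing) entry.
        return headers.get(name, "").split(",")[0].strip()

    proto = first("X-Forwarded-Proto") or "http"
    host = first("X-Forwarded-Host") or first("Host") or "localhost:5000"
    return f"{proto}://{host}"
```

Forwarded headers can be set by any client unless a trusted reverse proxy overwrites them, which is one reason to keep `SITE_URL` as the first choice in production.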
--- a/utils/auth_manager.py
+++ b/utils/auth_manager.py
@@ -34,12 +34,31 @@ class AuthManager:
        self.base_dir = Path(__file__).parent.parent
        self.env_path = self.base_dir / ".env"
        
+        # Credentials file for Docker deployments (kept under data/, where write permissions are more reliable)
+        self.credentials_file = self.base_dir / "data" / ".credentials.json"
+        
        # Load environment variables
        self._load_credentials()
        self._initialized = True
    
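    def _load_credentials(self):
-        """Load credentials from the .env file"""
+        """
+        Load credentials from several sources, in priority order:
+        1. data/.credentials.json (recommended for Docker)
+        2. .env file (local deployment)
+        3. Environment variables
+        """
+        # Try the JSON credentials file first (Docker)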
+        if self.credentials_file.exists():
+            try:
+                import json
+                with open(self.credentials_file, 'r', encoding='utf-8') as f:
+                    self.credentials = json.load(f)
+                return
+            except Exception as e:
+                print(f"Warning: Failed to load credentials from {self.credentials_file}: {e}")
+        
+        # Fall back to the .env file (local deployment)
        if self.env_path.exists():
            load_dotenv(self.env_path, override=True)
        
@@ -54,7 +73,9 @@ class AuthManager:
    def save_credentials(self, token: str, cookie: str, fakeid: str, 
                        nickname: str, expire_time: int) -> bool:
        """
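-        Save credentials to the .env file
+        Save credentials using a dual-storage strategy:
+        1. Primarily to data/.credentials.json (recommended for Docker; reliable permissions)
+        2. Also try to save to .env (compatibility with local deployments)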
        
        Args:
            token: WeChat token
@@ -66,21 +87,33 @@ class AuthManager:
        Returns:
            Whether saving succeeded
        """
+        # Update the in-memory credentials
+        self.credentials.update({
+            "token": token,
+            "cookie": cookie,
+            "fakeid": fakeid,
+            "nickname": nickname,
+            "expire_time": expire_time
+        })
+        
+        success = False
+        
+        # Strategy 1: save to data/.credentials.json (preferred under Docker)
+        try:
+            import json
+            self.credentials_file.parent.mkdir(parents=True, exist_ok=True)
+            with open(self.credentials_file, 'w', encoding='utf-8') as f:
+                json.dump(self.credentials, f, indent=2, ensure_ascii=False)
+            print(f"[OK] 凭证已保存到: {self.credentials_file}")
+            success = True
+        except Exception as e:
+            print(f"[WARN] 无法保存到凭证文件: {e}")
+        
+        # Strategy 2: also try to save to the .env file (local deployment compatibility)
        try:
-            # Update the in-memory credentials
-            self.credentials.update({
-                "token": token,
-                "cookie": cookie,
-                "fakeid": fakeid,
-                "nickname": nickname,
-                "expire_time": expire_time
-            })
-            
-            # Make sure the .env file exists
            if not self.env_path.exists():
                self.env_path.touch()
            
-            # Save to the .env file
            env_file = str(self.env_path)
            set_key(env_file, "WECHAT_TOKEN", token)
            set_key(env_file, "WECHAT_COOKIE", cookie)
@@ -88,11 +121,17 @@ class AuthManager:
            set_key(env_file, "WECHAT_NICKNAME", nickname)
            set_key(env_file, "WECHAT_EXPIRE_TIME", str(expire_time))
            
-            print(f"✅ 凭证已保存到: {self.env_path}")
-            return True
+            print(f"[OK] 凭证已同步到: {self.env_path}")
+            success = True
        except Exception as e:
-            print(f"❌ 保存凭证失败: {e}")
+            print(f"[WARN] 无法写入 .env 文件 (Docker环境正常): {e}")
+            # .env may be read-only under Docker; this does not affect functionality
+        
+        if not success:
+            print(f"[ERROR] 凭证保存完全失败")
            return False
+        
+        return True
    
    def get_credentials(self) -> Optional[Dict[str, any]]:
        """
@@ -155,7 +194,7 @@ class AuthManager:
    
    def clear_credentials(self) -> bool:
        """
-        Clear credentials
+        Clear credentials (from both stores)
        
        Returns:
            Whether clearing succeeded
@@ -178,12 +217,20 @@ class AuthManager:
            for key in env_keys:
                os.environ.pop(key, None)
            
+            # Delete the credentials file
+            if self.credentials_file.exists():
+                self.credentials_file.unlink()
+                print(f"[OK] 凭证文件已删除: {self.credentials_file}")
+            
            # Blank out the credential fields in .env (keeping other settings)
-            if self.env_path.exists():
-                env_file = str(self.env_path)
-                for key in env_keys:
-                    set_key(env_file, key, "")
-                print(f"✅ 凭证已清除: {self.env_path}")
+            try:
+                if self.env_path.exists():
+                    env_file = str(self.env_path)
+                    for key in env_keys:
+                        set_key(env_file, key, "")
+                    print(f"[OK] .env 凭证已清除: {self.env_path}")
+            except Exception as e:
+                print(f"[WARN] 无法清除 .env 文件 (Docker环境正常): {e}")
            
            return True
        except Exception as e:
--- a/utils/content_processor.py
+++ b/utils/content_processor.py
@@ -53,6 +53,11 @@ def process_article_content(html: str, proxy_base_url: str = None) -> Dict:
    # 5. Generate plain text
    plain_content = html_to_text(content)
    
+    # 6. Image-only articles: if there is no text but there are images, use a placeholder description
+    if not plain_content.strip() and images:
+        plain_content = f"[纯图片文章，共 {len(images)} 张图片]"
+        logger.info(f"检测到纯图片文章: {len(images)} 张图片，无文字内容")
+    
    return {
        'content': content,
        'plain_content': plain_content,
@@ -100,15 +105,22 @@ def extract_content(html: str) -> str:
    Extract article body, trying multiple container patterns.
    Different WeChat account types (government, media, personal) use
    different HTML structures. We try them in order of specificity.
-    For image-text messages (item_show_type=8) and short posts (item_show_type=10),
-    delegates to helpers.
+    For image-text messages (item_show_type=8), short posts (item_show_type=10),
+    and audio share pages (item_show_type=7), delegates to helpers.
    """
    from utils.helpers import (
        is_image_text_message, _extract_image_text_content,
        is_short_content_message, _extract_short_content,
        is_audio_message, _extract_audio_content,
+        get_item_show_type, _extract_audio_share_content,
    )

+    # Check for audio/video share pages (item_show_type=7) FIRST
+    # These pages use Vue apps and have no js_content div
+    if get_item_show_type(html) == '7':
+        result = _extract_audio_share_content(html)
+        return result.get('content', '')
+
    if is_image_text_message(html):
        result = _extract_image_text_content(html)
        return result.get('content', '')
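`extract_content` and `extract_article_info` both rely on `get_item_show_type`, whose body is not part of this diff. Based only on the marker documented in CONTENT_TYPES.md (`window.item_show_type = '7'`), a hypothetical detector plus the dispatch order used by `extract_article_info` (type 7, then 8, then 10, then strict audio detection, then the standard path) might look like this. The regexes, the function names `detect_item_show_type` / `looks_like_audio_article` / `choose_extractor` and the returned labels are assumptions.

```python
import re
from typing import Optional

# Assumed pattern: matches e.g. window.item_show_type = '7' or var item_show_type = "8"
_ITEM_SHOW_TYPE_RE = re.compile(r"""\bitem_show_type\s*=\s*['"]?(\d+)""")
# Strict audio checks: real tags/containers, not JS strings such as voice_encode_fileid
_MPAUDIO_TAG_RE = re.compile(r"<mp-common-mpaudio[^>]*>", re.IGNORECASE)
_AUDIO_DIV_RE = re.compile(r"""<div[^>]+id=["']js_editor_audio[^"']*["']""", re.IGNORECASE)


def detect_item_show_type(html: str) -> Optional[str]:
    """Return the item_show_type value as a string, or None if it is absent."""
    match = _ITEM_SHOW_TYPE_RE.search(html)
    return match.group(1) if match else None


def looks_like_audio_article(html: str) -> bool:
    return (
        "<mpvoice" in html
        or _MPAUDIO_TAG_RE.search(html) is not None
        or _AUDIO_DIV_RE.search(html) is not None
    )


def choose_extractor(html: str) -> str:
    """Pick an extraction path in the same order as extract_article_info."""
    item_type = detect_item_show_type(html)
    if item_type == "7":
        return "audio_share"     # _extract_audio_share_content
    if item_type == "8":
        return "image_text"      # _extract_image_text_content
    if item_type == "10":
        return "short_content"   # _extract_short_content
    if looks_like_audio_article(html):
        return "audio"           # _extract_audio_content
    return "standard"            # default container-based extraction
```

Checking type 7 first matters because share pages have no `js_content` container, so the container-based fallbacks would find nothing to extract.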
--- a/utils/helpers.py
+++ b/utils/helpers.py
@@ -82,11 +82,26 @@ def is_audio_message(html: str) -> bool:
    """
    Detect audio articles (voice messages embedded via mpvoice / mp-common-mpaudio).
    Returns True if the page contains an mpvoice tag or an audio player component.
+    
+    Important: Must check for ACTUAL audio tags, not just JS code that mentions audio.
    """
-    return ('voice_encode_fileid' in html or
-            '<mpvoice' in html or
-            'mp-common-mpaudio' in html or
-            'js_editor_audio' in html)
+    # Method 1: look for a real <mpvoice> tag (mpvoice is a custom tag)
+    if '<mpvoice' in html:
+        return True
+    
+    # Method 2: look for the audio player component's **HTML tags** (not JS code)
+    # Stricter regexes ensure we match tags rather than JS variables
+    import re
+    
+    # Match an actual audio tag: <mp-common-mpaudio ...>
+    if re.search(r'<mp-common-mpaudio[^>]*>', html, re.IGNORECASE):
+        return True
+    
+    # Match an actual audio container: <div id="js_editor_audio_...">
+    if re.search(r'<div[^>]+id=["\']js_editor_audio[^"\']*["\']', html, re.IGNORECASE):
+        return True
+    
+    return False


 def _extract_image_text_content(html: str) -> Dict:
@@ -346,12 +361,14 @@ def _extract_audio_content(html: str) -> Dict:
            dur_str = f' ({minutes}:{seconds:02d})'

        display_name = audio['name'] or f'Audio {i + 1}'
+        # Friendly notice: the audio requires WeChat authentication, so do not emit a URL that cannot be played
        html_parts.append(
-            f'<div style="margin:12px 0;padding:12px 16px;background:#f6f6f6;border-radius:8px">'
-            f'<p style="margin:0 0 4px;font-size:15px;font-weight:500">'
-            f'{html_module.escape(display_name)}{dur_str}</p>'
-            f'<a href="{audio["url"]}" style="color:#1890ff;font-size:14px">'
-            f'[Play Audio / Click to Listen]</a>'
+            f'<div style="margin:12px 0;padding:12px 16px;background:#fff9e6;'
+            f'border-left:4px solid #fa8c16;border-radius:4px">'
+            f'<p style="margin:0 0 4px;font-size:14px;color:#595959;font-weight:500">'
+            f'音频内容: {html_module.escape(display_name)}{dur_str}</p>'
+            f'<p style="margin:0;font-size:13px;color:#8c8c8c">'
+            f'此文章包含音频，需要在微信中查看完整内容</p>'
            f'</div>'
        )

@@ -372,6 +389,104 @@ def _extract_audio_content(html: str) -> Dict:
    }


+def _extract_audio_share_content(html: str) -> Dict:
+    """
+    Extract content from item_show_type=7 audio/video share pages.
+    
+    These pages use dynamic Vue applications (common_share_audio module),
+    so most content is loaded via JavaScript. We can only extract basic
+    metadata from the static HTML.
+    
+    Example: Podcast episodes, audio shows (e.g., 马刺进步报告)
+    """
+    import html as html_module
+    import re
+    
+    # Extract the title
+    title = ''
+    title_match = (
+        re.search(r'<meta\s+property="og:title"\s+content="([^"]+)"', html) or
+        re.search(r"window\.msg_title\s*=\s*window\.title\s*=\s*'([^']*)'", html)
+    )
+    if title_match:
+        title = html_module.unescape(title_match.group(1))
+    
+    # Extract the author
+    author = ''
+    author_match = (
+        re.search(r'<meta\s+property="og:article:author"\s+content="([^"]+)"', html) or
+        re.search(r'var\s+nickname\s*=\s*"([^"]+)"', html)
+    )
+    if author_match:
+        author = html_module.unescape(author_match.group(1))
+    
+    # Extract the cover image (if any)
+    images = []
+    og_image_match = re.search(r'<meta\s+property="og:image"\s+content="([^"]+)"', html)
+    if og_image_match:
+        img_url = og_image_match.group(1)
+        if img_url and ('mmbiz' in img_url or img_url.startswith('http')):
+            images.append(img_url)
+    
+    # Build the content
+    content_parts = []
+    
+    # Title (if any); escaped because unescape() above may have produced raw < or &
+    if title:
+        content_parts.append(
+            f'<div style="margin:20px 0;text-align:center">'
+            f'<h2 style="margin:0;font-size:22px;font-weight:600;color:#262626">{html_module.escape(title)}</h2>'
+            f'</div>'
+        )
+    
+    # Author (if any), escaped for the same reason
+    if author:
+        content_parts.append(
+            f'<div style="margin:12px 0;text-align:center">'
+            f'<p style="margin:0;font-size:14px;color:#8c8c8c">作者: {html_module.escape(author)}</p>'
+            f'</div>'
+        )
+    
+    # Cover image
+    if images:
+        for img_url in images:
+            content_parts.append(
+                f'<div style="text-align:center;margin:16px 0">'
+                f'<img src="{img_url}" data-src="{img_url}" '
+                f'style="max-width:100%;height:auto;border-radius:8px" />'
+                f'</div>'
+            )
+    
+    # Audio placeholder (bilingual Chinese/English, suited to RSS readers)
+    content_parts.append(
+        '<div style="background:#f6f6f6;padding:20px;border-radius:8px;'
+        'text-align:center;margin:20px 0;border:2px dashed #d9d9d9">'
+        '<p style="margin:0;font-size:18px;color:#333">🎵 音频内容 / Audio Content</p>'
+        '<p style="margin:12px 0;font-size:14px;color:#666;line-height:1.6">'
+        '这是微信音频分享文章，内容通过JavaScript动态加载，无法直接提取。<br>'
+        'This is a WeChat audio share article. Content is loaded dynamically via JavaScript.</p>'
+        '<p style="margin:8px 0;font-size:13px;color:#999">'
+        '请在微信中查看完整内容 / Please view in WeChat app</p>'
+        '</div>'
+    )
+    
+    content = '\n'.join(content_parts)
+    
+    # Plain text
+    plain_content = f"[音频分享文章 / Audio Share Article]\n\n"
+    if title:
+        plain_content += f"标题 / Title: {title}\n"
+    if author:
+        plain_content += f"作者 / Author: {author}\n"
+    plain_content += "\n(此音频内容无法直接提取，请在微信中查看)"
+    plain_content += "\n(Audio content cannot be extracted directly, please view in WeChat)"
+    
+    return {
+        'content': content,
+        'plain_content': plain_content,
+        'images': images,
+    }
+
+
 def extract_article_info(html: str, params: Optional[Dict] = None) -> Dict:
    """
    从HTML中提取文章信息
@@ -427,18 +542,29 @@ def extract_article_info(html: str, params: Optional[Dict] = None) -> Dict:
        except (ValueError, TypeError):
            pass

-    # Detect special content types
-    if is_image_text_message(html):
+    # Handle special types first (based on item_show_type)
+    item_type = get_item_show_type(html)
+    
+    if item_type == '7':
+        # item_show_type=7: audio/video share page (dynamic Vue app)
+        audio_share_data = _extract_audio_share_content(html)
+        content = audio_share_data['content']
+        images = audio_share_data['images']
+        plain_content = audio_share_data['plain_content']
+    elif item_type == '8' or is_image_text_message(html):
+        # item_show_type=8: image-text post
        img_text_data = _extract_image_text_content(html)
        content = img_text_data['content']
        images = img_text_data['images']
        plain_content = img_text_data['plain_content']
-    elif is_short_content_message(html):
+    elif item_type == '10' or is_short_content_message(html):
+        # item_show_type=10: short post / reposted message
        short_data = _extract_short_content(html)
        content = short_data['content']
        images = short_data['images']
        plain_content = short_data['plain_content']
    elif is_audio_message(html):
+        # Audio article (mpvoice / mp-common-mpaudio)
        audio_data = _extract_audio_content(html)
        content = audio_data['content']
        images = audio_data['images']
@@ -520,6 +646,12 @@ def has_article_content(html: str) -> bool:
        return True
    if is_image_text_message(html) or is_short_content_message(html) or is_audio_message(html):
        return True
+    
+    # item_show_type=7: Audio/video share pages (dynamic Vue app)
+    # These pages have no traditional content container, but are valid articles
+    if get_item_show_type(html) == '7':
+        return True
+    
    return False


@@ -554,21 +686,77 @@ def get_unavailable_reason(html: str) -> Optional[str]:
    """
    Return human-readable reason if article is permanently unavailable, else None.
    Returns None when the article is available or the failure is retryable (e.g. a verification page).
+    
+    Important: Must distinguish between:
+    1. Verification pages (environment error) - NOT unavailable, should retry
+    2. "暂时无法查看" standalone page - IS unavailable (HTML < 2,000 chars, notice in <title>)
+    3. Privacy/payment pages (empty Vue app, empty <title>) - IS unavailable
+    4. Truly unavailable articles (deleted/censored) - permanently unavailable
    """
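+    # Exclude first: WeChat verification pages (IP risk control, not an unavailable article)
+    # Signature: "环境异常", "完成验证后即可继续访问" and "去验证" are all present; such pages
+    # are typically large (>1.5MB), but size is not checked here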
+    verification_markers = ["环境异常", "完成验证后即可继续访问", "去验证"]
+    if all(marker in html for marker in verification_markers):
+        return None
+    
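+    # Genuine unavailability markers (explicit text in the static HTML)
+    # Note: normal WeChat article HTML may contain strings such as "已删除"/"违规" in JS code,
+    # so make sure a keyword appears in the actual content rather than in a JS string literal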
    markers = [
        ("该内容已被发布者删除", "已被发布者删除"),
        ("内容已删除", "已被发布者删除"),
        ("此内容因违规无法查看", "因违规无法查看"),
        ("涉嫌违反相关法律法规和政策", "涉嫌违规被限制"),
        ("此内容发送失败无法查看", "发送失败无法查看"),
-        ("该内容暂时无法查看", "暂时无法查看"),
        ("根据作者隐私设置，无法查看该内容", "作者隐私设置不可见"),
        ("接相关投诉，此内容违反", "因投诉违规被限制"),
        ("该文章已被第三方辟谣", "已被第三方辟谣"),
    ]
    for keyword, reason in markers:
        if keyword in html:
+            # Extra check: if the HTML is large (>1MB) and has a real content container,
+            # it is a normal article, and "已删除"/"违规" may just be strings in JS code
+            if len(html) > 1000000:
+                has_real_content = (
+                    'id="js_content"' in html or
+                    'class="rich_media_content' in html
+                )
+                if has_real_content:
+                    # Confirm: look for the keyword in the <body> text before the first <script>,
+                    # searching only the first 50,000 characters; if it only appears later, skip it
+                    import re
+                    body_match = re.search(r'<body[^>]*>(.*?)(?:<script|$)', html[:50000], re.DOTALL | re.IGNORECASE)
+                    if body_match and keyword not in body_match.group(1):
+                        # Keyword not in the leading body text: probably JS code, so skip this marker
+                        continue
            return reason
+    
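+    # Special case: the standalone "该内容暂时无法查看" page
+    # Signature: very small HTML (<2,000 chars) + <title> contains this text = standalone error page
+    # Both conditions are required so a normal article quoting this sentence is not misclassified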
+    if "该内容暂时无法查看" in html and len(html) < 2000:
+        import re
+        title_match = re.search(r'<title>(.*?)</title>', html, re.IGNORECASE)
+        if title_match and "该内容暂时无法查看" in title_match.group(1):
+            return "暂时无法查看"
+    
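+    # Special case: empty Vue app (dynamic error page for privacy settings)
+    # Signature: <div id="app"> present + no article content container + empty <title> + HTML not too large (<200KB)
+    # The error message on such pages is loaded by JS and is not visible in the static HTML
+    # What users actually see: "根据作者隐私设置，无法查看该内容"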
+    if '<div id="app">' in html and len(html) < 200000:
+        import re
+        # Check whether the page has an actual article content container
+        has_content_container = (
+            'id="js_content"' in html or
+            'class="rich_media_content' in html or
+            'class="rich_media_area_primary_inner' in html
+        )
+        # No content container and an empty title: this is a privacy-restricted page
+        title_match = re.search(r'<title>(.*?)</title>', html, re.IGNORECASE | re.DOTALL)
+        if not has_content_container and title_match and not title_match.group(1).strip():
+            return "根据作者隐私设置不可查看"
+    
    return None
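The large-page guard inside the marker loop of the unavailability check (named `get_unavailable_reason()` in the commit log below) can be isolated into a small, testable predicate. This is a minimal sketch, not the project's code: `should_skip_marker` and the constant names are hypothetical, and it assumes the same thresholds as the diff (more than 1,000,000 characters counts as "large", and only the first 50,000 characters are scanned) and the same two content-container strings.

```python
import re

# Thresholds copied from the diff: len(html) > 1000000 and html[:50000].
LARGE_PAGE_CHARS = 1_000_000
BODY_SCAN_CHARS = 50_000
CONTENT_CONTAINERS = ('id="js_content"', 'class="rich_media_content')
# Text between <body ...> and the first <script> (or the end of the scanned window).
_BODY_PREFIX_RE = re.compile(r"<body[^>]*>(.*?)(?:<script|$)", re.DOTALL | re.IGNORECASE)


def should_skip_marker(html: str, keyword: str) -> bool:
    """True if `keyword` is probably only a JS string inside a normal article page."""
    if len(html) <= LARGE_PAGE_CHARS:
        return False  # small pages: trust the keyword
    if not any(marker in html for marker in CONTENT_CONTAINERS):
        return False  # no article container: trust the keyword
    body = _BODY_PREFIX_RE.search(html[:BODY_SCAN_CHARS])
    # Skip only when a body prefix was found and the keyword is absent from it,
    # mirroring `if body_match and keyword not in body_match.group(1): continue`.
    return body is not None and keyword not in body.group(1)
```

Pulling the guard out of the loop lets it be unit-tested with synthetic pages instead of real 1 MB+ WeChat HTML.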


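The two special cases at the end of the unavailability check (the tiny "该内容暂时无法查看" page and the empty Vue shell) both depend on reading the page `<title>`. The sketch below factors that into a helper; `page_title`, `classify_error_page` and the constant names are hypothetical. It deliberately deviates from the diff in one respect: the title regex uses `re.DOTALL`, so a title containing only a line break still counts as empty, whereas `<title>(.*?)</title>` without `DOTALL` does not match it at all.

```python
import re
from typing import Optional

# Same literal the diff matches on ("this content is temporarily unavailable").
TEMP_UNAVAILABLE = "该内容暂时无法查看"
CONTENT_CONTAINERS = (
    'id="js_content"',
    'class="rich_media_content',
    'class="rich_media_area_primary_inner',
)
# DOTALL lets the title span line breaks, e.g. "<title>\n</title>".
_TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def page_title(html: str) -> Optional[str]:
    """Return the stripped <title> text, or None if the page has no <title>."""
    match = _TITLE_RE.search(html)
    return match.group(1).strip() if match else None


def classify_error_page(html: str) -> Optional[str]:
    """Classify the two standalone error pages; return None for anything else."""
    title = page_title(html)
    # Tiny page whose <title> is the "temporarily unavailable" message.
    if len(html) < 2000 and title is not None and TEMP_UNAVAILABLE in title:
        return "暂时无法查看"
    # Empty Vue shell: JS mount point, no article container, blank title, < 200,000 chars.
    if '<div id="app">' in html and len(html) < 200_000:
        has_container = any(marker in html for marker in CONTENT_CONTAINERS)
        if not has_container and title == "":
            return "根据作者隐私设置不可查看"
    return None
```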
--- a/utils/login_reminder.py
+++ b/utils/login_reminder.py
@ -0,0 +1,150 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+# Copyright (C) 2026 tmwgsicp
+# Licensed under the GNU Affero General Public License v3.0
+# See LICENSE file in the project root for full license text.
+# SPDX-License-Identifier: AGPL-3.0-only
+"""
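+Login expiry reminder (open-source edition).
+Periodically checks whether the local WeChat login credentials are about to expire and sends webhook notifications in advance.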
+"""
+
+import asyncio
+import logging
+import time
+from typing import Optional
+from utils.webhook import webhook
+
+logger = logging.getLogger(__name__)
+
+
+class LoginReminder:
+    """Login expiry reminder manager (open-source edition, single-account architecture)."""
+    
+    def __init__(self):
+        self.check_interval = 6 * 3600  # check every 6 hours
+        self.warning_threshold = 24 * 3600  # warn 24 hours before expiry
+        self.critical_threshold = 6 * 3600  # critical warning 6 hours before expiry
+        self._running = False
+        self._task: Optional[asyncio.Task] = None
+        self._last_warning_level = None  # last warning level sent, to avoid duplicate notifications
+
+    async def start(self):
+        """Start the reminder service."""
+        if self._running:
+            logger.warning("登录提醒服务已在运行")
+            return
+        
+        self._running = True
+        self._task = asyncio.create_task(self._run())
+        logger.info("登录提醒服务已启动，检查间隔: %d 秒", self.check_interval)
+
+    async def stop(self):
+        """Stop the reminder service."""
+        self._running = False
+        if self._task:
+            self._task.cancel()
+            try:
+                await self._task
+            except asyncio.CancelledError:
+                pass
+        logger.info("登录提醒服务已停止")
+
+    async def _run(self):
+        """Background task loop."""
+        while self._running:
+            try:
+                await self._check_login_status()
+            except Exception as e:
+                logger.error("检查登录状态失败: %s", e, exc_info=True)
+            
+            await asyncio.sleep(self.check_interval)
+
+    async def _check_login_status(self):
+        """Check the expiry status of the local login credentials."""
+        from utils.auth_manager import auth_manager
+        
+        # Load the credential info
+        creds = auth_manager.get_credentials()
+        if not creds or not creds.get("token"):
+            logger.debug("无登录凭证，跳过检查")
+            return
+        
+        expire_time = creds.get("expire_time", 0)
+        if expire_time <= 0:
+            logger.debug("凭证无过期时间，跳过检查")
+            return
+        
+        nickname = creds.get("nickname", "未知账号")
+        now = int(time.time() * 1000)  # millisecond timestamp, same unit as expire_time
+        time_left_ms = expire_time - now
+        time_left_sec = time_left_ms / 1000
+        
+        # Already expired
+        if time_left_sec <= 0:
+            if self._last_warning_level != 'expired':
+                await self._notify_expired(nickname)
+                self._last_warning_level = 'expired'
+            return
+        
+        # Critical warning (expires within 6 hours)
+        if time_left_sec <= self.critical_threshold:
+            if self._last_warning_level not in ['critical', 'expired']:
+                await self._notify_critical(nickname, time_left_sec)
+                self._last_warning_level = 'critical'
+            return
+        
+        # Regular warning (expires within 24 hours)
+        if time_left_sec <= self.warning_threshold:
+            if self._last_warning_level not in ['warning', 'critical', 'expired']:
+                await self._notify_warning(nickname, time_left_sec)
+                self._last_warning_level = 'warning'
+            return
+        
+        # Status is normal again: reset the warning level
+        if self._last_warning_level is not None:
+            self._last_warning_level = None
+            logger.info("登录状态已恢复正常: %s", nickname)
+
+    async def _notify_warning(self, nickname: str, time_left: float):
+        """Send a regular warning notification."""
+        hours = time_left / 3600
+        logger.warning(
+            "登录凭证即将过期 [%s] - 剩余 %.1f 小时",
+            nickname, hours
+        )
+        
+        await webhook.notify('login_expiring_soon', {
+            'nickname': nickname,
+            'hours_left': round(hours, 1),
+            'level': 'warning',
+            'message': f'登录凭证将在 {round(hours, 1)} 小时后过期，请及时重新登录',
+        })
+
+    async def _notify_critical(self, nickname: str, time_left: float):
+        """Send a critical warning notification."""
+        hours = time_left / 3600
+        logger.error(
+            "登录凭证即将过期（紧急）[%s] - 剩余 %.1f 小时",
+            nickname, hours
+        )
+        
+        await webhook.notify('login_expiring_critical', {
+            'nickname': nickname,
+            'hours_left': round(hours, 1),
+            'level': 'critical',
+            'message': f'登录凭证将在 {round(hours, 1)} 小时后过期（紧急），请立即重新登录',
+        })
+
+    async def _notify_expired(self, nickname: str):
+        """Send an expiry notification."""
+        logger.error("登录凭证已过期 [%s]", nickname)
+        
+        await webhook.notify('login_expired', {
+            'nickname': nickname,
+            'message': '登录凭证已过期，API 功能将受限，请重新登录',
+        })
+
+
+# Global singleton
+login_reminder = LoginReminder()
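The level logic in `_check_login_status()` can be expressed as two pure functions, which makes the thresholds and the deduplication easy to unit-test without an event loop, real credentials, or a webhook. This is a sketch under assumptions: `classify_expiry`, `should_notify` and the `LEVELS` ordering are hypothetical names, and, like the module above, it assumes `expire_time` is a Unix timestamp in milliseconds.

```python
import time
from typing import Optional

WARNING_THRESHOLD_S = 24 * 3600   # warn 24 hours ahead (same as LoginReminder)
CRITICAL_THRESHOLD_S = 6 * 3600   # critical warning 6 hours ahead
# Severity order: a larger index means a more severe state; None means "normal".
LEVELS = [None, "warning", "critical", "expired"]


def classify_expiry(expire_time_ms: int, now_ms: Optional[int] = None) -> Optional[str]:
    """Map an expiry timestamp (ms) to None, 'warning', 'critical' or 'expired'."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    seconds_left = (expire_time_ms - now_ms) / 1000
    if seconds_left <= 0:
        return "expired"
    if seconds_left <= CRITICAL_THRESHOLD_S:
        return "critical"
    if seconds_left <= WARNING_THRESHOLD_S:
        return "warning"
    return None


def should_notify(new_level: Optional[str], last_level: Optional[str]) -> bool:
    """Notify only when the state escalates past the last level already reported."""
    if new_level is None:
        return False
    return LEVELS.index(new_level) > LEVELS.index(last_level)
```

`should_notify` is equivalent to the `not in [...]` checks in the class: an alert fires only when the level becomes strictly more severe than the last one sent. The caller stores the new level after notifying and resets it to `None` once `classify_expiry` returns `None`, which re-arms all alerts after a fresh login.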
--- a/utils/webhook.py
+++ b/utils/webhook.py
@ -21,6 +21,8 @@ logger = logging.getLogger("webhook")
 EVENT_LABELS = {
    "login_success": "登录成功",
    "login_expired": "登录过期",
+    "login_expiring_soon": "登录即将过期",
+    "login_expiring_critical": "登录即将过期（紧急）",
    "verification_required": "触发验证",
    "content_fetch_failed": "文章内容获取失败",
 }
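`LoginReminder.start()`/`stop()` follow a common asyncio pattern: a background task that loops check-then-sleep and is shut down by cancelling it and awaiting the cancellation. Below is a stripped-down, generic sketch of that pattern, using a hypothetical `PeriodicTask` class with an injected `check` coroutine in place of `auth_manager`/`webhook`, so it runs on its own. Unlike the module above, it relies on cancellation alone rather than a separate `_running` flag. The `except Exception` does not swallow the cancellation because `asyncio.CancelledError` derives from `BaseException` on Python 3.8+.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional

logger = logging.getLogger(__name__)


class PeriodicTask:
    """Run an async `check` every `interval` seconds until stopped (hypothetical helper)."""

    def __init__(self, check: Callable[[], Awaitable[None]], interval: float) -> None:
        self._check = check
        self._interval = interval
        self._task: Optional[asyncio.Task] = None

    def start(self) -> None:
        """Schedule the loop; requires a running event loop (asyncio.create_task)."""
        if self._task is None or self._task.done():
            self._task = asyncio.create_task(self._run())

    async def stop(self) -> None:
        """Cancel the loop and wait until it has actually finished."""
        if self._task is None:
            return
        self._task.cancel()
        try:
            await self._task
        except asyncio.CancelledError:
            pass  # expected: we cancelled it ourselves
        self._task = None

    async def _run(self) -> None:
        while True:
            try:
                await self._check()
            except Exception:
                # A failing check is logged and retried on the next tick;
                # CancelledError is a BaseException, so cancellation still propagates.
                logger.exception("periodic check failed")
            await asyncio.sleep(self._interval)
```

`start()` must be called while an event loop is running, because `asyncio.create_task()` raises `RuntimeError` otherwise.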
Author	SHA1	Message	Date
tmwgsicp	829ae4d0c0	feat: add proactive login expiry monitoring Add login_reminder module to check credential expiration every 6 hours and send webhook alerts (24h/6h before expiry). Fix expire_time from 30 days to 4 days to align with WeChat actual validity. Made-with: Cursor	2026-03-31 14:26:03 +08:00
tmwgsicp	8d90743584	fix(docker): resolve proxy pool configuration not loading in Docker deployment Problem: Docker uses 'uvicorn app:app' command which skips the if __name__ == '__main__' block, causing load_dotenv() never executed and PROXY_URLS from .env not loaded. Solution: Move load_dotenv() to module level in app.py to ensure .env is loaded for all startup methods (python app.py, uvicorn app:app, docker-compose). Changes: - Add module-level load_dotenv() in app.py - Update Dockerfile version 1.0.4 -> 1.0.5 - Improve audio content display UI - Add docs/ and scripts/ to .gitignore Made-with: Cursor	2026-03-29 20:34:08 +08:00
tmwgsicp	9cfa0ac5b1	fix(docker): resolve credentials save permission issue on NAS platforms Made-with: Cursor	2026-03-25 10:08:42 +08:00
tmwgsicp	ad62e8b8bb	fix: use SITE_URL and X-Forwarded headers for RSS URLs in reverse proxy - Add get_base_url() helper function to detect HTTPS reverse proxy - Prioritize SITE_URL env var over request.base_url - Support X-Forwarded-Proto and X-Forwarded-Host headers - Fixes RSS URL showing http:// instead of https:// behind reverse proxy Fixes #6 Made-with: Cursor	2026-03-24 23:38:44 +08:00
tmwgsicp	f9968a4e0d	fix: run container as root to resolve volume permission issue - Remove non-root user (appuser) to eliminate SQLite database creation failures - Users no longer need to manually create data directory or fix permissions - Version bump to 1.0.3 Fixes #5 Made-with: Cursor	2026-03-24 20:02:45 +08:00
tmwgsicp	752f555f0c	feat: support audio share articles (item_show_type=7) and improve content detection Major Updates: - Add support for item_show_type=7 audio/video share pages with basic metadata extraction - Enhance is_audio_message() with strict regex to avoid false positives - Improve get_unavailable_reason() to correctly distinguish verification pages - Add friendly placeholder text for pure-image articles - Add comprehensive CONTENT_TYPES.md documentation Technical Improvements: - Fix pure-image articles being misidentified as audio articles - Fix WeChat verification pages being marked as permanently unavailable - Fix empty Vue app (privacy page) detection logic - Optimize article type detection priority based on item_show_type Documentation: - Add CONTENT_TYPES.md with detailed explanations of all content types - Update README.md to reference new documentation - Document known limitations for audio articles Note: Audio articles (type=7) use dynamic Vue apps, so only basic metadata (title, author, cover image) can be extracted. Full audio URL extraction would require browser environment. Made-with: Cursor	2026-03-24 13:45:37 +08:00