Compare commits

...

No commits in common. "11ca324d6b88af2e61a8a7b251b795a75100c512" and "677838c29ed421406dcf09537d9457ea09fee029" have entirely different histories.

31 changed files with 4967 additions and 1449 deletions


@ -1,9 +1,16 @@
# LLM API configuration (used to extract structured meeting information)
# LLM API
LLM_API_KEY=sk-your-api-key
LLM_BASE_URL=https://api.deepseek.com/v1
LLM_MODEL=deepseek-chat
# Embedding configuration (used for the LlamaIndex vector store)
# Embedding API
EMBEDDING_API_KEY=sk-your-embedding-key
EMBEDDING_BASE_URL=https://api.openai.com/v1
EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_MODEL=text-embedding-3-small
# Neo4j
NEO4J_ENABLED=false
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=
NEO4J_DATABASE=neo4j
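As a rough illustration of how the `NEO4J_*` variables above might be read at startup, here is a minimal sketch. `Neo4jSettings` and `load_neo4j_settings` are hypothetical names and are not taken from the project's `config.py`; the defaults simply mirror the template values, and the accepted spellings of "true" are an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Neo4jSettings:
    """Hypothetical container for the NEO4J_* template values."""
    enabled: bool
    uri: str
    user: str
    password: str
    database: str


def load_neo4j_settings(env: Optional[Mapping[str, str]] = None) -> Neo4jSettings:
    """Read Neo4j settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    # Assumption: any of these spellings switches the graph store on.
    enabled = env.get("NEO4J_ENABLED", "false").strip().lower() in {"1", "true", "yes", "on"}
    return Neo4jSettings(
        enabled=enabled,
        uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
        user=env.get("NEO4J_USER", "neo4j"),
        password=env.get("NEO4J_PASSWORD", ""),
        database=env.get("NEO4J_DATABASE", "neo4j"),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to exercise without touching the real environment.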

3
.gitignore vendored

@ -1,5 +1,4 @@
__pycache__/
obsidian_vault/
vector_store_data/
.env
.vscode
.vscode

43
MIGRATION_TASKS.md 100644

@ -0,0 +1,43 @@
# Migration Tasks
## Goal
Align this project toward `D:\github_project\graphiti` while keeping the meeting-processing flow usable and making the codebase easier to maintain.
Target direction:
- Neo4j is the only persistence layer for graph and retrieval data
- Retrieval is hybrid: semantic similarity + keyword/fact recall + graph relationship context
- Storage is more provenance-friendly, closer to `Meeting / Episode / Entity / Fact`
- Core implementation lives in package modules instead of the repository root
## In Progress
- [ ] No active migration tasks
## Todo
- [ ] Clean up any stale data directories only after explicit user confirmation
## Done
- [x] Step 1: Extract a shared embedding utility and stop coupling semantic retrieval to the old vector-store implementation
- [x] Step 2.1: Create a package structure and move shared foundations out of the repository root
- [x] Step 2.2: Move extraction, raw storage, and state tracking into package modules
- [x] Step 2.3: Move graph storage, processing, and CLI into package modules
- [x] Step 3: Redesign Neo4j schema from simple `Meeting -> Entity -> RELATES_TO` into `Meeting / Episode / Entity / Fact`
- [x] Step 4: Store semantic retrieval payload inside Neo4j instead of external vector storage
- [x] Step 5: Replace current query path with hybrid retrieval over Neo4j candidates
- [x] Step 6: Replace duplicate detection to use Neo4j-backed semantic matching and exact meeting lookup
- [x] Step 7: Remove runtime dependency on `llama-index` and `chroma`
- [x] Step 8: Update CLI stats output to reflect hybrid retrieval structures such as episodes and facts
- [x] Step 9: Update README and environment instructions to match the new architecture
- [x] Step 10: Run end-to-end verification on `process`, `query`, and `stats` with a real Neo4j environment
- [x] Remove Obsidian from the project documentation and dependency surface
- [x] Remove Obsidian from the runtime processing pipeline
- [x] Move raw meeting archival to `data/raw`
- [x] Move meeting state storage to `data/meeting_state.json`
- [x] Introduce Neo4j configuration and a minimal graph storage layer
- [x] Write extracted meeting entities and relations into Neo4j
- [x] Add graph statistics to the CLI status output
- [x] Redesign retrieval to combine vector recall with graph facts

279
README.md

@ -1,203 +1,130 @@
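# Meeting Minutes Long-Term Memory System
A long-term memory management system for meeting minutes, built on an LLM, a LlamaIndex vector store, and an Obsidian knowledge graph, with support for **action-item status tracking** and **two-stage content deduplication**.
A long-term memory prototype for meeting minutes. The architecture has moved from "root-level scripts plus external vector storage" to a clearer package structure, and now converges on:
## Workflow
- `Neo4j` as the only store for graph and retrieval data
- Hybrid retrieval combining `Embedding + keywords + graph facts`
- A data organization closer to the `Meeting / Episode / Entity / Fact` model used by `graphiti`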
```
meeting_minutes.md ──→ ① content-hash dedup ──→ ② semantic-similarity dedup ──→ LLM structured extraction ──→ state merge
│ │ │
│ │ ┌─────┘
│ │ │ meeting_state.json
│ │ │ (action items / metric history / content hashes)
│ │ └─────┐
│ │ │
│ ├──→ ③ title + date dedup ──┼──→ Obsidian Vault
│ │ │ ├── Raw/
│ │ │ ├── Meetings/
│ │ │ ├── Entities/
│ │ │ └── Graphs/
│ │ │
│ │ └──→ vector index persistence
│ │
└── hit → skip ────┘── hit → [s] skip / [o] overwrite
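## Current Capabilities
- Structured extraction from meeting text
- Raw text archived to `data/raw`
- Cross-meeting merging of action-item and metric state
- Duplicate detection based on content hash and semantic similarity
- Graph writes backed by `Neo4j`
- Hybrid retrieval backed by `Neo4j`
## Processing Flow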
```text
meeting.md
  -> content-hash dedup
  -> Neo4j semantic-similarity dedup
  -> LLM structured extraction
  -> raw text archival
  -> action-item / metric state merge
  -> write to Neo4j:
       Meeting
       Episode
       Entity
       Fact
```
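The first step of the flow, content-hash deduplication, can be sketched as follows. This is a minimal illustration that assumes the raw text is hashed as UTF-8 with SHA-256 and that the registry is a plain dictionary shaped like the `content_hashes` section of `meeting_state.json`; the function names are hypothetical and any text normalization the project applies before hashing is not shown.

```python
import hashlib
from typing import Dict, Optional


def content_fingerprint(text: str) -> str:
    """Return the SHA-256 hex digest of the raw meeting text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_exact_duplicate(text: str, registry: Dict[str, dict]) -> Optional[dict]:
    """Return the registered meeting record if this exact text was seen before."""
    return registry.get(content_fingerprint(text))


def register_content(text: str, registry: Dict[str, dict],
                     title: str, date: str, filename: str) -> str:
    """Record the fingerprint so later runs can skip the same text."""
    digest = content_fingerprint(text)
    registry[digest] = {"title": title, "date": date, "filename": filename}
    return digest
```

Because this check runs before the LLM call, an exact repeat costs one hash and one dictionary lookup rather than an API request.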
## Quick Start
## Project Structure
```bash
cd meeting_memory
# 1. Install dependencies
python -m venv .venv
.venv\Scripts\pip install -r requirements.txt
# 2. Configure the API
cp .env.example .env
# Edit .env and fill in your LLM and Embedding API details
# 3. Process a meeting minutes file
.venv\Scripts\python main.py process meeting_file.md
# 4. Open obsidian_vault/ in Obsidian to view the knowledge graph
```
## Usage
### Interactive mode (recommended)
```bash
.venv\Scripts\python main.py
```
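After starting, you can type questions directly. The following commands are supported:
| Command | Description |
|------|------|
| `query <question>` | Semantic query over meeting memory |
| `process <file path>` | Process a new meeting file |
| `stats` | Show statistics |
| `exit/quit` | Exit |
Any text that is not a command is treated as a query.
### Command-line mode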
```bash
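# Process a meeting file (prompts for skip/overwrite when a duplicate is found)
python main.py process meeting_example.md
# Force overwrite (no prompt; cleans up old data, then reprocesses)
python main.py process meeting_example.md -f
# Semantic query
python main.py query "弱光指标目标值是多少?"
# Show statistics
python main.py stats
# Pass text directly
python main.py text "今天会议讨论了..."
# Batch processing (prompts interactively; adding -f to skip confirmation is recommended)
python main.py batch "meetings/*.md" -f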
```
## Architecture
```
```text
meeting_memory/
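├── config.py Configuration (LLM / Embedding / Obsidian / vector store / state path)
├── extractor.py LLM extraction of structured information from meeting minutes
│ ├── title, date, participants
│ ├── entities (people / organizations / metrics / concepts)
│ ├── relations (subject-predicate-object)
│ ├── action_items (task + assignee + deadline)
│ ├── metrics (metric + value + trend)
│ └── decisions (decision records)
├── meeting_state.py ★ Cross-meeting state tracking engine
│ ├── ActionItem: matched by a hash of task+assignee
│ ├── Metric: matched by a hash of metric_name+owner
│ ├── History records (timeline)
│ ├── Automatic meeting-series detection (strips the issue-number suffix)
│ └── ★ Content hash registry (content_hashes) to prevent duplicates
├── vector_store.py LlamaIndex vector index management
│ ├── Custom Embedding adapter (works with any OpenAI-compatible API)
│ ├── Vectorized storage of meeting documents (including evolution info)
│ ├── Semantic retrieval (similarity_top_k)
│ ├── ★ Duplicate check + delete-and-overwrite by meeting_id
│ └── ★ Semantic-similarity duplicate check on raw text (find_similar_text)
├── obsidian_manager.py Obsidian Vault generator
│ ├── Raw/: unprocessed raw text (status: unprocessed/processed)
│ ├── Meetings/: full meeting notes + YAML frontmatter
│ ├── Entities/: entity notes (with action-item timeline)
│ └── Graphs/: knowledge graph overview (MOC)
├── meeting_processor.py Main pipeline orchestration
│ ├─ content-hash dedup → semantic-similarity dedup → LLM extraction → state merge → Obsidian → vector store
│ ├─ ★ Dedup runs before the LLM call to avoid wasted API calls
│ └─ ★ skip/overwrite choice when a duplicate is reprocessed
├── main.py CLI entry point (interactive mode + subcommands; supports -f force overwrite)
├── requirements.txt Dependencies
├── .env Secrets configuration
├── meeting_state.json ★ Persisted cross-meeting state (action-item/metric history, content hash registry)
├── vector_store_data/ Vector index persistence directory
└── obsidian_vault/ Obsidian knowledge base (can be opened directly in Obsidian)
├── .obsidian/ Obsidian configuration (app.json, core-plugins.json)
├── Raw/ ★ Unprocessed raw text (saved before processing)
├── Meetings/ Meeting notes *.md
├── Entities/ Entity notes *.md (with history timeline)
└── Graphs/ Knowledge graph overview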
├── meeting_memory/
│ ├── __init__.py
│ ├── cli.py
│ ├── config.py
│ ├── extractor.py
│ ├── graph_store.py
│ ├── meeting_processor.py
│ ├── meeting_state.py
│ ├── raw_store.py
│ └── services/
│ ├── __init__.py
│ └── embedding_service.py
├── data/
│ ├── raw/
│ └── meeting_state.json
├── main.py
├── MIGRATION_TASKS.md
└── requirements.txt
```
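## Core Capabilities
Notes:
### 1. LLM Structured Extraction
- The `meeting_memory/` package directory holds the current implementation
- The repository root now keeps only `main.py` as the CLI entry point; all other implementation code lives in the package directory
- `vector_store.py` has been removed; retrieval has moved into the `Neo4j` graph structure
Given raw meeting minutes, the system automatically extracts:
## Environment Configuration
- **Meeting metadata**: title, date, participants
- **Entities**: people, departments, projects, KPI metrics, concepts and policies
- **Relations**: subject-predicate-object (e.g. `建维部 → 负责 → 网络运维`)
- **Action items**: task description + assignee + deadline + priority
- **Metrics**: metric name + current value + target value + trend (improving / flat / worsening)
- **Decisions**: decision content + proposer + status
Copy the environment variable template:
### 2. LlamaIndex Vector Retrieval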
```bash
copy .env.example .env
```
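- Meeting content is vectorized and stored
- Supports natural-language semantic queries
- The index is persisted and loaded automatically on restart
- Works with any OpenAI-compatible Embedding API
### 3. Cross-Meeting Action-Item Tracking
- Each action item gets a stable hash ID derived from `task + assignee`
- Identical tasks within the same meeting series are matched automatically (the "第X期" issue-number suffix is stripped to identify the series)
- Status change history is fully preserved: `待办 → 进行中 → 已完成` (to do → in progress → done)
- Obsidian notes show the full timeline
- `meeting_state.json` persists all history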
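The stable ID and series matching described in this list could look roughly like the sketch below. The hash function (SHA-256), the `|` separator, the 8-character truncation, and the suffix regex are all assumptions chosen for illustration; the project's `meeting_state.py` may derive IDs differently. The function names are hypothetical.

```python
import hashlib
import re

# Assumed pattern: an optional 4-digit year followed by an issue suffix such as "第3期" or "第X期".
_ISSUE_SUFFIX = re.compile(r"\s*(?:\d{4}\s*年?)?\s*第\s*[0-9A-Za-z一二三四五六七八九十百]+\s*期\s*$")


def stable_item_id(task: str, assignee: str, length: int = 8) -> str:
    """Derive a short, repeatable ID from the task text and its assignee."""
    key = f"{task.strip()}|{assignee.strip()}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:length]


def series_name(title: str) -> str:
    """Strip the issue-number suffix so recurring meetings share one series key."""
    return _ISSUE_SUFFIX.sub("", title).strip()
```

With this shape, the same task assigned to the same owner in two issues of one series maps to one ID, so a later meeting can append to the existing history instead of creating a new item.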
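### 4. Two-Stage Content Deduplication
Two deduplication checks run before the LLM call, so repeated content does not pollute the memory store:
- **① Content hash fingerprint**: exact match on SHA256(raw text); blocks byte-identical files or text (fast and fully deterministic)
- **② Semantic similarity**: triggered when the cosine similarity between raw-text embeddings exceeds 0.92; catches different transcriptions of the same meeting
- **③ Title + date check** (fallback): after LLM extraction, looks up meetings with the same title/date in the vector store
- On a hit, the user is prompted: **[s] skip** or **[o] overwrite**
- Overwrite mode: deletes the old vector nodes, old Obsidian notes, and old hash registration, then reprocesses
- The `-f / --force` flag skips all confirmations and is intended for batch processing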
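The semantic check in step ② can be sketched with plain cosine similarity. The 0.92 threshold comes from the list above; the function names and the use of in-memory Python lists are assumptions for illustration, since the real vectors come from the configured Embedding API and are stored elsewhere.

```python
import math
from typing import Iterable, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_semantic_duplicate(new_vec: Sequence[float],
                          existing_vecs: Iterable[Sequence[float]],
                          threshold: float = 0.92) -> bool:
    """True when any stored embedding is more similar than the threshold."""
    return any(cosine_similarity(new_vec, v) > threshold for v in existing_vecs)
```

A strict `>` comparison matches the "exceeds 0.92" wording; whether the boundary value itself should count as a duplicate is a tuning decision.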
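### 5. Obsidian Knowledge Graph
- Generates a complete Obsidian Vault automatically
- Every entity gets its own note, connected through `[[Wiki Link]]` bidirectional links
- Action items in entity notes show **latest status + history**
- Open the Obsidian Graph View to see the entity relationship network
- The knowledge graph overview provides a global index
- The `.obsidian/` configuration is generated automatically
## Configuration
Edit `.env`:
Fill in the configuration: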
```ini
# LLM API (used for structured extraction)
LLM_API_KEY=sk-xxx
LLM_BASE_URL=https://api.deepseek.com/v1
LLM_MODEL=deepseek-chat
# Embedding API (used for vector retrieval)
EMBEDDING_API_KEY=sk-xxx
EMBEDDING_BASE_URL=https://api.openai.com/v1
EMBEDDING_MODEL=text-embedding-3-small
NEO4J_ENABLED=true
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=your-password
NEO4J_DATABASE=neo4j
```
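## Dependencies
## Installation
- `openai`: LLM calls
- `pydantic`: structured data models
- `llama-index`: vector indexing and semantic retrieval
- `chromadb`: vector database backend
- `python-dotenv`: environment variable management
- `pyvis`: graph visualization (extension feature)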
```bash
python -m venv .venv
.venv\Scripts\pip install -r requirements.txt
```
## Usage
```bash
python main.py
python main.py process meeting_example.md
python main.py process meeting_example.md -f
python main.py text "今天会议讨论了弱光指标和交付节奏"
python main.py query "弱光指标目标值是多少"
python main.py stats
python main.py batch "meetings/*.md" -f
```
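## Retrieval Design
Queries no longer depend on a separate vector store. Instead, they use hybrid ranking over three kinds of candidates stored in `Neo4j`:
- `Episode`: meeting-level text context
- `Entity`: entity summaries and descriptions
- `Fact`: subject-relation-object facts
Ranking signals include:
- Semantic similarity
- Keyword hits
- Graph-fact weighting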
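One way to combine the three ranking signals is a weighted sum, sketched below. The weights, the substring-based keyword score, the flat boost for `Fact` candidates, and every name in the block are assumptions made for illustration; the README does not state how the project actually weighs these signals.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Candidate:
    kind: str        # "Episode", "Entity", or "Fact"
    text: str        # text the candidate contributes to the answer
    semantic: float  # precomputed embedding similarity to the query, in [0, 1]


def keyword_score(query_terms: Sequence[str], text: str) -> float:
    """Fraction of query terms that occur verbatim in the candidate text."""
    terms = [t for t in query_terms if t]
    if not terms:
        return 0.0
    return sum(1 for t in terms if t in text) / len(terms)


def hybrid_score(c: Candidate, query_terms: Sequence[str],
                 w_sem: float = 0.6, w_kw: float = 0.3, fact_boost: float = 0.1) -> float:
    """Weighted sum of semantic and keyword signals, plus a small boost for graph facts."""
    score = w_sem * c.semantic + w_kw * keyword_score(query_terms, c.text)
    if c.kind == "Fact":
        score += fact_boost
    return score


def rank(candidates: List[Candidate], query_terms: Sequence[str], top_k: int = 3) -> List[Candidate]:
    """Return the top_k candidates by descending hybrid score."""
    return sorted(candidates, key=lambda c: hybrid_score(c, query_terms), reverse=True)[:top_k]
```

Keeping the scoring in one pure function makes it straightforward to adjust the weights against real queries without changing how candidates are fetched from the graph.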
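## Migration Notes
Migration tasks are recorded in [MIGRATION_TASKS.md](/d:/github_project/my_code/meeting_memory/MIGRATION_TASKS.md:1).
## Current Limitations
- If the `neo4j` Python package is not installed in the current environment, the graph store module falls back to a disabled state when imported
- Because of local environment constraints, end-to-end verification still depends on an available Neo4j instance and correct credentials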
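The first limitation describes a common optional-dependency pattern. A minimal sketch is shown below; `NEO4J_AVAILABLE` and `graph_enabled` are hypothetical names, and the project's `graph_store.py` may structure its fallback differently. `GraphDatabase` is the driver entry point exported by the `neo4j` package.

```python
try:
    # Entry point of the official Neo4j Python driver.
    from neo4j import GraphDatabase
    NEO4J_AVAILABLE = True
except ImportError:
    # The package is missing: keep the module importable and mark the graph layer as off.
    GraphDatabase = None
    NEO4J_AVAILABLE = False


def graph_enabled(configured: bool) -> bool:
    """The graph store is used only if it is switched on AND the driver is importable."""
    return bool(configured) and NEO4J_AVAILABLE
```

Callers can then check `graph_enabled(...)` once and skip graph writes and queries cleanly, rather than failing at import time.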


@ -0,0 +1,497 @@
{
"action_items": {
"59f75356": {
"item_id": "59f75356",
"task": "针对关键业务上量指标缺乏保障措施问题,出具具体可行方案并明确责任人",
"assignee": "建维部",
"series": "合川分公司周例会",
"created_meeting": "meeting_ed164adc704f.md",
"history": [
{
"date": "2026-05-06",
"meeting": "meeting_ed164adc704f.md",
"status": "进行中",
"priority": "高",
"deadline": "本周内"
}
],
"latest": {
"date": "2026-05-06",
"meeting": "meeting_ed164adc704f.md",
"status": "进行中",
"priority": "高",
"deadline": "本周内"
}
},
"b16a65ce": {
"item_id": "b16a65ce",
"task": "完成招聘情况、农村渠道进度及营销方案汇报",
"assignee": "市场部",
"series": "合川分公司周例会",
"created_meeting": "meeting_ed164adc704f.md",
"history": [
{
"date": "2026-05-06",
"meeting": "meeting_ed164adc704f.md",
"status": "待办",
"priority": "中",
"deadline": "本周内"
}
],
"latest": {
"date": "2026-05-06",
"meeting": "meeting_ed164adc704f.md",
"status": "待办",
"priority": "中",
"deadline": "本周内"
}
},
"691d7a64": {
"item_id": "691d7a64",
"task": "视频汇报运动会筹备情况",
"assignee": "市场部",
"series": "合川分公司周例会",
"created_meeting": "meeting_ed164adc704f.md",
"history": [
{
"date": "2026-05-06",
"meeting": "meeting_ed164adc704f.md",
"status": "待办",
"priority": "中",
"deadline": "本周六上午"
}
],
"latest": {
"date": "2026-05-06",
"meeting": "meeting_ed164adc704f.md",
"status": "待办",
"priority": "中",
"deadline": "本周六上午"
}
},
"8d9685f0": {
"item_id": "8d9685f0",
"task": "商客经理每日发送微信日报",
"assignee": "商客经理",
"series": "合川分公司周例会",
"created_meeting": "meeting_ed164adc704f.md",
"history": [
{
"date": "2026-05-06",
"meeting": "meeting_ed164adc704f.md",
"status": "待办",
"priority": "中",
"deadline": "每日"
}
],
"latest": {
"date": "2026-05-06",
"meeting": "meeting_ed164adc704f.md",
"status": "待办",
"priority": "中",
"deadline": "每日"
}
},
"723bdb36": {
"item_id": "723bdb36",
"task": "跟进学校IP限速机制建立",
"assignee": "宽带/客服部",
"series": "宽带运维、行政管理及市场业务推进会议",
"created_meeting": "meeting_5026dc1db2fe.md",
"history": [
{
"date": "2026-05-06 13:37",
"meeting": "meeting_5026dc1db2fe.md",
"status": "进行中",
"priority": "高",
"deadline": ""
}
],
"latest": {
"date": "2026-05-06 13:37",
"meeting": "meeting_5026dc1db2fe.md",
"status": "进行中",
"priority": "高",
"deadline": ""
}
},
"c1ebcaaf": {
"item_id": "c1ebcaaf",
"task": "处理客服内部机房问题清单并与客户沟通",
"assignee": "宽带/客服部",
"series": "宽带运维、行政管理及市场业务推进会议",
"created_meeting": "meeting_5026dc1db2fe.md",
"history": [
{
"date": "2026-05-06 13:37",
"meeting": "meeting_5026dc1db2fe.md",
"status": "进行中",
"priority": "中",
"deadline": ""
}
],
"latest": {
"date": "2026-05-06 13:37",
"meeting": "meeting_5026dc1db2fe.md",
"status": "进行中",
"priority": "中",
"deadline": ""
}
},
"9428cf05": {
"item_id": "9428cf05",
"task": "完成食堂改造审计及自饮机引入批复跟进",
"assignee": "综合部",
"series": "宽带运维、行政管理及市场业务推进会议",
"created_meeting": "meeting_5026dc1db2fe.md",
"history": [
{
"date": "2026-05-06 13:37",
"meeting": "meeting_5026dc1db2fe.md",
"status": "进行中",
"priority": "中",
"deadline": ""
}
],
"latest": {
"date": "2026-05-06 13:37",
"meeting": "meeting_5026dc1db2fe.md",
"status": "进行中",
"priority": "中",
"deadline": ""
}
},
"e4df98ca": {
"item_id": "e4df98ca",
"task": "落实招待费公示及纪检报备",
"assignee": "综合部",
"series": "宽带运维、行政管理及市场业务推进会议",
"created_meeting": "meeting_5026dc1db2fe.md",
"history": [
{
"date": "2026-05-06 13:37",
"meeting": "meeting_5026dc1db2fe.md",
"status": "进行中",
"priority": "高",
"deadline": ""
}
],
"latest": {
"date": "2026-05-06 13:37",
"meeting": "meeting_5026dc1db2fe.md",
"status": "进行中",
"priority": "高",
"deadline": ""
}
},
"e5e1449e": {
"item_id": "e5e1449e",
"task": "汇报招聘进度、农村渠道进度及营销方案",
"assignee": "市场部",
"series": "宽带运维、行政管理及市场业务推进会议",
"created_meeting": "meeting_5026dc1db2fe.md",
"history": [
{
"date": "2026-05-06 13:37",
"meeting": "meeting_5026dc1db2fe.md",
"status": "待办",
"priority": "高",
"deadline": "本周内"
}
],
"latest": {
"date": "2026-05-06 13:37",
"meeting": "meeting_5026dc1db2fe.md",
"status": "待办",
"priority": "高",
"deadline": "本周内"
}
},
"eb342fed": {
"item_id": "eb342fed",
"task": "每日微信发送满意度日报",
"assignee": "市场部",
"series": "宽带运维、行政管理及市场业务推进会议",
"created_meeting": "meeting_5026dc1db2fe.md",
"history": [
{
"date": "2026-05-06 13:37",
"meeting": "meeting_5026dc1db2fe.md",
"status": "待办",
"priority": "高",
"deadline": "每日"
}
],
"latest": {
"date": "2026-05-06 13:37",
"meeting": "meeting_5026dc1db2fe.md",
"status": "待办",
"priority": "高",
"deadline": "每日"
}
},
"3db9820d": {
"item_id": "3db9820d",
"task": "确定体育文化节方阵补充人员名单",
"assignee": "各部门",
"series": "宽带运维、行政管理及市场业务推进会议",
"created_meeting": "meeting_5026dc1db2fe.md",
"history": [
{
"date": "2026-05-06 13:37",
"meeting": "meeting_5026dc1db2fe.md",
"status": "待办",
"priority": "高",
"deadline": "今日"
}
],
"latest": {
"date": "2026-05-06 13:37",
"meeting": "meeting_5026dc1db2fe.md",
"status": "待办",
"priority": "高",
"deadline": "今日"
}
},
"684a31f4": {
"item_id": "684a31f4",
"task": "针对专线助账客指标拿出具体保障方案并回复",
"assignee": "各部门",
"series": "宽带运维、行政管理及市场业务推进会议",
"created_meeting": "meeting_5026dc1db2fe.md",
"history": [
{
"date": "2026-05-06 13:37",
"meeting": "meeting_5026dc1db2fe.md",
"status": "待办",
"priority": "中",
"deadline": ""
}
],
"latest": {
"date": "2026-05-06 13:37",
"meeting": "meeting_5026dc1db2fe.md",
"status": "待办",
"priority": "中",
"deadline": ""
}
}
},
"metrics": {
"a76a5616": {
"metric_id": "a76a5616",
"metric_name": "弱光指标",
"owner": "建维部",
"history": [
{
"date": "2026-05-06",
"meeting": "meeting_ed164adc704f.md",
"value": "0.51",
"target": "",
"trend": "向好"
}
],
"latest": {
"date": "2026-05-06",
"meeting": "meeting_ed164adc704f.md",
"value": "0.51",
"target": "",
"trend": "向好"
}
},
"d64cea03": {
"metric_id": "d64cea03",
"metric_name": "三代终端年度目标",
"owner": "建维部",
"history": [
{
"date": "2026-05-06",
"meeting": "meeting_ed164adc704f.md",
"value": "",
"target": "5.5",
"trend": "需压降"
}
],
"latest": {
"date": "2026-05-06",
"meeting": "meeting_ed164adc704f.md",
"value": "",
"target": "5.5",
"trend": "需压降"
}
},
"13144224": {
"metric_id": "13144224",
"metric_name": "九零工程月度转化率",
"owner": "建维部",
"history": [
{
"date": "2026-05-06",
"meeting": "meeting_ed164adc704f.md",
"value": "87.35%",
"target": "90%",
"trend": "接近目标"
}
],
"latest": {
"date": "2026-05-06",
"meeting": "meeting_ed164adc704f.md",
"value": "87.35%",
"target": "90%",
"trend": "接近目标"
}
},
"e056b315": {
"metric_id": "e056b315",
"metric_name": "退单率",
"owner": "建维部",
"history": [
{
"date": "2026-05-06",
"meeting": "meeting_ed164adc704f.md",
"value": "6.53%",
"target": "",
"trend": ""
}
],
"latest": {
"date": "2026-05-06",
"meeting": "meeting_ed164adc704f.md",
"value": "6.53%",
"target": "",
"trend": ""
}
},
"12e0764a": {
"metric_id": "12e0764a",
"metric_name": "商客市场2月收入",
"owner": "商客市场部",
"history": [
{
"date": "2026-05-06",
"meeting": "meeting_ed164adc704f.md",
"value": "88.5万元",
"target": "",
"trend": "增长"
}
],
"latest": {
"date": "2026-05-06",
"meeting": "meeting_ed164adc704f.md",
"value": "88.5万元",
"target": "",
"trend": "增长"
}
},
"23942096": {
"metric_id": "23942096",
"metric_name": "FPTR",
"owner": "宽带/客服部",
"history": [
{
"date": "2026-05-06 13:37",
"meeting": "meeting_5026dc1db2fe.md",
"value": "达标",
"target": "达标",
"trend": "稳定"
}
],
"latest": {
"date": "2026-05-06 13:37",
"meeting": "meeting_5026dc1db2fe.md",
"value": "达标",
"target": "达标",
"trend": "稳定"
}
},
"671827ff": {
"metric_id": "671827ff",
"metric_name": "弱光指标",
"owner": "宽带/客服部",
"history": [
{
"date": "2026-05-06 13:37",
"meeting": "meeting_5026dc1db2fe.md",
"value": "趋近目标",
"target": "预设目标值",
"trend": "改善"
}
],
"latest": {
"date": "2026-05-06 13:37",
"meeting": "meeting_5026dc1db2fe.md",
"value": "趋近目标",
"target": "预设目标值",
"trend": "改善"
}
},
"5d622e23": {
"metric_id": "5d622e23",
"metric_name": "投诉率",
"owner": "各部门",
"history": [
{
"date": "2026-05-06 13:37",
"meeting": "meeting_5026dc1db2fe.md",
"value": "",
"target": "KPI考核核心值",
"trend": "需管控"
}
],
"latest": {
"date": "2026-05-06 13:37",
"meeting": "meeting_5026dc1db2fe.md",
"value": "",
"target": "KPI考核核心值",
"trend": "需管控"
}
},
"def6050a": {
"metric_id": "def6050a",
"metric_name": "工信部有责指标",
"owner": "各部门",
"history": [
{
"date": "2026-05-06 13:37",
"meeting": "meeting_5026dc1db2fe.md",
"value": "",
"target": "KPI考核核心值",
"trend": "需管控"
}
],
"latest": {
"date": "2026-05-06 13:37",
"meeting": "meeting_5026dc1db2fe.md",
"value": "",
"target": "KPI考核核心值",
"trend": "需管控"
}
}
},
"meeting_series": {
"合川分公司周例会": {
"latest_date": "2026-05-06",
"processed_titles": [
"合川分公司周例会2026第X期"
]
},
"宽带运维、行政管理及市场业务推进会议": {
"latest_date": "2026-05-06 13:37",
"processed_titles": [
"宽带运维、行政管理及市场业务推进会议"
]
}
},
"content_hashes": {
"090ff5313a9e5c0dfd8d91c8f8aeb5246bd40a3ed92def6e498bd8254d71a9a4": {
"title": "合川分公司周例会2026第X期",
"date": "2026-05-06",
"filename": "meeting_ed164adc704f.md"
},
"64078fdfd6dbe3c094ddad97b907bbcc0404df3de912488c020efb0e76fbe048": {
"title": "宽带运维、行政管理及市场业务推进会议",
"date": "2026-05-06 13:37",
"filename": "meeting_5026dc1db2fe.md"
}
}
}
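Each tracked item in the state file above carries a `history` list plus a `latest` snapshot. A minimal sketch of how a new status could be merged into that shape follows. The function names are hypothetical, and treating `已完成` as the only closed status is an assumption based on the status values shown in the README.

```python
from typing import Dict, List


def record_status(item: Dict, entry: Dict) -> Dict:
    """Append a new history entry and refresh the 'latest' snapshot."""
    item.setdefault("history", []).append(dict(entry))
    item["latest"] = dict(entry)
    return item


def open_items(state: Dict, done_status: str = "已完成") -> List[Dict]:
    """Return action items whose latest status is not the done status."""
    return [
        item
        for item in state.get("action_items", {}).values()
        if item.get("latest", {}).get("status") != done_status
    ]
```

Copying the entry into both places keeps `latest` cheap to read while the full timeline stays in `history`.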


@ -0,0 +1,35 @@
---
title: "宽带运维、行政管理及市场业务推进会议"
date: "2026-05-06 13:37"
status: archived
---
# 宽带运维、行政管理及市场业务推进会议
会议概述
会议主要围绕宽带运维指标、综合行政与工会管理、市场政企业务推进、满意度提升及年度考核准备等议题展开,旨在总结阶段性工作进度,协调跨部门资源,明确后续重点任务与考核要求。
主要讨论点
宽带运维与网络质量上周上门量及安装进度受天气影响弱光指标趋近目标FPTR达标但主动过境偏后。PCDN专线在学校端持续恶化已报市公司分析IP并拟限速。超频基站故障已恢复专线巡检进度符合预期。客服培训后内部机房问题已梳理清单。
综合行政与工会事务2025年剩余两项工作按原计划推进。工会经费压减食堂改造拟于5月结合更替修补进行拟引入自饮机降本。第四届体育文化节筹备中因主力选手受伤需各部门抽调人员补充方阵。主题教育简报已发基层党组织学习已完成。
区表彰、主题教育与招待费管理区级担当作为表彰正在对接建议争取参与以拉开竞争差距。招待费实行每年公开1次制度综合部超预算需调整26年预算其他部门严控成本。政企对外接待需统筹严禁客户经理个人垫资。
拆迁商客、满意度考核与KPI准备拆迁以二次升套为主社区与单位集中营销并行。满意度测评发现开卷考试形式导致拉分相关经理思想松懈。KPI考核已明确工信部有责及投诉率两项核心指标强调日清日结与执行力。
决策事项
二级基站拆除及下电服务费调整需在4月15日前全量完成。
招待费实行每年公开1次制度综合部超预算需调整26年预算其他部门严控成本。
满意度测评取消开卷考试形式,后续采用“后机评”模式,市场部需按四公司规定制定满意客户样板及不满客户处理流程。
第四届体育文化节方阵缺编9人需各部门协调抽调确定名单后统一采购服装并安排下班后排练。
年度考核指标已初步定调,各部门需提前与市公司沟通争取有利政策,避免起跑线落后。
待办事项
宽带/客服部跟进学校IP限速机制建立处理客服内部机房问题清单并与客户沟通。
综合部:完成食堂改造审计及自饮机引入批复跟进;落实招待费公示及纪检报备。
市场部:本周内汇报招聘进度、农村渠道进度及营销方案;每日微信发送满意度日报。
各部门:今日确定体育文化节方阵补充人员名单;针对专线助账客指标拿出具体保障方案并回复。
关键信息
会议时间2026-05-06 13:37
核心考核导向KPI考核聚焦工信部有责与投诉率强调执行力与日清日结习惯。
业务风险点PCDN专线恶化、满意度测评因形式问题导致拉分、招待费预算超支。
AI建议
针对满意度考核风险,建议市场部立即复盘测评机制,避免形式主义拉低指标,并提前演练“后机评”应对策略。
针对招待费超支及客户经理个人垫资问题,建议综合部建立统一审批与结算台账,明确费用归属与报销时效,规避财务与合规风险。
针对KPI考核准备建议建立市公司指标动态跟踪表提前模拟考核场景强化跨部门协同与数据预埋确保年底考核不被动。


@ -0,0 +1,81 @@
---
title: "合川分公司周例会2026第X期"
date: "2026-05-06"
status: archived
---
# 合川分公司周例会2026第X期
# 会议记录
议 题合川分公司周例会2026第X期
时 间2026年5月6日 13:37—14:23
地 点:分公司会议室
主持人AlanPaine
参加人:分公司领导、各部门经理及相关人员
议程:
一、各部门汇报
二、分公司领导指示部署
---
## 会议内容
### 一、各部门汇报
建维部、综合部、商客市场负责人按议程现场按顺序做汇报。建维部汇报宽带安装受天气影响进度偏后弱光指标0.51持续向好三代终端年度目标5.5需持续压降九零工程月度转化率87.35%接近90%目标退单率6.53%PCDN专线学校出口问题正协调限速机制二级基站拆除预计4月中旬完成综合部通报建委相关工作清单及投资计划已汇报打印设备已协调保障招投标需求工会经费压减后严考严用食堂改造及自饮机引入方案正在推进第四届体育文化节方阵人员招募与排练已部署商客市场2月收入88.5万元实现增长三期项目二期拆迁完成1145户社区与单位清洗服务5场落实签约量待提升。
---
### 二、部署强调
#### 建维部负责人强调:
1. **网络运维与指标管控:**
- 弱光指标0.51持续向好三代终端年度目标5.5需持续压降FPTR已达标但主动过境0.3靠后。
- 九零工程月度转化率87.35%接近90%目标退单率6.53%主要受用户原因及改约影响已建议施工优化BtoC审核撤单流程。
- PCDN专线因学校出口带宽问题持续恶化正协调限速机制超频基站故障已及时处理专线巡检按计划推进。
2. **工作反馈与执行要求:**
- 强调养成“日清日结”习惯,工作回复必须量化、有结果、有措施,杜绝工作拖延数月未动。
- 针对关键业务上量指标缺乏保障措施问题,要求本周内出具具体可行方案并明确责任人。
---
#### 综合部负责人强调:
1. **工会经费与后勤保障:**
- 工会经费全面压减,需严考严用、以更少资金办更好实事。软性工程(更衣室、食堂改造)已确定上报市公司,拟引入自饮机解决饮水问题并节约成本。
2. **奖项申报策略:**
- 针对河川区“担当作为先进集体和先进个人”申报,评选条件多为定性要求,建议提前与区领导或分管领导沟通确认意向后再行申报,避免盲目提交浪费资源。
---
#### 市场部负责人强调:
1. **季度收官与二季度谋划:**
- 市场部需提前谋划季度收官及二季度业务活动打破淡季思维全力推动商客、H业务及AI军团活动升温。
- 本周内完成招聘情况、农村渠道进度及营销方案汇报;本周六上午视频汇报运动会筹备情况。
2. **满意度与考核管控:**
- 深刻反思满意度测评前期工作未做到位问题要求主管亲自抓。明确满意度及投诉考核标准不满客户需严格按5:30及5:35节点操作报警。
- 要求商客经理每日微信发送日报,跟进考核细节,确保指标可控。
---
#### 分公司主要领导强调:
1. **强化执行力与作风:**
- 各部门及一线人员必须摒弃“知道怎么做却不去做”的作风,做到事不做好不收兵。分管领导需加强政企部等部门督导力度,必要时亲自沟通。
2. **年度考核提前摸底:**
- 针对四公司年度考核及集团相关指标提升,要求各部门提前深入了解考核细则及可能产生重大影响的不利因素并及时上报,切忌定稿后被动。
- 市公司会统筹考虑分公司整体情况,务必提前布局、赢在起跑线。


@ -1,157 +0,0 @@
import json
import logging
import re
from typing import List, Optional
from pydantic import BaseModel
from openai import OpenAI
from config import config
logger = logging.getLogger(__name__)
client = OpenAI(
    api_key=config.llm.api_key or None,
    base_url=config.llm.base_url if config.llm.base_url else None,
)


class Entity(BaseModel):
    name: str
    entity_type: str
    description: str = ""


class Relation(BaseModel):
    subject: str
    subject_type: str
    predicate: str
    object: str
    object_type: str
    description: str = ""


class ActionItem(BaseModel):
    task: str
    assignee: str = ""
    deadline: str = ""
    status: str = "待办"
    priority: str = ""


class Decision(BaseModel):
    content: str
    proposer: str = ""
    status: str = "已决"


class MeetingMetric(BaseModel):
    metric_name: str
    value: str
    target: str = ""
    owner: str = ""
    trend: str = ""


class MeetingExtraction(BaseModel):
    title: str
    date: str = ""
    participants: List[str] = []
    agenda: List[str] = []
    entities: List[Entity] = []
    relations: List[Relation] = []
    action_items: List[ActionItem] = []
    decisions: List[Decision] = []
    metrics: List[MeetingMetric] = []
    summary: str = ""


EXTRACTION_SYSTEM_PROMPT = """
你是一个专业的会议纪要信息抽取专家你的任务是从中文会议记录中抽取结构化信息并严格按照要求的JSON格式返回
## 抽取内容
### 1. 实体
- 人物参会人员提及的人员
- 组织/部门公司部门团队
- 项目/任务正在进行的项目任务
- 指标/KPI关键绩效指标如转化率退单率等
- 概念/制度管理概念制度要求
- 地点会议地点项目地点
### 2. 关系 (主体-关系谓词-客体)
抽取事实性关系例如
- {"subject": "建维部", "subject_type": "组织", "predicate": "负责", "object": "网络运维", "object_type": "任务", "description": ""}
- {"subject": "弱光指标", "subject_type": "指标", "predicate": "目标值", "object": "0.5以下", "object_type": "数值", "description": ""}
### 3. 行动项
谁负责什么任务截止时间优先级
### 4. 决策
做出的决定和结论
### 5. 指标数据
具体的数字指标当前值目标值负责人趋势(向好/持平/恶化)
## 规则
- 只提取事实性信息
- 过滤比喻假设主观评价
- 数字指标要精确提取
- entitiesrelationsaction_itemsdecisionsmetrics 如果没有则返回空数组
"""


def _call_llm(system: str, user: str) -> str:
    response = client.chat.completions.create(
        model=config.llm.model,
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        max_tokens=config.llm.max_tokens,
        temperature=config.llm.temperature,
    )
    content = response.choices[0].message.content
    if content is None:
        raise ValueError("LLM returned empty response")
    return content


def extract_meeting_info(text: str) -> MeetingExtraction:
    user_prompt = f"""
从以下会议记录中抽取结构化信息
JSON字段说明
- title: 会议标题
- date: 会议日期
- participants: 参会人列表
- agenda: 议程列表
- entities: 实体列表每个实体包含 name(名称), entity_type(类型), description(描述)
- relations: 关系列表每个关系包含 subject(主体), subject_type(主体类型), predicate(关系谓词), object(客体), object_type(客体类型), description(描述)
- action_items: 行动项列表每条包含 task(任务), assignee(负责人), deadline(截止时间), status(状态), priority(优先级)
- decisions: 决策列表每条包含 content(决策内容), proposer(提出人), status(状态)
- metrics: 指标列表每条包含 metric_name(指标名), value(当前值), target(目标值), owner(负责人), trend(趋势)
- summary: 会议摘要
请直接返回JSON对象不要包含任何额外说明文字
会议记录
{text}
"""
    content = _call_llm(EXTRACTION_SYSTEM_PROMPT, user_prompt)
    data = _try_parse_json(content)
    return MeetingExtraction(**data)


def _try_parse_json(content: str) -> dict:
    try:
        return json.loads(content)
    except json.JSONDecodeError:
        logger.warning("JSON解析失败尝试修复...")
        match = re.search(r'\{.*\}', content, re.DOTALL)
        if match:
            try:
                return json.loads(match.group())
            except json.JSONDecodeError as e:
                logger.error(f"修复后的JSON仍无法解析: {e}")
        raise
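The brace-matching fallback above relies on a greedy regex, which can over-capture when the model's reply contains a stray `}` after the JSON payload. One possible alternative, sketched here under the assumption that the first `{` in the reply starts the payload, is to let `json.JSONDecoder.raw_decode` stop at the end of the first complete object. `parse_llm_json` is a hypothetical name and is not part of the deleted `extractor.py`.

```python
import json


def parse_llm_json(content: str) -> dict:
    """Parse a JSON object from LLM output, tolerating surrounding prose or code fences."""
    try:
        data = json.loads(content)
    except json.JSONDecodeError:
        start = content.find("{")
        if start == -1:
            raise ValueError("no JSON object found in LLM output")
        # raw_decode parses one complete value and ignores whatever follows it.
        data, _end = json.JSONDecoder().raw_decode(content[start:])
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return data
```

Every failure path raises (`json.JSONDecodeError` is a subclass of `ValueError`), so the caller never receives `None` in place of a dictionary.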

225
main.py

@ -1,226 +1,5 @@
import argparse
import logging
import os
import sys
if sys.stdout.encoding.lower() == "gbk":
sys.stdout.reconfigure(encoding="utf-8")
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
datefmt="%H:%M:%S",
)
logger = logging.getLogger(__name__)
def cmd_process(args):
from meeting_processor import meeting_processor
filepath = args.file
if not os.path.exists(filepath):
print(f"错误: 文件不存在: {filepath}")
sys.exit(1)
print(f"正在处理会议文件: {filepath}")
vault_path = meeting_processor.process_meeting_file(filepath, force=getattr(args, 'force', False))
if vault_path:
print(f"\n✅ 会议处理完成!")
print(f"📝 Obsidian 笔记: {vault_path}")
print(f"📂 Obsidian Vault: {os.path.dirname(vault_path)}")
else:
print("\n❌ 会议处理失败")
sys.exit(1)
def cmd_text(args):
from meeting_processor import meeting_processor
text = args.text
print("正在处理会议文本...")
vault_path = meeting_processor.process_meeting_text(text, force=getattr(args, 'force', False))
if vault_path:
print(f"\n✅ 会议处理完成!")
print(f"📝 Obsidian 笔记: {vault_path}")
else:
print("\n❌ 会议处理失败")
def cmd_query(args):
from meeting_processor import meeting_processor
question = args.question
print(f"🔍 查询: {question}")
print("-" * 40)
result = meeting_processor.query(question, top_k=args.top_k)
if result:
print(result)
else:
print("未找到相关信息")
def cmd_stats(args):
from meeting_processor import meeting_processor
stats = meeting_processor.stats()
print("📊 会议记忆系统统计")
print("-" * 40)
print(f"Obsidian 会议笔记: {stats.get('obsidian_meetings', 0)}")
print(f"Obsidian 实体笔记: {stats.get('obsidian_entities', 0)}")
print(f"向量索引节点数: {stats.get('vector_index', {}).get('node_count', 0)}")
print(f"Vault 路径: {stats.get('vault_path', '')}")
def cmd_batch(args):
from meeting_processor import meeting_processor
import glob as glob_module
pattern = args.pattern
files = glob_module.glob(pattern, recursive=True)
force = getattr(args, 'force', False)
if not files:
print(f"未匹配到任何文件: {pattern}")
sys.exit(1)
print(f"找到 {len(files)} 个文件,开始批量处理...")
success = 0
for f in files:
try:
print(f"\n处理: {f}")
meeting_processor.process_meeting_file(f, force=force)
success += 1
except Exception as e:
logger.error(f"处理失败: {f} - {e}")
print(f"\n✅ 批量处理完成: {success}/{len(files)} 成功")
def cmd_interactive(args=None):
from meeting_processor import meeting_processor
print("📋 会议纪要长期记忆系统 — 交互模式")
print("=" * 50)
print("可用命令:")
print(" query <问题> 语义查询会议记忆")
print(" process <路径> 处理会议文件")
print(" stats 查看统计")
print(" help 显示帮助")
print(" exit/quit 退出")
print("=" * 50)
while True:
try:
line = input("\n> ").strip()
except (EOFError, KeyboardInterrupt):
print()
break
if not line:
continue
if line in ("exit", "quit", "q"):
break
if line == "help":
print("可用命令:")
print(" query <问题> — 语义查询会议记忆")
print(" process <路径> — 处理一个会议markdown文件")
print(" stats — 查看系统统计")
print(" help — 显示此帮助")
print(" exit/quit — 退出")
continue
if line == "stats":
stats = meeting_processor.stats()
print(f"📊 会议: {stats.get('obsidian_meetings', 0)} | "
f"实体: {stats.get('obsidian_entities', 0)} | "
f"向量节点: {stats.get('vector_index', {}).get('node_count', 0)}")
continue
if line.startswith("process "):
filepath = line[8:].strip()
if not os.path.exists(filepath):
print(f"❌ 文件不存在: {filepath}")
continue
print(f"正在处理: {filepath}")
vault_path = meeting_processor.process_meeting_file(filepath)
if vault_path:
print(f"✅ 完成: {vault_path}")
else:
print("❌ 处理失败")
continue
if line.startswith("query "):
question = line[6:].strip()
else:
question = line
print(f"🔍 查询中...", end="", flush=True)
result = meeting_processor.query(question, top_k=3)
print("\r" + " " * 30 + "\r", end="")
if result:
print(result[:2000])
if len(result) > 2000:
print("... (结果过长已截断)")
else:
print("未找到相关信息")
print("bye!")
def main():
parser = argparse.ArgumentParser(
description="📋 会议纪要长期记忆系统",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
示例:
python main.py process meeting_example.md
python main.py query "弱光指标目标值是多少?"
python main.py stats
python main.py text "今天会议讨论了..."
无参数时进入交互模式
Powered by LlamaIndex + Obsidian + LLM
""",
)
subparsers = parser.add_subparsers(dest="command", help="子命令")
p_process = subparsers.add_parser("process", help="处理会议 markdown 文件")
p_process.add_argument("file", help="会议纪要 markdown 文件路径")
p_process.add_argument("-f", "--force", action="store_true", help="重复时自动覆盖,跳过确认")
p_text = subparsers.add_parser("text", help="直接输入会议文本")
p_text.add_argument("text", help="会议文本内容")
p_text.add_argument("-f", "--force", action="store_true", help="重复时自动覆盖,跳过确认")
p_query = subparsers.add_parser("query", help="语义查询会议记忆")
p_query.add_argument("question", help="查询问题")
p_query.add_argument("--top-k", type=int, default=3, help="返回结果数量")
p_stats = subparsers.add_parser("stats", help="查看系统统计")
p_batch = subparsers.add_parser("batch", help="批量处理会议文件")
p_batch.add_argument("pattern", help="文件 glob 模式, 如 'meetings/*.md'")
p_batch.add_argument("-f", "--force", action="store_true", help="重复时自动覆盖,跳过确认")
args = parser.parse_args()
if args.command == "process":
cmd_process(args)
elif args.command == "text":
cmd_text(args)
elif args.command == "query":
cmd_query(args)
elif args.command == "stats":
cmd_stats(args)
elif args.command == "batch":
cmd_batch(args)
else:
cmd_interactive(args)
from meeting_memory.cli import main
if __name__ == "__main__":
main()
main()

27
meeting1.md 100644

@ -0,0 +1,27 @@
会议概述
会议主要围绕宽带运维指标、综合行政与工会管理、市场政企业务推进、满意度提升及年度考核准备等议题展开,旨在总结阶段性工作进度,协调跨部门资源,明确后续重点任务与考核要求。
主要讨论点
宽带运维与网络质量上周上门量及安装进度受天气影响弱光指标趋近目标FPTR达标但主动过境偏后。PCDN专线在学校端持续恶化已报市公司分析IP并拟限速。超频基站故障已恢复专线巡检进度符合预期。客服培训后内部机房问题已梳理清单。
综合行政与工会事务2025年剩余两项工作按原计划推进。工会经费压减食堂改造拟于5月结合更替修补进行拟引入自饮机降本。第四届体育文化节筹备中因主力选手受伤需各部门抽调人员补充方阵。主题教育简报已发基层党组织学习已完成。
区表彰、主题教育与招待费管理区级担当作为表彰正在对接建议争取参与以拉开竞争差距。招待费实行每年公开1次制度综合部超预算需调整26年预算其他部门严控成本。政企对外接待需统筹严禁客户经理个人垫资。
拆迁商客、满意度考核与KPI准备拆迁以二次升套为主社区与单位集中营销并行。满意度测评发现开卷考试形式导致拉分相关经理思想松懈。KPI考核已明确工信部有责及投诉率两项核心指标强调日清日结与执行力。
决策事项
二级基站拆除及下电服务费调整需在4月15日前全量完成。
招待费实行每年公开1次制度综合部超预算需调整26年预算其他部门严控成本。
满意度测评取消开卷考试形式,后续采用“后机评”模式,市场部需按四公司规定制定满意客户样板及不满客户处理流程。
第四届体育文化节方阵缺编9人需各部门协调抽调确定名单后统一采购服装并安排下班后排练。
年度考核指标已初步定调,各部门需提前与市公司沟通争取有利政策,避免起跑线落后。
待办事项
宽带/客服部跟进学校IP限速机制建立处理客服内部机房问题清单并与客户沟通。
综合部:完成食堂改造审计及自饮机引入批复跟进;落实招待费公示及纪检报备。
市场部:本周内汇报招聘进度、农村渠道进度及营销方案;每日微信发送满意度日报。
各部门:今日确定体育文化节方阵补充人员名单;针对专线助账客指标拿出具体保障方案并回复。
关键信息
会议时间2026-05-06 13:37
核心考核导向KPI考核聚焦工信部有责与投诉率强调执行力与日清日结习惯。
业务风险点PCDN专线恶化、满意度测评因形式问题导致拉分、招待费预算超支。
AI建议
针对满意度考核风险,建议市场部立即复盘测评机制,避免形式主义拉低指标,并提前演练“后机评”应对策略。
针对招待费超支及客户经理个人垫资问题,建议综合部建立统一审批与结算台账,明确费用归属与报销时效,规避财务与合规风险。
针对KPI考核准备建议建立市公司指标动态跟踪表提前模拟考核场景强化跨部门协同与数据预埋确保年底考核不被动。


@ -0,0 +1,3 @@
from meeting_memory.config import config
__all__ = ["config"]


@ -0,0 +1,202 @@
import argparse
import glob as glob_module
import logging
import os
import sys

from meeting_memory.meeting_processor import meeting_processor
from meeting_memory.web_demo import run_demo_server

if sys.stdout.encoding and sys.stdout.encoding.lower() == "gbk":
    sys.stdout.reconfigure(encoding="utf-8")

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
    datefmt="%H:%M:%S",
)
logger = logging.getLogger(__name__)


def cmd_process(args):
    filepath = args.file
    if not os.path.exists(filepath):
        print(f"错误:文件不存在 {filepath}")
        sys.exit(1)
    print(f"正在处理会议文件:{filepath}")
    archive_path = meeting_processor.process_meeting_file(
        filepath,
        force=getattr(args, "force", False),
    )
    if archive_path:
        print("\n处理完成")
        print(f"原文归档:{archive_path}")
    else:
        print("\n处理失败或已跳过")
        sys.exit(1)


def cmd_text(args):
    print("正在处理会议文本...")
    archive_path = meeting_processor.process_meeting_text(
        args.text,
        force=getattr(args, "force", False),
    )
    if archive_path:
        print("\n处理完成")
        print(f"原文归档:{archive_path}")
    else:
        print("\n处理失败或已跳过")


def cmd_query(args):
    print(f"查询:{args.question}")
    print("-" * 40)
    result = meeting_processor.query(args.question, top_k=args.top_k)
    print(result if result else "未找到相关信息")


def cmd_stats(_args):
    stats = meeting_processor.stats()
    graph = stats.get("graph", {})
    state = stats.get("state", {})
    print("会议纪要长期记忆系统统计")
    print("-" * 40)
    print(f"Neo4j 启用:{graph.get('enabled', False)}")
    print(f"图谱会议数:{graph.get('meetings', 0)}")
    print(f"图谱 Episode 数:{graph.get('episodes', 0)}")
    print(f"图谱实体数:{graph.get('entities', 0)}")
    print(f"图谱 Fact 数:{graph.get('facts', 0)}")
    print(f"行动项数:{state.get('action_items_tracked', 0)}")
    print(f"指标数:{state.get('metrics_tracked', 0)}")
    print(f"会议系列数:{state.get('meeting_series', 0)}")
    print(f"原文归档目录:{stats.get('raw_dir', '')}")
    print(f"状态文件:{stats.get('state_path', '')}")
def cmd_batch(args):
files = glob_module.glob(args.pattern, recursive=True)
if not files:
print(f"未匹配到任何文件:{args.pattern}")
sys.exit(1)
print(f"找到 {len(files)} 个文件,开始批量处理...")
success = 0
for path in files:
try:
print(f"\n处理:{path}")
result = meeting_processor.process_meeting_file(
path,
force=getattr(args, "force", False),
)
if result:
success += 1
except Exception as exc:
logger.error("处理失败: %s - %s", path, exc)
print(f"\n批量处理完成:{success}/{len(files)} 成功")
def cmd_web(args):
run_demo_server(
host=getattr(args, "host", "127.0.0.1"),
port=getattr(args, "port", 8765),
)
def cmd_interactive():
print("会议纪要长期记忆系统")
print("=" * 50)
print("可用命令:")
print(" query <问题> 查询会议记忆")
print(" process <路径> 处理会议文件")
print(" stats 查看统计")
print(" help 显示帮助")
print(" exit/quit 退出")
print("=" * 50)
while True:
try:
line = input("\n> ").strip()
except (EOFError, KeyboardInterrupt):
print()
break
if not line:
continue
if line in ("exit", "quit", "q"):
break
if line == "help":
print(" query <问题>")
print(" process <路径>")
print(" stats")
print(" exit/quit")
continue
if line == "stats":
cmd_stats(None)
continue
if line.startswith("process "):
filepath = line[8:].strip()
if not os.path.exists(filepath):
print(f"文件不存在:{filepath}")
continue
result = meeting_processor.process_meeting_file(filepath)
print(f"完成:{result}" if result else "处理失败或已跳过")
continue
question = line[6:].strip() if line.startswith("query ") else line
result = meeting_processor.query(question, top_k=3)
print(result if result else "未找到相关信息")
print("bye!")
def main():
parser = argparse.ArgumentParser(description="会议纪要长期记忆系统")
subparsers = parser.add_subparsers(dest="command")
p_process = subparsers.add_parser("process", help="处理会议 markdown 文件")
p_process.add_argument("file", help="会议文件路径")
p_process.add_argument("-f", "--force", action="store_true", help="发现重复时自动覆盖")
p_text = subparsers.add_parser("text", help="直接处理一段会议文本")
p_text.add_argument("text", help="会议文本内容")
p_text.add_argument("-f", "--force", action="store_true", help="发现重复时自动覆盖")
p_query = subparsers.add_parser("query", help="查询会议记忆")
p_query.add_argument("question", help="查询问题")
p_query.add_argument("--top-k", type=int, default=3, help="返回结果数量")
subparsers.add_parser("stats", help="查看统计")
p_batch = subparsers.add_parser("batch", help="批量处理会议文件")
p_batch.add_argument("pattern", help="glob 模式,如 meetings/*.md")
p_batch.add_argument("-f", "--force", action="store_true", help="发现重复时自动覆盖")
p_web = subparsers.add_parser("web", help="启动 Web 界面")
p_web.add_argument("--host", default="127.0.0.1", help="绑定地址")
p_web.add_argument("--port", type=int, default=8765, help="服务端口")
args = parser.parse_args()
if args.command == "process":
cmd_process(args)
elif args.command == "text":
cmd_text(args)
elif args.command == "query":
cmd_query(args)
elif args.command == "stats":
cmd_stats(args)
elif args.command == "batch":
cmd_batch(args)
elif args.command == "web":
cmd_web(args)
else:
cmd_interactive()
if __name__ == "__main__":
main()
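
Review note: `main()` maps each subcommand to its handler with an if/elif chain. A common `argparse` alternative is to attach the handler to each subparser with `set_defaults(func=...)`. The sketch below illustrates that pattern only; `handle_query`, `handle_stats`, `build_parser`, and `dispatch` are hypothetical names that do not exist in this package, and the handlers return strings instead of calling `meeting_processor`.

```python
import argparse


def handle_query(args):
    # Hypothetical stand-in for cmd_query: report what would be queried.
    return f"query:{args.question}:{args.top_k}"


def handle_stats(_args):
    # Hypothetical stand-in for cmd_stats.
    return "stats"


def build_parser():
    parser = argparse.ArgumentParser(description="dispatch sketch")
    subparsers = parser.add_subparsers(dest="command")

    p_query = subparsers.add_parser("query")
    p_query.add_argument("question")
    p_query.add_argument("--top-k", type=int, default=3)
    # Bind the handler to the subcommand instead of branching on args.command.
    p_query.set_defaults(func=handle_query)

    p_stats = subparsers.add_parser("stats")
    p_stats.set_defaults(func=handle_stats)
    return parser


def dispatch(argv):
    args = build_parser().parse_args(argv)
    handler = getattr(args, "func", None)
    if handler is None:
        # No subcommand given: this is where the interactive mode would start.
        return "interactive"
    return handler(args)
```

With this layout, adding a subcommand touches only the place where its subparser is defined.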


@ -1,10 +1,12 @@
import os
from pydantic import BaseModel, Field
from dotenv import load_dotenv
from pydantic import BaseModel, Field
load_dotenv()
PROJECT_ROOT = os.path.dirname(os.path.abspath(__file__))
PACKAGE_ROOT = os.path.dirname(os.path.abspath(__file__))
PROJECT_ROOT = os.path.dirname(PACKAGE_ROOT)
class LLMConfig(BaseModel):
@ -21,24 +23,24 @@ class EmbeddingConfig(BaseModel):
model: str = Field(default=os.getenv("EMBEDDING_MODEL", "text-embedding-3-small"))
class ObsidianConfig(BaseModel):
vault_path: str = Field(default=os.path.join(PROJECT_ROOT, "obsidian_vault"))
meetings_dir: str = Field(default="Meetings")
entities_dir: str = Field(default="Entities")
graphs_dir: str = Field(default="Graphs")
raw_dir: str = Field(default="Raw")
class StorageConfig(BaseModel):
data_dir: str = Field(default=os.path.join(PROJECT_ROOT, "data"))
raw_dir: str = Field(default=os.path.join(PROJECT_ROOT, "data", "raw"))
class VectorStoreConfig(BaseModel):
persist_dir: str = Field(default=os.path.join(PROJECT_ROOT, "vector_store_data"))
class Neo4jConfig(BaseModel):
enabled: bool = Field(default=os.getenv("NEO4J_ENABLED", "false").lower() == "true")
uri: str = Field(default=os.getenv("NEO4J_URI", "bolt://localhost:7687"))
user: str = Field(default=os.getenv("NEO4J_USER", "neo4j"))
password: str = Field(default=os.getenv("NEO4J_PASSWORD", ""))
database: str = Field(default=os.getenv("NEO4J_DATABASE", "neo4j"))
class ProjectConfig(BaseModel):
llm: LLMConfig = Field(default_factory=LLMConfig)
embedding: EmbeddingConfig = Field(default_factory=EmbeddingConfig)
obsidian: ObsidianConfig = Field(default_factory=ObsidianConfig)
vector_store: VectorStoreConfig = Field(default_factory=VectorStoreConfig)
state_path: str = Field(default=os.path.join(PROJECT_ROOT, "obsidian_vault", "meeting_state.json"))
storage: StorageConfig = Field(default_factory=StorageConfig)
neo4j: Neo4jConfig = Field(default_factory=Neo4jConfig)
state_path: str = Field(default=os.path.join(PROJECT_ROOT, "data", "meeting_state.json"))
config = ProjectConfig()
config = ProjectConfig()


@ -0,0 +1,364 @@
import json
import logging
import re
import sys
from typing import List, Optional
from openai import OpenAI
from pydantic import BaseModel, Field
from meeting_memory.config import config
logger = logging.getLogger(__name__)
client = OpenAI(
api_key=config.llm.api_key or None,
base_url=config.llm.base_url if config.llm.base_url else None,
)
class Entity(BaseModel):
name: str
entity_type: str
description: str = ""
class Relation(BaseModel):
subject: str
subject_type: str
predicate: str
object: str
object_type: str
description: str = ""
fact: str = ""
qualifiers: List[str] = Field(default_factory=list)
evidence: str = ""
confidence: float = 0.0
valid_at: str = ""
invalid_at: str = ""
class ActionItem(BaseModel):
task: str
assignee: str = ""
deadline: str = ""
status: str = "待办"
priority: str = ""
class Decision(BaseModel):
content: str
proposer: str = ""
status: str = "已决"
class MeetingMetric(BaseModel):
metric_name: str
value: str
target: str = ""
owner: str = ""
trend: str = ""
class MeetingExtraction(BaseModel):
title: str
date: str = ""
participants: List[str] = Field(default_factory=list)
agenda: List[str] = Field(default_factory=list)
entities: List[Entity] = Field(default_factory=list)
relations: List[Relation] = Field(default_factory=list)
action_items: List[ActionItem] = Field(default_factory=list)
decisions: List[Decision] = Field(default_factory=list)
metrics: List[MeetingMetric] = Field(default_factory=list)
summary: str = ""
EXTRACTION_SYSTEM_PROMPT = """
你是一个专业的会议知识抽取助手。你的任务是从中文会议记录中抽取结构化事实,尤其要抽出更细粒度、更有语义深度的关系。
输出要求:
1. 只输出一个 JSON 对象,不要输出解释文字。
2. 关系抽取不要停留在“部门汇报了工作”这种浅层描述,要尽可能向下细化到:
   - 责任归属
   - 目标值 / 当前值 / 趋势
   - 约束条件
   - 因果 / 影响
   - 时间要求
   - 依赖关系
   - 部署 / 决策 / 要求 / 风险 / 支撑关系
3. 每条关系尽量同时给出:
   - subject / predicate / object
   - fact: 一句自然语言事实表述
   - qualifiers: 限定条件、范围、状态、数值、约束等
   - evidence: 原文中的关键短句或压缩证据
   - confidence: 0 到 1 之间
   - valid_at / invalid_at: 如果文中明确提到时间可填写,否则留空
4. 如果原文存在多个事实,不要只抽象概括,要拆成多条关系。
5. 避免空泛关系词,优先使用更具体的谓词,例如:
   - 负责 / 汇报 / 目标值 / 当前值 / 低于 / 高于 / 要求 / 督导 / 推进 / 影响 / 支撑 / 依赖 / 计划 / 完成 / 截止于
"""
def _call_llm(system: str, user: str, stream: bool = False) -> str:
if not stream:
response = client.chat.completions.create(
model=config.llm.model,
messages=[
{"role": "system", "content": system},
{"role": "user", "content": user},
],
max_tokens=config.llm.max_tokens,
temperature=config.llm.temperature,
)
content = response.choices[0].message.content
if content is None:
raise ValueError("LLM returned empty response")
return content
response = client.chat.completions.create(
model=config.llm.model,
messages=[
{"role": "system", "content": system},
{"role": "user", "content": user},
],
max_tokens=config.llm.max_tokens,
temperature=config.llm.temperature,
stream=True,
)
chunks: List[str] = []
print("\n[LLM] 开始抽取,流式输出中:")
for event in response:
if not event.choices:
continue
delta = event.choices[0].delta.content
if not delta:
continue
chunks.append(delta)
sys.stdout.write(delta)
sys.stdout.flush()
print("\n[LLM] 抽取输出结束")
return "".join(chunks)
def extract_meeting_info(text: str, stream: bool = False) -> MeetingExtraction:
    user_prompt = f"""
请从下面会议记录中提取结构化信息,并重点做深层关系抽取。
输出 JSON 字段:
- title
- date
- participants
- agenda
- entities: name, entity_type, description
- relations:
  - subject
  - subject_type
  - predicate
  - object
  - object_type
  - description
  - fact
  - qualifiers
  - evidence
  - confidence
  - valid_at
  - invalid_at
- action_items: task, assignee, deadline, status, priority
- decisions: content, proposer, status
- metrics: metric_name, value, target, owner, trend
- summary
关系抽取规则:
1. 不要只抽“汇报了工作”这种会议动作,要尽量继续下钻出具体事实。
2. 如果一句话里同时包含主体 + 指标 + 当前值 + 目标值 + 负责人 + 趋势,应拆成多条关系,或在 qualifiers 中保留这些细节。
3. 对于要求、部署、负责、依赖、影响、约束、目标、风险类信息,优先保留。
4. fact 必须是一句完整、自然、可检索的事实描述。
5. qualifiers 用于补充数值、范围、状态、条件、截止时间、优先级等信息。
6. evidence 用原文中的关键词、短句,不要太长。
7. confidence 取值 0 到 1。
会议记录如下:
{text}
"""
    content = _call_llm(EXTRACTION_SYSTEM_PROMPT, user_prompt, stream=stream)
    data = _try_parse_json(content)
    data = _normalize_meeting_data(data)
    return MeetingExtraction(**data)
def _try_parse_json(content: str) -> dict:
    try:
        return json.loads(content)
    except json.JSONDecodeError:
        logger.warning("JSON parsing failed; trying to repair extracted block")
        # Fall back to the outermost {...} span, e.g. when the model wraps
        # the JSON in prose or a code fence.
        match = re.search(r"\{.*\}", content, re.DOTALL)
        if match:
            try:
                return json.loads(match.group())
            except json.JSONDecodeError as exc:
                logger.error("Repaired JSON still failed to parse: %s", exc)
        # Re-raise the original decode error when the repair did not succeed.
        raise
def _normalize_meeting_data(data: dict) -> dict:
    # Fall back to an empty dict so every key below is still populated.
    # Returning {} here would make MeetingExtraction(**data) fail on the
    # required "title" field.
    if not isinstance(data, dict):
        data = {}
    return {
        "title": _as_str(data.get("title")),
        "date": _as_str(data.get("date")),
        "participants": _as_str_list(data.get("participants")),
        "agenda": _as_str_list(data.get("agenda")),
        "entities": _normalize_entities(data.get("entities")),
        "relations": _normalize_relations(data.get("relations")),
        "action_items": _normalize_action_items(data.get("action_items")),
        "decisions": _normalize_decisions(data.get("decisions")),
        "metrics": _normalize_metrics(data.get("metrics")),
        "summary": _as_str(data.get("summary")),
    }
def _as_str(value) -> str:
if value is None:
return ""
if isinstance(value, str):
return value
return str(value)
def _as_float(value) -> float:
if value is None or value == "":
return 0.0
try:
numeric = float(value)
return max(0.0, min(1.0, numeric))
except (TypeError, ValueError):
return 0.0
def _as_str_list(value) -> List[str]:
if isinstance(value, dict):
items = []
for key, item in value.items():
key_text = _as_str(key)
value_text = _as_str(item)
if key_text and value_text:
items.append(f"{key_text}: {value_text}")
elif key_text:
items.append(key_text)
elif value_text:
items.append(value_text)
return items
if not isinstance(value, list):
return []
return [_as_str(item) for item in value if item is not None]
def _normalize_entities(value) -> List[dict]:
if not isinstance(value, list):
return []
items = []
for entity in value:
if not isinstance(entity, dict):
continue
items.append(
{
"name": _as_str(entity.get("name")),
"entity_type": _as_str(entity.get("entity_type")),
"description": _as_str(entity.get("description")),
}
)
return items
def _normalize_relations(value) -> List[dict]:
if not isinstance(value, list):
return []
items = []
for relation in value:
if not isinstance(relation, dict):
continue
subject = _as_str(relation.get("subject"))
predicate = _as_str(relation.get("predicate"))
obj = _as_str(relation.get("object"))
description = _as_str(relation.get("description"))
fact = _as_str(relation.get("fact"))
if not fact and subject and predicate and obj:
fact = f"{subject} {predicate} {obj}"
items.append(
{
"subject": subject,
"subject_type": _as_str(relation.get("subject_type")),
"predicate": predicate,
"object": obj,
"object_type": _as_str(relation.get("object_type")),
"description": description,
"fact": fact,
"qualifiers": _as_str_list(relation.get("qualifiers")),
"evidence": _as_str(relation.get("evidence")),
"confidence": _as_float(relation.get("confidence")),
"valid_at": _as_str(relation.get("valid_at")),
"invalid_at": _as_str(relation.get("invalid_at")),
}
)
return items
def _normalize_action_items(value) -> List[dict]:
if not isinstance(value, list):
return []
items = []
for action in value:
if not isinstance(action, dict):
continue
items.append(
{
"task": _as_str(action.get("task")),
"assignee": _as_str(action.get("assignee")),
"deadline": _as_str(action.get("deadline")),
"status": _as_str(action.get("status")) or "待办",
"priority": _as_str(action.get("priority")) or "",
}
)
return items
def _normalize_decisions(value) -> List[dict]:
if not isinstance(value, list):
return []
items = []
for decision in value:
if not isinstance(decision, dict):
continue
items.append(
{
"content": _as_str(decision.get("content")),
"proposer": _as_str(decision.get("proposer")),
"status": _as_str(decision.get("status")) or "已决",
}
)
return items
def _normalize_metrics(value) -> List[dict]:
if not isinstance(value, list):
return []
items = []
for metric in value:
if not isinstance(metric, dict):
continue
items.append(
{
"metric_name": _as_str(metric.get("metric_name")),
"value": _as_str(metric.get("value")),
"target": _as_str(metric.get("target")),
"owner": _as_str(metric.get("owner")),
"trend": _as_str(metric.get("trend")),
}
)
return items
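
Review note: `_try_parse_json` first tries the raw reply and then falls back to the outermost `{...}` span. The following is a minimal, self-contained sketch of that fallback on a reply wrapped in a code fence. `parse_llm_json` and the sample reply are made up for illustration and are not part of the package.

```python
import json
import re


def parse_llm_json(content: str) -> dict:
    """Parse a model reply as JSON, tolerating surrounding prose or fences."""
    try:
        return json.loads(content)
    except json.JSONDecodeError:
        # Greedy match from the first "{" to the last "}" across newlines.
        match = re.search(r"\{.*\}", content, re.DOTALL)
        if match is None:
            raise
        return json.loads(match.group())


# Hypothetical reply: the JSON object is wrapped in prose and a code fence.
FENCE = "`" * 3
fenced_reply = (
    "Here is the result:\n"
    + FENCE
    + 'json\n{"title": "周例会", "participants": ["张三"]}\n'
    + FENCE
)
parsed = parse_llm_json(fenced_reply)
print(parsed["title"])
```

Because the pattern is greedy, a reply containing two separate JSON objects would be captured as one span and still fail to parse, so this fallback only helps when the reply holds a single object.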


@ -0,0 +1,784 @@
import hashlib
import json
import logging
import re
import time
from typing import Any, Dict, List, Optional
from meeting_memory.config import config
from meeting_memory.services.embedding_service import embedding_service
logger = logging.getLogger(__name__)
def _cosine_similarity(left: List[float], right: List[float]) -> float:
if not left or not right or len(left) != len(right):
return 0.0
dot = sum(a * b for a, b in zip(left, right))
left_norm = sum(a * a for a in left) ** 0.5
right_norm = sum(b * b for b in right) ** 0.5
if left_norm == 0 or right_norm == 0:
return 0.0
return dot / (left_norm * right_norm)
def _keyword_score(text: str, question: str) -> float:
source = (text or "").lower()
terms = _keyword_terms(question)
if not source or not terms:
return 0.0
hits = sum(1 for term in terms if term in source)
return hits / len(terms)
def _keyword_terms(text: str) -> List[str]:
normalized = (text or "").lower()
raw_terms = re.findall(r"[a-z0-9]+|[\u4e00-\u9fff]{2,}", normalized)
stopwords = {"是什么", "多少", "分别", "以及", "还有", "当前值", "目标值"}
terms: List[str] = []
for raw in raw_terms:
if raw in stopwords:
continue
if raw not in terms:
terms.append(raw)
if re.fullmatch(r"[\u4e00-\u9fff]{4,}", raw):
for size in (2, 3, 4):
for idx in range(0, len(raw) - size + 1):
piece = raw[idx : idx + size]
if piece not in stopwords and piece not in terms:
terms.append(piece)
return terms
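
Review note: the keyword side of retrieval splits the question into ASCII tokens and runs of Chinese characters, then expands runs of four or more characters into 2- to 4-character n-grams. The sketch below is a standalone copy of that logic for experimentation, assuming the indentation implied by the flattened diff; the names without a leading underscore and the sample question are illustrative.

```python
import re
from typing import List

STOPWORDS = {"是什么", "多少", "分别", "以及", "还有", "当前值", "目标值"}


def keyword_terms(text: str) -> List[str]:
    terms: List[str] = []
    for raw in re.findall(r"[a-z0-9]+|[\u4e00-\u9fff]{2,}", (text or "").lower()):
        if raw in STOPWORDS:
            continue
        if raw not in terms:
            terms.append(raw)
        # Long Chinese runs are unsegmented, so add their 2- to 4-character n-grams.
        if re.fullmatch(r"[\u4e00-\u9fff]{4,}", raw):
            for size in (2, 3, 4):
                for idx in range(len(raw) - size + 1):
                    piece = raw[idx : idx + size]
                    if piece not in STOPWORDS and piece not in terms:
                        terms.append(piece)
    return terms


def keyword_score(text: str, question: str) -> float:
    source = (text or "").lower()
    terms = keyword_terms(question)
    if not source or not terms:
        return 0.0
    # Fraction of question terms that occur as substrings of the text.
    return sum(1 for term in terms if term in source) / len(terms)


terms = keyword_terms("KPI考核指标是什么")
print(terms)
```

Since the denominator counts every generated n-gram, including ones that straddle word boundaries, keyword scores for long Chinese questions tend to sit well below 1 even for relevant text.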
class Neo4jGraphStore:
def __init__(self):
self._driver = None
self._enabled = False
self._uri = config.neo4j.uri
self._last_failure_at = 0.0
self._retry_cooldown_seconds = 10.0
self._connect()
    def _connect(self):
        if not config.neo4j.enabled:
            logger.info("Neo4j graph store disabled")
            return
        try:
            from neo4j import GraphDatabase
        except ImportError:
            logger.warning("neo4j package is not installed")
            return
        if not config.neo4j.password:
            logger.warning("Neo4j is enabled but NEO4J_PASSWORD is empty")
            return
        # Try the configured URI first; for a routing URI also try plain bolt.
        tried_uris = [self._uri]
        if self._uri.startswith("neo4j://"):
            tried_uris.append("bolt://" + self._uri[len("neo4j://") :])
        for uri in tried_uris:
            driver = None
            try:
                driver = GraphDatabase.driver(
                    uri,
                    auth=(config.neo4j.user, config.neo4j.password),
                )
                driver.verify_connectivity()
                self._driver = driver
                self._uri = uri
                self._enabled = True
                self._last_failure_at = 0.0
                if uri != config.neo4j.uri:
                    logger.warning("Neo4j routing URI unavailable; fell back to %s", uri)
                return
            except Exception as exc:
                logger.warning("Neo4j connection failed for %s: %s", uri, exc)
                # driver is still None when GraphDatabase.driver() itself raised.
                if driver is not None:
                    try:
                        driver.close()
                    except Exception:
                        pass
        self._mark_unavailable("Neo4j is currently unreachable")
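    @property
    def enabled(self) -> bool:
        # Retry only when Neo4j is switched on in config; otherwise every
        # access would call _connect() again just to log that it is disabled.
        if not self._enabled and config.neo4j.enabled and self._should_retry_connect():
            self._connect()
        return self._enabled and self._driver is not None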
def _should_retry_connect(self) -> bool:
return (time.time() - self._last_failure_at) >= self._retry_cooldown_seconds
def _mark_unavailable(self, reason: str = "") -> None:
if reason:
logger.warning("Neo4j temporarily disabled: %s", reason)
self._enabled = False
self._last_failure_at = time.time()
if self._driver is not None:
try:
self._driver.close()
except Exception:
pass
self._driver = None
@staticmethod
def meeting_id(meeting_data: dict) -> str:
title = meeting_data.get("title", "")
date = meeting_data.get("date", "")
raw = f"{date}_{title}"
return f"meeting_{hashlib.md5(raw.encode('utf-8')).hexdigest()[:12]}"
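
Review note: the meeting ID is derived only from the date and title, so reprocessing the same meeting maps to the same node, while two different meetings that share a date and title would collide. A minimal standalone sketch of the same derivation follows; the free function and the sample values are illustrative.

```python
import hashlib


def meeting_id(title: str, date: str) -> str:
    # Same recipe as the static method: md5 of "<date>_<title>", first 12 hex chars.
    raw = f"{date}_{title}"
    return f"meeting_{hashlib.md5(raw.encode('utf-8')).hexdigest()[:12]}"


first = meeting_id("周例会", "2026-05-06")
second = meeting_id("周例会", "2026-05-06")
other = meeting_id("周例会", "2026-05-13")
print(first, other)
```

MD5 serves here as a stable fingerprint, not for security. The 12-character prefix keeps IDs short at the cost of a small collision probability.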
def close(self):
if self._driver is not None:
self._driver.close()
def run_query(self, query: str, **params) -> List[Dict[str, Any]]:
if not self.enabled:
return []
try:
with self._driver.session(database=config.neo4j.database) as session:
result = session.run(query, **params)
return [record.data() for record in result]
except Exception as exc:
logger.warning("Neo4j query failed: %s", exc)
self._mark_unavailable(str(exc))
return []
def initialize_schema(self):
if not self.enabled:
return
statements = [
"CREATE CONSTRAINT meeting_id IF NOT EXISTS FOR (m:Meeting) REQUIRE m.meeting_id IS UNIQUE",
"CREATE CONSTRAINT episode_id IF NOT EXISTS FOR (e:Episode) REQUIRE e.episode_id IS UNIQUE",
"CREATE CONSTRAINT entity_name IF NOT EXISTS FOR (e:Entity) REQUIRE e.name IS UNIQUE",
"CREATE CONSTRAINT fact_id IF NOT EXISTS FOR (f:Fact) REQUIRE f.fact_id IS UNIQUE",
"CREATE INDEX meeting_title IF NOT EXISTS FOR (m:Meeting) ON (m.title)",
"CREATE INDEX episode_title IF NOT EXISTS FOR (e:Episode) ON (e.title)",
"CREATE INDEX entity_type IF NOT EXISTS FOR (e:Entity) ON (e.entity_type)",
"CREATE INDEX fact_predicate IF NOT EXISTS FOR (f:Fact) ON (f.predicate)",
]
for statement in statements:
self.run_query(statement)
def get_stats(self) -> Dict[str, Any]:
if not self.enabled:
return {"enabled": False}
rows = self.run_query(
"""
CALL () {
MATCH (m:Meeting)
RETURN count(m) AS meetings
}
CALL () {
MATCH (ep:Episode)
RETURN count(ep) AS episodes
}
CALL () {
MATCH (e:Entity)
RETURN count(e) AS entities
}
CALL () {
MATCH (f:Fact)
RETURN count(f) AS facts
}
RETURN meetings, episodes, entities, facts
"""
)
if not rows:
return {"enabled": False, "meetings": 0, "episodes": 0, "entities": 0, "facts": 0}
return {"enabled": True, **rows[0]}
def upsert_meeting_subgraph(self, meeting_data: dict) -> None:
if not self.enabled:
return
meeting_id = meeting_data.get("_graph_meeting_id") or self.meeting_id(meeting_data)
episode_text = self._build_episode_text(meeting_data)
episode_embedding = embedding_service.embed_text(episode_text)
self.initialize_schema()
self.run_query(
"""
MERGE (m:Meeting {meeting_id: $meeting_id})
SET m.title = $title,
m.date = $date,
m.summary = $summary,
m.content_hash = $content_hash,
m.raw_path = $raw_path,
m.updated_at = datetime()
MERGE (ep:Episode {episode_id: $meeting_id})
SET ep.title = $title,
ep.date = $date,
ep.summary = $summary,
ep.content = $content,
ep.content_hash = $content_hash,
ep.raw_path = $raw_path,
ep.participants = $participants,
ep.content_embedding = $content_embedding,
ep.updated_at = datetime()
MERGE (m)-[:HAS_EPISODE]->(ep)
""",
meeting_id=meeting_id,
title=meeting_data.get("title", ""),
date=meeting_data.get("date", ""),
summary=meeting_data.get("summary", ""),
content_hash=meeting_data.get("_content_hash", ""),
raw_path=meeting_data.get("_original_text_path", ""),
content=episode_text,
participants=meeting_data.get("participants", []),
content_embedding=episode_embedding,
)
for entity in meeting_data.get("entities", []):
self._upsert_entity(meeting_id, entity)
for participant in meeting_data.get("participants", []):
self._upsert_entity(
meeting_id,
{"name": participant, "entity_type": "participant", "description": ""},
)
for relation in meeting_data.get("relations", []):
self._upsert_relation(meeting_id, relation, meeting_data.get("date", ""))
def _upsert_entity(self, meeting_id: str, entity: dict) -> None:
name = entity.get("name", "").strip()
if not name:
return
summary = self._entity_summary(entity)
name_embedding = embedding_service.embed_text(summary or name)
self.run_query(
"""
MATCH (:Meeting {meeting_id: $meeting_id})-[:HAS_EPISODE]->(ep:Episode {episode_id: $meeting_id})
MERGE (e:Entity {name: $name})
SET e.entity_type = CASE
WHEN $entity_type <> '' THEN $entity_type
ELSE coalesce(e.entity_type, '')
END,
e.description = CASE
WHEN $description <> '' THEN $description
ELSE coalesce(e.description, '')
END,
e.summary = CASE
WHEN $summary <> '' THEN $summary
ELSE coalesce(e.summary, '')
END,
e.name_embedding = CASE
WHEN size($name_embedding) > 0 THEN $name_embedding
ELSE coalesce(e.name_embedding, [])
END,
e.updated_at = datetime()
MERGE (ep)-[:MENTIONS]->(e)
""",
meeting_id=meeting_id,
name=name,
entity_type=entity.get("entity_type", ""),
description=entity.get("description", ""),
summary=summary,
name_embedding=name_embedding,
)
def _upsert_relation(self, meeting_id: str, relation: dict, meeting_date: str) -> None:
subject = relation.get("subject", "").strip()
predicate = relation.get("predicate", "").strip()
obj = relation.get("object", "").strip()
if not subject or not predicate or not obj:
return
self._upsert_entity(
meeting_id,
{
"name": subject,
"entity_type": relation.get("subject_type", ""),
"description": "",
},
)
self._upsert_entity(
meeting_id,
{
"name": obj,
"entity_type": relation.get("object_type", ""),
"description": "",
},
)
fact_text = self._fact_text(relation)
fact_id = hashlib.md5(
f"{meeting_id}|{subject}|{predicate}|{obj}".encode("utf-8")
).hexdigest()
fact_embedding = embedding_service.embed_text(fact_text)
self.run_query(
"""
MATCH (:Meeting {meeting_id: $meeting_id})-[:HAS_EPISODE]->(ep:Episode {episode_id: $meeting_id})
MATCH (s:Entity {name: $subject})
MATCH (o:Entity {name: $object})
MERGE (f:Fact {fact_id: $fact_id})
SET f.fact = $fact,
f.predicate = $predicate,
f.description = $description,
f.qualifiers = $qualifiers,
f.evidence = $evidence,
f.confidence = $confidence,
f.valid_at = $valid_at,
f.invalid_at = $invalid_at,
f.meeting_id = $meeting_id,
f.meeting_date = $meeting_date,
f.fact_embedding = $fact_embedding,
f.updated_at = datetime()
MERGE (ep)-[:HAS_FACT]->(f)
MERGE (s)-[:FACT_SOURCE]->(f)
MERGE (f)-[:FACT_TARGET]->(o)
""",
meeting_id=meeting_id,
subject=subject,
predicate=predicate,
object=obj,
fact_id=fact_id,
fact=fact_text,
description=relation.get("description", ""),
qualifiers=relation.get("qualifiers", []),
evidence=relation.get("evidence", ""),
confidence=relation.get("confidence", 0.0),
valid_at=relation.get("valid_at", ""),
invalid_at=relation.get("invalid_at", ""),
meeting_date=meeting_date,
fact_embedding=fact_embedding,
)
def remove_meeting_subgraph(self, meeting_id: str) -> None:
if not self.enabled:
return
self.run_query(
"""
MATCH (m:Meeting {meeting_id: $meeting_id})-[:HAS_EPISODE]->(ep:Episode {episode_id: $meeting_id})
OPTIONAL MATCH (ep)-[mention:MENTIONS]->(entity:Entity)
OPTIONAL MATCH (ep)-[has_fact:HAS_FACT]->(fact:Fact)
OPTIONAL MATCH (fact)-[target_rel:FACT_TARGET]->(:Entity)
OPTIONAL MATCH (:Entity)-[source_rel:FACT_SOURCE]->(fact)
DELETE mention, has_fact, target_rel, source_rel
WITH m, ep, collect(DISTINCT fact) AS facts, collect(DISTINCT entity) AS entities
FOREACH (fact IN facts | DELETE fact)
DELETE ep, m
WITH entities
UNWIND entities AS entity
WITH DISTINCT entity WHERE entity IS NOT NULL
OPTIONAL MATCH (entity)<-[m1:MENTIONS]-(:Episode)
OPTIONAL MATCH (entity)-[m2:FACT_SOURCE|FACT_TARGET]-(:Fact)
WITH entity, count(m1) + count(m2) AS refs
WHERE refs = 0
DELETE entity
""",
meeting_id=meeting_id,
)
def get_meeting(self, title: str, date: str = "") -> Optional[Dict[str, Any]]:
if not self.enabled:
return None
rows = self.run_query(
"""
MATCH (m:Meeting)
WHERE m.title = $title
AND ($date = '' OR m.date = $date)
RETURN m.meeting_id AS meeting_id,
m.title AS title,
m.date AS date,
m.summary AS summary,
m.content_hash AS content_hash
LIMIT 1
""",
title=title,
date=date,
)
return rows[0] if rows else None
def find_similar_episode(self, text: str, threshold: float = 0.92) -> Optional[Dict[str, Any]]:
if not self.enabled or not text.strip():
return None
query_embedding = embedding_service.embed_text(text)
rows = self.run_query(
"""
MATCH (m:Meeting)-[:HAS_EPISODE]->(ep:Episode)
RETURN m.meeting_id AS meeting_id,
m.title AS title,
m.date AS date,
m.content_hash AS content_hash,
ep.content_embedding AS content_embedding
"""
)
best_match = None
for row in rows:
score = _cosine_similarity(query_embedding, row.get("content_embedding", []))
if score >= threshold and (best_match is None or score > best_match["score"]):
best_match = {
"metadata": {
"meeting_id": row.get("meeting_id", ""),
"title": row.get("title", ""),
"date": row.get("date", ""),
"content_hash": row.get("content_hash", ""),
},
"score": score,
}
return best_match
def hybrid_search(self, question: str, limit: int = 5) -> List[Dict[str, Any]]:
if not self.enabled or not question.strip():
return []
query_embedding = embedding_service.embed_text(question)
candidates = self._load_fact_candidates()
candidates.extend(self._load_entity_candidates())
candidates.extend(self._load_episode_candidates())
scored = []
for item in candidates:
combined_text = " ".join(
[
str(item.get("title") or ""),
str(item.get("text") or ""),
str(item.get("meeting_title") or ""),
str(item.get("date") or ""),
]
)
semantic = _cosine_similarity(query_embedding, item.get("embedding", []))
lexical = _keyword_score(combined_text, question)
graph_bonus = 0.1 if item.get("kind") == "fact" else 0.05
score = semantic * 0.7 + lexical * 0.2 + graph_bonus
if score <= 0:
continue
scored.append(
{
**item,
"score": round(score, 4),
"semantic_score": round(semantic, 4),
"keyword_score": round(lexical, 4),
}
)
scored.sort(key=lambda row: row["score"], reverse=True)
return scored[:limit]
def search_facts(self, question: str, limit: int = 5) -> List[Dict[str, Any]]:
return self.hybrid_search(question, limit=limit)
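
Review note: `hybrid_search` ranks candidates by `0.7 * semantic + 0.2 * keyword + bonus`, where the bonus is 0.1 for facts and 0.05 for everything else. The standalone sketch below reproduces that arithmetic so the weighting can be tried out without Neo4j or an embedding service. The function names and inputs are illustrative, and `math.sqrt` stands in for `** 0.5`.

```python
import math
from typing import List


def cosine_similarity(left: List[float], right: List[float]) -> float:
    # Mismatched or empty vectors count as "no similarity", as in the module.
    if not left or not right or len(left) != len(right):
        return 0.0
    dot = sum(a * b for a, b in zip(left, right))
    left_norm = math.sqrt(sum(a * a for a in left))
    right_norm = math.sqrt(sum(b * b for b in right))
    if left_norm == 0 or right_norm == 0:
        return 0.0
    return dot / (left_norm * right_norm)


def hybrid_score(semantic: float, lexical: float, kind: str) -> float:
    # Facts get a slightly larger constant bonus than entities or episodes.
    graph_bonus = 0.1 if kind == "fact" else 0.05
    return round(semantic * 0.7 + lexical * 0.2 + graph_bonus, 4)


print(hybrid_score(cosine_similarity([1.0, 0.0], [1.0, 0.0]), 1.0, "fact"))
print(hybrid_score(0.0, 0.0, "entity"))
```

The bonus is added unconditionally, so a candidate with zero semantic and zero keyword match still scores 0.05 or 0.1 and passes the `score <= 0` filter. Whether that is intended may be worth confirming.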
def get_graph_kinds(self) -> List[Dict[str, Any]]:
if not self.enabled:
return []
rows = self.run_query(
"""
MATCH (n)
WHERE n:Meeting OR n:Episode OR n:Entity OR n:Fact
WITH [lbl IN labels(n) WHERE lbl IN ['Meeting','Episode','Entity','Fact']][0] AS kind
RETURN kind, count(*) AS count
ORDER BY count DESC
"""
)
return rows
def get_entity_types(self) -> List[Dict[str, Any]]:
if not self.enabled:
return []
return self.run_query(
"""
MATCH (e:Entity)
WHERE coalesce(e.entity_type, '') <> ''
RETURN e.entity_type AS entity_type, count(*) AS count
ORDER BY count DESC
"""
)
def get_graph_snapshot(
self,
query: str = "",
entity_types: Optional[List[str]] = None,
kinds: Optional[List[str]] = None,
limit_nodes: int = 80,
limit_edges: int = 160,
) -> Dict[str, Any]:
if not self.enabled:
return {"nodes": [], "edges": [], "stats": {"enabled": False}}
keyword_terms = _keyword_terms(query) if query else []
raw_nodes = self.run_query(
"""
MATCH (n)
WHERE (n:Meeting OR n:Episode OR n:Entity OR n:Fact)
AND ($kinds = [] OR [lbl IN labels(n) WHERE lbl IN ['Meeting','Episode','Entity','Fact']][0] IN $kinds)
AND ($terms = []
OR (n:Meeting AND any(t IN $terms WHERE toLower(coalesce(n.title,'')) CONTAINS t OR toLower(coalesce(n.summary,'')) CONTAINS t))
OR (n:Episode AND any(t IN $terms WHERE toLower(coalesce(n.title,'')) CONTAINS t OR toLower(coalesce(n.content,'')) CONTAINS t))
OR (n:Entity AND any(t IN $terms WHERE toLower(coalesce(n.name,'')) CONTAINS t OR toLower(coalesce(n.summary,'')) CONTAINS t OR toLower(coalesce(n.description,'')) CONTAINS t))
OR (n:Fact AND any(t IN $terms WHERE toLower(coalesce(n.fact,'')) CONTAINS t OR toLower(coalesce(n.predicate,'')) CONTAINS t OR toLower(coalesce(n.description,'')) CONTAINS t))
)
AND ($types = [] OR NOT n:Entity OR coalesce(n.entity_type, '') IN $types)
OPTIONAL MATCH (n)-[r]-()
RETURN n.meeting_id AS meeting_id,
n.episode_id AS episode_id,
n.name AS entity_name,
n.fact_id AS fact_id,
n.title AS title,
n.summary AS summary,
n.date AS date,
n.entity_type AS entity_type,
n.description AS description,
n.predicate AS predicate,
n.fact AS fact,
n.confidence AS confidence,
n.meeting_date AS meeting_date,
[lbl IN labels(n) WHERE lbl IN ['Meeting','Episode','Entity','Fact']][0] AS kind,
count(DISTINCT r) AS degree
ORDER BY degree DESC, coalesce(n.title, n.name, n.fact) ASC
LIMIT $limit_nodes
""",
terms=keyword_terms,
types=entity_types or [],
kinds=kinds or [],
limit_nodes=limit_nodes,
)
if not raw_nodes:
return {"nodes": [], "edges": [], "stats": self.get_stats()}
all_raw_ids = set()
nodes = []
for row in raw_nodes:
kind = row.get("kind", "")
if kind == "Meeting":
raw_id = row.get("meeting_id", "")
label = row.get("title", "") or raw_id
elif kind == "Episode":
raw_id = row.get("episode_id", "")
label = row.get("title", "") or raw_id
elif kind == "Entity":
raw_id = row.get("entity_name", "")
label = raw_id
elif kind == "Fact":
raw_id = row.get("fact_id", "")
label = row.get("predicate", "") or row.get("fact", "") or raw_id
else:
continue
if not raw_id:
continue
nid = f"{kind}:{raw_id}"
all_raw_ids.add(raw_id)
nodes.append({
"id": nid,
"label": label,
"kind": kind,
"entity_type": row.get("entity_type", "") if kind == "Entity" else "",
"description": row.get("description", "") or row.get("summary", "") or "",
"date": row.get("date", "") or row.get("meeting_date", "") or "",
"degree": row.get("degree", 0),
"fact": row.get("fact", "") if kind == "Fact" else "",
"summary": row.get("summary", "") or "",
})
if not nodes:
return {"nodes": [], "edges": [], "stats": self.get_stats()}
ids_list = list(all_raw_ids)
edges_raw = self.run_query(
"""
MATCH (s)-[r]->(t)
WHERE type(r) IN ['HAS_EPISODE','MENTIONS','HAS_FACT','FACT_SOURCE','FACT_TARGET']
AND (
(s:Meeting AND s.meeting_id IN $ids)
OR (s:Episode AND s.episode_id IN $ids)
OR (s:Entity AND s.name IN $ids)
OR (s:Fact AND s.fact_id IN $ids)
)
AND (
(t:Meeting AND t.meeting_id IN $ids)
OR (t:Episode AND t.episode_id IN $ids)
OR (t:Entity AND t.name IN $ids)
OR (t:Fact AND t.fact_id IN $ids)
)
RETURN type(r) AS predicate,
CASE WHEN s:Meeting THEN s.meeting_id
WHEN s:Episode THEN s.episode_id
WHEN s:Entity THEN s.name
WHEN s:Fact THEN s.fact_id END AS source_raw,
CASE WHEN t:Meeting THEN t.meeting_id
WHEN t:Episode THEN t.episode_id
WHEN t:Entity THEN t.name
WHEN t:Fact THEN t.fact_id END AS target_raw,
CASE WHEN s:Meeting THEN 'Meeting' WHEN s:Episode THEN 'Episode'
WHEN s:Entity THEN 'Entity' WHEN s:Fact THEN 'Fact' END AS source_kind,
CASE WHEN t:Meeting THEN 'Meeting' WHEN t:Episode THEN 'Episode'
WHEN t:Entity THEN 'Entity' WHEN t:Fact THEN 'Fact' END AS target_kind,
CASE WHEN s:Fact THEN coalesce(s.predicate, '')
WHEN t:Fact THEN coalesce(t.predicate, '') ELSE '' END AS fact_predicate,
CASE WHEN s:Fact THEN coalesce(s.fact, '')
WHEN t:Fact THEN coalesce(t.fact, '') ELSE '' END AS fact_text,
CASE WHEN s:Fact THEN coalesce(s.description, '')
WHEN t:Fact THEN coalesce(t.description, '') ELSE '' END AS fact_description,
CASE WHEN s:Fact THEN coalesce(s.confidence, 0.0)
WHEN t:Fact THEN coalesce(t.confidence, 0.0) ELSE 0.0 END AS fact_confidence,
CASE WHEN s:Fact THEN coalesce(s.meeting_date, '')
WHEN t:Fact THEN coalesce(t.meeting_date, '') ELSE '' END AS fact_date,
CASE WHEN s:Fact THEN coalesce(s.meeting_id, '')
WHEN t:Fact THEN coalesce(t.meeting_id, '') ELSE '' END AS fact_meeting_id
LIMIT $limit_edges
""",
ids=list(all_raw_ids),
limit_edges=limit_edges,
)
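        # Recompute degrees from the returned edges. The query exposes
        # source_raw/target_raw and source_kind/target_kind (there are no
        # "source"/"target" columns), so rebuild the same "Kind:raw_id" keys
        # that the nodes use as ids.
        degree_map: Dict[str, int] = {}
        for row in edges_raw:
            src = f"{row.get('source_kind', '')}:{row.get('source_raw', '')}"
            tgt = f"{row.get('target_kind', '')}:{row.get('target_raw', '')}"
            degree_map[src] = degree_map.get(src, 0) + 1
            degree_map[tgt] = degree_map.get(tgt, 0) + 1
        for node in nodes:
            node["degree"] = degree_map.get(node["id"], node.get("degree", 0))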
edges = []
for idx, row in enumerate(edges_raw, start=1):
sk = row.get("source_kind", "")
tk = row.get("target_kind", "")
edges.append({
"id": f"edge_{idx}",
"source": f"{sk}:{row['source_raw']}" if sk and row.get("source_raw") else "",
"target": f"{tk}:{row['target_raw']}" if tk and row.get("target_raw") else "",
"predicate": row.get("predicate", ""),
"fact": row.get("fact_text", "") or row.get("fact_description", "") or "",
"description": row.get("fact_description", "") or "",
"confidence": row.get("fact_confidence", 0.0),
"date": row.get("fact_date", "") or "",
"meeting_id": row.get("fact_meeting_id", "") or "",
})
return {
"nodes": nodes,
"edges": edges,
"stats": self.get_stats(),
"query": query,
}
def format_search_context(self, question: str, top_k: int = 5) -> str:
results = self.hybrid_search(question, limit=top_k)
if not results:
return ""
lines = []
for idx, row in enumerate(results, start=1):
date = row.get("date", "")
meeting_title = row.get("meeting_title", "")
title = row.get("title", row.get("kind", "item"))
suffix = f" ({date})" if date else ""
source = f" | 来源会议: {meeting_title}" if meeting_title else ""
lines.append(
f"[{idx}] {title}{suffix}{source}\n"
f"{row.get('text', '')}\n"
f"score={row.get('score', 0):.4f}, semantic={row.get('semantic_score', 0):.4f}, keyword={row.get('keyword_score', 0):.4f}"
)
return "\n\n".join(lines)
def _load_fact_candidates(self) -> List[Dict[str, Any]]:
return self.run_query(
"""
MATCH (ep:Episode)-[:HAS_FACT]->(f:Fact)
OPTIONAL MATCH (s:Entity)-[:FACT_SOURCE]->(f)
OPTIONAL MATCH (f)-[:FACT_TARGET]->(o:Entity)
RETURN 'fact' AS kind,
coalesce(s.name + ' -[' + coalesce(f.predicate, '') + ']-> ' + o.name, f.fact) AS title,
coalesce(
f.description + CASE
WHEN size(coalesce(f.qualifiers, [])) > 0 THEN ' | ' + reduce(acc = '', item IN f.qualifiers |
acc + CASE WHEN acc = '' THEN item ELSE '; ' + item END
)
ELSE ''
END,
f.fact,
''
) AS text,
ep.date AS date,
ep.title AS meeting_title,
f.fact_embedding AS embedding
"""
)
def _load_entity_candidates(self) -> List[Dict[str, Any]]:
return self.run_query(
"""
MATCH (e:Entity)
OPTIONAL MATCH (ep:Episode)-[:MENTIONS]->(e)
RETURN 'entity' AS kind,
e.name AS title,
coalesce(e.summary, e.description, '') AS text,
max(ep.date) AS date,
head(collect(DISTINCT ep.title)) AS meeting_title,
e.name_embedding AS embedding
"""
)
def _load_episode_candidates(self) -> List[Dict[str, Any]]:
return self.run_query(
"""
MATCH (m:Meeting)-[:HAS_EPISODE]->(ep:Episode)
RETURN 'episode' AS kind,
m.title AS title,
coalesce(ep.summary, ep.content, '') AS text,
ep.date AS date,
m.title AS meeting_title,
ep.content_embedding AS embedding
"""
)
@staticmethod
def _entity_summary(entity: dict) -> str:
# Use "or" so that explicit None values do not break .strip().
entity_type = (entity.get("entity_type") or "").strip()
name = (entity.get("name") or "").strip()
description = (entity.get("description") or "").strip()
parts = [part for part in [entity_type, name, description] if part]
return " | ".join(parts)
@staticmethod
def _fact_text(relation: dict) -> str:
# Use "or" so that explicit None values do not break .strip() or iteration.
subject = (relation.get("subject") or "").strip()
predicate = (relation.get("predicate") or "").strip()
obj = (relation.get("object") or "").strip()
description = (relation.get("description") or "").strip()
fact = (relation.get("fact") or "").strip() or f"{subject} {predicate} {obj}".strip()
qualifiers = relation.get("qualifiers") or []
qualifier_text = "; ".join(item for item in qualifiers if item)
if description and qualifier_text:
return f"{fact}. {description}. {qualifier_text}"
if description:
return f"{fact}. {description}"
if qualifier_text:
return f"{fact}. {qualifier_text}"
return fact
@staticmethod
def _build_episode_text(meeting_data: dict) -> str:
payload = {
"title": meeting_data.get("title", ""),
"date": meeting_data.get("date", ""),
"participants": meeting_data.get("participants", []),
"summary": meeting_data.get("summary", ""),
"entities": meeting_data.get("entities", []),
"relations": meeting_data.get("relations", []),
"action_items": meeting_data.get("action_items", []),
"metrics": meeting_data.get("metrics", []),
"decisions": meeting_data.get("decisions", []),
"original_text": meeting_data.get("_original_text", ""),
}
return json.dumps(payload, ensure_ascii=False)
graph_store = Neo4jGraphStore()
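
A minimal standalone sketch of the text composition performed by `_fact_text` above, handy for checking which string is produced for a given relation. The function name `compose_fact_text` and the sample relation are hypothetical and not part of this diff; the None-safe lookups are an assumption about how missing fields should be handled.

```python
from typing import Any, Dict


def compose_fact_text(relation: Dict[str, Any]) -> str:
    """Mirror of the _fact_text branching, with None-safe lookups (assumption)."""
    subject = (relation.get("subject") or "").strip()
    predicate = (relation.get("predicate") or "").strip()
    obj = (relation.get("object") or "").strip()
    description = (relation.get("description") or "").strip()
    # Fall back to "subject predicate object" when no explicit fact text exists.
    fact = (relation.get("fact") or "").strip() or f"{subject} {predicate} {obj}".strip()
    qualifiers = relation.get("qualifiers") or []
    qualifier_text = "; ".join(item for item in qualifiers if item)
    if description and qualifier_text:
        return f"{fact}. {description}. {qualifier_text}"
    if description:
        return f"{fact}. {description}"
    if qualifier_text:
        return f"{fact}. {qualifier_text}"
    return fact


# Hypothetical sample relation, for illustration only.
sample = {
    "subject": "Alice",
    "predicate": "owns",
    "object": "Q3 roadmap",
    "description": "Agreed in the weekly sync",
    "qualifiers": ["due 2024-09-30", ""],
}
print(compose_fact_text(sample))
```

Empty qualifiers are dropped before joining, so the trailing empty string in the sample does not produce a dangling separator.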


@ -0,0 +1,184 @@
import hashlib
import logging
from typing import Callable, Optional
from meeting_memory.config import config
from meeting_memory.extractor import MeetingExtraction, extract_meeting_info
from meeting_memory.graph_store import graph_store
from meeting_memory.meeting_state import MeetingStateStore
from meeting_memory.raw_store import raw_meeting_store
logger = logging.getLogger(__name__)
state_store = MeetingStateStore(config.state_path)
ProgressCallback = Callable[[int, int, str], None]
class MeetingProcessor:
def process_meeting_file(self, filepath: str, force: bool = False) -> Optional[str]:
with open(filepath, "r", encoding="utf-8") as file_obj:
text = file_obj.read()
return self.process_meeting_text(text, force=force)
def process_meeting_text(
self,
text: str,
force: bool = False,
interactive: bool = True,
progress_callback: Optional[ProgressCallback] = None,
) -> Optional[str]:
def report(step: int, message: str) -> None:
if progress_callback:
progress_callback(step, 7, message)
print(f"[{step}/7] {message}")
report(1, "计算内容哈希")
content_hash = self._compute_content_hash(text)
if not force and state_store.has_content_hash(content_hash):
logger.info("Duplicate content hash skipped: %s", content_hash[:12])
return None
if not force:
report(2, "Neo4j 语义相似去重检索")
similar = graph_store.find_similar_episode(text, threshold=0.92)
if similar:
meta = similar["metadata"]
if not interactive:
logger.info(
"Skipped similar meeting in non-interactive mode: %s",
meta.get("title", ""),
)
return None
print(
f"\n发现相似会议:{meta.get('title', '')} ({meta.get('date', '')}) "
f"相似度 {similar['score']:.2%}"
)
while True:
choice = input("选择 [s]跳过 / [o]覆盖(默认 s):").strip().lower() or "s"
if choice == "s":
logger.info("Skipped similar meeting: %s", meta.get("title", ""))
return None
if choice == "o":
force = True
break
print("请输入 s 或 o。")
else:
report(2, "跳过语义去重,按覆盖模式继续")
report(3, "调用大模型抽取结构化信息")
meeting_data = self._extract(text)
if not meeting_data:
logger.error("Failed to extract meeting information")
return None
data_dict = meeting_data.model_dump()
data_dict["_content_hash"] = content_hash
data_dict["_graph_meeting_id"] = graph_store.meeting_id(data_dict)
report(4, "检查标题和日期重复")
should_skip = self._handle_duplicate(data_dict, force=force, interactive=interactive)
if should_skip:
return None
meeting_title = data_dict.get("title", "")
meeting_date = data_dict.get("date", "")
report(5, "归档原始会议文本")
raw_path = raw_meeting_store.save(text, title=meeting_title, date=meeting_date)
data_dict["_original_text"] = text
data_dict["_original_text_path"] = raw_path
meeting_filename = f"{graph_store.meeting_id(data_dict)}.md"
report(6, "合并行动项和指标状态")
data_dict["action_items"] = state_store.merge_action_items(
data_dict.get("action_items", []),
meeting_title,
meeting_date,
meeting_filename,
)
data_dict["metrics"] = state_store.merge_metrics(
data_dict.get("metrics", []),
meeting_title,
meeting_date,
meeting_filename,
)
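report(7, "写入 Neo4j 图谱和检索数据")
graph_store.upsert_meeting_subgraph(data_dict)
# Persist the content hash only after the graph write succeeds,
# so a failed import is not treated as a duplicate on retry.
state_store.add_content_hash(content_hash, meeting_title, meeting_date, meeting_filename)
state_store.save()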
logger.info("Meeting processed: %s", meeting_title)
return raw_path
def _handle_duplicate(self, data_dict: dict, force: bool, interactive: bool = True) -> bool:
title = data_dict.get("title", "")
date = data_dict.get("date", "")
existing = graph_store.get_meeting(title, date)
if not existing:
return False
if force:
logger.info("Duplicate meeting found; overwriting in force mode: %s", title)
self._remove_old(data_dict, existing)
return False
if not interactive:
logger.info("Skipped duplicate meeting in non-interactive mode: %s", title)
return True
print(f"\n发现重复会议:{title} ({date})")
while True:
choice = input("选择 [s]跳过 / [o]覆盖(默认 s):").strip().lower() or "s"
if choice == "s":
logger.info("Skipped duplicate meeting: %s", title)
return True
if choice == "o":
self._remove_old(data_dict, existing)
return False
print("请输入 s 或 o。")
def _remove_old(self, data_dict: dict, existing: Optional[dict] = None) -> None:
meeting_id = graph_store.meeting_id(data_dict)
graph_store.remove_meeting_subgraph(meeting_id)
new_hash = data_dict.get("_content_hash", "")
if new_hash:
state_store.remove_content_hash(new_hash)
if existing:
old_hash = existing.get("content_hash", "")
if old_hash and old_hash != new_hash:
state_store.remove_content_hash(old_hash)
logger.info("Removed old meeting artifacts: %s", data_dict.get("title", ""))
def _compute_content_hash(self, text: str) -> str:
normalized = text.strip().replace("\r\n", "\n")
return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
def _extract(self, text: str) -> Optional[MeetingExtraction]:
try:
return extract_meeting_info(text, stream=True)
except Exception as exc:
logger.error("LLM extraction failed: %s", exc)
return None
def query(self, question: str, top_k: int = 3) -> str:
return graph_store.format_search_context(question, top_k=top_k)
def stats(self) -> dict:
return {
"graph": graph_store.get_stats(),
"state": state_store.get_stats(),
"raw_dir": config.storage.raw_dir,
"state_path": config.state_path,
}
meeting_processor = MeetingProcessor()
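
A small sketch of the exact-duplicate check that `_compute_content_hash` enables, assuming the same normalization as above (trim, then convert CRLF to LF). The function name and sample strings are hypothetical. It illustrates that the same meeting text saved with Windows or Unix line endings maps to one hash, while any real edit changes it.

```python
import hashlib


def compute_content_hash(text: str) -> str:
    """Same normalization as _compute_content_hash: trim, then unify CRLF to LF."""
    normalized = text.strip().replace("\r\n", "\n")
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Hypothetical sample inputs, for illustration only.
windows_copy = "Weekly sync\r\n- item one\r\n"
unix_copy = "Weekly sync\n- item one\n"
edited_copy = "Weekly sync\n- item two\n"

same = compute_content_hash(windows_copy) == compute_content_hash(unix_copy)
changed = compute_content_hash(unix_copy) != compute_content_hash(edited_copy)
print(same, changed)
```

Lone `\r` characters and internal whitespace differences are not normalized, so near-identical text still falls through to the semantic similarity check in step 2.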

View File

@ -2,8 +2,8 @@ import hashlib
import json
import logging
import os
from datetime import datetime
from typing import Dict, List, Optional
import re
from typing import List, Optional
logger = logging.getLogger(__name__)
@ -28,8 +28,8 @@ class MeetingStateStore:
try:
with open(self.state_path, "r", encoding="utf-8") as f:
return json.load(f)
except Exception as e:
logger.warning(f"加载状态文件失败,将创建新状态: {e}")
except Exception as exc:
logger.warning("Failed to load state file, creating a new one: %s", exc)
return {
"action_items": {},
"metrics": {},
@ -55,11 +55,10 @@ class MeetingStateStore:
return series_name
def _detect_series(self, title: str) -> str:
import re
cleaned = re.sub(r"(\d{4}第\w+期)", "", title)
cleaned = re.sub(r"\(\d{4}第\w+期\)", "", cleaned)
cleaned = re.sub(r"\d{4}第\w+期", "", cleaned)
cleaned = re.sub(r"\d{4}年第\w+次", "", cleaned)
cleaned = re.sub(r"\uFF08\d{4}\u7B2C\w+\u671F\uFF09", "", title)
cleaned = re.sub(r"\(\d{4}\u7B2C\w+\u671F\)", "", cleaned)
cleaned = re.sub(r"\d{4}\u7B2C\w+\u671F", "", cleaned)
cleaned = re.sub(r"\d{4}\u5E74\u7B2C\w+\u6B21", "", cleaned)
cleaned = cleaned.strip("-_ ")
return cleaned or title
@ -122,26 +121,25 @@ class MeetingStateStore:
) -> List[dict]:
merged = []
for m in new_metrics:
metric_name = m.get("metric_name", "")
owner = m.get("owner", "")
for metric in new_metrics:
metric_name = metric.get("metric_name", "")
owner = metric.get("owner", "")
mid = _metric_id(metric_name, owner)
history_entry = {
"date": meeting_date,
"meeting": meeting_filename,
"value": m.get("value", ""),
"target": m.get("target", ""),
"trend": m.get("trend", ""),
"value": metric.get("value", ""),
"target": metric.get("target", ""),
"trend": metric.get("trend", ""),
}
existing = self._state["metrics"].get(mid)
if existing:
existing["history"].append(history_entry)
existing["latest"] = history_entry
item = m
item["_metric_id"] = mid
item["_history"] = list(existing["history"])
metric["_metric_id"] = mid
metric["_history"] = list(existing["history"])
else:
self._state["metrics"][mid] = {
"metric_id": mid,
@ -150,10 +148,10 @@ class MeetingStateStore:
"history": [history_entry],
"latest": history_entry,
}
m["_metric_id"] = mid
m["_history"] = [history_entry]
metric["_metric_id"] = mid
metric["_history"] = [history_entry]
merged.append(m)
merged.append(metric)
return merged
@ -186,4 +184,4 @@ class MeetingStateStore:
"metrics_tracked": len(self._state["metrics"]),
"meeting_series": len(self._state["meeting_series"]),
"content_hashes": len(self._state["content_hashes"]),
}
}

View File

@ -0,0 +1,55 @@
import logging
import os
from datetime import datetime
from meeting_memory.config import config
logger = logging.getLogger(__name__)
def _sanitize_filename(name: str) -> str:
if not name:
return "untitled"
invalid = '<>:"/\\|?*'
for char in invalid:
name = name.replace(char, "")
name = name.replace(" ", "_").strip("._")
return name or "untitled"
class RawMeetingStore:
def __init__(self):
self.raw_dir = config.storage.raw_dir
os.makedirs(self.raw_dir, exist_ok=True)
def save(self, text: str, title: str = "", date: str = "") -> str:
os.makedirs(self.raw_dir, exist_ok=True)
date_str = date or datetime.now().strftime("%Y-%m-%d")
safe_date = _sanitize_filename(date_str)[:40]
safe_title = _sanitize_filename(title)[:60]
filename = f"{safe_date}_{safe_title}.md"
filepath = os.path.join(self.raw_dir, filename)
content = "\n".join(
[
"---",
f'title: "{title}"',
f'date: "{date_str}"',
"status: archived",
"---",
"",
f"# {title or 'Untitled Meeting'}",
"",
text,
"",
]
)
with open(filepath, "w", encoding="utf-8") as f:
f.write(content)
logger.info("Saved raw meeting text: %s", filepath)
return filepath
raw_meeting_store = RawMeetingStore()
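
A standalone sketch of how the archive file name is derived in `RawMeetingStore.save`, assuming the same sanitizing rules as `_sanitize_filename` above. The helper name `archive_filename` and the sample titles are hypothetical and not part of this diff.

```python
from datetime import datetime

INVALID_CHARS = '<>:"/\\|?*'


def sanitize_filename(name: str) -> str:
    """Drop characters that are invalid in file names and replace spaces with underscores."""
    if not name:
        return "untitled"
    for char in INVALID_CHARS:
        name = name.replace(char, "")
    name = name.replace(" ", "_").strip("._")
    return name or "untitled"


def archive_filename(title: str = "", date: str = "") -> str:
    """Build '<date>_<title>.md' with the same length limits as save()."""
    date_str = date or datetime.now().strftime("%Y-%m-%d")
    return f"{sanitize_filename(date_str)[:40]}_{sanitize_filename(title)[:60]}.md"


# Hypothetical sample titles, for illustration only.
print(archive_filename("Q3 review: plan/budget?", "2024-05-01"))
print(archive_filename("???", "2024-05-01"))
```

Because the name depends only on date and title, two different meetings with the same title and date would write to the same path; that may be worth keeping in mind when relying on the overwrite flow.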


@ -0,0 +1,3 @@
from meeting_memory.services.embedding_service import EmbeddingService, embedding_service
__all__ = ["EmbeddingService", "embedding_service"]


@ -0,0 +1,29 @@
from typing import List, Optional
from openai import OpenAI as OpenAIClient
from meeting_memory.config import config
class EmbeddingService:
def __init__(
self,
model: Optional[str] = None,
api_key: Optional[str] = None,
api_base: Optional[str] = None,
):
self._client = OpenAIClient(
api_key=api_key or config.embedding.api_key or "not-needed",
base_url=api_base or config.embedding.api_base or None,
)
self._model = model or config.embedding.model
def embed_text(self, text: str) -> List[float]:
response = self._client.embeddings.create(
model=self._model,
input=text,
)
return response.data[0].embedding
embedding_service = EmbeddingService()


@ -0,0 +1,3 @@
from meeting_memory.web_demo.server import run_demo_server
__all__ = ["run_demo_server"]


@ -0,0 +1,400 @@
import json
import logging
import mimetypes
import sys
import threading
import time
import uuid
from http import HTTPStatus
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path
from urllib.parse import parse_qs, urlparse
if __package__ in (None, ""):
sys.path.insert(0, str(Path(__file__).resolve().parents[2]))
from meeting_memory.config import config
from meeting_memory.graph_store import graph_store
from meeting_memory.meeting_processor import meeting_processor, state_store
logger = logging.getLogger(__name__)
STATIC_DIR = Path(__file__).resolve().parent / "static"
RAW_DIR = Path(config.storage.raw_dir)
IMPORT_JOBS = {}
IMPORT_JOBS_LOCK = threading.Lock()
class GraphDemoHandler(SimpleHTTPRequestHandler):
def __init__(self, *args, **kwargs):
super().__init__(*args, directory=str(STATIC_DIR), **kwargs)
def do_GET(self):
parsed = urlparse(self.path)
if parsed.path == "/api/dashboard":
self._handle_dashboard()
return
if parsed.path == "/api/graph":
self._handle_graph(parsed.query)
return
if parsed.path == "/api/graph-types":
self._handle_graph_types()
return
if parsed.path == "/api/graph-kinds":
self._handle_graph_kinds()
return
if parsed.path == "/api/search":
self._handle_search(parsed.query)
return
if parsed.path == "/api/meetings":
self._handle_meetings(parsed.query)
return
if parsed.path == "/api/meeting":
self._handle_meeting(parsed.query)
return
if parsed.path == "/api/import-status":
self._handle_import_status(parsed.query)
return
if parsed.path in ("/", "/index.html"):
self.path = "/index.html"
elif parsed.path == "/graph":
self.path = "/graph.html"
super().do_GET()
def do_POST(self):
parsed = urlparse(self.path)
if parsed.path == "/api/import":
self._handle_import()
return
self.send_error(HTTPStatus.NOT_FOUND, "Unsupported endpoint")
def log_message(self, format, *args):
logger.info("%s - %s", self.address_string(), format % args)
def end_headers(self):
self.send_header("Cache-Control", "no-store")
super().end_headers()
def guess_type(self, path):
guessed = super().guess_type(path)
if guessed == "application/octet-stream":
return mimetypes.guess_type(path)[0] or guessed
return guessed
def _handle_graph(self, raw_query: str):
params = parse_qs(raw_query)
query = (params.get("q") or [""])[0].strip()
limit_nodes = self._safe_int((params.get("limit_nodes") or ["80"])[0], default=80)
limit_edges = self._safe_int((params.get("limit_edges") or ["160"])[0], default=160)
entity_types = params.get("entity_types")
kinds = params.get("kinds")
payload = graph_store.get_graph_snapshot(
query=query,
entity_types=entity_types if entity_types else None,
kinds=kinds if kinds else None,
limit_nodes=limit_nodes,
limit_edges=limit_edges,
)
self._write_json(payload)
def _handle_graph_types(self):
types = graph_store.get_entity_types()
self._write_json({"types": types})
def _handle_graph_kinds(self):
kinds = graph_store.get_graph_kinds()
self._write_json({"kinds": kinds})
def _handle_search(self, raw_query: str):
params = parse_qs(raw_query)
query = (params.get("q") or [""])[0].strip()
limit = self._safe_int((params.get("limit") or ["8"])[0], default=8)
payload = {
"query": query,
"results": graph_store.hybrid_search(query, limit=limit) if query else [],
}
self._write_json(payload)
def _handle_dashboard(self):
meetings = _load_recent_meetings(limit=6)
action_items = _state_items("action_items", limit=6)
metrics = _state_items("metrics", limit=6)
series = _load_series(limit=6)
graph_stats = graph_store.get_stats()
payload = {
"graph": graph_stats,
"state": state_store.get_stats(),
"meetings": meetings,
"action_items": action_items,
"metrics": metrics,
"series": series,
"highlights": _build_highlights(meetings, action_items, metrics, graph_stats),
}
self._write_json(payload)
def _handle_meetings(self, raw_query: str):
params = parse_qs(raw_query)
limit = self._safe_int((params.get("limit") or ["24"])[0], default=24)
self._write_json({"meetings": _load_recent_meetings(limit=limit)})
def _handle_meeting(self, raw_query: str):
params = parse_qs(raw_query)
filename = (params.get("filename") or [""])[0].strip()
if not filename:
self._write_json({"error": "filename is required"}, status=HTTPStatus.BAD_REQUEST)
return
file_path = RAW_DIR / filename
if not file_path.exists() or file_path.parent != RAW_DIR:
self._write_json({"error": "meeting not found"}, status=HTTPStatus.NOT_FOUND)
return
self._write_json(_serialize_meeting(file_path, include_content=True))
def _handle_import(self):
payload = self._read_json_body()
if payload is None:
self._write_json({"ok": False, "error": "invalid json body"}, status=HTTPStatus.BAD_REQUEST)
return
text = str(payload.get("text") or "").strip()
force = bool(payload.get("force", False))
if not text:
self._write_json({"ok": False, "error": "text is required"}, status=HTTPStatus.BAD_REQUEST)
return
job_id = str(uuid.uuid4())
with IMPORT_JOBS_LOCK:
IMPORT_JOBS[job_id] = {
"job_id": job_id,
"status": "queued",
"message": "任务已创建,等待处理",
"archive_path": "",
"created_at": time.time(),
"updated_at": time.time(),
"steps": [],
}
thread = threading.Thread(
target=_run_import_job,
args=(job_id, text, force),
daemon=True,
)
thread.start()
self._write_json({"ok": True, "job_id": job_id, "status": "queued"})
def _handle_import_status(self, raw_query: str):
params = parse_qs(raw_query)
job_id = (params.get("job_id") or [""])[0].strip()
if not job_id:
self._write_json({"error": "job_id is required"}, status=HTTPStatus.BAD_REQUEST)
return
with IMPORT_JOBS_LOCK:
payload = IMPORT_JOBS.get(job_id)
if not payload:
self._write_json({"error": "job not found"}, status=HTTPStatus.NOT_FOUND)
return
self._write_json(payload)
def _read_json_body(self):
length = self._safe_int(self.headers.get("Content-Length"), default=0)
if length <= 0:
return None
try:
body = self.rfile.read(length)
return json.loads(body.decode("utf-8"))
except Exception:
return None
def _write_json(self, payload, status: HTTPStatus = HTTPStatus.OK):
body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
self.send_response(status)
self.send_header("Content-Type", "application/json; charset=utf-8")
self.send_header("Content-Length", str(len(body)))
self.end_headers()
self.wfile.write(body)
@staticmethod
def _safe_int(raw_value, default: int) -> int:
try:
value = int(raw_value)
except (TypeError, ValueError):
return default
return max(0, value)
def run_demo_server(host: str = "127.0.0.1", port: int = 8765) -> None:
server = ThreadingHTTPServer((host, port), GraphDemoHandler)
logger.info("Graph demo server started at http://%s:%s", host, port)
print(f"Graph demo server started: http://{host}:{port}")
print("Press Ctrl+C to stop.")
try:
server.serve_forever()
except KeyboardInterrupt:
print("\nServer stopped.")
finally:
server.server_close()
def _run_import_job(job_id: str, text: str, force: bool) -> None:
def update(status: str | None = None, message: str | None = None, *, append_step: bool = False):
with IMPORT_JOBS_LOCK:
job = IMPORT_JOBS.get(job_id)
if not job:
return
if status:
job["status"] = status
if message:
job["message"] = message
if append_step and message:
job["steps"].append(message)
job["updated_at"] = time.time()
def progress(step: int, total: int, message: str):
update(
"running",
f"步骤 {step}/{total}:{message}",
append_step=True,
)
update("running", "开始处理会议文本", append_step=True)
try:
archive_path = meeting_processor.process_meeting_text(
text,
force=force,
interactive=False,
progress_callback=progress,
)
if not archive_path:
update("error", "处理被跳过:可能是重复内容,或结构化抽取失败", append_step=True)
return
with IMPORT_JOBS_LOCK:
job = IMPORT_JOBS.get(job_id)
if job:
job["status"] = "done"
job["message"] = "导入完成"
job["archive_path"] = archive_path
job["updated_at"] = time.time()
job["dashboard"] = {
"graph": graph_store.get_stats(),
"state": state_store.get_stats(),
"meetings": _load_recent_meetings(limit=6),
}
job["steps"].append("导入完成")
except Exception as exc:
logger.exception("Meeting import failed")
update("error", f"处理失败:{exc}", append_step=True)
def _load_recent_meetings(limit: int = 6):
if not RAW_DIR.exists():
return []
files = sorted(
RAW_DIR.glob("*.md"),
key=lambda path: path.stat().st_mtime,
reverse=True,
)
return [_serialize_meeting(path) for path in files[:limit]]
def _serialize_meeting(path: Path, include_content: bool = False):
raw_text = path.read_text(encoding="utf-8")
title = ""
date = ""
lines = raw_text.splitlines()
for line in lines[:12]:
if line.startswith('title: "'):
title = line[len('title: "') : -1]
elif line.startswith('date: "'):
date = line[len('date: "') : -1]
content_start = 0
for idx, line in enumerate(lines):
if line.startswith("# "):
content_start = idx + 2
if not title:
title = line[2:].strip()
break
body = "\n".join(lines[content_start:]).strip()
snippet = body[:180] + ("..." if len(body) > 180 else "")
payload = {
"filename": path.name,
"title": title or path.stem,
"date": date,
"snippet": snippet,
"updated_at": int(path.stat().st_mtime),
}
if include_content:
payload["content"] = body
return payload
def _state_items(key: str, limit: int = 6):
bucket = getattr(state_store, "_state", {}).get(key, {})
items = []
for item in bucket.values():
latest = item.get("latest", {})
items.append({**item, "latest": latest})
items.sort(key=lambda row: str(row.get("latest", {}).get("date", "")), reverse=True)
return items[:limit]
def _load_series(limit: int = 6):
series = getattr(state_store, "_state", {}).get("meeting_series", {})
rows = []
for name, payload in series.items():
rows.append(
{
"name": name,
"latest_date": payload.get("latest_date", ""),
"processed_titles": payload.get("processed_titles", []),
"meeting_count": len(payload.get("processed_titles", [])),
}
)
rows.sort(key=lambda row: row.get("latest_date", ""), reverse=True)
return rows[:limit]
def _build_highlights(meetings, action_items, metrics, graph_stats):
latest_meeting = meetings[0] if meetings else {}
top_action = action_items[0] if action_items else {}
top_metric = metrics[0] if metrics else {}
return [
{
"label": "最近归档",
"value": latest_meeting.get("title", "暂无会议"),
"meta": latest_meeting.get("date", ""),
},
{
"label": "待跟进事项",
"value": str(len(action_items)),
"meta": top_action.get("task", ""),
},
{
"label": "图谱节点",
"value": str(graph_stats.get("entities", 0)),
"meta": "Neo4j 实体总数",
},
{
"label": "关键指标",
"value": str(len(metrics)),
"meta": top_metric.get("metric_name", ""),
},
]
if __name__ == "__main__":
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
datefmt="%H:%M:%S",
)
run_demo_server()
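
A minimal client sketch for the import flow exposed above: POST the text to `/api/import`, then poll `/api/import-status` until the job reaches `done` or `error`. It assumes the server runs on the default host and port of `run_demo_server`; the helper names (`submit_import`, `fetch_status`, `wait_for_job`), the poll interval, and the poll limit are hypothetical choices, not part of this diff.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Dict

BASE_URL = "http://127.0.0.1:8765"  # default host/port of run_demo_server


def submit_import(text: str, force: bool = False) -> str:
    """POST meeting text to /api/import and return the job_id from the response."""
    body = json.dumps({"text": text, "force": force}).encode("utf-8")
    request = urllib.request.Request(
        f"{BASE_URL}/api/import",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))["job_id"]


def fetch_status(job_id: str) -> Dict:
    """GET the current job payload from /api/import-status."""
    query = urllib.parse.urlencode({"job_id": job_id})
    with urllib.request.urlopen(f"{BASE_URL}/api/import-status?{query}") as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_job(
    job_id: str,
    fetch: Callable[[str], Dict] = fetch_status,
    interval: float = 0.9,
    max_polls: int = 200,
) -> Dict:
    """Poll until the job status is 'done' or 'error', or give up after max_polls."""
    for _ in range(max_polls):
        job = fetch(job_id)
        if job.get("status") in ("done", "error"):
            return job
        time.sleep(interval)
    raise TimeoutError(f"import job {job_id} did not finish after {max_polls} polls")
```

With the server running, `wait_for_job(submit_import(text))` returns the final job payload, including `archive_path` on success. The `fetch` parameter is injectable so the polling loop can be exercised without a live server.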


@ -0,0 +1,399 @@
const dashboardUrl = "/api/dashboard";
let currentImportJobId = null;
let importPollTimer = null;
const highlightGrid = document.getElementById("highlightGrid");
const statsList = document.getElementById("statsList");
const meetingCards = document.getElementById("meetingCards");
const actionList = document.getElementById("actionList");
const metricList = document.getElementById("metricList");
const seriesList = document.getElementById("seriesList");
const searchForm = document.getElementById("searchForm");
const searchInput = document.getElementById("searchInput");
const searchResults = document.getElementById("searchResults");
const refreshDashboardBtn = document.getElementById("refreshDashboardBtn");
const importForm = document.getElementById("importForm");
const importFieldset = document.getElementById("importFieldset");
const importSubmitBtn = document.getElementById("importSubmitBtn");
const importFile = document.getElementById("importFile");
const importText = document.getElementById("importText");
const importForce = document.getElementById("importForce");
const importStatus = document.getElementById("importStatus");
const importProgress = document.getElementById("importProgress");
const meetingDialog = document.getElementById("meetingDialog");
const closeDialogBtn = document.getElementById("closeDialogBtn");
const dialogTitle = document.getElementById("dialogTitle");
const dialogMeta = document.getElementById("dialogMeta");
const dialogContent = document.getElementById("dialogContent");
function escapeHtml(value) {
return String(value ?? "")
.replaceAll("&", "&amp;")
.replaceAll("<", "&lt;")
.replaceAll(">", "&gt;")
.replaceAll('"', "&quot;")
.replaceAll("'", "&#39;");
}
function emptyMarkup(message) {
return `<div class="empty-state">${escapeHtml(message)}</div>`;
}
function setImportBusy(isBusy) {
importFieldset.disabled = isBusy;
importSubmitBtn.textContent = isBusy ? "处理中..." : "开始导入";
}
function setImportStatus(message, kind = "info") {
importStatus.textContent = message;
importStatus.dataset.kind = kind;
}
function renderProgress(steps = []) {
if (!steps.length) {
importProgress.innerHTML = emptyMarkup("导入开始后,这里会实时显示处理步骤。");
return;
}
importProgress.innerHTML = steps.map((step, index) => `
<div class="progress-item">
<span class="progress-index">${index + 1}</span>
<span>${escapeHtml(step)}</span>
</div>
`).join("");
}
function renderHighlights(items) {
if (!items?.length) {
highlightGrid.innerHTML = emptyMarkup("暂无概览数据");
return;
}
const colors = ["#4a90d9", "#34c759", "#ff9500", "#53c2da"];
highlightGrid.innerHTML = items.map((item, i) => `
<article class="highlight-card" style="--card-accent: ${colors[i % colors.length]}">
<div class="hc-bar"></div>
<p class="eyebrow">${escapeHtml(item.label)}</p>
<strong>${escapeHtml(item.value)}</strong>
<p>${escapeHtml(item.meta || "")}</p>
</article>
`).join("");
}
function renderStats(graph = {}, state = {}) {
const cards = [
{ label: "Neo4j", value: graph.enabled ? "在线" : "离线", icon: "⬡", color: graph.enabled ? "#34c759" : "#b3261e" },
{ label: "会议", value: graph.meetings ?? 0, icon: "📋", color: "#4a90d9" },
{ label: "实体", value: graph.entities ?? 0, icon: "◆", color: "#53c2da" },
{ label: "关系", value: graph.facts ?? 0, icon: "↗", color: "#ff9500" },
{ label: "行动项", value: state.action_items_tracked ?? 0, icon: "☐", color: "#7f8bff" },
{ label: "指标", value: state.metrics_tracked ?? 0, icon: "📊", color: "#af52de" },
];
statsList.innerHTML = cards.map((c) => `
<div class="mini-stat" style="--stat-color: ${c.color}">
<span class="ms-icon">${c.icon}</span>
<div class="ms-body">
<strong>${escapeHtml(c.value)}</strong>
<p>${escapeHtml(c.label)}</p>
</div>
</div>
`).join("");
}
function renderMeetings(items) {
if (!items?.length) {
meetingCards.innerHTML = emptyMarkup("还没有归档会议");
return;
}
meetingCards.innerHTML = items.map((item) => `
<article class="meeting-card" data-filename="${escapeHtml(item.filename)}">
<div class="mc-date">${escapeHtml(item.date || "未知")}</div>
<div class="mc-body">
<h4>${escapeHtml(item.title)}</h4>
<p>${escapeHtml(item.snippet || "暂无摘要")}</p>
</div>
</article>
`).join("");
}
function renderActionItems(items) {
if (!items?.length) {
actionList.innerHTML = emptyMarkup("暂无行动项");
return;
}
const priorityColors = { "高": "#b3261e", "中": "#ff9500", "低": "#34c759" };
actionList.innerHTML = items.map((item) => {
const pri = item.latest?.priority || "普通";
const priColor = priorityColors[pri] || "#68709d";
return `
<article class="list-item">
<div class="li-priority" style="--pri-color: ${priColor}"></div>
<div class="li-body">
<strong>${escapeHtml(item.task || "未命名任务")}</strong>
<p>${escapeHtml(item.assignee || "未分配")} · ${escapeHtml(item.series || "未归类")}</p>
<div class="chip-row">
<span class="chip status-${escapeHtml((item.latest?.status || "unknown").toLowerCase())}">${escapeHtml(item.latest?.status || "未知")}</span>
<span class="chip">${escapeHtml(pri)}</span>
${item.latest?.deadline ? `<span class="chip">${escapeHtml(item.latest.deadline)}</span>` : ""}
</div>
</div>
</article>`;
}).join("");
}
function renderMetrics(items) {
if (!items?.length) {
metricList.innerHTML = emptyMarkup("暂无指标");
return;
}
metricList.innerHTML = items.map((item) => {
const val = parseFloat(item.latest?.value) || 0;
const tgt = parseFloat(item.latest?.target) || 100;
// Clamp to 0..100 so negative values cannot produce an invalid bar width.
const pct = Math.max(0, Math.min(100, Math.round((val / tgt) * 100)));
return `
<article class="metric-card">
<div class="mc-head">
<strong>${escapeHtml(item.metric_name || "未命名指标")}</strong>
<span class="mc-value">${escapeHtml(item.latest?.value || "—")}</span>
</div>
<p>${escapeHtml(item.owner || "未指定负责人")}</p>
<div class="mc-bar-track">
<div class="mc-bar-fill" style="width: ${pct}%"></div>
</div>
<div class="chip-row">
${item.latest?.target ? `<span class="chip">目标 ${escapeHtml(item.latest.target)}</span>` : ""}
${item.latest?.trend ? `<span class="chip">${escapeHtml(item.latest.trend)}</span>` : ""}
</div>
</article>`;
}).join("");
}
function renderSeries(items) {
if (!items?.length) {
seriesList.innerHTML = emptyMarkup("暂无会议系列");
return;
}
seriesList.innerHTML = items.map((item) => `
<article class="series-card">
<div class="sc-count">${escapeHtml(item.meeting_count)}</div>
<div class="sc-body">
<strong>${escapeHtml(item.name)}</strong>
<p>最近:${escapeHtml(item.latest_date || "未知")}</p>
</div>
</article>
`).join("");
}
function updateDashboard(payload) {
renderHighlights(payload.highlights || []);
renderStats(payload.graph || {}, payload.state || {});
renderMeetings(payload.meetings || []);
renderActionItems(payload.action_items || []);
renderMetrics(payload.metrics || []);
renderSeries(payload.series || []);
}
async function loadDashboard() {
const response = await fetch(dashboardUrl);
if (!response.ok) {
throw new Error(`Dashboard request failed: HTTP ${response.status}`);
}
const payload = await response.json();
updateDashboard(payload);
}
async function runSearch(query) {
if (!query.trim()) {
searchResults.innerHTML = emptyMarkup("输入问题后,这里会展示混合检索结果。");
return;
}
searchResults.innerHTML = emptyMarkup("正在检索...");
const response = await fetch(`/api/search?q=${encodeURIComponent(query)}&limit=8`);
const payload = await response.json();
const items = payload.results || [];
if (!items.length) {
searchResults.innerHTML = emptyMarkup("没有找到匹配结果");
return;
}
searchResults.innerHTML = items.map((item) => `
<article class="result-card">
<div class="rc-kind">${escapeHtml(item.kind || "item")}</div>
<strong>${escapeHtml(item.title || "结果")}</strong>
<p>${escapeHtml(item.text || "")}</p>
<div class="meta-row">
${item.meeting_title ? `<span class="chip">${escapeHtml(item.meeting_title)}</span>` : ""}
${item.date ? `<span class="chip">${escapeHtml(item.date)}</span>` : ""}
<span class="chip">score ${escapeHtml(item.score)}</span>
</div>
</article>
`).join("");
}
async function openMeeting(filename) {
const response = await fetch(`/api/meeting?filename=${encodeURIComponent(filename)}`);
const payload = await response.json();
dialogTitle.textContent = payload.title || "会议详情";
dialogMeta.textContent = payload.date || payload.filename || "";
dialogContent.textContent = payload.content || "没有可展示的原文";
meetingDialog.showModal();
}
async function readImportText() {
const directText = importText.value.trim();
if (directText) {
return directText;
}
const file = importFile.files?.[0];
if (!file) {
return "";
}
return await file.text();
}
async function pollImportStatus(jobId) {
const response = await fetch(`/api/import-status?job_id=${encodeURIComponent(jobId)}`);
const payload = await response.json();
renderProgress(payload.steps || []);
if (payload.status === "done") {
currentImportJobId = null;
clearTimeout(importPollTimer);
importPollTimer = null;
setImportBusy(false);
setImportStatus(`导入完成:${payload.archive_path || "已归档"}`, "success");
importText.value = "";
importFile.value = "";
await loadDashboard();
return;
}
if (payload.status === "error") {
currentImportJobId = null;
clearTimeout(importPollTimer);
importPollTimer = null;
setImportBusy(false);
setImportStatus(payload.message || "导入失败", "error");
return;
}
setImportStatus(payload.message || "正在处理中...", "info");
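importPollTimer = setTimeout(() => {
pollImportStatus(jobId).catch((error) => {
// Reset the job state so a failed poll does not block later imports.
currentImportJobId = null;
importPollTimer = null;
setImportBusy(false);
setImportStatus(`进度查询失败: ${error}`, "error");
});
}, 900);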
}
async function submitImport() {
if (currentImportJobId) {
return;
}
const text = (await readImportText()).trim();
if (!text) {
setImportStatus("请先选择文件或粘贴会议文本。", "error");
return;
}
setImportBusy(true);
renderProgress(["任务已提交,准备开始处理"]);
setImportStatus("正在创建导入任务...", "info");
const response = await fetch("/api/import", {
method: "POST",
headers: {
"Content-Type": "application/json",
},
body: JSON.stringify({
text,
force: importForce.checked,
}),
});
const payload = await response.json();
if (!response.ok || !payload.ok) {
setImportBusy(false);
setImportStatus(payload.error || "导入失败", "error");
return;
}
currentImportJobId = payload.job_id;
setImportStatus("任务已创建,正在处理中...", "info");
await pollImportStatus(currentImportJobId);
}
meetingCards?.addEventListener("click", (event) => {
const card = event.target.closest("[data-filename]");
if (!card) {
return;
}
openMeeting(card.dataset.filename).catch((error) => {
dialogTitle.textContent = "加载失败";
dialogMeta.textContent = "";
dialogContent.textContent = String(error);
meetingDialog.showModal();
});
});
searchForm?.addEventListener("submit", (event) => {
event.preventDefault();
runSearch(searchInput.value).catch((error) => {
searchResults.innerHTML = emptyMarkup(`检索失败: ${error}`);
});
});
importForm?.addEventListener("submit", (event) => {
event.preventDefault();
submitImport().catch((error) => {
currentImportJobId = null;
setImportBusy(false);
setImportStatus(`导入失败: ${error}`, "error");
});
});
refreshDashboardBtn?.addEventListener("click", () => {
loadDashboard().catch((error) => {
highlightGrid.innerHTML = emptyMarkup(`刷新失败: ${error}`);
});
});
closeDialogBtn?.addEventListener("click", () => meetingDialog.close());
// Unified panel tab switching
(function initUnifiedTabs() {
const tabs = document.querySelectorAll(".unified-tab");
const panes = {
import: document.getElementById("unifiedImport"),
search: document.getElementById("unifiedSearch"),
stats: document.getElementById("unifiedStats"),
};
tabs.forEach((tab) => {
tab.addEventListener("click", () => {
const target = tab.dataset.tab;
tabs.forEach((t) => t.classList.toggle("active", t === tab));
Object.values(panes).forEach((p) => p?.classList.add("hidden"));
const pane = panes[target];
if (pane) {
// Stats are already rendered by loadDashboard, so no refresh is needed here.
pane.classList.remove("hidden");
}
});
});
})();
renderProgress([]);
loadDashboard().catch((error) => {
highlightGrid.innerHTML = emptyMarkup(`加载失败: ${error}`);
});
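
Review note on `pollImportStatus`: the three status branches could be separated from the DOM updates so they are easier to test. A minimal sketch, assuming the `/api/import-status` payload carries only the `status`, `message`, and `archive_path` fields used above; `classifyImportStatus` is a hypothetical helper, not part of this change:

```javascript
// Hypothetical helper: maps an import-status payload to what the UI should show.
// The field names (status, message, archive_path) are taken from pollImportStatus;
// everything else here is an assumption made for illustration.
function classifyImportStatus(payload) {
  const status = payload?.status;
  if (status === "done") {
    // Terminal success: polling stops and the dashboard should be reloaded.
    return {
      terminal: true,
      kind: "success",
      message: `导入完成:${payload.archive_path || "已归档"}`,
    };
  }
  if (status === "error") {
    // Terminal failure: polling stops and the error text is shown.
    return {
      terminal: true,
      kind: "error",
      message: payload.message || "导入失败",
    };
  }
  // Any other status is treated as still running, so polling continues.
  return {
    terminal: false,
    kind: "info",
    message: payload?.message || "正在处理中...",
  };
}
```

With a helper like this, `pollImportStatus` could call `setImportStatus(result.message, result.kind)` and schedule the next poll only while `result.terminal` is false.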

View File

@ -0,0 +1,69 @@
<!DOCTYPE html>
<html lang="zh-CN">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Neo4j Graph Explorer</title>
<link rel="stylesheet" href="/styles.css">
</head>
<body>
<div class="shell graph-shell">
<aside class="sidebar">
<div class="brand">
<div class="brand-mark">G</div>
<div>
<p class="brand-kicker">Graph Explorer</p>
<h1>Neo4j 图谱</h1>
</div>
</div>
<nav class="nav">
<a class="nav-link" href="/index.html">总览面板</a>
<a class="nav-link active" href="/graph.html">图谱浏览</a>
</nav>
<div class="legend">
<p class="eyebrow" style="margin-bottom:6px">图例</p>
<span><i class="legend-dot meeting"></i>会议</span>
<span><i class="legend-dot episode"></i>片段</span>
<span><i class="legend-dot entity"></i>实体</span>
<span><i class="legend-dot fact"></i>事实</span>
</div>
</aside>
<main class="main">
<div class="graph-toolbar panel">
<form class="graph-controls" id="graphSearchForm">
<input id="graphQueryInput" type="text" placeholder="搜索节点名称或关键词…" class="search-input">
<label class="field-label">节点 <input id="graphNodeLimit" type="number" min="10" max="200" step="10" value="60"></label>
<label class="field-label">关系 <input id="graphEdgeLimit" type="number" min="10" max="300" step="10" value="120"></label>
<button class="btn" type="submit">更新</button>
</form>
<div class="graph-toolbar-row">
<div class="graph-type-filter" id="graphTypeFilter"></div>
<div class="graph-actions">
<span class="graph-meta" id="graphMeta"></span>
</div>
</div>
</div>
<div class="graph-layout">
<div class="panel graph-stage-panel">
<div class="graph-stage" id="graphStage">
<svg id="graphSvg" viewBox="0 0 960 640" preserveAspectRatio="xMidYMid meet"></svg>
</div>
</div>
<div class="panel detail-panel">
<div class="detail-card" id="graphDetail">
<div class="empty-state">点击节点或关系查看详情</div>
</div>
<div class="related-search" id="relatedSearch"></div>
</div>
</div>
</main>
</div>
<script src="/graph.js"></script>
</body>
</html>

View File

@ -0,0 +1,517 @@
const graphForm = document.getElementById("graphSearchForm");
const graphQueryInput = document.getElementById("graphQueryInput");
const graphNodeLimit = document.getElementById("graphNodeLimit");
const graphEdgeLimit = document.getElementById("graphEdgeLimit");
const graphSvg = document.getElementById("graphSvg");
const graphMeta = document.getElementById("graphMeta");
const graphDetail = document.getElementById("graphDetail");
const relatedSearch = document.getElementById("relatedSearch");
const graphTypeFilter = document.getElementById("graphTypeFilter");
let selectedEntityTypes = null;
let selectedKinds = null;
const TRUNCATE_LENGTH = 16;
function h(value) {
return String(value ?? "")
.replaceAll("&", "&amp;")
.replaceAll("<", "&lt;")
.replaceAll(">", "&gt;")
.replaceAll('"', "&quot;")
.replaceAll("'", "&#39;");
}
function truncate(text, maxLen) {
if (!text || text.length <= maxLen) return text || "";
return text.slice(0, maxLen - 1) + "…";
}
function empty(message) {
return `<div class="empty-state">${h(message)}</div>`;
}
async function loadGraphKinds() {
try {
const kindRes = await fetch("/api/graph-kinds");
const kindData = await kindRes.json();
const kinds = kindData.kinds || [];
if (kinds.length) {
selectedKinds = new Set(kinds.map((k) => k.kind));
}
let html = "";
if (kinds.length) {
html += `<span class="field-label" style="margin-right:4px">节点类型:</span>`;
html += kinds.map((k) =>
`<label><input type="checkbox" class="kind-cb" value="${h(k.kind)}" checked> ${h(k.kind)} (${k.count})</label>`
).join("");
html += `<label><input type="checkbox" class="kind-cb" id="kindSelectAll" checked> 全选</label>`;
}
graphTypeFilter.innerHTML = html;
graphTypeFilter.querySelectorAll(".kind-cb").forEach((cb) => {
cb.addEventListener("change", () => {
if (cb.id === "kindSelectAll") {
const checked = cb.checked;
graphTypeFilter.querySelectorAll(".kind-cb:not(#kindSelectAll)").forEach((c) => c.checked = checked);
selectedKinds = checked ? new Set(kinds.map((k) => k.kind)) : new Set();
} else {
if (!cb.checked) {
document.getElementById("kindSelectAll").checked = false;
selectedKinds.delete(cb.value);
} else {
selectedKinds.add(cb.value);
if (selectedKinds.size === kinds.length) {
document.getElementById("kindSelectAll").checked = true;
}
}
}
fetchGraph().catch((error) => renderInspector(empty(`图谱加载失败: ${error}`)));
});
});
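} catch (error) {
// The type filter is optional, so log the failure instead of hiding it.
console.warn("Failed to load graph kinds", error);
}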
}
function renderInspector(content) {
graphDetail.innerHTML = content;
}
async function loadRelated(query) {
if (!query) {
relatedSearch.innerHTML = "";
return;
}
const response = await fetch(`/api/search?q=${encodeURIComponent(query)}&limit=4`);
const payload = await response.json();
const results = payload.results || [];
if (!results.length) {
relatedSearch.innerHTML = empty("没有更多相关检索结果");
return;
}
relatedSearch.innerHTML = `
<div class="panel-head">
<div>
<p class="eyebrow">Related</p>
<h3>相关检索</h3>
</div>
</div>
${results.map((item) => `
<article class="result-card">
<strong>${h(item.title || item.kind || "结果")}</strong>
<p>${h(item.text || "")}</p>
</article>
`).join("")}
`;
}
function renderGraph(payload) {
const nodes = payload.nodes || [];
const edges = payload.edges || [];
const stage = document.getElementById("graphStage");
const rect = stage.getBoundingClientRect();
const svgW = Math.max(600, rect.width - 4);
const svgH = Math.max(400, rect.height - 4);
graphSvg.setAttribute("viewBox", `0 0 ${svgW} ${svgH}`);
graphSvg.setAttribute("width", svgW);
graphSvg.setAttribute("height", svgH);
graphMeta.textContent = `节点 ${nodes.length} · 关系 ${edges.length} · Neo4j ${payload.stats?.enabled ? "已启用" : "未启用"}`;
if (!nodes.length) {
graphSvg.innerHTML = "";
renderInspector(empty("当前没有可显示的图谱数据"));
relatedSearch.innerHTML = "";
return;
}
const nodeRadius = (node) => Math.max(14, Math.min(28, 12 + (node.degree || 0) * 1.4));
const dataNodes = nodes.map((node, i) => ({
...node,
x: svgW / 2 + Math.cos((Math.PI * 2 * i) / nodes.length) * Math.min(svgW, svgH) * 0.28,
y: svgH / 2 + Math.sin((Math.PI * 2 * i) / nodes.length) * Math.min(svgW, svgH) * 0.26,
vx: 0, vy: 0,
pinned: false,
radius: nodeRadius(node),
}));
const nodeById = new Map(dataNodes.map((n) => [n.id, n]));
const dataEdges = edges
.map((e) => ({ ...e, sourceNode: nodeById.get(e.source), targetNode: nodeById.get(e.target) }))
.filter((e) => e.sourceNode && e.targetNode);
const n = dataNodes.length;
const area = svgW * svgH;
const repulsionStr = Math.max(3000, area / Math.max(n, 1));
const linkDist = Math.max(80, Math.min(200, 240 - n));
const linkStr = Math.min(0.06, 4 / Math.max(n, 1));
let zoomTransform = { x: 0, y: 0, k: 1 };
let isDragging = false;
let dragOccurred = false;
let dragNode = null;
let dragOffX = 0, dragOffY = 0;
let isPanning = false;
let panStartX = 0, panStartY = 0;
let simAlpha = 1;
let simRunning = false;
let simId = null;
const mainGroup = document.createElementNS("http://www.w3.org/2000/svg", "g");
graphSvg.innerHTML = "";
graphSvg.appendChild(mainGroup);
const edgeEls = dataEdges.map((edge) => {
const g = document.createElementNS("http://www.w3.org/2000/svg", "g");
g.setAttribute("data-edge-id", edge.id);
g.setAttribute("class", "edge-wrap");
g.style.cursor = "pointer";
const line = document.createElementNS("http://www.w3.org/2000/svg", "line");
line.setAttribute("class", "graph-edge");
g.appendChild(line);
if (edge.predicate) {
const text = document.createElementNS("http://www.w3.org/2000/svg", "text");
text.setAttribute("text-anchor", "middle");
text.setAttribute("font-size", "11");
text.setAttribute("fill", "#7d86b4");
text.setAttribute("data-type", "edge-label");
text.textContent = truncate(edge.predicate, 20);
g.appendChild(text);
}
mainGroup.appendChild(g);
return g;
});
const nodeEls = dataNodes.map((node) => {
const r = node.radius;
const g = document.createElementNS("http://www.w3.org/2000/svg", "g");
const kindClass = `graph-node--${(node.kind || 'entity').toLowerCase()}`;
g.setAttribute("class", `graph-node ${kindClass}`);
g.setAttribute("data-node-id", node.id);
g.style.cursor = "grab";
const circle = document.createElementNS("http://www.w3.org/2000/svg", "circle");
circle.setAttribute("r", r);
g.appendChild(circle);
const text = document.createElementNS("http://www.w3.org/2000/svg", "text");
text.setAttribute("y", r + 16);
text.setAttribute("text-anchor", "middle");
text.setAttribute("font-size", "11");
text.setAttribute("fill", "#22264d");
text.setAttribute("data-type", "node-label");
text.textContent = truncate(node.label, TRUNCATE_LENGTH);
g.appendChild(text);
mainGroup.appendChild(g);
return g;
});
function syncDom() {
for (let i = 0; i < dataEdges.length; i++) {
const edge = dataEdges[i], el = edgeEls[i];
if (!el) continue;
const line = el.querySelector("line");
if (line) {
line.setAttribute("x1", edge.sourceNode.x.toFixed(1));
line.setAttribute("y1", edge.sourceNode.y.toFixed(1));
line.setAttribute("x2", edge.targetNode.x.toFixed(1));
line.setAttribute("y2", edge.targetNode.y.toFixed(1));
}
const label = el.querySelector("text[data-type='edge-label']");
if (label) {
const mx = (edge.sourceNode.x + edge.targetNode.x) / 2;
const my = (edge.sourceNode.y + edge.targetNode.y) / 2;
const angle = Math.atan2(edge.targetNode.y - edge.sourceNode.y, edge.targetNode.x - edge.sourceNode.x) * (180 / Math.PI);
label.setAttribute("x", mx.toFixed(1));
label.setAttribute("y", (my + (Math.abs(angle) < 30 || Math.abs(angle) > 150 ? -12 : 4)).toFixed(1));
}
}
for (let i = 0; i < dataNodes.length; i++) {
const node = dataNodes[i], el = nodeEls[i];
if (el) el.setAttribute("transform", `translate(${node.x.toFixed(1)} ${node.y.toFixed(1)})`);
}
}
function tick() {
const alpha = simAlpha;
if (alpha < 0.001) {
simRunning = false;
simId = null;
syncDom();
return;
}
for (let i = 0; i < dataNodes.length; i++) {
for (let j = i + 1; j < dataNodes.length; j++) {
const a = dataNodes[i], b = dataNodes[j];
let dx = b.x - a.x, dy = b.y - a.y;
const dist = Math.sqrt(dx * dx + dy * dy) || 1;
const minDist = (a.radius + b.radius) * 1.6;
if (dist < minDist) {
const push = (minDist - dist) / dist * 0.5;
a.vx -= dx * push; a.vy -= dy * push;
b.vx += dx * push; b.vy += dy * push;
}
const force = (repulsionStr * alpha) / (dist * dist + 1);
const fx = force * dx / dist, fy = force * dy / dist;
a.vx -= fx; a.vy -= fy;
b.vx += fx; b.vy += fy;
}
}
for (const edge of dataEdges) {
const s = edge.sourceNode, t = edge.targetNode;
const dx = t.x - s.x, dy = t.y - s.y;
const dist = Math.sqrt(dx * dx + dy * dy) || 1;
const force = (dist - linkDist) * linkStr * alpha;
const fx = force * dx / dist, fy = force * dy / dist;
s.vx += fx; s.vy += fy;
t.vx -= fx; t.vy -= fy;
}
const cx = svgW / 2, cy = svgH / 2;
const grav = 0.005 * alpha;
for (const node of dataNodes) {
if (node.pinned) continue;
node.vx += (cx - node.x) * grav;
node.vy += (cy - node.y) * grav;
}
simAlpha *= 0.992;
if (simAlpha < 0.001) simAlpha = 0;
for (const node of dataNodes) {
if (node.pinned) continue;
node.vx *= 0.6;
node.vy *= 0.6;
node.x += node.vx;
node.y += node.vy;
node.x = Math.max(20, Math.min(svgW - 20, node.x));
node.y = Math.max(20, Math.min(svgH - 20, node.y));
}
syncDom();
simId = requestAnimationFrame(tick);
}
function startSim() {
if (simRunning) return;
simRunning = true;
simAlpha = Math.max(simAlpha, 0.15);
if (simId) cancelAnimationFrame(simId);
simId = requestAnimationFrame(tick);
}
function wakeSim() {
simAlpha = Math.max(simAlpha, 0.3);
if (!simRunning) startSim();
}
function applyTransform() {
mainGroup.setAttribute("transform", `translate(${zoomTransform.x} ${zoomTransform.y}) scale(${zoomTransform.k})`);
}
graphSvg.addEventListener("wheel", (e) => {
e.preventDefault();
const delta = e.deltaY > 0 ? 0.9 : 1.1;
const newK = Math.max(0.15, Math.min(6, zoomTransform.k * delta));
const r = graphSvg.getBoundingClientRect();
const cx = e.clientX - r.left, cy = e.clientY - r.top;
zoomTransform.x = cx - (cx - zoomTransform.x) * (newK / zoomTransform.k);
zoomTransform.y = cy - (cy - zoomTransform.y) * (newK / zoomTransform.k);
zoomTransform.k = newK;
applyTransform();
});
graphSvg.addEventListener("mousedown", (e) => {
const target = e.target.closest("[data-node-id]");
if (target) {
isDragging = true;
dragOccurred = false;
// dataset values are always strings, so compare against the stringified id.
dragNode = dataNodes.find((n) => String(n.id) === target.dataset.nodeId);
if (dragNode) {
const r = graphSvg.getBoundingClientRect();
dragOffX = (e.clientX - r.left - zoomTransform.x) / zoomTransform.k - dragNode.x;
dragOffY = (e.clientY - r.top - zoomTransform.y) / zoomTransform.k - dragNode.y;
target.style.cursor = "grabbing";
wakeSim();
}
return;
}
if (e.target === graphSvg || e.target === mainGroup) {
isPanning = true;
panStartX = e.clientX - zoomTransform.x;
panStartY = e.clientY - zoomTransform.y;
graphSvg.style.cursor = "grabbing";
}
});
window.addEventListener("mousemove", (e) => {
if (isDragging && dragNode) {
dragOccurred = true;
const r = graphSvg.getBoundingClientRect();
dragNode.x = (e.clientX - r.left - zoomTransform.x) / zoomTransform.k - dragOffX;
dragNode.y = (e.clientY - r.top - zoomTransform.y) / zoomTransform.k - dragOffY;
dragNode.pinned = true;
syncDom();
} else if (isPanning) {
zoomTransform.x = e.clientX - panStartX;
zoomTransform.y = e.clientY - panStartY;
applyTransform();
}
});
window.addEventListener("mouseup", () => {
if (isDragging && dragNode) {
const el = graphSvg.querySelector(`[data-node-id="${dragNode.id}"]`);
if (el) el.style.cursor = "grab";
dragNode.vx = 0;
dragNode.vy = 0;
wakeSim();
}
isDragging = false;
dragNode = null;
isPanning = false;
graphSvg.style.cursor = "";
});
nodeEls.forEach((el) => {
el.addEventListener("click", (e) => {
if (dragOccurred) { dragOccurred = false; return; }
const node = nodes.find((item) => String(item.id) === el.dataset.nodeId);
if (!node) return;
const related = edges.filter((edge) => edge.source === node.id || edge.target === node.id);
const kind = (node.kind || "Entity").toLowerCase();
let body = "";
if (kind === "meeting" || kind === "episode") {
body = `
<p>${h(node.description || node.summary || "暂无描述")}</p>
<div class="chip-row">
${node.date ? `<span class="chip">${h(node.date)}</span>` : ""}
<span class="chip">关系 ${h(related.length)}</span>
</div>`;
} else if (kind === "fact") {
body = `
<p>${h(node.fact || node.description || "暂无描述")}</p>
<div class="chip-row">
${node.date ? `<span class="chip">${h(node.date)}</span>` : ""}
<span class="chip">关系 ${h(related.length)}</span>
</div>`;
} else {
body = `
<p>${h(node.description || "暂无描述")}</p>
<div class="chip-row">
${node.entity_type ? `<span class="chip">${h(node.entity_type)}</span>` : ""}
${node.date ? `<span class="chip">${h(node.date)}</span>` : ""}
<span class="chip">关系 ${h(related.length)}</span>
</div>`;
}
renderInspector(`
<div class="detail-card">
<p class="eyebrow">${h(node.kind)}</p>
<h3>${h(node.label)}</h3>
${body}
</div>
${related.map((edge) => `
<article class="result-card">
<strong>${h(edge.source)} → ${h(edge.target)}</strong>
<p>${h(edge.fact || edge.description || edge.predicate || "")}</p>
</article>
`).join("")}
`);
loadRelated(node.label).catch(() => relatedSearch.innerHTML = empty("相关检索加载失败"));
});
});
edgeEls.forEach((el) => {
el.addEventListener("click", () => {
graphSvg.querySelectorAll(".graph-edge.active").forEach((item) => item.classList.remove("active"));
const line = el.querySelector(".graph-edge");
line?.classList.add("active");
const edge = edges.find((item) => String(item.id) === el.dataset.edgeId);
if (!edge) return;
renderInspector(`
<div class="detail-card">
<p class="eyebrow">Edge</p>
<h3>${h(edge.source)} → ${h(edge.target)}</h3>
<p>${h(edge.fact || edge.description || "暂无补充描述")}</p>
<div class="chip-row">
${edge.predicate ? `<span class="chip">${h(edge.predicate)}</span>` : ""}
${edge.date ? `<span class="chip">${h(edge.date)}</span>` : ""}
<span class="chip">置信度 ${h(edge.confidence ?? 0)}</span>
${edge.meeting_id ? `<span class="chip">${h(edge.meeting_id)}</span>` : ""}
</div>
</div>
`);
loadRelated(`${edge.source} ${edge.predicate} ${edge.target}`).catch(() => relatedSearch.innerHTML = empty("相关检索加载失败"));
});
});
const resetBtn = document.createElement("button");
resetBtn.className = "btn ghost zoom-reset-btn";
resetBtn.textContent = "重置视图";
resetBtn.addEventListener("click", () => {
zoomTransform = { x: 0, y: 0, k: 1 };
applyTransform();
});
const pauseBtn = document.createElement("button");
pauseBtn.className = "btn ghost pause-btn";
pauseBtn.textContent = "⏸ 暂停";
pauseBtn.addEventListener("click", () => {
if (simRunning) {
simRunning = false;
if (simId) cancelAnimationFrame(simId);
simId = null;
pauseBtn.textContent = "▶ 继续";
} else {
wakeSim();
pauseBtn.textContent = "⏸ 暂停";
}
});
// Remove controls added by a previous render so the hint and buttons are not duplicated.
document.querySelectorAll(".zoom-hint, .graph-sim-actions").forEach((el) => el.remove());
const toolbar = document.querySelector(".graph-toolbar .graph-controls");
if (toolbar) {
const wrap = toolbar.parentElement;
const hint = document.createElement("div");
hint.className = "zoom-hint";
hint.textContent = "滚轮缩放 · 空白拖拽平移 · 拖拽节点重排 · 物理动画自动冷却";
wrap.appendChild(hint);
const btnRow = document.createElement("div");
btnRow.className = "graph-toolbar-row graph-sim-actions";
btnRow.appendChild(resetBtn);
btnRow.appendChild(pauseBtn);
wrap.appendChild(btnRow);
}
startSim();
syncDom();
}
async function fetchGraph() {
const query = graphQueryInput.value.trim();
const limitNodes = graphNodeLimit.value || "60";
const limitEdges = graphEdgeLimit.value || "120";
const params = new URLSearchParams();
if (query) params.set("q", query);
params.set("limit_nodes", limitNodes);
params.set("limit_edges", limitEdges);
if (selectedKinds && selectedKinds.size > 0) {
selectedKinds.forEach((k) => params.append("kinds", k));
}
renderInspector(empty("图谱加载中..."));
const response = await fetch(`/api/graph?${params.toString()}`);
const payload = await response.json();
renderGraph(payload);
}
graphForm?.addEventListener("submit", (event) => {
event.preventDefault();
fetchGraph().catch((error) => renderInspector(empty(`图谱加载失败: ${error}`)));
});
loadGraphKinds().catch(() => {});
fetchGraph().catch((error) => renderInspector(empty(`图谱加载失败: ${error}`)));
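
Review note on `renderGraph`: each call registers new `wheel`, `mousedown`, `mousemove`, and `mouseup` listeners on `graphSvg` and `window`, and none of them are removed when the graph is rendered again. One possible way to scope them to a single render is an `AbortController`. This is a sketch only; `createListenerScope`, `graphListenerScope`, and `bindGraphListeners` are hypothetical names, not part of this change:

```javascript
// Hypothetical helper: keeps one AbortController per render. Calling reset()
// aborts the previous controller, which removes every listener that was
// registered with its signal, and returns a fresh signal for the new render.
function createListenerScope() {
  let controller = null;
  return {
    reset() {
      if (controller) controller.abort();
      controller = new AbortController();
      return controller.signal;
    },
  };
}

const graphListenerScope = createListenerScope();

// Registers [target, type, handler] triples under the current render's signal.
// Listeners bound by an earlier call are dropped before the new ones are added.
function bindGraphListeners(bindings) {
  const signal = graphListenerScope.reset();
  for (const [target, type, handler] of bindings) {
    target.addEventListener(type, handler, { signal });
  }
  return signal;
}
```

In `renderGraph`, the four handlers could be passed as triples such as `[graphSvg, "wheel", onWheel]` and `[window, "mousemove", onMove]`, so that a later render replaces them instead of stacking on top of them.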

View File

@ -0,0 +1,141 @@
<!DOCTYPE html>
<html lang="zh-CN">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Meeting Memory Console</title>
<link rel="stylesheet" href="/styles.css">
</head>
<body>
<div class="shell">
<aside class="sidebar">
<div class="brand">
<div class="brand-mark">M</div>
<div>
<p class="brand-kicker">Meeting Memory</p>
<h1>会议记忆中枢</h1>
</div>
</div>
<nav class="nav">
<a class="nav-link active" href="/index.html">总览面板</a>
<a class="nav-link" href="/graph.html">图谱浏览</a>
</nav>
<div class="side-card sidebar-shortcuts">
<a class="pill-link" href="#import-panel">导入会议</a>
<a class="pill-link" href="#search-panel">知识检索</a>
<a class="pill-link" href="/graph.html">图谱页</a>
</div>
</aside>
<main class="main">
<div class="main-toolbar">
<div>
<p class="eyebrow">Dashboard</p>
<h2>会议知识库</h2>
</div>
<div class="main-toolbar-actions">
<button class="btn" id="refreshDashboardBtn">刷新</button>
</div>
</div>
<section class="stats-grid" id="highlightGrid"></section>
<section class="panel unified-panel">
<div class="unified-tabs">
<button class="unified-tab active" data-tab="import">导入</button>
<button class="unified-tab" data-tab="search">检索</button>
<button class="unified-tab" data-tab="stats">统计</button>
</div>
<div class="unified-pane" id="unifiedImport">
<form class="import-form" id="importForm">
<fieldset id="importFieldset" class="import-fieldset">
<label class="field-label" for="importFile">选择文件</label>
<input id="importFile" type="file" accept=".md,.txt,text/markdown,text/plain">
<label class="field-label" for="importText">或直接粘贴会议文本</label>
<textarea id="importText" rows="6" placeholder="把会议纪要、聊天整理稿或录音转写文本粘贴到这里"></textarea>
<label class="check-row">
<input id="importForce" type="checkbox">
<span>发现重复时允许覆盖</span>
</label>
<button class="btn" id="importSubmitBtn" type="submit">开始导入</button>
</fieldset>
</form>
<div class="status-box" id="importStatus">支持 <code>.md</code> / <code>.txt</code>,也可以直接粘贴文本。</div>
<div class="progress-list" id="importProgress">
<div class="empty-state">导入开始后,这里会实时显示处理步骤。</div>
</div>
</div>
<div class="unified-pane hidden" id="unifiedSearch">
<form class="search-box" id="searchForm">
<input id="searchInput" type="text" placeholder="搜索会议主题、负责人、指标、关系事实...">
<button class="btn" type="submit">搜索</button>
</form>
<div class="search-results" id="searchResults">
<div class="empty-state">输入问题后,这里会展示混合检索结果。</div>
</div>
</div>
<div class="unified-pane hidden" id="unifiedStats">
<div class="mini-stats" id="statsList"></div>
</div>
</section>
<div class="content-grid">
<section class="panel" id="meeting-list">
<div class="panel-head">
<p class="eyebrow">Recent Archives</p>
<h3>最近会议</h3>
</div>
<div class="card-list" id="meetingCards"></div>
</section>
<section class="panel">
<div class="panel-head">
<p class="eyebrow">Action Items</p>
<h3>待跟进行动项</h3>
</div>
<div class="list-stack" id="actionList"></div>
</section>
<section class="panel">
<div class="panel-head">
<p class="eyebrow">Metrics</p>
<h3>关键指标</h3>
</div>
<div class="list-stack" id="metricList"></div>
</section>
<section class="panel">
<div class="panel-head">
<p class="eyebrow">Series</p>
<h3>会议系列</h3>
</div>
<div class="list-stack" id="seriesList"></div>
</section>
</div>
</main>
</div>
<dialog class="detail-modal" id="meetingDialog">
<div class="dialog-head">
<div>
<p class="eyebrow">Archived Meeting</p>
<h3 id="dialogTitle">会议详情</h3>
</div>
<button class="icon-btn" id="closeDialogBtn" type="button">×</button>
</div>
<p class="dialog-meta" id="dialogMeta"></p>
<pre class="dialog-content" id="dialogContent"></pre>
</dialog>
<script src="/app.js"></script>
</body>
</html>

View File

@ -0,0 +1,977 @@
:root {
--primary: #5d67f5;
--primary-2: #7f8bff;
--primary-soft: #edf1ff;
--accent: #53c2da;
--bg: #f5f7ff;
--bg-2: #fbfcff;
--panel: rgba(255, 255, 255, 0.9);
--panel-strong: rgba(255, 255, 255, 0.96);
--border: rgba(212, 221, 247, 0.95);
--text: #22264d;
--muted: #68709d;
--danger: #b3261e;
--success: #11693c;
--shadow: 0 12px 28px rgba(73, 81, 141, 0.08);
--shadow-sm: 0 6px 16px rgba(73, 81, 141, 0.06);
--radius-xl: 20px;
--radius-lg: 16px;
--radius-md: 12px;
--radius-sm: 10px;
}
* { box-sizing: border-box; }
html, body {
margin: 0;
min-height: 100%;
}
body {
font-family: "Segoe UI", "PingFang SC", "Microsoft YaHei", sans-serif;
font-size: 13px;
color: var(--text);
background:
radial-gradient(circle at 10% 10%, rgba(126, 186, 255, 0.16), transparent 24%),
radial-gradient(circle at 88% 14%, rgba(132, 121, 255, 0.12), transparent 22%),
linear-gradient(135deg, #f8faff 0%, var(--bg) 55%, var(--bg-2) 100%);
}
a { color: inherit; text-decoration: none; }
button, input, textarea { font: inherit; }
.shell {
display: grid;
grid-template-columns: 220px minmax(0, 1fr);
gap: 14px;
min-height: 100vh;
padding: 14px;
}
.sidebar, .panel, .detail-modal::backdrop {
backdrop-filter: blur(12px);
}
.sidebar {
display: flex;
flex-direction: column;
gap: 10px;
padding: 14px;
border: 1px solid var(--border);
border-radius: 22px;
background: linear-gradient(180deg, rgba(236, 243, 255, 0.92), rgba(255, 255, 255, 0.8));
box-shadow: var(--shadow);
}
.brand {
display: flex;
gap: 10px;
align-items: center;
}
.brand-mark {
width: 40px;
height: 40px;
display: grid;
place-items: center;
border-radius: 14px;
color: #fff;
font-size: 17px;
font-weight: 800;
background: linear-gradient(135deg, var(--primary), var(--primary-2));
}
.brand-kicker, .eyebrow {
margin: 0 0 3px;
color: var(--primary);
font-size: 10px;
font-weight: 700;
letter-spacing: 0.08em;
text-transform: uppercase;
}
.brand h1, .panel h3, .dialog-head h3 {
margin: 0;
}
.brand h1 { font-size: 18px; }
.nav {
display: grid;
gap: 6px;
}
.nav-link {
padding: 10px 12px;
border: 1px solid transparent;
border-radius: var(--radius-md);
color: var(--muted);
font-size: 13px;
font-weight: 700;
transition: 0.2s ease;
}
.nav-link:hover, .nav-link.active {
color: var(--primary);
border-color: rgba(109, 123, 255, 0.16);
background: rgba(255, 255, 255, 0.78);
}
.side-card, .panel {
border: 1px solid var(--border);
border-radius: var(--radius-xl);
background: var(--panel);
box-shadow: var(--shadow-sm);
}
.panel { padding: 14px; }
.panel-head {
display: flex;
justify-content: space-between;
align-items: start;
gap: 10px;
margin-bottom: 10px;
}
.panel h3 { font-size: 17px; }
.sidebar-shortcuts {
display: flex;
flex-wrap: wrap;
gap: 6px;
padding: 10px;
margin-top: auto;
}
.pill-link, .chip {
display: inline-flex;
align-items: center;
min-height: 24px;
padding: 0 9px;
border-radius: 999px;
font-size: 11px;
font-weight: 700;
}
.pill-link {
background: rgba(255, 255, 255, 0.9);
border: 1px solid var(--border);
}
.chip {
background: var(--primary-soft);
color: var(--primary);
}
.chip.status-done, .chip.status-completed { background: #edfdf4; color: var(--success); }
.chip.status-pending, .chip.status-todo { background: #fff8e7; color: #b8860b; }
.chip.status-in_progress, .chip.status-active { background: #e8f4fd; color: #4a90d9; }
.chip.status-blocked { background: #fff4f2; color: var(--danger); }
.main {
display: flex;
flex-direction: column;
gap: 12px;
min-height: 0;
}
.main-toolbar {
display: flex;
justify-content: space-between;
align-items: center;
gap: 16px;
padding: 16px 18px;
border: 1px solid var(--border);
border-radius: 22px;
background:
radial-gradient(circle at top right, rgba(134, 144, 255, 0.12), transparent 28%),
linear-gradient(180deg, rgba(255, 255, 255, 0.94), rgba(244, 248, 255, 0.96));
box-shadow: var(--shadow);
}
.main-toolbar h2 {
margin: 0;
font-size: 22px;
}
.main-toolbar-actions {
display: flex;
gap: 8px;
}
.btn, .icon-btn {
border: none;
cursor: pointer;
transition: 0.2s ease;
}
.btn {
display: inline-flex;
align-items: center;
justify-content: center;
min-height: 36px;
padding: 0 14px;
border-radius: 11px;
font-size: 12px;
font-weight: 700;
color: #fff;
background: linear-gradient(135deg, var(--primary), var(--primary-2));
box-shadow: 0 8px 18px rgba(93, 103, 245, 0.18);
}
.btn:hover, .icon-btn:hover { transform: translateY(-1px); }
.btn:disabled {
opacity: 0.68;
cursor: not-allowed;
transform: none;
}
.btn.ghost {
color: var(--primary);
background: rgba(255, 255, 255, 0.94);
box-shadow: none;
border: 1px solid var(--border);
}
.stats-grid, .content-grid, .workspace-grid {
display: grid;
gap: 12px;
}
.stats-grid { grid-template-columns: repeat(4, minmax(0, 1fr)); }
.highlight-card {
padding: 0;
border: 1px solid var(--border);
border-radius: var(--radius-lg);
background: var(--panel-strong);
box-shadow: var(--shadow-sm);
overflow: hidden;
}
.highlight-card .hc-bar {
height: 4px;
background: var(--card-accent);
}
.highlight-card .eyebrow {
padding: 12px 14px 0;
}
.highlight-card strong {
display: block;
margin: 4px 0 2px;
padding: 0 14px;
font-size: 26px;
color: var(--card-accent);
}
.highlight-card p:last-child {
padding: 0 14px 14px;
margin: 0;
color: var(--muted);
}
.dashboard-grid {
grid-template-columns: minmax(330px, 1.1fr) minmax(340px, 1fr) minmax(220px, 0.72fr);
align-items: start;
}
.search-box, .import-form, .import-fieldset {
display: grid;
gap: 8px;
}
.import-fieldset {
margin: 0;
padding: 0;
border: 0;
min-width: 0;
}
.import-fieldset:disabled { opacity: 0.6; }
.search-box input, .graph-controls input, textarea, input[type="file"] {
width: 100%;
min-height: 38px;
padding: 9px 12px;
border: 1px solid var(--border);
border-radius: 11px;
background: rgba(255, 255, 255, 0.94);
color: var(--text);
}
textarea {
min-height: 138px;
resize: vertical;
}
.field-label {
font-size: 11px;
font-weight: 700;
color: var(--muted);
}
.check-row {
display: flex;
align-items: center;
gap: 8px;
font-size: 12px;
color: var(--muted);
}
.status-box {
margin-top: 10px;
padding: 10px 12px;
border-radius: 12px;
border: 1px solid var(--border);
background: rgba(255, 255, 255, 0.76);
font-size: 12px;
color: var(--muted);
}
.status-box[data-kind="error"] {
color: var(--danger);
background: #fff4f2;
}
.status-box[data-kind="success"] {
color: var(--success);
background: #edfdf4;
}
.progress-list, .search-results, .mini-stats, .card-list, .list-stack, .related-search {
display: grid;
gap: 8px;
}
.progress-item, .mini-stat, .card, .list-item, .result-card, .detail-card {
padding: 12px;
border: 1px solid var(--border);
border-radius: 14px;
background: rgba(255, 255, 255, 0.88);
}
.progress-item {
display: grid;
grid-template-columns: 24px 1fr;
gap: 8px;
align-items: start;
}
.progress-index {
width: 24px;
height: 24px;
display: grid;
place-items: center;
border-radius: 999px;
background: var(--primary-soft);
color: var(--primary);
font-size: 11px;
font-weight: 700;
}
.mini-stat {
display: flex;
align-items: center;
gap: 10px;
padding: 10px 12px;
}
.ms-icon {
width: 32px;
height: 32px;
display: grid;
place-items: center;
border-radius: 10px;
font-size: 15px;
background: color-mix(in srgb, var(--stat-color) 14%, transparent);
color: var(--stat-color);
flex-shrink: 0;
}
.ms-body strong {
display: block;
font-size: 16px;
line-height: 1.2;
}
.ms-body p {
margin: 0;
font-size: 11px;
color: var(--muted);
}
.mini-stat strong, .card h4, .list-item strong, .result-card strong {
display: block;
margin-bottom: 4px;
}
.card { cursor: pointer; }
.card:hover, .result-card:hover, .list-item:hover {
border-color: rgba(120, 132, 255, 0.34);
}
.content-grid { grid-template-columns: repeat(2, minmax(0, 1fr)); }
/* ── Meeting card ── */
.meeting-card {
display: flex;
gap: 10px;
padding: 12px;
border: 1px solid var(--border);
border-radius: 14px;
background: rgba(255, 255, 255, 0.88);
cursor: pointer;
transition: 0.2s ease;
}
.meeting-card:hover {
border-color: rgba(120, 132, 255, 0.34);
}
.mc-date {
flex-shrink: 0;
width: 44px;
height: 44px;
display: grid;
place-items: center;
border-radius: 10px;
background: var(--primary-soft);
color: var(--primary);
font-size: 11px;
font-weight: 700;
text-align: center;
line-height: 1.2;
}
.mc-body h4 {
margin: 0 0 4px;
font-size: 13px;
}
.mc-body p {
margin: 0;
font-size: 12px;
color: var(--muted);
display: -webkit-box;
-webkit-line-clamp: 2;
line-clamp: 2;
-webkit-box-orient: vertical;
overflow: hidden;
}
/* ── List item with priority dot ── */
.list-item {
display: flex;
gap: 10px;
padding: 12px;
border: 1px solid var(--border);
border-radius: 14px;
background: rgba(255, 255, 255, 0.88);
}
.li-priority {
flex-shrink: 0;
width: 4px;
border-radius: 2px;
background: var(--pri-color);
}
.li-body {
flex: 1;
min-width: 0;
}
.li-body strong {
display: block;
margin-bottom: 2px;
}
.li-body p {
margin: 0 0 6px;
font-size: 12px;
color: var(--muted);
}
/* ── Metric card ── */
.metric-card {
padding: 12px;
border: 1px solid var(--border);
border-radius: 14px;
background: rgba(255, 255, 255, 0.88);
}
.mc-head {
display: flex;
justify-content: space-between;
align-items: center;
margin-bottom: 2px;
}
.mc-head strong {
display: block;
}
.mc-value {
font-size: 16px;
font-weight: 700;
color: var(--primary);
}
.metric-card p {
margin: 0 0 8px;
font-size: 12px;
color: var(--muted);
}
.mc-bar-track {
height: 4px;
border-radius: 2px;
background: rgba(212, 221, 247, 0.5);
margin-bottom: 8px;
overflow: hidden;
}
.mc-bar-fill {
height: 100%;
border-radius: 2px;
background: linear-gradient(90deg, var(--primary), var(--primary-2));
transition: width 0.4s ease;
}
/* ── Series card ── */
.series-card {
display: flex;
gap: 10px;
align-items: center;
padding: 12px;
border: 1px solid var(--border);
border-radius: 14px;
background: rgba(255, 255, 255, 0.88);
}
.sc-count {
flex-shrink: 0;
width: 36px;
height: 36px;
display: grid;
place-items: center;
border-radius: 10px;
font-size: 14px;
font-weight: 700;
background: var(--primary-soft);
color: var(--primary);
}
.sc-body strong {
display: block;
margin-bottom: 2px;
}
.sc-body p {
margin: 0;
font-size: 12px;
color: var(--muted);
}
/* ── Unified Import / Search panel ── */
.unified-panel {
display: flex;
flex-direction: column;
}
.unified-tabs {
display: flex;
gap: 4px;
margin-bottom: 12px;
padding: 3px;
border-radius: 11px;
background: rgba(212, 221, 247, 0.3);
}
.unified-tab {
flex: 1;
padding: 7px 12px;
border: none;
border-radius: 8px;
font-size: 12px;
font-weight: 700;
cursor: pointer;
background: transparent;
color: var(--muted);
transition: 0.2s ease;
}
.unified-tab.active {
background: #fff;
color: var(--primary);
box-shadow: 0 2px 6px rgba(73, 81, 141, 0.1);
}
.unified-tab:hover:not(.active) {
color: var(--text);
}
.unified-pane.hidden {
display: none;
}
/* ── Result card with kind badge ── */
.result-card {
position: relative;
}
.rc-kind {
display: inline-block;
padding: 1px 7px;
border-radius: 4px;
font-size: 10px;
font-weight: 700;
text-transform: uppercase;
background: var(--primary-soft);
color: var(--primary);
margin-bottom: 4px;
}
.empty-state {
padding: 16px 14px;
text-align: center;
border: 1px dashed var(--border);
border-radius: 14px;
color: var(--muted);
}
.detail-modal {
width: min(820px, calc(100vw - 24px));
border: 1px solid var(--border);
border-radius: 20px;
padding: 0;
background: rgba(255, 255, 255, 0.97);
box-shadow: var(--shadow);
}
.detail-modal::backdrop {
background: rgba(37, 44, 78, 0.28);
}
.dialog-head {
display: flex;
justify-content: space-between;
gap: 10px;
padding: 16px 16px 6px;
}
.dialog-meta { padding: 0 16px 6px; color: var(--muted); }
.dialog-content {
margin: 0;
padding: 0 16px 16px;
white-space: pre-wrap;
font-family: "Consolas", "Courier New", monospace;
max-height: 60vh;
overflow: auto;
color: var(--muted);
}
.icon-btn {
width: 30px;
height: 30px;
border-radius: 10px;
background: rgba(242, 245, 255, 0.92);
color: var(--primary);
font-size: 20px;
}
/* ── Graph page ── */
.graph-shell {
height: 100vh;
overflow: hidden;
gap: 10px;
padding: 10px;
}
.graph-shell .sidebar {
flex-shrink: 0;
}
.graph-shell .main {
gap: 8px;
}
.graph-shell .graph-layout {
gap: 8px;
}
.graph-shell .graph-layout .panel {
padding: 10px;
}
.graph-layout {
display: grid;
grid-template-columns: 1fr 300px;
gap: 12px;
flex: 1;
min-height: 0;
}
.graph-stage-panel {
display: flex;
flex-direction: column;
padding: 0;
overflow: hidden;
}
.graph-stage {
flex: 1;
min-height: 0;
position: relative;
background:
linear-gradient(180deg, rgba(251, 253, 255, 0.96), rgba(241, 246, 255, 0.94)),
radial-gradient(circle at center, rgba(133, 196, 255, 0.08), transparent 36%);
}
#graphSvg {
width: 100%;
height: 100%;
display: block;
}
.detail-panel {
display: flex;
flex-direction: column;
gap: 8px;
overflow: hidden;
}
.detail-panel .detail-card,
.detail-panel .related-search {
overflow-y: auto;
}
.detail-card {
flex-shrink: 0;
word-break: break-all;
}
.detail-card strong {
word-break: break-word;
}
.related-search {
flex-shrink: 0;
}
.related-search .result-card {
word-break: break-all;
}
/* ── Graph toolbar ── */
.graph-toolbar { padding: 8px 12px; }
.graph-controls {
display: flex;
gap: 6px;
align-items: center;
}
.graph-controls .search-input {
flex: 1;
min-height: 30px;
padding: 6px 10px;
}
.graph-controls label.field-label {
display: flex;
align-items: center;
gap: 2px;
white-space: nowrap;
font-size: 10px;
}
.graph-controls label.field-label input {
width: 44px;
min-height: 26px;
padding: 4px 6px;
}
.graph-controls .btn {
min-height: 30px;
padding: 0 12px;
font-size: 11px;
}
.graph-toolbar-row {
display: flex;
justify-content: space-between;
align-items: center;
flex-wrap: wrap;
gap: 6px;
margin-top: 6px;
}
.graph-actions {
display: flex;
align-items: center;
gap: 8px;
font-size: 11px;
color: var(--muted);
}
.graph-type-filter {
display: flex;
flex-wrap: wrap;
align-items: center;
gap: 4px 10px;
}
.graph-type-filter label {
display: inline-flex;
align-items: center;
gap: 3px;
font-size: 11px;
color: var(--muted);
cursor: pointer;
user-select: none;
}
.graph-type-filter label input {
margin: 0;
accent-color: var(--primary);
}
.graph-meta { font-size: 11px; color: var(--muted); }
/* ── Graph nodes & edges ── */
.graph-node { cursor: pointer; }
.graph-node circle {
stroke: rgba(255, 255, 255, 0.85);
stroke-width: 2;
transition: filter 0.15s;
}
.graph-node--meeting circle { fill: #4a90d9; }
.graph-node--episode circle { fill: #34c759; }
.graph-node--entity circle { fill: var(--accent); }
.graph-node--fact circle { fill: #ff9500; }
.graph-node:hover circle { filter: brightness(1.2); }
.graph-node text {
font-size: 11px;
fill: var(--text);
pointer-events: none;
user-select: none;
}
.graph-edge {
stroke: rgba(120, 136, 194, 0.42);
stroke-width: 1.6;
cursor: pointer;
transition: stroke 0.15s, stroke-width 0.15s;
}
.edge-wrap:hover .graph-edge {
stroke: rgba(120, 136, 194, 0.7);
stroke-width: 2;
}
.graph-edge.active {
stroke: var(--primary);
stroke-width: 2.4;
}
.edge-wrap text {
pointer-events: none;
user-select: none;
}
/* ── Legend ── */
.legend { font-size: 11px; color: var(--muted); }
.legend-dot {
display: inline-block;
width: 9px;
height: 9px;
border-radius: 50%;
margin-right: 6px;
}
.legend-dot.meeting { background: #4a90d9; }
.legend-dot.episode { background: #34c759; }
.legend-dot.entity { background: var(--accent); }
.legend-dot.fact { background: #ff9500; }
.graph-shell .sidebar {
gap: 8px;
padding: 10px;
}
.graph-shell .sidebar .legend {
display: flex;
flex-direction: column;
gap: 3px;
font-size: 11px;
padding: 0 4px;
}
.graph-shell .sidebar .legend .eyebrow {
margin-bottom: 4px;
}
/* ── Graph controls overlay ── */
.zoom-reset-btn, .pause-btn {
font-size: 11px;
min-height: 28px;
padding: 0 10px;
}
.zoom-hint {
font-size: 11px;
color: var(--muted);
padding: 4px 0;
}
/* ── Responsive ── */
@media (max-width: 1240px) {
.shell, .graph-shell, .dashboard-grid, .content-grid, .graph-layout, .stats-grid {
grid-template-columns: 1fr;
}
.sidebar { order: 2; }
.graph-shell { height: auto; overflow: auto; }
}
@media (max-width: 720px) {
.shell, .graph-shell {
padding: 10px;
gap: 10px;
}
.sidebar, .panel { border-radius: 18px; }
.search-box { grid-template-columns: 1fr; }
.graph-stage { min-height: 250px; }
.graph-controls { flex-wrap: wrap; }
.graph-controls .search-input { min-width: 100%; }
}

View File

@ -1,171 +0,0 @@
import hashlib
import logging
from typing import Optional
from extractor import extract_meeting_info, MeetingExtraction
from vector_store import meeting_vector_store
from obsidian_manager import obsidian_manager
from meeting_state import MeetingStateStore
from config import config
logger = logging.getLogger(__name__)
state_store = MeetingStateStore(config.state_path)
class MeetingProcessor:
def process_meeting_file(self, filepath: str, force: bool = False) -> Optional[str]:
with open(filepath, "r", encoding="utf-8") as f:
text = f.read()
return self.process_meeting_text(text, force=force)
def process_meeting_text(self, text: str, force: bool = False) -> Optional[str]:
content_hash = self._compute_content_hash(text)
if not force and state_store.has_content_hash(content_hash):
print(f"\n⚠️ 检测到重复内容(内容指纹匹配),跳过处理")
logger.info(f"内容哈希重复,跳过: {content_hash[:12]}")
return None
if not force:
similar = meeting_vector_store.find_similar_text(text, threshold=0.92)
if similar:
meta = similar["metadata"]
print(f"\n⚠️ 发现高度相似的已有会议: 「{meta.get('title', '')}」({meta.get('date', '')}) 相似度: {similar['score']:.2%}")
while True:
choice = input(" 选择操作 [s]跳过 / [o]覆盖 (默认 s): ").strip().lower() or "s"
if choice == "s":
logger.info(f"跳过相似会议: {meta.get('title', '')}")
return None
elif choice == "o":
logger.info(f"覆盖重新处理相似会议")
force = True
break
print(" 请输入 s(skip) 或 o(overwrite)")
meeting_data = self._extract(text)
if not meeting_data:
logger.error("会议信息提取失败")
return None
data_dict = meeting_data.model_dump()
meeting_title = data_dict.get("title", "")
meeting_date = data_dict.get("date", "")
data_dict["_content_hash"] = content_hash
should_skip = self._handle_duplicate(data_dict, force)
if should_skip:
return None
raw_path = obsidian_manager.save_raw_text(
text,
title=meeting_title,
date=meeting_date,
)
data_dict["_original_text"] = text
data_dict["_original_text_path"] = raw_path
obsidian_manager.mark_raw_processed(raw_path)
meeting_filename = obsidian_manager._meeting_filename(data_dict)
merged_items = state_store.merge_action_items(
data_dict.get("action_items", []),
meeting_title,
meeting_date,
meeting_filename,
)
data_dict["action_items"] = merged_items
merged_metrics = state_store.merge_metrics(
data_dict.get("metrics", []),
meeting_title,
meeting_date,
meeting_filename,
)
data_dict["metrics"] = merged_metrics
state_store.add_content_hash(content_hash, meeting_title, meeting_date, meeting_filename)
state_store.save()
vault_path = obsidian_manager.add_meeting(data_dict, text)
meeting_vector_store.add_meeting(data_dict)
logger.info(f"会议处理完成: {meeting_data.title}")
return vault_path
def _handle_duplicate(self, data_dict: dict, force: bool) -> bool:
title = data_dict.get("title", "")
date = data_dict.get("date", "")
existing = meeting_vector_store.find_meeting(title, date)
file_exists = obsidian_manager.meeting_file_exists(data_dict)
if not existing and not file_exists:
return False
if force:
logger.info(f"发现重复会议「{title}」,--force 模式自动覆盖")
self._remove_old(data_dict)
return False
print(f"\n⚠️ 发现重复会议: 「{title}」({date})")
while True:
choice = input(" 选择操作 [s]跳过 / [o]覆盖 (默认 s): ").strip().lower() or "s"
if choice == "s":
logger.info(f"跳过重复会议: {title}")
return True
elif choice == "o":
logger.info(f"覆盖重新处理: {title}")
self._remove_old(data_dict)
return False
print(" 请输入 s(skip) 或 o(overwrite)")
def _remove_old(self, data_dict: dict):
meeting_id = meeting_vector_store._meeting_id(data_dict)
meeting_vector_store.remove_meeting(meeting_id)
obsidian_manager.remove_meeting_note(data_dict)
content_hash = data_dict.get("_content_hash", "")
if content_hash:
state_store.remove_content_hash(content_hash)
logger.info(f"旧数据清理完成: {data_dict.get('title', '')}")
def _compute_content_hash(self, text: str) -> str:
normalized = text.strip().replace('\r\n', '\n')
return hashlib.sha256(normalized.encode('utf-8')).hexdigest()
def _extract(self, text: str) -> Optional[MeetingExtraction]:
try:
return extract_meeting_info(text)
except Exception as e:
logger.error(f"LLM提取失败: {e}")
return None
def query(self, question: str, top_k: int = 3) -> str:
return meeting_vector_store.query_as_context(question, top_k=top_k)
def stats(self) -> dict:
import os
vault = config.obsidian.vault_path
meetings_dir = os.path.join(vault, config.obsidian.meetings_dir)
entities_dir = os.path.join(vault, config.obsidian.entities_dir)
meeting_files = [f for f in os.listdir(meetings_dir) if f.endswith(".md")] if os.path.exists(meetings_dir) else []
entity_files = [f for f in os.listdir(entities_dir) if f.endswith(".md")] if os.path.exists(entities_dir) else []
vs_stats = meeting_vector_store.get_stats()
state_stats = state_store.get_stats()
return {
"obsidian_meetings": len(meeting_files),
"obsidian_entities": len(entity_files),
"vector_index": vs_stats,
"state": state_stats,
"vault_path": vault,
}
meeting_processor = MeetingProcessor()
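
Review note: the removed processor skips a file when its content fingerprint has been seen before, and it does so before any LLM call. The following is a minimal sketch of that check, not the project's implementation. `content_fingerprint` mirrors the normalization in `_compute_content_hash`, while `SeenHashes` is a hypothetical in-memory stand-in for `MeetingStateStore`, whose real code is not part of this diff.

```python
import hashlib


def content_fingerprint(text: str) -> str:
    """Hash text after trimming and normalizing Windows line endings."""
    normalized = text.strip().replace("\r\n", "\n")
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class SeenHashes:
    """Hypothetical stand-in for the persisted state store (assumption)."""

    def __init__(self) -> None:
        self._hashes = set()

    def should_process(self, text: str, force: bool = False) -> bool:
        """Return False for already-seen content unless force is set."""
        digest = content_fingerprint(text)
        if digest in self._hashes and not force:
            return False
        self._hashes.add(digest)
        return True
```

Because the text is normalized before hashing, the same transcript saved with CRLF line endings or extra surrounding whitespace is still treated as a duplicate; any other edit, however small, produces a new fingerprint, which is presumably why the original code adds a similarity check as a second pass.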

View File

@ -1,416 +0,0 @@
import logging
import os
import shutil
from datetime import datetime
from typing import Dict, List, Optional, Set
from config import config
logger = logging.getLogger(__name__)
def _sanitize_filename(name: str) -> str:
if not name:
return "未命名"
invalid = '<>:"/\\|?*'
for c in invalid:
name = name.replace(c, "")
name = name.replace(" ", "_").strip("._")
if not name:
return "未命名"
return name
def _safe_filename(name: str, max_len: int = 60) -> str:
safe = _sanitize_filename(name)
if len(safe) > max_len:
safe = safe[:max_len]
return safe
class ObsidianVaultManager:
def __init__(self):
self.vault_path = config.obsidian.vault_path
self.meetings_dir = os.path.join(self.vault_path, config.obsidian.meetings_dir)
self.entities_dir = os.path.join(self.vault_path, config.obsidian.entities_dir)
self.graphs_dir = os.path.join(self.vault_path, config.obsidian.graphs_dir)
self.raw_dir = os.path.join(self.vault_path, config.obsidian.raw_dir)
self._ensure_dirs()
def _ensure_dirs(self):
for d in [self.meetings_dir, self.entities_dir, self.graphs_dir, self.raw_dir]:
os.makedirs(d, exist_ok=True)
def save_raw_text(self, text: str, title: str = "", date: str = "") -> str:
date_str = date or datetime.now().strftime("%Y-%m-%d")
safe_title = _safe_filename(title or "未命名", 40)
filename = f"{date_str}_{safe_title}.md"
filepath = os.path.join(self.raw_dir, filename)
if os.path.exists(filepath):
with open(filepath, "r", encoding="utf-8") as f:
existing = f.read()
if "status: processed" in existing:
logger.warning(f"原文文件已存在且已处理过,将被覆盖: {filepath}")
content = f"""---
title: "{title}"
date: "{date_str}"
tags: [raw]
status: unprocessed
---
# {title or "未命名"}
**日期**: {date_str}
## 原文
{text}
"""
with open(filepath, "w", encoding="utf-8") as f:
f.write(content)
logger.info(f"原文已保存: {filepath}")
return filepath
def mark_raw_processed(self, raw_filepath: str):
if not os.path.exists(raw_filepath):
return
with open(raw_filepath, "r", encoding="utf-8") as f:
content = f.read()
content = content.replace("status: unprocessed", "status: processed")
with open(raw_filepath, "w", encoding="utf-8") as f:
f.write(content)
def _ensure_obsidian_config(self):
obsidian_config = os.path.join(self.vault_path, ".obsidian", "app.json")
if not os.path.exists(obsidian_config):
os.makedirs(os.path.dirname(obsidian_config), exist_ok=True)
with open(obsidian_config, "w", encoding="utf-8") as f:
f.write('{\n "alwaysUpdateLinks": true,\n "newFileLocation": "current",\n "useMarkdownLinks": true\n}')
core_plugins = os.path.join(self.vault_path, ".obsidian", "core-plugins.json")
if not os.path.exists(core_plugins):
with open(core_plugins, "w", encoding="utf-8") as f:
f.write('{\n "file-explorer": true,\n "graph": true,\n "backlink": true,\n "tag-pane": true,\n "page-preview": true,\n "templates": true,\n "search": true\n}')
def _meeting_filename(self, data: dict) -> str:
date_str = data.get("date", datetime.now().strftime("%Y-%m-%d"))
title = data.get("title", "未命名会议")
safe_title = _safe_filename(title, 40)
return f"{date_str}_{safe_title}.md"
def _entity_path(self, name: str) -> str:
safe = _safe_filename(name, 60)
return os.path.join(self.entities_dir, f"{safe}.md")
def _entity_link(self, name: str) -> str:
safe = _safe_filename(name, 60)
return f"[[Entities/{safe}|{name}]]"
def _meeting_link(self, data: dict) -> str:
fname = self._meeting_filename(data).replace(".md", "")
title = data.get("title", "未命名会议")
return f"[[Meetings/{fname}|{title}]]"
def meeting_filepath(self, meeting_data: dict) -> str:
filename = self._meeting_filename(meeting_data)
return os.path.join(self.meetings_dir, filename)
def meeting_file_exists(self, meeting_data: dict) -> bool:
return os.path.exists(self.meeting_filepath(meeting_data))
def raw_filepath(self, meeting_data: dict) -> str:
date_str = meeting_data.get("date", datetime.now().strftime("%Y-%m-%d"))
title = meeting_data.get("title", "未命名")
safe_title = _safe_filename(title, 40)
filename = f"{date_str}_{safe_title}.md"
return os.path.join(self.raw_dir, filename)
def remove_meeting_note(self, meeting_data: dict):
paths = [
self.meeting_filepath(meeting_data),
self.raw_filepath(meeting_data),
]
for p in paths:
if os.path.exists(p):
os.remove(p)
logger.info(f"已删除: {p}")
def add_meeting(self, meeting_data: dict, original_text: str) -> str:
self._ensure_obsidian_config()
filename = self._meeting_filename(meeting_data)
filepath = os.path.join(self.meetings_dir, filename)
content = self._render_meeting_note(meeting_data, original_text)
with open(filepath, "w", encoding="utf-8") as f:
f.write(content)
logger.info(f"会议笔记已生成: {filepath}")
self._create_all_entity_notes(meeting_data)
self._update_graph_moc()
return filepath
def _render_meeting_note(self, data: dict, original_text: str) -> str:
lines = []
lines.append("---")
lines.append(f'title: "{data.get("title", "")}"')
lines.append(f'date: "{data.get("date", "")}"')
content_hash = data.get("_content_hash", "")
if content_hash:
lines.append(f'content_hash: "{content_hash}"')
lines.append("tags: [meeting]")
lines.append("---")
lines.append("")
lines.append(f"# {data.get('title', '')}")
lines.append("")
if data.get("date"):
lines.append(f"**日期**: {data['date']}")
if data.get("participants"):
participants_links = [self._entity_link(p) for p in data["participants"]]
lines.append(f"**参会人**: {', '.join(participants_links)}")
lines.append("")
if data.get("summary"):
lines.append("## 摘要")
lines.append(data["summary"])
lines.append("")
lines.append("## 原文")
lines.append(original_text)
lines.append("")
if data.get("entities"):
lines.append("## 涉及实体")
for e in data["entities"]:
name = e.get("name", "")
if not name:
continue
lines.append(f"- {self._entity_link(name)} ({e.get('entity_type', '')}): {e.get('description', '')}")
lines.append("")
if data.get("action_items"):
lines.append("## 行动项")
for item in data["action_items"]:
task = item.get("task", "")
assignee_link = self._entity_link(item["assignee"]) if item.get("assignee") else "待确认"
deadline = item.get("deadline", "未指定")
priority = item.get("priority", "")
status_emoji = "✅" if item.get("status") == "已完成" else "🔄"
lines.append(f"- {status_emoji} **{task}** | 负责人: {assignee_link} | 截止: {deadline} | 优先级: {priority}")
history = item.get("_history", [])
if len(history) > 1:
for h in history:
icon = "✅" if h.get("status") == "已完成" else "🔄"
lines.append(f" - {h.get('date', '')}: {icon} {h.get('status', '')} (优先级: {h.get('priority', '')})")
lines.append("")
if data.get("metrics"):
lines.append("## 指标跟踪")
lines.append("| 指标 | 当前值 | 目标值 | 趋势 | 负责人 |")
lines.append("|------|--------|--------|------|--------|")
for m in data["metrics"]:
trend_icon = {"向好": "📈", "持平": "➡️", "恶化": "📉"}.get(m.get("trend", ""), "")
owner_link = self._entity_link(m["owner"]) if m.get("owner") else "-"
lines.append(f"| {m['metric_name']} | {m.get('value', '')} | {m.get('target', '')} | {trend_icon}{m.get('trend', '')} | {owner_link} |")
lines.append("")
if data.get("decisions"):
lines.append("## 决策记录")
for d in data["decisions"]:
proposer_link = self._entity_link(d["proposer"]) if d.get("proposer") else "-"
status_badge = "✅ 已决" if d.get("status") == "已决" else "⏳ 待定"
lines.append(f"- {status_badge} {d['content']} ({proposer_link})")
lines.append("")
if data.get("relations"):
lines.append("## 关系图谱")
for r in data["relations"]:
sub = r.get("subject", "")
obj = r.get("object", "")
pred = r.get("predicate", "")
if sub and obj:
lines.append(f"- {self._entity_link(sub)} → **{pred}** → {self._entity_link(obj)}")
lines.append("")
lines.append("---")
date_tag = data.get("date", "").replace("-", "/")
lines.append(f"#meeting #{date_tag}")
return "\n".join(lines)
def _create_all_entity_notes(self, data: dict):
seen: Set[str] = set()
for entity in data.get("entities", []):
name = entity.get("name", "")
if name and name not in seen:
self._upsert_entity_note(name, entity.get("entity_type", "实体"), entity.get("description", ""), data)
seen.add(name)
for participant in data.get("participants", []):
if participant and participant not in seen:
self._upsert_entity_note(participant, "人物", f"{participant} (参会人)", data)
seen.add(participant)
for rel in data.get("relations", []):
for key in ["subject", "object"]:
name = rel.get(key, "")
if name and name not in seen:
etype = rel.get(f"{key}_type", "实体")
self._upsert_entity_note(name, etype, "", data)
seen.add(name)
for item in data.get("action_items", []):
name = item.get("assignee", "")
if name and name not in seen:
self._upsert_entity_note(name, "人物", f"{name} (行动项负责人)", data)
seen.add(name)
for m in data.get("metrics", []):
name = m.get("owner", "")
if name and name not in seen:
self._upsert_entity_note(name, "人物", f"{name} (指标负责人)", data)
seen.add(name)
for d in data.get("decisions", []):
name = d.get("proposer", "")
if name and name not in seen:
self._upsert_entity_note(name, "人物", f"{name} (决策提出人)", data)
seen.add(name)
def _upsert_entity_note(self, name: str, entity_type: str, description: str, meeting_data: dict):
filepath = self._entity_path(name)
meeting_link = self._meeting_link(meeting_data)
if os.path.exists(filepath):
with open(filepath, "r", encoding="utf-8") as f:
existing = f.read()
if meeting_link not in existing:
idx = existing.find("## 相关会议")
if idx >= 0:
section_end = existing.find("\n## ", idx + 10)
if section_end < 0:
section_end = len(existing)
new_section = existing[idx:section_end].rstrip() + f"\n- {meeting_link}\n"
existing = existing[:idx] + new_section + existing[section_end:]
else:
existing = existing.rstrip() + f"\n\n## 相关会议\n- {meeting_link}\n"
self._upsert_entity_action_items(existing, meeting_data, name, meeting_link, filepath)
return
rel_lines = []
for r in meeting_data.get("relations", []):
if r.get("subject") == name and r.get("object"):
rel_lines.append(f"- → **{r['predicate']}** → {self._entity_link(r['object'])}")
elif r.get("object") == name and r.get("subject"):
rel_lines.append(f"- {self._entity_link(r['subject'])} → **{r['predicate']}** →")
action_lines = []
for item in meeting_data.get("action_items", []):
if item.get("assignee") == name:
task = item.get("task", "")
status_emoji = "✅" if item.get("status") == "已完成" else "🔄"
action_lines.append(f"- {status_emoji} {task} (状态: {item.get('status', '待办')}, 源自: {meeting_link})")
history = item.get("_history", [])
if len(history) > 1:
for h in history:
icon = "✅" if h.get("status") == "已完成" else "🔄"
action_lines.append(f" - {h.get('date', '')}: {icon} {h.get('status', '')}")
content = f"""---
type: {_sanitize_filename(entity_type)}
entity_type: "{entity_type}"
tags: [entity, {_sanitize_filename(entity_type)}]
---
# {name}
**类型**: {entity_type}
**描述**: {description}
## 相关会议
- {meeting_link}
## 关系
{chr(10).join(rel_lines) if rel_lines else "(暂无)"}
## 行动项
{chr(10).join(action_lines) if action_lines else "(暂无)"}
"""
with open(filepath, "w", encoding="utf-8") as f:
f.write(content)
def _upsert_entity_action_items(self, existing: str, meeting_data: dict, entity_name: str, meeting_link: str, filepath: str):
action_lines = []
for item in meeting_data.get("action_items", []):
if item.get("assignee") == entity_name:
task = item.get("task", "")
status_emoji = "✅" if item.get("status") == "已完成" else "🔄"
action_lines.append(f"- {status_emoji} {task} (状态: {item.get('status', '待办')}, 源自: {meeting_link})")
history = item.get("_history", [])
if len(history) > 1:
for h in history:
icon = "✅" if h.get("status") == "已完成" else "🔄"
action_lines.append(f" - {h.get('date', '')}: {icon} {h.get('status', '')}")
action_section = "\n".join(action_lines) if action_lines else "(暂无)"
idx = existing.find("## 行动项")
if idx >= 0:
section_end = existing.find("\n## ", idx + 10)
if section_end < 0:
section_end = len(existing)
new_section = f"## 行动项\n{action_section}"
existing = existing[:idx] + new_section + existing[section_end:]
else:
existing = existing.rstrip() + f"\n\n## 行动项\n{action_section}\n"
with open(filepath, "w", encoding="utf-8") as f:
f.write(existing)
def _update_graph_moc(self):
meetings = [f for f in os.listdir(self.meetings_dir) if f.endswith(".md")]
entities = [f for f in os.listdir(self.entities_dir) if f.endswith(".md")]
lines = []
lines.append("---")
lines.append("tags: [moc, graph]")
lines.append("---")
lines.append("")
lines.append("# 知识图谱总览")
lines.append("")
lines.append("## 统计")
lines.append(f"- **会议数量**: {len(meetings)}")
lines.append(f"- **实体数量**: {len(entities)}")
lines.append("")
lines.append("## 最近会议")
for m in sorted(meetings, reverse=True)[:10]:
name = m.replace(".md", "")
# Drop the leading "YYYY-MM-DD_" date prefix (11 characters) from the link text.
link_text = name[11:] if len(name) > 11 else name
lines.append(f"- [[Meetings/{name}|{link_text}]]")
lines.append("")
lines.append("## 实体索引")
for e in sorted(entities):
name = e.replace(".md", "")
lines.append(f"- [[Entities/{name}|{name}]]")
with open(os.path.join(self.graphs_dir, "知识图谱总览.md"), "w", encoding="utf-8") as f:
f.write("\n".join(lines))
def rebuild_vault(self, meetings_data: List[dict]):
if os.path.exists(self.vault_path):
shutil.rmtree(self.vault_path)
self._ensure_dirs()
self._ensure_obsidian_config()
for md in meetings_data:
self.add_meeting(md, md.get("_original_text", ""))
obsidian_manager = ObsidianVaultManager()
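
Review note: `_upsert_entity_note` and `_upsert_entity_action_items` both patch a Markdown note by locating a `## ` heading and rewriting everything up to the next `## ` heading. A minimal sketch of that pattern is shown below. `replace_section` is a hypothetical helper written for illustration and does not exist in the repository; it treats a `find` result of `-1` as "not found", so a heading at position 0 is still matched.

```python
def replace_section(note: str, heading: str, body: str) -> str:
    """Replace the body under `heading`, or append the section if absent.

    `heading` is matched as a plain substring (for example "## Tasks"),
    and the section is assumed to end at the next level-2 heading.
    """
    start = note.find(heading)
    if start < 0:
        # Heading not present: append a new section at the end of the note.
        return note.rstrip() + f"\n\n{heading}\n{body}\n"
    end = note.find("\n## ", start + len(heading))
    if end < 0:
        end = len(note)
    return note[:start] + f"{heading}\n{body}" + note[end:]
```

Searching from `start + len(heading)` rather than a fixed offset keeps the helper independent of the heading's length, which the hard-coded `idx + 10` in the removed code is not.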

View File

@ -1,8 +1,4 @@
openai>=1.0.0
pydantic>=2.0.0
llama-index>=0.10.0
llama-index-embeddings-openai>=0.1.0
llama-index-vector-stores-chroma>=0.1.0
chromadb>=0.5.0
python-dotenv>=1.0.0
pyvis>=0.3.0
neo4j>=5.26.0

View File

@ -1,259 +0,0 @@
import hashlib
import json
import logging
import os
import re
from typing import List, Optional
from openai import OpenAI as OpenAI_Client
from llama_index.core import (
Document,
VectorStoreIndex,
StorageContext,
load_index_from_storage,
)
from llama_index.core.embeddings import BaseEmbedding
from llama_index.core.settings import Settings
from config import config
logger = logging.getLogger(__name__)
class CustomOpenAIEmbedding(BaseEmbedding):
def __init__(
self,
model: str = "text-embedding-ada-002",
api_key: Optional[str] = None,
api_base: Optional[str] = None,
**kwargs,
):
super().__init__(model_name=model, **kwargs)
self._client = OpenAI_Client(
api_key=api_key or "not-needed",
base_url=api_base,
)
self._model = model
async def _aget_query_embedding(self, query: str) -> List[float]:
return self._get_embedding(query)
async def _aget_text_embedding(self, text: str) -> List[float]:
return self._get_embedding(text)
def _get_query_embedding(self, query: str) -> List[float]:
return self._get_embedding(query)
def _get_text_embedding(self, text: str) -> List[float]:
return self._get_embedding(text)
def _get_embedding(self, text: str) -> List[float]:
resp = self._client.embeddings.create(
model=self._model,
input=text,
)
return resp.data[0].embedding
class MeetingVectorStore:
def __init__(self):
embed_model = CustomOpenAIEmbedding(
model=config.embedding.model,
api_key=config.embedding.api_key or None,
api_base=config.embedding.api_base or None,
)
Settings.embed_model = embed_model
self.persist_dir = config.vector_store.persist_dir
self._index: Optional[VectorStoreIndex] = None
self._load_or_create_index()
def _load_or_create_index(self):
if os.path.exists(os.path.join(self.persist_dir, "docstore.json")):
try:
storage_context = StorageContext.from_defaults(persist_dir=self.persist_dir)
self._index = load_index_from_storage(storage_context)
logger.info(f"从磁盘加载向量索引: {self.persist_dir}")
return
except Exception as e:
logger.warning(f"加载向量索引失败,将创建新索引: {e}")
self._index = VectorStoreIndex.from_documents([])
logger.info("创建新的向量索引")
def _save(self):
if self._index:
os.makedirs(self.persist_dir, exist_ok=True)
self._index.storage_context.persist(persist_dir=self.persist_dir)
def _meeting_id(self, meeting_data: dict) -> str:
title = meeting_data.get("title", "")
date = meeting_data.get("date", "")
raw = f"{date}_{title}"
return f"meeting_{hashlib.md5(raw.encode('utf-8')).hexdigest()[:12]}"
def find_meeting(self, title: str, date: str = "") -> Optional[dict]:
if not self._index:
return None
query_text = f"会议标题: {title}"
if date:
query_text += f" 日期: {date}"
try:
results = self.query(query_text, top_k=3)
for r in results:
meta = r.get("metadata", {})
meta_title = meta.get("title", "")
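# Require a title match; a matching date alone would flag unrelated meetings held on the same day.
if meta_title == title and (not date or meta.get("date") == date):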
return meta
return None
except Exception as e:
logger.warning(f"会议查重查询失败: {e}")
return None
def find_similar_text(self, text: str, threshold: float = 0.92) -> Optional[dict]:
if not self._index:
return None
try:
retriever = self._index.as_retriever(similarity_top_k=3)
nodes = retriever.retrieve(text)
for node in nodes:
if node.score is not None and node.score > threshold:
return {
"metadata": node.metadata,
"score": node.score,
}
return None
except Exception as e:
logger.warning(f"文本相似度查重失败: {e}")
return None
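    def remove_meeting(self, meeting_id: str) -> bool:
        """Delete every per-field document of a meeting from the index."""
        if not self._index:
            return False
        try:
            # Document IDs follow the "{meeting_id}_{field}" scheme used in _build_field_docs.
            for field in self._FIELD_TYPES:
                self._index.delete_ref_doc(f"{meeting_id}_{field}")
            self._save()
            logger.info(f"Removed meeting from vector index: {meeting_id}")
            return True
        except Exception as e:
            logger.warning(f"Failed to remove meeting from vector index: {e}")
            return False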
    _FIELD_TYPES = ["header", "summary", "action_items", "metrics", "decisions", "relations", "entities"]
    def add_meeting(self, meeting_data: dict) -> bool:
        """Index a meeting as one document per field and persist the result."""
        try:
            meeting_id = self._meeting_id(meeting_data)
            original_text_path = meeting_data.get("_original_text_path", "")
            original_text = meeting_data.get("_original_text", "")
            # Metadata shared by every per-field document of this meeting.
            base_metadata = {
                "title": meeting_data.get("title", ""),
                "date": meeting_data.get("date", ""),
                "participants": ", ".join(meeting_data.get("participants", [])),
                "type": "meeting",
                "content_hash": meeting_data.get("_content_hash", ""),
                "original_text_path": original_text_path,
                "original_text_excerpt": original_text[:500] if original_text else "",
                "meeting_id": meeting_id,
            }
            docs = self._build_field_docs(meeting_data, base_metadata, meeting_id)
            if self._index:
                for doc in docs:
                    self._index.insert(doc)
                self._save()
            logger.info(
                f"Meeting '{meeting_data.get('title')}' added to vector index "
                f"(id={meeting_id}, fields={len(docs)})"
            )
            return True
        except Exception as e:
            logger.error(f"Failed to add meeting to vector index: {e}")
            return False
    def _build_field_docs(self, data: dict, base: dict, meeting_id: str) -> List[Document]:
        """Build one Document per populated field so each can be retrieved and deleted on its own."""
        docs = []
        # Labels stay in Chinese because they become part of the embedded text:
        # 日期 = date, 参会人 = participants.
        header = f"# {data.get('title', '')}"
        if data.get("date"):
            header += f"\n日期: {data['date']}"
        if data.get("participants"):
            header += f"\n参会人: {', '.join(data['participants'])}"
        docs.append(Document(text=header, metadata={**base, "field": "header"}, doc_id=f"{meeting_id}_header"))
        if data.get("summary"):
            docs.append(Document(text=data["summary"], metadata={**base, "field": "summary"}, doc_id=f"{meeting_id}_summary"))
        if data.get("action_items"):
            # Labels: 待办 = to do, 负责人 = assignee, 截止 = deadline, 优先级 = priority, 演变 = history.
            lines = []
            for item in data["action_items"]:
                status = item.get("status", "待办")
                lines.append(
                    f"- [{status}] {item.get('task', '')} "
                    f"(负责人: {item.get('assignee', '')}, 截止: {item.get('deadline', '')}, "
                    f"优先级: {item.get('priority', '')})"
                )
                history = item.get("_history", [])
                if len(history) > 1:
                    # Separate the entries so consecutive "date(status)" pairs do not run together.
                    lines.append(" 演变: " + " -> ".join(f"{h.get('date', '')}({h.get('status', '')})" for h in history))
            docs.append(Document(text="\n".join(lines), metadata={**base, "field": "action_items"}, doc_id=f"{meeting_id}_action_items"))
        if data.get("metrics"):
            # Labels: 目标 = target, 趋势 = trend.
            lines = []
            for m in data["metrics"]:
                lines.append(f"- {m.get('metric_name', '')}: {m.get('value', '')} (目标: {m.get('target', '')}, 趋势: {m.get('trend', '')})")
            docs.append(Document(text="\n".join(lines), metadata={**base, "field": "metrics"}, doc_id=f"{meeting_id}_metrics"))
        if data.get("decisions"):
            lines = [f"- {d.get('content', '')}" for d in data["decisions"]]
            docs.append(Document(text="\n".join(lines), metadata={**base, "field": "decisions"}, doc_id=f"{meeting_id}_decisions"))
        if data.get("relations"):
            # Each relation is rendered as "subject --predicate--> object".
            lines = [f"- {r.get('subject', '')} --{r.get('predicate', '')}--> {r.get('object', '')}" for r in data["relations"]]
            docs.append(Document(text="\n".join(lines), metadata={**base, "field": "relations"}, doc_id=f"{meeting_id}_relations"))
        if data.get("entities"):
            lines = [f"- [{e.get('entity_type', '')}] {e.get('name', '')}: {e.get('description', '')}" for e in data["entities"]]
            docs.append(Document(text="\n".join(lines), metadata={**base, "field": "entities"}, doc_id=f"{meeting_id}_entities"))
        return docs
    def query(self, question: str, top_k: int = 5) -> List[dict]:
        """Return the top_k most similar nodes as dicts with text, score, and metadata."""
        if not self._index:
            return []
        try:
            retriever = self._index.as_retriever(similarity_top_k=top_k)
            nodes = retriever.retrieve(question)
            results = []
            for node in nodes:
                results.append({
                    "text": node.text,
                    "score": node.score,
                    "metadata": node.metadata,
                })
            return results
        except Exception as e:
            logger.error(f"Vector index query failed: {e}")
            return []
    def query_as_context(self, question: str, top_k: int = 3) -> str:
        """Format the query results as a numbered context string for an LLM prompt."""
        results = self.query(question, top_k=top_k)
        if not results:
            return ""
        parts = []
        for i, r in enumerate(results):
            metadata = r.get("metadata", {})
            # 未知会议 = unknown meeting (fallback when the title is missing).
            parts.append(f"[{i+1}] {metadata.get('title', '未知会议')} ({metadata.get('date', '')})\n{r['text']}\n")
        return "\n".join(parts)
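    def get_stats(self) -> dict:
        """Return the number of source documents and stored nodes in the index."""
        if not self._index:
            return {"doc_count": 0, "node_count": 0}
        try:
            docstore = self._index.docstore
            nodes = docstore.docs if hasattr(docstore, "docs") else {}
            # Source documents are tracked separately from the nodes derived from them;
            # counting docstore.docs for both would report the same number twice.
            ref_docs = docstore.get_all_ref_doc_info() or {}
            return {
                "doc_count": len(ref_docs),
                "node_count": len(nodes),
            }
        except Exception:
            return {"doc_count": 0, "node_count": 0}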
meeting_vector_store = MeetingVectorStore()