LangGraph Two-Layer Nested StateGraph: Reducing Agent Coupling in Practice

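0. Series Loop (Followable Without Access to the Source)

End-to-end path: Vue frontend → api/routes/chat.py → Guide multi-turn SSE → run_analysis_pipeline (parse → analyze → match → report) → tools/pdf_exporter PDF.
This article: part 6 of 17 · Guide loop · outer and inner graphs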

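| Stage | What the user sees | Code entry point | Related articles |
|---|---|---|---|
| Create session | Welcome message | `POST /api/sessions` | 09 |
| Multi-turn dialogue | SSE streaming | `chat` → `run_guide_single_turn`; `chat/stream` → `model.astream` | 06, 14 |
| Enough information | Analysis starts | `_run_analysis_background` | 05, 07 |
| Resume parsing | Progress 30% | `run_resume_parser` | 12 |
| Profile / RIASEC | Progress 50% | `run_profile_analyzer` | 03, 13 |
| Career matching | Progress 70% | `run_career_matcher` | 02 |
| Report | Progress 90% | `run_reporter` | 11 |
| Download PDF | File | `GET …/report/pdf` | 11, 15 |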

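| | Notes |
|---|---|
| Before reading | Article 05: the outer guide_node |
| After reading | Draw the inner 5-node subgraph and explain that the API goes through run_guide_single_turn |
| Next in the loop | Article 14: per-stage prompts (next in reading order: article 7) |

Full series loop index: SERIES-LOOP.md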

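1. The Problem

iCan's top-level workflow has 5 Agent nodes (Guide → ResumeParser → ProfileAnalyzer → CareerMatcher → Reporter). If Guide's 5 internal dialogue stages (welcome, need assessment, basic information collection, deep digging, sufficiency check) were also flattened into the same StateGraph, three problems would appear:

State field bloat: Guide's collected_info and current_stage would sit in the same TypedDict as the top level's structured_profile and final_report;
Changes to Guide ripple through the whole graph: adjusting the routing logic of check_sufficiency could accidentally affect the top-level route_after_guide;
High testing cost: verifying "go back to dig_deeper when information is insufficient" would require running all four analysis stages before the behavior could be isolated.

The actual design is 5 outer nodes + an inner Guide subgraph: the outer guide_node in workflow.py only maps state, while create_guide_graph() in the inner agents/guide.py encapsulates the 5 function nodes and their conditional edges.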

2. Where It Lives: Two Layers of State + Two Layers of Graphs

| Layer | File | State type | Entry point |
|---|---|---|---|
| Outer | `workflow.py` | `iCanWorkflowState` | `create_workflow()` → `guide_node` |
| Inner | `agents/guide.py` | `GuideState` | `create_guide_graph()` → `run_guide_agent()` |

core/state.py defines two TypedDicts with separate responsibilities:

```python
import operator
from typing import Annotated, Any, TypedDict

# core/state.py: Guide inner layer
class GuideState(TypedDict, total=False):
    conversation_history: list[dict[str, str]]
    collected_info: dict[str, Any]
    is_info_sufficient: bool
    messages: Annotated[list[str], operator.add]  # reducer appends AI replies from successive nodes
    current_stage: str
    # ...

# core/state.py: top-level aggregate
class iCanWorkflowState(TypedDict, total=False):
    session_id: str
    conversation_history: list[dict[str, str]]
    needs_more_info: bool
    structured_profile: dict[str, Any]
    final_report: str
    # ...
```

The outer layer only cares about needs_more_info, conversation_history, and raw_input; only the inner layer holds current_stage, missing_fields, and emotion_state.
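To make the `Annotated[list[str], operator.add]` annotation concrete, here is a minimal, stdlib-only sketch of how a reducer-annotated field differs from a plain field when a node returns a partial update. `reducers_of` and `apply_update` are hypothetical helpers written for illustration; this is a simplified model of the merge semantics, not LangGraph's implementation.

```python
import operator
from typing import Annotated, Any, Callable, TypedDict, get_args, get_origin, get_type_hints


class GuideState(TypedDict, total=False):
    # Trimmed copy of the inner state: two fields are enough to show the difference.
    messages: Annotated[list[str], operator.add]  # reducer field: updates are appended
    current_stage: str  # plain field: updates overwrite


def reducers_of(schema: type) -> dict[str, Callable[[Any, Any], Any]]:
    """Collect the fields declared as Annotated[T, reducer]."""
    reducers = {}
    # include_extras=True keeps the Annotated metadata instead of stripping it.
    for name, hint in get_type_hints(schema, include_extras=True).items():
        if get_origin(hint) is Annotated:
            _base, *metadata = get_args(hint)
            if metadata and callable(metadata[0]):
                reducers[name] = metadata[0]
    return reducers


def apply_update(schema: type, state: dict, update: dict) -> dict:
    """Merge one node's partial update into a copy of the state."""
    reducers = reducers_of(schema)
    merged = dict(state)
    for key, value in update.items():
        if key in reducers and key in merged:
            merged[key] = reducers[key](merged[key], value)  # e.g. list + list
        else:
            merged[key] = value  # overwrite, or first write
    return merged


state = {"messages": ["Welcome!"], "current_stage": "greeting"}
state = apply_update(GuideState, state, {"messages": ["What do you do now?"], "current_stage": "assess"})
print(state)
```

This is why intermediate inner replies accumulate in `messages` while `current_stage` only ever holds the latest stage.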

(Figure: two-layer nested StateGraph)

3. The Outer `guide_node`: A Facade, Not a Guide Class

workflow.py has no GuideAgent class, only an async function, guide_node. Its job is extract → call → write back:

```python
# workflow.py: guide_node (excerpt)
async def guide_node(state: iCanWorkflowState) -> dict:
    conversation_history = list(state.get("conversation_history", []))
    raw_input = state.get("raw_input", "")
    if raw_input:
        conversation_history.append({"role": "user", "content": raw_input})

    guide_state: GuideState = create_initial_guide_state()
    guide_state["conversation_history"] = conversation_history

    guide_result = await run_guide_agent(guide_state)

    guide_messages = guide_result.get("messages", [])
    latest_reply = guide_messages[-1] if guide_messages else ""
    updated_history = list(conversation_history)
    if latest_reply:
        updated_history.append({"role": "assistant", "content": latest_reply})

    is_sufficient = guide_result.get("is_info_sufficient", False)
    result = {
        "conversation_history": updated_history,
        "current_agent": "guide",
        "needs_more_info": not is_sufficient,
    }
    collected_info = guide_result.get("collected_info", {})
    if collected_info.get("collected_raw"):
        result["raw_input"] = collected_info["collected_raw"]
    return result
```

The outer layer does not know that the inner layer has nodes named welcome, dig_deeper, and so on; it only reads is_info_sufficient and messages[-1].
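Because the facade only reads `messages[-1]`, `is_info_sufficient`, and `collected_info`, it can be tested without an LLM or the inner graph. A minimal sketch, assuming the inner runner is passed in as a parameter: the real `guide_node` calls `run_guide_agent` directly and builds its input with `create_initial_guide_state()`, whereas `guide_node_with`, `GuideRunner`, and `fake_guide` are hypothetical names for illustration.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical: the inner Guide runner is injected so a test can stub it.
GuideRunner = Callable[[dict], Awaitable[dict]]


async def guide_node_with(state: dict, run_guide: GuideRunner) -> dict:
    """Same extract -> call -> write-back mapping as guide_node, minus logging."""
    history = list(state.get("conversation_history", []))
    raw_input = state.get("raw_input", "")
    if raw_input:
        history.append({"role": "user", "content": raw_input})

    guide_result = await run_guide({"conversation_history": history, "current_stage": "greeting"})

    # Only the last inner message becomes the outer assistant turn.
    messages = guide_result.get("messages", [])
    latest_reply = messages[-1] if messages else ""
    updated_history = list(history)
    if latest_reply:
        updated_history.append({"role": "assistant", "content": latest_reply})

    result = {
        "conversation_history": updated_history,
        "current_agent": "guide",
        "needs_more_info": not guide_result.get("is_info_sufficient", False),
    }
    raw_collected = guide_result.get("collected_info", {}).get("collected_raw")
    if raw_collected:
        result["raw_input"] = raw_collected
    return result


async def fake_guide(guide_state: dict) -> dict:
    # Stub inner graph: two inner replies, information not yet sufficient.
    return {"messages": ["Welcome!", "Tell me about your work."], "is_info_sufficient": False}


out = asyncio.run(guide_node_with({"raw_input": "I want to switch careers"}, fake_guide))
print(out["needs_more_info"], out["conversation_history"][-1]["content"])
```

The stub never touches welcome or dig_deeper by name, which is exactly the decoupling the facade is meant to provide.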

The outer loop is controlled by route_after_guide:

```python
# workflow.py: route_after_guide (excerpt)
def route_after_guide(state: iCanWorkflowState) -> str:
    if not state.get("needs_more_info", True):
        return "resume_parser_node"
    user_msg_count = len([m for m in state.get("conversation_history", [])
                          if m.get("role") == "user"])
    if user_msg_count >= 3:
        return "resume_parser_node"  # force entry into analysis
    return "guide_node"
```
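Since the router is a pure function of state, its three exits can be checked with plain dicts. The sketch repeats the excerpt's logic with the forced-exit threshold pulled out as a `max_user_turns` parameter; that parameter and the `user_turns` helper are assumptions for illustration (the source hard-codes 3).

```python
def route_after_guide(state: dict, max_user_turns: int = 3) -> str:
    # Exit 1: Guide reported enough information.
    if not state.get("needs_more_info", True):
        return "resume_parser_node"
    # Exit 2: force analysis once the user has spoken max_user_turns times.
    user_msg_count = sum(1 for m in state.get("conversation_history", []) if m.get("role") == "user")
    if user_msg_count >= max_user_turns:
        return "resume_parser_node"
    # Otherwise loop back to Guide.
    return "guide_node"


def user_turns(n: int) -> list[dict[str, str]]:
    """Build a history of n user/assistant pairs."""
    history = []
    for i in range(n):
        history.append({"role": "user", "content": f"answer {i}"})
        history.append({"role": "assistant", "content": f"question {i}"})
    return history


print(route_after_guide({"needs_more_info": False}))
print(route_after_guide({"needs_more_info": True, "conversation_history": user_turns(1)}))
print(route_after_guide({"needs_more_info": True, "conversation_history": user_turns(3)}))
```

Note the default: a state with no `needs_more_info` key counts as still needing information, so an empty state routes back to `guide_node`.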

4. The Inner `create_guide_graph()`: Five Nodes + a Conditional Loop

The inner graph is built in agents/guide.py; all of its nodes are async functions, not class methods:

```python
from langgraph.graph import END, StateGraph

# agents/guide.py: create_guide_graph (excerpt)
def create_guide_graph() -> StateGraph:
    graph = StateGraph(GuideState)
    graph.add_node("welcome", welcome)
    graph.add_node("assess_need", assess_need)
    graph.add_node("collect_basic_info", collect_basic_info)
    graph.add_node("dig_deeper", dig_deeper)
    graph.add_node("check_sufficiency", check_sufficiency)

    graph.set_entry_point("welcome")
    graph.add_edge("welcome", "assess_need")
    graph.add_edge("assess_need", "collect_basic_info")
    graph.add_edge("collect_basic_info", "dig_deeper")
    graph.add_edge("dig_deeper", "check_sufficiency")
    graph.add_conditional_edges(
        "check_sufficiency",
        should_continue,
        {"dig_deeper": "dig_deeper", "handoff": END},
    )
    return graph.compile()
```
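To see the control flow that `create_guide_graph()` wires up, here is a stdlib-only toy driver that walks the same five node names, follows the fixed edges, and consults a router after `check_sufficiency`, with a step cap playing the role of `recursion_limit`. It illustrates the topology under simplifying assumptions (synchronous nodes, naive dict-merge updates, a made-up `turns` counter); it is not how LangGraph executes graphs, and its step counting need not match LangGraph's exactly.

```python
from typing import Callable

END = "__end__"
Node = Callable[[dict], dict]

# Fixed edges, mirroring the add_edge calls.
EDGES = {
    "welcome": "assess_need",
    "assess_need": "collect_basic_info",
    "collect_basic_info": "dig_deeper",
    "dig_deeper": "check_sufficiency",
}
# Conditional edge after check_sufficiency: router label -> next node.
PATH_MAP = {"dig_deeper": "dig_deeper", "handoff": END}


def should_continue(state: dict) -> str:
    return "handoff" if state.get("is_info_sufficient") else "dig_deeper"


def run_toy_graph(state: dict, nodes: dict[str, Node], entry: str = "welcome",
                  recursion_limit: int = 15) -> tuple[dict, list[str]]:
    node, trace = entry, []
    while node != END:
        if len(trace) >= recursion_limit:
            raise RecursionError(f"step limit {recursion_limit} reached before {node!r}")
        state = {**state, **nodes[node](state)}  # naive merge: updates overwrite
        trace.append(node)
        node = PATH_MAP[should_continue(state)] if node == "check_sufficiency" else EDGES[node]
    return state, trace


# Hypothetical node bodies: dig_deeper counts turns, check_sufficiency wants two of them.
nodes = {
    "welcome": lambda s: {"current_stage": "greeting"},
    "assess_need": lambda s: {"current_stage": "assess"},
    "collect_basic_info": lambda s: {"current_stage": "basic"},
    "dig_deeper": lambda s: {"turns": s.get("turns", 0) + 1},
    "check_sufficiency": lambda s: {"is_info_sufficient": s["turns"] >= 2},
}
final, trace = run_toy_graph({}, nodes)
print(trace)
```

In this toy, a run that never becomes sufficient gets five extra dig_deeper/check_sufficiency rounds after the five-node first pass before the cap trips, so the step cap acts as a backstop while the router's own exit conditions should normally end the loop first.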

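run_guide_agent calls create_guide_graph() and then ainvoke on every call, with recursion_limit=15:

```python
async def run_guide_agent(state: GuideState) -> dict:
    graph = create_guide_graph()  # rebuilt and recompiled on every call (see pitfall ②)
    result = await graph.ainvoke(state, config={"recursion_limit": 15})
    return result
```

The inner loop is decided by should_continue: is_info_sufficient=True → handoff (END); otherwise go back to dig_deeper. In addition, once loop_count >= 8 it forces a handoff (the loop count is estimated from the length of the messages list).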
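A minimal sketch of that routing rule. The source only says that `loop_count` is estimated from the length of `messages`; using `len(state["messages"])` directly is an assumption here, and `MAX_LOOP_COUNT` is a hypothetical name for the threshold 8.

```python
MAX_LOOP_COUNT = 8  # hypothetical name; the source uses the literal 8


def should_continue(state: dict) -> str:
    """Route after check_sufficiency: 'handoff' maps to END, 'dig_deeper' loops back."""
    if state.get("is_info_sufficient", False):
        return "handoff"
    # Assumption: the messages list length approximates how many steps have run.
    loop_count = len(state.get("messages", []))
    if loop_count >= MAX_LOOP_COUNT:
        return "handoff"  # forced handoff even though information is still insufficient
    return "dig_deeper"
```

A router only picks the next node and does not write state, so a forced handoff returns with `is_info_sufficient` still False; the outer `guide_node` then reports `needs_more_info=True` and the decision passes to the outer `route_after_guide`.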

Every inner node calls the LLM through get_chat_model() + invoke_llm() (see article 8); on an exception it returns a fixed fallback reply instead of retrying the model.
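The "fixed reply on failure, no retry" policy can be sketched as a small wrapper. `safe_reply`, `FALLBACK_REPLY`, and the stub models are hypothetical stand-ins (the real nodes go through `get_chat_model()` + `invoke_llm()`); the 60-second default mirrors the timeout mentioned in the appendix comment on `invoke_llm`, and `asyncio.wait_for` is just one way to impose it.

```python
import asyncio
from typing import Awaitable, Callable

FALLBACK_REPLY = "Sorry, something went wrong on my side. Could you tell me a bit more about your current work?"


async def safe_reply(call_model: Callable[[list[dict]], Awaitable[str]],
                     messages: list[dict], timeout_s: float = 60.0) -> str:
    """Make one LLM attempt; on any error or timeout return a fixed reply instead of retrying."""
    try:
        return await asyncio.wait_for(call_model(messages), timeout=timeout_s)
    except Exception:  # asyncio.TimeoutError is also an Exception subclass
        return FALLBACK_REPLY


async def broken_model(messages: list[dict]) -> str:
    raise ConnectionError("LLM endpoint unreachable")


async def slow_model(messages: list[dict]) -> str:
    await asyncio.sleep(1)
    return "too late"


async def ok_model(messages: list[dict]) -> str:
    return "Hello! What kind of work do you do?"


print(asyncio.run(safe_reply(broken_model, [])))
print(asyncio.run(safe_reply(slow_model, [], timeout_s=0.05)))
print(asyncio.run(safe_reply(ok_model, [])))
```

Skipping retries keeps a turn's latency bounded by a single timeout, at the cost of occasionally showing the user a generic reply.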

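5. How It Differs from the API Path: Not Every Entry Point Uses the Subgraph

This is the key to understanding the nested architecture: by default, the HTTP chat API does not go through the inner 5-node graph.

| Entry point | Call chain | Uses `create_guide_graph`? |
|---|---|---|
| Top-level `run_workflow()` | `guide_node` → `run_guide_agent` | Yes |
| `POST /api/sessions/.../chat` | `run_guide_chat` → `run_guide_single_turn` | No (single LLM turn) |
| `POST .../chat/stream` | `model.astream` directly + keyword-based sufficiency check | No |

run_guide_chat in workflow.py explicitly uses single-turn mode:

```python
async def run_guide_chat(conversation_history: list, user_message: str) -> dict:
    from ican.agents.guide import run_guide_single_turn
    result = await run_guide_single_turn(conversation_history, user_message)
    # updates history and returns reply / is_info_sufficient (omitted here)
```

So the nested subgraph serves the batch-style top-level workflow; live turn-by-turn chat uses run_guide_single_turn or SSE streaming, and that logic differs from the inner check_sufficiency (where an LLM judges sufficient/insufficient).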
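The single-turn path judges sufficiency with a keyword heuristic rather than an LLM (the table above notes a keyword check on the streaming path as well). Below, the heuristic from `run_guide_single_turn` in the appendix is pulled out into a pure function so it can be unit-tested. The name `keyword_sufficiency` is hypothetical; the keyword list and thresholds are copied from the excerpt, and the keywords stay in Chinese because they are matched against Chinese user input.

```python
# Keyword list and thresholds copied from run_guide_single_turn (matched against Chinese input).
OPTIONAL_KEYWORDS = [
    "年", "行业", "岗位", "职位", "技能", "经验", "公司", "专业", "学历", "方向", "期望", "困惑",
    "工作", "开发", "工程师", "经理", "运营", "产品", "设计", "数据", "架构", "管理",
    "经历", "项目", "负责", "参与", "大学", "本科", "硕士", "博士",
]


def keyword_sufficiency(conversation_history: list[dict], user_message: str) -> tuple[bool, list[str]]:
    """Return (is_sufficient, found_keywords) using plain substring matching."""
    # Concatenate the new message with every earlier *user* turn; assistant turns are ignored.
    all_user_text = user_message
    for msg in conversation_history:
        if msg.get("role") == "user":
            all_user_text += " " + msg.get("content", "")
    found = [kw for kw in OPTIONAL_KEYWORDS if kw in all_user_text]
    # Sufficient with 6+ keywords, or with 4+ keywords in at least 50 characters of user text.
    sufficient = len(found) >= 6 or (len(found) >= 4 and len(all_user_text) >= 50)
    return sufficient, found


print(keyword_sufficiency([], "你好"))
print(keyword_sufficiency([], "做产品经理，管理项目"))
print(keyword_sufficiency([], "本科毕业后做了3年后端开发工程师，负责数据架构"))
```

Substring matching is cheap and deterministic, but single-character keywords such as 年 also match inside unrelated words, which is one concrete way this path can disagree with the LLM-based check_sufficiency.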

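6. Position in the Pipeline

Full set of top-level edges (create_workflow):

```text
guide_node → route_after_guide
    ├─ needs_more_info → guide_node (loop)
    └─ sufficient / forced exit → resume_parser_node → profile_analyzer_node
        → career_matcher_node → reporter_node → END
```

A single inner ainvoke runs welcome → … → check_sufficiency in order, looping between dig_deeper and check_sufficiency when needed. Each time the outer guide_node is scheduled, it calls create_initial_guide_state() and starts again from welcome. In the "no live user input, run the whole workflow in one go" scenario this regenerates the welcome message on every outer loop; that is a design trade-off, not a limitation of the LangGraph framework.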
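The repeated-welcome effect is easy to reproduce with stubs. In this sketch the hypothetical `fresh_inner_run` stands in for `create_initial_guide_state()` + `run_guide_agent()`: every call walks the full inner path from welcome and, in this stub, never reports sufficiency, so the outer loop continues until a three-user-turn limit like the one in `route_after_guide` forces the exit. Feeding one fixed user input per outer call is an assumption to keep the example deterministic.

```python
INNER_PATH = ["welcome", "assess_need", "collect_basic_info", "dig_deeper", "check_sufficiency"]


def fresh_inner_run(history: list[dict]) -> tuple[list[str], dict]:
    """Stub for one inner ainvoke: a fresh inner state, so the walk always starts at welcome."""
    trace = list(INNER_PATH)
    return trace, {"messages": ["Welcome!", "Tell me more."], "is_info_sufficient": False}


def run_outer(user_inputs: list[str], max_user_turns: int = 3) -> tuple[list[dict], list[str]]:
    history: list[dict] = []
    node_calls: list[str] = []
    for raw_input in user_inputs:
        history.append({"role": "user", "content": raw_input})
        trace, result = fresh_inner_run(history)
        node_calls.extend(trace)
        # Like guide_node, keep only the last inner message.
        history.append({"role": "assistant", "content": result["messages"][-1]})
        if result["is_info_sufficient"]:
            break
        if sum(m["role"] == "user" for m in history) >= max_user_turns:
            break  # outer forced exit
    return history, node_calls


history, node_calls = run_outer(["a", "b", "c", "d"])
print(node_calls.count("welcome"), len(history))
```

The welcome node runs once per outer loop, yet its text never reaches the outer history because only `messages[-1]` is mapped, so in batch runs the repeated welcome mostly costs extra LLM calls.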

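The other four Agents (resume_parser, profile_analyzer, and so on) follow the same pattern as Guide, an outer node function plus an inner run_* subgraph or pipeline, though their inner node counts differ; at the top level there is exactly one create_workflow(), in workflow.py.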

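7. Pitfalls

① The comment says "loop at most 2 times"; the code does not use 2
The should_continue comment says "loop at most 2 times", but the code actually uses loop_count >= 8, and the outer route_after_guide forces analysis once user_msg_count >= 3. When writing docs or changing requirements, trust what grep finds in the code, not the docstring.

② run_guide_agent recompiles the graph every time
create_guide_graph() runs graph.compile() on every run_guide_agent call, with no module-level cache. If Guide is called frequently, consider caching the compiled graph; the current MVP does not.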
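One way to add that cache is to memoize the factory with `functools.lru_cache(maxsize=1)`, so the graph is compiled once per process. The sketch uses a hypothetical `FakeCompiledGraph` with a compile counter instead of LangGraph, because the point is only the caching pattern; `get_guide_graph` is a hypothetical name, and this `run_guide_agent` is a rewritten stand-in, not the source's. It assumes the compiled graph holds no per-call state, which should be verified before sharing one instance across requests.

```python
import asyncio
import functools
from typing import Optional

COMPILE_CALLS = 0


class FakeCompiledGraph:
    """Stand-in for a compiled graph: ainvoke just echoes the state plus the config it saw."""

    async def ainvoke(self, state: dict, config: Optional[dict] = None) -> dict:
        return {**state, "recursion_limit_seen": (config or {}).get("recursion_limit")}


@functools.lru_cache(maxsize=1)
def get_guide_graph() -> FakeCompiledGraph:
    """Build and compile once per process; later calls return the cached object."""
    global COMPILE_CALLS
    COMPILE_CALLS += 1  # counts real compilations
    return FakeCompiledGraph()


async def run_guide_agent(state: dict) -> dict:
    graph = get_guide_graph()  # cached, instead of create_guide_graph() on every call
    return await graph.ainvoke(state, config={"recursion_limit": 15})


async def main() -> None:
    for _ in range(3):
        await run_guide_agent({"conversation_history": []})


asyncio.run(main())
print(COMPILE_CALLS)
```

`lru_cache` on a zero-argument function is effectively a lazy singleton; `get_guide_graph.cache_clear()` resets it, which can be handy in tests that patch node functions.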

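③ The outer guide_node always starts from welcome
create_initial_guide_state() hard-codes current_stage="greeting", and the inner entry point is always welcome. If the outer route_after_guide returns to guide_node several times, the welcome node runs again each time; watch for this when batch-running the workflow. The live API uses run_guide_single_turn and is not affected.

④ Two tracks: the messages reducer and conversation_history
Inner AI replies go into GuideState.messages (Annotated with add); the outer layer persists conversation_history (a list of role/content dicts). guide_node maps only messages[-1] into the history, so the multiple messages produced by intermediate nodes do not all reach the outer layer.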
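The sketch below contrasts the current `messages[-1]` mapping with a hypothetical `map_all_messages` alternative, just to make the two-track difference visible. The alternative is not something the source does; both function names are illustrative.

```python
def map_last_message(history: list[dict], inner_messages: list[str]) -> list[dict]:
    """What guide_node does: only the last inner reply becomes an assistant turn."""
    updated = list(history)
    latest = inner_messages[-1] if inner_messages else ""
    if latest:
        updated.append({"role": "assistant", "content": latest})
    return updated


def map_all_messages(history: list[dict], inner_messages: list[str]) -> list[dict]:
    """Hypothetical alternative: keep every non-empty inner reply, in order."""
    return list(history) + [{"role": "assistant", "content": m} for m in inner_messages if m]


history = [{"role": "user", "content": "I'm a backend developer"}]
inner = ["Welcome!", "Which industry are you in?", "What would you like to change?"]
print(len(map_last_message(history, inner)), len(map_all_messages(history, inner)))
```

The trade-off of mapping everything is that several consecutive assistant turns, including a fresh welcome on every outer loop, would flow back into the next inner run's history.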

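8. Summary

Nested structure: **outer iCanWorkflowState + guide_node, inner GuideState + create_guide_graph()**, implemented as function nodes rather than Agent classes.
The outer facade only maps fields; the inner 5 nodes + should_continue handle the dialogue stages and the dig_deeper loop.
API chat goes through run_guide_single_turn and bypasses the inner subgraph; the subgraph is mainly used on the run_workflow / guide_node path.
Each layer has its own exit conditions (inner: loop_count / LLM sufficiency; outer: needs_more_info / user turn count), so when debugging, first work out which layer is looping.
Before changing Guide's behavior, confirm whether you are changing a subgraph node or the single-turn/streaming API path.

Next: LangGraph error handling and fault tolerance (the except blocks in each workflow.py node, and degradation in run_analysis_pipeline).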

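Appendix: Key Source Code (Annotated)

The code below is excerpted from the iCan implementation and annotated with comments, so you can follow along without access to the repository.
Generated with: python3 bin/build-ican-annotated-snippets.py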

create_guide_graph

```python
# ========== create_guide_graph ==========
# Source file: agents/guide.py, lines 375-431

# L375: synchronous factory function that builds and compiles the Guide graph
def create_guide_graph() -> StateGraph:
    # L377: [doc] Create the StateGraph for the dialogue-guide Agent.
    # L379: [doc] Description:
    # L380: [doc] Builds the LangGraph workflow graph for the dialogue-guide Agent, defining nodes and edges.
    # L381: [doc] The workflow runs in this order:
    # L382: [doc] welcome -> assess_need -> collect_basic_info -> dig_deeper -> check_sufficiency
    # L383: [doc] check_sufficiency routes conditionally:
    # L384: [doc] - information insufficient -> dig_deeper (continue the conversation loop)
    # L385: [doc] - information sufficient -> END (flow ends)
    # L387: [doc] Parameters:
    # L388: [doc] None
    # L390: [doc] Returns:
    # L391: [doc] StateGraph: a compiled LangGraph StateGraph instance that can be invoked directly.
    # (L376-392 are the function docstring, converted to comments for readability)
    # L393: start of the try block; the except below logs and re-raises
    try:
        # L394: log for troubleshooting in production
        logger.info("[create_guide_graph] 开始创建对话引导 Agent 的 StateGraph")

        # L396: create the state graph; the TypedDict defines the fields shared between nodes
        graph = StateGraph(GuideState)

        # L398-403: register the five nodes, each backed by an async node function
        graph.add_node("welcome", welcome)
        graph.add_node("assess_need", assess_need)
        graph.add_node("collect_basic_info", collect_basic_info)
        graph.add_node("dig_deeper", dig_deeper)
        graph.add_node("check_sufficiency", check_sufficiency)

        # L405-406: set the entry point, the first node executed on ainvoke
        graph.set_entry_point("welcome")

        # L408-412: unconditional edges; each node always hands off to the next one
        graph.add_edge("welcome", "assess_need")
        graph.add_edge("assess_need", "collect_basic_info")
        graph.add_edge("collect_basic_info", "dig_deeper")
        graph.add_edge("dig_deeper", "check_sufficiency")

        # L414-422: conditional edge after check_sufficiency; the router's return value
        # ("dig_deeper" or "handoff") is looked up in this mapping to pick the next node
        graph.add_conditional_edges(
            "check_sufficiency",
            should_continue,
            {
                "dig_deeper": "dig_deeper",
                "handoff": END,
            },
        )

        # L424: compile the StateGraph into a Runnable that supports ainvoke
        compiled_graph = graph.compile()
        # L425: log for troubleshooting in production
        logger.info("[create_guide_graph] StateGraph 创建并编译完成")
        # L426: return the compiled graph (a compiled Runnable, despite the StateGraph annotation)
        return compiled_graph

    # L428: catch the exception so it can be logged before propagating
    except Exception as e:
        # L429: log the error with its traceback
        logger.error("[create_guide_graph] 创建 StateGraph 异常: %s", e, exc_info=True)
        # L430: re-raise to the caller
        raise
```

Outer guide_node facade

```python
# ========== Outer guide_node facade ==========
# Source file: workflow.py, lines 35-105

# L35: async function guide_node; awaitable, suited to I/O-bound LLM/DB calls
async def guide_node(state: iCanWorkflowState) -> dict:
    # L37: [doc] Dialogue-guide node: calls GuideAgent to collect information over multiple turns
    # L39: [doc] Description:
    # L40: [doc] Passes the conversation history and user information from the top-level workflow state to GuideAgent,
    # L41: [doc] calls run_guide_agent to run the multi-turn guided dialogue and collect the user's basic information,
    # L42: [doc] career confusion, expectations, and other key information; updates the workflow state from the result.
    # L44: [doc] Parameters:
    # L45: [doc] state (iCanWorkflowState): top-level workflow state, including conversation_history, raw_input, etc.
    # L47: [doc] Returns:
    # L48: [doc] dict: state update containing conversation_history, current_agent, needs_more_info, etc.
    # (L36-49 are the function docstring, converted to comments for readability)
    # L50: start of the try block
    try:
        # L51-56: log the inputs for troubleshooting
        logger.info(
            "[guide_node] 开始执行，入参: session_id=%s, conversation_history长度=%d, raw_input长度=%d",
            state.get("session_id"),
            len(state.get("conversation_history", [])),
            len(state.get("raw_input", "")),
        )

        # L58-60: build GuideAgent's input; copy the history so the outer state is not mutated
        conversation_history = list(state.get("conversation_history", []))
        raw_input = state.get("raw_input", "")

        # L62-64: append the user's latest message to the history
        if raw_input:
            conversation_history.append({"role": "user", "content": raw_input})

        # L66-67: fresh inner state seeded with the history
        guide_state: GuideState = create_initial_guide_state()
        guide_state["conversation_history"] = conversation_history

        # L69-70: run the full inner Guide subgraph (CLI / top-level guide_node path)
        guide_result = await run_guide_agent(guide_state)

        # L72-74: take GuideAgent's latest reply (the last entry of the messages reducer)
        guide_messages = guide_result.get("messages", [])
        latest_reply = guide_messages[-1] if guide_messages else ""

        # L76-79: update the conversation history with the AI reply
        updated_history = list(conversation_history)
        if latest_reply:
            updated_history.append({"role": "assistant", "content": latest_reply})

        # L81-82: whether Guide judged the user's information sufficient to start analysis
        is_sufficient = guide_result.get("is_info_sufficient", False)

        # L84-85: information collected by Guide
        collected_info = guide_result.get("collected_info", {})

        # L87-91: state update; needs_more_info=False lets the outer router move on to resume_parser
        result = {
            "conversation_history": updated_history,
            "current_agent": "guide",
            "needs_more_info": not is_sufficient,
        }

        # L93-97: store the collected raw text for the downstream ResumeParser
        if collected_info:
            raw_collected = collected_info.get("collected_raw", "")
            if raw_collected:
                result["raw_input"] = raw_collected

        # L99-104: log the outputs for troubleshooting
        logger.info(
            "[guide_node] 执行完成，出参: is_sufficient=%s, needs_more_info=%s, conversation_history长度=%d",
            is_sufficient,
            not is_sufficient,
            len(updated_history),
        )
        # L105: return the fields LangGraph merges into the outer state
        return result
    # (the try block's handler comes after L105 and is outside this excerpt)
```

run_guide_single_turn (the path the API actually uses)

```python
# ========== run_guide_single_turn (the path the API actually uses) ==========
# Source file: agents/guide.py, lines 465-520

# L465: async function run_guide_single_turn; awaitable, suited to I/O-bound LLM calls
async def run_guide_single_turn(conversation_history: list, user_message: str) -> dict:
    # L467: [doc] Single-turn mode: calls the LLM directly for one round of guidance, without the StateGraph loop.
    # L469: [doc] Description:
    # L470: [doc] Generates a reply from the existing conversation history and the user's new message.
    # L471: [doc] Does not use the internal StateGraph loop; suited to turn-by-turn interaction with the user.
    # L473: [doc] Parameters:
    # L474: [doc] conversation_history (list): existing conversation history
    # L475: [doc] user_message (str): the user's latest message
    # L477: [doc] Returns:
    # L478: [doc] dict: reply (AI reply), is_info_sufficient (whether the information is sufficient), collected_info (collected information)
    # (L466-479 are the function docstring, converted to comments for readability)
    # L480: start of the try block; the except below is the safety net
    try:
        # L481: single-turn API Guide: one LLM reply, no inner 5-node subgraph
        logger.info("[run_guide_single_turn] 开始执行，用户消息长度=%d, 历史长度=%d", len(user_message), len(conversation_history))

        # L483-488: build the chat messages: system prompt + history + new user message
        messages = [
            {"role": "system", "content": GUIDE_SYSTEM_PROMPT},
        ]
        for msg in conversation_history:
            messages.append(msg)
        messages.append({"role": "user", "content": user_message})

        # L490: get the chat model instance (configured by settings.LLM_MODEL_CHAT)
        model = get_chat_model()
        # L491: call the LLM and get plain text back (60 s timeout, Qwen3 /no_think injection)
        reply = await invoke_llm(model, messages)

        # L493-496: concatenate all user text (new message + user turns from the history)
        all_user_text = user_message
        for msg in conversation_history:
            if msg.get("role") == "user":
                all_user_text += " " + msg.get("content", "")

        # L498-501: keyword heuristic; keywords are Chinese because user input is Chinese
        optional_keywords = ["年", "行业", "岗位", "职位", "技能", "经验", "公司", "专业", "学历", "方向", "期望", "困惑",
                             "工作", "开发", "工程师", "经理", "运营", "产品", "设计", "数据", "架构", "管理",
                             "经历", "项目", "负责", "参与", "大学", "本科", "硕士", "博士"]
        found_keywords = [kw for kw in optional_keywords if kw in all_user_text]

        # L503-506: sufficient with >= 6 keywords, or >= 4 keywords and >= 50 characters of user text
        is_sufficient = (
            (len(found_keywords) >= 6) or
            (len(found_keywords) >= 4 and len(all_user_text) >= 50)
        )

        # L508: the collected "raw" information is simply the concatenated user text
        collected_info = {"collected_raw": all_user_text}

        # L510: log the result
        logger.info("[run_guide_single_turn] 执行完成，is_sufficient=%s, found_keywords=%s", is_sufficient, found_keywords)

        # L512-516: return the result to the API caller (a plain function, not a graph node)
        return {
            "reply": reply or "",
            "is_info_sufficient": is_sufficient,
            "collected_info": collected_info,
        }

    # L518: catch exceptions so one failed turn does not crash the request
    except Exception as e:
        # L519: log the error with its traceback
        logger.error("[run_guide_single_turn] 单轮对话异常: %s", e, exc_info=True)
        # L520: return a fallback result; the excerpt ends here, so the dict's contents are not shown
        return {
            # ... (truncated in the source excerpt)
        }
```

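Series Navigation

| Part | Topic |
|---|---|
| 1 | System overview |
| 2 | Five-Agent collaboration |
| 3 | Holland RIASEC |
| 4–7 | State · Routing · Nesting · Fault tolerance |
| 8–11 | LLM layer · SSE/WS · DB migrations · PDF |
| 12–14 | JSON prompt · RIASEC prompt · Guide prompt |
| 15–17 | Docker · Middleware · Configuration |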
