Prompt Design for AI Agents: Getting an LLM to Output Structured JSON Reliably

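0. Series loop (readable without access to the source code)

End-to-end chain: Vue frontend → api/routes/chat.py → Guide multi-turn SSE → run_analysis_pipeline (parse → analyze → match → report) → tools/pdf_exporter PDF.
This article: part 12 of 17 · structured-output loop · JSON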

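Stage	User-visible	Code entry point	Covered in part
Create session	Welcome message	POST /api/sessions	09
Multi-turn dialogue	SSE streaming	chat/stream → run_guide_single_turn	06, 14
Enough information	Analysis starts	_run_analysis_background	05, 07
Resume parsing	Progress 30%	run_resume_parser	12
Profile/RIASEC	Progress 50%	run_profile_analyzer	03, 13
Career matching	Progress 70%	run_career_matcher	02
Report	Progress 90%	run_reporter	11
Download PDF	File	GET …/report/pdf	11, 15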

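	Notes
Before reading	Part 08: invoke_llm_with_json
After reading	Walk through the four strategies of parse_json_from_text by hand
Next	Parts 03 and 13: business JSON schema (covered in part 13)

Full series loop index: SERIES-LOOP.md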

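1. The problem to solve

In the iCan main flow, resume_parser_node has to turn the natural-language work history collected during the Guide stage into a structured_profile for profile_analyzer_node to consume. The input is unstructured text; the output must be a dict with a fixed schema.

Failure modes commonly seen during integration testing:

The model wraps the JSON in a ```json code block, or appends it after explanatory text;
A local Ollama model does not support response_format={"type": "json_object"}, so the JSON-mode call raises an error;
The JSON has small syntax flaws (trailing commas, single quotes), and json.loads fails outright;
Both LLM attempts return an empty dict, and the whole parsing chain breaks.

iCan's strategy is prompt-level schema constraints + JSON mode at the call layer + four-strategy text extraction + a regex fallback, rather than hoping the model gets it perfect in one shot.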
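The failure modes listed above are easy to reproduce with the standard library alone. The following is a minimal sketch; the sample replies are made up for illustration and are not taken from iCan logs.

```python
import json

# Built at runtime so this example can itself sit inside a code fence.
FENCE = "`" * 3

# Hypothetical model replies, one per failure mode described above.
samples = {
    "fenced": FENCE + 'json\n{"name": "Li"}\n' + FENCE,
    "prose_prefix": 'Sure, here is the result: {"name": "Li"}',
    "trailing_comma": '{"name": "Li",}',
    "single_quotes": "{'name': 'Li'}",
}


def strict_parse_ok(text: str) -> bool:
    """Return True only if json.loads accepts the text unchanged."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


results = {label: strict_parse_ok(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'accepted' if ok else 'rejected by json.loads'}")
```

Every sample is rejected by a strict json.loads, which is why a single parsing step is not enough.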

2. Where it is implemented

Module	Responsibility
`llm/prompts.py`	`RESUME_PARSER_SYSTEM_PROMPT`: full JSON example + field rules
`llm/providers.py`	`invoke_llm_with_json`: tries `response_format` first, degrades on failure
`llm/parsers.py`	`parse_json_from_text` four-strategy extraction; `validate_structured_profile` validation
`agents/resume_parser.py`	Builds messages, selects `get_light_model()`, retries, and falls back to `_regex_extract_profile`

Subgraph order (create_resume_parser_graph): load_input → extract_information → build_profile → validate_profile.

Figure: four-layer JSON fallback parsing

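3. Prompt design: the ResumeParser schema contract

The prompt is defined as RESUME_PARSER_SYSTEM_PROMPT in llm/prompts.py. The core is not a one-line "please output JSON" but four things spelled out together:

Full example: every field of basic_info, work_experience, skill_set, certifications, career_progression, and parsing_confidence is shown;
Missing-value policy: "use null if not mentioned; do not make things up";
Inference labeling: parsing_confidence.inferred_fields lists the fields that were inferred;
Chinese and formatting: skills are split into technical/soft, and multi-turn dialogue must be merged and deduplicated.

The prompt embeds a complete sample inside a ```json block, which lines up with the strategy 1 regex in llm/parsers.py, r"```json\s*([\s\S]*?)\s*```": if the model follows the prompt and outputs a code block, the parser's first strategy matches.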
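To see how the strategy 1 pattern lines up with a fenced reply, here is a small sketch. The reply text is a made-up example; only the regular expression is taken from the article.

```python
import json
import re

# Built at runtime so this example can itself sit inside a code fence.
FENCE = "`" * 3

# Same pattern as strategy 1: lazily capture everything between the
# json-tagged opening fence and the next closing fence.
JSON_BLOCK = re.compile(FENCE + r"json\s*([\s\S]*?)\s*" + FENCE)

# Hypothetical model reply that follows the prompt's fenced example.
reply = (
    "Here is the profile:\n"
    + FENCE + 'json\n{"basic_info": {"education": null}}\n' + FENCE
    + "\nLet me know if anything is missing."
)

match = JSON_BLOCK.search(reply)
profile = json.loads(match.group(1)) if match else {}
print(profile)
```

The lazy quantifier matters here: it stops at the first closing fence, so trailing prose after the block does not leak into the captured JSON.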

extract_information in agents/resume_parser.py combines the system prompt and the user's original text into messages:

messages = [
    {"role": "system", "content": RESUME_PARSER_SYSTEM_PROMPT},
    {"role": "user", "content": f"请从以下文本中提取结构化个人信息：\n\n{document_content}"},
]
model = get_light_model()
parsed_data = await invoke_llm_with_json(model, messages)

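Model choice: resume parsing uses get_light_model() (code default LLM_MODEL_LIGHT=gpt-4o-mini), not the chat model. In .env this is often changed to DeepSeek or to Ollama qwen3.5:9b running in Docker. Swapping the model does not affect the prompt or schema, but it does affect JSON-mode compatibility (see the pitfalls section).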

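4. Call layer: the two channels of `invoke_llm_with_json`

In llm/providers.py a JSON call is not a bare ainvoke but a three-step progression: JSON mode, plain text mode, then text extraction: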

# Channel 1: JSON mode. Channel 2: plain text mode if JSON mode is rejected.
try:
    json_model = model.bind(response_format={"type": "json_object"})
    response = await asyncio.wait_for(json_model.ainvoke(processed, **kwargs), timeout=60)
    raw_content = response.content
except asyncio.TimeoutError:
    # A timeout is re-raised rather than retried in text mode (see the appendix).
    raise TimeoutError("AI 模型响应超时，请稍后重试")
except Exception as bind_err:
    logger.warning("response_format JSON 模式不支持，回退到文本模式: %s", bind_err)
    response = await asyncio.wait_for(model.ainvoke(processed, **kwargs), timeout=60)
    raw_content = response.content

# Both channels end here: strict parse first, then the text extractor.
try:
    result = json.loads(raw_content)
except (json.JSONDecodeError, TypeError):
    result = parse_json_from_text(raw_content)  # llm/parsers.py
    if not result:
        raise ValueError(f"无法从 LLM 回复中提取有效 JSON，原始内容: {raw_content[:300]}")

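The flow can be summarized as:

bind(json_object) + ainvoke
        ↓ call raises (JSON mode unsupported); timeouts are re-raised instead
plain ainvoke
        ↓ whichever channel returned content
json.loads(content)
        ↓ fails
parse_json_from_text (four strategies)
        ↓ empty dict
ValueError / retry upstream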
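The flow above can be exercised without any real LLM. The sketch below uses a hypothetical FakeModel that only mimics bind() and ainvoke(), and a simplified extract_braces in place of parse_json_from_text; none of these names come from iCan, and the real function also handles logging and /no_think injection.

```python
import asyncio
import json


class FakeReply:
    def __init__(self, content: str):
        self.content = content


class FakeModel:
    """Hypothetical stand-in for a chat model; only mimics bind() and ainvoke()."""

    def __init__(self, reply: str, supports_json_mode: bool, json_mode: bool = False):
        self.reply = reply
        self.supports_json_mode = supports_json_mode
        self.json_mode = json_mode

    def bind(self, **kwargs):
        # Return a copy that remembers whether JSON mode was requested.
        return FakeModel(self.reply, self.supports_json_mode, "response_format" in kwargs)

    async def ainvoke(self, messages):
        if self.json_mode and not self.supports_json_mode:
            raise RuntimeError("response_format is not supported by this backend")
        return FakeReply(self.reply)


def extract_braces(text: str) -> dict:
    """Simplified stand-in for parse_json_from_text: first '{' to last '}'."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        return {}
    try:
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return {}


async def call_for_json(model, messages, timeout: float = 60) -> dict:
    try:
        json_model = model.bind(response_format={"type": "json_object"})
        reply = await asyncio.wait_for(json_model.ainvoke(messages), timeout)
    except asyncio.TimeoutError:
        raise TimeoutError("model timed out")  # timeouts skip the text-mode fallback
    except Exception:
        # JSON mode was rejected: retry once in plain text mode.
        reply = await asyncio.wait_for(model.ainvoke(messages), timeout)
    try:
        return json.loads(reply.content)
    except (json.JSONDecodeError, TypeError):
        result = extract_braces(reply.content)
        if not result:
            raise ValueError("no valid JSON in reply")
        return result


cloud = FakeModel('{"ok": true}', supports_json_mode=True)
local = FakeModel('Result: {"ok": true}', supports_json_mode=False)
cloud_result = asyncio.run(call_for_json(cloud, []))
local_result = asyncio.run(call_for_json(local, []))
print(cloud_result, local_result)
```

The cloud-style model succeeds on the first channel with a strict parse, while the local-style model goes through the text-mode fallback and the extractor before producing the same dict.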

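In addition, when LLM_BASE_URL contains 11434 and the model name contains qwen3, _inject_no_think prepends /no_think to the system message so that Qwen3 thinking blocks do not contaminate the JSON. This is an extra layer of JSON stability for local Ollama.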
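A rough sketch of what such an injection step could look like follows. This is not the iCan implementation: the function name inject_no_think, its explicit base_url and model_name parameters, and the newline separator after /no_think are all assumptions made for illustration (the real _inject_no_think takes only the messages).

```python
def inject_no_think(messages: list, base_url: str, model_name: str) -> list:
    """Hypothetical sketch: prepend /no_think to system messages for local Qwen3."""
    if "11434" not in base_url or "qwen3" not in model_name:
        return messages  # not a local Ollama Qwen3 setup: leave messages untouched
    patched = []
    for message in messages:
        content = message.get("content", "")
        if message.get("role") == "system" and not content.startswith("/no_think"):
            # Copy the message rather than mutating the caller's list.
            # The newline separator is an assumption of this sketch.
            message = {**message, "content": "/no_think\n" + content}
        patched.append(message)
    return patched


original = [
    {"role": "system", "content": "Output JSON only."},
    {"role": "user", "content": "Parse this resume."},
]
local = inject_no_think(original, "http://localhost:11434/v1", "qwen3.5:9b")
cloud = inject_no_think(original, "https://api.openai.com/v1", "gpt-4o-mini")
print(local[0]["content"])
```

Returning new dicts keeps the caller's messages intact, so the same list can be reused across retries without the marker being added twice.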

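5. The four-layer fallback parser: `parse_json_from_text`

parse_json_from_text in llm/parsers.py is the last safety net. It tries the following in order:

Strategy	Regex / logic	Typical case
1	r"```json\s*([\s\S]*?)\s*```"	ChatGPT-style output
2	Plain ``` ... ``` whose content starts with `{` or `[`	Code block without a json tag
3	`r"\{[\s\S]*\}"`, greedy match from the first `{` to the last `}`	"OK, here is the result: {…}"
4	`json.loads(text.strip())`	Pure JSON reply
Fallback	Returns `{}`	Nothing parseable

Unlike what generic tutorials suggest, failures do not cascade strategy by strategy. The function moves on to the next strategy only when the current pattern does not match (or, for strategy 2, when the block content does not start with { or [). If a pattern matches but json.loads rejects the captured text, the JSONDecodeError goes straight to the outermost except, which returns {} without trying the remaining strategies; only strategy 4 has its own try/except. Callers therefore must check for an empty dict: invoke_llm_with_json raises ValueError in that case, and extract_information then retries or falls back to regex.

The source marks each strategy with a logger.info line carrying its number (策略1 to 策略4), so during troubleshooting the logs show which strategy handled a reply.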
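To walk through the strategies by hand, the sketch below restates the control flow of the appendix listing in condensed form, with logging removed and a label added to show which branch produced the result. The name parse_json_sketch, the label strings, and the sample inputs are introduced here for illustration only.

```python
import json
import re

FENCE = "`" * 3  # built at runtime so the example can sit inside a code fence


def parse_json_sketch(text):
    """Condensed restatement of the four strategies; returns (result, label)."""
    try:
        if not text or not text.strip():
            return {}, "empty"
        m = re.search(FENCE + r"json\s*([\s\S]*?)\s*" + FENCE, text)
        if m:
            return json.loads(m.group(1).strip()), "strategy 1"
        m = re.search(FENCE + r"\s*([\s\S]*?)\s*" + FENCE, text)
        if m:
            inner = m.group(1).strip()
            if inner.startswith(("{", "[")):
                return json.loads(inner), "strategy 2"
        m = re.search(r"\{[\s\S]*\}", text)
        if m:
            return json.loads(m.group(0)), "strategy 3"
        try:
            return json.loads(text.strip()), "strategy 4"
        except json.JSONDecodeError:
            pass
        return {}, "no match"
    except json.JSONDecodeError:
        # A matched-but-invalid capture lands here and skips later strategies.
        return {}, "decode error"


cases = {
    "json fence": FENCE + 'json\n{"a": 1}\n' + FENCE,
    "plain fence": FENCE + '\n{"a": 1}\n' + FENCE,
    "prose prefix": 'Sure: {"a": 1}',
    "bare array": "[1, 2]",
    "bad json fence": FENCE + 'json\n{"a": 1,}\n' + FENCE,
}
outcomes = {name: parse_json_sketch(text) for name, text in cases.items()}
for name, (result, label) in outcomes.items():
    print(f"{name}: {label} -> {result}")
```

The last case shows the short-circuit: the fenced block matches strategy 1, the trailing comma makes json.loads fail, and the function returns {} without ever reaching strategy 3. Note also that a bare JSON object would be caught by strategy 3 before strategy 4 is reached, and that a bare array comes back as a list even though the function is annotated as returning a dict.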

6. Agent-side retry and regex fallback

On top of the LLM layer, extract_information in agents/resume_parser.py adds a business-level retry:

for attempt in range(2):
    try:
        model = get_light_model()
        parsed_data = await invoke_llm_with_json(model, messages)
        if parsed_data and len(parsed_data) > 0:
            break
        logger.warning("[extract_information] 第%d次尝试返回空数据，重试", attempt + 1)
    except TimeoutError as te:
        ...
    except Exception as e:
        ...

if not parsed_data or len(parsed_data) == 0:
    parsed_data = _regex_extract_profile(document_content)

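_regex_extract_profile uses regular expressions to pull out name, education, work experience, and so on. Its field names do not fully match the prompt schema (for example, it produces skills rather than skill_set.technical_skills). build_profile fills missing keys with default empty structures, a deliberate "something is better than nothing" choice, but validate_profile will most likely still report missing required fields.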
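The mismatch between the fallback output and the schema can be illustrated with a small sketch. Everything here is hypothetical: DEFAULT_PROFILE, fill_defaults, and the sample regex output only imitate the behavior described above, using the top-level key names mentioned in this article; they are not the actual build_profile code.

```python
from copy import deepcopy

# Assumed default skeleton; key names follow the schema fields named in this article.
DEFAULT_PROFILE = {
    "basic_info": {},
    "work_experience": [],
    "skill_set": {"technical_skills": [], "soft_skills": []},
    "certifications": [],
    "career_progression": {},
}


def fill_defaults(parsed: dict) -> dict:
    """Hypothetical sketch: copy known schema keys, default everything else."""
    profile = deepcopy(DEFAULT_PROFILE)
    for key in DEFAULT_PROFILE:
        if parsed.get(key) is not None:
            profile[key] = parsed[key]
    return profile


# Hypothetical shape of a regex fallback result: flat keys, not the prompt schema.
regex_output = {"name": "Li", "skills": ["Python", "SQL"]}
profile = fill_defaults(regex_output)
print(profile["skill_set"])
```

The skills found by the regex never reach skill_set.technical_skills, so the profile has the right shape but stays empty where validation looks, which is why a validation failure after the fallback is expected.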

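7. Quality loop after parsing

The LLM's self-assessed parsing_confidence is copied into confidence_scores in build_profile; validate_profile calls validate_structured_profile from llm/parsers.py for code-side validation. Required fields include:

basic_info.education, basic_info.major
a non-empty work_experience list
skill_set.technical_skills, skill_set.soft_skills
career_progression.total_years

Missing items are written to parse_errors, and validation_passed is written to confidence_scores. The confidence in the prompt and the Python validation complement each other: the former reflects the model's self-assessment, the latter ensures downstream agents never receive an "empty-shell profile".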
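A code-side check over the required fields listed above might look like the following sketch. The function name missing_required_fields and the rule that None, empty strings, and empty containers count as missing are assumptions; the actual validate_structured_profile may apply different rules.

```python
def missing_required_fields(profile: dict) -> list[str]:
    """Hypothetical sketch: return dotted names of required fields that are absent or empty."""
    basic = profile.get("basic_info") or {}
    skills = profile.get("skill_set") or {}
    career = profile.get("career_progression") or {}
    checks = {
        "basic_info.education": basic.get("education"),
        "basic_info.major": basic.get("major"),
        "work_experience": profile.get("work_experience"),
        "skill_set.technical_skills": skills.get("technical_skills"),
        "skill_set.soft_skills": skills.get("soft_skills"),
        "career_progression.total_years": career.get("total_years"),
    }
    # Assumption: None, "", [] and {} all count as missing; 0 years does not.
    return [name for name, value in checks.items() if value in (None, "", [], {})]


complete = {
    "basic_info": {"education": "BSc", "major": "Computer Science"},
    "work_experience": [{"company": "Acme", "title": "Engineer"}],
    "skill_set": {"technical_skills": ["Python"], "soft_skills": ["Communication"]},
    "career_progression": {"total_years": 0},
}
parse_errors = missing_required_fields({"basic_info": {"education": "BSc"}})
validation_passed = not missing_required_fields(complete)
print(parse_errors, validation_passed)
```

Treating 0 as a valid total_years is a deliberate choice in this sketch: a plain truthiness test would wrongly flag a fresh graduate's profile as incomplete.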

8. Position in the pipeline

Top-level workflow.py: once guide_node has enough information it hands off to resume_parser_node, whose structured_profile output is written to iCanWorkflowState and then passed to profile_analyzer_node.

Data flow:

Guide dialogue text (raw_input)
    → run_resume_parser
    → invoke_llm_with_json + parse_json_from_text
    → structured_profile + confidence_scores + parse_errors
    → ProfileAnalyzer

The same invoke_llm_with_json + parse_json_from_text pair is reused by ProfileAnalyzer, CareerMatcher, and other nodes that need JSON (see part 08, the LLM layer); ResumeParser is the call site with the most complex schema and the longest fallback chain.

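9. Pitfalls and edge cases

Pitfall 1: response_format is not a universal capability. For some Ollama models the JSON-mode call fails and the code falls back to text mode, which leans more heavily on the JSON example in the prompt and on parse_json_from_text. When testing against the Docker default qwen3.5:9b, check the logs for the "回退到文本模式" (fell back to text mode) warning.

Pitfall 2: the greedy match in strategy 3 can capture the wrong span. \{[\s\S]*\} runs from the first { to the last }; if the model includes other brace-delimited text before or after the JSON, the whole span fails to parse and the function returns {}. Asking for "JSON only" in the prompt is still necessary; the parser cannot replace prompt constraints.

Pitfall 3: the regex fallback does not align with the schema. _regex_extract_profile produces fields such as skills, which are not mapped automatically to skill_set.technical_skills. A downstream validation failure is the expected behavior; the right response is to ask the user for more information or retry the LLM, not to treat the fallback as a successful parse.

Pitfall 4: empty dict and retries. extract_information makes at most 2 attempts; if invoke_llm_with_json returns an empty dict without raising, a warning is logged and the call is retried. TimeoutError is caught separately, so the loop cannot block indefinitely.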
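Pitfall 2 is easy to reproduce. The sketch below applies the same greedy pattern to a clean reply and to a made-up reply with extra braces after the JSON. The first_object helper is an alternative shown only for contrast; it is not what iCan does.

```python
import json
import re

BRACES = re.compile(r"\{[\s\S]*\}")  # same greedy pattern as strategy 3

# Hypothetical replies; the second one has extra braces after the JSON.
clean = 'OK, the result is: {"name": "Li"}'
noisy = 'OK: {"name": "Li"} Unknown fields use the {placeholder} form.'


def greedy_extract(text: str) -> dict:
    """First '{' to last '}', as in strategy 3; {} when that span is not valid JSON."""
    match = BRACES.search(text)
    if not match:
        return {}
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return {}


def first_object(text: str) -> dict:
    """Alternative for contrast (not iCan's approach): decode only the first JSON value."""
    start = text.find("{")
    if start == -1:
        return {}
    try:
        value, _end = json.JSONDecoder().raw_decode(text, start)
    except json.JSONDecodeError:
        return {}
    return value if isinstance(value, dict) else {}


print(greedy_extract(clean))
print(greedy_extract(noisy))
print(first_object(noisy))
```

On the noisy reply the greedy span ends at the brace after "placeholder", so json.loads sees trailing text and the result is {}. Decoding only the first value recovers the object here, but it would still fail if stray braces appeared before the JSON, so the prompt constraint remains the primary defense.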

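10. Summary

The prompt pins down the schema with a full JSON example plus null/inference rules; it is defined in llm/prompts.py.
invoke_llm_with_json in llm/providers.py tries json_object mode first, falls back to a plain call if that is unsupported, then runs json.loads → parse_json_from_text.
llm/parsers.py tries four strategies in order and returns {} on failure; callers must handle the empty result.
agents/resume_parser.py uses get_light_model(), with 2 attempts and _regex_extract_profile as the last fallback.
validate_structured_profile checks required fields with code rules, in parallel with the parsing_confidence self-assessment.

The next article covers prompt engineering for the RIASEC assessment (part 13).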

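Appendix: key source code (annotated)

The code below is excerpted from the iCan implementation and annotated with comments, so it can be followed without access to the repository.
Generation command: python3 bin/build-ican-annotated-snippets.py

parse_json_from_text: four strategies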

# ========== parse_json_from_text: four strategies ==========
# Source file: llm/parsers.py, lines 19-92

def parse_json_from_text(text: str) -> dict:
    # Docstring (source lines 20-35, shown here as comments):
    # Extract and parse JSON from the text of an LLM reply. Supported shapes:
    #   1. JSON wrapped in a Markdown code block (```json ... ```)
    #   2. JSON wrapped in a plain code block (``` ... ```)
    #   3. JSON embedded directly in text (starts with {, ends with })
    # Returns an empty dict when parsing fails.
    # Args: text (str), the raw text of the LLM reply.
    # Returns: dict, the parsed JSON, or {} on failure.
    try:
        # If text is None, len(text) raises TypeError, which the final except turns into {}
        logger.info(f"[parse_json_from_text] 开始执行，入参: 文本长度={len(text)}")
        logger.debug(f"[parse_json_from_text] 文本预览: {text[:300]}")

        # Empty or whitespace-only input: nothing to parse
        if not text or not text.strip():
            logger.warning("[parse_json_from_text] 入参文本为空，返回空字典")
            return {}

        # Strategy 1: extract from a ```json ... ``` code block (lazy match up to the first closing fence)
        json_code_block_pattern = r"```json\s*([\s\S]*?)\s*```"
        match = re.search(json_code_block_pattern, text)
        if match:
            json_str = match.group(1).strip()
            logger.debug(f"[parse_json_from_text] 从 json 代码块中提取到内容，长度: {len(json_str)}")
            # A JSONDecodeError here jumps to the outer except; later strategies are not tried
            result = json.loads(json_str)
            logger.info(f"[parse_json_from_text] 执行完成（策略1: json代码块），返回字段数: {len(result)}")
            return result

        # Strategy 2: extract from a plain ``` ... ``` code block
        code_block_pattern = r"```\s*([\s\S]*?)\s*```"
        match = re.search(code_block_pattern, text)
        if match:
            inner = match.group(1).strip()
            # Only treat the block as JSON if it starts with { or [
            if inner.startswith("{") or inner.startswith("["):
                logger.debug(f"[parse_json_from_text] 从普通代码块中提取到 JSON 内容，长度: {len(inner)}")
                result = json.loads(inner)
                logger.info(f"[parse_json_from_text] 执行完成（策略2: 普通代码块），返回字段数: {len(result)}")
                return result

        # Strategy 3: find JSON directly in the text (greedy: first { to last })
        brace_pattern = r"\{[\s\S]*\}"
        match = re.search(brace_pattern, text)
        if match:
            json_str = match.group(0)
            logger.debug(f"[parse_json_from_text] 从文本中直接提取到 JSON 内容，长度: {len(json_str)}")
            result = json.loads(json_str)
            logger.info(f"[parse_json_from_text] 执行完成（策略3: 直接提取），返回字段数: {len(result)}")
            return result

        # Strategy 4: parse the whole text; the only strategy with its own try/except
        try:
            result = json.loads(text.strip())
            logger.info(f"[parse_json_from_text] 执行完成（策略4: 直接解析），返回字段数: {len(result)}")
            return result
        except json.JSONDecodeError:
            pass

        # No pattern matched and the whole text is not JSON
        logger.warning("[parse_json_from_text] 未能从文本中提取到有效 JSON，返回空字典")
        return {}

    # Reached when strategy 1, 2, or 3 matched but json.loads rejected the capture
    except json.JSONDecodeError as e:
        logger.error(f"[parse_json_from_text] JSON 解析失败，异常: {e}", exc_info=True)
        return {}
    # Any other error also degrades to an empty dict instead of propagating
    except Exception as e:
        logger.error(f"[parse_json_from_text] 提取 JSON 异常: {e}", exc_info=True)
        return {}

invoke_llm_with_json

# ========== invoke_llm_with_json ==========
# Source file: llm/providers.py, lines 208-278

async def invoke_llm_with_json(model: ChatOpenAI, messages: list, **kwargs) -> dict:
    # Docstring (source lines 209-225, shown here as comments):
    # Call the LLM and parse its JSON output. Uses the given ChatOpenAI instance to
    # invoke the LLM asynchronously with the message list, asks for a JSON reply, and
    # parses the reply into a Python dict. Suited to structured output such as resume
    # parsing and career-matching results. Prefers response_format JSON mode and falls
    # back to text parsing when that is unsupported.
    # Args:
    #   model (ChatOpenAI): a configured ChatOpenAI instance
    #   messages (list): [{"role": "system/user/assistant", "content": "..."}]
    #   **kwargs: overrides such as temperature or max_tokens
    # Returns: dict, the parsed JSON data
    import json

    from ican.llm.parsers import parse_json_from_text

    try:
        logger.info(
            f"[invoke_llm_with_json] 开始执行，入参: model={model.model_name}，"
            f"messages 数量: {len(messages)}，kwargs: {kwargs}"
        )
        logger.debug(f"[invoke_llm_with_json] 消息详情: {messages}")

        # Prepend /no_think for local Qwen3 models (see section 4)
        processed = _inject_no_think(messages)
        raw_content = None

        import asyncio as _asyncio

        try:
            # Channel 1: OpenAI-style JSON mode
            json_model = model.bind(response_format={"type": "json_object"})
            try:
                # Hard 60-second timeout so a hung LLM cannot block the request
                response = await _asyncio.wait_for(json_model.ainvoke(processed, **kwargs), timeout=60)
            except _asyncio.TimeoutError:
                raise TimeoutError("AI 模型响应超时，请稍后重试")
            raw_content = response.content
        except TimeoutError:
            # Timeouts propagate to the caller; they do not trigger the text-mode fallback
            raise
        except Exception as bind_err:
            # Channel 2: JSON mode was rejected, so repeat the call in plain text mode
            logger.warning(
                f"[invoke_llm_with_json] response_format JSON 模式不支持，回退到文本模式: {bind_err}"
            )
            try:
                response = await _asyncio.wait_for(model.ainvoke(processed, **kwargs), timeout=60)
            except _asyncio.TimeoutError:
                raise TimeoutError("AI 模型响应超时，请稍后重试")
            raw_content = response.content

        logger.debug(
            f"[invoke_llm_with_json] 原始回复长度: {len(raw_content) if raw_content else 0}"
        )

        try:
            # Strict parse first: JSON mode normally returns bare JSON
            result = json.loads(raw_content)
        except (json.JSONDecodeError, TypeError):
            # TypeError covers non-string content such as None
            logger.info("[invoke_llm_with_json] 直接 JSON 解析失败，尝试 parse_json_from_text 提取")
            result = parse_json_from_text(raw_content)
            if not result:
                # An empty dict means extraction failed; surface it to the caller
                raise ValueError(f"无法从 LLM 回复中提取有效 JSON，原始内容: {raw_content[:300]}")

        logger.info(
            f"[invoke_llm_with_json] 执行完成，返回 JSON 字段数: {len(result)}"
        )
        logger.debug(f"[invoke_llm_with_json] 返回 JSON 预览: {str(result)[:300]}")

        return result
    # The excerpt ends at source line 278; the handlers of the outer try lie outside it.

extract_information

# ========== extract_information ==========
# Source file: agents/resume_parser.py, lines 153-225

async def extract_information(state: ResumeParserState) -> dict:
    # Docstring (source lines 154-167, shown here as comments):
    # Use the LLM to extract structured information. Sends the user's text to the LLM
    # and, following the format defined in RESUME_PARSER_SYSTEM_PROMPT, extracts
    # structured personal information: basic info, work experience, skill set,
    # certifications, and career progression.
    # Args: state (ResumeParserState), must contain document_content.
    # Returns: dict, a state update containing parsed_sections (the LLM-extracted data).
    try:
        logger.info("[extract_information] 开始执行，入参: state=%s", {k: str(v)[:100] for k, v in state.items()})
        document_content = state.get("document_content", "")

        # Nothing to extract: return an empty result plus an error entry
        if not document_content or not document_content.strip():
            logger.warning("[extract_information] 文档内容为空，跳过提取")
            return {
                "parsed_sections": {},
                "parse_errors": ["文档内容为空，无法提取信息"],
            }

        # System prompt carries the schema; the user message carries the raw text
        messages = [
            {"role": "system", "content": RESUME_PARSER_SYSTEM_PROMPT},
            {"role": "user", "content": f"请从以下文本中提取结构化个人信息：\n\n{document_content}"},
        ]

        logger.info("[extract_information] 调用 LLM 提取结构化信息，文档长度: %d", len(document_content))

        parsed_data = {}
        last_err = None
        # At most 2 attempts; stop as soon as one returns non-empty data
        for attempt in range(2):
            try:
                # Light model, used for structured JSON extraction
                model = get_light_model()
                # JSON mode → text mode → parse_json_from_text happens inside this call
                parsed_data = await invoke_llm_with_json(model, messages)
                if parsed_data and len(parsed_data) > 0:
                    break
                logger.warning("[extract_information] 第%d次尝试返回空数据，重试", attempt + 1)
            except TimeoutError as te:
                # Timeout: record it and move on to the next attempt
                last_err = te
                logger.warning("[extract_information] 第%d次 LLM 调用超时: %s", attempt + 1, te)
            except Exception as e:
                # Includes the ValueError raised when no JSON could be extracted
                last_err = e
                logger.warning("[extract_information] 第%d次 LLM 调用异常: %s", attempt + 1, e)

        # Both attempts failed or came back empty: fall back to regex extraction
        if not parsed_data or len(parsed_data) == 0:
            logger.warning("[extract_information] LLM 提取失败，使用正则 fallback")
            parsed_data = _regex_extract_profile(document_content)

        logger.info("[extract_information] 结构化数据字段数: %d", len(parsed_data))
        logger.debug("[extract_information] 结构化数据预览: %s", json.dumps(parsed_data, ensure_ascii=False)[:500])

        # State update returned by this node
        result = {
            "parsed_sections": parsed_data,
        }
        logger.info("[extract_information] 执行完成，出参: parsed_sections字段数=%d", len(parsed_data))
        return result

    # Unexpected error outside the retry loop: try the regex fallback once more
    except Exception as e:
        logger.error("[extract_information] LLM 提取结构化信息异常: %s", e, exc_info=True)
        fallback = _regex_extract_profile(state.get("document_content", ""))
        if fallback:
            logger.info("[extract_information] 使用正则 fallback 提取到 %d 个字段", len(fallback))
            return {"parsed_sections": fallback}
        # Even the fallback found nothing: report the error through parse_errors
        return {
            "parsed_sections": {},
            "parse_errors": [f"LLM 提取信息异常: {str(e)}"],
        }

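Series navigation

Part	Topic
1	System overview
2	Five-agent collaboration
3	Holland RIASEC
4–7	State · routing · nesting · fault tolerance
8–11	LLM layer · SSE/WS · DB migration · PDF
12–14	JSON prompt · RIASEC prompt · Guide prompt
15–17	Docker · middleware · configuration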
