Python PDF Generation in Practice: Professional-Grade Chinese Reports with ReportLab + matplotlib

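0. Series loop (followable without access to the source code)

End-to-end path: Vue frontend → api/routes/chat.py → multi-turn Guide over SSE → run_analysis_pipeline (parse → analyze → match → report) → tools/pdf_exporter PDF.
This article: Part 11 of 17 · delivery loop · PDF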

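Stage	User sees	Code entry point	Covered in part
Create session	Welcome message	POST /api/sessions	09
Multi-turn chat	SSE streaming	chat/stream → run_guide_single_turn	06, 14
Enough information	Analysis starts	_run_analysis_background	05, 07
Resume parsing	Progress 30%	run_resume_parser	12
Profile / RIASEC	Progress 50%	run_profile_analyzer	03, 13
Career matching	Progress 70%	run_career_matcher	02
Report	Progress 90%	run_reporter	11
Download PDF	File	GET …/report/pdf	11, 15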

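	Notes
Before this article	Part 10: persisting the report to the database
After this article	You can follow the Chinese font handling and Markdown parsing in generate_pdf
Next link	Part 15: Docker font dependencies (Part 12)

Full series loop index: SERIES-LOOP.md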

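1. The problem to solve

agents/reporter.py produces a Markdown string (final_report). The frontend can render it directly, but users need a PDF to download. Requirements:

Chinese body text, tables, and heading levels must not come out garbled;
RIASEC and ability dimensions should ideally be shown as charts;
no HTML→PDF intermediate layer (WeasyPrint/wkhtmltopdf bring heavy dependencies, and that approach once produced an escaping bug).

The implementation lives in tools/pdf_exporter.py; the HTTP entry point is **GET /api/sessions/{session_id}/report/pdf** in api/routes/report.py.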

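2. Where the code lives and the call chain

Module	Responsibility
`agents/reporter.py`	LangGraph subgraph that generates the Markdown `final_report`
`workflow.py` / `api/routes/chat.py`	Write `final_report` into `workflow_data` once analysis completes
`tools/pdf_exporter.py`	`generate_pdf` → `_build_pdf`: fonts, charts, Markdown parsing, ReportLab layout
`api/routes/report.py`	`download_report_pdf` reads the database and returns `application/pdf`

The Reporter's section nodes write Markdown via get_chat_model() + invoke_llm (consistent with Part 8; get_light_model() is currently not used). The PDF layer makes no further LLM calls and only consumes the existing Markdown.

Download route (excerpt from api/routes/report.py):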

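@router.get("/{session_id}/report/pdf")
async def download_report_pdf(session_id: str):
    from fastapi.responses import Response
    from ican.tools.pdf_exporter import generate_pdf

    # Load the session first; an unknown session ID is a 404 ("session not found").
    session_data = repository.get_session(session_id)
    if not session_data:
        raise HTTPException(status_code=404, detail="会话不存在")

    # The Markdown report must already exist; the PDF layer never generates it.
    workflow_data = session_data.get("workflow_data") or {}
    report_md = workflow_data.get("final_report", "")
    if not report_md:
        # detail: "report not generated yet"
        raise HTTPException(status_code=404, detail="报告尚未生成")

    # Passed through to generate_pdf, although the charts do not read them yet (see pitfall ①).
    profile_data = workflow_data.get("personal_profile") or workflow_data.get("structured_profile") or {}
    career_matches = workflow_data.get("career_matches") or []

    pdf_bytes = await generate_pdf(
        report_md,
        title="iCan 职业规划报告",  # "iCan Career Planning Report"
        profile_data=profile_data,
        career_matches=career_matches,
    )
    # The remaining Response arguments are elided in this excerpt.
    return Response(content=pdf_bytes, media_type="application/pdf", ...)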

In the same file, GET .../report/download?format=txt|md only writes a plain-text or Markdown temporary file and does not accept format=pdf; PDFs must go through /report/pdf.

Figure: PDF generation pipeline

3. Entry function: `generate_pdf`

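# tools/pdf_exporter.py
import re

async def generate_pdf(report_md: str, title: str = "iCan 职业规划报告",
                       profile_data: dict = None, career_matches: list = None) -> bytes:
    # Charts are switched on by keywords in the report text
    # (ability | radar | skill | assessment), not by the profile fields.
    show_charts = bool(report_md and re.search(r"能力|雷达|技能|评估", report_md))
    # profile_data and career_matches are not forwarded to _build_pdf.
    result = _build_pdf(report_md, title, show_charts)
    return result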

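Key points:

profile_data / career_matches are passed in but not used for chart data (see pitfall ①);
show_charts is decided by whether the report body contains keywords such as 能力 (ability), 雷达 (radar), 技能 (skill), or 评估 (assessment), not by whether the profile fields exist.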
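
To make the keyword switch concrete, here is a minimal standalone sketch of the same check. The helper name wants_charts is hypothetical and exists only for illustration; the regular expression is the one quoted in the excerpt above.

```python
import re

# Same pattern as in the generate_pdf excerpt: ability | radar | skill | assessment.
CHART_KEYWORDS = re.compile(r"能力|雷达|技能|评估")


def wants_charts(report_md):
    """Return True when the report text contains any chart keyword."""
    return bool(report_md and CHART_KEYWORDS.search(report_md))


if __name__ == "__main__":
    for sample in ("## 能力评估", "## Career summary", "", None):
        print(repr(sample), wants_charts(sample))
```

The check looks only at the report text, so a report that mentions 技能 (skill) in passing turns the charts on, while a report without any of the four words turns them off no matter what the profile contains.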

4. Chinese fonts: multi-path probing with silent fallback

_build_pdf starts with cn_font = "Helvetica" and then tries to register TTF/TTC files in order:

# tools/pdf_exporter.py: font paths (excerpt)
for fp in [
    "/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc",
    "/usr/share/fonts/truetype/noto/NotoSansCJK-Regular.ttc",
    "/usr/share/fonts/opentype/noto/NotoSansCJKsc-Regular.otf",
    "/usr/share/fonts/truetype/wqy/wqy-zenhei.ttc",
    "/usr/share/fonts/truetype/wqy/wqy-microhei.ttc",
    "/usr/share/fonts/truetype/droid/DroidSansFallbackFull.ttf",
    "/System/Library/Fonts/PingFang.ttc",
    "/System/Library/Fonts/STHeiti Light.ttc",
    "/System/Library/Fonts/Hiragino Sans GB.ttc",
    "/Library/Fonts/Arial Unicode.ttf",
]:
    if os.path.exists(fp):
        try:
            pdfmetrics.registerFont(TTFont("CNFont", fp))
            cn_font = "CNFont"
            break
        except Exception:
            # Registration failed: swallow the error and try the next path.
            pass

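Every ParagraphStyle (st_title, st_body, st_cell, and so on) sets fontName=cn_font. Docker deployments need fonts-noto-cjk or a WenQuanYi font installed (Part 15); otherwise the font stays Helvetica, which has no CJK glyphs, and Chinese text renders as boxes.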
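
One way to make the fallback visible rather than silent is to log every failed registration and the final choice. The sketch below is an assumption-laden illustration, not the project's code: pick_cjk_font and its parameters are hypothetical names, and the registration call is injected as a function so the probing logic can be exercised without ReportLab installed.

```python
import logging
import os
from typing import Callable, Iterable

logger = logging.getLogger("pdf_fonts")


def pick_cjk_font(
    candidates: Iterable[str],
    register: Callable[[str, str], None],
    font_name: str = "CNFont",
    fallback: str = "Helvetica",
    exists: Callable[[str], bool] = os.path.exists,
) -> str:
    """Register the first usable font file and return the font name to use.

    register(font_name, path) is expected to raise on failure.
    """
    for path in candidates:
        if not exists(path):
            continue
        try:
            register(font_name, path)
        except Exception as exc:
            # Keep probing, but leave a trace in the log.
            logger.warning("font registration failed for %s: %s", path, exc)
            continue
        logger.info("using CJK font %s", path)
        return font_name
    logger.error("no CJK font could be registered; falling back to %s", fallback)
    return fallback
```

With ReportLab, the injected function could be `lambda name, fp: pdfmetrics.registerFont(TTFont(name, fp))`, which is the same call the excerpt above makes. A stricter variant could raise instead of returning the fallback, so a missing font fails the request instead of producing an unreadable PDF.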

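5. Markdown parsing: straight to ReportLab, no HTML

_parse_markdown splits the Markdown into (type, data) blocks:

Headings h1–h3 (#{1,4})
Paragraphs p, list items li, quotes quote, horizontal rules hr
Table rows tr (| col | col |, skipping the separator row |---|)

Inline formatting is converted to ReportLab XML by _parse_inline: **bold** → green bold text, `code` → Courier on a gray background.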
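
The inline step can be sketched roughly as follows. This is not the project's _parse_inline: the function name inline_to_rl_markup is hypothetical, the bold color is assumed to be the accent color #0d9488, and the gray background for code spans is left out to keep the markup to tags that are certainly valid in a ReportLab Paragraph (`<b>` and `<font face=... color=...>`). The point it illustrates is escaping the text exactly once before any tags are added.

```python
import re
from xml.sax.saxutils import escape

ACCENT = "#0d9488"  # assumption: bold text reuses the accent color

# One pass over the text: a `code` span or a **bold** span, whichever comes first.
_INLINE = re.compile(r"`([^`]+)`|\*\*(.+?)\*\*")


def inline_to_rl_markup(text: str) -> str:
    """Convert **bold** and `code` spans to ReportLab paragraph markup."""

    def repl(match):
        if match.group(1) is not None:
            return f'<font face="Courier">{match.group(1)}</font>'
        return f'<b><font color="{ACCENT}">{match.group(2)}</font></b>'

    # Escape &, <, > once, up front; the tags added afterwards stay intact.
    return _INLINE.sub(repl, escape(text))


if __name__ == "__main__":
    print(inline_to_rl_markup("a **b** & `c<d>`"))
```

Because the escaping happens on the raw Markdown and never again, an ampersand in the report becomes `&amp;` once, not `&amp;amp;`.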

Table cells are wrapped in Paragraph(..., st_cell) so that text can wrap; column widths are allocated by _calc_col_widths in proportion to text length, with a minimum of 1.5 cm.
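
A proportional allocation with a floor could look like the sketch below. It is an illustration under assumptions, not the project's _calc_col_widths: the function name, the use of the longest cell per column as the weight, and the way overshoot is redistributed are all hypothetical.

```python
CM = 72 / 2.54  # points per centimetre, the same value as reportlab.lib.units.cm


def calc_col_widths(rows, total_width, min_width=1.5 * CM):
    """Split total_width across columns in proportion to the longest cell text."""
    n_cols = max(len(row) for row in rows)
    # Longest cell per column; at least 1 so empty columns still get a share.
    longest = [
        max([len(str(row[c])) for row in rows if c < len(row)] + [1])
        for c in range(n_cols)
    ]
    total = sum(longest)
    widths = [max(min_width, total_width * n / total) for n in longest]
    # Raising narrow columns to min_width can overshoot the page width;
    # take the excess back from the columns that are above the minimum.
    overflow = sum(widths) - total_width
    if overflow > 0:
        shrinkable = [w - min_width for w in widths]
        room = sum(shrinkable)
        if room > 0:
            factor = min(1.0, overflow / room)
            widths = [w - s * factor for w, s in zip(widths, shrinkable)]
    return widths
```

len() counts a CJK character and a Latin letter as one unit each, although CJK glyphs are wider, so a production version might weight characters by script.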

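# Flush the buffered table rows (excerpt)
# repeatRows=1 repeats the header row when the table breaks across pages.
t = Table(table_data, colWidths=col_ws, repeatRows=1)
t.setStyle(TableStyle([
    # Thin grid around every cell.
    ("GRID", (0, 0), (-1, -1), 0.5, border_c),
    # Background for the header row only.
    ("BACKGROUND", (0, 0), (-1, 0), header_bg),
    # Alternating backgrounds for the body rows.
    ("ROWBACKGROUNDS", (0, 1), (-1, -1), [colors.white, colors.HexColor("#f9fafb")]),
]))
elements.append(KeepTogether(t))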

This is the structure after switching from "Markdown → HTML → parse" to parsing the Markdown directly, which avoids double-escaping HTML entities.
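
The block splitting described in this section can be sketched as a line-by-line classifier. This is a simplified stand-in for _parse_markdown, not its source: the function name, the regular expressions, and in particular the choice to map four hashes to h3 are assumptions made for the example.

```python
import re

HEADING = re.compile(r"^(#{1,4})\s+(.*)$")
RULE = re.compile(r"^(-{3,}|\*{3,}|_{3,})$")
TABLE_SEP = re.compile(r"^\|?\s*:?-{3,}:?\s*(\|\s*:?-{3,}:?\s*)*\|?$")
LIST_ITEM = re.compile(r"^[-*+]\s+(.*)$")


def parse_markdown_blocks(md: str):
    """Split Markdown into (type, data) tuples, one per non-empty line."""
    blocks = []
    for raw in md.splitlines():
        line = raw.strip()
        if not line:
            continue
        heading = HEADING.match(line)
        if heading:
            level = min(len(heading.group(1)), 3)  # assumption: #### maps to h3
            blocks.append((f"h{level}", heading.group(2)))
            continue
        if RULE.match(line):
            blocks.append(("hr", None))
            continue
        if line.startswith("|"):
            if not TABLE_SEP.match(line):  # skip the |---|---| separator row
                cells = [c.strip() for c in line.strip("|").split("|")]
                blocks.append(("tr", cells))
            continue
        if line.startswith(">"):
            blocks.append(("quote", line.lstrip("> ").strip()))
            continue
        item = LIST_ITEM.match(line)
        if item:
            blocks.append(("li", item.group(1)))
            continue
        blocks.append(("p", line))
    return blocks
```

A renderer can then walk the list, buffering consecutive tr blocks into one table and flushing it when a block of another type arrives.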

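6. Embedding matplotlib charts

When show_charts=True, two charts are inserted before the body:

Function	Chart	Data source
`_generate_radar_chart`	Ability radar chart	Parameter `ability`, default `DEFAULT_RADAR_DATA`
`_generate_bar_chart`	Holland bar chart	Parameter `holland_data`, default `DEFAULT_HOLLAND_DATA`

matplotlib uses the Agg backend, writes a 160 dpi PNG into a BytesIO, and base64-encodes it; at embed time the PNG is decoded to a temporary file for ReportLab's Image: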

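# tools/pdf_exporter.py: radar chart (excerpt)
# CJK-capable fonts so Chinese labels render; the rest of the list is elided here.
plt.rcParams["font.sans-serif"] = [
    "Noto Sans CJK SC", "WenQuanYi Zen Hei", "PingFang SC", "SimHei", ...
]
# angles and values_plot are computed earlier in the function (not shown in this excerpt).
fig, ax = plt.subplots(figsize=(5.5, 4.8), subplot_kw=dict(polar=True))
ax.plot(angles, values_plot, "o-", color="#0d9488")
buf = io.BytesIO()
plt.savefig(buf, format="png", dpi=160, bbox_inches="tight")
plt.close(fig)  # release the figure so repeated exports do not accumulate
buf.seek(0)     # rewind; read() at the end of the buffer would return b""
return base64.b64encode(buf.read()).decode()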
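
The excerpt uses angles and values_plot without showing how they are built. A common way to prepare them for a polar plot is sketched below; the helper name radar_series and the shape of its input (a dict of axis label to score) are assumptions for illustration.

```python
import math


def radar_series(ability: dict):
    """Return (labels, angles, values) with the polygon closed for a polar plot."""
    labels = list(ability)
    n = len(labels)
    if n < 3:
        raise ValueError("a radar chart needs at least three axes")
    # Spread the axes evenly around the circle, in radians.
    angles = [2 * math.pi * i / n for i in range(n)]
    values = [float(ability[label]) for label in labels]
    # Repeat the first point so that ax.plot draws a closed outline.
    return labels, angles + angles[:1], values + values[:1]
```

The returned angles and values have one more entry than labels, so the tick positions should use the angles without the repeated last element.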

The chart titles use the brand color #0d9488, matching the accent color of the PDF body.
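
The "base64 → temporary file" hand-off described above can be sketched with the standard library alone. The helper name b64_png_to_tempfile is hypothetical, and this is only one way to do it.

```python
import base64
import os
import tempfile


def b64_png_to_tempfile(png_b64: str) -> str:
    """Decode a base64 PNG string into a temporary .png file and return its path."""
    data = base64.b64decode(png_b64)
    # mkstemp creates the file securely and returns an open descriptor.
    fd, path = tempfile.mkstemp(suffix=".png")
    with os.fdopen(fd, "wb") as fh:
        fh.write(data)
    return path
```

The caller owns the file and should remove it with os.remove once the document has been built; deleting it earlier is risky because the image may not be read until the PDF is actually rendered.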

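7. Relationship to the Reporter

The inner subgraph of agents/reporter.py runs load_all_results → each generate_*_section → compile_final_report and finally writes ReporterState.final_report (Markdown).

PDF export is not part of the Reporter LangGraph; it runs only when the user clicks download:

workflow_data.final_report (Markdown)
    → generate_pdf()
    → _build_pdf() + optional charts
    → bytes → HTTP Response

If run_analysis_pipeline falls back because Ollama is unavailable (the rule engine _generate_fallback_report), the PDF still renders; the body is simply the rule-based template Markdown.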

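8. Pitfalls

① profile_data / career_matches do not drive the charts
download_report_pdf passes the RIASEC scores and the personal profile to generate_pdf, but _generate_radar_chart() / _generate_bar_chart() are called without arguments and use the hard-coded DEFAULT dictionaries at the top of the module; nothing is read from profile_data["riasec_scores"] or ability_model. The interface signature and the implementation disagree, so putting real data into the PDF means changing how _build_pdf passes arguments.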
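
A possible shape for that change is to resolve the chart data from the profile first and fall back to the defaults only when the profile is incomplete. The sketch below is hypothetical throughout: it assumes riasec_scores is a dict keyed by the letters R, I, A, S, E, C, and the default values are placeholders, since the real DEFAULT_HOLLAND_DATA is not shown in this article.

```python
RIASEC_KEYS = ("R", "I", "A", "S", "E", "C")

# Placeholder values for illustration only.
DEFAULT_HOLLAND_DATA = {key: 50.0 for key in RIASEC_KEYS}


def resolve_holland_data(profile_data):
    """Return (data, is_real): the user's RIASEC scores if complete, else defaults."""
    scores = (profile_data or {}).get("riasec_scores")
    if isinstance(scores, dict):
        try:
            # All six dimensions must be present and numeric.
            return {key: float(scores[key]) for key in RIASEC_KEYS}, True
        except (KeyError, TypeError, ValueError):
            pass
    return dict(DEFAULT_HOLLAND_DATA), False
```

The is_real flag lets the caller decide whether to draw the chart at all, which also addresses the case where default charts contradict the report text.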

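② Font registration fails silently
TTFont("CNFont", fp) is wrapped in try/except. With some ReportLab versions, .ttc font collections fail to register, and the loop simply moves on to the next path; if every path fails, cn_font stays Helvetica, Chinese text is unusable, and no explicit error is raised.

③ The chart switch depends on keywords in the body
If the report does not contain 能力 / 雷达 / 技能 / 评估 (ability / radar / skill / assessment), no charts are inserted even when the user's profile is complete. Conversely, whenever the body contains one of these words, the default radar and bar charts are inserted and may contradict the report's narrative.

④ txt/md and pdf use separate entry points
download_report?format=pdf returns 400; you must use GET .../report/pdf. The frontend needs to handle the two routes separately.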

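9. Summary

The Markdown report is generated by agents/reporter.py; the PDF is converted on demand by tools/pdf_exporter.generate_pdf at GET /api/sessions/{session_id}/report/pdf.
Chinese support depends on the multi-path TTFont registration inside _build_pdf; every style is bound to cn_font.
Markdown is converted directly to ReportLab Paragraph / Table objects by _parse_markdown, with inline formatting handled by _parse_inline.
matplotlib charts go base64 → temporary PNG → Image; the chart data is currently the defaults, not the user's profile.
For production, test Chinese text and table wrapping in the PDF on the target OS / Docker image, and consider wiring riasec_scores into _generate_bar_chart.

Next article: Prompts and stable JSON output (llm/parsers.py).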

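Appendix: key source code (annotated line by line)

The code below is taken from the iCan implementation. Every line has a comment above it, so you can follow along without access to the repository.
Generation command: python3 bin/build-ican-annotated-snippets.py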

generate_pdf entry point

# ========== generate_pdf entry point ==========
# Source file: tools/pdf_exporter.py, lines 267-281

# L267: async function generate_pdf; the route handler awaits it
async def generate_pdf(report_md: str, title: str = "iCan 职业规划报告",
# L268: optional profile and match data (accepted, but not used for the charts)
                       profile_data: dict = None, career_matches: list = None) -> bytes:
# L269: normalize None to an empty dict
    profile_data = profile_data or {}
# L270: normalize None to an empty list
    career_matches = career_matches or []
# L271: enable charts when the report contains ability / radar / skill / assessment keywords
    show_charts = bool(report_md and re.search(r"能力|雷达|技能|评估", report_md))

# L273: build the PDF; profile_data and career_matches are not passed on
    result = _build_pdf(report_md, title, show_charts)

# L275: log the outcome to help with production troubleshooting
    logger.info(
# L276: log format string (done | size in KB | charts on or off)
        "[generate_pdf] 完成 | 大小=%.1fKB | 图表=%s",
# L277: PDF size in kilobytes
        len(result) / 1024,
# L278: whether charts were included
        show_charts,
# L279: end of the logger.info call
    )
# L280: return the PDF bytes to the caller
    return result

Start of _build_pdf

# ========== Start of _build_pdf ==========
# Source file: tools/pdf_exporter.py, lines 283-340

# L283: synchronous function _build_pdf: renders the Markdown into PDF bytes
def _build_pdf(report_md: str, title: str, show_charts: bool) -> bytes:
# L284: A4 page size
    from reportlab.lib.pagesizes import A4
# L285: Platypus layout classes
    from reportlab.platypus import (
# L286: document template and flowables (paragraphs, spacers, images, tables)
        SimpleDocTemplate, Paragraph, Spacer, Image, Table, TableStyle,
# L287: wrapper that tries to keep its content together on one page
        KeepTogether,
# L288: end of the import list
    )
# L289: paragraph style class
    from reportlab.lib.styles import ParagraphStyle
# L290: centimetre unit for margins and widths
    from reportlab.lib.units import cm
# L291: color helpers
    from reportlab.lib import colors
# L292: font registry
    from reportlab.pdfbase import pdfmetrics
# L293: TrueType font loader
    from reportlab.pdfbase.ttfonts import TTFont
# L294: text alignment constants
    from reportlab.lib.enums import TA_CENTER, TA_JUSTIFY, TA_LEFT

# L296: in-memory buffer that receives the PDF
    buf = io.BytesIO()
# L297: create the document template
    doc = SimpleDocTemplate(
# L298: write into the buffer
        buf,
# L299: A4 pages
        pagesize=A4,
# L300: 2 cm left margin
        leftMargin=2 * cm,
# L301: 2 cm right margin
        rightMargin=2 * cm,
# L302: 2 cm top margin
        topMargin=2 * cm,
# L303: 2 cm bottom margin
        bottomMargin=2 * cm,
# L304: end of the SimpleDocTemplate call
    )

# L306: default font; Helvetica has no CJK glyphs
    cn_font = "Helvetica"
# L307: probe the candidate font files in order
    for fp in [
# L308: Noto Sans CJK collection (Linux, opentype directory)
        "/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc",
# L309: Noto Sans CJK collection (Linux, truetype directory)
        "/usr/share/fonts/truetype/noto/NotoSansCJK-Regular.ttc",
# L310: Noto Sans CJK SC as a single OTF file
        "/usr/share/fonts/opentype/noto/NotoSansCJKsc-Regular.otf",
# L311: WenQuanYi Zen Hei
        "/usr/share/fonts/truetype/wqy/wqy-zenhei.ttc",
# L312: WenQuanYi Micro Hei
        "/usr/share/fonts/truetype/wqy/wqy-microhei.ttc",
# L313: Droid Sans Fallback
        "/usr/share/fonts/truetype/droid/DroidSansFallbackFull.ttf",
# L314: PingFang (macOS)
        "/System/Library/Fonts/PingFang.ttc",
# L315: STHeiti Light (macOS)
        "/System/Library/Fonts/STHeiti Light.ttc",
# L316: Hiragino Sans GB (macOS)
        "/System/Library/Fonts/Hiragino Sans GB.ttc",
# L317: Arial Unicode (macOS)
        "/Library/Fonts/Arial Unicode.ttf",
# L318: end of the candidate list
    ]:
# L319: only try files that exist
        if os.path.exists(fp):
# L320: registration may raise for unsupported font files
            try:
# L321: register the file under the name CNFont
                pdfmetrics.registerFont(TTFont("CNFont", fp))
# L322: switch every style to the registered font
                cn_font = "CNFont"
# L323: stop at the first font that registers
                break
# L324: catch any registration error
            except Exception:
# L325: ignore it silently and try the next path
                pass

# L327: accent color (brand color)
    accent_c = colors.HexColor("#0d9488")
# L328: body text color
    text_c = colors.HexColor("#1f2937")
# L329: gray for secondary text
    gray_c = colors.HexColor("#6b7280")
# L330: table border color
    border_c = colors.HexColor("#e5e7eb")
# L331: table header background
    header_bg = colors.HexColor("#f1f5f9")

# L333: title style: 22 pt in the chosen font
    st_title = ParagraphStyle("Title", fontName=cn_font, fontSize=22,
# L334: accent color, centered
                              textColor=accent_c, alignment=TA_CENTER,
# L335: spacing after the title and line height
                              spaceAfter=6, leading=28)
# L336: date line style: 10 pt
    st_date = ParagraphStyle("Date", fontName=cn_font, fontSize=10,
# L337: gray, centered
                             textColor=gray_c, alignment=TA_CENTER, spaceAfter=4)
# L338: subtitle style: 10 pt
    st_sub = ParagraphStyle("Sub", fontName=cn_font, fontSize=10,
# L339: gray, centered, with more space below
                            textColor=gray_c, alignment=TA_CENTER, spaceAfter=20)
# L340: level-1 heading style, 17 pt (the excerpt ends in the middle of this call)
    st_h1 = ParagraphStyle("H1", fontName=cn_font, fontSize=17,

GET …/report/pdf download

# ========== GET .../report/pdf download ==========
# Source file: api/routes/report.py, lines 195-220

# L195: register the handler for GET /{session_id}/report/pdf
@router.get("/{session_id}/report/pdf")
# L196: async route handler download_report_pdf
async def download_report_pdf(session_id: str):
# L197: response class used to return raw bytes
    from fastapi.responses import Response
# L198: PDF entry function, imported inside the handler
    from ican.tools.pdf_exporter import generate_pdf

# L200: load the session from the repository
    session_data = repository.get_session(session_id)
# L201: unknown session ID
    if not session_data:
# L202: respond with 404 ("session not found")
        raise HTTPException(status_code=404, detail="会话不存在")

# L204: workflow_data JSON field: holds intermediate results, final_report, and so on
    workflow_data = session_data.get("workflow_data") or {}
# L205: the Markdown report, or an empty string if missing
    report_md = workflow_data.get("final_report", "")
# L206: no report has been generated yet
    if not report_md:
# L207: respond with 404 ("report not generated yet")
        raise HTTPException(status_code=404, detail="报告尚未生成")

# L209: prefer personal_profile, then structured_profile, else an empty dict
    profile_data = workflow_data.get("personal_profile") or workflow_data.get("structured_profile") or {}
# L210: career match results, or an empty list
    career_matches = workflow_data.get("career_matches") or []

# L212: convert the Markdown to PDF bytes
    pdf_bytes = await generate_pdf(
# L213: the Markdown report text
        report_md,
# L214: document title ("iCan Career Planning Report")
        title="iCan 职业规划报告",
# L215: profile data (passed in, not yet used for charts)
        profile_data=profile_data,
# L216: career matches (passed in, not yet used for charts)
        career_matches=career_matches,
# L217: end of the generate_pdf call
    )

# L219: return the PDF as the HTTP response
    return Response(
# L220: response body (the excerpt ends in the middle of this call)
        content=pdf_bytes,

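Series navigation

Part	Topic
1	System overview
2	Five-agent collaboration
3	Holland RIASEC
4–7	State · routing · nesting · fault tolerance
8–11	LLM layer · SSE/WS · DB migration · PDF
12–14	JSON prompts · RIASEC prompts · Guide prompts
15–17	Docker · middleware · configuration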
