Enhancing Retrieval-Augmented Generation Systems for Educational Applications in Higher Education
Supervisor: Dr. Albert Ting Leung LEE
Utilizes five frontier LLMs (Claude Sonnet 4, Gemini 2.5 Pro, DeepSeek R1, ChatGPT o3, Grok 4) with Chain-of-Thought prompts to extract meta-knowledge and generate self-contained Q&A pairs.
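As a concrete illustration of this step, the sketch below shows how one model in the ensemble might be prompted; the client library, model name, prompt wording, and 'Q:'/'A:' output format are illustrative assumptions rather than the project's exact configuration.

```python
# Minimal sketch: Chain-of-Thought Q&A generation with one LLM in the ensemble.
# Model name, prompt wording, and the 'Q:'/'A:' output format are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

COT_PROMPT = """You are building a course knowledge base.
Step 1: List the key concepts (meta-knowledge) contained in the excerpt below.
Step 2: For each concept, write one self-contained question that can be answered
        without seeing the excerpt, followed by its complete answer.
Format every pair as 'Q: ...' and 'A: ...' on separate lines.

Excerpt:
{excerpt}
"""

def generate_qa_pairs(excerpt: str, model: str = "gpt-4o") -> str:
    """Ask one LLM to extract meta-knowledge and emit self-contained Q&A pairs."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": COT_PROMPT.format(excerpt=excerpt)}],
        temperature=0.3,
    )
    return response.choices[0].message.content
```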
Designs a comprehensive assessment system encompassing text quality metrics (readability, coherence, technical term density) and retrieval accuracy metrics (recall, precision). Through this evaluation, a high-quality Q&A knowledge base produced by knowledge distillation is selected as the foundation for subsequent integration.
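The retrieval-accuracy side of such an assessment can be computed directly once gold evidence chunks are known; the sketch below assumes chunk-ID lists and a course glossary as inputs, which are illustrative simplifications rather than the system's actual data model.

```python
# Sketch of two assessment metrics: retrieval recall/precision over chunk IDs,
# and technical term density as a simple text-quality proxy.
# The input representations (ID lists, glossary set) are assumptions.
def retrieval_metrics(retrieved: list[str], relevant: list[str]) -> dict[str, float]:
    """Recall and precision of retrieved chunks against gold evidence chunks."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    return {
        "recall": hits / len(relevant_set) if relevant_set else 0.0,
        "precision": hits / len(retrieved_set) if retrieved_set else 0.0,
    }

def term_density(text: str, glossary: set[str]) -> float:
    """Fraction of tokens that are course-specific technical terms."""
    tokens = text.lower().split()
    return sum(token in glossary for token in tokens) / max(len(tokens), 1)
```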
Employs semantic similarity comparison and a voting mechanism to filter and integrate complementary Q&A pairs from different models. High-quality Q&A pairs selected by the voting mechanism must have a semantic similarity of less than 10% with the existing Q&A knowledge base, ensuring diversity and preventing redundancy.
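A minimal version of the redundancy filter is sketched below; it covers only the similarity check, not the voting step, and the embedding model and data layout are assumptions rather than the project's actual choices.

```python
# Sketch of the redundancy filter: a candidate Q&A pair passes only if its
# maximum embedding similarity to the existing knowledge base is below 10%.
# The embedding model is an assumption; the voting step is not shown here.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def is_novel(candidate: str, knowledge_base: list[str], threshold: float = 0.10) -> bool:
    """Return True if the candidate is dissimilar enough to every existing pair."""
    if not knowledge_base:
        return True
    cand_emb = encoder.encode(candidate, convert_to_tensor=True)
    kb_embs = encoder.encode(knowledge_base, convert_to_tensor=True)
    return util.cos_sim(cand_emb, kb_embs).max().item() < threshold
```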
Applies rule-based segmentation strategies to the generated structured Q&A pairs, replacing traditional automatic segmentation methods, to achieve optimal knowledge base construction.
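The sketch below illustrates one such rule: cutting the generated text at Q&A boundaries so each pair becomes a single retrieval chunk; the 'Q:'/'A:' markers follow the format assumed in the generation sketch above.

```python
# Sketch of rule-based segmentation: split at 'Q:' markers so that each
# self-contained Q&A pair becomes exactly one retrieval chunk, instead of
# cutting at a fixed token length. The marker format is an assumption.
import re

def split_qa_pairs(generated_text: str) -> list[str]:
    """Split LLM output into chunks, one per Q&A pair."""
    parts = re.split(r"\n(?=Q:)", generated_text.strip())
    return [p.strip() for p in parts if p.strip().startswith("Q:")]
```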
Proposed a data-centric, model-agnostic knowledge base construction workflow that enhances knowledge quality through preprocessing stages, reducing dependence on expensive fine-tuning or ultra-long context inference.
Provided empirical evidence through large-scale metrics and concrete case studies that synthetic multi-LLM Q&A corpora significantly improve RAG accuracy and answer utility in higher education settings.
Established an integrated assessment framework with LLM-as-judge metrics, portable to other courses and domains, enabling continuous improvement of educational chatbots.
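As a rough illustration of what an LLM-as-judge metric in such a framework can look like, the sketch below asks a judging model to score an answer's faithfulness to the retrieved context; the judging prompt, model name, and 1-5 scale are assumptions, not the framework's exact rubric.

```python
# Sketch of a single LLM-as-judge metric: grade an answer's faithfulness to the
# retrieved context on a 1-5 scale. Prompt wording, model, and scale are
# illustrative assumptions.
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """Rate how faithful the ANSWER is to the CONTEXT on a scale of 1-5,
where 5 means every claim is supported by the context. Reply with the number only.

CONTEXT:
{context}

QUESTION:
{question}

ANSWER:
{answer}
"""

def judge_faithfulness(context: str, question: str, answer: str,
                       model: str = "gpt-4o") -> int:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            context=context, question=question, answer=answer)}],
        temperature=0.0,
    )
    return int(response.choices[0].message.content.strip())
```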
Demonstrated that rigorous knowledge engineering approaches offer a pragmatic and scalable path to trustworthy GenAI assistance in knowledge-intensive academic environments.