Environment setup

conda create -n nano_rag python=3.12 -y

conda activate nano_rag

# clone this repo first
git clone https://github.com/gusye1234/nano-graphrag.git
cd nano-graphrag
pip install -e .
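
To confirm that the editable install worked before moving on, a quick check from Python can help. This is a minimal sketch using only the standard library; the helper name `is_installed` is made up for illustration, and the import name `nano_graphrag` is taken from the imports used later in this post.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # `nano_graphrag` is the import name used by the example script in this post.
    print("nano_graphrag importable:", is_installed("nano_graphrag"))
```

If this prints False, the usual suspects are running pip outside the activated conda environment or running it from a directory other than the cloned repo.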

Configuration changes

  1. Prepare API keys for DeepSeek and ChatGLM.
    Add the keys to example\using_deepseek_api_as_llm+glm_api_as_embedding.py:
GLM_API_KEY = "your api key"
DEEPSEEK_API_KEY = "your api key"
  2. The ChatGLM embedding dimension defaults to 2048. Change it to 1024 by passing dimensions=1024 to the embedding call:
embedding = client.embeddings.create(input=texts, model=model_name, dimensions=1024)

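The call sits in the function below. The embedding_dim declared in the decorator is set to the same value, 1024: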

@wrap_embedding_func_with_attrs(embedding_dim=1024, max_token_size=8192)
async def GLM_embedding(texts: list[str]) -> np.ndarray:
    model_name = "embedding-3"
    client = OpenAI(
        api_key=GLM_API_KEY, base_url="https://open.bigmodel.cn/api/paas/v4/"
    )
    embedding = client.embeddings.create(input=texts, model=model_name, dimensions=1024)
    final_embedding = [d.embedding for d in embedding.data]
    return np.array(final_embedding)
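
The `dimensions` value in the API call and the `embedding_dim` declared in the decorator need to agree, and a mismatch is easier to diagnose if it is caught right after the embedding call. Below is a minimal sketch of such a guard. The stub `fake_embedding` and the helper `check_embedding_dim` are hypothetical names introduced here; the stub stands in for the real GLM request so the snippet runs offline, and plain lists stand in for NumPy arrays.

```python
import asyncio


async def fake_embedding(texts: list[str], dimensions: int = 1024) -> list[list[float]]:
    """Hypothetical stand-in for the GLM call: one zero vector per input text."""
    return [[0.0] * dimensions for _ in texts]


def check_embedding_dim(vectors: list[list[float]], expected_dim: int) -> None:
    """Raise ValueError if any vector's length differs from the declared dimension."""
    for i, vec in enumerate(vectors):
        if len(vec) != expected_dim:
            raise ValueError(
                f"vector {i} has {len(vec)} dimensions, expected {expected_dim}"
            )


async def main() -> None:
    vectors = await fake_embedding(["hello", "world"], dimensions=1024)
    check_embedding_dim(vectors, expected_dim=1024)  # matches, so nothing is raised
    try:
        check_embedding_dim(vectors, expected_dim=2048)  # mismatch is reported
    except ValueError as exc:
        print("mismatch detected:", exc)


if __name__ == "__main__":
    asyncio.run(main())
```

In the real script the same check could be applied to the array returned by GLM_embedding, comparing its second axis against 1024.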

Running the code

if __name__ == "__main__":
    insert()
    query()

Output:

The story is rich with themes that explore human nature, societal values, and the transformative power of reflection and compassion. Below are the top themes, synthesized from the analysts' reports:

### 1. **Redemption and Transformation**
The central theme of the story is **redemption**, exemplified through Ebenezer Scrooge's profound transformation. Initially portrayed as a miserly and isolated individual, Scrooge undergoes a dramatic change after supernatural encounters with the Ghost of Jacob Marley and the Ghosts of Christmas Past, Present, and Future. These encounters force him to confront his past mistakes, reflect on his present actions, and envision the bleak future that awaits him if he does not change. This journey underscores the possibility of personal growth and the importance of self-reflection and empathy.

### 2. **Human Connection and Family**
The theme of **human connection** is deeply intertwined with the story's message. Scrooge's relationships with his nephew Fred, his clerk Bob Cratchit, and the Cratchit family, particularly Tiny Tim, evolve significantly. The Cratchit family's Christmas dinner and Fezziwig's Christmas ball highlight the joy and fulfillment that come from togetherness, support, and shared traditions. These gatherings emphasize the value of family and community, contrasting sharply with Scrooge's initial isolation.

### 3. **Generosity and Social Responsibility**
**Generosity** and the spirit of giving are recurring themes, particularly through Scrooge's transformation. His acts of kindness toward the Cratchit family and his newfound concern for Tiny Tim illustrate the positive impact of generosity on individuals and the broader community. The story also critiques societal inequalities, as seen in the contrast between Scrooge's wealth and the Cratchit family's poverty. Scrooge's eventual recognition of his social responsibility underscores the importance of compassion and charity, especially during the Christmas season.

### 4. **Resilience and Hope**
The theme of **resilience** is evident in the portrayal of characters like the Cratchit family and the lighthouse keepers, who maintain hope and joy despite adversity. The lighthouse, in particular, symbolizes hope and resilience against harsh conditions, while the Cratchit family's ability to celebrate Christmas despite their struggles highlights the enduring strength of the human spirit. This theme is further reinforced by the miners on the moor and the sailors on the ship, who find camaraderie and joy even in challenging circumstances.

### 5. **The Supernatural and Reflection**
The **supernatural** plays a pivotal role in the narrative, serving as a catalyst for Scrooge's transformation. The ghosts and spirits guide Scrooge through his past, present, and future, prompting deep introspection and forcing him to confront the consequences of his actions. These supernatural elements add a layer of intrigue to the story while emphasizing the importance of reflection and the potential for change.

### 6. **The Transformative Power of Christmas**
The **Christmas season** serves as a backdrop for Scrooge's journey, symbolizing a time of joy, reflection, and renewal. The festive atmosphere, with its traditions, decorations, and communal celebrations, underscores the potential for redemption and the importance of embracing the spirit of the holiday. This theme is evident in the bustling activity of places like The Poulterers' Shops and The Fruiterers' Shops, which contribute to the story's vibrant Christmas setting.

### Conclusion
These themes collectively highlight the story's exploration of human nature, societal values, and the transformative power of compassion and reflection. Through Scrooge's journey, the narrative emphasizes the importance of redemption, human connection, generosity, and resilience, all set against the backdrop of the Christmas season. These themes resonate deeply, offering timeless lessons on the potential for personal growth and the enduring strength of community and family bonds.

Code explanation:

1. Full code with annotations

import os
import logging
import numpy as np
from openai import AsyncOpenAI, OpenAI  # async and sync OpenAI API clients
from dataclasses import dataclass  # used to define the EmbeddingFunc data class
from nano_graphrag import GraphRAG, QueryParam  # GraphRAG entry point and query parameters
from nano_graphrag.base import BaseKVStorage  # base class of the key-value store used as the LLM cache
from nano_graphrag._utils import compute_args_hash  # helper that hashes the call arguments

# Root logger at WARNING quiets other libraries; the nano-graphrag logger stays at INFO so its progress messages remain visible
logging.basicConfig(level=logging.WARNING)
logging.getLogger("nano-graphrag").setLevel(logging.INFO)

# API keys (prefer environment variables over hardcoding them in the source)
GLM_API_KEY = "your api key"
DEEPSEEK_API_KEY = "your api key"

MODEL = "deepseek-chat"  # the LLM to use

# Async function that calls DeepSeek and uses the cache when one is provided
async def deepseepk_model_if_cache(
    prompt, system_prompt=None, history_messages=[], **kwargs
) -> str:
    openai_async_client = AsyncOpenAI(
        api_key=DEEPSEEK_API_KEY, base_url="https://api.deepseek.com"
    )
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})

    # Build the full message list, then check whether the cache already holds a result for it
    hashing_kv: BaseKVStorage = kwargs.pop("hashing_kv", None)
    messages.extend(history_messages)
    messages.append({"role": "user", "content": prompt})
    if hashing_kv is not None:
        args_hash = compute_args_hash(MODEL, messages)
        if_cache_return = await hashing_kv.get_by_id(args_hash)
        if if_cache_return is not None:
            return if_cache_return["return"]

    # Cache miss: request a completion from the API
    response = await openai_async_client.chat.completions.create(
        model=MODEL, messages=messages, **kwargs
    )

    # Store the result in the cache for future calls
    if hashing_kv is not None:
        await hashing_kv.upsert(
            {args_hash: {"return": response.choices[0].message.content, "model": MODEL}}
        )
    return response.choices[0].message.content

# Helper: delete the file if it exists
def remove_if_exist(file):
    if os.path.exists(file):
        os.remove(file)

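# Data class that bundles an embedding function with its attributes
@dataclass
class EmbeddingFunc:
    embedding_dim: int  # dimension of the embedding vectors
    max_token_size: int  # maximum number of tokens per input
    func: callable  # the function that computes the embeddings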

    async def __call__(self, *args, **kwargs) -> np.ndarray:
        return await self.func(*args, **kwargs)

# Decorator that wraps an embedding function and attaches the attributes above
def wrap_embedding_func_with_attrs(**kwargs):
    def final_decro(func) -> EmbeddingFunc:
        new_func = EmbeddingFunc(**kwargs, func=func)
        return new_func
    return final_decro

# Text embedding function backed by the GLM API
@wrap_embedding_func_with_attrs(embedding_dim=1024, max_token_size=8192)
async def GLM_embedding(texts: list[str]) -> np.ndarray:
    model_name = "embedding-3"
    client = OpenAI(
        api_key=GLM_API_KEY, base_url="https://open.bigmodel.cn/api/paas/v4/"
    )
    embedding = client.embeddings.create(input=texts, model=model_name, dimensions=1024)
    final_embedding = [d.embedding for d in embedding.data]
    return np.array(final_embedding)

# Working directory for GraphRAG
WORKING_DIR = "./nano_graphrag_cache_deepseek_TEST"

# Query function: runs a query through GraphRAG
def query():
    rag = GraphRAG(
        working_dir=WORKING_DIR,
        best_model_func=deepseepk_model_if_cache,
        cheap_model_func=deepseepk_model_if_cache,
        embedding_func=GLM_embedding,
    )
    print(
        rag.query(
            "What are the top themes in this story?", param=QueryParam(mode="global")
        )
    )

# Insert function: indexes new text into GraphRAG
def insert():
    from time import time

    # Read the text to be indexed
    with open("./tests/mock_data.txt", encoding="utf-8-sig") as f:
        FAKE_TEXT = f.read()

    # Delete the previously generated index files so each insert starts from scratch
    remove_if_exist(f"{WORKING_DIR}/vdb_entities.json")
    remove_if_exist(f"{WORKING_DIR}/kv_store_full_docs.json")
    remove_if_exist(f"{WORKING_DIR}/kv_store_text_chunks.json")
    remove_if_exist(f"{WORKING_DIR}/kv_store_community_reports.json")
    remove_if_exist(f"{WORKING_DIR}/graph_chunk_entity_relation.graphml")

    rag = GraphRAG(
        working_dir=WORKING_DIR,
        enable_llm_cache=True,  # enable the LLM response cache
        best_model_func=deepseepk_model_if_cache,
        cheap_model_func=deepseepk_model_if_cache,
        embedding_func=GLM_embedding,
    )
    start = time()
    rag.insert(FAKE_TEXT)
    print("indexing time:", time() - start)

# Entry point: index the text, then query it
if __name__ == "__main__":
    insert()
    query()
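
The comment next to the API keys in the listing recommends environment variables over hardcoded values. A minimal sketch of that approach follows. The helper name `load_api_key` is hypothetical, and the environment variable names simply mirror the constants in the script; they are a convention chosen for this sketch, not names that either API client looks up on its own.

```python
import os


def load_api_key(env_name: str) -> str:
    """Read an API key from the environment and fail early if it is missing."""
    value = os.environ.get(env_name, "").strip()
    if not value:
        raise RuntimeError(f"environment variable {env_name} is not set")
    return value


if __name__ == "__main__":
    # Report which of the two keys are available without printing their values.
    for name in ("GLM_API_KEY", "DEEPSEEK_API_KEY"):
        try:
            load_api_key(name)
            print(f"{name}: found")
        except RuntimeError as exc:
            print(f"{name}: {exc}")
```

In the script, the two constants could then be assigned with GLM_API_KEY = load_api_key("GLM_API_KEY") and likewise for DeepSeek, which keeps the keys out of version control.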

2. Execution flow

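  1. Logging configuration:
  • Sets the root logger to WARNING and the nano-graphrag logger to INFO, so the library's progress messages stay visible while other libraries are quieted.
  2. API configuration:
  • Defines GLM_API_KEY and DEEPSEEK_API_KEY for access to the GLM and DeepSeek APIs.
  • Selects deepseek-chat as the LLM.
  3. DeepSeek interaction function:
  • deepseepk_model_if_cache:
    • Checks whether a cached result exists for the same input.
    • Returns the cached result if there is one; otherwise calls the deepseek-chat model and stores the new result in the cache.
  4. Embedding:
  • GLM_embedding computes the embedding vectors for the input texts.
  • wrap_embedding_func_with_attrs wraps the embedding function and attaches embedding_dim and max_token_size.
  5. GraphRAG storage and retrieval:
  • insert(): reads mock_data.txt, deletes the previously generated index files, and indexes the text into GraphRAG.
  • query(): runs a global-mode query asking for the top themes in the text.
  6. Main program:
  • Runs insert() to index the data, then query() to query it.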

The core of this code is knowledge retrieval with GraphRAG: it combines the DeepSeek language model with GLM embeddings to index a text and then query it.
