9 Open-Source Libraries Every AI Developer Should Master


万能的大雄

Artificial Intelligence | 2024-10-17 09:49:27

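As I see it, AI is now everywhere, and everyone wants to build with it.

But it can be hard to know which tools to master in order to ship AI features in your application. That is why I have put together this list of useful repositories, where you can learn and master some of the magic of AI.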

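1. Composio: Build AI Automations 10x Faster

Tools and integrations are at the core of building AI agents. I have been building AI tools and agents recently, and tool accuracy was a constant problem until I came across Composio.

Link: https://dub.composio.dev/nv5Oz3n

Composio makes it easier to connect AI agents to popular applications such as GitHub, Slack, Jira, and Airtable, which lets you build complex automations.

It handles user authentication and authorization for these integrations on your behalf, so you can focus on building the AI application itself. Notably, Composio is SOC 2 certified.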

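Here is how to get started:

pip install composio-core

Add the GitHub integration:

composio add github

Because Composio manages authentication for the connected account, the script below can act on GitHub directly. It uses the GitHub integration to star a repository. The script also imports the OpenAI SDK and Composio's OpenAI plugin, which are installed separately (pip install openai composio-openai).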

from openai import OpenAI
from composio_openai import ComposioToolSet, Action

openai_client = OpenAI(api_key="YOUR_OPENAI_API_KEY")

# Initialise the Composio tool set
composio_toolset = ComposioToolSet(api_key="YOUR_COMPOSIO_API_KEY")

# Get the pre-configured GitHub action for starring a repository
actions = composio_toolset.get_actions(actions=[Action.GITHUB_ACTIVITY_STAR_REPO_FOR_AUTHENTICATED_USER])

my_task = "Star a repo ComposioHQ/composio on GitHub"

# Create a chat completion request so the model can decide on the action
response = openai_client.chat.completions.create(
    model="gpt-4-turbo",
    tools=actions,  # Passing the actions fetched earlier
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": my_task},
    ],
)

# Execute the tool call(s) the model chose, using the connected GitHub account
result = composio_toolset.handle_tool_calls(response)
print(result)

Run this Python script to have the agent carry out the instruction. Composio also works well with popular frameworks such as LangChain, LlamaIndex, and CrewAI.
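
The chat completion only tells you which tool the model wants to call and with which arguments. Something still has to execute that call. The sketch below shows the general dispatch pattern using only the standard library. It is not Composio's API: star_repo and the TOOLS registry are hypothetical local stand-ins, and the message is hand-written in the shape of an OpenAI chat-completions tool call, so nothing touches the network.

```python
import json

# Hypothetical local handler standing in for a real integration
# such as GitHub's "star a repository" action; it only reports what it would do.
def star_repo(owner: str, repo: str) -> dict:
    return {"starred": f"{owner}/{repo}"}

# Hypothetical registry mapping the tool names advertised to the model to handlers.
TOOLS = {"star_repo": star_repo}

def dispatch_tool_calls(message: dict) -> list[dict]:
    """Execute every tool call found in an OpenAI-style assistant message."""
    results = []
    for call in message.get("tool_calls", []):
        fn = call["function"]
        handler = TOOLS[fn["name"]]
        # The model returns arguments as a JSON-encoded string, not a dict.
        args = json.loads(fn["arguments"])
        results.append({"tool_call_id": call["id"], "output": handler(**args)})
    return results

# A hand-written message shaped like a chat-completions tool call.
message = {
    "role": "assistant",
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {
            "name": "star_repo",
            "arguments": '{"owner": "ComposioHQ", "repo": "composio"}',
        },
    }],
}
print(dispatch_tool_calls(message))
```

A managed toolset such as Composio's performs this lookup, parse, and execute loop for you, and also supplies the connected account's credentials for each call.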

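2. Unsloth: Train and Fine-Tune AI Models Faster

Training and fine-tuning large language models (LLMs) is a key part of AI engineering.

In many cases, proprietary models may not meet your needs because of cost, customization, or privacy. At some point, you will need to fine-tune a model on a custom dataset. Unsloth is currently one of the best libraries for fine-tuning and training LLMs.

Link: https://unsloth.ai/

It supports full fine-tuning, LoRA, and QLoRA for popular LLMs, including Llama-3 and Mistral, and for their derivatives such as Yi and OpenHermes. It uses custom Triton kernels and a manual backpropagation engine to speed up training.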
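
LoRA is what makes this kind of fine-tuning affordable. It freezes a d x k weight matrix W and trains two small matrices, B (d x r) and A (r x k), then uses W + (alpha / r) * B A in place of W. The quick count below compares trainable parameters for a single projection. The 4096 x 4096 shape is an illustrative assumption (roughly the size of a Llama-3 8B attention query projection), and r = 16 matches the rank used in the script below.

```python
def full_params(d: int, k: int) -> int:
    """Parameters updated when fully fine-tuning one d x k weight matrix."""
    return d * k

def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Trainable parameters LoRA adds for the same matrix: B is d x r, A is r x k."""
    return d * r + r * k

d, k, r = 4096, 4096, 16  # assumed illustrative shape and LoRA rank
full = full_params(d, k)
lora = lora_trainable_params(d, k, r)
print(full, lora, f"{lora / full:.2%}")
```

Per matrix, LoRA trains under 1% of the weights. QLoRA pushes memory use further down by also storing the frozen base weights in 4-bit precision.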

To get started, install Unsloth with pip and make sure you have torch 2.4 and CUDA 12.1:

pip install --upgrade pip
pip install "unsloth[cu121-torch240] @ git+https://github.com/unslothai/unsloth.git"

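Here is a simple script that uses SFT (supervised fine-tuning) to train a 4-bit Llama-3 8B model on a dataset. To train Mistral instead, set model_name to the Mistral checkpoint listed in fourbit_models: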

from unsloth import FastLanguageModel
from unsloth import is_bfloat16_supported
import torch
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

max_seq_length = 2048  # Supports RoPE Scaling internally, so choose any!

# Get LAION dataset
url = "https://huggingface.co/datasets/laion/OIG/resolve/main/unified_chip2.jsonl"
dataset = load_dataset("json", data_files = {"train" : url}, split = "train")

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/mistral-7b-v0.3-bnb-4bit",  # New Mistral v3 2x faster!
]  # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = None,
    load_in_4bit = True,
)

# Do model patching and add fast LoRA weights
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0,  # Supports any, but = 0 is optimized
    bias = "none",     # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth",  # True or "unsloth" for very long context
    random_state = 3407,
    max_seq_length = max_seq_length,
    use_rslora = False,   # We support rank stabilized LoRA
    loftq_config = None,  # And LoftQ
)

trainer = SFTTrainer(
    model = model,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    tokenizer = tokenizer,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 10,
        max_steps = 60,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        output_dir = "outputs",
        optim = "adamw_8bit",
        seed = 3407,
    ),
)
trainer.train()
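
The training arguments combine per_device_train_batch_size = 2 with gradient_accumulation_steps = 4. Gradients from four micro-batches are accumulated before each optimizer step, so each step reflects 2 x 4 examples per device. The arithmetic below assumes a single GPU (num_devices = 1 is an assumption). It shows the effective batch size and how many examples the 60-step demo run processes.

```python
def effective_batch_size(per_device: int, accumulation_steps: int, num_devices: int = 1) -> int:
    """Number of examples contributing to each optimizer step."""
    return per_device * accumulation_steps * num_devices

batch = effective_batch_size(per_device=2, accumulation_steps=4)  # single GPU assumed
max_steps = 60
examples_seen = batch * max_steps
print(batch, examples_seen)
```

With max_steps = 60, training stops long before a full pass over the dataset. Treat it as a smoke test, and raise max_steps (or set an epoch count instead) for a real run.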

For more information, see the official documentation (https://docs.unsloth.ai/).

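3. DSPy: A Framework for Programming LLMs

One factor holding LLMs back in production is their stochasticity. In those use cases, trying to coax the desired output out of a model through prompting alone fails too often.

DSPy addresses this problem. Instead of hand-writing prompts, you program the LLM, which makes its behavior more reliable.

Link: https://github.com/stanfordnlp/dspy

DSPy simplifies this process by doing two key things:

Separating program flow from parameters: it separates the flow of your program (the steps you take) from the details of how each step is executed (the LM prompts and weights). This makes the system easier to manage and update.
Introducing new optimizers: DSPy uses algorithms that automatically tune the LM prompts and weights toward your objective, such as improving accuracy or reducing errors.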
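
The two ideas above can be illustrated with a toy, standard-library-only sketch. It does not use DSPy's API. The program's flow is a fixed function, its "parameters" are a dict holding the instruction text, and an "optimizer" keeps whichever candidate instruction scores best on a small training set. fake_lm, the candidate instructions, and the exact-match metric are all hypothetical stand-ins for a real LM and evaluation.

```python
from typing import Callable

def fake_lm(instruction: str, question: str) -> str:
    """Hypothetical stand-in for an LM call: it answers tersely only when told to."""
    answer = {"2+2": "4", "3+5": "8"}[question]
    return answer if "only the number" in instruction else f"The answer is {answer}."

# Program flow: fixed steps. Parameters: the instruction each step uses.
def program(params: dict, question: str) -> str:
    return fake_lm(params["answer_instruction"], question)

def exact_match(prediction: str, gold: str) -> bool:
    return prediction.strip() == gold

def optimize(candidates: list[str], trainset: list[tuple[str, str]],
             metric: Callable[[str, str], bool]) -> dict:
    """Score each candidate instruction on the trainset and keep the best one."""
    def score(instruction: str) -> int:
        params = {"answer_instruction": instruction}
        return sum(metric(program(params, q), gold) for q, gold in trainset)
    return {"answer_instruction": max(candidates, key=score)}

trainset = [("2+2", "4"), ("3+5", "8")]
candidates = ["Answer the question.", "Reply with only the number."]
params = optimize(candidates, trainset, exact_match)
print(params["answer_instruction"])
```

Because program never hard-codes its prompt text, a better instruction can be swapped in without touching the flow. DSPy's optimizers extend the same idea to demonstrations and weights, using much smarter search.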

Check out the quick-start notebook to learn more about using DSPy:

Link: https://github.com/stanfordnlp/dspy/blob/main/intro.ipynb

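4. Taipy: Build AI Web Apps Faster with Python

Taipy is open-source, Python-based software for building AI web apps for production. It aims to go beyond Streamlit and Gradio by letting Python developers take demo apps into production.

Link: https://taipy.io/

Taipy is designed for data scientists and machine learning engineers who build data and AI web applications:

Build production-ready web applications.
No new language to learn: Python is all you need.
Focus on your data and AI algorithms without the complexity of development and deployment.

To get started quickly, install it with pip: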

pip install taipy

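The simple Taipy application below shows how to build a basic movie recommendation system. It loads its scenario configuration, including the scenario's data nodes and task, from a config.toml file created with Taipy Studio, which is not shown here.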

import taipy as tp
import pandas as pd
from taipy import Config, Scope, Gui

# Defining the helper functions

# Callback definition - submits scenario with genre selection
def on_genre_selected(state):
    scenario.selected_genre_node.write(state.selected_genre)
    tp.submit(scenario)
    state.df = scenario.filtered_data.read()

# Set initial value to Action
def on_init(state):
    on_genre_selected(state)

# Filtering function - task
def filter_genre(initial_dataset: pd.DataFrame, selected_genre):
    filtered_dataset = initial_dataset[initial_dataset["genres"].str.contains(selected_genre)]
    filtered_data = filtered_dataset.nlargest(7, "Popularity %")
    return filtered_data

# The main script
if __name__ == "__main__":
    # Taipy Scenario & Data Management

    # Load the configuration made with Taipy Studio
    Config.load("config.toml")
    scenario_cfg = Config.scenarios["scenario"]

    # Start Taipy Core service
    tp.Core().run()

    # Create a scenario
    scenario = tp.create_scenario(scenario_cfg)

    # Taipy User Interface
    # Let's add a GUI to our Scenario Management for a complete application

    # Get list of genres (note the comma after "IMAX"; without it Python
    # silently concatenates "IMAX" and "Romance" into one string)
    genres = [
        "Action", "Adventure", "Animation", "Children", "Comedy", "Fantasy", "IMAX",
        "Romance", "Sci-FI", "Western", "Crime", "Mystery", "Drama", "Horror", "Thriller", "Film-Noir",
        "War", "Musical", "Documentary"
    ]

    # Initialization of variables
    df = pd.DataFrame(columns=["Title", "Popularity %"])
    selected_genre = "Action"

    # User interface definition
    my_page = """
# Film recommendation

## Choose your favorite genre
<|{selected_genre}|selector|lov={genres}|on_change=on_genre_selected|dropdown|>

## Here are the top seven picks by popularity
<|{df}|chart|x=Title|y=Popularity %|type=bar|title=Film Popularity|>
    """

    Gui(page=my_page).run()
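
The filter_genre task is an ordinary Python function, and that is what lets Taipy wire it into a scenario. Its logic keeps the rows whose genres string contains the selection, then takes the seven most popular. You can check that logic without pandas or Taipy. The sketch below reimplements it over a hypothetical in-memory list of movie dicts, using heapq.nlargest in place of DataFrame.nlargest. The sample rows and their genre format are made up for illustration.

```python
import heapq

def filter_genre(movies: list[dict], selected_genre: str, top_n: int = 7) -> list[dict]:
    """Keep movies whose genres string contains selected_genre, then take the most popular."""
    matching = [m for m in movies if selected_genre in m["genres"]]
    return heapq.nlargest(top_n, matching, key=lambda m: m["Popularity %"])

# Hypothetical sample rows shaped like the dataset the Taipy app reads.
movies = [
    {"Title": "A", "genres": "Action|Thriller", "Popularity %": 71.0},
    {"Title": "B", "genres": "Comedy", "Popularity %": 90.0},
    {"Title": "C", "genres": "Action", "Popularity %": 85.5},
]
print([m["Title"] for m in filter_genre(movies, "Action")])
```

Like pandas' str.contains with default arguments, this is a case-sensitive substring match. The spelling of each entry in the genres selector therefore has to match the dataset exactly.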

You can also read its documentation to learn more:

https://docs.taipy.io/en/latest/getting_started/

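5. Phidata: Build LLM Agents with Memory

Building effective agents is often harder than it sounds. Managing memory, caching, and tool execution can quickly become challenging.

Link: https://www.phidata.com/

Phidata is an open-source framework that offers a convenient, reliable way to build agents with long-term memory, contextual knowledge, and the ability to take actions through function calling.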
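
To show what "memory" means mechanically, here is a framework-agnostic sketch. It is not Phidata's API. The agent stores every turn, and on each call it sends the model only the most recent turns that fit in a fixed window. reply_fn and echo_model are hypothetical stand-ins for an LLM call, and the window of 4 messages is an arbitrary assumption.

```python
from typing import Callable

class ChatMemory:
    """Keeps the full history but exposes only the last `window` messages to the model."""

    def __init__(self, window: int = 4):
        self.window = window
        self.history: list[dict] = []

    def add(self, role: str, content: str) -> None:
        self.history.append({"role": role, "content": content})

    def context(self) -> list[dict]:
        return self.history[-self.window:]

def chat(memory: ChatMemory, user_msg: str, reply_fn: Callable[[list[dict]], str]) -> str:
    memory.add("user", user_msg)
    reply = reply_fn(memory.context())  # the model sees recent turns, not just this one
    memory.add("assistant", reply)
    return reply

def echo_model(messages: list[dict]) -> str:
    """Hypothetical model that reports how much context it received."""
    return f"saw {len(messages)} messages"

mem = ChatMemory(window=4)
for question in ["hi", "what is NVDA?", "and AMD?"]:
    print(chat(mem, question, echo_model))
```

Real frameworks add persistence and smarter selection (for example, summarizing older turns) on top of this. The sketch only shows the windowing step.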

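Install Phidata with pip to get started:

pip install -U phidata

Let's create a simple assistant that can query financial data. This example also needs the openai and yfinance packages and an OpenAI API key.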

from phi.assistant import Assistant
from phi.llm.openai import OpenAIChat
from phi.tools.yfinance import YFinanceTools

assistant = Assistant(
    llm=OpenAIChat(model="gpt-4o"),
    tools=[YFinanceTools(stock_price=True, analyst_recommendations=True, company_info=True, company_news=True)],
    show_tool_calls=True,
    markdown=True,
)
assistant.print_response("What is the stock price of NVDA")
assistant.print_response("Write a comparison between NVDA and AMD, use all tools available.")

Next, an assistant that can search the web (the DuckDuckGo tool relies on the duckduckgo-search package):

from phi.assistant import Assistant
from phi.tools.duckduckgo import DuckDuckGo

assistant = Assistant(tools=[DuckDuckGo()], show_tool_calls=True)
assistant.print_response("Whats happening in France?", markdown=True)

See the official documentation for more examples and information:

https://docs.phidata.com/introduction

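6. Phoenix: More Efficient LLM Observability

An AI application is not complete until it has an observability layer. LLM applications typically have many moving parts, such as prompts, model temperature, and top-p, and even a small change to any of them can significantly affect the results.

These factors can make an application highly unstable and unreliable. This is where LLM observability comes in.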
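
To see how much a single knob can change generation, the sketch below applies temperature to a softmax over three made-up logits. The logit values are hypothetical and do not come from any model. Dividing the logits by a lower temperature sharpens the distribution toward the top token, and a higher temperature flattens it, so the same prompt can behave quite differently.

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to sampling probabilities; logits are divided by the temperature first."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (0.5, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```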

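Phoenix, from Arize AI, makes it easy to trace the full execution path of an LLM application.

Link: https://phoenix.arize.com/

Phoenix is an open-source AI observability platform designed for experimentation, evaluation, and troubleshooting. It provides:

Tracing - trace the runtime of LLM applications with OpenTelemetry-based instrumentation.
Evaluation - use LLMs to benchmark application performance with response and retrieval evals.
Datasets - create versioned datasets of examples for experimentation, evaluation, and fine-tuning.
Experiments - track and evaluate changes to prompts, LLMs, and retrieval.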

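Phoenix is vendor- and language-agnostic. It supports frameworks such as LlamaIndex, LangChain, and DSPy, as well as LLM providers such as OpenAI and Bedrock.

It runs in a variety of environments, including Jupyter notebooks, local machines, containers, and the cloud.

Getting started with Phoenix is simple. Install it as follows: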

pip install arize-phoenix

First, launch the Phoenix app:

import phoenix as px
session = px.launch_app()

This starts the Phoenix server. You can now set up tracing for your AI application and debug it as traces stream in.

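To use the one-click LlamaIndex integration, first install LlamaIndex:

pip install 'llama-index>=0.10.44'

The example also imports the OpenInference instrumentor for LlamaIndex and gcsfs, which are separate packages (openinference-instrumentation-llama-index and gcsfs). Example code: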
import phoenix as px
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor
import os
from gcsfs import GCSFileSystem
from llama_index.core import (
    Settings,
    VectorStoreIndex,
    StorageContext,
    set_global_handler,
    load_index_from_storage
)
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
import llama_index

# To view traces in Phoenix, you will first have to start a Phoenix server. You can do this by running the following:
session = px.launch_app()

# Initialize LlamaIndex auto-instrumentation
LlamaIndexInstrumentor().instrument()

os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"  # replace with your own key

# LlamaIndex application initialization may vary
# depending on your application
Settings.llm = OpenAI(model="gpt-4-turbo-preview")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")

# Load your data and create an index. Here we've provided an example of our documentation
file_system = GCSFileSystem(project="public-assets-275721")
index_path = "arize-phoenix-assets/datasets/unstructured/llm/llama-index/arize-docs/index/"
storage_context = StorageContext.from_defaults(
    fs=file_system,
    persist_dir=index_path,
)

index = load_index_from_storage(storage_context)

query_engine = index.as_query_engine()

# Query your LlamaIndex application
query_engine.query("What is the meaning of life?")
query_engine.query("How can I deploy Arize?")

# View the traces in the Phoenix UI (print the URL when running as a script)
print(px.active_session().url)

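Once you have run enough queries (or chats) through your application, refresh the Phoenix UI in your browser to see the traces in detail.

See the developer documentation for more code examples covering tracing, dataset versioning, and evaluation.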

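7. Airbyte: Reliable and Scalable Data Pipelines

Data is essential to building AI applications, especially in production, where you have to manage large volumes of data from many sources. This is where Airbyte shines.

Airbyte offers an extensive catalog of more than 300 connectors for APIs, databases, data warehouses, and data lakes.

Link: https://airbyte.com/

Airbyte also has a Python library called PyAirbyte. It works with popular AI frameworks such as LangChain and LlamaIndex, which makes it easy to move data from multiple sources into GenAI applications.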
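
The last hop of such a pipeline is usually turning synced records into text documents with metadata, which a framework like LangChain or LlamaIndex can then embed. The standard-library sketch below shows only that step. The stream names, fields, and the Document class are hypothetical stand-ins for whatever your connectors and framework actually provide.

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    """Hypothetical minimal document: text to embed plus metadata for filtering."""
    text: str
    metadata: dict = field(default_factory=dict)

def records_to_documents(stream: str, records: list[dict], text_fields: list[str]) -> list[Document]:
    """Render the chosen fields as text, keep the rest as metadata, and tag the source stream."""
    docs = []
    for rec in records:
        text = "\n".join(f"{name}: {rec[name]}" for name in text_fields if name in rec)
        meta = {k: v for k, v in rec.items() if k not in text_fields}
        meta["stream"] = stream
        docs.append(Document(text=text, metadata=meta))
    return docs

# Hypothetical records, as two different connectors might emit them.
issues = [{"id": 7, "title": "Login fails", "body": "500 on /auth"}]
tickets = [{"id": 3, "subject": "Refund", "status": "open"}]

docs = (records_to_documents("github_issues", issues, ["title", "body"])
        + records_to_documents("support_tickets", tickets, ["subject"]))
for d in docs:
    print(d.metadata["stream"], "|", d.text.replace("\n", " / "))
```

Keeping the source stream in the metadata lets a retriever filter or cite by origin once data from many connectors sits in the same index.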

For more information, see its documentation.

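8. AgentOps: Agent Monitoring and Observability

Like traditional software systems, AI agents need continuous monitoring and observation. This is essential for making sure an agent's behavior does not drift from what is expected.

AgentOps provides a comprehensive solution for monitoring and observing AI agents.

It offers session replay analytics, LLM cost management, agent benchmarking, and compliance and security tools, and it integrates natively with frameworks such as CrewAI, AutoGen, and LangChain.

Link: https://www.agentops.ai/

As before, install AgentOps with pip:

pip install agentops

Initialize the AgentOps client to automatically get analytics for every LLM call: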

import agentops

# Beginning of program's code (e.g. main.py, __init__.py)
agentops.init(api_key="YOUR_AGENTOPS_API_KEY")

# ... your program's code ...

# (optional: record specific functions)
@agentops.record_action('sample function being recorded')
def sample_function(*args, **kwargs):
    ...

# End of program
agentops.end_session('Success')
# Woohoo You're done 🎉