Why Weaviate?

GraphQL API (flexible queries)
Built-in vectorization modules
Hybrid search (vector + BM25)
Generative search (built-in RAG)
Open source, with a managed cloud offering

Docker Setup

DEVELOPERbash
docker run -p 8080:8080 -p 50051:50051 semitechnologies/weaviate:latest

Or with docker-compose:

DEVELOPERyaml
version: '3.8'
services:
  weaviate:
    image: semitechnologies/weaviate:1.24.6
    ports:
      - "8080:8080"
      - "50051:50051"
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'text2vec-openai'
      ENABLE_MODULES: 'text2vec-openai,generative-openai'
      OPENAI_APIKEY: ${OPENAI_API_KEY}
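
Once the container starts, it can take a few seconds before Weaviate accepts requests. The sketch below polls the readiness endpoint (`/v1/.well-known/ready`) before any client code runs. The function names, the retry counts, and the injectable `check` parameter are illustrative assumptions, not part of the original setup.

```python
import time
import urllib.error
import urllib.request


def is_ready(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if the readiness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(
            f"{base_url}/v1/.well-known/ready", timeout=timeout
        ) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet
        return False


def wait_until_ready(base_url: str, attempts: int = 30, delay: float = 1.0,
                     check=is_ready) -> bool:
    """Poll `check` until it succeeds or `attempts` is exhausted."""
    for _ in range(attempts):
        if check(base_url):
            return True
        time.sleep(delay)
    return False
```

Calling `wait_until_ready("http://localhost:8080")` before creating the client avoids connection errors during container startup.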

Python Client

DEVELOPERpython
import weaviate

# Uses the v3 Python client API (weaviate.Client); the v4 client exposes a different interface
client = weaviate.Client("http://localhost:8080")

# Create schema
schema = {
    "class": "Document",
    "vectorizer": "text2vec-openai",
    "moduleConfig": {
        "text2vec-openai": {
            "model": "text-embedding-3-small"
        }
    },
    "properties": [
        {
            "name": "content",
            "dataType": ["text"],
            "moduleConfig": {
                "text2vec-openai": {
                    "skip": False,
                    "vectorizePropertyName": False
                }
            }
        },
        {
            "name": "title",
            "dataType": ["text"]
        },
        {
            "name": "category",
            "dataType": ["text"]
        }
    ]
}

client.schema.create_class(schema)

Inserting Documents

DEVELOPERpython
# Auto-vectorization
client.data_object.create(
    class_name="Document",
    data_object={
        "content": "Weaviate is a vector database...",
        "title": "Introduction to Weaviate",
        "category": "tutorial"
    }
)

# Batch import (faster)
# `documents` is assumed to be an iterable of dicts with 'text', 'title' and 'category' keys
with client.batch as batch:
    batch.batch_size = 100

    for doc in documents:
        batch.add_data_object(
            class_name="Document",
            data_object={
                "content": doc['text'],
                "title": doc['title'],
                "category": doc['category']
            }
        )
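
The batch loop above expects a `documents` iterable that the tutorial does not define. Here is a minimal sketch of one way to prepare it, assuming each raw document is a dict with `text`, `title` and `category` keys. The helper name `to_data_objects` and the sample data are hypothetical.

```python
from typing import Iterable, Iterator

REQUIRED_FIELDS = ("text", "title", "category")


def to_data_objects(documents: Iterable[dict]) -> Iterator[dict]:
    """Map raw documents to the property names of the Document class.

    Documents with a missing or empty required field are skipped so that
    they are never sent to the vectorizer.
    """
    for doc in documents:
        if any(not doc.get(field) for field in REQUIRED_FIELDS):
            continue
        yield {
            "content": doc["text"],
            "title": doc["title"],
            "category": doc["category"],
        }


# Hypothetical sample input; the second entry has an empty text and is dropped
documents = [
    {"text": "Weaviate is a vector database...",
     "title": "Introduction to Weaviate", "category": "tutorial"},
    {"text": "", "title": "Empty draft", "category": "tutorial"},
]
data_objects = list(to_data_objects(documents))
```

Each item of `data_objects` can then be passed as `data_object=` to `batch.add_data_object`.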

Semantic Search (GraphQL)

DEVELOPERpython
# nearText search
result = (
    client.query
    .get("Document", ["content", "title", "category"])
    .with_near_text({"concepts": ["vector database tutorial"]})
    .with_limit(5)
    .do()
)

print(result["data"]["Get"]["Document"])
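
The query builder above produces a GraphQL query behind the scenes. To make that query visible, the sketch below assembles a roughly equivalent string by hand. The helper `near_text_query` is hypothetical, and the exact text the client generates may differ in whitespace and argument order.

```python
import json


def near_text_query(class_name, properties, concepts, limit=5):
    """Build a GraphQL Get query with a nearText argument."""
    # json.dumps yields a double-quoted list, which is also valid GraphQL
    concepts_literal = json.dumps(list(concepts))
    fields = " ".join(properties)
    return (
        "{ Get { "
        f"{class_name}(nearText: {{concepts: {concepts_literal}}}, limit: {limit}) "
        f"{{ {fields} }} "
        "} }"
    )


query = near_text_query(
    "Document",
    ["content", "title", "category"],
    ["vector database tutorial"],
)
```

With the v3 client, a string like this can be sent with `client.query.raw(query)`.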

Hybrid Search

Combine vector search with keyword search:

DEVELOPERpython
result = (
    client.query
    .get("Document", ["content", "title"])
    .with_hybrid(
        query="machine learning models",
        alpha=0.5  # 0 = BM25 only, 1 = vector only, 0.5 = balanced
    )
    .with_limit(10)
    .do()
)
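
To build intuition for `alpha`, here is a simplified sketch of how a weighted blend of the two rankings behaves. It min-max normalizes each score set and mixes them linearly. This is an illustration under that assumption, not Weaviate's actual fusion algorithm, and the scores are made up.

```python
def normalize(scores):
    """Min-max normalize a {doc_id: score} mapping to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def blend(vector_scores, bm25_scores, alpha=0.5):
    """Rank documents by alpha * vector score + (1 - alpha) * BM25 score."""
    v = normalize(vector_scores) if vector_scores else {}
    k = normalize(bm25_scores) if bm25_scores else {}
    combined = {
        doc_id: alpha * v.get(doc_id, 0.0) + (1 - alpha) * k.get(doc_id, 0.0)
        for doc_id in set(v) | set(k)
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Made-up scores: "a" is the best vector match, "b" the best keyword match
vector_scores = {"a": 0.9, "b": 0.5, "c": 0.1}
bm25_scores = {"a": 1.0, "b": 4.0, "c": 2.5}

vector_only = blend(vector_scores, bm25_scores, alpha=1.0)
keyword_only = blend(vector_scores, bm25_scores, alpha=0.0)
```

With `alpha=1.0` the vector ranking wins and with `alpha=0.0` the BM25 ranking wins. Values in between trade one off against the other.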

Filtering

DEVELOPERpython
# Filter by category
result = (
    client.query
    .get("Document", ["content", "title"])
    .with_near_text({"concepts": ["python tutorial"]})
    .with_where({
        "path": ["category"],
        "operator": "Equal",
        "valueText": "programming"
    })
    .with_limit(5)
    .do()
)
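
Filters can also be combined. The sketch below builds the `where` dictionary for several conditions joined with the `And` operator. The helper names `equal_text` and `all_of`, and the `title` value, are hypothetical.

```python
def equal_text(prop, value):
    """Single condition: a text property equals the given value."""
    return {"path": [prop], "operator": "Equal", "valueText": value}


def all_of(*conditions):
    """Combine conditions so that every one of them must match."""
    conditions = list(conditions)
    if not conditions:
        raise ValueError("at least one condition is required")
    if len(conditions) == 1:
        return conditions[0]
    return {"operator": "And", "operands": conditions}


where_filter = all_of(
    equal_text("category", "programming"),
    equal_text("title", "Introduction to Weaviate"),
)
```

The resulting dictionary is passed to `.with_where(where_filter)` in place of the inline filter.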

Generative Search (Built-in RAG)

DEVELOPERpython
# Generate answer from retrieved documents
result = (
    client.query
    .get("Document", ["content", "title"])
    .with_near_text({"concepts": ["how to use embeddings"]})
    .with_generate(
        single_prompt="Summarize this document: {content}"
    )
    .with_limit(3)
    .do()
)

# Access generated text
for doc in result["data"]["Get"]["Document"]:
    print(doc["_additional"]["generate"]["singleResult"])

Multi-Tenancy

DEVELOPERpython
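from weaviate import Tenant

# Create tenants (the class must have been created with
# "multiTenancyConfig": {"enabled": True})
client.schema.add_class_tenants(
    class_name="Document",
    tenants=[
        Tenant(name="tenant_a"),
        Tenant(name="tenant_b")
    ]
)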

# Query specific tenant
result = (
    client.query
    .get("Document", ["content"])
    .with_tenant("tenant_a")
    .with_near_text({"concepts": ["query"]})
    .do()
)
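
Tenants can only be added to a class that was created with multi-tenancy enabled, which the schema shown earlier does not do. A minimal sketch of the missing step follows. The helper name and the trimmed-down class definition are assumptions for illustration.

```python
import copy


def enable_multi_tenancy(class_definition):
    """Return a copy of a class definition with multi-tenancy switched on."""
    updated = copy.deepcopy(class_definition)
    updated["multiTenancyConfig"] = {"enabled": True}
    return updated


# Trimmed-down version of the Document class used in this tutorial
base_class = {
    "class": "Document",
    "vectorizer": "text2vec-openai",
    "properties": [{"name": "content", "dataType": ["text"]}],
}
multi_tenant_class = enable_multi_tenancy(base_class)
```

The resulting definition is passed to `client.schema.create_class(multi_tenant_class)` before any tenant is added. Objects written to such a class must also name their tenant.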

Replication

DEVELOPERyaml
# docker-compose.yml for a multi-node cluster (two nodes shown)
services:
  weaviate-node1:
    image: semitechnologies/weaviate:latest
    environment:
      CLUSTER_HOSTNAME: 'node1'
      CLUSTER_GOSSIP_BIND_PORT: '7100'
      CLUSTER_DATA_BIND_PORT: '7101'

  weaviate-node2:
    image: semitechnologies/weaviate:latest
    environment:
      CLUSTER_HOSTNAME: 'node2'
      CLUSTER_JOIN: 'weaviate-node1:7100'

Python RAG Pipeline

DEVELOPERpython
def weaviate_rag(query):
    # Retrieve with generative search
    result = (
        client.query
        .get("Document", ["content", "title"])
        .with_near_text({"concepts": [query]})
        .with_generate(
            grouped_task=f"Answer this question: {query}",
            grouped_properties=["content"]
        )
        .with_limit(5)
        .do()
    )

    # Extract answer
    answer = result["data"]["Get"]["Document"][0]["_additional"]["generate"]["groupedResult"]

    return answer

# Usage
answer = weaviate_rag("What is machine learning?")
print(answer)
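
The function above indexes straight into the response, which raises an exception when the query returns no documents or when Weaviate reports an error. A more defensive extraction step could look like the sketch below. The helper name is hypothetical, and the response layout is assumed to match the one used above, with an optional top-level `errors` key and an optional `error` field next to `groupedResult`.

```python
def extract_grouped_answer(result, class_name="Document"):
    """Return the grouped generative answer, or None if nothing was retrieved."""
    if result.get("errors"):
        raise RuntimeError(f"Weaviate query failed: {result['errors']}")

    objects = ((result.get("data") or {}).get("Get") or {}).get(class_name) or []
    if not objects:
        return None

    # The grouped result is attached to the first returned object
    generate = (objects[0].get("_additional") or {}).get("generate") or {}
    if generate.get("error"):
        raise RuntimeError(f"Generation failed: {generate['error']}")
    return generate.get("groupedResult")


# Hand-written response for illustration only
sample = {"data": {"Get": {"Document": [
    {"content": "...", "_additional": {"generate": {"groupedResult": "An answer."}}}
]}}}
answer_text = extract_grouped_answer(sample)
empty_answer = extract_grouped_answer({"data": {"Get": {"Document": []}}})
```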

Weaviate's GraphQL interface and built-in RAG make it well suited to rapid prototyping.

Weaviate: A GraphQL-Powered Vector Database
