4. Storage (Intermediate)

Weaviate: A GraphQL-Powered Vector Database

November 16, 2025
12 min read
Ailog Research Team

Set up Weaviate for production RAG with GraphQL queries, hybrid search, and generative modules.

Why Weaviate?

  • GraphQL API (flexible queries)
  • Built-in vectorization modules
  • Hybrid search (vector + BM25)
  • Generative search (built-in RAG)
  • Open source, with a managed cloud offering

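Docker Setup

```bash
# Port 8080 serves the REST/GraphQL API, port 50051 serves gRPC.
# No vectorizer modules are enabled here; use the Compose file below for the examples.
docker run -p 8080:8080 -p 50051:50051 semitechnologies/weaviate:1.24.6
```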

Or with Docker Compose:

```yaml
version: '3.8'
services:
  weaviate:
    image: semitechnologies/weaviate:1.24.6
    ports:
      - "8080:8080"
      - "50051:50051"
    volumes:
      - weaviate_data:/var/lib/weaviate  # persist data across container restarts
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'  # local development only
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'text2vec-openai'
      ENABLE_MODULES: 'text2vec-openai,generative-openai'
      OPENAI_APIKEY: ${OPENAI_API_KEY}  # read from the host environment
volumes:
  weaviate_data:
```

Python Client

```python
# Uses the v3 Python client API (weaviate-client < 4); the v4 client
# replaces weaviate.Client with a different, collections-based API.
import weaviate

client = weaviate.Client("http://localhost:8080")

# Create the schema: a "Document" class vectorized with OpenAI embeddings
schema = {
    "class": "Document",
    "vectorizer": "text2vec-openai",
    "moduleConfig": {
        "text2vec-openai": {"model": "text-embedding-3-small"}
    },
    "properties": [
        {
            "name": "content",
            "dataType": ["text"],
            "moduleConfig": {
                "text2vec-openai": {
                    "skip": False,                  # include this property in the vector
                    "vectorizePropertyName": False  # embed the value only, not the name
                }
            },
        },
        {"name": "title", "dataType": ["text"]},
        {"name": "category", "dataType": ["text"]},
    ],
}

client.schema.create_class(schema)
```
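Because the schema is a plain dictionary, it can also be produced by a function, which makes optional settings explicit. The sketch below uses a hypothetical helper, `build_document_class`, that reproduces the class above and optionally adds the schema keys Weaviate uses for multi-tenancy (`multiTenancyConfig`) and replication (`replicationConfig`), both covered in later sections. It only builds the dict; nothing is sent to a server.

```python
from typing import Any, Dict, List, Optional


def build_document_class(
    model: str = "text-embedding-3-small",
    multi_tenant: bool = False,
    replication_factor: Optional[int] = None,
) -> Dict[str, Any]:
    """Return a class definition dict for the Document class (hypothetical helper)."""
    properties: List[Dict[str, Any]] = [
        {
            "name": "content",
            "dataType": ["text"],
            # Vectorize the content value but not the property name itself
            "moduleConfig": {
                "text2vec-openai": {"skip": False, "vectorizePropertyName": False}
            },
        },
        {"name": "title", "dataType": ["text"]},
        {"name": "category", "dataType": ["text"]},
    ]
    class_def: Dict[str, Any] = {
        "class": "Document",
        "vectorizer": "text2vec-openai",
        "moduleConfig": {"text2vec-openai": {"model": model}},
        "properties": properties,
    }
    if multi_tenant:
        # Multi-tenancy is declared when the class is created
        class_def["multiTenancyConfig"] = {"enabled": True}
    if replication_factor is not None:
        if replication_factor < 1:
            raise ValueError("replication_factor must be >= 1")
        # Number of copies of each shard kept across the cluster
        class_def["replicationConfig"] = {"factor": replication_factor}
    return class_def
```

With a connected v3 client, the result would be passed straight to `client.schema.create_class(build_document_class(multi_tenant=True))`.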

Inserting Documents

```python
# Single insert: the text2vec-openai module vectorizes the object on write
client.data_object.create(
    class_name="Document",
    data_object={
        "content": "Weaviate is a vector database...",
        "title": "Introduction to Weaviate",
        "category": "tutorial",
    },
)

# Batch import (faster: objects are sent in groups instead of one request each).
# `documents` is assumed to be a list of dicts with "text", "title" and "category" keys.
client.batch.configure(batch_size=100)
with client.batch as batch:
    for doc in documents:
        batch.add_data_object(
            class_name="Document",
            data_object={
                "content": doc["text"],
                "title": doc["title"],
                "category": doc["category"],
            },
        )
```
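Batching is faster mainly because it cuts the number of round trips: with a batch size of 100, 250 documents go out in roughly 3 requests rather than 250. The sketch below illustrates only that grouping step with two hypothetical helpers, `to_data_object` (the field mapping used above) and `chunked`; the real client handles batching internally and may flush differently.

```python
from typing import Any, Dict, Iterable, Iterator, List


def to_data_object(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Map a raw document (text/title/category keys) to Document properties."""
    return {"content": doc["text"], "title": doc["title"], "category": doc["category"]}


def chunked(items: Iterable[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield lists of at most `size` items, one list per simulated request."""
    if size < 1:
        raise ValueError("size must be >= 1")
    chunk: List[Dict[str, Any]] = []
    for item in items:
        chunk.append(item)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # final, partially filled batch


docs = [{"text": f"text {i}", "title": f"Title {i}", "category": "tutorial"} for i in range(250)]
batches = list(chunked((to_data_object(d) for d in docs), 100))
print([len(b) for b in batches])
```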

Semantic Search (GraphQL)

```python
# nearText search: the query text is vectorized by the same module,
# then the closest objects are returned
result = (
    client.query
    .get("Document", ["content", "title", "category"])
    .with_near_text({"concepts": ["vector database tutorial"]})
    .with_limit(5)
    .do()
)

print(result["data"]["Get"]["Document"])
```
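Under the hood, the query builder produces a GraphQL `Get` query that the client posts to the `/v1/graphql` endpoint (the v3 builder's `.build()` method returns that string instead of running it). The hypothetical `near_text_query` function below writes the equivalent query by hand to show its shape; it is an illustration, not the client's own formatter, so whitespace may differ.

```python
import json
from typing import List


def near_text_query(class_name: str, concepts: List[str], fields: List[str], limit: int) -> str:
    """Build a GraphQL Get query using the nearText operator (illustrative)."""
    # json.dumps quotes and escapes each concept as a string literal
    concept_list = ", ".join(json.dumps(c) for c in concepts)
    field_list = " ".join(fields)
    return (
        "{\n"
        "  Get {\n"
        f"    {class_name}(nearText: {{concepts: [{concept_list}]}}, limit: {limit}) {{\n"
        f"      {field_list}\n"
        "    }\n"
        "  }\n"
        "}"
    )


query = near_text_query("Document", ["vector database tutorial"], ["content", "title", "category"], 5)
print(query)
```

Reading the raw query is a quick way to check which fields and operators a builder chain actually sends.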

Hybrid Search

Combine vector search with keyword (BM25) search:

```python
result = (
    client.query
    .get("Document", ["content", "title"])
    .with_hybrid(
        query="machine learning models",
        alpha=0.5,  # 0 = pure BM25, 1 = pure vector, 0.5 = balanced
    )
    .with_limit(10)
    .do()
)
```
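The `alpha` parameter weights the two result lists before they are merged. Weaviate offers ranked fusion and relative score fusion; the sketch below is a simplified, assumed version of the latter, where each list's scores are min-max rescaled to [0, 1] and combined as `alpha * vector + (1 - alpha) * bm25`. The function names and scores are hypothetical, and Weaviate's exact normalization details may differ.

```python
from typing import Dict


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant score list maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def relative_score_fusion(
    bm25: Dict[str, float], vector: Dict[str, float], alpha: float
) -> Dict[str, float]:
    """Fuse keyword and vector scores: alpha weights vector, (1 - alpha) weights BM25."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    kw, vec = min_max(bm25), min_max(vector)
    ids = set(kw) | set(vec)
    # An object missing from one list contributes 0 for that side
    return {i: alpha * vec.get(i, 0.0) + (1 - alpha) * kw.get(i, 0.0) for i in ids}


bm25_scores = {"a": 10.0, "b": 5.0, "c": 0.0}
vector_scores = {"b": 0.9, "c": 0.8, "d": 0.7}
fused = relative_score_fusion(bm25_scores, vector_scores, alpha=0.5)
print(sorted(fused, key=fused.get, reverse=True))
```

Object `b` wins here because it ranks reasonably in both lists, which is the behavior hybrid search is meant to reward.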

Filtering

```python
# Filter by category
result = (
    client.query
    .get("Document", ["content", "title"])
    .with_near_text({"concepts": ["python tutorial"]})
    .with_where({
        "path": ["category"],
        "operator": "Equal",
        "valueText": "programming",
    })
    .with_limit(5)
    .do()
)
```
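Several conditions can be combined in one `where` filter by nesting them under an `And` operator with an `operands` list. The hypothetical helpers `text_equal` and `all_of` below only build those dicts, so the result can be passed to `.with_where(...)` as in the example above.

```python
from typing import Any, Dict, List


def text_equal(prop: str, value: str) -> Dict[str, Any]:
    """Single-condition where filter on a text property."""
    return {"path": [prop], "operator": "Equal", "valueText": value}


def all_of(conditions: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Combine conditions with And; a single condition is returned unchanged."""
    if not conditions:
        raise ValueError("at least one condition is required")
    if len(conditions) == 1:
        return conditions[0]
    return {"operator": "And", "operands": conditions}


where = all_of([
    text_equal("category", "programming"),
    text_equal("title", "Introduction to Weaviate"),
])
```

Returning a lone condition unwrapped keeps the simple case identical to the hand-written filter above.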

Generative Search (Built-in RAG)

```python
# Generate one result per retrieved document
result = (
    client.query
    .get("Document", ["content", "title"])
    .with_near_text({"concepts": ["how to use embeddings"]})
    .with_generate(
        single_prompt="Summarize this document: {content}"
    )
    .with_limit(3)
    .do()
)

# Access the generated text
for doc in result["data"]["Get"]["Document"]:
    print(doc["_additional"]["generate"]["singleResult"])
```
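With `single_prompt`, one prompt is sent per retrieved object, with each `{property}` placeholder replaced by that object's value. The local sketch below, built around a hypothetical `render_single_prompt` function, mimics that substitution so you can preview what the model receives; it is an approximation, not the module's own templating code.

```python
import re
from typing import Dict, List

_PLACEHOLDER = re.compile(r"\{(\w+)\}")


def render_single_prompt(template: str, properties: Dict[str, str]) -> str:
    """Replace each {property} placeholder with the object's value."""
    def substitute(match: "re.Match") -> str:
        name = match.group(1)
        if name not in properties:
            raise KeyError(f"unknown property {name!r}")
        return properties[name]
    return _PLACEHOLDER.sub(substitute, template)


retrieved = [
    {"content": "Embeddings map text to vectors.", "title": "Embeddings 101"},
    {"content": "Cosine similarity compares vectors.", "title": "Similarity"},
]
prompts: List[str] = [
    render_single_prompt("Summarize this document: {content}", obj) for obj in retrieved
]
print(prompts)
```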

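Multi-Tenancy

```python
from weaviate import Tenant

# Multi-tenancy must be enabled when the class is created, e.g. by adding
# "multiTenancyConfig": {"enabled": True} to the class definition.

# Create tenants (the v3 client expects Tenant objects, not plain dicts)
client.schema.add_class_tenants(
    class_name="Document",
    tenants=[Tenant(name="tenant_a"), Tenant(name="tenant_b")],
)

# Query a specific tenant
result = (
    client.query
    .get("Document", ["content"])
    .with_tenant("tenant_a")
    .with_near_text({"concepts": ["query"]})
    .do()
)
```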
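In a multi-tenant class, every object is written to exactly one tenant, so a common preparation step is to group incoming documents by tenant before import. The sketch assumes each raw document carries a hypothetical `"tenant"` key; `group_by_tenant` returns the tenant names to create and, per tenant, the properties to insert (recent v3 client versions accept a `tenant=` argument on `batch.add_data_object`).

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_by_tenant(docs: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Group raw documents by their (assumed) "tenant" key."""
    groups: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for doc in docs:
        tenant = doc["tenant"]
        # The tenant name is passed separately; the other fields become properties
        props = {k: v for k, v in doc.items() if k != "tenant"}
        groups[tenant].append(props)
    return dict(groups)


groups = group_by_tenant([
    {"tenant": "tenant_a", "content": "Quarterly report"},
    {"tenant": "tenant_b", "content": "Onboarding guide"},
    {"tenant": "tenant_a", "content": "Pricing FAQ"},
])
print(sorted(groups))
```

`sorted(groups)` gives the tenant names to register with `add_class_tenants`, after which each group can be imported under its own tenant.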

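Replication

```yaml
# docker-compose.yml: 3-node cluster (other settings as in the single-node file above).
# Replication itself is enabled per class via replicationConfig.factor in the schema.
services:
  weaviate-node1:
    image: semitechnologies/weaviate:1.24.6
    environment:
      CLUSTER_HOSTNAME: 'node1'
      CLUSTER_GOSSIP_BIND_PORT: '7100'
      CLUSTER_DATA_BIND_PORT: '7101'
  weaviate-node2:
    image: semitechnologies/weaviate:1.24.6
    environment:
      CLUSTER_HOSTNAME: 'node2'
      CLUSTER_GOSSIP_BIND_PORT: '7102'
      CLUSTER_DATA_BIND_PORT: '7103'
      CLUSTER_JOIN: 'weaviate-node1:7100'
  weaviate-node3:
    image: semitechnologies/weaviate:1.24.6
    environment:
      CLUSTER_HOSTNAME: 'node3'
      CLUSTER_GOSSIP_BIND_PORT: '7104'
      CLUSTER_DATA_BIND_PORT: '7105'
      CLUSTER_JOIN: 'weaviate-node1:7100'
```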
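The node blocks follow a regular pattern: a unique hostname, a gossip port, a data port one above it, and every node after the first joining node1's gossip port. The hypothetical `cluster_env` function below generates those environment settings for N nodes, which helps keep larger clusters consistent; the port layout is an assumption matching the YAML in this section, not a Weaviate requirement.

```python
from typing import Dict


def cluster_env(num_nodes: int, base_port: int = 7100) -> Dict[str, Dict[str, str]]:
    """Return environment settings for weaviate-node1..N, all joining node1."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be >= 1")
    services: Dict[str, Dict[str, str]] = {}
    for i in range(1, num_nodes + 1):
        gossip = base_port + 2 * (i - 1)
        env = {
            "CLUSTER_HOSTNAME": f"node{i}",
            "CLUSTER_GOSSIP_BIND_PORT": str(gossip),
            "CLUSTER_DATA_BIND_PORT": str(gossip + 1),  # same +1 pattern as the YAML
        }
        if i > 1:
            # Every later node joins the cluster through node1's gossip port
            env["CLUSTER_JOIN"] = f"weaviate-node1:{base_port}"
        services[f"weaviate-node{i}"] = env
    return services


for name, env in cluster_env(3).items():
    print(name, env)
```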

Python RAG Pipeline

```python
def weaviate_rag(query):
    # Retrieve the top 5 documents and generate a single grouped answer
    result = (
        client.query
        .get("Document", ["content", "title"])
        .with_near_text({"concepts": [query]})
        .with_generate(
            grouped_task=f"Answer this question: {query}",
            grouped_properties=["content"],
        )
        .with_limit(5)
        .do()
    )

    # The grouped answer is attached to the first returned object
    docs = result["data"]["Get"]["Document"]
    if not docs:
        return None
    return docs[0]["_additional"]["generate"]["groupedResult"]


# Usage
answer = weaviate_rag("What is machine learning?")
print(answer)
```
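In practice, a failed query usually comes back as an `errors` entry in the response body rather than as a Python exception, and a generation failure can show up in an `error` field next to `groupedResult`. The hypothetical `grouped_answer` helper below checks both before extracting the answer; it runs on plain dicts, so it can be tested without a server.

```python
from typing import Any, Dict, Optional


def grouped_answer(result: Dict[str, Any], class_name: str = "Document") -> Optional[str]:
    """Return groupedResult from a GraphQL response, or None if nothing was retrieved."""
    if result.get("errors"):
        raise RuntimeError(f"Weaviate query failed: {result['errors']}")
    data = result.get("data") or {}
    objects = (data.get("Get") or {}).get(class_name) or []
    if not objects:
        return None
    # The grouped result is attached to the first object's _additional field
    generate = (objects[0].get("_additional") or {}).get("generate") or {}
    if generate.get("error"):
        raise RuntimeError(f"Generation failed: {generate['error']}")
    return generate.get("groupedResult")
```

Inside `weaviate_rag`, the last lines could then become `return grouped_answer(result)`.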

Weaviate's GraphQL interface and built-in generative search make it a strong fit for rapid prototyping.

Tags

weaviate, vector-database, graphql, storage
