
🦙 LlamaIndex Integration

LlamaIndex (GitHub) is an advanced “data framework” tailored for augmenting Large Language Models (LLMs) with private data.

It streamlines the integration of diverse data sources and formats (APIs, PDFs, docs, SQL, etc.) through versatile data connectors and structures data into indices and graphs for LLM compatibility. The platform offers a sophisticated retrieval/query interface for enriching LLM inputs with context-specific outputs. Designed for both beginners and experts, LlamaIndex provides a user-friendly high-level API for easy data ingestion and querying, alongside customizable lower-level APIs for detailed module adaptation.

Langfuse offers a simple integration for automatic capture of traces and metrics generated in LlamaIndex applications. Any feedback? Let us know on Discord or GitHub. This is a new integration, and we’d love to hear your thoughts.

This integration is based on the LlamaIndex instrumentation module, which allows for seamless instrumentation of LlamaIndex applications. In particular, you can handle events and track spans using both custom logic and the handlers offered by the module. You can also define your own events and specify where and when in your code they should be emitted.
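
For illustration, here is a minimal sketch of a custom event defined with LlamaIndex's instrumentation module. The event name and payload are hypothetical; see the LlamaIndex instrumentation documentation for the full API.

from llama_index.core.instrumentation import get_dispatcher
from llama_index.core.instrumentation.events.base import BaseEvent

# Hypothetical custom event: the class name and field are illustrative only
class QueryRewrittenEvent(BaseEvent):
    rewritten_query: str

    @classmethod
    def class_name(cls) -> str:
        return "QueryRewrittenEvent"

# Module-level dispatcher for this file
dispatcher = get_dispatcher(__name__)

# Emit the event wherever it makes sense in your code logic
dispatcher.event(QueryRewrittenEvent(rewritten_query="capital of France"))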

If you are interested in the deprecated callback-based integration, see the deprecated LlamaIndex callback integration.

Currently only Python is supported by this integration. If you are interested in an integration with LlamaIndex.TS, add your upvote/comments to this issue.

Example LlamaIndex trace in Langfuse.

Add Langfuse to your LlamaIndex application

Make sure you have both llama-index and langfuse installed.

pip install llama-index langfuse

At the root of your LlamaIndex application, register Langfuse’s LlamaIndexInstrumentor. When instantiating LlamaIndexInstrumentor, make sure to configure your Langfuse API keys and the Host URL correctly via environment variables or constructor arguments.

from langfuse.llama_index import LlamaIndexInstrumentor
 
# Get your keys from the Langfuse project settings page and set them as environment variables
# or pass them as arguments when initializing the instrumentor
instrumentor = LlamaIndexInstrumentor()
 
# Automatically trace all LlamaIndex operations
instrumentor.start()
 
# ... your LlamaIndex index creation ...
index.as_query_engine().query("What is the capital of France?")
 
# Flush events to langfuse
instrumentor.flush()
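
As a minimal sketch of the two configuration options mentioned above: LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, and LANGFUSE_HOST are the standard Langfuse environment variables; the constructor keyword names shown in the commented-out alternative are an assumption, so please check the SDK reference.

import os
from langfuse.llama_index import LlamaIndexInstrumentor
 
# Option 1: environment variables (set before instantiating the instrumentor)
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-..."
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-..."
os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com"  # or your self-hosted URL
 
instrumentor = LlamaIndexInstrumentor()
 
# Option 2: constructor arguments (assumed keyword names; see the SDK reference)
# instrumentor = LlamaIndexInstrumentor(
#     public_key="pk-lf-...",
#     secret_key="sk-lf-...",
#     host="https://cloud.langfuse.com",
# )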

Done! Traces and metrics from your LlamaIndex application are now automatically tracked in Langfuse. If you construct a new index or query an LLM with your documents in context, your traces and metrics are immediately visible in the Langfuse UI.

Check out the notebook for end-to-end examples of the integration.

Additional configuration

Queuing and flushing

The Langfuse SDKs queue and batch events in the background to reduce the number of network requests and improve overall performance. In a long-running application, this works without any additional configuration.

If you are running a short-lived application, you need to flush the instrumentor to ensure that all queued events are sent to Langfuse before the application exits.

instrumentor.flush()

Learn more about queuing and batching of events here.
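
For example, a short-lived script can wrap its LlamaIndex calls in a try/finally block so that queued events are sent even if an error occurs; a minimal sketch:

from langfuse.llama_index import LlamaIndexInstrumentor
 
instrumentor = LlamaIndexInstrumentor()
instrumentor.start()
 
try:
    # ... your LlamaIndex index creation and queries ...
    pass
finally:
    # Ensure all queued events reach Langfuse before the process exits
    instrumentor.flush()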

Custom trace parameters

You can pass custom trace parameters to the observe context manager, for example a custom trace ID, or additional context such as a user ID, session ID, or tags. See the Python SDK Trace documentation for more information.

Property | Description
name | Identify a specific type of trace, e.g. a use case or functionality.
metadata | Additional information that you want to see in Langfuse. Can be any JSON.
session_id | The ID of the current session.
user_id | The ID of the current user.
tags | Tags to categorize and filter traces.
version | The version to associate with the trace, e.g. to compare experiments.
release | The release identifier of the current deployment, e.g. to compare experiments.
from langfuse.llama_index import LlamaIndexInstrumentor
 
instrumentor = LlamaIndexInstrumentor()
 
# Pass custom trace ID and additional params
with instrumentor.observe(trace_id='custom-trace-id', session_id="my-session", user_id='my-user'):
    # ... your LlamaIndex index creation ...
 
    index.as_query_engine().query("What is the capital of France?")
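
The remaining properties from the table above (name, metadata, tags, version, release) can also be attached via the trace client yielded by the context manager, as described in the next section; a minimal sketch, assuming trace.update() accepts these fields:

from langfuse.llama_index import LlamaIndexInstrumentor
 
instrumentor = LlamaIndexInstrumentor()
 
with instrumentor.observe(trace_id="custom-trace-id") as trace:
    # ... your LlamaIndex index creation and queries ...
    pass
 
# Assumption: trace.update() accepts the remaining trace parameters from the table
trace.update(
    name="capital-qa",
    metadata={"experiment": "baseline"},
    tags=["llama-index", "demo"],
    version="1.0.0",
    release="v2024-05-01",
)
 
instrumentor.flush()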
 

Interoperability with Langfuse SDK

The Langfuse Python SDK is fully interoperable with the LlamaIndex integration.

This is useful when your LlamaIndex executions are part of a larger application and you want to link all traces and spans together. It is also useful when you want to group multiple LlamaIndex executions into a single trace or span.

When using the Langfuse @observe() decorator, you can fetch the current trace ID and observation ID via langfuse_context and pass them to the instrumentor.observe() context manager.

from langfuse.decorators import langfuse_context, observe
from langfuse.llama_index import LlamaIndexInstrumentor
from llama_index.core import Document, VectorStoreIndex
 
instrumentor = LlamaIndexInstrumentor()
 
@observe()
def llama_index_fn(question: str):
    # Get current trace ID and observation ID
    current_trace_id = langfuse_context.get_current_trace_id()
    current_observation_id = langfuse_context.get_current_observation_id()
 
    # Set IDs on the instrumentor
    with instrumentor.observe(
        trace_id=current_trace_id,
        parent_observation_id=current_observation_id,
        update_parent=False,  # Do not update the parent observation or trace, default is True
    ):
        # Run application (doc1 and doc2 are placeholder Documents)
        doc1 = Document(text="The quick brown fox jumps over the lazy dog.")
        doc2 = Document(text="Paris is the capital of France.")
        index = VectorStoreIndex.from_documents([doc1, doc2])
        response = index.as_query_engine().query(question)
 
        return response

The instrumentor.observe() context manager will yield a trace object that you can use to attach scores and metadata to your trace.

from langfuse.llama_index import LlamaIndexInstrumentor
 
instrumentor = LlamaIndexInstrumentor()
 
with instrumentor.observe(
    trace_id="custom-trace-id", session_id="my-session", user_id="my-user"
) as trace:
    # ... your LlamaIndex index creation ...
 
    index.as_query_engine().query("What is the capital of France?")
 
# Use the trace client yielded by the context manager for e.g. scoring:
trace.score(name="my-score", value=0.5)
 
# Add additional metadata to the trace
trace.update(metadata={"my-key": "my-value"})
 
# Flush events to langfuse
instrumentor.flush()

Troubleshooting

Streamed OpenAI responses and token usage

OpenAI returns token usage for streamed completions only if stream_options={"include_usage": True} is passed in the request. LlamaIndex does not pass this option by default. If token usage is missing from your streamed responses, instantiate the OpenAI client as follows:

from llama_index.llms.openai import OpenAI
from langfuse.llama_index import LlamaIndexInstrumentor
 
instrumentor = LlamaIndexInstrumentor()
 
llm = OpenAI(additional_kwargs={"stream_options": {"include_usage": True}})
 
with instrumentor.observe():
    # ... your LlamaIndex index creation ...
 
    index.as_query_engine(llm=llm).query("What is the capital of France?")
 
# Flush events to langfuse
instrumentor.flush()

The streamed token usage should now be captured on the generation.
