Mar 05, 2025
Shad Reynolds
VP, Engineering
Enterprise organizations accumulate vast amounts of data assets, but making these resources discoverable and accessible remains a significant challenge. Data catalogs provide structure, but navigating complex metadata through traditional interfaces limits their utility for many business users. Archie bridges this gap by creating a conversational interface to data.world catalogs, enabling natural language interactions with enterprise knowledge graphs.
Knowledge graphs represent information as a network of interconnected concepts and relationships rather than isolated data points. In an enterprise context, this might include datasets, business terms, metrics, and their interdependencies.
For example, a simple representation in RDF (Resource Description Framework) Turtle format might look like:
@prefix dw: <https://data.world/schema/> .
@prefix ex: <https://example.org/> .
ex:QuarterlySales a dw:Dataset ;
    dw:owner ex:FinanceTeam ;
    dw:contains ex:RevenueMetric ;
    dw:tag "financial", "quarterly", "sales" .
This structure enables complex pattern-matching through query languages like SPARQL:
PREFIX dw: <https://data.world/schema/>
SELECT ?dataset ?owner
WHERE {
    ?dataset dw:tag "financial" ;
             dw:owner ?owner .
}
This query returns every dataset tagged "financial" together with its owner, a useful capability for data discovery. For those wanting deeper knowledge graph fundamentals, this video introduces RDF and SPARQL in more detail.
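To make the pattern matching concrete, here is a minimal Python sketch of what the query above asks for, run against a toy in-memory set of triples. It is not how a SPARQL engine works internally; the `TRIPLES` set, the `financial_datasets_with_owners` helper, and the `ex:WebTraffic` dataset are hypothetical and exist only for illustration.

```python
# Toy in-memory triple store; the first six triples mirror the Turtle example.
TRIPLES = {
    ("ex:QuarterlySales", "rdf:type", "dw:Dataset"),
    ("ex:QuarterlySales", "dw:owner", "ex:FinanceTeam"),
    ("ex:QuarterlySales", "dw:contains", "ex:RevenueMetric"),
    ("ex:QuarterlySales", "dw:tag", "financial"),
    ("ex:QuarterlySales", "dw:tag", "quarterly"),
    ("ex:QuarterlySales", "dw:tag", "sales"),
    # Hypothetical second dataset, added only to show a non-match.
    ("ex:WebTraffic", "dw:owner", "ex:MarketingTeam"),
    ("ex:WebTraffic", "dw:tag", "marketing"),
}


def financial_datasets_with_owners(triples):
    """Mimic the pattern: ?dataset dw:tag "financial" ; dw:owner ?owner ."""
    # First triple pattern: subjects carrying the "financial" tag.
    tagged = {s for (s, p, o) in triples if p == "dw:tag" and o == "financial"}
    # Second triple pattern, joined on the same subject: its owner.
    return sorted(
        (s, o) for (s, p, o) in triples if p == "dw:owner" and s in tagged
    )


results = financial_datasets_with_owners(TRIPLES)
print(results)
```

Both conditions must hold for the same subject, which is why `ex:WebTraffic` is filtered out even though it has an owner.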
Archie combines the reasoning capabilities of large language models with the structured relationships of knowledge graphs. Traditional RAG (Retrieval-Augmented Generation) focuses primarily on document retrieval; Graph RAG instead navigates relationship networks to build comprehensive context windows. Grounding answers in the defined catalog structure reduces hallucinations and lets Archie check and rewrite queries against that structure.
The core technical advantage of Archie lies in its ability to traverse the enterprise knowledge graph to build context, following relationships between entities rather than retrieving isolated documents.
When a user asks a question, Archie first identifies relevant concepts in the query and locates corresponding nodes in the knowledge graph. It then follows relationship paths to build a comprehensive context, examining both direct properties and related entities. This multi-hop traversal can discover connections that might not be immediately obvious from a direct query.
For example, when a user asks about a specific metric's reliability, Archie might traverse:
From the metric node to its source datasets
From those datasets to their update frequency and data quality scores
From those datasets to the teams responsible for maintaining them
This relationship-aware retrieval enables Archie to build rich, relevant context that goes beyond simple keyword matching.
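The multi-hop traversal described above can be sketched as a breadth-first walk with a hop limit. This is only an illustration under stated assumptions: the `EDGES` list, the predicate names (`dw:sourcedFrom`, `dw:qualityScore`, `dw:contact`), and the `build_context` function are hypothetical, not Archie's actual implementation or the data.world schema.

```python
from collections import deque

# Hypothetical edges as (subject, predicate, object); all names are illustrative.
EDGES = [
    ("ex:RevenueMetric", "dw:sourcedFrom", "ex:QuarterlySales"),
    ("ex:QuarterlySales", "dw:updateFrequency", "Quarterly"),
    ("ex:QuarterlySales", "dw:qualityScore", "0.97"),
    ("ex:QuarterlySales", "dw:owner", "ex:FinanceTeam"),
    ("ex:FinanceTeam", "dw:contact", "finance@example.org"),
]


def build_context(start, edges, max_hops=2):
    """Breadth-first walk collecting every triple within max_hops of start."""
    outgoing = {}
    for s, p, o in edges:
        outgoing.setdefault(s, []).append((p, o))

    context, seen = [], {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand nodes at the hop limit
        for p, o in outgoing.get(node, []):
            context.append((node, p, o))
            if o not in seen:
                seen.add(o)
                queue.append((o, depth + 1))
    return context


context = build_context("ex:RevenueMetric", EDGES, max_hops=2)
for triple in context:
    print(triple)
```

With `max_hops=2` the walk reaches the dataset and its update frequency, quality score, and owning team, but stops before the team's contact details. The hop limit is one simple way to keep the context window bounded.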
When users ask about specific data assets, Archie locates the relevant nodes through a semantic vector database lookup and examines their properties and relationships. The knowledge graph structure allows Archie to provide comprehensive information by following established relationship types.
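As a rough illustration of the lookup step, the sketch below ranks catalog nodes by cosine similarity to a query vector. The three-dimensional vectors are hand-written stand-ins for real embeddings, and `NODE_VECTORS`, `cosine`, and `nearest_nodes` are hypothetical names; the source does not describe which embedding model or vector database Archie uses.

```python
import math

# Hand-written toy vectors standing in for real embeddings (an assumption;
# a production system would obtain these from an embedding model).
NODE_VECTORS = {
    "ex:MonthlyRevenueReport": [0.9, 0.1, 0.0],
    "ex:SalesTransactions": [0.6, 0.4, 0.1],
    "ex:ExecutiveDashboard": [0.1, 0.2, 0.9],
}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_nodes(query_vector, node_vectors, k=2):
    """Return the k node identifiers most similar to the query vector."""
    ranked = sorted(
        node_vectors,
        key=lambda node: cosine(query_vector, node_vectors[node]),
        reverse=True,
    )
    return ranked[:k]


query = [1.0, 0.0, 0.0]  # pretend embedding of "monthly revenue"
top = nearest_nodes(query, NODE_VECTORS)
print(top)
```

Once the matching nodes are located, their properties and relationships in the graph supply the rest of the answer.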
For instance, consider this expanded RDF representation of a dataset:
ex:MonthlyRevenueReport a dw:Dataset ;
    dw:title "Monthly Revenue Report" ;
    dw:owner ex:FinanceTeam ;
    dw:updateFrequency "Daily" ;
    dw:contains ex:GrossRevenue,
                ex:CustomerAcquisitionCost ;
    dw:derivedFrom ex:SalesTransactions ;
    dw:usedBy ex:ExecutiveDashboard,
              ex:SalesPerformanceReport,
              ex:QuarterlyReview .
Using this structure, Archie can respond with comprehensive asset information:
"The Monthly Revenue Report dataset contains two metrics, Gross Revenue and Customer Acquisition Cost. It's updated daily, owned by the Finance team, and derived from Sales Transactions. Three assets currently use this dataset: the Executive Dashboard, the Sales Performance Report, and the Quarterly Review. Would you like details about any of these connected assets?"
For more complicated questions requiring data synthesis, Archie generates appropriate SPARQL queries based on the user's intent. For example, if a user asks about impact analysis, a query might look like:
PREFIX dw: <https://data.world/schema/>
PREFIX ex: <https://example.org/>
SELECT ?affectedAsset ?assetType ?owner
WHERE {
    ex:customer_demographics dw:usedBy+ ?affectedAsset .
    ?affectedAsset a ?assetType ;
                   dw:owner ?owner .
}
This SPARQL query returns every asset that directly or indirectly depends on the customer_demographics table, along with its type and owner. That lets Archie answer questions such as: "Which dashboards would be affected if we deprecate the customer_demographics table?"
The power of this approach lies in its ability to transform natural language questions into structured graph queries that leverage the knowledge graph's semantic relationships.
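For readers less familiar with SPARQL property paths, the same transitive traversal can be sketched in plain Python. The `USED_BY` mapping and the asset names other than `customer_demographics` are hypothetical; the function collects everything one or more `dw:usedBy` hops downstream of the starting table.

```python
# Hypothetical dw:usedBy edges; asset names other than
# customer_demographics are illustrative.
USED_BY = {
    "ex:customer_demographics": ["ex:customer_segments"],
    "ex:customer_segments": ["ex:RetentionDashboard", "ex:ChurnModel"],
    "ex:ChurnModel": ["ex:ExecutiveDashboard"],
}


def downstream_assets(start, used_by):
    """Return every asset reachable from start via one or more usedBy hops."""
    affected = set()
    stack = list(used_by.get(start, []))
    while stack:
        asset = stack.pop()
        if asset in affected:
            continue  # already visited; also guards against cycles
        affected.add(asset)
        stack.extend(used_by.get(asset, []))
    return affected


impacted = downstream_assets("ex:customer_demographics", USED_BY)
print(sorted(impacted))
```

Here `ex:ExecutiveDashboard` is reported as affected even though it never references the table directly: it depends on `ex:ChurnModel`, which depends on `ex:customer_segments`. This is the kind of indirect dependency that a one-hop lookup would miss.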
Beyond simple query-response patterns, Archie also acts as an agent to complete multi-step data tasks. Using the knowledge graph as its environment, Archie chains together graph traversals to answer complex questions that require several operations, such as:
Finding relevant datasets based on business terms
Examining metadata and quality attributes
Identifying relationships to dashboards and reports
Determining who has expertise or responsibility for a given asset
This agentic approach allows Archie to solve problems like "Who should I talk to about improving our customer retention metrics?" by navigating from business terms to datasets to owners and experts.
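One way to picture this chaining is as a sequence of small lookups, each feeding the next, with every step recorded so the answer can be traced. The sketch below is an assumption-laden illustration: the lookup tables, the quality threshold, and the function names are all hypothetical, and a real agent would choose its steps dynamically instead of following a fixed pipeline.

```python
# Hypothetical catalog fragments; every name and value here is illustrative.
TERM_TO_DATASETS = {
    "customer retention": ["ex:ChurnEvents", "ex:RenewalHistory"],
}
DATASET_OWNER = {
    "ex:ChurnEvents": "ex:CustomerSuccessTeam",
    "ex:RenewalHistory": "ex:FinanceTeam",
}
DATASET_QUALITY = {"ex:ChurnEvents": 0.95, "ex:RenewalHistory": 0.70}


def find_datasets(term):
    """Step 1: business term -> datasets linked to it."""
    return TERM_TO_DATASETS.get(term, [])


def filter_by_quality(datasets, minimum):
    """Step 2: keep datasets whose quality score meets the threshold."""
    return [d for d in datasets if DATASET_QUALITY.get(d, 0.0) >= minimum]


def find_owners(datasets):
    """Step 3: datasets -> the teams that own them."""
    return sorted({DATASET_OWNER[d] for d in datasets if d in DATASET_OWNER})


def who_to_ask(term, minimum_quality=0.8):
    """Chain the three steps and keep a log of each intermediate result."""
    steps = []
    datasets = find_datasets(term)
    steps.append(("find_datasets", datasets))
    trusted = filter_by_quality(datasets, minimum_quality)
    steps.append(("filter_by_quality", trusted))
    owners = find_owners(trusted)
    steps.append(("find_owners", owners))
    return owners, steps


owners, steps = who_to_ask("customer retention")
print(owners)
for name, result in steps:
    print(name, "->", result)
```

Returning the step log alongside the answer is a simple way to keep each intermediate result inspectable.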
Archie transforms how organizations interact with their data resources by making complex knowledge graphs accessible through natural conversation. Every response is verifiable through thought process logs and direct links to catalog resources, ensuring users can trust the information they receive. More than a search tool, Archie serves as a guide through the enterprise data landscape, helping users discover connections they might not have known to look for.
For organizations ready to unlock the full value of their data investments, Archie offers an intuitive interface that bridges the gap between technical data structures and business users. Schedule a demonstration to see how Archie can transform data discovery in your organization.