Concept
Background
This is a demonstration of a proof-of-concept Toxicology Search Agent, which combines structured data, unstructured data, and programmatically accessible public data sources. These sources are integrated to build a semantic search engine focused on toxicology.
The demonstration shows how the system responds to queries of increasing complexity (listed on the left-hand side of the screen). The system automatically builds an intelligence layer, combines data from structured and unstructured sources, and performs agentic planning to aggregate information, analyze it, and present results based on a user query.
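The planning step can be pictured with a small sketch. Everything here is an assumption made for illustration: the `QueryType` names, the keyword rules in `classify_query`, and the step names returned by `plan_steps` are hypothetical and are not taken from the demonstrated system, which presumably uses a far richer planner.

```python
from __future__ import annotations

from enum import Enum


class QueryType(Enum):
    # Hypothetical labels mirroring the three demo queries.
    EVIDENCE = "evidence_search"
    COMPOUND_PROFILE = "compound_specific_search"
    LIBRARY = "compound_library_search"


def classify_query(query: str) -> QueryType:
    """Pick a query type with naive keyword rules (illustrative only)."""
    text = query.lower()
    if "all compounds" in text or text.startswith("find all"):
        return QueryType.LIBRARY
    if "profile" in text:
        return QueryType.COMPOUND_PROFILE
    return QueryType.EVIDENCE


def plan_steps(query: str) -> list[str]:
    """Return an ordered list of hypothetical step names for the query."""
    steps = ["retrieve_structured", "retrieve_unstructured", "query_public_sources"]
    if classify_query(query) is QueryType.LIBRARY:
        # A library query first has to find every compound that matches.
        steps.insert(0, "enumerate_matching_compounds")
    steps += ["aggregate", "analyze", "build_report"]
    return steps


if __name__ == "__main__":
    for q in (
        "What mutagenicity evidence exists for bromoform?",
        "Detailed profile for 1,2-dibromoethane",
        "Find all compounds associated with liver injury.",
    ):
        print(classify_query(q).value, plan_steps(q))
```

The point of the sketch is only that broader queries produce longer plans; the real agent's routing logic is not described in the demonstration.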
Evidence Search
"What mutagenicity evidence exists for bromoform?"
The system accesses both the structured and unstructured data, gathers information, and processes it at the agentic layer. Once processing completes, it presents the results in several sections:
- Summary: Bromoform is classified as non-mutagenic in the benchmark dataset.
- Structure & Properties: Obtained via queries to public domain databases such as ChEMBL.
- Literature Context: A summary obtained from unstructured data.
- References: Obtained from structured data in the system.
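A report with these four sections could be modeled roughly as follows. This is a minimal sketch under stated assumptions: the `EvidenceReport` class, its field names, and the `render` layout are hypothetical, chosen only to mirror the section names above, and do not reflect the system's actual output schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EvidenceReport:
    """Hypothetical container for the four report sections."""

    compound: str
    summary: str
    structure_properties: dict = field(default_factory=dict)  # from public databases
    literature_context: str = ""  # summarized from unstructured data
    references: list = field(default_factory=list)  # from structured data

    def render(self) -> str:
        """Format the report as plain text, marking empty sections."""
        props = ", ".join(f"{k}={v}" for k, v in self.structure_properties.items())
        lines = [
            f"Report for {self.compound}",
            f"- Summary: {self.summary}",
            f"- Structure & Properties: {props or 'not available'}",
            f"- Literature Context: {self.literature_context or 'not available'}",
            f"- References: {len(self.references)} item(s)",
        ]
        return "\n".join(lines)


if __name__ == "__main__":
    report = EvidenceReport(
        compound="bromoform",
        summary="Classified as non-mutagenic in the benchmark dataset.",
        structure_properties={"formula": "CHBr3"},
    )
    print(report.render())
```

Keeping each section as a separate field makes it easy to show which source filled it, and to mark a section as unavailable when a source returns nothing.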
Compound Specific Search
"Detailed profile for 1,2-dibromoethane"
Next, a more specific query asks for a detailed profile of a single compound: 1,2-dibromoethane.
The system retrieves information for this specific compound, identifies its mutagenicity status, and gathers data on its structure, properties, literature context, and references from structured data sources.
Compound Library Search
"Find all compounds associated with liver injury."
Finally, a much broader and more complex query asks for all compounds associated with liver injury. This query is not specific to a single compound; it requests a lookup across the full set of compounds associated with a given condition.
In this case, the results do not describe one compound. Instead, they list the matching compounds from the benchmark dataset, each with its structural properties, literature context, and references.
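A condition-level lookup of this kind can be sketched as a filter over compound records. The record layout, the `find_compounds` helper, and the compound names below are placeholders invented for illustration; they are not entries from the benchmark dataset and carry no toxicological meaning.

```python
from __future__ import annotations

# Placeholder records only: names and labels are made up for this sketch.
PLACEHOLDER_RECORDS = [
    {"compound": "compound_a", "conditions": {"liver injury"}},
    {"compound": "compound_b", "conditions": {"skin irritation"}},
    {"compound": "compound_c", "conditions": {"Liver Injury", "skin irritation"}},
]


def find_compounds(records: list[dict], condition: str) -> list[str]:
    """Return sorted names of compounds whose records mention the condition."""
    wanted = condition.strip().lower()
    return sorted(
        record["compound"]
        for record in records
        # Compare case-insensitively so label spelling variants still match.
        if wanted in {c.lower() for c in record["conditions"]}
    )


if __name__ == "__main__":
    print(find_compounds(PLACEHOLDER_RECORDS, "liver injury"))
```

In a full system, each returned name would then be expanded into a per-compound report rather than printed as a bare list.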
Summary
This demonstration shows how the agentic system architecture integrates multiple kinds of data sources (structured, unstructured, and programmatic access to public sources). The data is processed and reasoned over at the agentic intelligence layer to construct a context-specific knowledge graph, and the system applies real-time planning and reasoning according to the complexity of the query.
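One simple way to picture a context-specific knowledge graph is as a set of subject-predicate-object triples, each tagged with the kind of source it came from. The sketch below assumes that representation. The `build_graph` and `subjects_with` helpers, the predicate names, and the `placeholder_compound` entry are hypothetical; only the bromoform classification comes from the demonstration.

```python
from __future__ import annotations

from collections import defaultdict


def build_graph(triples: list[tuple[str, str, str, str]]) -> dict:
    """Index (subject, predicate, object, source) tuples by subject."""
    graph = defaultdict(set)
    for subject, predicate, obj, source in triples:
        graph[subject].add((predicate, obj, source))
    return graph


def subjects_with(graph: dict, predicate: str, obj: str) -> list[str]:
    """Return sorted subjects that have the given edge, from any source."""
    return sorted(
        subject
        for subject, edges in graph.items()
        if any(p == predicate and o == obj for p, o, _ in edges)
    )


if __name__ == "__main__":
    triples = [
        # Stated in the demonstration's evidence search result.
        ("bromoform", "classified_as", "non-mutagenic", "structured"),
        # Placeholder edge, invented to show a second source type.
        ("placeholder_compound", "associated_with", "liver injury", "unstructured"),
    ]
    graph = build_graph(triples)
    print(subjects_with(graph, "classified_as", "non-mutagenic"))
```

Tagging each edge with its source keeps provenance available when the report lists references.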
The system can be scaled on the dimensions of:
- Information: Multiple sources of data.
- Integration: Building knowledge graphs from the information.
- Intelligence: Context-specific reasoning and planning.
- Interface: Providing structured reports.