MLOps-enhanced generative AI pipeline evaluation, boosting underwriting accuracy by 20%
We recently partnered with a leading provider of risk assessment solutions for insurance underwriters. Our team evaluated their generative AI pipeline and enhanced the accuracy, efficiency, and fairness of their existing generative AI solution, which automates key aspects of insurance underwriting.

Addressing the Challenges of Gen AI for Risk Assessment
Gen AI offers immense potential for automating and improving risk assessment processes. However, inherent biases and the potential for “hallucinations” (generating fabricated or misleading information) can hinder its effectiveness. To address these challenges, we implemented the following strategies:
Pipeline Replication and Customization
To ensure seamless integration with the existing production pipeline, we first replicated the original workflow. We then developed a custom workflow tailored to the specific needs of insurance underwriting, modifying key functions to improve the overall accuracy and efficiency of the AI output.
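The client's actual components are proprietary, so every name below is hypothetical, but a replicated workflow of this kind can be expressed as explicit, swappable stages, which is what makes targeted customization safe:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class PipelineResult:
    query: str
    documents: List[str]
    answer: str

class UnderwritingPipeline:
    """Mirrors the production flow as explicit, swappable stages."""

    def __init__(self,
                 retrieve: Callable[[str], List[str]],
                 generate: Callable[[str, List[str]], str],
                 postprocess: Callable[[str], str] = str.strip):
        self.retrieve = retrieve
        self.generate = generate
        self.postprocess = postprocess

    def run(self, query: str) -> PipelineResult:
        docs = self.retrieve(query)        # fetch supporting policy documents
        raw = self.generate(query, docs)   # draft the risk assessment
        return PipelineResult(query, docs, self.postprocess(raw))

# Illustrative stubs standing in for the production components.
pipeline = UnderwritingPipeline(
    retrieve=lambda q: ["policy doc A", "policy doc B"],
    generate=lambda q, docs: f"Assessment of '{q}' using {len(docs)} documents.",
)
print(pipeline.run("commercial property, coastal region"))
```

Because each stage is a plain function, a customized retriever or post-processor can replace the original without touching the rest of the replicated pipeline.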
Performance Tracking and Evaluation System
We integrated a robust experiment-tracking and logging tool, enabling comprehensive evaluation of AI performance and providing valuable insight into the behavior and effectiveness of the AI models.
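The case study doesn't name the tracking tool, so the sketch below assumes MLflow purely for illustration; the parameter names, metric names, and values are placeholders:

```python
import mlflow

mlflow.set_experiment("underwriting-genai-eval")  # hypothetical experiment name

with mlflow.start_run(run_name="prompt-v2-baseline"):
    # Parameters describing this pipeline configuration (illustrative).
    mlflow.log_param("llm", "example-model-v1")
    mlflow.log_param("retriever_top_k", 5)
    # Evaluation scores produced by the custom metrics (placeholder values).
    mlflow.log_metric("answer_relevancy", 0.87)
    mlflow.log_metric("faithfulness", 0.91)
    mlflow.log_metric("document_relevancy", 0.84)
    # Persist a sample input/output pair as an artifact for later inspection.
    mlflow.log_dict({"query": "...", "answer": "..."}, "sample_io.json")
```

Every run then becomes comparable in the tracking UI, which makes before/after evaluation of pipeline changes systematic rather than anecdotal.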
Bias Mitigation and Transparency
We implemented mechanisms to detect and reduce bias and hallucinations within the Gen AI pipeline, ensuring that outputs were fair, unbiased, and accurate. We also added detailed logging and monitoring capabilities to maintain transparency and accountability, enabling thorough generative AI pipeline evaluation and ongoing tracking of model performance to support continuous improvements and adjustments.
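The exact logging schema is the client's, but the kind of per-generation audit trail described here can be sketched as structured JSONL records (all field names are illustrative):

```python
import json
import logging
import time
import uuid

audit_log = logging.getLogger("genai.audit")
audit_log.setLevel(logging.INFO)
audit_log.addHandler(logging.FileHandler("genai_audit.jsonl"))

def log_generation(query: str, documents: list, answer: str, scores: dict) -> None:
    """Append one JSON record per generation so every output can be audited."""
    audit_log.info(json.dumps({
        "id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "query": query,
        "n_documents": len(documents),
        "answer": answer,
        "scores": scores,  # e.g. {"faithfulness": 0.91, "answer_relevancy": 0.87}
    }))
```

An audit trail like this is what turns fairness and hallucination claims into something reviewable: any flagged output can be traced back to the exact query, documents, and scores that produced it.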
Custom Metrics
To assess AI performance more accurately, we introduced new prompt-based metrics. These metrics, including the “Answer Relevancy Score,” “Faithfulness Score,” and “Document Relevancy Score,” provided specific insight into the performance of both the LLM and the retrieval models. This data was instrumental in refining and enhancing the system’s accuracy.
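The actual prompts are proprietary, but a prompt-based metric of this kind is typically an LLM-as-judge call. Below is a minimal sketch of a “Faithfulness Score,” assuming an OpenAI-compatible judge endpoint; the model choice and prompt wording are illustrative:

```python
from openai import OpenAI

client = OpenAI()  # assumes an API key in the environment

FAITHFULNESS_PROMPT = """You are grading an insurance risk assessment.

Context documents:
{context}

Generated answer:
{answer}

On a scale of 0.0 to 1.0, how faithful is the answer to the context
(1.0 = fully supported, 0.0 = contradicted or fabricated)?
Reply with the number only."""

def faithfulness_score(context: str, answer: str) -> float:
    """Prompt-based metric: ask a judge LLM how grounded the answer is."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative judge model
        messages=[{"role": "user",
                   "content": FAITHFULNESS_PROMPT.format(context=context,
                                                         answer=answer)}],
        temperature=0,
    )
    return float(response.choices[0].message.content.strip())
```

“Answer Relevancy” and “Document Relevancy” follow the same pattern, with prompts that compare the answer to the question and the retrieved documents to the question, respectively.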
The Result: A More Robust and Reliable Risk Assessment System

Custom metrics provided valuable insights, leading to a 15% improvement in AI output accuracy and system reliability.

Detailed logging and monitoring capabilities allowed for ongoing evaluation and improvement of the AI models, resulting in a 30% reduction in model errors.

The tracking tool's API allowed the team to manage data logging efficiently, ensuring better control over the evaluation process and improving data accuracy by 10%.

The improved Gen AI pipeline contributed to over $20M in revenue.

Bias-mitigation measures significantly decreased the occurrence of biased and hallucinated outputs, enhancing the reliability of the Gen AI pipeline.