Client Background
A leading CRO delivering real-world evidence and clinical informatics solutions
Our client is a global CRO focused on generating high-quality real-world evidence (RWE) to support clinical, regulatory, and commercial decision-making for life sciences companies. As part of a broader evidence-generation initiative, the CRO partnered with a medical device manufacturer seeking to better understand real-world disease patterns and procedural contexts relevant to revascularization therapies. The objective was to derive clinically meaningful insights from large volumes of de-identified real-world clinical data, specifically to understand disease presence and severity patterns that influence procedural decision-making and product use.
Challenge
Extracting disease severity insights from de-identified, unstructured data
A key challenge was characterizing obstructive coronary artery disease (CAD) and its severity using real-world data. Obstructive CAD is rarely captured as a standardized field in EHRs and often lacks consistent diagnostic coding. Instead, critical details (lesion presence and extent, vessels involved, anatomical location, degree of stenosis, and prior interventions such as stents or grafts) are embedded in unstructured cath lab and angiography reports. These details are spread across multiple reports, inconsistently documented, and buried in free-text narratives. Working with de-identified unstructured data therefore required systematic extraction, standardization, and aggregation while preserving clinical meaning and ensuring data privacy.
Healthark’s Role
Transformed de-identified unstructured RWD into structured, analyzable clinical evidence
Healthark collaborated with the client to design and implement a clinically informed analytics framework to extract and structure disease severity insights from de-identified unstructured cardiology reports.
- Clinical Expert–Led Feature Definition: We collaborated with cardiology subject-matter experts to define clinically relevant indicators of disease severity. This involved identifying key features such as lesion presence, number of affected vessels, anatomical context, degree of stenosis, and history of revascularization.
- Cross-Institutional Data Understanding: We reviewed cath and angiogram reports from multiple data sources to thoroughly understand the variability in documentation styles and terminology. This step ensured that the solution could generalize effectively across diverse real-world data sources.
- Evidence Structuring: We applied natural language processing (NLP) techniques to de-identified free-text reports and extracted clinically meaningful attributes while carefully accounting for linguistic variation and reporting nuances. We then developed rule-based clinical logic to translate the extracted features into structured severity categories, incorporating clinical thresholds (e.g., percentage stenosis), anatomical factors, and procedural history to enable robust stratification.
- Iterative Validation Using De-identified Data: We refined and tested the logic across multiple validation cycles using de-identified data. This iterative process ensured consistency, accuracy, and full clinical interpretability of the results.
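To make the evidence-structuring step more concrete, the following Python sketch shows one way free-text findings could be extracted and mapped to severity categories. It is an illustration only, not the logic used in this engagement: the vessel vocabulary, the 70% and 50% (left main) stenosis thresholds, the category names, and all function names are assumptions chosen for the example. It pairs each stenosis percentage with the nearest preceding vessel mention in the same sentence and does not handle negation (for example, "no prior stent") or percentages that appear before the vessel name.

```python
import re
from dataclasses import dataclass

# Illustrative vocabulary only; real reports use many more variants.
VESSEL_ALIASES = {
    "left main": "LM", "lm": "LM",
    "left anterior descending": "LAD", "lad": "LAD",
    "circumflex": "LCX", "lcx": "LCX",
    "right coronary": "RCA", "rca": "RCA",
}
# Longest aliases first so multi-word names win over abbreviations.
VESSEL_RE = re.compile(
    r"\b(" + "|".join(sorted(map(re.escape, VESSEL_ALIASES), key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)
# Matches "80%" and ranges such as "70-80%"; group 2 is the single value or upper bound.
STENOSIS_RE = re.compile(r"(?:(\d{1,3})\s*(?:-|to)\s*)?(\d{1,3})\s*%")
PRIOR_REVASC_RE = re.compile(r"\b(stents?|pci|cabg|grafts?)\b", re.IGNORECASE)

# Assumed thresholds for this sketch; actual cut-offs should come from clinical experts.
DEFAULT_THRESHOLD = 70
LEFT_MAIN_THRESHOLD = 50


@dataclass
class SeverityResult:
    stenosis_by_vessel: dict
    obstructed_vessels: list
    prior_revascularization: bool
    category: str


def extract_stenosis(report_text: str) -> dict:
    """Return the maximum documented stenosis percentage per vessel."""
    findings = {}
    for sentence in re.split(r"[.;\n]+", report_text):
        vessels = [(m.start(), VESSEL_ALIASES[m.group(1).lower()])
                   for m in VESSEL_RE.finditer(sentence)]
        for m in STENOSIS_RE.finditer(sentence):
            preceding = [name for pos, name in vessels if pos < m.start()]
            pct = int(m.group(2))  # for a range, the upper bound is used (an assumption)
            if not preceding or pct > 100:
                continue
            vessel = preceding[-1]
            findings[vessel] = max(findings.get(vessel, 0), pct)
    return findings


def classify(report_text: str) -> SeverityResult:
    """Apply simple threshold rules to the extracted findings."""
    findings = extract_stenosis(report_text)
    obstructed = sorted(
        v for v, pct in findings.items()
        if pct >= (LEFT_MAIN_THRESHOLD if v == "LM" else DEFAULT_THRESHOLD)
    )
    if not findings:
        category = "no stenosis documented"
    elif not obstructed:
        category = "non-obstructive"
    elif len(obstructed) == 1:
        category = "single-vessel obstructive"
    else:
        category = "multivessel obstructive"
    prior = bool(PRIOR_REVASC_RE.search(report_text))
    return SeverityResult(findings, obstructed, prior, category)


example = ("Proximal LAD with 80% stenosis. RCA shows 30-40% narrowing. "
           "Prior stent in the circumflex is patent.")
result = classify(example)
print(result.category, result.obstructed_vessels, result.prior_revascularization)
```

Keeping extraction (`extract_stenosis`) separate from classification (`classify`) mirrors the two stages described above, so the clinical rules can be reviewed and adjusted across validation cycles without touching the text-processing code.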
Empowering Tomorrow's Healthcare
This case study demonstrates how high-quality evidence can be generated from de-identified patient data to support clinical and commercial decision-making.
