February 1, 2025
Artificial Intelligence (AI) has shown significant promise in assisting with the diagnosis of complex women’s health conditions, such as endometriosis. However, the success of these AI tools depends heavily on the quality of the data they are trained on. A key concern in the development of AI diagnostic systems is how to ensure that low-quality or inaccurate images do not negatively impact the effectiveness of these technologies.
A new study led by Ph.D. student Alison Deslandes from the IMAGENDO team at the University of Adelaide’s Robinson Research Institute has taken an important step toward addressing this issue. Deslandes has developed a quality scoring system for gynecological images used by diagnostic AI algorithms. The research, published in the journal Ultrasound in Obstetrics & Gynecology, sheds light on the critical importance of high-quality data for AI tools, particularly when dealing with ultrasound images of the uterus and ovaries.
“The development of valuable AI tools to assist with ultrasound diagnosis is dependent on algorithms being developed with high-quality data,” said Deslandes. “Research has already shown that the performance of deep learning systems is significantly reduced when applied to images from low-cost ultrasound machines with lower image quality.”
To tackle this challenge, the team conducted a detailed analysis of transvaginal ultrasound (TVUS) images, focusing on the quality of images used to train AI models. A set of 150 TVUS images was scored by six professionals across five key quality factors: correct depiction of the anatomy, the view of anatomical structures, image optimization, interpretability for diagnosis or pathology, and overall clarity. Each image was rated on a scale from one to four, with four indicating optimal quality and one indicating an image rejected as unusable due to poor quality.
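For readers who want a concrete picture of how such a rubric might be encoded, here is a minimal sketch. The factor names, validation logic, and the choice to summarize an observer's ratings by their mean are illustrative assumptions, not details taken from the published scoring system.

```python
# Hypothetical encoding of a five-factor, four-point quality rubric.
# Factor names are illustrative, not taken from the published study.
FACTORS = (
    "anatomy_depiction",   # correct depiction of anatomy
    "anatomical_view",     # view of anatomical structures
    "optimization",        # image optimization
    "interpretability",    # usable for diagnosis or pathology
    "clarity",             # overall clarity
)

def score_image(ratings: dict[str, int]) -> float:
    """Validate one observer's five factor ratings (1 = rejected,
    4 = optimal) and return their mean as a single summary score."""
    if set(ratings) != set(FACTORS):
        raise ValueError("each of the five factors must be rated exactly once")
    if any(not 1 <= r <= 4 for r in ratings.values()):
        raise ValueError("ratings must be on the 1-4 scale")
    return sum(ratings.values()) / len(ratings)
```

Averaging is only one possible summary; a real pipeline might instead keep the five factor ratings separate or treat a single rejected factor as grounds to discard the image.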
While the scoring system developed by Deslandes and her team holds promise, the study revealed some challenges in achieving consensus on image quality. “What we found was only poor to moderate agreement when our paired observers looked at the images, and mostly weak to moderate levels when the individual observers re-examined the images after more than a week,” Deslandes explained. “Interpreting ultrasound image quality carries a level of subjectivity, which may explain the weaker-than-expected results.”
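Agreement between paired observers like this is conventionally quantified with a chance-corrected statistic such as Cohen's kappa. The study does not publish its analysis code; the following is a generic, self-contained sketch of unweighted Cohen's kappa for two raters on the one-to-four scale, with made-up example ratings.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b, categories=(1, 2, 3, 4)):
    """Unweighted Cohen's kappa for two raters scoring the same images.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement and p_e the agreement expected by chance from each
    rater's marginal rating frequencies.
    """
    if len(ratings_a) != len(ratings_b):
        raise ValueError("both raters must score the same images")
    n = len(ratings_a)
    # Observed agreement: fraction of images where the raters match.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement under independence of the two raters.
    ca, cb = Counter(ratings_a), Counter(ratings_b)
    p_e = sum((ca[c] / n) * (cb[c] / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical ratings for eight images from two observers:
k = cohens_kappa([4, 3, 3, 2, 1, 4, 2, 3],
                 [4, 3, 2, 2, 1, 3, 2, 4])  # kappa ≈ 0.49, "moderate"
```

On ordinal scales like this one, a weighted kappa (penalizing a 1-vs-4 disagreement more than a 3-vs-4 one) is often preferred; the unweighted form is shown for brevity.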
This subjectivity poses a significant challenge for AI systems, which require consistent, high-quality data to perform accurately. AI is expected to assess image quality more objectively than humans can, but developing these systems still relies on human labeling, and the subjective nature of ultrasound image assessment makes those labels inherently noisy.
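One common way to reduce label noise from disagreeing observers (a generic technique, not something the study describes using) is to aggregate multiple ratings per image into a consensus label, for example by majority vote:

```python
from collections import Counter

def consensus_label(observer_ratings: list[int]) -> int:
    """Majority vote over several observers' 1-4 quality ratings
    for one image; ties are broken toward the lower (more
    conservative) quality score."""
    counts = Counter(observer_ratings)
    top = max(counts.values())
    return min(r for r, c in counts.items() if c == top)

consensus_label([3, 3, 4, 2, 3, 3])  # six observers -> consensus 3
```

More sophisticated approaches, such as modeling per-observer reliability or training with noise-robust loss functions, pursue the same goal of accommodating noisy labels.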
Deslandes emphasizes the need for reliable image quality scoring methods to ensure progress in the development of AI systems for gynecological diagnostics. “Reliable methods of image quality scoring are essential to further the development of AI systems in gynecological ultrasound,” she said, adding that AI systems capable of accommodating noisy labeling are also crucial for ensuring more accurate diagnoses.
This research points to a future where AI can play a transformative role in women’s health, but it also highlights the ongoing need for innovation in both image quality assessment and AI model development.
Disclaimer: This study has been published in Ultrasound in Obstetrics & Gynecology. While it provides valuable insights into the potential of AI in gynecological diagnostics, the findings of this research are based on preliminary data, and further studies are needed to refine the proposed image quality scoring system and its application in real-world clinical settings.