Evgeny Nikitin, the Head of AI at Celsus AI, discusses the latest advancements in machine learning for medical applications, data processing techniques, and the growing market for medical image analysis.
Artificial intelligence is increasingly being used to analyze medical images: services that gained popularity during the COVID-19 pandemic are now being joined by tools for cancer detection and emergency medicine. Last year, Celsus developed a system for analyzing hemorrhagic strokes based on brain CT scans, which is already in use in Moscow clinics and has analyzed over 100,000 images. Evgeny Nikitin, the Head of AI at Celsus, discusses the implementation of this project and its position among the company's other developments.
— Last year, you took part in the Data Award and discussed your methods for working with data, constructing data annotation pipelines, and developing two machine learning models for analyzing mammographic and chest CT scan results. What progress has been made since then?
— On the business side, last year was challenging because of the political climate. Foreign markets closed, many skilled IT professionals left the country, and some foreign IT services and tools had to be replaced or abandoned. Despite these challenges, we were able to maintain our position in the Moscow experiment, with our four AI services (Mammography, Chest X-Ray, Chest CT, and Brain CT) consistently ranking at the top. Towards the end of the year, we achieved a significant milestone for both our company and the industry as a whole: our AI-powered software for analyzing mammography exams was purchased under a dedicated tariff within compulsory medical insurance (OMS). We have already begun processing exams under this contract.
In terms of working with data, the highlight for me this year was the launch of the data platform team. Its primary objective is to reduce the cognitive load on our machine learning specialists and simplify data analysis and model training pipelines. Among the new team's accomplishments are the automation of manual data movement processes, the creation of several internal services that enhance the speed and convenience of working with medical data, and the standardization of the annotation storage format.
— You mentioned your efforts to optimize the image annotation process, emphasizing the crucial role of the human factor. What progress has been made in this direction?
— I would point out three main directions. Firstly, as I previously mentioned, most manual operations have been automated. This includes uploading selected exams to the data labeling platform, running data tests, and uploading the verified data to the ML team's databases. These improvements have not only reduced the delivery time for annotated data, but also halved the number of errors and data issues.
Secondly, we worked out a case for purchasing pre-annotated datasets last year. While we take pride in the annotation team we have built, it is sometimes more cost-effective to buy off-the-shelf data. However, we had to navigate some challenges, such as the blind data selection process and legal issues. Nonetheless, we are pleased with the outcome.
Lastly, over the past year, we have conducted various experiments involving data annotated by multiple radiologists. This type of annotation is used to mitigate the impact of the human factor. We are scheduled to give a talk on this topic at the upcoming OpenTalks.AI conference in March. As a preview, we have made good progress using an approach that models the annotation style of each radiologist. Essentially, during model training, we provide the neural network with information about which doctor was responsible for a given annotation, allowing the model to learn each doctor's unique style.
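One simple way to give a network "information about which doctor was responsible for a given annotation" is to append an annotator code to the model input. The sketch below is only an illustration under that assumption: the interview does not say how Celsus encodes the annotator, and the radiologist IDs, feature values, and function names are hypothetical. In a real neural network a learned embedding would be a common alternative to the one-hot code used here.

```python
def one_hot(index, size):
    """Return a list of zeros with a single 1.0 at position `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def with_annotator(features, annotator_id, annotator_ids):
    """Append a one-hot annotator code to an image feature vector.

    The combined vector lets a downstream model condition its prediction
    on who produced the training label.
    """
    code = one_hot(annotator_ids.index(annotator_id), len(annotator_ids))
    return list(features) + code


# Hypothetical radiologist IDs and image features, for illustration only.
ANNOTATORS = ["dr_a", "dr_b", "dr_c"]
x = with_annotator([0.2, 0.7], "dr_b", ANNOTATORS)
print(x)
```

At inference time there is no annotating doctor, so such a model has to be queried with a chosen annotator code or with an average over all of them; which option works best is an empirical question.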
— When selecting new modalities to work on, what factors do you consider? Do you prioritize modalities that are in high demand or those that pose the greatest technical challenges? Or do you choose modalities based on data availability?
— When selecting new modalities to work on, we take a systematic approach and consider various factors. These include technical considerations such as data availability, necessary skills, hardware and processing speed requirements, and quality metrics. We evaluate whether the needed data is publicly available or can be obtained elsewhere, as well as how long the development process might take.
The second group of factors includes product and clinical considerations, such as market demand, potential use cases, key users, annual number of exams for this modality, disease mortality rates, and potential for early detection. We also consider business factors, including potential customers or partners, competitors, and funding opportunities. Lastly, we take into account whether the project is exciting to us, as we prefer to work on interesting and engaging projects.
At any given time we may be working on 5-10 hypotheses. Most of them do not survive past the research stage, and others are dropped once an MVP exists. Only the hypotheses that survive both stages go into commercial operation.
— What is the main difference between analyzing brain CT scans compared to, for example, mammographic and chest X-ray images?
— The primary distinction between CT and X-ray is that CT provides 3D data. As a result, developing neural network architectures, processing data, meeting hardware requirements, and annotating data are all more complex. Annotating a CT volume is typically ten times more time-consuming and costly than annotating an X-ray image.
Moreover, each medical domain has its unique features. For example, the brain is relatively symmetrical as a whole, and this information can be utilized to design effective ML systems. There are also differences in application scenarios: stroke detection is most often related to emergency medicine, which imposes additional requirements on the system's reliability and speed.
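To make the symmetry remark concrete, here is a minimal sketch of one way left-right symmetry could be turned into a signal: compare a slice with its own mirror image. This is an assumption for illustration, not a description of the Celsus system. It presumes the slice is already aligned so that the brain midline runs down the vertical center, and the intensity values are made up.

```python
def mirror_difference(slice_2d):
    """Absolute difference between a 2D slice and its left-right mirror.

    Large values mark regions where one hemisphere differs from the other.
    """
    return [
        [abs(a - b) for a, b in zip(row, reversed(row))]
        for row in slice_2d
    ]


# Toy 2x4 "slice": the second row has a bright spot on one side only.
slice_2d = [
    [0, 10, 10, 0],
    [0, 10, 60, 0],
]
asymmetry = mirror_difference(slice_2d)
print(asymmetry)
```

A map like this could, for instance, be supplied to a model as an extra input channel, so the network sees the asymmetry directly rather than having to infer it.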
— How was the work to develop a system for analyzing hemorrhagic strokes carried out?
— Well, the process of developing and launching new AI products is something that we are very familiar with. We conducted preliminary research on the possible scenarios, clients, and available datasets, and launched the annotation process. Of course, we also carefully analyzed all the diagnostic requirements of the Moscow experiment. The first version was ready in a few months, and we immediately went through all the necessary procedures and started working in Moscow clinics.
— What results did you eventually achieve?
— According to the latest external calibration test results in the Moscow experiment, the system for detecting signs of intracranial hemorrhage achieved a ROC-AUC of 0.98, which is almost a perfect result. So far, our system has analyzed more than 100,000 CT scans.
Sure, there is still room for improvement. For example, the system does not calculate the subarachnoid hemorrhage volume very accurately. This year we will also be adding two new pathologies to the system: ischemic strokes and brain tumors.
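For readers unfamiliar with the ROC-AUC figure quoted in this answer: it can be read as the probability that a randomly chosen scan with hemorrhage receives a higher score than a randomly chosen scan without one, so 1.0 is perfect ranking and 0.5 is chance. The sketch below computes it directly from that definition. The labels and scores are invented for illustration and have no connection to the Moscow test data.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up example: 2 scans with pathology (label 1), 3 without (label 0).
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

This pairwise form is quadratic in the number of scans, which is fine for a small illustration; library implementations typically use a sort-based method instead.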
— How many mosmed.ai users have chosen your solutions? What is the feedback?
— According to the latest survey of Moscow radiologists (conducted in December 2022), our systems were chosen by 51 organizations for analyzing brain CT scans, 15 for mammography, 10 for chest X-rays, and 19 for fluorography. Thanks to the experiment organizers, we receive feedback from several sources: a Telegram chat with doctors, a monthly clinical evaluation procedure, and face-to-face meetings with doctors.
Every year, more and more doctors are willing to integrate artificial intelligence systems into their practice. Although these systems are not yet perfect, radiologists are finding ways to use them to their advantage. For example, some doctors are employing our system for mammography scans to prevent overlooking poorly visible groups of malignant calcifications.
— What is your business model?
— The market for artificial intelligence in radiology is still in its early stages, and we and our competitors are constantly exploring various ways to monetize our services. Currently, three scenarios stand out as the most relevant.
Firstly, our AI system can be monetized through payment by OMS (compulsory medical insurance in Russia). In this scenario, the AI system acts as a "second opinion" or decision-making aid system, and payment for its "services" is included in the OMS tariff.
Secondly, the AI system can automatically identify scans where there is a very high degree of certainty (over 99.9%) that there is no pathology. A report is generated for such examinations, which the doctor simply has to approve. We charge a fee for each report generated.
Finally, it's possible to retrospectively analyze large databases of radiology scans. For example, during the pandemic, a significant number of chest CT scans were performed. AI systems can analyze these studies for other types of pathologies, such as cancer, at a relatively low cost.
Although the Moscow experiment still generates the majority of our revenue, we expect the situation to change dramatically in 2023 as we explore new monetization strategies and expand our customer base.
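The second scenario above, in which only scans with a very high certainty of no pathology get an auto-generated report, amounts to a threshold rule on the model output. The sketch below assumes the model returns a calibrated probability that an exam is normal; the exam IDs, the function name, and the strict "greater than" comparison are illustrative choices, and only the 99.9% figure comes from the interview.

```python
NORMAL_THRESHOLD = 0.999  # "over 99.9%" certainty of no pathology


def triage(exams, threshold=NORMAL_THRESHOLD):
    """Split exams into auto-report candidates and exams for full reading.

    `exams` is an iterable of (exam_id, p_normal) pairs, where p_normal is
    the model's estimated probability that the exam shows no pathology.
    """
    auto_report, full_reading = [], []
    for exam_id, p_normal in exams:
        if p_normal > threshold:
            auto_report.append(exam_id)   # doctor only approves a draft report
        else:
            full_reading.append(exam_id)  # doctor reads the exam as usual
    return auto_report, full_reading


# Hypothetical exams for illustration.
auto, manual = triage([("exam-1", 0.9995), ("exam-2", 0.97), ("exam-3", 0.999)])
print(auto, manual)
```

The rule is only as trustworthy as the probability behind it, so the threshold would normally be validated on held-out data, and the doctor's approval step described in the interview remains the final check.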
— You are not alone in the market; medical image analysis has become quite a popular research area. What is the competitive environment like? What do you see as your strengths?
— While there are numerous companies in the AI medical systems market, it seems that a consolidation may be on the horizon, with only the clear leaders and promising newcomers likely to remain. Fortunately, our systems' quality and accuracy are our main advantages. This is evidenced by the results of the Moscow experiment and the feedback we have received from medical professionals in other regions. Our investment in our ML team, development team, and data annotation has enabled us to achieve this level of excellence.
Another significant advantage is our well-established process of hypothesis testing and launching new projects. We have a high degree of confidence in the success of new products that we launch. This is due, in part, to the well-established mechanisms we have in place for collecting and prioritizing user feedback. This enables us to make changes to the product quickly, to ensure it is as useful as possible for medical professionals and organizations. This process applies not only to improving the models' quality and reducing their errors but also to the wording used in the text reports, for instance.
— You have talked about entering regional markets and even about plans to work in other countries. Is it working out?
— One of our major goals is to expand our regional presence, and we started working on this last year. Currently, Celsus solutions are being used in over 30 regions in Russia, counting both full implementations and pilot projects.
Although the situation with foreign markets has become more complicated for obvious reasons, our commercial team is actively preparing for implementation in several countries, such as the UAE, Egypt, and Morocco.
— In which direction do you plan to continue your work?
— Our main commercial objective for the year is to expand the geographical reach of our solutions and close several major regional implementation deals. As the company grows, we are generating many new ideas, including ones outside of the radiology field, and we need funding to implement them.
We are committed to continuously enhancing the functionality of our services. For example, our Chest CT system is already capable of detecting eight abnormalities, our mammography solution can assess not only the probability of cancer but also the quality of the technician's work, and our brain CT model will soon be able to detect three types of abnormalities.
Our main technical goal remains the same, which is to improve our systems. We understand that achieving 100% quality and readiness for such systems is an impossible task, but every incremental improvement brings us closer to a future in which doctors cannot imagine their work without the assistance of AI.