Artificial intelligence, medicine and creativity. Interview with a data scientist at a med tech startup


Artificial intelligence is one of the most promising areas of information technology. Thanks to complex neural network architectures and the ability to process huge amounts of data, it can be applied (and is already being used) in many spheres of human life, performing tasks that were previously considered a human prerogative. But what about healthcare? How is artificial intelligence being developed and applied there?

Since 2017, our Celsus team has been developing and deploying artificial intelligence systems that help radiologists identify pathologies in medical images more accurately and quickly. We interviewed our colleague Masha Garets, who has worked at Celsus as a data scientist for three years. She told us what it's like to work in the medical industry as a technical specialist, and why this industry is so unusual.

Tell us in a few words what you do on the project.

I am an ML engineer, data scientist and team leader. There are four people on my team, and we work on lung fluorography and x-ray at Celsus. The goal of our work is to give the doctor an additional (second) opinion about the patient’s diagnosis.

The process looks like this: we take an x-ray image, process it with our algorithms, and highlight the “zones of interest”, the areas the doctor should take a closer look at because they may contain malignant neoplasms or precancerous changes. Our product is a “helping hand” for the doctor, which, we hope, makes their work easier.

I play two roles in this process: as a team leader, I plan the work and coordinate the team, and as an ML engineer, I build and train the algorithms.

Is there an element of creativity in your work?

There are a lot of non-trivial tasks, both organizational and technical.

Our work involves testing many hypotheses. We don't know in advance which hypotheses will improve the product metrics and which won't. Under these conditions we cannot accurately estimate how long the work will take, so the standard methodologies that IT companies use do not suit us. We selected best practices from different methodologies and adapted this “hybrid” for ML development.

In technical tasks, there is probably even more room for creativity. I remember how we turned fluorographic images into x-rays. More than a year ago, our service worked only with fluorography, and we were asked to add support for processing x-ray images. As usually happens, we didn’t have a labeled dataset of radiographic data for training the neural network. We analyzed the images and noticed that the fluorographic data were very similar to x-rays, only of “poorer” quality and less sharp.

We came up with the idea of turning fluorography into radiography so that we could reuse the existing dataset, which already had pathology annotations. We used a neural network that translates one domain into another: for example, it can turn a zebra into a horse and vice versa. Our experiment showed that the fluorographic studies did become clearer, but a complete transition to the other domain still didn’t work.
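The zebra-to-horse example suggests unpaired image-to-image translation, where a common training signal is cycle consistency: an image translated to the other domain and back should match the original. The interview does not name the model or its losses, so the following is only a minimal sketch of that idea. The per-pixel functions `to_xray`, `to_fluoro_exact` and `to_fluoro_poor` are hypothetical toy stand-ins for trained generator networks.

```python
def cycle_consistency_loss(pixels, forward, backward):
    """Mean absolute error between pixels and their round trip A -> B -> A."""
    restored = [backward(forward(p)) for p in pixels]
    return sum(abs(r - p) for r, p in zip(restored, pixels)) / len(pixels)


# Hypothetical toy "generators" acting on single pixel intensities in [0, 1].
def to_xray(p):
    # Assumed stand-in for the fluorography -> x-ray generator.
    return 0.5 * p + 0.25


def to_fluoro_exact(q):
    # Exact inverse of to_xray: the round trip restores the input.
    return (q - 0.25) / 0.5


def to_fluoro_poor(q):
    # A poor inverse (identity): the round trip does not restore the input.
    return q


fluoro_pixels = [0.0, 0.25, 0.5, 1.0]  # illustrative values, not real data
loss_exact = cycle_consistency_loss(fluoro_pixels, to_xray, to_fluoro_exact)
loss_poor = cycle_consistency_loss(fluoro_pixels, to_xray, to_fluoro_poor)
```

A low loss only says that the round trip preserves the image. It does not guarantee that the intermediate result looks like a real x-ray, which is consistent with the partial success described above.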

What is the difference between the specifics of working with medical data and any other data?

Let’s start with the fact that medical data is very difficult and painful to work with. Radiography and fluorography produce images that don’t show objects inside the lungs clearly, because they are two-dimensional projections that display the lungs in a single plane.

It is often difficult for doctors to unambiguously make a diagnosis based on a picture without additional information about the patient: age, gender, body temperature, history of previous diseases, and so on. This leads to a conflict of opinions between doctors. For example, if we ask five physicians to label the same image, it is likely that we will not get full agreement on labeled pathologies.

In the future, we ML engineers will have to figure out how to use the experience of all doctors when developing products, and how to eliminate the human factor (for example, a tired doctor labeling a pathology with the wrong class).

An additional challenge in the medical field is model testing. For ML specialists without medical training, it is very difficult to assess the output of neural networks visually. So we try to interact with doctors as often as possible, arrange joint sessions and collect feedback on how our service performs.

Has your opinion about doctors changed in the course of work?

There is a stereotype: if nobody was rude to you at the clinic, then you haven't really been to the clinic. My experience with doctors completely refutes this stereotype. They are usually very polite and tactful people who patiently answer all our numerous questions. And doctors are not at all against technological development in medicine. On the contrary, they are moving it forward by collaborating with developers.

Have you picked up any occupational habits because your work is related to medicine?

Yes! In three years of working with lung images (fluorography and x-rays), I have seen the most severe stages of scoliosis, which are really frightening. So every time I catch myself slouching, I instantly straighten my back. And I don’t smoke either.

In general, before working at Celsus, I thought less about cancer and didn’t know the disease statistics for women and men. Now I know that breast cancer is the most common cancer among women and that women need a breast check once a year (up to age 40 with ultrasound, after 40 with mammography). This is important to remember and should not be neglected.

Is a sense of mission important to you in your work?

Yes. I have a global goal: for Celsus to be a convenient, high-quality service, so good that it replaces the doctor. Just kidding. Of course, no AI service can replace a doctor. I would like Celsus to be a worthy assistant whose opinion the doctor listens to and which the doctor uses not because they are forced to, but because it really helps. And in such a collaboration, Doctor-Celsus, we will not miss a single malignant neoplasm.

What is the biggest obstacle to achieving this result? Technical nuances or conservatism of the medical field, ignorance?

As for business specifics, marketing and the like, I cannot say: this is not my area of competence. But we are participating in an experiment run by the Moscow Department of Health, where AI services are used in real clinical practice. So there is interest in using such technologies in radiology.

The reason we can't make a perfect product right now is that it’s hard for us to get consistent, verified pathology annotations for training neural networks. Different doctors often have different opinions about the same study. Once we gave the data for labeling to five doctors and received five different opinions. Because of this, the final annotation ends up as a kind of mix, a New Year tree of pathologies.

It seems to me that it is the lack of “ground truth” (true labels) in medical images that prevents a super-accurate product from being made. So the dream of any developer of medical AI is the so-called “golden dataset”, that is, a reference dataset.
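The disagreement between annotators described here can be made measurable. The interview does not say how Celsus merges conflicting labels, so the following is only a minimal sketch of two common baseline measures: a majority vote and the share of annotator pairs that agree. The class names and the five labels are hypothetical.

```python
from collections import Counter


def majority_label(labels):
    """Return the most common label and the share of annotators who chose it.

    Ties go to the label seen first; a real pipeline would need an
    explicit tie-breaking policy.
    """
    label, votes = Counter(labels).most_common(1)[0]
    return label, votes / len(labels)


def pairwise_agreement(labels):
    """Fraction of annotator pairs that assigned the same label."""
    n = len(labels)
    total_pairs = n * (n - 1) // 2
    same = sum(
        labels[i] == labels[j] for i in range(n) for j in range(i + 1, n)
    )
    return same / total_pairs


# Hypothetical labels from five doctors for one region of one image.
doctor_labels = ["nodule", "nodule", "infiltrate", "nodule", "normal"]

label, share = majority_label(doctor_labels)
agreement = pairwise_agreement(doctor_labels)
```

In this made-up case three of five doctors agree, so the vote picks "nodule" with a share of 0.6, yet only 3 of the 10 doctor pairs agree with each other. A region like this would be a candidate for expert review rather than direct use as a training label.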

What advice can you give to those who are starting or are just planning to start their journey in Data Science?

To begin with, it would be good to make sure that you are really interested in machine learning (rather than carried there on a wave of hype because you wanted to earn 300 thousand rubles per nanosecond, when in fact you dream of being an artist). To do this, you need at least to study what machine learning is, how it works and what tasks it solves.

If you are ready to dive deep into machine learning and take courses in Python, statistics, calculus, machine learning and more, I would advise you to pay a lot of attention to practice and do pet projects (personal projects done for the love of it). They let you gain real work experience and go through the entire life cycle of a product. In pet projects you are not limited by anything: you can come up with the craziest idea and implement it.

I had pet projects of my own, too. I was studying language models and recurrent networks and, of course, wanted to try them in action. I set myself a goal: to generate song lyrics using a recurrent model trained on all the songs of the Black Star Mafia label. I had ambitious plans, and even agreed with musician friends that they would play this song at their concert. But, as it turned out, Black Star Mafia doesn't have that many songs, and there wasn't enough data to train the model. The result was complete nonsense, but I gained important experience: I learned to set tasks for myself, analyze the subject area, plan the work and test the model.

When I interviewed at Celsus, I had no professional experience in ML/DL, and it was my pet projects that helped me pass the interview.