Classifier based on 3D vector analysis (continued) and search engine. This work is a continuation of all my previous work on this topic. In it, I continue to analyze the possibilities of classifying of complex images, such as ultrasound images in particular. In some way, all the objects in this world are similar. We can ask different people about the similarity of the same objects and hear the different opinions. When we talk about the computer, this does not make any sense to ask it a similar question at all, because a people uses some kind of irrational extrapolation of data. But computer is not. This is due to the complexity of the human brain, which can be roughly divided into the conscious and subconscious, and if complicated, there can be several consciousnesses, this can be quite clearly observed in children up to 4-5 years old, when they can easily switch from one mood to another for a split second - a second ago, baby was crying, and after a second he or she was already smiling. For an adult, mentally healthy person, this is problematic, since he has a dominant consciousness, the rest are suppressed. The subconscious is an extremely mysterious and unexplored part of our mind. It can be compared with the controlling authority, which constantly with you, monitors you and sometimes through some of your emotions gives you solutions, without arguments, appeals, sources of these data, any pro and contra - i.e. imperatively. The computer doesn't have it; I’m not talking about the pitiful attempts to simulate these processes called artificial intelligence or artificial neural networks. Yes, to a computer can also be given certain parameters for extrapolation, but the computer does not suffer with irrationality. Therefore, the specified parameters of the degree of similarity will always be known to the computer. Therefore, it is not necessary to ask the computer whether these or other objects are similar, ask as far as (%) these or other objects can be considered similar! Ultrasound images of cases of verified mammary gland lesions were selected for this publication from the site external link From two categories: benign lesions: external link and malignant: external link The files have been renamed, including the name includes their GET variables from the URL on the site - i.e. all the names of the files of benign images began with "295_" , and the malignant ones with "304_" . Further, these images were cropped to eliminate unnecessary objects (background), such as a black background on the sides and even just large areas of normal breast tissue around the lesions, as far as it was possible because of the shape of these lesions. Several images were excluded from the dataset, since they contained additional graphical information, such as arrows and measurement markers. In this paper, to the type of classification already announced in a previous publication - 1 - according to the percentage of matched vectors in the compared images; I added another type: 2 - by the percentage of the largest found region of interest (ROI) to the target image (or in other words, the template image). To determine the range of coincidence of images from this dataset, a step of 5% was chosen. According to the first type of classification in this dataset, can be find groups of similar images only within 15%. According to the second type of classification - up to 70% It is clear that the higher threshold percentage corresponds to the smaller group. And as the threshold decreases, the groups become larger, but less selective, i.e. both benign and malignant tumors get into them. So the upper thresholds can be considered more specific, and the lower more sensitive. The most interesting thing is that in the layout of the second type of classifier with a dataset of 24 images and with a threshold of 70%, I got the same tendency of the "tree" , which was in the previous version of the classifier, i.e. in 1 type with hemangiomas - 3 groups, one of 3 images, and 2 of 2 images. All three groups have 1 common image - "parent" - which links the remaining two either into 2 groups of 2 or into one group of 3. All three images are visually similar and all belong to the same category "304" , i.e. malignant (see attached file 1). Certainly, as the threshold is lowered, as it has already been described, specificity decreases, and sensitivity increases, which means that benign and malignant images can be found in one group - because all of them have the background of normal breast tissue, which unites them all. Further, on the basis of existing solutions, it was not difficult to implement what I mentioned in the very first publication. I.e. about the prototype of the search engine, like Google.Images, but to search for medical images on the base of the sample image. Imagine such a situation, experienced doctors hardly need it, of course, that the beginning ultrasound doctor finds some kind of lesion, for example, in the mammary gland, and does not know how to properly describe it, doubts. Of course, you can just go to the wonderful site ultrasoundcases.info, where there is a directory with cases and to see all of them, if you have time for this? I think this is the most correct decision! But suppose you do not have this time and want to get a quick answer to the eternal question - what is it? And here just such a search engine can come to the aid of a young doctor (see the attached file 2). It' s very simple as you can see! Just uploaded the image and got similar with the percentage (probability) of matching your findings to those already in the database! Moreover, such a database is easy to update - just add pictures with the necessary code marks in the file name to database directory on the hard disk drive! Very simple, but not quite! Because, as I wrote in previous articles, to use this algorithm in the full power, you need a very powerful computer with a large amount of RAM (terabytes) and a large hard disk capacity (petabytes or more), and moreover, a very fast hard disk drive such as M.2 or faster. Otherwise, technically, everything is simple! There remains one more, the most important component, not directly related to the operation of the algorithm, but having the most direct relation to its results - the database! I.e. the algorithm cannot work correctly without a database, namely, a set of morphologically verified images! I am grateful to the doctor Taco Geertsma for permission to use images from ultrasoundcases.info |