Unfortunately, the MDP (Markov decision process), which is a core part of the learning agent in an autonomous car, can't help us develop an inference system that predicts whether a patient is Covid-19 positive based on symptoms or an X-ray image. Of course we have multiple options (for example, classification using logistic regression, or a CNN), but both depend on data. Oh, not just data: reliable data, and lots of it. I see both as loosely related to Bayes' theorem, which is in fact quite old (more than two centuries), and both depend on a large dataset that is unbiased and distributed fairly enough to cover almost all relevant cases.
There is a strong reason why we are still not confident in cancer prediction models, whether for identifying the disease or for customizing its treatment. (I am not from a medical background, but as I understand it, ML is still struggling to prove its hypotheses because of the complex genome structure of cancer and its many variant types.)
Now let's talk about Covid-19. Yes, it's not bats, it's humans who caused this worldwide spread (based on reliable information from multiple authentic, publicly available sources, which I will use in my upcoming Covid chapter). For now the important point is this: we do have multiple options for building a prediction model that can remotely scan patient data or X-ray images and detect and confirm Covid-19 positive cases, instead of the traditional testing process, which is time consuming and not accessible everywhere. So why are we not able to build a model that gets even close to that accuracy? As one option, transfer learning can be leveraged: we can build a CNN that takes X-ray data as input and predicts the probability that the patient is positive.
I guess you got it. YES, we don't have enough of the data we need. For the last two weeks I have used multiple search engines (public, Scholar and research search engines) to look for Covid-19 datasets. Plenty of datasets are available with feature information, and with X-ray images as well. However, given the way Covid-19 is mutating (or pretending to, by showing different symptoms every few days), I am not confident in the limited data available. The reasons are both reliability and dataset size.
As I have to move forward, after struggling a bit I was able to filter out a small dataset that I will use to build a prediction model. I will write up all my exercises and the process in upcoming chapters, along with both the dataset and the results.
So now to the EDA part. I have checked Kaggle, data.gov.in, arxiv.org, scholar.google.com, nih.gov, sciencedirect and a few more. A few datasets were a bit complex, so I was hesitant to put effort into them without being sure they would pay off. Most of the available datasets do not contain feature variables. They contain status data (country, date, confirmed cases, etc.), which is not helpful in this case.
So, to take one step at a time, I picked one medium-sized dataset of around 5500+ chest X-ray images in two classes (pneumonia) from Kaggle (https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia). It is at least available to compare against, and separate from, the limited Covid-19 X-ray data, since the symptoms are similar. COVID-19 attacks the epithelial cells of our respiratory tract, so I also need X-ray images of Covid-19 positive cases to analyze the health of the patient's lungs. I found one such set on GitHub (https://github.com/ieee8023/covid-chestxray-dataset) with a number of X-ray images. It is a mixed repo that also has X-ray images of MERS, SARS and other conditions, which can be used for the initial training of a deep learning classification model.
It is not as easy as we think, especially when we try to run an IT-based solution in a specialized medical area that has so far been alien to most of us. So we need to understand the data, which means we need to study the X-ray images and work out the following:
- Understand the metadata, so we know which fields we actually need.
- Filter only the Covid-19 case data.
- There are multiple views of the images (e.g. PA, L, AP, Axial, AP Supine, Coronal). We need to consider the right category of image. Here we need help from a medical professional to clarify the categories and select the right one, or we just google it.
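To make the filtering steps above a bit more concrete, here is a minimal sketch using only Python's standard library. It assumes the metadata comes as a CSV with columns named `filename`, `finding` and `view`. Those column names, the helper `select_covid_rows` and the sample rows are my assumptions for illustration, not real records, so check them against the actual metadata file before relying on this. Choosing PA as the view to keep is also just a placeholder until a medical professional confirms the right category.

```python
import csv
import io
from typing import Dict, Iterable, List, Sequence


def select_covid_rows(
    rows: Iterable[Dict[str, str]],
    finding_keyword: str = "COVID-19",
    allowed_views: Sequence[str] = ("PA",),
) -> List[Dict[str, str]]:
    """Keep rows whose finding mentions the keyword and whose view is allowed."""
    selected = []
    for row in rows:
        # Missing or empty cells are treated as empty strings.
        finding = (row.get("finding") or "").upper()
        view = (row.get("view") or "").strip()
        if finding_keyword.upper() in finding and view in allowed_views:
            selected.append(row)
    return selected


# Made-up sample rows, only to show the shape of the data (hypothetical).
SAMPLE_CSV = """filename,finding,view
img_001.png,COVID-19,PA
img_002.png,SARS,PA
img_003.png,COVID-19,AP Supine
img_004.png,COVID-19,PA
img_005.png,COVID-19,L
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
covid_pa = select_covid_rows(rows)
print([r["filename"] for r in covid_pa])  # ['img_001.png', 'img_004.png']
```

With a real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an opened metadata file, and `allowed_views` can be widened once the right views are known.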
There are only around 20+ X-ray images of Covid-19 positive cases. So THIS DATA IS NOT ENOUGH. And even getting enough image data is not enough by itself. There are a lot of prerequisites as well: removing any kind of bias and converting all images to the same channel layout and size. Many libraries make these operations easier (I will use them going forward). So we will go step by step.
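Just to show what "same channel layout and size" means in practice, here is a rough pure-Python sketch. It is not the pipeline I will actually use; a library such as Pillow or OpenCV would normally do this work. The function names, the nested-list image representation and the tiny 2x2 test image are all my own assumptions for illustration. The grayscale step uses the common ITU-R BT.601 luma weights, and the resize is plain nearest-neighbour sampling.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]   # one RGB pixel, each channel in 0-255
GrayImage = List[List[int]]    # rows of single-channel intensities, 0-255


def to_grayscale(rgb_rows: Sequence[Sequence[Pixel]]) -> GrayImage:
    """Collapse an RGB image to one channel using BT.601 luma weights."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_rows
    ]


def resize_nearest(image: GrayImage, width: int, height: int) -> GrayImage:
    """Resize by nearest-neighbour sampling (no interpolation)."""
    src_h, src_w = len(image), len(image[0])
    return [
        [image[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def normalise(
    rgb_rows: Sequence[Sequence[Pixel]], size: Tuple[int, int] = (4, 4)
) -> GrayImage:
    """Bring any RGB image to one channel and a fixed (width, height)."""
    width, height = size
    return resize_nearest(to_grayscale(rgb_rows), width, height)


# A tiny made-up 2x2 RGB image: black, white / red, blue.
tiny = [
    [(0, 0, 0), (255, 255, 255)],
    [(255, 0, 0), (0, 0, 255)],
]
out = normalise(tiny, size=(4, 4))
for row in out:
    print(row)
```

Every image that goes through `normalise` comes out with one channel and the same dimensions, which is what a CNN input layer expects. Real X-ray preprocessing would also need care around aspect ratio and intensity scaling, which this sketch ignores.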
Here I would like to close this chapter, having set the context above about the data and its challenges. In my next chapter I will cover which features to consider based on the metadata, getting started with the neural net, choosing the number of layers, data distribution, image data transformation, and the steps forward.
Finally, I would like to draw your attention to the chapter name, “Need of Master Algo”. Almost all DS algorithms are connected to, or extended from, each other in one way or another (I am biased, and I am referring to Naïve Bayes here). It is quite possible that one master algorithm exists, like a master key that can open any kind of lock, for exactly this kind of critical situation, when all of mankind desperately needs it.