This blog explores what Machine Learning (ML) is and its different variations. We will cover the three types of ML and present real-life examples of each from the pharmaceutical industry. We will also cover the SAS Data Mapper Tool, which applies ML algorithms to data mapping. In addition, we will touch on the challenges of data science and the regulatory processes for approval of AI/ML products.
Before we dive into ML, let's first define data science. Data science is a broad umbrella covering every aspect of data processing, not only the statistical or algorithmic aspects. Data science includes:
Machine learning is an application of artificial intelligence (AI) that essentially teaches a computer program or algorithm to learn a task automatically and improve from experience without being explicitly programmed. It focuses on the development of computer programs that can access data and use it to learn for themselves. Programmers design and code the system so that it can perform iterative improvements independently. Most commonly, three types of ML are distinguished: unsupervised learning, supervised learning and reinforcement learning.
Typically, the ML process consists of:
Unsupervised learning is the opposite of supervised learning in that the algorithm learns by itself and is not given pre-programmed labels. The algorithm interprets the data on its own and learns to group, cluster or organize the input. By discovering these patterns and restructuring the data, it can provide insights that human analysis may miss, or that have not been pre-assigned as they would be in a supervised learning algorithm.
The algorithm works in a similar way to how humans learn: we identify objects or events of the same type or category and judge the degree of similarity between them. It is commonly used in marketing automation; one of the first successful use cases was Amazon suggesting products after analyzing previous purchasing history, followed by Netflix and YouTube suggesting what piece of content to watch next.
One area useful to medicine and medical research is the analysis of research papers. Given a large database of all the papers on a subject, an unsupervised learning algorithm could group papers in such a way that it was always aware of progress being made in different fields of medicine. If your paper was connected to the network, then as you started to write, the ML could suggest certain references you may want to cite, or other papers you may wish to review to help prove your own hypothesis. Think how powerful this type of ML could be in a clinical trial setting, and how important clinical data transparency would become: if the data shared by drug companies were in the public domain and hooked into an unsupervised learning environment, it could help future drugs become more successful. This potential extends beyond clinical trials to drug discovery, and is being exploited by companies such as BenevolentAI, which recently formed a partnership with AstraZeneca.
However, this type of ML does not support the prediction of future outcomes.
Illustration of Unsupervised Learning:
Spread of Zika Virus
Input data is entered for patients suffering from Zika virus in various locations across India.
The machine learning algorithm analyses the data and clusters it into coastal-region patients and inland-region patients.
Based on the clustering density, we can identify where the Zika virus has spread the most, and awareness campaigns can be launched in the affected regions.
This example illustrates that in unsupervised learning only clusters are formed; we cannot use this type to predict a future outcome.
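The clustering in the Zika illustration can be sketched in code. The following is a toy example only: the (longitude, latitude) pairs are invented, and a minimal k-means loop is written out by hand so the grouping step is visible. No labels are supplied; the two clusters emerge from the data alone.

```python
# Minimal, pure-Python sketch of the Zika illustration: grouping patient
# locations into two clusters with a tiny k-means loop. All coordinates
# are invented for illustration only.
import math

def kmeans(points, k, iters=20):
    centroids = [list(p) for p in points[:k]]   # naive init: first k points
    labels = [0] * len(points)
    for _ in range(iters):
        # assign each point to its nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # recompute each centroid as the mean of its assigned points
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(dim) / len(members)
                                for dim in zip(*members)]
    return labels

# Invented (longitude, latitude) pairs: a coastal-like and an inland-like group.
patients = [(72.8, 19.0), (72.9, 18.9), (73.0, 19.1),
            (77.2, 28.6), (77.4, 28.5), (77.1, 28.7)]

labels = kmeans(patients, k=2)
print(labels)
```

The first three patients end up in one cluster and the last three in another, without the algorithm ever being told which region is which.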
Real life example of Unsupervised Learning:
Supervised learning is the easiest type of machine learning algorithm to understand and implement, and it is very popular. It has been compared to a teacher educating a small child using learning cards.
The algorithm learns from example data, where each example is given a numeric value or string label, such as a class or tag. Large amounts of labelled data can be loaded into the algorithm, which will later predict the correct response for new examples based on its historical learning: because each training example carried a label, the algorithm learnt which label belongs with which input data.
Supervised learning is known as task-oriented because it requires many iterations to increase its ability to correctly predict the label for a never-before-seen example, learning continually from each new task performed. This type of ML solves classification problems, where the desired output is a qualitative variable. Think of face recognition on Facebook: when a photo is uploaded, it suggests tagging a friend because it has many historical tags linking that face to an account. Then there are regression problems, where the target output is a numerical value. An example is an algorithm that estimates average house prices in certain areas: as more and more houses in a geographic location enter the market, the algorithm receives more labelled input data tied to those coordinates.
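The house-price regression idea can be sketched with an ordinary least-squares line fitted to labelled examples. The (size, price) pairs below are invented and deliberately simple; a real model would use many more features than floor area.

```python
# Toy sketch of a regression problem: fit a least-squares line to invented
# (house size, price) pairs, then predict a price for an unseen size.
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b in one variable."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    return a, mean_y - a * mean_x

sizes  = [50, 70, 90, 110, 130]       # square metres (invented)
prices = [150, 210, 270, 330, 390]    # price in thousands (invented)

a, b = fit_line(sizes, prices)
prediction = a * 100 + b              # predicted price for a 100 m2 house
print(prediction)
```

Because the labelled examples here lie exactly on a line, the fitted model recovers it perfectly; with noisy real data the line would only approximate the trend.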
Illustration of Supervised Learning:
Lab tests for Anemia
The algorithm is trained on labelled data: Hb levels with the corresponding output of either Anaemic or Non-anaemic.
Input data for patients, with their Hb levels, is fed into the algorithm.
The algorithm analyses the patient data against the Step 1 training inputs.
When new data is entered, the machine recognises the Hb level and reports whether or not the patient is suffering from Anaemia.
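The four steps above can be sketched as a one-feature classifier that "learns" a decision threshold from labelled Hb values. The Hb numbers and class labels are invented for illustration and are not clinical guidance.

```python
# Toy sketch of the anaemia illustration: learn a cutoff halfway between the
# mean Hb of each labelled class, then classify new patients against it.
# All values are invented; this is not clinical guidance.

def train_threshold(hb_values, labels):
    """Learn a cutoff midway between the mean Hb of the two classes."""
    anaemic = [hb for hb, l in zip(hb_values, labels) if l == "Anaemic"]
    normal  = [hb for hb, l in zip(hb_values, labels) if l == "Non-anaemic"]
    return (sum(anaemic) / len(anaemic) + sum(normal) / len(normal)) / 2

def predict(hb, threshold):
    return "Anaemic" if hb < threshold else "Non-anaemic"

# Step 1: labelled training data (invented values, g/dL).
train_hb     = [8.5, 9.7, 10.9, 13.8, 14.6, 15.2]
train_labels = ["Anaemic", "Anaemic", "Anaemic",
                "Non-anaemic", "Non-anaemic", "Non-anaemic"]

threshold = train_threshold(train_hb, train_labels)

# Step 4: new, unseen patients are classified from the learned threshold.
print(predict(9.1, threshold))
print(predict(14.9, threshold))
```

The "training" here is just averaging, but the shape is the same as any supervised classifier: labelled examples in, a decision rule out, then predictions on new inputs.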
Real life example of Supervised Learning:
When a new protocol is uploaded, the AI, drawing on its training data, will flag potential barriers and help mitigate risk.
Reinforcement learning is when an algorithm learns from its mistakes, i.e. reward-based learning. It is similar to unsupervised learning in that the input examples lack labels and it is up to the algorithm to generate its own output value. The difference is that the algorithm must make an output decision which is then graded as positive or negative and carries a consequence, making the end result a prescriptive response rather than just a descriptive one as in supervised learning. When an outcome is positive, the algorithm learns from the reward and attempts to recreate the approach; a negative signal teaches it that a certain approach was incorrect, so it learns from this and tries to improve continually. In human terms, it is the process of trial and error.
Reinforcement learning has been trialled in algorithms taught to play video games. Google's DeepMind project created algorithms that could play old video games; taking Mario as an example, the AI would be set loose on a level and learn from its mistakes. Collecting points provided a reward signal, while losing lives by hitting enemies or falling down pits provided a negative one. Once the algorithm was shown the buttons with which to explore and interact with its environment, through repetition it slowly improved and sought behaviours that generate rewards. In the DeepMind example, the AI started off slowly and clumsily, losing lives and receiving game-overs, until it became better and better, mastered the game and rivalled the best human players.
An illustration as an example from healthcare sector:
Diagnosis based on X-ray
The machine learning algorithm is built on training data labelled with the correct diagnosis (Disease/Normal).
When we load a new x-ray image into this system, the model predicts the patient's condition based on its past learning.
Simultaneously, the doctor diagnoses the patient's condition by looking at the same x-ray and gives feedback of either "Correctly diagnosed by ML" or "Incorrectly diagnosed by ML".
This feedback (the reward) from the doctor improves the algorithm for future diagnoses, to the point where doctor intervention becomes minimal.
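The feedback loop above can be sketched as a tiny two-action bandit: the system picks a diagnosis, the doctor's +1/-1 feedback is the reward, and the learned value of each action is nudged toward the reward. The case stream, exploration rate and learning rate are all invented for illustration; real diagnostic RL systems are far more involved.

```python
# Toy sketch of reward-based learning from doctor feedback: an epsilon-greedy
# two-action bandit. All values are invented for illustration only.
import random

random.seed(0)

actions = ["Disease", "Normal"]
value = {"Disease": 0.0, "Normal": 0.0}   # learned value of each diagnosis
epsilon, lr = 0.2, 0.1                    # exploration rate, learning rate

for _ in range(200):
    # choose a diagnosis: usually the best-valued one, sometimes explore
    if random.random() < epsilon:
        choice = random.choice(actions)
    else:
        choice = max(actions, key=value.get)
    # doctor's feedback: in this simulated stream the truth is "Disease"
    reward = 1 if choice == "Disease" else -1
    # nudge the chosen action's value toward the reward received
    value[choice] += lr * (reward - value[choice])

best = max(actions, key=value.get)
print(best, {k: round(v, 2) for k, v in value.items()})
```

After enough feedback, the value of the correct diagnosis dominates and the system rarely needs correcting, mirroring the "minimal doctor intervention" end state described above.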
An example from Industry:
The company Brite Health leverages machine learning to better manage patient engagement in clinical trials, using apps for patients (or volunteers) and dashboards for site management.
The app and dashboard are trained on millions of clinical data points, engineered to identify key markers that tend to correlate with patient disengagement from research studies. The app notifies the user of these markers and of the next scheduled task and site visit, which encourages engagement and prevents disengagement. At the site, the dashboard receives disengagement notifications for all enrolled patients and helps monitor them to avoid any minor or major violations.
The app also provides personalized communication and study documents for reference through curated content and a conversational chatbot.
So this company uses supervised machine learning for patient engagement through the app and dashboard, and at the same time uses reinforcement learning through the chatbot.
Another example of machine learning, using its NLP techniques, is data mapping.
Mapping raw data to standards is one of the most challenging processes in the healthcare industry. The most important part of the mapping process is reusing the information collected from previously mapped studies and building on that inferred knowledge. Mapping is usually done to CDISC standards, as regulatory bodies such as the FDA require this when data is submitted for approval of a new IND.
The tool's auto-mapping and smart-mapping features, based on knowledge inference derived from machine learning algorithms, reduce the user's time and effort, leading to improvements in quality, efficiency and consistency. The tool provides a user-friendly interface for everything from mapping raw data to generating CDISC SDTM standards (including domain templates). Natural Language Processing (NLP) is used here to predict the mapping of new source data or variables based on information learnt from existing mappings of previous data and variables.
The setup includes a standards repository (SDTM, ADaM and other CDISC standards documents); study documents such as the specification and protocol; study data from different sources; a SAS program generator that generates SAS programs from the mapping metadata; and libraries where mapping metadata is stored, to which the machine learning algorithms can be applied to learn from that information.
Machine learning algorithms can be applied to the different types of metadata captured at dataset, variable and value level. The screenshot below compares Model Similarity with NGram Similarity for tables mapping.
[Screenshot: predicted domains ['ae' 'cm' 'lb' 'fa' 'eg' 'ie'] against expected ['AE', 'CM', 'LB', 'FA', 'EG', 'IE'], with columns Model_Matched_Term, Model_Similarity, NGram_Matched_Term, NGram_Similarity]
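To make the NGram Similarity column concrete, here is a minimal sketch of the character n-gram idea: matching a raw dataset name to its closest SDTM domain by bigram overlap. The raw names and the domain-label dictionary are invented examples, and this is not the tool's actual implementation.

```python
# Minimal sketch of n-gram similarity for table mapping: score a raw dataset
# name against candidate SDTM domain labels by character-bigram overlap.
# Raw names and labels are invented; not the actual SAS Data Mapper algorithm.

def bigrams(text):
    text = text.lower()
    return {text[i:i + 2] for i in range(len(text) - 1)}

def ngram_similarity(a, b):
    """Jaccard similarity between the character-bigram sets of two names."""
    ga, gb = bigrams(a), bigrams(b)
    return len(ga & gb) / len(ga | gb) if ga | gb else 0.0

def best_domain(raw_name, domain_labels):
    """Return the domain code whose label is most bigram-similar to raw_name."""
    return max(domain_labels,
               key=lambda code: ngram_similarity(raw_name, domain_labels[code]))

domain_labels = {
    "AE": "adverse events",
    "CM": "concomitant medications",
    "LB": "laboratory test results",
    "EG": "ecg test results",
    "IE": "inclusion exclusion criteria",
}

print(best_domain("adverse_events", domain_labels))
print(best_domain("conmeds", domain_labels))
```

A trained model (the Model Similarity column) can outperform raw n-gram overlap because it also learns from previously confirmed mappings, but n-gram scores remain a cheap, transparent baseline.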
It is clear that the entire data capturing, handling and analytics process needs to shift. In fact, the approach to managing data is already changing, so much so that the latest technologies and approaches are affecting how organizations conduct their business.
When leveraged correctly, data yields insights that directly affect business growth. Any data analytics solution that helps organizations save money and increase the bottom line, while remaining cost-effective, is sure to find its way onto an organization's wish list.
The regulatory process is mainly concerned with drug approval, but the emerging use of AI in drug discovery is prompting an important question on
To address this question, the European Patent Office (EPO) has taken the initiative by publishing a draft of its updated guidelines on patenting, which includes a new section devoted to AI. In line with the EPO, the US FDA is also actively developing a regulatory framework to promote innovation in artificial intelligence for healthcare. The following are developments in the regulatory framework:
Quanticate's statistical programming team has AI solutions to support our work and delivery to clients. If you have a need for these types of services, please submit an RFI and a member of our Business Development team will be in touch with you shortly.