27 October 2020
Econ 4.0: Why bother with data science?
This article first appeared in The Edge Malaysia Weekly, in the issue of October 19 to 25, 2020.
Here is a statistical short story. Three data scientists are planning to go on a hunting trip but are unable to agree on which animal they should hunt. “Let’s toss a coin,” suggests one data scientist. “If it’s heads, we shoot a deer; if it’s tails, we go kill a quail.”
They toss a coin, and it lands on its rim. “That’s a statistical improbability,” exclaims the second data scientist. “This means we either have to hunt a deer or a quail, or both, or neither. Let’s grab our guns and head out to the forest and have some fun!”
They drive to the forest and spot a wild boar. “The boar is out of scope, so let it go,” says the third data scientist. “Remember, we have to wait for a deer or a quail, or both, or neither.”
Soon, they spot a large deer coming towards them from a distance. The first data scientist fires his gun and the bullet darts past the deer’s left. As the animal advances, the second data scientist fires his rifle and the bullet whizzes past the deer’s right. The deer charges at them with its massive antlers, tosses the three men and gallops away.
“The deer was in scope. Why didn’t you fire?” the two ask the third man. “I was calculating the statistical outcome,” he says, wiping his wounds. “You missed one metre on the left, and you missed one metre on the right. But on average, we got it!”
If that story makes you smile, you are in good company. Data scientists are in demand for their focus and training. They combine the strength of statistics with the rigour of analytics to come up with data-derived, statistically relevant and logically predictable outcomes. In other words, they triangulate the ABCs of data — artificial intelligence (AI), big data and cloud computing.
What exactly is data science? It is an interdisciplinary field that uses scientific methods, logical processes, algorithms and systems to extract knowledge and insights from data. Data scientists use a mixture of data mining, data cleansing, machine learning (ML) and big data analytics to gain insights from data.
The term was first used in 1974, when Danish computer pioneer Peter Naur proposed it as an alternative name for computer science. Prof Naur suggested that the science of data be called “datalogy” or data science. In 1996, a conference of the International Federation of Classification Societies became the first to feature data science specifically as a topic. The field is now witnessing a boom with the emergence of AI.
Here is an example of how data science is used in conjunction with AI. While Covid-19 has grabbed the world’s attention, there is another insidious disease that infects an estimated 10 million people every year and kills 1.5 million (a higher mortality rate than Covid-19). That disease is tuberculosis (TB).
“Worldwide, TB is one of the top 10 causes of death and the leading cause from a single infectious agent,” reports the World Health Organization (WHO). “In 2018, an estimated 10 million people fell ill with TB worldwide — 5.7 million men, 3.2 million women and 1.1 million children. There were cases in all countries and age groups. A total of 1.5 million people died from TB that year.”
Given that TB can spread through saliva droplets in the air, the risk of a community getting infected is relatively high. No wonder then that about 25% of the world’s population is thought to be infected with latent TB. The good news is that TB can be managed, treated and cured if diagnosed early.
Pulmonary TB can be diagnosed through a chest X-ray. But most countries, including many in Asia, are short of radiologists. Take Thailand. It is one of the 14 countries worst hit by TB, with 108,000 cases every year. The country has a shortage of radiologists, which means smaller hospitals have to send X-ray images to better-equipped ones for diagnosis. Patients in rural areas may have to wait a week or more for their primary physician to receive the X-ray results. Only then can treatment begin.
This is where data science and AI are making a difference. Internet Thailand pcl (INET) has developed CXR, an AI-enabled app, to screen chest X-rays for rapid TB diagnosis. The AI system was trained using 40,000 chest X-rays, with input from radiologists and physicians. The result: CXR can detect the presence of TB with 96% accuracy. INET used IBM Visual Insights — a visual recognition solution running on IBM Power Systems — to create graphical models. The system can detect objects in chest X-ray images without requiring coding.
“We combined the strengths of INET and IBM, as well as the domain expertise of radiologists, physicians and experts from hospitals across Thailand, successfully. This collaboration represents the next frontier of a more accessible healthcare landscape in Thailand, made possible by disruptive technology,” says INET managing director Morragot Kulatumyotin.
Similar collaborations — between technology companies, healthcare organisations and governments — are occurring to find vaccines or a cure for Covid-19, a pandemic that has caused pandemonium worldwide. At the core are two transformational methodologies: data science and AI.
AI has the potential to add up to US$16 trillion to the global economy by 2030. Adoption is already under way: a global AI survey commissioned at end-2019 estimated that global AI adoption stood at 34%, dramatically higher than most had predicted. AI will be a vital tool to fight the consequences of Covid-19.
Gartner Inc has coined the term “X analytics”, where X is the data variable for a range of different structured and unstructured content such as text analytics, video analytics and audio analytics. Amid the Covid-19 pandemic, AI has been utilised to comb through thousands of research papers, news sources, social media posts and clinical trials data. The aim is to help public health experts predict how the disease will spread, plan capacity, find new treatments and identify vulnerable populations.
“X analytics — combined with AI and other techniques such as graph analytics — will play a key role in identifying, predicting and planning for natural disasters and other crises in the future. To innovate their way beyond a post-Covid-19 world, data and analytics leaders require an ever-increasing velocity and scale of analysis in terms of processing and access to succeed in the face of unprecedented market shifts,” says Rita Sallam, distinguished research vice-president at Gartner.
Data scientists are already using ML and AI techniques to optimise and improve operations. Gartner calls this augmented data management. “Augmented data management products can examine large samples of operational data, including actual queries, performance data and schemas,” says Sallam.
“By using the existing usage and workload data, an augmented engine can tune operations and optimise configuration, security and performance. It also converts metadata from being used in auditing, lineage and reporting to powering dynamic systems.”
ABC of D
AI, big data and cloud computing are the ABC of D (data). Governments and businesses could spend US$98 billion on AI-related solutions and services by 2023, more than 2.5 times the US$37.5 billion they will spend this year, according to International Data Corp (IDC). That is a compound annual growth rate (CAGR) of 28.4% between 2018 and 2023.
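For readers who want to check growth figures like these, the arithmetic fits in a few lines of Python. This is a minimal sketch, not IDC's method: the function names are illustrative, and it assumes five compounding years between 2018 and 2023.

```python
def growth_multiple(start, end):
    """How many times larger the end value is than the start value."""
    return end / start


def percent_increase(start, end):
    """Percentage rise from start to end."""
    return (end / start - 1) * 100


def cagr(start, end, years):
    """Compound annual growth rate, as a fraction (0.284 means 28.4%)."""
    return (end / start) ** (1 / years) - 1


def implied_start(end, rate, years):
    """Starting value implied by an end value and a CAGR over `years` years."""
    return end / (1 + rate) ** years


# Figures quoted in the article, in US$ billion.
multiple = growth_multiple(37.5, 98)    # this year's spend vs 2023
increase = percent_increase(37.5, 98)   # the same comparison as a percentage
base_2018 = implied_start(98, 0.284, 5) # 2018 spend implied by a 28.4% CAGR

print(f"2023 spend is {multiple:.2f}x this year's, a rise of {increase:.0f}%")
print(f"Implied 2018 spend: US${base_2018:.1f} billion")
```

On these numbers, US$98 billion is about 2.6 times US$37.5 billion, and a 28.4% CAGR over five years implies a 2018 base of roughly US$28 billion.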
“The use of AI and ML is occurring in a wide range of solutions and applications, from ERP (enterprise resource planning) and manufacturing software to content management, collaboration and user productivity. AI and ML are top of mind for most organisations today. We expect AI to be the disrupting influence that will change entire industries over the next decade,” says IDC research director for cognitive and AI systems David Schubmehl.
Big data refers to making sense of vast amounts of data, both structured (in databases) and unstructured (in photographs, email and social media). Malaysia’s market for big data software solutions alone is set to top RM595 million by 2021, up about 37% from RM433.7 million in 2016, with a reported CAGR of 8.8%.
“By using analytics capabilities, organisations can run their business operations better, improve overall customer experience and expand their product and service offerings by increasing their competitive advantage. Organisations in Malaysia have started to prioritise analytics solutions. C-level executives are looking for intelligent solutions to convert raw data into meaningful information to make strategic decisions in their businesses,” says Ng Quan Xiong, software market analyst for Malaysia at IDC.
Cloud adoption in Asean is still slow, but it has enormous potential to grow, especially if small and medium enterprises (SMEs) jump at this opportunity. That is one reason the market for cloud-related services is set to grow at 24% a year for the next five years.
“The pace of change in Malaysia is a tad slow because of the inherent preference for traditional and on-premise IT architecture and services sourcing models,” says Sherrel Roche, senior market analyst at IDC Asia-Pacific.
SMEs are the backbone of the Asean economy. Cloud computing could level the tech playing field for them.
The ABC of D can come together for Malaysia in two vital sectors — manufacturing and services. The services sector employs 62% of the country’s workforce while the manufacturing sector employs 17%. But then, manufacturing is a significant component of Malaysia’s economy and contributes about 23% to its GDP.
Up to 98% of companies in the manufacturing sector are SMEs. Naturally, services employ more people, since services need a substantial human component, while manufacturing requires a significant capital outlay.
McKinsey says that by 2030, automation could displace up to 25% of hours (equivalent to about 4.5 million workers) in Malaysia. Yet, the country’s job outlook is ultimately promising as the job losses will be more than offset by the demand for new skills and labour. Three factors will be at play: First, rising consumer incomes and their impact on consumer goods; second, increased spending on education; and third, an ageing population that will create demand for doctors, nurses and a range of service-related roles.
Data science will underline the bottom line. Data is a set of raw, unorganised facts that must be processed to be useful. Data is seemingly random and useless until it is captured, cleaned, collated, organised and analysed. This transformation, from raw data to insights, is at the core of data science. It is this core that all governments and businesses need to encourage and cherish to compete in the digital economy.
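To make the clean, collate and analyse steps concrete, here is a toy sketch in Python. The records, region labels and function names are all made up for illustration; a real pipeline would be far larger and would typically rely on dedicated tools.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw records as they might be captured: (region, reading) pairs.
raw = [(" north ", "12.5"), ("South", "n/a"), ("north", "14.5"),
       ("south", "9"), ("", "3")]


def clean(records):
    """Normalise region labels; drop rows with no label or a non-numeric value."""
    for region, value in records:
        region = region.strip().lower()
        try:
            number = float(value)
        except ValueError:
            continue  # skip unusable readings such as "n/a"
        if region:
            yield region, number


def collate(records):
    """Group the cleaned readings by region."""
    groups = defaultdict(list)
    for region, number in records:
        groups[region].append(number)
    return dict(groups)


def analyse(groups):
    """Summarise each region by its average reading."""
    return {region: mean(values) for region, values in sorted(groups.items())}


insights = analyse(collate(clean(raw)))
print(insights)
```

Five messy records go in; two unusable ones are dropped, and what comes out is a single average per region.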
Raju Chellam is vice-president of new technologies at Fusionex International, Asia’s leading big data analytics company