Machine Learning: Harnessing the Predictive Power of Computers
Machine learning is everywhere. It has worked its way into our daily lives, from voice assistants like Siri and Alexa to traffic apps that guide us around gridlock, cars that drive themselves and news stories that pop up on our social media feeds. And there’s no end in sight to the potential applications of machine learning—in fraud protection, health care, the stock market and more. Researchers in the University of Maryland’s College of Computer, Mathematical, and Natural Sciences work at the forefront of machine learning technology, where computers analyze data to identify patterns and make decisions with minimal human intervention. These faculty members are using machine learning for applications that touch many aspects of our lives—from weather prediction and health care to transportation, finance and wildlife conservation. Along the way, they are advancing the science of exactly how computers learn. And they’re asking important questions about the impact of machine learning on our everyday lives and society itself.
Soheil Feizi: Building Defenses
Fighting Credit Card Fraud
Credit or debit? The shift from a cash economy to one reliant on electronic transactions has left many consumers feeling vulnerable to identity theft and bank fraud. And it’s no wonder: in 2018, the Federal Trade Commission received over 440,000 reports of identity theft, largely from stolen credit card and Social Security numbers.
For any consumer, that figure is concerning. But it represents only a tiny fraction of the 44.7 billion credit card transactions processed that same year, which makes fraud the proverbial needle in a haystack—nearly impossible to find.
Computer Science Assistant Professor Soheil Feizi and his collaborators at Capital One are counting on machine learning to address this problem. They are developing a system that can learn to identify fraud without relying on a large number of examples. One of the most common approaches to machine learning involves presenting a computer with lots of labeled examples of a specific thing and letting the computer learn to identify that thing. For example, a computer learns to recognize human faces by analyzing thousands of labeled images of human and non-human faces and finding the important features needed to distinguish a person from, say, a snowman or a smiley emoji.
The challenge for Feizi and his collaborators is that there are too few examples of fraudulent transactions to provide a reliable training dataset. So, rather than training the system to identify fraud, Feizi is developing a method that asks the machine to identify "normal." Then, the system can flag anything that doesn’t fit in. This approach, called "unsupervised learning," lets the machine learn how to find anomalies without being told what anomalies look like.
"The unsupervised model aims to characterize the underlying distribution of the normal data. Learning this distribution aids us in flagging anomalies," explained Feizi, who also holds a joint appointment in the University of Maryland Institute for Advanced Computer Studies (UMIACS).
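The idea of learning "normal" and flagging what does not fit can be illustrated with a minimal sketch. It is not Feizi's method, which the article does not detail: it assumes a simple independent Gaussian profile per feature, and the features (transaction amount and hour of day), the synthetic data and the 1% threshold are all hypothetical.

```python
import math
import random

def fit_normal_profile(rows):
    """Estimate per-feature mean and standard deviation from unlabeled 'normal' rows."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((r[d] - means[d]) ** 2 for r in rows) / n) or 1.0
        for d in range(dims)
    ]
    return means, stds

def anomaly_score(row, means, stds):
    """Sum of squared z-scores: larger means less likely under the fitted profile."""
    return sum(((x - m) / s) ** 2 for x, m, s in zip(row, means, stds))

# Hypothetical features per transaction: (amount in dollars, hour of day).
random.seed(0)
normal_transactions = [
    (random.gauss(40.0, 10.0), random.gauss(14.0, 3.0)) for _ in range(1000)
]

means, stds = fit_normal_profile(normal_transactions)

# Assumed policy: set the threshold so about 1% of the training data is flagged.
train_scores = sorted(anomaly_score(t, means, stds) for t in normal_transactions)
threshold = train_scores[int(0.99 * len(train_scores))]

def is_anomalous(row):
    return anomaly_score(row, means, stds) > threshold

typical = (42.0, 13.0)   # close to the learned profile
unusual = (900.0, 3.0)   # very large amount at 3 a.m.
print(is_anomalous(typical), is_anomalous(unusual))
```

No fraudulent example is ever shown to the model. The second transaction is flagged only because it sits far from the distribution learned from ordinary ones.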
Feizi’s method has proven effective on publicly available data. Now, he is working with Capital One to incorporate the company’s proprietary data and eventually deploy the method live on Capital One’s system to prevent fraudulent transactions.
Stopping Patch Attacks
Imagine a self-driving car that can’t recognize a stop sign—not because it can’t "see" it, but because someone slapped a three-inch strip of tape on it. Or a security system that can’t recognize a weapon concealed in a briefcase simply because someone applied a colorful sticker to the case. Known as patch attacks, such minor disturbances designed to scramble machine learning algorithms could pose significant threats to a world increasingly dependent on computers.
Some computer vision systems are notoriously vulnerable to patch attacks, which has led to an escalating cycle of defenses and counterattacks. Feizi is working to address these vulnerabilities.
"Robustness in machine learning models is an important focus of my lab," Feizi said. "We recently developed mathematically certifiable methods of defending against different types of adversarial attacks, including patch attacks."
Feizi and his team published their code and methods to reliably defend against patch attacks so that the machine learning community could use them and evaluate them. Although hackers will continue to find new ways to disrupt machine learning systems, shutting down patch attacks is a big win for these researchers.
Tom Goldstein: Revealing Invisible Threats
Inventing an Invisibility Cloak
A few years ago, Tom Goldstein had an idea—what if he could make himself invisible? Not invisible to the average person, but invisible to a computer vision system designed to recognize objects like the human form. Such object detectors, which rely on machine learning, are essential for military surveillance, airport security systems and personnel safety programs on large construction sites. Not to mention that self-driving cars depend on these object recognition programs to avoid accidents.
Goldstein, an associate professor in computer science and UMIACS, wanted to know how easy these systems were to break. So he created what he calls an invisibility cloak, a sweatshirt that renders the wearer imperceptible to machine learning-based object detectors—in a sense, invisible.
Goldstein and his team used a machine learning algorithm to create patterns on the sweatshirt that confuse the most common object detectors. When they printed the patterns on paper and held them up to their chests, some of the patterns magically masked their bodies. One successful pattern looks similar to an impressionist painting of a gathering of people, though an actual painting of a similar scene doesn’t fool the computer at all. Another successful pattern looks like a slightly psychedelic version of camouflage.
Goldstein said he doesn’t know what makes one pattern work and another fail, but he doesn’t have to. He proved his point—visual recognition systems can be fooled.
"I mostly just want to understand what the limitations of the systems are," he said. And by understanding how to break them, researchers can learn how to better protect them from potential security threats.
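To give a feel for how an algorithm can design a pattern that hides something from a detector, here is a toy sketch. It is not Goldstein's method, which targets real object detectors and is not described in detail in this article. The "detector" is a hypothetical 16-pixel logistic model, and the attacker may change only four pixels (the patch), adjusting them by gradient descent to lower the detection score.

```python
import math

def detector_score(pixels, weights, bias):
    """Toy 'person detector': logistic score in (0, 1); above 0.5 means 'detected'."""
    z = sum(w * p for w, p in zip(weights, pixels)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def optimize_patch(pixels, weights, bias, patch_idx, steps=200, lr=0.5):
    """Gradient descent on the patch pixels only, pushing the detector score down."""
    adv = list(pixels)
    for _ in range(steps):
        s = detector_score(adv, weights, bias)
        for i in patch_idx:
            grad = s * (1.0 - s) * weights[i]                 # d(score) / d(pixel i)
            adv[i] = min(1.0, max(0.0, adv[i] - lr * grad))   # stay in valid pixel range
    return adv

weights = [1.0, -1.0] * 8    # hypothetical learned weights for 16 "pixels"
bias = 1.0
image = [0.5] * 16           # a scene the toy detector flags as containing a person
patch_idx = [0, 1, 2, 3]     # the only pixels the attacker is allowed to change

adv_image = optimize_patch(image, weights, bias, patch_idx)
print(detector_score(image, weights, bias), detector_score(adv_image, weights, bias))
```

Changing a quarter of the pixels is enough to push this toy score below the detection threshold while the rest of the image stays untouched. That mirrors, in miniature, why a printed pattern covering only part of a scene can matter.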
Identifying Market Vulnerability
In the stock market, most trades are made by bots that use machine learning to predict market trends and strategically place buy and sell orders. That makes them a prime target for attacks, according to Goldstein. In the same way he used machine learning techniques to create patterns that confuse computer vision systems, he is using machine learning techniques to create strategic buy and sell orders that confuse the stock market prediction models that bots rely on.
In one example, using a mock stock market built with historical data, Goldstein’s machine learning algorithm learned to strategically sell stocks rising in price at just the right moment so that the market models mistakenly predicted the stock’s upward price momentum had stopped. If deployed in the actual stock market, Goldstein’s attack could have caused trading bots to buy and sell based on bad information. This type of attack could allow someone to buy more of a company’s stock before the price climbed higher. Or it could artificially suppress a company’s rising stock value.
Goldstein's research is the first work to demonstrate this type of vulnerability, but he's unsure if this type of attack has actually occurred because financial markets are not transparent.
"Whether or not firms are using these kinds of strategies to manipulate the market is very unclear," he said. "This whole industry is a black box and no one will tell you what they’re doing."
What’s more, it is unclear what legal framework computer-driven manipulation of the stock market would fall under. Most of the laws that regulate market manipulation are based on intent, and it's difficult to ascribe intent to a computer.
"If a complex computer system makes a buy-sell order, how can we say whether it was to profit off the sale, which would be legal, or manipulate other agents, which would be illegal?" Goldstein asked.
In many ways, society is in uncharted waters here, and that’s what motivates Goldstein.
"For me, this research is about awareness. If this is possible, it would be nice to have some sort of public discussion about it," he said. "Maybe we need a regulatory framework that makes it clear what sort of behaviors are allowed and what sort of behaviors aren’t, and we should be having these conversations."
Michelle Girvan: Creating Better Forecasts
If you’ve ever had your weekend plans soaked after a "clear and sunny" forecast, you know that predicting the future is, well, complicated.
Physics Professor Michelle Girvan is exploring how machine learning technology—with its ability to find patterns and make predictions about complex systems—can help improve such forecasts. And it’s not just weather forecasts that can benefit from Girvan's work. The methods she is developing can help predict anything—from sunspots and stock markets to the spread of infectious disease—as long as there is data about how those things have changed over time.
Her approach to better predictions combines machine learning technology with traditional mathematical models that are based on the knowledge of how something works. To understand how this hybrid approach could yield a more accurate forecast, Girvan applied the approach to simulated systems.
"We expected that the two methods combined would do better than either method individually, but the combined effect we saw was so much greater than we anticipated," said Girvan, who holds joint appointments in the Institute for Physical Science and Technology and the Institute for Research in Electronics and Applied Physics.
Traditional knowledge-based forecast models start with measurements of current and recent conditions. In the case of weather, that includes things like temperature, humidity, pressure and wind speed. Then, the models predict how conditions will change over time by applying known relationships between the variables. For weather, this means incorporating the laws of physics, like how fast heat rises and how much water the air can hold.
Machine learning models take a different approach. They crunch through massive amounts of current and historical data looking for patterns and then make predictions based on those patterns. In the case of weather, machine learning models forecast future conditions by assuming current conditions will progress by following the same patterns found in previous weather data. These models have no "knowledge" of the physics involved.
Girvan found that combining the two approaches resulted in faster and smarter forecasts, even when only limited data was available. This hybrid model could offer a significant improvement in detailed short-term predictions while also providing a picture of expected long-term behavior.
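A deliberately easy sketch can show why combining the two kinds of model helps. It makes several assumptions not found in the article: the "real" system is a simple chaotic equation (the logistic map), the knowledge-based model has the right equation but a slightly wrong parameter, and the machine learning part is a plain least-squares fit of the model's one-step errors. Girvan's actual machine learning component is not specified here, so this is a stand-in technique.

```python
def true_step(x):
    """The 'real' system (stand-in for nature): logistic map with r = 3.7."""
    return 3.7 * x * (1.0 - x)

def knowledge_step(x):
    """Imperfect knowledge-based model: right equation, slightly wrong parameter."""
    return 3.5 * x * (1.0 - x)

def fit_residual(xs, residuals):
    """Least-squares fit of residual ~ a*x + b*x**2 via the 2x2 normal equations."""
    s11 = sum(x * x for x in xs)
    s12 = sum(x ** 3 for x in xs)
    s22 = sum(x ** 4 for x in xs)
    t1 = sum(x * r for x, r in zip(xs, residuals))
    t2 = sum(x * x * r for x, r in zip(xs, residuals))
    det = s11 * s22 - s12 * s12
    a = (t1 * s22 - t2 * s12) / det
    b = (s11 * t2 - s12 * t1) / det
    return a, b

# Simulated observations of the real system.
history = [0.2]
for _ in range(250):
    history.append(true_step(history[-1]))
train, test = history[:200], history[200:]

# Learn, from data alone, what the knowledge-based model gets wrong one step ahead.
xs = train[:-1]
residuals = [nxt - knowledge_step(x) for x, nxt in zip(train[:-1], train[1:])]
a, b = fit_residual(xs, residuals)

def hybrid_step(x):
    """Knowledge-based prediction plus the learned correction."""
    return knowledge_step(x) + a * x + b * x * x

def mean_abs_error(step, series):
    pairs = list(zip(series[:-1], series[1:]))
    return sum(abs(step(x) - nxt) for x, nxt in pairs) / len(pairs)

knowledge_err = mean_abs_error(knowledge_step, test)
hybrid_err = mean_abs_error(hybrid_step, test)
print(f"knowledge-only error: {knowledge_err:.4f}, hybrid error: {hybrid_err:.2e}")
```

The physics-style model supplies the overall structure, so the learned part only has to capture a small, simple correction. Real systems are far less tidy than this example, where the correction happens to be exactly learnable.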
In addition, combining machine learning with other forecast models can help to identify strengths and weaknesses of each system. By analyzing the differences in how a mathematical model and an algorithm perform on their own and when coupled together, Girvan is gaining a better understanding of how the different systems work.
Girvan's forecast: predicting the future may soon be much easier thanks to machine learning.
David Jacobs: Filling in the Blanks
What if people meeting by video call could feel like they were actually in the same room, collaborating at the same table or watching the same television? What if trainers could walk around their virtual students during a Zoom fitness class? Would a more realistic social experience ease the feelings of isolation and distance that so many have felt during the COVID-19 pandemic?
David Jacobs, a professor in computer science and UMIACS, is developing technology that could one day provide just such realism through a virtual reality headset. Turning 2D images like those on a video screen into 3D reconstructions is a complex process. It involves distinguishing color shifts from shadows; anticipating contour, texture and depth; and filling in missing information that our brains automatically assume when we look at a picture. While artists spend hours creating simple 3D models, Jacobs is automating the process with machine learning.
"3D reconstruction of people, objects and natural scenes is difficult because images are created by a complex interplay of lighting, shapes and the materials things are made from," said Jacobs, who is also director of UMD’s Center for Machine Learning. "In real life, all these properties vary throughout the scene, so modeling and accurately capturing them from 2D images is a big challenge."
In addition to helping people maintain closeness during times of physical distancing, machine learning-based 3D rendering systems have countless other potential applications—like allowing doctors to noninvasively “see” internal organs or creating computer games and movies more easily and less expensively.
"Suppose I’m creating a game or a movie and I want a scene in the Parthenon," Jacobs explained. "Today, that would be an expensive project. Graphic designers would have to create the Parthenon by hand for a 3D game, or a movie production would require a live shoot on-site. But imagine the time and cost savings if I could just take some photos of the Parthenon and have a computer build an accurate 3D model."
Thanks to Jacobs’ research, we may soon see the world in a whole new dimension.
- The Center for Machine Learning: The center, launched in 2019 and led by David Jacobs, unifies and enhances the many machine learning activities underway at the University of Maryland.
John Dickerson: Improving Organ Exchanges
When a patient needs an organ transplant, the best-case scenario is to find someone—usually a family member or close friend—who has the same blood and tissue type as the patient and is willing to volunteer as an organ donor. The path to that best-case scenario isn’t as easy as it might seem, though. Say a woman in Chicago needs a kidney and her husband is willing to donate, but he is not a match for his wife. What if, by donating his kidney to another patient with a willing but incompatible donor in, say, Los Angeles, he could help his wife at the same time? That’s exactly how organ exchanges work. In this case, the husband in Chicago would donate his kidney to the patient in Los Angeles, and in return, his wife would get a kidney through the exchange from that patient’s paired donor who wasn’t a match for their loved one.
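The swap described above can be sketched in a few lines. This is a simplification under stated assumptions: compatibility is reduced to the standard ABO blood-type rule (real matching also involves tissue typing), and the three patient-donor pairs and their blood types are hypothetical.

```python
from itertools import combinations

# Simplified ABO rule: which recipient blood types each donor type can give to.
CAN_DONATE_TO = {
    "O": {"O", "A", "B", "AB"},
    "A": {"A", "AB"},
    "B": {"B", "AB"},
    "AB": {"AB"},
}

def compatible(donor_type, patient_type):
    return patient_type in CAN_DONATE_TO[donor_type]

def find_two_way_swaps(pairs):
    """Return (name_a, name_b) for every two pairs whose donors match each other's patients."""
    swaps = []
    for (name_a, a), (name_b, b) in combinations(pairs.items(), 2):
        if compatible(a["donor"], b["patient"]) and compatible(b["donor"], a["patient"]):
            swaps.append((name_a, name_b))
    return swaps

# Hypothetical pairs; each donor is incompatible with their own patient.
pairs = {
    "Chicago":     {"patient": "A", "donor": "B"},
    "Los Angeles": {"patient": "B", "donor": "A"},
    "Denver":      {"patient": "O", "donor": "A"},
}
print(find_two_way_swaps(pairs))
```

Here the Chicago and Los Angeles pairs can trade kidneys, while the Denver pair has no partner in this small pool and would wait for new pairs to join.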
Over the past decade, organ exchange programs have saved thousands of lives by matching people who need organs with willing donors—typically total strangers. But the system is challenging. How should an organ exchange decide who to prioritize and by how much, when matching patients to donors? By health condition? By age? By proximity? Should someone’s lifestyle or ability to pay for the procedure affect that person’s prioritization in this central matching system? John Dickerson, an assistant professor in the Department of Computer Science and UMIACS, believes new ideas from machine learning, melded with known concepts from economics, can help organ exchanges answer those difficult questions and more.
Dickerson has been working with organ exchanges worldwide to develop machine learning systems that recommend an organ-matching policy that will best meet an exchange’s objectives and treat patients as fairly and equitably as possible.
“Let’s say an exchange’s objective is to maximize the number of people who are matched, and they want to give a little bit of priority to pediatric patients, and they want to tie-break toward people who have been waiting around longer,” Dickerson said. “How do we elicit those priorities, and how do we translate those into mathematics, which these machine-learning-based techniques require to be able to operate?”
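One way to picture "translating priorities into mathematics" is to give each possible match a numeric value and search for the plan with the highest total. The sketch below is only an illustration: the weights, the patients and the list of feasible swaps are hypothetical, and the brute-force search works only for tiny inputs, whereas real exchanges face much larger problems that call for proper optimization methods.

```python
from itertools import combinations

# Hypothetical weights: every match counts 1, pediatric patients get a small
# bonus, and waiting time acts only as a tie-breaker.
PEDIATRIC_BONUS = 0.1
WAIT_TIEBREAK_PER_YEAR = 0.001

def patient_value(patient):
    return (
        1.0
        + PEDIATRIC_BONUS * patient["pediatric"]
        + WAIT_TIEBREAK_PER_YEAR * patient["wait_years"]
    )

def best_plan(patients, swaps):
    """Brute-force search over sets of swaps that share no patient."""
    best, best_score = (), 0.0
    for k in range(1, len(swaps) + 1):
        for plan in combinations(swaps, k):
            used = [p for swap in plan for p in swap]
            if len(used) != len(set(used)):
                continue  # a patient-donor pair can take part in only one swap
            score = sum(patient_value(patients[p]) for p in used)
            if score > best_score:
                best, best_score = plan, score
    return best, best_score

patients = {
    "P1": {"pediatric": False, "wait_years": 2},
    "P2": {"pediatric": False, "wait_years": 1},
    "P3": {"pediatric": True,  "wait_years": 1},
}
# P1's pair could swap with either P2's or P3's pair, but not both.
swaps = [("P1", "P2"), ("P1", "P3")]

plan, score = best_plan(patients, swaps)
print(plan, round(score, 3))
```

Both options match two patients, so the pediatric bonus decides the outcome. Changing the weights changes the policy, which is why eliciting them carefully matters.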
Dickerson’s algorithms take an exchange’s objectives and determine the best policy to meet them. Then, he takes it a step further and uses machine learning to help understand the possible impacts of that policy over time, giving organ exchange programs a clearer picture of the potential, and perhaps unintended, consequences of their policy decisions.
For example, if an organ exchange decides to prioritize pediatric patients, it could give a 10-year-old boy a chance at a healthy life. But what if that organ could have gone to a single parent struggling to support four children? Or, in the case of some exchanges, what if that organ could have triggered a multi-party swap that resulted in two or more parties receiving lifesaving organs, instead of just the one 10-year-old boy?
In another example, deprioritizing people with certain health conditions could mean that, over time, survival rates drop in people from a certain demographic because they are more prone to that condition.
Armed with the information that Dickerson’s machine learning analysis provides, organ exchanges can try to reduce the likelihood of specific unintended outcomes by making adjustments to their objectives and organ-matching policies. The goal, according to Dickerson, is to make a successful system work even better.
“The dream, of course, is to help organ exchanges maximize their matches, while increasing donation success and ensuring that matches more closely align to the values of the stakeholders involved in a particular exchange,” he said.
Anahí Espíndola: Finding Threatened Species
We are living in an age of mass extinction. Scientists estimate wildlife populations have plunged by 60% in the last 40 years. It is impossible to know how many species of plants and animals are threatened with extinction, but identifying which are most at risk of disappearing is a challenge Anahí Espíndola is taking on with the help of machine learning.
An assistant professor of entomology, Espíndola is developing machine learning tools that conservation organizations and resource managers can use to predict which species are most likely to need conservation.
“When you have an enormous number of species to consider and only limited resources to assess their conservation needs, this method allows you to decide where to prioritize,” Espíndola said.
Conservation of a species is no simple matter. It involves countless ideas about what should be done and what sacrifices society is willing to make to save a plant or animal. Before any conservation action is taken, a detailed assessment must be made of the species and the threat level it faces. Such assessments are time-consuming and expensive, so resource managers and conservationists choose which species to assess and what order to assess them in. Often, they rely on educated guesses or they conduct a systematic assessment of all species within a large group—all bees in Europe, for example.
But educated guesses can be misleading. Sometimes a species that appears rare may not be threatened, and one that appears abundant may actually face a critical danger. On the other hand, broad, systematic assessments spend valuable resources on species known to not need protection. This is where Espíndola hopes machine learning can help.
Machines are masters of finding patterns in data. So Espíndola created and trained a machine learning algorithm to evaluate 150,000 species of plants from all corners of the world. The system identified more than 15,000 species that were highly likely to meet the criteria for one of the at-risk categories in the International Union for Conservation of Nature’s Red List of Threatened Species.
"Our method isn’t meant to replace formal assessments," Espíndola said. "It's a tool that can help prioritize the process, because you’re never going to be able to assess all species. So, we’re helping resource managers and conservationists make an informed choice by calculating the probability that a given species is at risk."

Espíndola’s initial work predicted that 10% of the world’s plants are likely in need of conservation and should be prioritized for assessment. Now, with increasing interest in her research from state and local resource managers, Espíndola is applying the same technology at the local level. She recently assessed the status of all known bee species in the state of Maryland to assist in focusing conservation efforts. The work confirmed some known at-risk species and identified a few others that state managers should assess more closely.
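The idea of ranking species by their probability of being at risk can be sketched as follows. The article does not say which algorithm Espíndola's system uses, so this example swaps in a simple nearest-neighbor estimate: a species' risk is the share of its most similar, already assessed species that are at risk. The two numeric features (imagine scaled range size and number of known populations) and all data values are hypothetical.

```python
import math

def risk_probability(candidate, assessed, k=3):
    """Share of the k most similar assessed species that are at risk (a k-NN estimate)."""
    nearest = sorted(assessed, key=lambda s: math.dist(s["features"], candidate))[:k]
    return sum(s["at_risk"] for s in nearest) / k

# Species with a formal assessment already on record (made-up values).
assessed = [
    {"features": (0.10, 0.20), "at_risk": True},
    {"features": (0.20, 0.10), "at_risk": True},
    {"features": (0.15, 0.30), "at_risk": True},
    {"features": (0.80, 0.90), "at_risk": False},
    {"features": (0.90, 0.70), "at_risk": False},
    {"features": (0.70, 0.80), "at_risk": False},
    {"features": (0.40, 0.50), "at_risk": False},
]

# Species nobody has formally assessed yet.
unassessed = {
    "Species A": (0.12, 0.18),
    "Species B": (0.85, 0.80),
    "Species C": (0.30, 0.35),
}

# Assess the most likely at-risk species first.
ranking = sorted(
    unassessed,
    key=lambda name: risk_probability(unassessed[name], assessed),
    reverse=True,
)
for name in ranking:
    print(name, round(risk_probability(unassessed[name], assessed), 2))
```

The output is a priority list, not a verdict. Each prediction still needs the biological knowledge and formal assessment that Espíndola describes.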
One of the most useful features of Espíndola’s machine learning system is the simplicity of the system itself.
"It's very accessible," she said. "You don’t need access to crazy clusters and computing power. It can run on a laptop. You just need data and the biological knowledge to evaluate the machine learning predictions."
In the fight to conserve at-risk species, machine learning could make a world of difference.
(Original story written by Kimbra Cutlip)
(Image Credit: Ayush Pokharel)
This article was published in the Summer 2020 issue of Odyssey magazine. To read other stories from that issue, please visit go.umd.edu/odyssey.
August 25, 2020