AI For Good: How Machine Learning is Resolving the Flint Water Crisis
It has been six years since the water system of Flint (Michigan, USA) was contaminated by lead. At the heart of the still-ongoing effort to replace hazardous pipes is a machine learning model that helped identify thousands of water lines containing lead. Missing data, budget constraints, governance issues: this bittersweet story shows the great potential of AI for good, but also the obstacles that stand in the way of its democratization.

The problem
Flint is one of the poorest cities in the US. In 2014, in an effort to save money, officials stopped buying treated water from Detroit and started using a city-owned treatment facility. Anti-corrosion agents, which would have cost $140 a day, were not used. The cost-cutting goal was achieved, but a much greater problem arose: lead found its way into the water supply, severely endangering the 100’000 residents. Lead poisoning can damage the nervous system and is especially dangerous to young children.
Replacing the lead pipes would have been easy if the city had known where they were and how many there were. “Service line materials are in theory documented […], but in practice these records are often incomplete or lost.” (1).
The data
A committed team of researchers from Georgia Tech and the University of Michigan promptly started collaborating with Flint to resolve this problem. Their goal was challenging: find and replace an unknown number of hazardous pipes throughout the city with a limited budget.
They had access to a description of all 55k parcels in the city (address, value, construction year, …), but no reliable labels giving the material of each pipe, and such labels are necessary to train a model that predicts the presence of lead at each house. Without accurate historical records, labeling was expensive: “the information is buried underground, it is costly to determine the material composition of even a single pipe.” (1).
The team built an application where all contractors excavating pipes would record their findings to centralize this information. In late 2016, early data arrived: lead was found in 165 out of 171 inspected pipes, a number much higher than expected.
The model
Excavating a safe pipe was useful in terms of data collection, but was considered “wasted money” given the limited budget. The hit-rate (the share of lead pipes among those excavated) had to be maximized throughout the entire process to minimize both the impact on the inhabitants’ health and the effective cost of each replacement. This is a typical exploration vs. exploitation trade-off: digging where the model is unsure improves it, while digging where it is confident removes the most lead per dollar.
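To make the trade-off concrete, here is a minimal sketch in Python. It is not the team's code: the parcel names and predicted probabilities are made up, and `hit_rate` and `epsilon_greedy_pick` are hypothetical helpers. Epsilon-greedy is used only as a simple stand-in for the idea: with probability epsilon the next dig explores a random parcel, otherwise it exploits the parcel with the highest predicted lead probability.

```python
import random

def hit_rate(excavated):
    """Share of excavated pipes that turned out to be hazardous (True)."""
    if not excavated:
        return 0.0
    return sum(excavated) / len(excavated)

def epsilon_greedy_pick(scores, epsilon, rng):
    """Pick a parcel id from `scores` (parcel id -> predicted lead probability).

    With probability `epsilon`, explore: pick a parcel uniformly at random.
    Otherwise, exploit: pick the parcel with the highest predicted probability.
    """
    if rng.random() < epsilon:
        return rng.choice(sorted(scores))
    return max(scores, key=scores.get)

# Early Flint data reported above: lead in 165 out of 171 inspected pipes.
early_results = [True] * 165 + [False] * 6
print(f"early hit-rate: {hit_rate(early_results):.3f}")

# Made-up predictions for four hypothetical parcels (assumption, not real data).
scores = {"parcel_a": 0.92, "parcel_b": 0.15, "parcel_c": 0.60, "parcel_d": 0.88}
rng = random.Random(0)
print(epsilon_greedy_pick(scores, epsilon=0.0, rng=rng))  # pure exploitation
print(epsilon_greedy_pick(scores, epsilon=0.3, rng=rng))  # mostly exploitation
```

A higher epsilon buys more information about poorly understood neighbourhoods at the price of more "wasted" excavations, which is exactly the tension described above.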
Using data from the application, the team trained an XGBoost model to predict the probability of lead in each parcel, reaching an AUROC score of 0.939. They then used an active learning algorithm, IWAL (importance weighted active learning), to decide which parcel to inspect next, based on lead probability but also on model uncertainty. Repeating this process iteratively would improve the model’s confidence over time while maintaining a high hit-rate.
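Two of the ideas in this paragraph can be sketched in a few lines of plain Python. The first function computes AUROC from its definition: the probability that a randomly chosen lead parcel receives a higher score than a randomly chosen safe one. The second part is a deliberately simplified stand-in for importance-weighted sampling, not the IWAL algorithm itself and not the team's implementation: `query_probability`, its `floor` parameter, and the toy data are assumptions made for illustration.

```python
import random

def auroc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def query_probability(p_lead, floor=0.1):
    """Hypothetical rule: inspect more often where the model is unsure.

    Uncertainty is 1 at p_lead = 0.5 and 0 at p_lead = 0 or 1. The `floor`
    keeps every parcel reachable, so the importance weight 1/q stays bounded.
    """
    uncertainty = 1.0 - abs(2.0 * p_lead - 1.0)
    return max(floor, uncertainty)

def sample_queries(predictions, rng, floor=0.1):
    """Return (parcel_id, importance_weight) pairs for parcels chosen for inspection."""
    chosen = []
    for parcel_id, p_lead in predictions.items():
        q = query_probability(p_lead, floor)
        if rng.random() < q:
            # Weighting by 1/q corrects for the non-uniform sampling when retraining.
            chosen.append((parcel_id, 1.0 / q))
    return chosen

# Toy labels (1 = lead found) and model scores, invented for this example.
labels = [1, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.3, 0.6]
print(f"AUROC on toy data: {auroc(labels, scores):.3f}")

predictions = {"parcel_a": 0.97, "parcel_b": 0.55, "parcel_c": 0.05}
print(sample_queries(predictions, random.Random(0)))
```

In a loop closer to the one described above, the newly inspected parcels would be added to the training set with their weights, the model retrained, and the predictions refreshed before the next round of excavations.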
The adoption
It was working. In 2017, thanks to this model, 70% (6’228) of the inspected homes had their pipes replaced. But in 2018, the city and a new contractor discarded the model’s predictions and started inspecting every pipe in a few selected blocks. The hit-rate dropped to 15% (1’537 replacements), worse than selecting pipes randomly.
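The scale of the difference is easier to see as a back-of-the-envelope calculation. The sketch below derives the approximate number of inspections implied by the figures above. These totals are inferred from rounded percentages rather than reported in the sources, so they should be read as rough orders of magnitude; `implied_inspections` is a hypothetical helper.

```python
def implied_inspections(replacements, hit_rate):
    """Approximate number of inspections implied by a replacement count and a hit-rate."""
    return round(replacements / hit_rate)

# Figures quoted above: replacements and hit-rate per year.
inspections_2017 = implied_inspections(6228, 0.70)
inspections_2018 = implied_inspections(1537, 0.15)

# Excavations that did not lead to a replacement (approximate, see caveat above).
unneeded_2017 = inspections_2017 - 6228
unneeded_2018 = inspections_2018 - 1537

print(f"2017: ~{inspections_2017} inspections, ~{unneeded_2017} without a replacement")
print(f"2018: ~{inspections_2018} inspections, ~{unneeded_2018} without a replacement")
```

Under these assumptions, the 2018 approach involved more digging than in 2017 while replacing roughly a quarter as many pipes.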

According to a 2019 article from The Atlantic (2), the reason was political: “People would say, ‘You did my neighbor’s house and you didn’t do mine’”, explained Karen Weaver, Flint’s mayor at the time. The city administration “did not want to have to explain to a councilperson why there was no work in their district”, said Alan Wong, project manager. In 2019, after spending millions to re-bury needlessly excavated pipes, the city returned to the model and the hit-rate rose again (3, 4).
The democratization
As of April 2020, 25’000 lines had been excavated, more than 9’500 pipes had been replaced, and an estimated ~1’000 water accounts still needed replacement (5). The progress is now openly accessible on an interactive map, where every Flint resident can enter their address and see the status of their water pipes.

Map of 2018 pipe excavation activity showing copper pipes in blue and lead or galvanized steel ones in red. In the three highlighted areas, contractors excavated large numbers of homes and found few or no lead service lines. (City of Flint)
In 2016, a study estimated that 6.1 million lead lines were still in service in the US. The research team recently created BlueConduit, a company analysing water infrastructure in more than 30 towns. The Flint case study shows the impact AI can have on public health, as long as its potential is not ignored and it is used transparently. Flint’s case is specific, but the underlying problem is not: the World Health Organisation estimates that, worldwide, 2.2 billion people do not use safely managed drinking-water services (6). AI could play a pivotal role in changing this.
Sources:
(1) Abernethy, J., Chojnacki, A., Farahi, A., Schwartz, E., & Webb, J. (2018, July). ActiveRemediation: The search for lead pipes in Flint, Michigan. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 5-14). https://arxiv.org/pdf/1806.10692.pdf
(2) How a Feel-Good AI Story Went Wrong in Flint https://www.theatlantic.com/technology/archive/2019/01/how-machine-learning-found-flints-lead-pipes/578692/
(3) Flint Returns To Machine Learning Solution For Lead Line Replacement https://www.wateronline.com/doc/flint-returns-to-machine-learning-solution-for-lead-line-replacement-0001
(4) BlueConduit's impact in Flint https://www.blueconduit.com/flint-case-study
(5) Visualizing and Communicating the Water Pipe Replacement Program Progress in Flint, Michigan, April 2020 OmniSci Virtual Summit https://www.youtube.com/watch?v=EE12JbKqa54
(6) Drinking Water (World Health Organisation) https://www.who.int/news-room/fact-sheets/detail/drinking-water