Green ML, Interpretable Cancer Detection, and Self-supervised Transformers
A Google and UC Berkeley collaboration suggests novel tactics to reduce carbon emissions of Machine Learning systems.
Context
The size of state-of-the-art models for Computer Vision and Natural Language Processing tasks is increasing every year. As you can imagine, the bigger a model is, the more compute resources are needed to train it adequately. Recent research has highlighted the different costs associated with such training. In particular, an important paper led by Timnit Gebru and Emily Bender, On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?, brings up the environmental and financial costs of training with large data as well as the dangers of deploying these systems at scale. This paper is covered in more detail in an earlier Digest issue.
The important takeaway is that the environmental cost of these immense Machine Learning systems has become too large a burden. As such, the community has increasingly focused on finding new methods to reduce carbon emissions. MLCommons, a global ML community founded late last year, recently published MLPerf, the latest benchmark round, which includes ~2,000 performance and ~850 power efficiency results for leading Machine Learning systems.
What's new
Amidst rising concern over the environmental impact of Machine Learning solutions, a research team from Google and UC Berkeley recently published Carbon Emissions and Large Neural Network Training. They posit that a model's carbon footprint can be reduced by up to 1000x through ML-specialized hardware, efficient data centers, and streamlined model architectures.
Reporting the energy consumption of five important NLP models, they also note that, once a model is widely deployed, inference can end up consuming more energy than training.

Accelerator years of computation, energy consumption, and CO2e for five large NLP DNNs
The paper brings up various strategies that can help reduce their impact:
- Model design is essential. Transfer learning, for example, lets practitioners reuse previously trained weights instead of training from scratch. Moreover, several pruning and distillation techniques are known to improve energy efficiency by a factor of 3 to 7. Data Scientists should remain wary, however, of the bias that such compression can introduce into a model.
- Hardware plays its role as well. Novel processing units such as Google TPUs are built specifically for Machine Learning workloads and are known to be more efficient than GPUs. Keep in mind, however, that manufacturing these chips carries its own environmental cost.
- Efficient data centers can greatly reduce carbon emissions. Traditional data centers are not optimized for Machine Learning workloads, and switching to a better-suited cloud computing service can make a substantial difference.
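The paper's accounting boils down to simple arithmetic: the energy a training run draws, scaled by the data center's overhead (PUE) and by the carbon intensity of the local grid. A minimal sketch of that calculation (the function name and all numbers below are illustrative, not taken from the paper):

```python
def training_co2e_kg(chip_hours, avg_chip_power_w, pue, grid_kg_co2e_per_kwh):
    """Estimate training emissions: energy drawn by the accelerators (kWh),
    scaled by data-center overhead (PUE) and the grid's carbon intensity."""
    energy_kwh = chip_hours * avg_chip_power_w / 1000.0 * pue
    return energy_kwh * grid_kg_co2e_per_kwh

# Illustrative (made-up) numbers: 10,000 accelerator-hours at 300 W average
# power, an efficient data center (PUE 1.1), and a relatively clean grid.
print(training_co2e_kg(10_000, 300, 1.1, 0.08))  # kg CO2e
```

Plugging in a coal-heavy grid intensity instead of a clean one changes the result by roughly an order of magnitude, which is why data center location matters so much.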
Why it matters
Training and deploying large systems is associated with many different costs. For Machine Learning systems in particular, the environmental cost of a single model can be immense. This cost has become a growing concern in recent years due to the widespread adoption of AI. It is promising that the major AI players are working together to find solutions for a greener Machine Learning future.
What's next
As stated by the authors themselves: "We are working to be more transparent about energy use and CO2e in our future research. To help reduce the carbon footprint of ML, we believe energy usage and CO2e should be a key metric in evaluating models, and we are collaborating with MLPerf developers to include energy usage during training and inference in this industry standard benchmark."
Researchers from Google Health present a novel human-interpretable weakly supervised Deep Learning system for predicting disease-specific survival.
Context
Treatment decisions for cancer patients are strongly guided by disease prognosis. An accurate prognosis, in turn, requires understanding and characterizing a patient's cancer adequately. Traditionally, the TNM stage, combined with additional data such as histopathological parameters and molecular features, provides relevant information for patient management.
In recent years, many have attempted to process this data with Machine Learning algorithms to extract more precise prognostic information automatically. However, adoption has remained limited by poor reproducibility and the black-box nature of most applications.
What's new
A research team from Google Health recently published an interesting paper proposing an interpretable methodology for disease-specific survival prediction. Their work focuses on the prognostication of colorectal adenocarcinoma, the third-most common cancer and one for which improved risk stratification could be particularly beneficial. The team used archived, anonymized pathology slides, together with their associated clinicopathologic variables and outcomes, from the Institute of Pathology and the Biobank at the Medical University of Graz.
Their methodology sequentially combines a tumor segmentation model that reaches an AUC of 0.985 with a prognostic model reaching an AUC of 0.692. The latter uses segmented image patches of the tumor-containing region to predict a case-level risk score, combining CNN layers with a Cox regression layer.
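The paper is not reproduced in detail here, but risk heads of this kind are typically trained with the Cox partial likelihood, which rewards the model for assigning higher risk to patients who experience an event earlier. A minimal NumPy sketch of that objective, with illustrative names (per-case risk scores, survival times, and event indicators):

```python
import numpy as np

def cox_neg_partial_log_likelihood(risk, time, event):
    """Negative Cox partial log-likelihood for predicted risk scores.
    `event` is 1 for an observed event, 0 for a censored case."""
    risk, time, event = map(np.asarray, (risk, time, event))
    loss = 0.0
    for i in np.where(event == 1)[0]:
        at_risk = time >= time[i]  # the risk set: cases still under observation at t_i
        loss -= risk[i] - np.log(np.exp(risk[at_risk]).sum())
    return loss / event.sum()
```

Lower values mean the predicted risk ordering better matches the observed outcomes; a gradient of this loss is what would flow back into the CNN layers during training.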
To understand the model's output, several analyses were performed. First, the output of the Deep Learning system was evaluated with a multivariable linear regression on the cases' associated clinicopathologic features. Overall, this regression achieved an R-squared value of 18%, indicating that these features alone do not explain the model's predictions.
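This kind of check is easy to reproduce on one's own model: regress the model's scores on the tabular features and read off R-squared. A minimal ordinary-least-squares version in NumPy (synthetic inputs, not the paper's data):

```python
import numpy as np

def r_squared(X, y):
    """R-squared of an ordinary least-squares fit of y on X (with intercept)."""
    X1 = np.column_stack([np.ones(len(X)), X])       # prepend intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)    # OLS coefficients
    resid = y - X1 @ beta
    return 1.0 - resid.dot(resid) / ((y - y.mean()) ** 2).sum()
```

An R-squared near 1 would mean the tabular features fully explain the model's output; the paper's 18% says most of what the model learned lies elsewhere.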
Second, the model's output was associated with histological features derived from clusters formed using a separate unsupervised image-similarity model. The resulting embeddings were clustered with the k-means algorithm over 100,000 tumor-containing target patches. The researchers selected k=200 clusters of similar histological features, which were later used to classify case images. Combined, all 200 features explained 73% of the model's variance, showing that histological features are highly informative for predicting prognosis.
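The clustering step itself is standard k-means over patch embeddings. A toy sketch of it, where tiny synthetic 2-D embeddings and a small k stand in for the paper's 100,000 patches and k=200:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain k-means: assign each embedding to its nearest centroid,
    then recompute centroids, for a fixed number of rounds."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # squared distances from every point to every centroid, shape (n, k)
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(0)
    return labels, centroids
```

In the paper's pipeline, each resulting cluster corresponds to one candidate histological feature whose share of the model's variance can then be measured.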

Beyond this, the research team investigated the importance of patch-level histoprognostic features, including the tumor-adipose feature.
Why it matters
The novel approach achieves high performance in stratifying disease prognosis, which is particularly helpful for colorectal adenocarcinoma. Moreover, it incorporates post-hoc interpretability techniques that can be used to explain the Deep Learning model's predictions. In addition, the histological features can go beyond explaining the model's predictions by also serving as independent prognostic features to be used by physicians on a standalone basis.
What's next
The paper's code is available here. As the adoption of Artificial Intelligence in healthcare increases, it is essential for researchers to explore relevant and useful interpretability techniques. For such delicate use-cases, gaining insight into a model's inner workings can tremendously increase user trust.
New methods proposed by Facebook AI Research advance state-of-the-art image segmentation with self-supervised Transformers and 10x more efficient training.
Context
Image segmentation is a flagship Computer Vision task: it helps a system understand where certain objects or semantic concepts are situated within an image. Useful for many different use-cases, it is considered one of the hardest challenges in Computer Vision. Until now, immense amounts of meticulously annotated data and enormous compute resources were required to beat benchmarks.
What's new
Facebook AI Research has published two distinct methods that significantly advance state-of-the-art segmentation: DINO and PAWS. While DINO is a novel way of training self-supervised Vision Transformers (ViT), PAWS is a general model-training approach that uses much less compute than traditional ones.
DINO is able to segment objects in an image or video with no supervision or segmentation-targeted objective at all. Moreover, the researchers discovered that the model's features are interpretable, which suggests a capacity for higher-level understanding. The methodology was developed in collaboration with researchers at INRIA.
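At its core, DINO trains a student network to match a momentum "teacher": the teacher's output is centered, sharpened with a low softmax temperature, and used as the target of a cross-entropy loss on the student, while the teacher's weights and the center are updated as exponential moving averages. A minimal NumPy sketch of that objective (the temperatures are typical values; the actual networks and multi-crop augmentation are omitted):

```python
import numpy as np

def softmax(z, temp):
    z = z / temp
    z = z - z.max(-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(-1, keepdims=True)

def dino_loss(student_out, teacher_out, center, t_s=0.1, t_t=0.04):
    """Cross-entropy between the centered, sharpened teacher distribution
    (treated as a fixed target) and the student distribution."""
    p_t = softmax(teacher_out - center, t_t)   # centering + sharpening
    log_p_s = np.log(softmax(student_out, t_s))
    return -(p_t * log_p_s).sum(-1).mean()

def ema(old, new, m=0.9):
    """Momentum update applied to both the teacher weights and the center."""
    return m * old + (1 - m) * new
```

Centering and sharpening pull in opposite directions, which is what keeps this label-free objective from collapsing to a trivial constant output.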

While high accuracy is important, it remains paramount to build and maintain efficient Machine Learning algorithms: efficiency democratizes state-of-the-art models for researchers without access to large-scale compute and lowers both financial and environmental costs. To this end, Facebook AI proposes PAWS, a new model-training method. Pretraining a ResNet-50 on just 1% of ImageNet's labels, PAWS achieves state-of-the-art performance in 10x fewer training steps.
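PAWS's key step is non-parametric: each unlabeled image's representation is compared against a small labeled support set, and a softmax over those similarities weights the support labels into a soft pseudo-label for the training objective to target. A sketch of that step (the function name and temperature are illustrative, not from the paper's code):

```python
import numpy as np

def paws_pseudo_labels(z, support, support_labels, n_classes, temp=0.1):
    """Soft pseudo-label for each unlabeled embedding: a temperature-scaled
    softmax over cosine similarities to a labeled support set, used to
    weight the support points' one-hot labels."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    s = support / np.linalg.norm(support, axis=1, keepdims=True)
    sims = z @ s.T / temp                       # cosine similarities
    w = np.exp(sims - sims.max(1, keepdims=True))
    w = w / w.sum(1, keepdims=True)             # softmax weights, rows sum to 1
    one_hot = np.eye(n_classes)[support_labels]
    return w @ one_hot                          # soft class distribution per image
```

Because no classifier head is fit to the pseudo-labels, only a small labeled support set is needed, which is what makes the 1%-of-labels regime workable.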

Why it matters
Innovation and research in image segmentation are stifled by the need for human annotation. Often, an immense amount of manual labor and significant domain expertise are needed, making this manual labeling step a huge bottleneck.
The research discussed in this post focuses on two of the most important breakthroughs in recent AI history: Transformers and self-supervised learning. These breakthroughs are particularly important for fields such as medical imaging, where annotated data is rare. Moreover, PAWS makes it possible to train these models far more efficiently, allowing researchers across the globe to benefit from complex models.
What's next
As stated by the researchers themselves: "We hope that reducing the computation requirements of self-supervised and semi-supervised approaches will increase their adoption and stimulate further research in this area. In the spirit of collaboration and open science, we are publishing our work, and the associated source-code will be released."
With these novel techniques, image segmentation models can become far less dependent on huge amounts of annotated data and immense compute resources.
It is important to add that these breakthroughs would not be possible without Facebook's incredible academic partners in Paris, France and Montreal, Canada (INRIA, Sorbonne University, MILA, and McGill University).