Thursday, June 9, 2016

ImageXD: New trends in image processing and computer vision

Incredible advances are being made in image processing techniques and tools, but the scientists who use them typically don't have the opportunity to communicate with scientists who work on similar problems in different domains.

To address this issue, the Berkeley Institute for Data Science (BIDS) invited researchers from a variety of disciplines to ImageXD, a three-day workshop designed to discuss common image processing problems across domains, ranging from academic fields such as deep learning, astronomy, and systems biology to the computer animation film industry. Among the highlights was a talk by Susan Fong, Technical Director Supervisor at Pixar, who shared her expertise on what it takes to bring a major production like The Good Dinosaur to life.


Daniela Ushizima (Lawrence Berkeley National Lab): From face detection to the faces of scientific images

Day 2 of the workshop was kicked off by Daniela Ushizima from Lawrence Berkeley National Laboratory (LBNL). The range of topics in her talk was as diverse as her background. A central theme of her work is that common image analysis and pattern recognition techniques can find applications in a wide variety of scientific domains, ranging from biomedical micrographs to micro-tomography of geological materials and composites, with applications such as carbon sequestration. Popular examples include segmentation algorithms and CNNs, the latter of which she also implemented on IBM's TrueNorth Neurosynaptic Chip to classify medical images in mere microseconds. Her full talk can be found below:

Jitendra Malik (UC Berkeley, Google): Deep visual understanding from deep learning

Jitendra Malik, a renowned computer vision scientist, went on to give a brief overview of current deep learning techniques as well as a historical account of image understanding. Malik aptly categorized historical approaches to visual scene understanding into eras, from linear filters (e.g., edge detection, simple and complex cells) through histogram-based features (e.g., SIFT, HOG) to the current trend of deep neural network architectures (e.g., successors of the Neocognitron, deep learning, CNNs).

In some of his more recent work, Malik suggested thinking of image segmentation as a combined top-down and bottom-up problem, in which top-down signals inform us about abstract properties of the objects to be segmented. This top-down information can then be fused with bottom-up signals to determine how likely it is that each pixel in the image belongs to a given object. Like many other computer vision algorithms, this idea is inspired by how the brain is believed to process visual information.
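Malik's papers describe specific architectures for this fusion; purely as an illustration of the idea, here is a minimal NumPy sketch that combines a hypothetical bottom-up pixel-evidence map with a hypothetical top-down object-prior map into per-pixel object probabilities (all names and the weighting scheme are made up for this example):

```python
import numpy as np

def fuse_cues(bottom_up, top_down, alpha=0.5):
    """Fuse bottom-up pixel evidence with a top-down object prior.

    bottom_up: (H, W) map of low-level cues (e.g., region/boundary scores).
    top_down:  (H, W) map of object-likelihood scores from a detector.
    alpha:     weight given to the top-down signal.
    Returns a (H, W) map of per-pixel object probabilities.
    """
    eps = 1e-8
    # Rescale both cues to [0, 1] so they are comparable.
    bu = (bottom_up - bottom_up.min()) / (np.ptp(bottom_up) + eps)
    td = (top_down - top_down.min()) / (np.ptp(top_down) + eps)
    # Geometric (log-linear) fusion: a pixel scores high only if both cues agree.
    return np.exp((1 - alpha) * np.log(bu + eps) + alpha * np.log(td + eps))

# Toy usage: threshold the fused map to obtain a binary object mask.
mask = fuse_cues(np.random.rand(64, 64), np.random.rand(64, 64)) > 0.5
```

His full talk is available below: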


Ben Bowen (LBNL): Web based analysis of mass spectrometry data

Next up was Ben Bowen, Applications Lead in the Environmental Genomics and Systems Biology (EGSB) Division at LBNL. An expert in mass spectrometry and quantitative analysis, he is also the lead of OpenMSI, a web-based service that allows for state-of-the-art processing, analysis, and visualization of mass spectrometry imaging (MSI) data. Since MSI files contain hyperspectral data that can grow to over 1 TB per image, Bowen saw a need for an open-source technology that lets people offload their image processing to a supercomputing cluster. OpenMSI allows both management and storage of MSI files under user accounts, and gives users ways to visualize and statistically analyze their uploaded data.
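OpenMSI's own API is documented on the project website; the snippet below is only a generic sketch of why server-side slicing of such cubes matters, assuming the data live in HDF5 (a common container for MSI data) with hypothetical file and dataset names:

```python
import numpy as np
import h5py  # MSI cubes are commonly stored in HDF5; names below are hypothetical

# A hyperspectral MSI cube has two spatial axes (x, y) and one m/z axis.
with h5py.File("msi_scan.h5", "r") as f:
    cube = f["msi/data"]          # shape (nx, ny, n_mz); stays on disk
    mz = f["msi/mz_values"][:]    # 1-D array of m/z bin centers
    # Extract the ion image closest to m/z = 500.5 without ever loading
    # the full (potentially >1 TB) cube into memory.
    idx = int(np.argmin(np.abs(mz - 500.5)))
    ion_image = cube[:, :, idx]   # h5py reads only this 2-D slice
```

His full talk can be found below: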

Ned Horning (American Museum of Natural History): Computer vision and Earth Science

The American Museum of Natural History (AMNH) has been involved in image processing for a long time. Ned Horning, Director of Applied Biodiversity Informatics at AMNH, has over 30 years of experience using remote sensing, GIS, and related skills, including field mapping and the collection of training and validation data, to aid and evaluate remote sensing-based mapping projects. During his talk, Horning pointed out that an increasing number of governments and institutions are making their geospatial data publicly available, which is a big step toward removing some of the barriers that have troubled the field for years. Developing countries in particular still have limited access to state-of-the-art image processing software, and some datasets remain locked behind restrictive software licensing constraints.

In his work, Horning applies state-of-the-art image processing techniques to high-resolution, low-altitude photographs to perform a number of tasks. Structure-from-motion and image warping can be used to remove distortions due to terrain. Statistical modeling is used to perform pixel-by-pixel classification and regression to locate and track geospatial structures. Finally, the field is increasingly interested in advanced computer vision and sensor fusion techniques to perform real-time object recognition and to control unmanned vehicles such as drones.
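To make the pixel-by-pixel classification step concrete, here is a hedged sketch of one common setup: a random forest trained on the spectral signatures of a few labeled pixels, then applied to every pixel of a multispectral image. The image, labels, and class names are all made up for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical multispectral image (bands, height, width) and a sparse set of
# labeled training pixels (e.g., digitized from field-collected reference data).
bands, h, w = 4, 256, 256
image = np.random.rand(bands, h, w)
train_rows = np.random.randint(0, h, 200)
train_cols = np.random.randint(0, w, 200)
train_labels = np.random.randint(0, 3, 200)   # e.g., water / forest / urban

# Each pixel's feature vector is its spectral signature across the bands.
X_train = image[:, train_rows, train_cols].T  # shape (n_samples, n_bands)
clf = RandomForestClassifier(n_estimators=100).fit(X_train, train_labels)

# Classify every pixel and reshape the result back into a land-cover map.
X_all = image.reshape(bands, -1).T
land_cover = clf.predict(X_all).reshape(h, w)
```

His full talk can be found below: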


Matthew Turk (U Illinois): Imaging and "Me"

Matthew Turk from the University of Illinois uses image processing and computer vision techniques to study the formation of the first stars and galaxies in the universe. Turk's simulations are not only computationally expensive but also require a good understanding of computer vision: volumetric segmentation, ray tracing, rasterization/pixelation, and warped coordinate systems all find use in the generation and analysis of astronomical data, and Turk has been at the forefront of developing the tools needed to handle it. One of his projects, called yt, is a community-developed, open-source analysis and visualization toolkit for volumetric data. yt has mostly been applied to astrophysical simulation data, but it works with many other types of data, including seismology, radio telescope data, weather simulations, and nuclear engineering simulations.
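To give a flavor of the toolkit, a minimal yt session looks roughly like this (the dataset path is made up; yt auto-detects many simulation formats):

```python
import yt

# Load a simulation output (hypothetical path; yt figures out the format).
ds = yt.load("galaxy_sim/output_0042")

# Slice the 3-D volume along the z-axis and render the gas density field.
slc = yt.SlicePlot(ds, "z", ("gas", "density"))
slc.save("density_slice.png")
```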

Susan Fong (Pixar): The Good Dinosaur

One of the highlights of the workshop was a fantastic talk by Susan Fong from Pixar Animation Studios. Fong was quick to point out that people at Pixar are not so much involved with image processing as with image rendering (i.e., the process of creating images). Images are created on so-called render farms: high-performance computer clusters with tens of thousands of cores used to generate movie shots. During the creation of a new feature film such as The Good Dinosaur, for which Fong served as Technical Director Supervisor, these render farms produce images 24/7, and it was her job to make sure the system always ran efficiently and finished on time.

In The Good Dinosaur, the characters are under constant threat. For the viewer to perceive this threat as real, the world must feel real. Pixar thus spent a lot of time making the world look and feel real by having every single object in a scene (i.e., every tree, leaf, and blade of grass) react to wind and weather. You can imagine how this made the creation of the movie incredibly computationally intensive: had Pixar used only one computer with a single core, that computer would have had to crunch numbers for 37 hours straight (37 CPU-hours) to produce just a single frame of the movie. Scaled up to the whole film, that adds up to millions of CPU-hours.
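To see where such a total comes from, here is a quick back-of-the-envelope calculation; the ~90-minute runtime and 24 fps frame rate are assumptions for illustration, not figures from the talk:

```python
frames = 90 * 60 * 24                 # ~130,000 frames in a 90-minute film at 24 fps
cpu_hours_per_frame = 37              # single-core render time quoted for one frame
total_cpu_hours = frames * cpu_hours_per_frame    # ~4.8 million CPU-hours
total_cpu_years = total_cpu_hours / (24 * 365)    # ~550 years on a single core
print(f"{total_cpu_hours:,.0f} CPU-hours ≈ {total_cpu_years:,.0f} CPU-years")
```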

A central task was therefore to design a system that could scale to thousands of computing nodes, and to devise a scheduling algorithm that keeps all CPUs busy all the time. While this task is already daunting in theory, in practice it is further complicated by the everyday work schedules of the programmers and artists involved in the movie. People would usually work on a shot during the day and submit a job to the render farm before leaving work. When they came back the next morning, the images had better be rendered; otherwise, production would stall. One of Fong's tasks was therefore to accurately predict the workload of the render farm.
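Fong did not reveal the internals of Pixar's scheduler; purely as a toy illustration of the problem, here is the classic longest-processing-time-first heuristic, which greedily assigns each job to the currently least-loaded node (all job names and runtime estimates are made up):

```python
import heapq

def schedule(jobs, n_nodes):
    """Greedy longest-job-first assignment onto the least-loaded node.

    jobs: list of (name, estimated_hours) pairs.
    Returns the per-node job lists and the makespan (finish time of the
    busiest node), i.e., when the overnight batch would complete.
    """
    nodes = [(0.0, i) for i in range(n_nodes)]  # min-heap of (load, node_id)
    heapq.heapify(nodes)
    plan = {i: [] for i in range(n_nodes)}
    for name, hours in sorted(jobs, key=lambda j: -j[1]):  # longest job first
        load, node = heapq.heappop(nodes)        # least-loaded node so far
        plan[node].append(name)
        heapq.heappush(nodes, (load + hours, node))
    return plan, max(load for load, _ in nodes)

# Toy usage: five overnight render jobs spread over three nodes.
jobs = [("shot_a", 12), ("shot_b", 8), ("shot_c", 8), ("shot_d", 5), ("shot_e", 3)]
plan, makespan = schedule(jobs, n_nodes=3)  # makespan == 13.0 hours here
```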

To make things even more complicated, what if all of the render jobs share data? In The Good Dinosaur, for example, there is a river in the center of the valley that can be seen in almost every shot. The data needed to represent the physical appearance of the river, with all its dynamics, amounts to 300 TB. 300 TB! And these data need to be accessed and shared among thousands of computing nodes every night.

Fong then shared some of her expertise on how to approach such a daunting task. Among a variety of methods for getting a clear specification of the jobs to be run and the resources available (e.g., deep I/O analysis, disk space estimates, backup policies), she highlighted the importance of determining data access patterns for successful memory management. Her methods often reminded her of that infamous Google interview question about the number of barbers in Texas, she joked. The truth is, a lot of effort is required to keep time and memory under control in order to fully utilize the available machines, and even the best heuristics do not save you from having to react to last-minute "curve balls".