Airbus is deriving value from its use of synthetic data to process satellite imagery, and game engine leader Unity is using synthetic data to break into new markets, including robotics and autonomous vehicles.
Those were takeaways from two talks at the recent Ai4 virtual conference, which brought together enterprise AI users from a range of industries.
“Simulation and synthetic data generation is a game changer for AI,” stated Danny Lange, senior VP of AI and Machine Learning at Unity of San Francisco. The company grew up in game development and now offers an engine for producing 3D projects, which it plans to use to extend into new industries including robotics, autonomous cars, engineering, and manufacturing.
Over 60% of all games are made with Unity, which is installed on over six billion unique devices and has three billion monthly active players, Lange stated. Addressing what game engineers can bring to the table, he noted that games have long served as proof points in the development of AI.
Arthur Lee Samuel, an American pioneer in gaming and AI who popularized the term “machine learning” in 1959, created a checkers-playing program that was an early demonstration of AI concepts. In another example, the Chinook checkers-playing program, which originated at the University of Alberta, Canada, won the World Championship in 1994 and is said to be the first program to win a human world championship. In 1997, IBM’s Deep Blue chess-playing computer became the first machine to win a chess match against a reigning human world champion, Garry Kasparov, who had beaten Deep Blue in their first match in 1996 but lost the rematch in May 1997; Kasparov accused IBM of cheating.
In 2011, IBM’s Watson defeated two champions on the televised Jeopardy game show, using natural language processing, information retrieval, and automated reasoning. Then in 2016, Google’s DeepMind unit created AlphaGo, a program that defeated the human champion at Go, a game many times more complicated and difficult than chess.
“Video games are fantastic to drive AI,” stated Lange. Explaining why, he outlined four dimensions the Unity engine can produce: visual components, physical components, cognitive components and finally, a social aspect. “These four dimensions are also relevant to the advancement of AI,” Lange said.
The 3D engine can produce spatial environments with advanced lighting, visual cameras to provide different points of view, and a physics engine (PhysX 4.1 from NVIDIA) to provide physical reality. Control is provided through a reinforcement learning approach, which Lange called “nature’s way” of learning.
In the real world, the data needed to train the machine learning algorithms at the heart of many AI systems is expensive and time-consuming to acquire. “The synthetic data advantage is if you can generate all that data at a high volume and lower cost, to get perfectly labeled training data. If you need more, you generate more. And when you have enough, you stop. You only pay for the resources consumed,” Lange stated.
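As a minimal sketch of that generate-until-you-have-enough workflow, the Python loop below illustrates the idea; `render_scene` and the labels it returns are hypothetical stand-ins for a game-engine render call, not an actual Unity API:

```python
import random

def render_scene(seed):
    """Hypothetical stand-in for a game-engine render call.

    A real engine would randomize lighting, camera pose, and object
    placement, then return the rendered frame along with pixel-perfect
    labels, which come for free because the engine knows what it drew.
    """
    rng = random.Random(seed)
    image = f"frame_{seed:06d}.png"                      # rendered frame
    labels = {"class": rng.choice(["car", "chicken", "gift"]),
              "box": [rng.randint(0, 255) for _ in range(4)]}
    return image, labels

# Generate exactly as much data as needed: if you need more, render more;
# when you have enough, stop paying for compute.
TARGET_SIZE = 100_000
dataset = [render_scene(seed) for seed in range(TARGET_SIZE)]
```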
Outlining an example of reinforcement learning, Lange described a game showing a chicken trying to cross a road with moving obstacles to collect gift rewards. The chicken gets better over time, and after six hours, “It has become superhuman,” Lange stated. “By going through this feedback loop of observing, taking action and reaping the rewards, it has become perfect for crossing the road.”
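That feedback loop is the standard reinforcement learning recipe. As a self-contained illustration (a toy stand-in for the Unity demo, not its actual code), the tabular Q-learning sketch below trains an agent to cross a five-lane road with randomly blocked lanes:

```python
import random

# Toy stand-in for Lange's road-crossing game (not the actual Unity demo):
# the agent starts in lane 0 and must reach lane 4; each step it either
# waits or goes, and the next lane may be blocked by a passing obstacle.
N_LANES, ACTIONS = 5, ["wait", "go"]

def step(lane, action):
    blocked = random.random() < 0.3              # obstacle in the next lane
    if action == "go":
        if blocked:
            return 0, -1.0, True                 # hit: episode ends, penalty
        lane += 1
        if lane == N_LANES - 1:
            return lane, 1.0, True               # crossed: episode ends, reward
    return lane, -0.01, False                    # small time cost per step

# Tabular Q-learning: the observe -> act -> reward feedback loop.
Q = {(lane, a): 0.0 for lane in range(N_LANES) for a in ACTIONS}
alpha, gamma, eps = 0.1, 0.95, 0.1               # learning rate, discount, exploration
for episode in range(20_000):
    lane, done = 0, False
    while not done:
        if random.random() < eps:                # explore occasionally
            action = random.choice(ACTIONS)
        else:                                    # otherwise act greedily
            action = max(ACTIONS, key=lambda a: Q[(lane, a)])
        next_lane, reward, done = step(lane, action)
        best_next = 0.0 if done else max(Q[(next_lane, a)] for a in ACTIONS)
        Q[(lane, action)] += alpha * (reward + gamma * best_next - Q[(lane, action)])
        lane = next_lane
```

After enough episodes, the greedy action at each lane is the learned crossing policy, which is the “superhuman” behavior Lange describes, in miniature.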
He demonstrated how the movements of a digital robot created with the Unity 3D engine can be transferred to a physical robot via a robot software engine, of which several alternatives are available. “Now we have a digital version of the robot,” Lange stated, showing a split screen with the digital robot on the left and the physical robot on the right, making the same movements. “It’s much easier and faster to make changes and make mistakes. And you don’t break anything using the simulated model.”
He coined a new rule, “Moore’s Law for AI,” that states, “The amount of training data doubles every 18 months. The world is running out of real-world data; we need synthetic data to keep up.”
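Taken at face value, an 18-month doubling time compounds quickly, which is the crux of his argument. A quick back-of-the-envelope check:

```python
# If training-data demand doubles every 18 months, demand after t months
# is demand_0 * 2 ** (t / 18). Over roughly a decade:
base = 1.0  # relative demand today
for years in (3, 6, 9):
    months = years * 12
    print(f"{years} years: {base * 2 ** (months / 18):.0f}x today's data")
# -> 3 years: 4x, 6 years: 16x, 9 years: 64x
```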
He offered some comparisons: one year of a human life at 30 frames per second works out to roughly one billion frames. Synthetic data could be generated at a rate of 200 to 500 billion frames per second, depending on the hardware and software available, or in a cloud system with massive parallelism. “Look at this as using simulation to unlock data generation for AI,” Lange stated.
He showed an example of a robotic hand solving a Rubik’s cube, using OpenAI and Unity. In the initial simulation, OpenAI generated 300 billion simulated frames, equivalent to 300 years of experience. For a second simulation, OpenAI and Unity generated 10 trillion simulated frames, equal to 10,000 years of virtual experience.
“So we went from 300 billion to ten trillion to solve a more complex task,” Lange said, contrasting the synthetic data volumes.
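Those figures are easy to sanity-check against the one-billion-frames-per-human-year yardstick from the previous paragraph (pure arithmetic, no assumptions beyond the stated 30 fps):

```python
FPS = 30
frames_per_year = FPS * 60 * 60 * 24 * 365    # seconds in a year times fps
print(f"{frames_per_year:,} frames")          # 946,080,000, i.e. ~1 billion

# Convert the simulated frame counts into "years of experience"
for frames in (300e9, 10e12):
    years = frames / frames_per_year
    print(f"{frames:.0e} frames ~ {years:,.0f} years of experience")
# -> 3e+11 frames ~ 317 years; 1e+13 frames ~ 10,570 years
```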
The Unity Perception package is available on GitHub for developers to use in creating visual training data.
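A consumer of such a dataset reads rendered frames plus machine-generated annotations. The sketch below assumes a simplified, hypothetical JSON layout for the labels; the package's actual output schema is documented in its GitHub repository:

```python
import json
from pathlib import Path

# Sketch of consuming engine-generated labels. The JSON layout here is a
# simplified hypothetical for illustration; see the Unity Perception
# repository on GitHub for the package's actual output schema.
def load_captures(dataset_dir):
    for capture_file in sorted(Path(dataset_dir).glob("captures_*.json")):
        for capture in json.loads(capture_file.read_text())["captures"]:
            image = capture["filename"]
            boxes = [(ann["label"], ann["x"], ann["y"], ann["w"], ann["h"])
                     for ann in capture.get("annotations", [])]
            yield image, boxes

for image, boxes in load_captures("synthetic_dataset"):
    print(image, len(boxes), "labeled objects")
```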
Airbus Testing Synthetic Data to Process Satellite Imagery
In addition to designing and manufacturing civil and military aerospace products, the European multinational Airbus is the prime contractor for over 70 earth observation satellites, providing imagery for its customers.
Jeff Faudi, senior innovation manager for Airbus Defence and Space-Intelligence, shared an Ai4 session with Omri Greenberg, cofounder and CEO of OneView. OneView offers a platform for creating virtual synthetic datasets for the analysis of earth observation imagery, and engaged in a pilot project with Airbus to test the effectiveness of synthetic data.
For the past five years, Faudi has been working to foster the use of AI in satellite imagery inside and outside his organization. “Airbus is known for manufacturing aircraft, but not so much for manufacturing satellites and operating them in orbit. This is what my business unit does,” Faudi stated. Airbus has more than 600 employees spread over 30 receiving stations worldwide that take in images directly from the satellites, and it is processing 100 times more imagery data today than 10 years ago. Airbus describes its imagery services at OneAtlas.
Users of satellite imagery services include urban planners, who extract buildings and roads as needed; railway companies, who can detect vegetation encroachment over their lines; retail specialists, who can measure the number and location of private cars in front of major shopping malls; and military defense services, who may be searching for aircraft in crisis areas, Faudi stated.
In the old days, when clients were getting one image per day or per week, “People would process it with their own eyes,” stated Faudi. Now, “we are getting too much data. To find one specific object among thousands and thousands is like finding a needle in a haystack,” he stated.
“We believe the only way to do this systematically and consistently is to leverage machine learning to automatically extract insights from satellite imagery,” Faudi stated. Doing this effectively requires very large datasets.
Training a new AI machine learning algorithm takes tens or hundreds of thousands of objects, which must be manually identified and annotated. The dataset is then reviewed to correct annotation mistakes, and the process repeats until a sufficient level of quality is reached. Each cycle takes on average one week to complete, and 10 to 20 cycles are needed depending on the complexity of the use case.
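Schematically, the cycle Faudi describes is an annotate-train-review loop that terminates on a quality threshold. In the Python sketch below, every helper is a trivial stub standing in for the real manual or automated step:

```python
# Schematic of the annotate -> train -> review loop described above.
# The helpers are trivial stubs standing in for the real manual
# annotation, model training, and QA steps (hypothetical, for shape only).
def annotate(images):     return [(img, "label") for img in images]
def train(dataset):       return {"examples_seen": len(dataset)}
def evaluate(model):      return min(1.0, model["examples_seen"] / 100_000)
def fix_labels(dataset):  return dataset + dataset[:1000]  # corrected/extra labels

def build_detector(images, target_quality=0.9, max_cycles=20):
    dataset = annotate(images)                 # expensive manual first pass
    for cycle in range(max_cycles):            # ~1 week per cycle in practice
        model = train(dataset)
        if evaluate(model) >= target_quality:  # sufficient quality reached
            return model, cycle + 1
        dataset = fix_labels(dataset)          # review, correct, repeat
    return model, max_cycles

model, cycles = build_detector(images=range(80_000))
print(f"converged after {cycles} cycle(s)")
```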
Sees Potential To Roll Out New Services More Quickly
“In some cases, we cannot start the process because we lack the required imagery,” Faudi stated. Greenberg stepped in to outline how his OneView product can be used to create labeled training data more quickly and at lower cost. For the proof of concept, Airbus chose a project on the classification of aircraft, in part to test how new services can be created for Airbus clients.
“Every time one of our partners needs to create a new algorithm, they need to identify, acquire and download a lot of imagery. This is often a showstopper,” Faudi stated. Aircraft identification is a common use case in his business, and Airbus has deep experience doing it. “We knew we would be able to compare the results of using synthetic data to the use of real data, to validate the approach,” Faudi stated.
Three training datasets were created: one with real images, for benchmarking; one consisting entirely of synthetic data generated by OneView; and one combining the two, with roughly 95 percent synthetic and five percent real data. The combined dataset included 82,000 synthetic aircraft images and 3,000 real images.
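Assembling the blended training set is conceptually simple; the sketch below uses hypothetical file paths and just combines and shuffles the two sources at the quoted volumes:

```python
import random

# Sketch of assembling the blended training set from the pilot.
# File paths are hypothetical placeholders.
synthetic = [f"oneview/synthetic_{i:05d}.png" for i in range(82_000)]
real      = [f"airbus/real_{i:04d}.png" for i in range(3_000)]

mixed = synthetic + real          # 82,000 synthetic + 3,000 real images
random.shuffle(mixed)             # interleave sources before training

share = len(synthetic) / len(mixed)
print(f"synthetic share: {share:.1%}")  # ~96.5%, close to the quoted 95/5 split
```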
The results showed a 20% improvement from the mixed dataset over what Airbus had been experiencing with real data alone. “The synthetic data made for an actual algorithm to be deployed into production, which is extremely valuable,” Faudi stated. Overall, he considered the pilot a success.
“We think this has huge potential for the future,” he stated. “It will definitely help us cut the annotation costs, so we can reduce the turnaround time for a new algorithm, and we can accurately assess the accuracy of the algorithm. Over the coming years, we will see a huge increase in the use of synthetic data in training algorithms.”