The concept is certainly compelling. Having a machine that can react to real-world visual, auditory, or other types of data and then respond intelligently has been the stuff of science fiction until very recently.
We are now on the verge of this new reality, yet there is little general understanding of what artificial intelligence, convolutional neural networks, and deep learning can (and can’t) do, or of what it takes to make them work.
At the simplest level, much of the current effort around deep learning involves very rapid recognition and classification of objects, whether visual, audible, or some other form of digital data. Cameras, microphones, and other types of sensors feed data into a system that contains a multi-level set of filters, each providing an increasingly detailed level of differentiation. Think of it like the animal or plant classification charts from your grammar school days: Kingdom, Phylum, Class, Order, Family, Genus, Species.
The trick with machines is to get them to learn the characteristics or properties of these different classification levels and then be able to use that learning to accurately classify a new object they haven’t previously been exposed to. That’s the gist of the “artificial intelligence” label that gets used to describe these efforts. In other words, while computers have long been able to identify things they’ve seen before, recognizing that a new image is not just a dog but a long-haired miniature dachshund, after they’ve “seen” enough pictures of dogs, is a critical capability. Actually, what’s really important, and really new, is the ability to do this extremely rapidly and accurately.
Like most computer-related problems, the work to enable this has to be broken down into a number of individual steps. In everyday usage, the word “convolution” refers to something complex that folds back on itself. In mathematics, it describes an operation in which a small filter is slid across the input and combined with it at each position; in a CNN, the results from one level of filters are fed forward to the next level in order to improve the accuracy of the process. The phrase “neural network” stems from early efforts to create a computing system that emulated the human brain’s individual neurons working together to solve a problem. While most computer scientists now seem to discount the comparison to the functioning of a real human brain, the idea of a number of very simple elements connected together in a network and working together to solve a complex problem has stuck, hence convolutional neural networks (CNNs).
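To make the idea of a filter concrete, here is a minimal sketch in plain Python of a single filter being slid across a tiny image. The function name `convolve2d`, the 4x4 image, and the hand-picked `vertical_edge` filter are all illustrative assumptions, not taken from any particular library; a real CNN learns its filter values during training and stacks many such filters across many levels.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` and return the resulting feature map.

    Only positions where the kernel fits entirely inside the image are used.
    As in most deep learning libraries, the kernel is not flipped, so strictly
    speaking this is cross-correlation.
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # Multiply the kernel by the image patch beneath it and sum up.
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 4x4 image: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked filter that responds to a dark-to-bright vertical edge.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, vertical_edge)
for row in feature_map:
    print(row)  # each row is [0, 2, 0]: the filter "fires" only at the edge
```

The output of this one filter is itself a small grid, which is what would be handed to the next level of filters in a deeper network.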
The “deep” in deep learning refers to the number, or depth, of filtering and classification levels used to recognize an object. While there seems to be debate about how many levels are necessary to justify the phrase “deep learning,” many people seem to suggest 10 or more. (Microsoft’s research work on visual recognition, though, went to 152 levels!)
A key point to understanding deep learning is that there are two critical but separate steps involved in the process. The first, known as training, involves doing extensive analysis of enormous data sets and automatically generating “rules” or algorithms that can accurately describe the various characteristics of different objects. The second involves using those rules to identify objects or situations based on real-time data, a process known as inferencing.
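The split between the two steps can be illustrated with a deliberately tiny sketch. It uses a single artificial neuron (a perceptron) rather than a deep network, and the function names `train` and `predict`, the six labeled samples, and the two new inputs are all made up for illustration. The point is only the shape of the workflow: training adjusts the parameters from labeled examples, and inferencing then applies the frozen parameters to data the system has not seen before.

```python
def predict(weights, bias, features):
    """Inferencing: apply already-learned parameters to one input."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def train(samples, labels, epochs=20, learning_rate=1):
    """Training: nudge the parameters whenever a labeled example is misclassified."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            error = label - predict(weights, bias, features)  # -1, 0, or 1
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias


# Step 1: training on a (hypothetical) labeled data set.
# Each sample is two made-up measurements; the label is the class, 0 or 1.
samples = [[1, 1], [2, 1], [1, 2], [4, 5], [5, 4], [5, 5]]
labels = [0, 0, 0, 1, 1, 1]
weights, bias = train(samples, labels)

# Step 2: inferencing on inputs that were not in the training set.
print(predict(weights, bias, [4.5, 5.5]))  # 1
print(predict(weights, bias, [2, 1.5]))    # 0
```

In practice the training step is the expensive one, which is why it is usually run ahead of time on large data sets, while the much cheaper `predict` step is what runs against live sensor data.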