Feature Selection

 by G. Petrie, P. Heasler and T. Warner

The Pacific Northwest National Laboratory Remote Sensing and Electro-Optics Technical Group, in cooperation with West Virginia University, has been developing, testing and applying hyperspectral data sets, ranging from early AVIRIS data sets to HYDICE systems, to characterize subtle landcover features. Such differentiation was not possible with low-resolution multispectral data sets. However, the large amounts of data that must be processed present significant problems to the end user. To help overcome these problems, a number of band selection strategies were developed and tested that significantly reduced the need to process and analyze redundant data. In general, the application of these methods suggests that the number of bands needed to solve a given problem can often be reduced to 10 or fewer. In many cases this reduction in bands can remove significant barriers to the application of hyperspectral data to a large number of environmental problem sets.

METHODOLOGY

Figure 1. Examples of Band Selection Methods.

Fig. 1 is a schematic overview of the several options available to select bands. The most straightforward is expert judgment. Under the right circumstances this approach can be quite effective. However, beyond the obvious actions, such as removing water absorption bands, there are a number of disadvantages associated with such a subjective method. For example, it is difficult to justify and explain a selection, the method is subject to human error, and it can be dangerous to apply in new situations. Data compression is another option (e.g., Principal Component Analysis). However, compression has several significant disadvantages. For these reasons our experience suggests that the methods outlined by boxes in Fig. 1 can be particularly useful.

Spatial Autocorrelation

Narrow-band feature selection begins by ratioing each band with every other band; the resulting ratio images are ranked by their relative spatial autocorrelation. The highest-ranking pair is listed in the band rank table as the best two images. The third best image is found by searching the table of ranked pairs for the band that gives the most information (i.e., the highest spatial autocorrelation) when ratioed with the previously selected two images. The fourth and higher order choices are identified in the same way, each time finding the next image that gives the most information when compared to all the previously selected bands.
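The ranking procedure can be illustrated with a minimal sketch. It rests on two assumptions that the text does not specify: Moran's I with 4-neighbour adjacency stands in for the spatial autocorrelation statistic, and a candidate band is scored by the mean autocorrelation of its ratio images with the bands already chosen. The function names (morans_i, ratio_image, rank_bands) and the nested-list image layout are hypothetical, and only one ratio direction is evaluated per pair to keep the example short.

```python
from itertools import combinations

def morans_i(img):
    """Moran's I of a 2-D image (list of rows) with 4-neighbour adjacency."""
    rows, cols = len(img), len(img[0])
    n = rows * cols
    mean = sum(sum(row) for row in img) / n
    dev = [[v - mean for v in row] for row in img]
    var = sum(d * d for row in dev for d in row)
    if var == 0:
        return 0.0  # a constant image carries no spatial structure
    cross, pairs = 0.0, 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:  # right-hand neighbour
                cross += dev[r][c] * dev[r][c + 1]
                pairs += 1
            if r + 1 < rows:  # neighbour below
                cross += dev[r][c] * dev[r + 1][c]
                pairs += 1
    return (n / pairs) * (cross / var)

def ratio_image(a, b, eps=1e-9):
    """Pixel-wise ratio a / b; zero denominators are replaced by eps."""
    return [[x / (y if y != 0 else eps) for x, y in zip(ra, rb)]
            for ra, rb in zip(a, b)]

def rank_bands(cube, k):
    """Greedy ranking: best ratio pair first, then one band at a time.

    cube is a list of band images; at least two bands are always returned.
    """
    bands = range(len(cube))
    score = {pair: morans_i(ratio_image(cube[pair[0]], cube[pair[1]]))
             for pair in combinations(bands, 2)}
    chosen = list(max(score, key=score.get))
    while len(chosen) < min(k, len(cube)):
        def gain(b):
            # assumed criterion: mean score of b ratioed with each chosen band
            return sum(score[(min(b, s), max(b, s))] for s in chosen) / len(chosen)
        chosen.append(max((b for b in bands if b not in chosen), key=gain))
    return chosen
```

Computing every pairwise ratio image grows quadratically with the number of bands, so for a full hyperspectral cube an array library would be the practical choice; plain lists are used here only to keep the sketch self-contained.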

Broad-band feature selection starts with a pre-selected number of ranked bands from the narrow-band feature selection. The number of bands used corresponds to the number of broad-band features that are desired. Each remaining band is then tested to determine if adding it to one of the previously chosen bands adds to the overall spatial autocorrelation of those bands. If only bands spectrally adjacent to the previously chosen bands are candidates for testing and incorporation with those bands, the method is termed broad-band feature selection. However, if all remaining bands are candidates, then it is termed multiple band feature selection.
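A minimal sketch of this growth step follows, under stated assumptions: bands in a group are merged by pixel-wise averaging, the autocorrelation statistic is passed in as a function (a simple lag-one row autocorrelation is supplied as a placeholder), and a band is added only when it raises the score of the merged image. The names mean_image, lag1_autocorr and grow_features are hypothetical. Setting adjacent_only to False gives the multiple band variant.

```python
def mean_image(cube, group):
    """Pixel-wise average of the listed bands (assumed merging rule)."""
    rows, cols = len(cube[0]), len(cube[0][0])
    return [[sum(cube[b][r][c] for b in group) / len(group)
             for c in range(cols)] for r in range(rows)]

def lag1_autocorr(img):
    """Placeholder statistic: lag-one autocorrelation along image rows."""
    vals = [v for row in img for v in row]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals)
    if var == 0:
        return 0.0
    cross = sum((row[c] - mean) * (row[c + 1] - mean)
                for row in img for c in range(len(row) - 1))
    return cross / var

def grow_features(cube, seeds, score=lag1_autocorr, adjacent_only=True):
    """Grow one band group per seed while the merged image score improves."""
    groups = [[s] for s in seeds]
    used = set(seeds)
    changed = True
    while changed:
        changed = False
        for g in groups:
            if adjacent_only:  # broad-band: only spectral neighbours
                cands = [b for b in (min(g) - 1, max(g) + 1)
                         if 0 <= b < len(cube) and b not in used]
            else:              # multiple band: any unused band
                cands = [b for b in range(len(cube)) if b not in used]
            best, best_score = None, score(mean_image(cube, g))
            for b in cands:
                s = score(mean_image(cube, g + [b]))
                if s > best_score:
                    best, best_score = b, s
            if best is not None:
                g.append(best)
                g.sort()
                used.add(best)
                changed = True
    return groups
```

Each pass adds at most one band to each group and stops when no addition improves any group, so the loop always terminates.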

Spectral Autocorrelation-Based Approaches

In this approach the objective is to construct n aggregated bands from the raw hyperspectral bands available, utilizing spectral information only (i.e., the spatial component is not considered). These aggregated bands are constructed by averaging adjacent bands together. The aggregated bands need not use all raw bands and may consist of individual raw bands. The methods used to achieve this reduction in band complexity can be grouped into two broad classes: (1) optimization with distance metrics (e.g., divergence) and (2) basis function optimization. These two approaches are summarized below.
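As a small illustration of what an aggregated band is, the sketch below averages inclusive ranges of adjacent raw bands for a single spectrum. The function name aggregate and the (first, last) range convention are assumptions made for the example. Raw bands not covered by any range are simply left out, and a range of length one reproduces an individual raw band.

```python
def aggregate(spectrum, groups):
    """Average each inclusive (first, last) range of adjacent raw bands."""
    return [sum(spectrum[lo:hi + 1]) / (hi - lo + 1) for lo, hi in groups]

# Six raw bands reduced to three aggregated bands; raw band 4 is unused.
raw = [0.10, 0.12, 0.14, 0.30, 0.32, 0.50]
reduced = aggregate(raw, [(0, 2), (3, 3), (5, 5)])
```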

Optimization with Distance Metrics

Optimization with distance metrics (e.g., divergence) is based on the idea that, for a given set of n bands, it is possible to measure the discriminating power, or degree of separation, that those bands provide for a specific target detection problem. The user provides (or picks out from an image) a set of spectra for both the target (e.g., tanks) and the background (e.g., vegetation), and the algorithm performs an optimization search to find the set of bands that best separates the target from the background class(es).
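The following sketch shows one simple way such a search could look. It is built on a strong simplifying assumption: each class is Gaussian and the bands are treated as independent, so the symmetric divergence of a band set is the sum of per-band divergences and the search reduces to ranking bands individually. With full covariance matrices the divergence is no longer additive and a combinatorial or greedy search would be needed. The names band_divergence and select_bands, and the small variance floor, are hypothetical.

```python
from statistics import mean, pvariance

def band_divergence(target, background, band, floor=1e-9):
    """Symmetric divergence between two 1-D Gaussians fitted to one band."""
    t = [spectrum[band] for spectrum in target]
    b = [spectrum[band] for spectrum in background]
    mt, mb = mean(t), mean(b)
    # floor the variances so a perfectly constant class does not divide by zero
    vt, vb = max(pvariance(t), floor), max(pvariance(b), floor)
    return (0.5 * (vt / vb + vb / vt - 2.0)
            + 0.5 * (mt - mb) ** 2 * (1.0 / vt + 1.0 / vb))

def select_bands(target, background, n):
    """Return the n bands with the largest per-band divergence, in band order."""
    n_bands = len(target[0])
    ranked = sorted(range(n_bands),
                    key=lambda k: band_divergence(target, background, k),
                    reverse=True)
    return sorted(ranked[:n])
```

Here target and background are lists of spectra, each spectrum a list of band values of equal length.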

Basis Function Optimization

The use of basis functions to define the n best bands follows the work of Price. The fundamental idea is that the hyperspectral data set can be represented accurately by a small set of n basis functions that, when multiplied by the appropriate scalar values and added together, will represent the original data well. For each hyperspectral sample, the scalar values used to multiply the corresponding basis functions are constructed by averaging the appropriate adjacent bands together. Since these n scalar values, used with the corresponding basis functions, are all that is necessary to replicate the original band values in the sample, these bands are taken to represent the best n bands in our band selection analysis. We have experimented with a modification of the original methodology. In Price's original approach, the basis functions are calculated in an iterative fashion, one at a time. In our modification we use an alternative least squares strategy to calculate the basis functions all at the same time. This extension has the advantage of being both conceptually and operationally simpler.
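To make the simultaneous fit concrete, the sketch below treats the band groups as already fixed and solves for all n basis functions in one ordinary least squares step. Writing the samples as rows of X and the aggregated band values as rows of C, it finds the basis matrix B that minimizes the squared error of X against C times B through the normal equations. This is an illustration under those assumptions, not a reproduction of Price's procedure or of the modification described above; the helper names (solve, fit_basis, reconstruct) and the inclusive (first, last) group convention are hypothetical.

```python
def solve(a, b):
    """Solve a * x = b (a square, b a matrix) by Gauss-Jordan elimination."""
    n = len(a)
    m = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("aggregated bands are linearly dependent")
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]

def fit_basis(samples, groups):
    """Return (coefficients, basis) for fixed inclusive band groups."""
    # scalar values: the average of each band group, per sample
    coef = [[sum(s[lo:hi + 1]) / (hi - lo + 1) for lo, hi in groups]
            for s in samples]
    n, n_bands = len(groups), len(samples[0])
    # normal equations (C^T C) B = C^T X, solved for every basis at once
    ctc = [[sum(c[i] * c[j] for c in coef) for j in range(n)]
           for i in range(n)]
    ctx = [[sum(c[i] * s[k] for c, s in zip(coef, samples))
            for k in range(n_bands)] for i in range(n)]
    return coef, solve(ctc, ctx)

def reconstruct(coef_row, basis):
    """Rebuild one full spectrum from its n scalar values."""
    return [sum(c * b[k] for c, b in zip(coef_row, basis))
            for k in range(len(basis[0]))]
```

The residual between a sample and its reconstruction gives a direct measure of how well a candidate set of band groups represents the data, which is the quantity a search over groups would try to minimize.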

CONCLUSIONS

Hyperspectral data sets offer the potential to characterize important environmental problems that require the differentiation of subtle spectral signatures. This capability can be expected to increase with the availability of new hyperspectral systems (e.g., OrbView 4, http://www.orbimage.com/satellite/orbview4/orbview4.html). However, to fully capitalize on these new opportunities, particularly for the analysis of large areas, it will often be necessary to use a band selection approach that captures the essential information for a given problem set. Our experience suggests that the three methods described above each offer advantages that can help reduce operational complexity, cost, and computational burden, ease data transmission bandwidth limitations, and improve system trade-offs. For instance, the distance metric method allows the user to focus attention on features (in terms of target and background spectra) that are known to be important in advance of data collection, based on previous imagery or laboratory spectra. Because the spatial correlation methodology does not exploit a priori information, it has the advantage of considering information over the entire image. Thus unexpected classes are less likely to be overlooked. Further, since the spectral and spatial methods consider complementary factors, using them together helps ensure that the user has captured all the relevant information.