Kinect for Windows SDK Programming Guide

Features of the Kinect for Windows SDK

So far, we have discussed the components of the Kinect SDK, the system requirements, the installation of the SDK, and the setting up of devices. Now it's time to take a quick look at the top-level features of the Kinect for Windows SDK.

The Kinect SDK provides a library to interact directly with the camera sensors, the microphone array, and the motor. We can also extend an application to recognize gestures from body motion, and enable an application with speech recognition capabilities. The following is the list of operations that you can perform with the Kinect SDK. We will discuss each of them in subsequent chapters.

  • Capturing and processing the color image data stream
  • Processing the depth image data stream
  • Capturing the infrared stream
  • Tracking human skeleton and joint movements
  • Human gesture recognition
  • Capturing the audio stream
  • Enabling speech recognition
  • Adjusting the Kinect sensor angle
  • Getting data from the accelerometer
  • Controlling the infrared emitter

Capturing the color image data stream

The color camera returns 32-bit RGB images at resolutions ranging from 640 x 480 pixels to 1280 x 960 pixels. The Kinect for Windows sensor supports up to 30 FPS at a resolution of 640 x 480, and 10 FPS at 1280 x 960. The SDK also supports retrieving YUV images at a resolution of 640 x 480 at 15 FPS.

Using the SDK, you can capture the live image data stream at different resolutions. Although we refer to the color data as an image stream, technically it is a succession of color image frames sent by the sensor. The SDK can also send an individual image frame from the sensor on demand.
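
As a quick illustration, the following is a minimal C# sketch of enabling the color stream and reading frames as they arrive. It assumes a reference to the Microsoft.Kinect assembly and at least one connected sensor; error handling is omitted:

    // Requires references to Microsoft.Kinect and System.Linq (for FirstOrDefault).
    private void StartColorStream()
    {
        // Pick the first connected sensor and enable the color stream at 640 x 480, 30 FPS.
        KinectSensor sensor = KinectSensor.KinectSensors
            .FirstOrDefault(s => s.Status == KinectStatus.Connected);
        sensor.ColorStream.Enable(ColorImageFormat.RgbResolution640x480Fps30);

        // ColorFrameReady fires for every new color frame pushed by the sensor.
        sensor.ColorFrameReady += (sender, e) =>
        {
            using (ColorImageFrame frame = e.OpenColorImageFrame())
            {
                if (frame == null) return;          // The frame can be null if it arrives too late.
                byte[] pixels = new byte[frame.PixelDataLength];
                frame.CopyPixelDataTo(pixels);      // 32-bit pixel data, ready to render.
            }
        };

        sensor.Start();
    }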

Chapter 4, Getting the Most Out of Kinect Camera, talks in depth about capturing color streams.

Processing the depth image data stream

The Kinect sensor returns 16-bit raw depth data. Each of the pixels within the data represents the distance between the object and the sensor. Kinect SDK APIs support depth data streams at resolutions of 640 x 480, 320 x 240, and 80 x 60 pixels.
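A rough sketch of what reading depth data looks like in code follows (assuming a started KinectSensor instance named sensor); each 16-bit value packs the distance in millimeters together with player-index bits in the lower bits:

    // Enable the depth stream at 640 x 480, 30 FPS.
    sensor.DepthStream.Enable(DepthImageFormat.Resolution640x480Fps30);

    sensor.DepthFrameReady += (sender, e) =>
    {
        using (DepthImageFrame frame = e.OpenDepthImageFrame())
        {
            if (frame == null) return;
            short[] depthData = new short[frame.PixelDataLength];
            frame.CopyPixelDataTo(depthData);

            // Shift out the player-index bits to get the distance (in millimeters)
            // of the first pixel from the sensor.
            int distanceInMm = depthData[0] >> DepthImageFrame.PlayerIndexBitmaskWidth;
        }
    };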

Near Mode

The Near Mode feature helps us track a human body within a very close range (approximately 40 centimeters). We can control the mode of the sensor from our application; however, the core part of this feature is built into the firmware of the Kinect sensor.

Note

This feature is limited to the Kinect for Windows sensor only. If you are using the Xbox sensor, you won't be able to work with Near Mode.
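
As a sketch, switching an already enabled depth stream to Near Mode is a one-line range change; wrapping it in a try/catch guards against sensors that do not support it (assuming a KinectSensor instance named sensor):

    try
    {
        // Near range lets the sensor report depth for objects as close as ~40 cm.
        sensor.DepthStream.Range = DepthRange.Near;
    }
    catch (InvalidOperationException)
    {
        // Thrown on hardware that does not support Near Mode (for example, the Xbox sensor).
        sensor.DepthStream.Range = DepthRange.Default;
    }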

We will talk about depth data processing and Near Mode in Chapter 5, The Depth Data – Making Things Happen.

Capturing the infrared stream

You can also capture images in low-light conditions by reading the infrared stream from the Kinect sensor. The Kinect sensor returns 16-bit-per-pixel infrared data at a resolution of 640 x 480, delivered as an image format, and it supports up to 30 FPS. The following is an image captured from an infrared stream:

[Image: a frame captured from the Kinect infrared stream]

Note

You cannot read color and infrared streams simultaneously, but you can read depth and infrared data simultaneously. The reason behind this is that an infrared stream is captured as a part of a color image format.
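
Because the infrared stream travels through the color channel, enabling it is just a matter of choosing the infrared color image format. A minimal sketch (assuming a started KinectSensor instance named sensor):

    // The infrared stream is exposed as a color image format (16 bits per pixel).
    sensor.ColorStream.Enable(ColorImageFormat.InfraredResolution640x480Fps30);

    sensor.ColorFrameReady += (sender, e) =>
    {
        using (ColorImageFrame frame = e.OpenColorImageFrame())
        {
            if (frame == null) return;
            byte[] irData = new byte[frame.PixelDataLength];
            frame.CopyPixelDataTo(irData);   // Raw 16-bpp infrared intensity values.
        }
    };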

Tracking human skeleton and joint movements

One of the most interesting parts of the Kinect SDK is its support for tracking the human skeleton. You can detect the movement of a human skeleton standing in front of a Kinect device. Kinect for Windows can track up to 20 joints in a single skeleton. It can track up to six skeletons, which means it can detect up to six people standing in front of the sensor, but it can return the full skeleton details (joint positions) for only two of the tracked skeletons.

The SDK also supports tracking the skeleton of a seated human body. The Kinect device can track your joints even while you are seated, but only up to 10 joints (the upper body).
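
To give an idea of the API shape, the following sketch enables skeleton tracking in seated mode and reads the head joint of the first tracked skeleton (assuming a started KinectSensor instance named sensor, plus System.Linq):

    sensor.SkeletonStream.Enable();
    // Seated mode tracks only the 10 upper-body joints.
    sensor.SkeletonStream.TrackingMode = SkeletonTrackingMode.Seated;

    sensor.SkeletonFrameReady += (sender, e) =>
    {
        using (SkeletonFrame frame = e.OpenSkeletonFrame())
        {
            if (frame == null) return;
            Skeleton[] skeletons = new Skeleton[frame.SkeletonArrayLength];
            frame.CopySkeletonDataTo(skeletons);

            Skeleton tracked = skeletons
                .FirstOrDefault(s => s.TrackingState == SkeletonTrackingState.Tracked);
            if (tracked != null)
            {
                // Joint positions are reported in meters, in the sensor's coordinate space.
                SkeletonPoint head = tracked.Joints[JointType.Head].Position;
            }
        }
    };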

The next image shows the tracked skeleton of a standing person, which is based on depth data:

[Image: a tracked skeleton of a standing person, derived from depth data]

The details of tracking the skeletons of standing and seated humans, its uses, and the development of an application using skeletal tracking are covered in Chapter 6, Human Skeleton Tracking.

Capturing the audio stream

Kinect has four microphones in a linear configuration. The SDK provides high-quality audio processing capabilities by using its own internal audio processing pipeline. The SDK not only allows you to capture raw audio data, but also to apply high-quality audio processing by enabling the noise suppression and echo cancellation features. You can also control the direction of the microphone array's beam with the help of the SDK.
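
A minimal sketch of starting the audio stream with noise suppression and echo cancellation enabled might look like this (assuming a started KinectSensor instance named sensor):

    KinectAudioSource audioSource = sensor.AudioSource;

    // Turn on the built-in audio processing pipeline features.
    audioSource.NoiseSuppression = true;
    audioSource.EchoCancellationMode = EchoCancellationMode.CancellationAndSuppression;

    // Let the SDK steer the microphone-array beam automatically.
    audioSource.BeamAngleMode = BeamAngleMode.Adaptive;

    // Start() returns a standard .NET Stream of 16 kHz, 16-bit mono PCM audio.
    System.IO.Stream audioStream = audioSource.Start();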

We cover the details of the audio APIs of the Kinect SDK in Chapter 7, Using Kinect's Microphone Array.

Speech recognition

You can take advantage of the Kinect microphone array and the Windows Speech Recognition APIs to recognize voice commands and build applications around them. You can define your own vocabulary, pass it to the speech engine, and design your own set of voice commands to control the application. For example, if a user speaks while making a gesture, such as moving a hand as shown in the following picture, the application can perform an action based on both the user's gesture and speech.

[Image: a user issuing a voice command while moving a hand]
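
As a rough sketch, the Kinect audio stream can be fed into a speech engine loaded with a custom vocabulary; the command words and culture below are illustrative assumptions:

    // Requires a reference to the Microsoft.Speech assembly
    // (namespaces Microsoft.Speech.Recognition and Microsoft.Speech.AudioFormat).
    private void StartSpeechRecognition(KinectSensor sensor)
    {
        // Build a small custom vocabulary of voice commands (illustrative words).
        Choices commands = new Choices("start", "stop", "exit");
        GrammarBuilder builder = new GrammarBuilder(commands)
        {
            Culture = new System.Globalization.CultureInfo("en-US")
        };

        SpeechRecognitionEngine speechEngine =
            new SpeechRecognitionEngine(new System.Globalization.CultureInfo("en-US"));
        speechEngine.LoadGrammar(new Grammar(builder));
        speechEngine.SpeechRecognized += (sender, e) =>
        {
            string command = e.Result.Text;   // The recognized voice command; act on it here.
        };

        // Feed the Kinect microphone-array audio (16 kHz, 16-bit mono PCM) into the engine.
        speechEngine.SetInputToAudioStream(
            sensor.AudioSource.Start(),
            new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
        speechEngine.RecognizeAsync(RecognizeMode.Multiple);
    }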

In Chapter 8, Speech Recognition, we will discuss the APIs and build some sample applications by leveraging the speech recognition capability of the Kinect for Windows SDK.

Human gesture recognition

A gesture is nothing but an action intended to communicate a feeling or intention to the device. Gesture recognition has been a prime research area for a long time; however, in the last decade a phenomenal amount of time, effort, and resources has been devoted to the field, driven by the development of motion-sensing devices. Gesture recognition allows people to interface with a device and interact naturally through body motion, like the person in the following picture, without any device attached to the human body.

[Image: a person interacting with a device through body motion]

In the Kinect for Windows SDK, there is no direct support for an API to recognize and deal with human gestures; however, by using skeleton tracking and depth data processing, you can build your own gesture API, which can interact with your application.
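
For example, a very naive "hand raised" check built on skeleton data can be as simple as comparing joint positions; this is only an illustrative sketch, not a complete gesture recognizer:

    // Returns true if the right hand is raised above the head - a trivial "gesture".
    private bool IsRightHandRaised(Skeleton skeleton)
    {
        Joint head = skeleton.Joints[JointType.Head];
        Joint rightHand = skeleton.Joints[JointType.HandRight];

        // Only trust joints that are actually tracked in this frame.
        if (head.TrackingState != JointTrackingState.Tracked ||
            rightHand.TrackingState != JointTrackingState.Tracked)
        {
            return false;
        }

        // Joint positions are in meters; Y grows upwards in the sensor's coordinate space.
        return rightHand.Position.Y > head.Position.Y;
    }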

Chapter 9, Building Gesture-controlled Applications, has a detailed discussion about building gesture-controlled applications using the Kinect for Windows SDK.

Tilting the Kinect sensor

The SDK provides direct access for controlling the sensor's motor. By changing the elevation angle of the sensor, you can set the Kinect sensor's viewing angle as per your needs. In the SDK, the elevation angle is limited to a range of -27 to +27 degrees. If you try to set the sensor angle outside this range, your application will throw an invalid operation exception.

Note

Tilting is allowed only in the vertical direction. There is no horizontal tilting with Kinect sensors.
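
A minimal sketch of tilting the sensor follows (assuming a started KinectSensor instance named sensor); the desired angle is clamped to the range reported by the SDK before it is applied:

    // Tilt the sensor up by 15 degrees, staying inside the supported range.
    int desiredAngle = 15;
    desiredAngle = Math.Max(sensor.MinElevationAngle,
                   Math.Min(sensor.MaxElevationAngle, desiredAngle));

    // Setting ElevationAngle drives the motor; avoid changing it too frequently,
    // as the motor is not designed for continuous movement.
    sensor.ElevationAngle = desiredAngle;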

We will cover the details of tilting motors and the required APIs in Chapter 4, Getting the Most Out of Kinect Camera.

Getting data from the accelerometer of the sensor

Kinect treats the elevation angle as relative to gravity and not to its base, as it uses its accelerometer to control the rotation. The Kinect SDK exposes APIs to read the accelerometer data directly from the sensor. You can detect the sensor's orientation by reading the data from the sensor's accelerometer.
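
Reading the accelerometer is a single call that returns a four-component vector in units of gravity (g); a quick sketch, assuming a started KinectSensor instance named sensor:

    // Returns the current acceleration as a Vector4 (X, Y, Z in units of g; W is unused).
    Vector4 reading = sensor.AccelerometerGetCurrentReading();

    // With the sensor at rest the vector's magnitude is roughly 1.0 (gravity alone),
    // and its direction tells you how the sensor is oriented relative to gravity.
    float x = reading.X;
    float y = reading.Y;
    float z = reading.Z;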

Controlling the infrared emitter

Controlling the infrared emitter is a small but very useful feature of the Kinect SDK that lets you forcefully turn the infrared emitter off. This is required when dealing with data from multiple sensors, where you want to capture data from a specific sensor by turning off the IR emitters of the other sensors.
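
A sketch of turning the emitter off (assuming a started KinectSensor instance named sensor); since the property throws on unsupported hardware, the call is wrapped in a try/catch:

    try
    {
        // Stop projecting the IR pattern so that another sensor can capture depth data
        // without interference from this one.
        sensor.ForceInfraredEmitterOff = true;
    }
    catch (InvalidOperationException)
    {
        // Not supported on the Xbox 360 sensor.
    }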

Note

This feature is limited to the Kinect for Windows sensor only. If you are using the Xbox sensor, you will get an InvalidOperationException with the message The feature is not supported by this version of the hardware.