Have you ever wondered how an IP camera captures video, converts it to a digital format, and sends it on its way? Knowing at least the basics of how this technology works helps you interpret datasheet specifications and camera capabilities, so you can choose the appropriate hardware for your customers' application requirements.
In this article, we examine the processes involved in video capture on IP cameras and the various associated technologies used for this purpose.
As you know, an IP camera is a device with a lens that connects to the network via an Ethernet port or a Wi-Fi link, captures video, and sends it over the network to be viewed in real time or recorded on a network video recorder (NVR).
A camera records video by capturing light reflected from objects within its field of view, much like how our eyes give us sight. The reflected light that enters the camera lens is analog in nature. In another article, we examined how the human voice is digitized and explained analog phenomena in the natural world in more detail. In the same sense, the capture of video is analogous to the recording of audio… pun intended 😉.
So, we must somehow convert this analog light into bits to be transmitted over an IP network. Let’s break down this process and unpack each step to more fully understand how this works.
IP cameras, like all digital cameras, use a light sensor to collect light. This sensor uses a technology called complementary metal-oxide semiconductor (CMOS). The CMOS sensor is essentially a rectangular array of microscopic sensor elements. Each sensor element is composed of photodetectors, each sensitive to different wavelengths or colors of light.
Due to the geometry of the sensor elements, each element needs four photodetectors, even though there are only three primary colors to capture. The green photodetector is the one that is duplicated, which makes the CMOS sensor more sensitive to green light in much the same way that our eyes are. The resulting pattern of photodetectors is called a Bayer filter mosaic. A small portion of a typical CMOS pixel array is depicted in the following diagram, along with a closeup of a single element.
Each sensor element corresponds to a single pixel of the captured video stream. The brightness and color of that pixel are derived from the combination of the light levels captured by those four photodetectors.
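To make the idea of combining four photodetectors into one pixel concrete, here is a minimal Python sketch. It assumes a red/green/green/blue layout within each 2x2 sensor element, 8-bit readings, and a simple average of the two green values. The function names are hypothetical, and a real camera uses more sophisticated processing than this.

```python
def combine_rggb(red, green1, green2, blue):
    """Combine four photodetector readings (0-255) into one (R, G, B) pixel.

    Assumption: the two green readings are simply averaged.
    """
    green = (green1 + green2) // 2
    return (red, green, blue)


def mosaic_to_pixels(mosaic):
    """Convert raw readings laid out as 2x2 tiles into rows of RGB pixels.

    Each tile is assumed to be arranged as:
        R G
        G B
    """
    pixels = []
    for row in range(0, len(mosaic), 2):
        pixel_row = []
        for col in range(0, len(mosaic[row]), 2):
            pixel_row.append(
                combine_rggb(
                    mosaic[row][col],          # red
                    mosaic[row][col + 1],      # first green
                    mosaic[row + 1][col],      # second green
                    mosaic[row + 1][col + 1],  # blue
                )
            )
        pixels.append(pixel_row)
    return pixels


# Two sensor elements side by side (made-up readings).
raw = [
    [200, 100, 10, 20],
    [120, 50, 30, 40],
]
print(mosaic_to_pixels(raw))  # [[(200, 110, 50), (10, 25, 40)]]
```

Each 2x2 group of readings collapses into a single color value, which is why the pixel count of the video is a quarter of the photodetector count in this model.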
Each photodetector is composed of various layers, including a silicon substrate, a photodiode, a potential well, a color filter, and a microlens. When photons enter the microlens, they are focused to enter the photodetector. They pass through the color filter, ensuring that only the desired wavelengths are allowed through. The photons hit and are absorbed by the photodiode, resulting in the ejection of an electron from the photodiode material. That electron is captured in the potential well below the diode. The more electrons that are captured, the brighter the corresponding pixel becomes.
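The relationship between incoming photons, captured electrons, and pixel brightness can be sketched with a toy model. The quantum efficiency (the fraction of photons that free an electron) and the well capacity used below are illustrative assumptions, not figures from any particular sensor, and the function names are hypothetical.

```python
def collect_electrons(photons, quantum_efficiency=0.5, full_well_capacity=10_000):
    """Return the number of electrons stored in the potential well.

    quantum_efficiency: assumed fraction of filtered photons that free an electron.
    full_well_capacity: assumed maximum number of electrons the well can hold.
    """
    electrons = int(photons * quantum_efficiency)
    # The well cannot hold more than its capacity, so very bright light saturates.
    return min(electrons, full_well_capacity)


def relative_brightness(electrons, full_well_capacity=10_000):
    """Express the stored charge as a fraction of a full well (0.0 to 1.0)."""
    return electrons / full_well_capacity


dim = collect_electrons(4_000)      # 2000 electrons
bright = collect_electrons(50_000)  # saturates at 10000 electrons
print(relative_brightness(dim), relative_brightness(bright))  # 0.2 1.0
```

The saturation case is a rough illustration of why very bright areas of a scene can appear as featureless white: once the well is full, additional photons no longer change the reading.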
Once this process is complete for every photodetector in the array, one frame of video has been detected and stored. The charge accumulated in each potential well is read out and converted to a digital value. The electrons are then flushed out of the potential wells of all the photodetectors in the array, neutralizing any accumulated charge and making them ready to start capturing the next frame. This cycle repeats at the camera's rated frame rate. Typically, IP camera frame rates range from 5 to 120 frames per second.
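The capture cycle can be sketched as follows. This is a simplified illustration, assuming an 8-bit conversion of the stored charge and the same hypothetical well capacity of 10,000 electrons; actual sensors differ in bit depth and readout details.

```python
def frame_period_ms(frames_per_second):
    """Time available to capture one frame, in milliseconds."""
    return 1000.0 / frames_per_second


def quantize(electrons, full_well_capacity=10_000, bits=8):
    """Map a stored charge to an integer code (assumed 8-bit conversion)."""
    max_code = 2 ** bits - 1
    clipped = min(max(electrons, 0), full_well_capacity)
    return round(clipped / full_well_capacity * max_code)


def capture_frame(wells):
    """Read every well out as a digital code, then flush the wells in place."""
    frame = [quantize(electrons) for electrons in wells]
    for i in range(len(wells)):
        wells[i] = 0  # reset so the next frame starts from zero charge
    return frame


wells = [0, 2_000, 10_000]
print(capture_frame(wells))  # [0, 51, 255]
print(wells)                 # [0, 0, 0]
print(frame_period_ms(5))    # 200.0
```

At 5 frames per second the sensor has 200 ms per cycle, while at 120 frames per second it has a little over 8 ms, which limits how long each photodetector can collect light.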
Once a frame is digitized, the digital signal undergoes additional processing to correct for noise, adjust color balance, and optimize the image quality. This includes compensating for variations in sensor sensitivity, applying gamma correction, and employing various compression algorithms. All of these processes are performed by purpose-built digital signal processors (DSPs), ensuring that the resulting video stream is correctly processed and sufficiently compressed so it can be sent on the network as efficiently as possible.
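As an example of one of these processing steps, gamma correction can be sketched in a few lines. The gamma value of 2.2 and the 8-bit range are common conventions assumed here for illustration; the function name is hypothetical and a camera's DSP performs this in dedicated hardware rather than in Python.

```python
def gamma_correct(value, gamma=2.2, max_value=255):
    """Apply gamma encoding to a linear sensor value in the range 0..max_value.

    gamma=2.2 is an assumed, commonly used value.
    """
    normalized = value / max_value
    corrected = normalized ** (1.0 / gamma)
    return round(corrected * max_value)


linear_values = [0, 16, 64, 128, 255]
print([gamma_correct(v) for v in linear_values])
```

Black and white stay where they are, while dark and mid-range values are lifted, which allocates more of the output range to the tones where our eyes are most sensitive to differences.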
Modern IP cameras are usually rated based on the pixel resolution of the video they provide, with terms such as “High Definition” and “4K” in common use. Here is a table that describes these resolutions and gives an idea of the physical sizes of typical corresponding CMOS image sensors.
Note the following:
As you look at the values of these resolutions, you can quickly see that the number of pixels in most of these CMOS sensors is in the millions and, in some cases, tens of millions. Since each sensor element has four photodetectors, the total number of photodetectors can exceed 100 million. You can begin to appreciate the processing power and the transmission speeds needed within the chip itself to achieve frame rates of 60 or 120 frames per second with larger sensor sizes.
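A quick back-of-the-envelope calculation shows the scale involved. The resolutions below are the standard Full HD (1920 x 1080) and 4K UHD (3840 x 2160) dimensions, and the 24 bits per pixel figure is an assumption for uncompressed color video; the function names are hypothetical.

```python
def photodetector_count(width, height, detectors_per_pixel=4):
    """Total photodetectors, assuming four per sensor element."""
    return width * height * detectors_per_pixel


def raw_bitrate_gbps(width, height, fps, bits_per_pixel=24):
    """Uncompressed video bitrate in gigabits per second (assumed 24 bits per pixel)."""
    return width * height * bits_per_pixel * fps / 1e9


print(photodetector_count(1920, 1080))             # 8294400
print(photodetector_count(3840, 2160))             # 33177600
print(round(raw_bitrate_gbps(3840, 2160, 60), 2))  # 11.94
```

Under these assumptions, a 4K stream at 60 frames per second amounts to nearly 12 Gbps before compression, which is why the compression performed on the camera matters so much for network transmission.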
A completed CMOS photodetector on a modern IP camera looks something like this:
The development of CMOS image sensors for digital and IP cameras is a testament to decades of scientific innovation. From precision manufacturing to advanced light capture, digitization, and processing, even entry-level IP cameras incorporate sophisticated technology that is often taken for granted. Gaining insight into these processes helps us choose the best solution for our customers' needs.