What you need to know about IP cameras before you buy

Written by Daniel Noworatzky | Apr 9, 2025 2:09:00 PM

Have you ever wondered how an IP camera captures video, converts it to a digital format, and sends it on its way? Knowing at least the basics of how this technology works makes datasheet specifications and camera capabilities easier to interpret, so you can choose the appropriate hardware for each application and the best devices for your customers' needs.

In this article, we examine the processes involved in video capture on IP cameras and the various associated technologies used for this purpose.

A quick intro to IP video

As you know, an IP camera is a device with a lens that connects to the network via an Ethernet port or a Wi-Fi link. It captures video and sends it over the network to be viewed in real time or recorded on a network video recorder (NVR).

A camera records video by capturing light reflected from objects within its field of view, much like how our eyes give us sight. The reflected light that enters the camera lens is analog in nature. In another article, we examined how the human voice is digitized and explained analog phenomena in the natural world in more detail. In the same sense, the capture of video is analogous to the recording of audio… pun intended 😉.

So, we must somehow convert this analog light into bits to be transmitted over an IP network. Let’s break down this process and unpack each step to more fully understand how this works.

Light capture

IP cameras, like other digital cameras, use an image sensor to collect light. In modern IP cameras, this sensor is typically built with a technology called complementary metal-oxide semiconductor (CMOS). The CMOS sensor is essentially a rectangular array of microscopic sensor elements. Each sensor element is composed of several photodetectors, each sensitive to different wavelengths, or colors, of light.

Sensor elements

Due to the geometry of the sensor elements, four photodetectors are needed per element, even though there are only three primary colors to capture. The green photodetector is the one that is duplicated, which makes the CMOS sensor more sensitive to green light in much the same way that our eyes are. The resulting pattern of photodetectors is called a Bayer filter mosaic. A small portion of a typical CMOS pixel array is depicted in the following diagram, along with a closeup of a single element.

Each sensor element corresponds to a single pixel of the captured video stream. The pixel's value is a combination of the light levels captured by the element's four photodetectors, which yields a specific brightness and color.
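
To make the combination step concrete, here is a minimal Python sketch that follows the simplified model described above, in which one sensor element with four photodetectors yields one pixel. The function names, the 0 to 255 value range, and the choice to average the two green readings are illustrative assumptions, not a description of any particular camera's firmware. Real image pipelines typically use more elaborate interpolation. The brightness estimate uses the Rec. 601 luma weights, which give green the largest share.

```python
def element_to_rgb(red, green1, green2, blue):
    """Combine one sensor element (red, two greens, blue; each 0-255)
    into a single (R, G, B) pixel by averaging the two green readings."""
    green = (green1 + green2) // 2
    return (red, green, blue)


def luma(rgb):
    """Approximate perceived brightness using the Rec. 601 luma weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


# Hypothetical photodetector readings for one sensor element.
pixel = element_to_rgb(red=200, green1=120, green2=124, blue=40)
print(pixel)               # (200, 122, 40)
print(round(luma(pixel)))  # 136
```

Note how the green reading contributes more to the brightness than red or blue, which mirrors the reason the green photodetector is the one duplicated.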

Photodetector structure

Each photodetector is composed of various layers, including a silicon substrate, a photodiode, a potential well, a color filter, and a microlens. When photons enter the microlens, they are focused to enter the photodetector. They pass through the color filter, ensuring that only the desired wavelengths are allowed through. The photons hit and are absorbed by the photodiode, resulting in the ejection of an electron from the photodiode material. That electron is captured in the potential well below the diode. The more electrons that are captured, the brighter the corresponding pixel becomes.

Transistors in each photodetector's readout circuit measure the charge that has accumulated in the potential well and transmit that information through the array in the form of a voltage. This voltage is then converted to a numerical value by an analog-to-digital converter (ADC), which assigns a discrete digital value corresponding to the amount of charge accumulated in the potential well. This process ensures that each pixel's brightness level is accurately represented in digital form. That value is then stored along with the values of all the other pixels in the array.
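
The ADC step can be sketched in a few lines of Python. This is only an illustration of quantization in general: the function name, the 1.0 V reference voltage, and the 10-bit depth are assumed values chosen for the example, and actual sensors vary in both.

```python
def adc_convert(voltage, v_ref=1.0, bits=10):
    """Map an analog voltage in [0, v_ref] to an integer code in [0, 2**bits - 1]."""
    levels = 2 ** bits
    # Voltages outside the input range saturate at the nearest limit.
    clamped = min(max(voltage, 0.0), v_ref)
    code = int(clamped / v_ref * levels)
    # A full-scale input maps to the highest available code.
    return min(code, levels - 1)


print(adc_convert(0.0))   # 0    (empty potential well, darkest value)
print(adc_convert(0.5))   # 512  (half of full scale)
print(adc_convert(1.0))   # 1023 (full well, brightest value)
```

More bits mean more discrete levels between dark and bright, at the cost of more data per pixel.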

Once this process is complete, one frame of video has been captured and stored. The electrons are then flushed out of the potential wells of all the photodetectors in the array, neutralizing any accumulated charge and making them ready to start capturing the next frame. This process is repeated at the frame rate for which the camera is rated. Typically, IP camera frame rates range from 5 to 120 frames per second.

Digital signal processing

Once a frame is digitized, the digital signal undergoes additional processing to correct for noise, adjust color balance, and optimize the image quality. This includes compensating for variations in sensor sensitivity, applying gamma correction, and employing various compression algorithms. All of these processes are performed by purpose-built digital signal processors (DSPs), ensuring that the resulting video stream is correctly processed and sufficiently compressed so it can be sent on the network as efficiently as possible.
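
As one example of these processing steps, gamma correction can be sketched as follows. This is a simplified illustration that assumes a pure power-law curve with a gamma of 2.2 and 8-bit values; the function name is hypothetical, and real DSP pipelines may use different curves, such as the piecewise sRGB transfer function.

```python
def gamma_encode(value, gamma=2.2, max_value=255):
    """Apply power-law gamma encoding to a linear sensor value (0..max_value)."""
    normalized = value / max_value
    encoded = normalized ** (1.0 / gamma)
    return round(encoded * max_value)


# Black and white are unchanged, while dark and mid tones are lifted.
for linear in (0, 64, 128, 255):
    print(linear, "->", gamma_encode(linear))
```

The curve allocates more of the available output codes to darker tones, where human vision is more sensitive to small differences.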

Sizes and resolutions

Modern IP cameras are usually rated by the pixel resolution of the video they provide, using terms such as “High Definition” and “4K”. The following table lists these resolutions and gives an idea of the physical sizes of the CMOS image sensors typically used for each.

Resolution	Common Sensor Sizes	Aspect Ratio
HD (720p) (1280×720)	1/4", 1/3", 1/2.7"	16:9
Full HD (1080p) (1920×1080)	1/3", 1/2.7", 1/2.5", 1/2"	16:9
4MP (2688×1520)	1/2.7", 1/2.5", 1/2"	16:9
5MP (2592×1944)	1/2.7", 1/2.5", 1/2"	4:3
4K (8MP) (3840×2160)	1/2.5", 1/2", 1/1.8"	16:9
12MP (4000×3000)	1/1.7", 1/1.8", 1"	4:3
16MP (4608×3456)	1/1.7", 1"	4:3
20MP+ (5472×3648 and above)	1", APS-C, Full-Frame	4:3, 3:2
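
The megapixel and aspect ratio figures in the table can be derived from the pixel dimensions. The short Python sketch below shows the arithmetic; the function names are illustrative.

```python
from math import gcd


def megapixels(width, height):
    """Total pixel count in millions."""
    return width * height / 1_000_000


def aspect_ratio(width, height):
    """Reduce width:height to its lowest terms."""
    divisor = gcd(width, height)
    return (width // divisor, height // divisor)


print(megapixels(3840, 2160))    # 8.2944
print(aspect_ratio(1920, 1080))  # (16, 9)
print(aspect_ratio(2592, 1944))  # (4, 3)
print(aspect_ratio(2688, 1520))  # (168, 95)
```

Marketing labels are rounded: 3840×2160 is about 8.3 MP, and 2688×1520 reduces to 168:95, which is close to, but not exactly, 16:9.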

Note the following:

Smaller sensors (1/4", 1/3") are common in budget IP cameras and webcams.
Medium-sized sensors (1/2.7", 1/2.5", 1/2") balance cost and performance and are widely used in security IP cameras.
Larger sensors (1/1.8", 1/1.7", 1") offer better low-light performance and dynamic range and are more often found in photography and videography devices.
Professional and high-end cameras with APS-C and full-frame sensors are used in specialized applications like cinematic surveillance or AI-based analytics. (“Full frame” in this context means the sensor is the same size as a 35mm film frame: 24×36 mm.)

As you look at these resolutions, you can quickly see that the number of pixels in most of these CMOS sensors is in the millions and, in some cases, the tens of millions. Since each sensor element has four photodetectors, the total number of photodetectors can exceed 100 million on the largest sensors. You can begin to appreciate the processing power and the transmission speeds needed within the chip itself to achieve frame rates of 60 or 120 frames per second with larger sensor sizes.
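
A rough back-of-the-envelope calculation shows the scale involved. The Python sketch below assumes four photodetectors per sensor element, as described above, and an uncompressed output of 24 bits per pixel (8 bits per color channel). The 24-bit figure and the function names are assumptions for illustration; actual sensors and pipelines differ.

```python
def photodetector_count(width, height, per_element=4):
    """Total photodetectors, assuming a fixed number per sensor element."""
    return width * height * per_element


def raw_bitrate_gbps(width, height, fps, bits_per_pixel=24):
    """Uncompressed video data rate in gigabits per second."""
    return width * height * bits_per_pixel * fps / 1e9


# A 4K (3840x2160) sensor running at 60 frames per second.
print(photodetector_count(3840, 2160))             # 33177600
print(round(raw_bitrate_gbps(3840, 2160, 60), 2))  # 11.94
```

Nearly 12 Gbps of raw data for a single 4K stream at 60 frames per second is far more than a typical network link can carry, which is why the compression performed by the DSP matters so much.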

Manufacturing process

A completed CMOS photodetector on a modern IP camera looks something like this:

The production of such devices is achieved using semiconductor manufacturing techniques that are based on solid-state physics. The process begins with fabricating a silicon wafer. Then millions of photodiodes and transistors are created using various processes, such as photolithography, ion implantation, and thin-film deposition, to form pixel arrays and readout circuits. Next, the microlenses and color filters are deposited on top of each photodetector, while metal interconnects are layered to route signals from each sensor element to processing units. Finally, the wafer is cut into individual sensors, packaged, and tested for sensitivity, noise performance, and defects before being integrated into cameras and imaging devices.

Conclusion

The development of CMOS image sensors for digital and IP cameras is a testament to decades of scientific innovation. From precision manufacturing to advanced light capture, digitization, and processing, even entry-level IP cameras incorporate sophisticated technology that is often taken for granted. Gaining insight into these processes helps us choose the best solution for our customers' needs.