Friday, 28 February 2014

Abstract DIP Model - Part III (Rise of Display Devices)

An abstract model for DIP came across my mind. It contains four sections viz; Acquire, Transfer, Display and Interpret. This post belongs to the Display section of the abstract model. Refer to earlier posts for the Acquire and Transfer sections. This article deals with the rise of display devices and the social factors behind that phenomenal rise. It does not elaborate on the principles of display technologies.

Display systems are one of the wonders of the 21st century. The origins of this wonder are blood soaked. The brutal World War II killed and maimed millions of people and destroyed many cities. Scientists of the warring nations worked day and night to improve their countries' war machines with new inventions. Two such inventions, computing machines and radar display units, helped display systems blossom into a multi-billion dollar industry. After the war, radar display units were modified to show pictures, which led to the birth of television. The bulky wartime computers also paved the way for personal computers that possess a Video Display Unit (VDU) as their output section.

In earlier days the Cathode Ray Tube (CRT) was “the display technology.” The shadow-mask colour CRT came to the market way back in 1950, yet most information displays and imaging applications used monochromatic displays until 1970. Microprocessors arrived in the market in the mid-1970s and facilitated the introduction of CRT colour displays for computers; their processing power enabled these devices to encode and manipulate colour. Over the next twenty-five years, colour display technology and its applications grew exponentially [1].

An ideal display system is expected to possess the following features: high contrast, maximum brightness, high resolution and low cost. Moreover, consumers expect larger displays, a larger colour gamut and better colour saturation [1]. Customer expectations for colour display have risen at a rapid pace, driving the development of display technologies and the allied colour control and image processing algorithms [1]. To date, no single display technology possesses all of these venerated features. Thus diverse colour display technologies have evolved to support a wide variety of applications, and it is imperative to build a taxonomy (i.e. classification) of display systems and technologies to comprehend them.

VDU classification

Visual display units can be classified by viewing duration, number of viewers, underlying technology and targeted application. Movie screens are viewed for a minimum of 90 minutes; at the other extreme, digital signage is viewed for a maximum of about 90 seconds. Based on the number of simultaneous viewers, VDUs can be broadly divided into public-viewer, multi-viewer and mono-viewer displays. Based on underlying technology, display systems fall into the following categories: CRT, Digital Light Processing (DLP), Plasma Display Panel (PDP), Liquid Crystal Display (LCD), Light Emitting Diode (LED) and e-ink (electronic ink). Targeted applications like entertainment, consumer, automotive and informative displays can also be the basis of classification, though I strongly feel such a classification is very fuzzy.

As categorization based on the number of viewers is not standardized, let us define it in the following way. Public-viewer displays are capable of exhibiting content to more than 100 people at a time; movie theatre projection systems fall perfectly into this category. Just as public address systems reach a large gathering via loudspeakers, public-viewer displays reach a large audience via projection systems. A multi-viewer display is suited to a few people at a time; television is the best example. As the name suggests, mono-viewer displays are suited to a single viewer. Instrument panels, automotive panels, Personal Digital Assistants (PDAs), e-readers, tablets, mp3 players, cellphones and digital camera displays fall into this category.

Display Characteristics 

Comparison between display technologies can be performed based on the following parameters: brightness, contrast, field of view, colour gamut, resolution, physical dimensions and cost [2]. Brightness is the light perceived by the human eye, while luminance refers to the amount of light emitted by a source such as a VDU; luminance is measured in candela per square metre (cd/m²). Dynamic range refers to the luminance difference between white and black pixels [2]. Contrast also refers to this difference, but with respect to images; in everyday use, contrast and dynamic range are used interchangeably. I believe contrast is connected to the perception of the human eye, while dynamic range is connected to luminance (i.e. a measurable quantity). Field of view, measured in degrees, determines how many people can view the display simultaneously. The colour gamut is an enclosed triangle in the chromaticity diagram; the area of the triangle represents the VDU's ability to display a range of colours. For details refer to the earlier post of this blog [3]. Resolution refers to the number of pixels and provides a rough estimate of the height and width of the image.
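As a rough illustration, the relationship between white and black luminance can be expressed as a contrast ratio. The figures below are typical assumed values, not measurements of any particular display.

```python
# Illustrative: dynamic range expressed as the ratio between the
# brightest (white) and darkest (black) luminance a display can show.
def contrast_ratio(white_cd_m2, black_cd_m2):
    """Ratio of white to black luminance (both in cd/m^2)."""
    return white_cd_m2 / black_cd_m2

# An assumed LCD monitor: 300 cd/m^2 white, 0.3 cd/m^2 black.
print(round(contrast_ratio(300.0, 0.3)))  # 1000, i.e. a 1000:1 contrast ratio
```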

Public-viewer Display

Rapid proliferation of digital technology has affected the conventional projection of movies through film. Today digital projection systems, combining a projector and a cinema screen, are widely used. Digital projectors can be broadly classified into three categories based on technology: DLP, Liquid Crystal on Silicon (LCoS) and Grating Light Valve (GLV). A DLP chip is made up of tiny mirrors which can tilt from −10 to +10 degrees [4]; small rotations of each mirror produce grey-scale values. LCoS is similar to LCD and acts as a 'window blind,' controlling the passage of light. The silver screen size is typically in the range of 14 to 20 metres (diagonal), and the lamps used to project pictures onto the screen consume around 1 kW of power. Resolutions range from 2K through 4K up to 8K; 2K is equivalent to High Definition Television (HDTV) resolution. The luminance and dynamic range are much higher than in television systems, and movies are viewed in a dark environment. These reasons make 2K resolution sufficient for the average movie patron. A patron sitting in the front rows can spot the difference between 2K and 4K systems; a patron in the last row cannot distinguish the difference unless he or she is a digital cinema expert [citation required].

Figure 1. Display device classification. (a) Public-viewer display - Movie screen (b) Multi-viewer display (c) Mono-viewer display - smart phone       Image Courtesy: Wikipedia

Multi-viewer Display

Television (TV) is the apt example of a multi-viewer display. At present, the preferred technology for TV set production is the CRT, but within a decade other technologies like LCD or PDP may take over.
  •  Merits of CRT technology: (i) It is a century-old, very mature technology. (ii) The display is bright, so it is not much affected by stray external light. (iii) It has a long life and high reliability. (iv) It is inexpensive; the cost per pixel is the lowest among display technologies. (v) In the 1980s, 640 × 480 was high resolution; today HDTV-resolution (1920 × 1080) CRTs are available. (vi) The viewing angle is wide, enabling multiple people to view the TV comfortably.
  •  Demerits of CRT technology: (i) It is very bulky, i.e. voluminous, and heavy compared to other display technologies. (ii) Screen size is limited; seeing 36″ and above is a rarity, or even a technical wonder. (iii) It consumes high power, around 100 W at a minimum. (iv) It emits electromagnetic radiation.
The introduction of HDTV standards (in the USA and Japan) created a need for larger displays. Next, the proliferation of digital satellite broadcasting helped deliver crisp video signals, and satellite TV channels earmarked for sports and movies created a need for large screens. Conventional large CRT screens spoilt the aesthetics of the rooms in which they were housed, while the slim form factor of the new technologies resembled wall-hanging paintings. Thus techno-savvy people shifted their loyalty to LCD and PDP technologies; they were ready to shell out extra money for aesthetics.

The other merits of the new technologies are as follows: (i) rich colour, (ii) lower power consumption (particularly LCD), (iii) suitability for high resolution, (iv) a wide viewing angle for PDP and a relatively narrow one for LCD, (v) no electromagnetic radiation, (vi) suitability for mobile environments (in cars, vans, lorries i.e. trucks).

Mono-viewer Display

In earlier days, computer screens were the dominant mono-viewer display; automotive and industrial instruments used mechanical dials only. Rapid automation and digitization of industry paved the way for electronic displays. A rapid increase in oil (i.e. gasoline) prices made automobile manufacturers opt for fuel efficiency: mechanical controls were replaced with electronic controls, and electronic displays were introduced to assist drivers with timely information (for example, GPS-enabled cars).

Prices of computers were falling, and after the advent of the Web, non-programmers started using computers extensively. A huge pool of volunteers put up the required content for the Web, all in digital form. The Internet connects geographically separated computers and enables the seamless flow of digital data between them. Thus digital delivery favoured the display of information rather than the printed form [5].

In parallel, cell phones, which were built to transfer speech, were improvised to transfer digital data and to act as low-end digital cameras. In earlier days, most mobile displays were expected to show alphanumeric characters and limited graphical icons, which monochromatic screens satisfied. The requirement for colour screens emerged from the need for a viewfinder in digital cameras and on-board monitors to display captured pictures, and was emboldened by picture phones and embedded digital cameras in cell phones [1]. As the mobile phone user base far exceeds the personal computer user base, small-screen displays are manufactured in large quantities. Internet-enabled smart phones were introduced, and screen size increased to 5″ (pocket size for an adult). These screens have very good resolution to facilitate reading.
Seven-inch electronic readers house electronic books; an e-reader can hold as many e-books as its memory allows. Font size can be customized and search is possible [6]. It is a very handy way for globe trotters to take their favourite bookshelf along, and thus e-readers have made a niche market for themselves. E-readers extensively rely on e-ink, a bistable technology: power is not required to hold content on the screen and is used only when a page is turned (to the previous or next one). Consequently power consumption is very low, and present-day battery technologies augment e-reader dominance. Full-colour e-ink is emerging. Present-day e-readers include the Sony Reader, the Amazon Kindle and the Barnes & Noble reader.

Mobile devices need display units that are visible under diverse illumination environments and offer a small form factor, low power consumption and long battery life. Under these conditions LCD is the dominant technology, and Organic Light Emitting Diode (OLED) is a promising alternative [1].

Displaying a high quality image is a great engineering challenge. The colour gamut of mobile colour displays is compromised compared to TV and computer screens, due to the limitations of mobile computing power. With limited processing resources, mobile devices are still expected to handle out-of-gamut colours, contrast stretching and saturation enhancement, all of which must be carried out by image processing algorithms [1].

Source
  • [1] Louis D. Silverstein, “Color Display Technology: From Pixels to Perception,” The Reporter, vol. 21, no. 1, pp. 1–12, Feb. 2006.
  • [2] Paul Anderson, “Advanced Display Technologies,” JISC Technology & Standards Watch.
  • [3] A to Z of Digital Image Processing: Abstract DIP Model - Part III (Science of colour) [Online] http://diwakar-marur.blogspot.in/2014/01/abstract-dip-model-part-iii-science-of.html
  • [4] A to Z of Digital Image Processing: Digital Cinema Projection Technologies [Online] http://diwakar-marur.blogspot.in/2013/05/digital-cinema-projection-technologies.html
  • [5] ADT Michael Kleper, Advanced Display Technologies, A Research Monograph of the Printing Industry Center at RIT, Rochester, New York, USA, October 2003.
  • [6] Eva Siegenthaler, Laura Schmid, Michael Wyss and Pascal Wurtz, “LCD vs. E-ink: An Analysis of the Reading Behavior,” Journal of Eye Movement Research, 5(3):5, pp. 1–7, 2012.

Note
Most articles related to science and technology glamourize technological feats. This 2000-word article is written in a technology-deemphasised, human-centred approach: technology is seen as a tool to achieve human goals, and technology per se is not the goal. I want to acknowledge Mr. Varun Vinod for proofreading this article.

Friday, 31 January 2014

Abstract DIP Model - Part III (Science of colour)

An abstract model for DIP came across my mind. It contains four sections viz; Acquire, Transfer, Display and Interpret. This post is the third part of the series; the first and second parts discussed the Acquire and Transfer blocks. Please refer to the November 2013 post in my blog for a detailed introduction.

A colour picture is mapped into a matrix of numbers, compressed and stored as an image file. At display time, files are decompressed and the numbers are remapped into pixels on the computer screen; in the case of printing, the numbers are remapped into dots. Thus the Display block is concerned with remapping numbers into pixels. To understand the functioning of this block one should know the following: the science of colour, screen technologies, printing technologies and mapping algorithms. In this post only the science of colour is discussed.

Science of Colour
Light enters the eye via the cornea, is focused by the lens and reaches the retina, which lies on the inside of the eye wall. The retina contains two types of photoreceptive cells, cones and rods. Six to seven million cones lie near the central portion of the retina called the fovea. Their maximum absorption occurs at about 430, 530 and 560 nm, so they can be referred to as blue, green and red cones respectively. Approximately 65% of cones are sensitive to red, 33% to green and the remaining 2% to blue, but the blue cones are extremely sensitive. Rods are around 70 million in number and are spread all over the retina. Rods are very sensitive to light, so low levels of illumination are sufficient for them to function, whereas cones require bright light. That is why a flower that is brightly coloured in daylight appears colourless in moonlight.

The visible range extends from 400 nm to 700 nm. The spectrum can be divided into 10 nm wide bands, each carrying its own spectral power; a collection of 31 bands may thus describe a particular spectral power distribution. Alternatively, one may simply use red, green and blue primaries to represent a colour. Isaac Newton said, “Indeed rays, properly expressed, are not colored.” In the real world only Spectral Power Distributions (SPDs) exist; colour is perceived by our eyes and brains [1]. In image processing we use simple red, green and blue to describe colour rather than a 31-band SPD.

Ideally, any image acquisition device (still or video camera) as well as any image display device should have the same spectral response as the human eye. With prevailing technologies, device spectral responses can only be made close to that of the human visual system. Radiance is the amount of light emitted from a source; it is an objectively measurable quantity. Luminance is the light intensity perceived by the eye. Infrared (IR) radiance can be measured, but IR luminance is zero because the eye cannot perceive IR.

CIE MODEL
The science of colorimetry tries to find the relationship between SPD and perceived colour. Way back in 1931 the International Commission on Illumination (in French, Commission Internationale de l'Éclairage, CIE) developed a tristimulus model with X, Y and Z primaries, where Y corresponds to the luminance of the light. These XYZ tristimulus primaries were developed through colour matching experiments. X, Y and Z form a colour volume; if the luminance component is ignored, it collapses to a two-dimensional picture. A representation of colour without the luminance component is created in the following way.

x=X/(X+Y+Z) and y= Y/(X+Y+Z)

A plot of y against x takes the shape of a shark fin, as in Fig. 1. This is called the chromaticity diagram: the desired two-dimensional plot without luminance.

Figure 1. The chromaticity diagram (shark-fin shaped) contains the colour gamut (triangle) of a device.
The Camellia red is shown as a white spot. Image courtesy: Wikipedia
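The mapping from XYZ to (x, y) is simple enough to sketch in a few lines of Python; the D65 white-point values used for the check are standard published figures.

```python
# Sketch: project CIE XYZ tristimulus values onto the (x, y)
# chromaticity plane by normalising out the overall magnitude.
def xyz_to_xy(X, Y, Z):
    total = X + Y + Z
    return X / total, Y / total

# The D65 white point (X=95.047, Y=100.0, Z=108.883)
# maps to the familiar x ~ 0.3127, y ~ 0.3290.
x, y = xyz_to_xy(95.047, 100.0, 108.883)
print(round(x, 4), round(y, 4))  # 0.3127 0.329
```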

Gamut: A triangle inside the chromaticity diagram gives a fair idea of the colour reproduction capability of a display or capture device. The gamut is a device-dependent entity and its size varies among devices; a larger gamut is always preferable. At present no device is capable of capturing or displaying all the colours shown in the chromaticity diagram. For example, the Camellia flower shown in Fig. 2 lies outside the gamut of the given device [2]; it is shown in Fig. 1 as a white spot. As a result, the display device fails to faithfully reproduce the original colour of the camellia flower and will use the nearest equivalent colour to represent it.

Figure 2. The Camellia flower
 Image courtesy: www.techmind.org
RGB MODEL
Later, red, green and blue were used as primaries in what is called the RGB model. This colour space is used for display devices: any colour is created by adding red, green and blue light in proportion, which is called additive mixing. The normalized primary values range from 0 to 1, so the model forms a perfect colour cube. The model is device dependent; when the same RGB value is passed to different display devices, there is no guarantee they will produce the same colour, and slight variation is expected. The RGB colour space cannot create all colours, but the colours it produces are sufficient for practical purposes.

A variety of capture and display devices follow a power-law equation, i.e. light intensity is proportional to some power of the signal amplitude or pixel value; the exponent varies from 1.8 to 2.5. Raising the signal to the inverse of this exponent beforehand is called gamma correction. Without gamma correction, an image on a Cathode Ray Tube (CRT) screen appears darker than the original. The introduction of gamma makes the RGB colour space non-linear. To differentiate linear from non-linear RGB, a prime is added to each colour component; thus R′ represents gamma-corrected red. Many books miss this important difference and use R and R′ interchangeably.
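A minimal sketch of gamma encoding and decoding with a pure power law follows; note that real standards such as sRGB add a short linear segment near black, which is omitted here.

```python
# Sketch of gamma correction as a pure power law.
GAMMA = 2.2

def encode(linear):           # linear light -> gamma-encoded value (R')
    return linear ** (1.0 / GAMMA)

def decode(encoded):          # gamma-encoded value -> linear light (R)
    return encoded ** GAMMA

# The round trip recovers the original linear value.
v = 0.5
assert abs(decode(encode(v)) - v) < 1e-9

# Mid-grey linear light encodes to a much brighter code value,
# compensating for the display's darkening power law.
print(round(encode(0.5), 2))  # 0.73
```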

Hewlett-Packard (HP) and Microsoft developed a new colour space, sRGB, specifically suited to operating systems and the Internet. In sRGB the gamma value is 2.2. It is not a perfect colour space like CIE XYZ, but it is representative of the majority of the devices on which the average computer user views colour, and it helps create a substantial degree of consistency among the various devices [3].

Figure 3.  (a) Additive colours (Light) (b) Subtractive colours (Pigments)
CMYK MODEL
The RGB colour model, based on additive mixing of colours, is not suited for printing purposes. A printing press produces an image by reflected light, which falls into the category of colour generation by a subtractive process (refer to Fig. 3). The secondary colours used in printing are cyan, magenta and yellow: cyan pigment absorbs red light and reflects all other colours; likewise magenta absorbs green and yellow absorbs blue. Black may be produced by combining cyan, magenta and yellow, but the result is a muddy-looking black; moreover, the carbon black used in printing is cheaper than coloured pigments. These two reasons make the four-colour model the standard: cyan, magenta, yellow and black (CMYK) [4].
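The subtractive relationship can be sketched with the common naive RGB-to-CMYK conversion below; real printer colour management is far more involved than this.

```python
# Naive RGB -> CMYK conversion (all values in [0, 1]):
# subtract each RGB value from white, then pull out the shared
# grey component as K, the separate black ink.
def rgb_to_cmyk(r, g, b):
    c, m, y = 1.0 - r, 1.0 - g, 1.0 - b
    k = min(c, m, y)              # the "muddy black" handled by pure K ink
    if k == 1.0:                  # pure black: use only the K channel
        return 0.0, 0.0, 0.0, 1.0
    return ((c - k) / (1 - k), (m - k) / (1 - k), (y - k) / (1 - k), k)

# Pure red reflects no green or blue, so it needs magenta and yellow ink.
print(rgb_to_cmyk(1.0, 0.0, 0.0))  # (0.0, 1.0, 1.0, 0.0)
```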

OTHER MODELS
RGB and CMYK are the primary colour models used for display and printing. A few other models, such as YUV, YIQ, CIELAB, CIELUV, HSI, HLS and HSV, are used for specific applications.

YUV MODELS
Television broadcasting carries colour pictures from one place to another and projects them on television screens. Colour reproduction on screens uses additive mixing, so the obvious choice of colour space would be RGB. But television and video systems seldom use RGB, because it is bandwidth inefficient. Moreover, when colour television broadcasting was introduced back in the 1950s, there were lots of B&W TV sets, so the colour broadcast was made backward compatible: old B&W TVs could receive colour TV signals but displayed only the monochrome component. Our eyes are more sensitive to luminance variations than to chrominance variations, so RGB colours were transformed into the YUV colour space, where Y stands for luminance and U and V carry the colour information. Video engineers piggy-backed the colour information onto the TV composite signal to conserve bandwidth; thus colour TV signals consumed the same bandwidth as monochrome signals (6 to 7 MHz). This was an engineering marvel. The YUV space is used by PAL (Phase Alternating Line; Europe and Asia), NTSC (National Television System Committee; USA) and SECAM (the French system) colour TVs [5],[6].

The conversion equations between R′G′B′ and YUV are given below:

Y = 0.299R′ + 0.587G′ + 0.114B′

U = −0.147R′ − 0.289G′ + 0.436B′

V = 0.615R′ − 0.515G′ − 0.100B′
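Written out in code, the conversion is a direct weighted sum, assuming gamma-corrected R′, G′, B′ values in the range 0 to 1.

```python
# The Y'UV conversion equations from the text, applied directly.
def rgb_to_yuv(r, g, b):
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.147 * r - 0.289 * g + 0.436 * b
    v = 0.615 * r - 0.515 * g - 0.100 * b
    return y, u, v

# White light: full luminance, and both colour-difference
# signals vanish (Y ~ 1, U ~ 0, V ~ 0) -- exactly what a
# backward-compatible B&W receiver needs.
y, u, v = rgb_to_yuv(1.0, 1.0, 1.0)
assert abs(y - 1.0) < 1e-9 and abs(u) < 1e-9 and abs(v) < 1e-9
```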

The YIQ colour space is a variant of YUV used in NTSC systems; here I stands for in-phase and Q for quadrature. The digital video standard ITU-R BT.601 (International Telecommunication Union Recommendation) uses YCbCr, again a scaled version of YUV, where Cb and Cr represent the chrominance signals. High Definition Television (HDTV) uses the ITU-R BT.709 standard, whose primaries closely correspond to contemporary monitors.

Source
  1.  A Guided Tour of Color Space [Online] http://www.poynton.com/papers/Guided_tour/abstract.html
  2.  Introduction to colour science [Online] http://www.techmind.org/colour/
  3. sRGB: A Standard for Color Management - NEC SpectraView - spectraview.nec.com.au/wpdata/files/40.pdf (PDF, 1.1 MB)
  4. Rafael C. Gonzalez and Richard E. Woods, Digital Image Processing, 2nd edition, Pearson Education.
  5. Keith Jack, Video Demystified: A Handbook for the Digital Engineer 4th edition, Newnes Publishers
  6. Noor A. Ibraheem, Mokhtar M. Hasan, Rafiqul Z. Khan and Pramod K. Mishra, "Understanding Color Models: A Review", ARPN Journal of Science and Technology, vol. 2, no. 3, April 2012, pp. 265–275.


Tuesday, 31 December 2013

Abstract DIP model - Part II


An abstract model for DIP came across my mind. It contains four sections viz; Acquire, Transfer, Display and Interpret. This post is the second part of the series. Please refer to the earlier post (November 2013) for a detailed introduction and to learn about the Acquire block.

Transfer Block:
It sits between the Acquire and Display blocks. As in Figure 1, the input and output of the Transfer block are digital data. It acts as a data compressor on the transmitting side and as a data expander on the receiving side (Display block), because sending digital data without compression is a bad idea. Figure 1 describes a scenario: a person shoots an image with a digital camera and transfers the image to a hard disk in JPEG file format; later it is opened and displayed on the computer screen. Here the camera functions as the Acquire block as well as the transmitter side of the Transfer block, while the computer functions as the receiver side of the Transfer block as well as the Display block.
Figure 1 Digital Camera and Computer Interface
The transfer of digital data between Transfer sub-blocks may occur through a communication channel, wired (cables) or wireless. By nature all communication channels are noisy, meaning that data sent over the channel will be corrupted (a 1 becomes 0 or a 0 becomes 1). A channel is considered good if only one bit in a million goes corrupt (1 out of 1,000,000). Various measures are taken to minimize or eliminate the impact of noise; adding extra bits to detect as well as correct corrupt bits is one such measure. CDs and DVDs extensively use Reed-Solomon codes. For further information, search Google for the keyword “Channel Coding.” One may wonder how a CD becomes a communication channel. Normally data transfers from point 'A' to point 'B' via a channel in finite time (of the order of milliseconds); if the transfer takes near-infinite time, the data can be considered stored. From this viewpoint, transmission of data and storage (on CD or DVD) are functionally the same.
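As a small illustration of "adding extra bits to detect as well as correct," here is a sketch of the classic Hamming(7,4) code. It is far simpler than the Reed-Solomon codes used in CDs and DVDs, but it shows the same idea: redundancy lets the receiver locate and repair a flipped bit.

```python
# Hamming(7,4): three parity bits protect four data bits, so any
# single flipped bit can be located and corrected at the receiver.
def encode(d):                       # d = [d1, d2, d3, d4]
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p3 = d[1] ^ d[2] ^ d[3]
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]   # bit positions 1..7

def correct(c):                      # repairs the codeword in place
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # checks positions 1,3,5,7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # checks positions 2,3,6,7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]   # checks positions 4,5,6,7
    pos = s1 + 2 * s2 + 4 * s3       # syndrome = 1-based error position
    if pos:
        c[pos - 1] ^= 1
    return c

word = encode([1, 0, 1, 1])
word[4] ^= 1                          # the noisy channel flips one bit
fixed = correct(word)
print(fixed == encode([1, 0, 1, 1]))  # True: the error was repaired
```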

Why compression?
Photons are converted into charge by the image sensor. The charges are read out to form an electrical signal; this analog signal is sampled and quantized to produce a digital signal. Each sensor element's photon accumulation is thus represented as a digital pixel value, and because of spatial sampling the resultant data is voluminous.

An image is made up of rows and columns of pixels, which can be represented in matrix form; programmers consider an image an array. Each pixel may require anything from one bit to multiple bytes. A pixel from a two-coloured image (say black and white) requires only one bit; a grey-scale image uses 8 bits to represent shades of grey, and black-and-white TV images fall into this category. A colour image is composed of red, green and blue components, each requiring 8 bits, so 24 bits (3 bytes) are needed per pixel. An HDTV-size frame (image) possesses 1920 × 1080 pixels and requires 6075 KB (1920 × 1080 × 3 bytes) of storage. One minute of video requires about 8900 MB (6075 KB × 25 × 60). Thus half a minute of video will gobble one DVD, and a single movie would require around 170 DVDs. One may wonder how an entire Hollywood movie (nearly 90 to 100 minutes) fits inside one DVD. The answer is compression. The main objective of this example is to make us realize the mammoth data size of images.
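The arithmetic above can be checked in a few lines, assuming the same 25 frames per second and 3 bytes per pixel as the text.

```python
# The raw-size arithmetic from the text, step by step.
width, height, bytes_per_pixel = 1920, 1080, 3   # HDTV frame, 24-bit colour
fps, seconds = 25, 60

frame_kb = width * height * bytes_per_pixel / 1024   # one uncompressed frame
minute_mb = frame_kb * fps * seconds / 1024          # one minute of raw video

print(frame_kb)          # 6075.0 KB per frame
print(round(minute_mb))  # 8899 MB per minute -- about two DVDs
```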

Solution: Remove Redundancy 
A high amount of correlation exists between pixels in continuous-tone images (the typical image from a digital camera). Thus one can guess a pixel value by knowing the values of its neighbouring pixels; put another way, the difference between a pixel and its neighbours is very small. Engineers exploit this feature to compress an entire Hollywood movie onto a DVD.
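A minimal way to exploit this inter-pixel correlation is delta (predictive) coding, where each pixel is stored as the difference from its left neighbour; on smooth images the differences are mostly tiny, which a variable-length code can then compress well.

```python
# Sketch of delta coding along one scanline: store differences
# between neighbouring pixels instead of the pixels themselves.
def delta_encode(row):
    prev, out = 0, []
    for p in row:
        out.append(p - prev)
        prev = p
    return out

def delta_decode(deltas):
    prev, out = 0, []
    for d in deltas:
        prev += d
        out.append(prev)
    return out

row = [100, 101, 101, 103, 104, 104]   # a smooth scanline
deltas = delta_encode(row)
print(deltas)                          # [100, 1, 0, 2, 1, 0]
assert delta_decode(deltas) == row     # perfectly reversible
```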

Redundancy can be classified into interpixel, temporal and psychovisual redundancy. Temporal (time) redundancy exists only in video, not in still images. Our eyes are more sensitive to grey-scale variation than to colour variation; this is an example of psychovisual redundancy. By reducing redundancy, high compression can be achieved. Transform coding converts spatial-domain signals (the image) into spatial-frequency-domain signals. In the spatial frequency domain, the first few coefficients carry large amplitudes (values) and the rest carry very small amplitudes. In Figure 2, bar height represents value: white bars represent pixels and pink bars represent DCT coefficients. The real compression occurs through quantization of the coefficient amplitudes: the low-frequency components, i.e. the first few coefficients, are mildly quantized, while the high-frequency coefficients are severely quantized, driving them to near-zero values. High-frequency signals are thus strongly attenuated (suppressed) but not eliminated. The feeling of image crispness arises from high spatial-frequency components; once they are removed, an image becomes blurred. In JPEG, colour images are mapped into one luminance and two chrominance layers (YCbCr), and the Cb and Cr layers are heavily quantized (psychovisual redundancy) to achieve high compression.

Figure 2 Spatial domain and Spatial Frequency domain. Courtesy [1] hdtvprimer.com
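The energy-compaction idea behind transform coding can be sketched with a one-dimensional DCT-II on a single 8-sample block; this is the unscaled textbook formula, whereas JPEG uses a normalized 8 × 8 two-dimensional version.

```python
import math

# Unscaled 1-D DCT-II: for a smooth block, energy concentrates in
# the first few coefficients, which is what makes heavy quantization
# of the later (high-frequency) coefficients nearly invisible.
def dct(block):
    n = len(block)
    return [sum(x * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i, x in enumerate(block))
            for k in range(n)]

smooth = [100, 102, 104, 106, 108, 110, 112, 114]  # a smooth ramp
coeffs = dct(smooth)
# DC term ~ 856 dominates; higher-frequency terms shrink rapidly
# and can be quantized to near zero with little visible loss.
print([round(c, 1) for c in coeffs])
```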
The quantized coefficients are coded using a Variable Length Code (VLC) and then sent to the receiver or put into a storage device. In a VLC, frequently occurring symbols are allotted fewer bits and rarely occurring symbols more bits. A very good example of a VLC is Morse code: the well-known Save Our Souls (SOS) signal is represented as dot dot dot, dash dash dash, dot dot dot (...---...). In English, S and O occur frequently, so they are given shorter codes, while less frequent letters like X and U have longer codes. The Huffman code is a VLC that provides very good compression.
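A minimal Huffman coder illustrates the idea on a toy string; a production coder would also have to transmit the code table to the decoder.

```python
import heapq
from collections import Counter

# Sketch of Huffman coding: frequent symbols get short codes and
# rare symbols get long ones, exactly as in Morse code.
def huffman_codes(text):
    # Heap entries are [frequency, tiebreak index, symbols-in-subtree].
    heap = [[freq, i, sym] for i, (sym, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    codes = {sym: "" for _, _, sym in heap}
    while len(heap) > 1:
        lo = heapq.heappop(heap)          # two least frequent subtrees
        hi = heapq.heappop(heap)
        for sym in lo[2]:                 # prefix '0' for the lighter branch
            codes[sym] = "0" + codes[sym]
        for sym in hi[2]:                 # prefix '1' for the heavier branch
            codes[sym] = "1" + codes[sym]
        heapq.heappush(heap, [lo[0] + hi[0], lo[1], lo[2] + hi[2]])
    return codes

codes = huffman_codes("aaaaaabbbc")       # 'a' is the most frequent symbol
print(codes)
assert len(codes["a"]) < len(codes["c"])  # frequent symbol -> shorter code
```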

On the receiver side the VLC is decoded, the reverse of quantization is applied, and the transformed coefficients are converted back into spatial signals to produce the reconstructed image. The more severe the quantization, the smaller the file size and the lower the image quality. Quantization causes an irrecoverable loss of signal, i.e. it is impossible to recover the original signal from the quantized signal. Yet to our eyes, a compressed JPEG image and the original image are practically indistinguishable.

Compression of images by quantizing spatial-frequency coefficients is called lossy compression. This method is not permitted for medical images and scanned legal documents, so lossless compression is used there. An image with a 100 KB file size can be compressed to 5 KB using lossy compression, but with lossless compression one can achieve only about 35 KB. Both lossy and lossless compression are possible with JPEG. The advanced version of JPEG is JPEG2000, which uses the wavelet transform instead of the Discrete Cosine Transform (DCT).

Spatial Domain Compression
Transform coding performs poorly on cartoon images with limited colours and on line-art images. The correlation can instead be exploited in the spatial domain itself. A VLC can be used to compress this sort of image, but the underlying source probabilities are required for efficient compression. To overcome this problem, dictionary codes are used; all ZIP compression applications use dictionary coding. The method was developed by Lempel and Ziv back in 1977 and named LZ77; the next year, LZ78 arrived. Later Welch modified LZ78 to make it much more efficient, and the result was named LZW. In 1987 the Graphics Interchange Format (GIF) was introduced on the Internet, and it extensively used LZW. A few years later people came to know that LZW was a patented technique, which sent jitters among Web developers and users. Programmers came out with an alternative image standard, Portable Network Graphics (PNG), to subdue GIF's dominance; PNG uses LZ77 and is patent free. In dictionary coding, encoders search for patterns and then code the patterns: the longer the patterns, the better the compression.

Knowledge of information theory is required to evaluate the various VLCs. Information theory is an application of probability theory. What is information? If a man bites a dog, it is news, because the chance of that event occurring is very low, which instils interest to read; put another way, its information value is very high. (Please don't confuse this with the computer scientist's usage of "information.")
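The "man bites dog" intuition is captured by self-information, −log₂(p) bits for an event of probability p; the probabilities below are made-up illustrative values.

```python
import math

# Self-information of an event: rarer events carry more information.
def information_bits(p):
    return -math.log2(p)

# A common event (p = 0.5, "dog bites man") vs a rare one
# (p = 0.001, "man bites dog"):
print(round(information_bits(0.5), 2))    # 1.0 bit
print(round(information_bits(0.001), 2))  # 9.97 bits -- far more newsworthy
```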
  Digital content duplication is a simple affair, so content creators were forced to find ways and means to curb piracy. Digital Watermarking is one such solution. Here copyright information is stored inside the image; the presence of a TV logo on television programmes is a very good example. Invisible watermarking schemes are also available. Steganography is the art of hiding text in images. Functionally, digital watermarking and steganography are similar, but their objectives are totally different.
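A toy LSB (least significant bit) hiding scheme shows the mechanism common to invisible watermarking and steganography: flip the lowest bit of each pixel, a change of at most one intensity level, which the eye cannot see. Real schemes are far more robust; this is only an illustrative sketch on a list of 8-bit grayscale pixels.

```python
def embed_bits(pixels: list[int], message: bytes) -> list[int]:
    """Hide message bits in the least significant bit of each pixel."""
    bits = [(byte >> i) & 1 for byte in message for i in range(8)]
    out = list(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit   # pixel changes by at most +/-1
    return out

def extract_bits(pixels: list[int], n_bytes: int) -> bytes:
    """Read the hidden message back from the LSBs."""
    return bytes(
        sum(((pixels[b * 8 + i] & 1) << i) for i in range(8))
        for b in range(n_bytes)
    )

pixels = list(range(100, 200))          # a toy 8-bit grayscale strip
stego = embed_bits(pixels, b"(c)2014")  # embed a copyright string
assert extract_bits(stego, 7) == b"(c)2014"
```

The same machinery serves both purposes; only the intent differs. A watermark asserts ownership, whereas steganography hides the very existence of the message.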

Note 
 The objective of this post is to give an overview of the Transfer block. For more information, please Google the highlighted phrases.


Source

1. What is exactly ATSC [Online]. Available: http://www.hdtvprimer.com/issues/what_is_atsc.html

Saturday, 30 November 2013

Abstract DIP model - Part I

A comprehensive, all-encompassing abstract model for Digital Image Processing (DIP) came across my mind. Let me put forth my thoughts at length, spanning several posts. This post is the first part of the series.

Introduction
Normally DIP is studied as a stand-alone subject. Learners misunderstand the subject and associate it with compression, compression and compression. I personally feel it should be studied as “part of a whole.” Only then can the real face of DIP be perceived. My abstract model is an outcome of this 'part of a whole' philosophy. As it lacks academic rigour, the model is not suited for scholarly publication, but it may be helpful to gain insights and dispel myths about DIP.
Engineers are expected to make products that improve the quality of human life, and to use scientific knowledge in product making. The products are made in industry and sold in the market. The required level of knowledge about industry and the market is not taught in the curriculum, and this severely hampers engineers' thinking. Hardliners may counter-argue in the following way: “part of a whole” thinking will dilute engineering; if a student wants to learn about the market, let him do an MBA.

The abstract model contains four sections viz.; Acquire, Transfer, Display and Interpret. In practice, images are captured and then either stored or transferred. Later they are either printed on paper or shown on a screen, and the images are interpreted by the human brain with the help of the eyes. What is new in this model is that the human brain, not the human eye, is brought to the fore. One may wonder why the human eye is not given its due credit, or put another way, why the brain's role in seeing is given such importance in this model. Is this a sensational article written to draw more visitors? Please read the article further and I assure you your anxieties will cease.

Acquire
The responsibility of the Acquire section extends from gathering the light reflected by the subject being shot to the conversion of that captured light into electrical signals. It has four subsections viz. Lens, Sensor, Read-out electronics and A-to-D converter. Lenses collect the light reflected from the subject and focus it on the sensor. An array of sensors ('n' rows x 'm' columns) is used in a camera to capture images; the number of sensors in the array is directly proportional to the resolution of the image. Sensors can be categorized into CMOS and CCD types. We all know that a light ray is formed of numerous photons. When a photon impinges on the sensor's photosite (i.e. light-sensitive area), electrons in the valence band move to the conduction band. This causes a flow of electrons and forms a current in the sensor; the phenomenon is called the photoelectric effect. The charge-storing sensors can be treated as tiny capacitors (just as the junction capacitance of a diode can be treated as a capacitor). In a sensor, only 40% of the area is covered by photosensitive material; the remaining area is filled with amplifiers and noise-reduction circuits [1]. The charge stored in the tiny capacitors (sensors are actually built using MOS transistors) has to be read out before it discharges (similar to the working of dynamic RAM). Faster read-out is required for higher resolution images. The read-out voltage signals are then amplified and converted into digital signals (data). I guess higher resolution leads to less A-to-D conversion time per pixel. For a detailed discussion refer to [2], [3]. Figure 1 beautifully explains the concept of read-out [3]. Line sensor arrays (1 x m) are used in photocopying (Xerox) machines: a stick containing a row of sensors moves from the top of the page to the bottom to collect the pixel information. In thermal systems only single-pixel sensors (1 x 1) are available.

Figure 1. Photon collection by photosite and read-out 
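The read-out and A-to-D step can be sketched as a simple quantizer that maps a photosite voltage to an n-bit code. This is an illustrative model under assumed parameters (a 0-to-1 V range, 8 bits); real converters are considerably more involved:

```python
def adc(voltage: float, v_ref: float = 1.0, bits: int = 8) -> int:
    """Quantize a read-out voltage in [0, v_ref] to an n-bit code,
    as the A-to-D converter subsection does for each pixel."""
    levels = (1 << bits) - 1            # 255 codes for 8 bits
    v = min(max(voltage, 0.0), v_ref)   # clip to the converter's range
    return round(v / v_ref * levels)

# Three photosite read-out voltages -> 8-bit pixel values
print([adc(v) for v in (0.0, 0.5, 1.0)])  # [0, 128, 255]
```

The clipping step also hints at why overexposed highlights saturate to pure white: any voltage beyond the reference maps to the same maximum code.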

The above paragraph describes the functioning of light capture only in a superficial way; technical details are trimmed to a minimum so as to highlight the principle. Knowledge of optics and machining is very important to fabricate lenses. The power of a DSLR camera hinges on powerful lenses. Good knowledge of microelectronics is absolutely essential to understand the functioning of the sensor, read-out amplifier and A-to-D converter. To design and fabricate a reasonably good resolution acquiring subsystem, a sound knowledge of Very Large Scale Integration (VLSI) and the related software tools is essential. In reality, subjects like optics, microelectronics and VLSI are taught without even a veiled reference to camera or scanner systems.

     The technology has reached such a stage that even an entry-level (low-priced) camera is capable of taking 10-megapixel images. When film-based cameras reigned, photography was a costly hobby, so very few bought a camera. Acquiring a digital colour image requires three filters, namely red, green, and blue. Using three sensor arrays is costly, so a single sensor is used instead to cut down the cost; for that, the 'Bayer pattern' is used. When Bedabrata Pain [4] and his team developed an affordable CMOS active pixel sensor, digital cameras became affordable, and today every mobile phone has an embedded camera.
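The Bayer pattern can be sketched as a simple coordinate-to-filter mapping. The sketch assumes the common RGGB tile (other arrangements exist); note that half the sites are green, matching the eye's higher sensitivity to green light:

```python
def bayer_channel(row: int, col: int) -> str:
    """Which colour filter sits over pixel (row, col) in an
    RGGB Bayer mosaic on a single sensor."""
    if row % 2 == 0:
        return "R" if col % 2 == 0 else "G"
    return "G" if col % 2 == 0 else "B"

# The top-left 2x2 tile of the mosaic:
print([[bayer_channel(r, c) for c in range(2)] for r in range(2)])
# [['R', 'G'], ['G', 'B']]
```

Each photosite thus records only one colour; the missing two values per pixel are later estimated from neighbours by a demosaicing algorithm, one of the first image processing steps inside any camera.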

Product Market
The next level of innovation will be in improving the usability of the camera, not in cost cutting. As the cost comes down heavily, the quantum of profit per unit also comes down, so to maintain profit, industries go for volume. Say a camera company named ABC sells 1000 cameras at Rs. 5000 each, with a profit of Rs. 500 per camera. The net profit is Rs. 5,00,000 (1000 cameras x Rs. 500). If the same company sells 10,000 cameras at Rs. 3000 each, with a profit of Rs. 300 per camera, the net profit is Rs. 30,00,000 (10,000 cameras x Rs. 300). Profit has increased many fold. This logic holds until everyone owns a camera; after that, ABC has to find innovative ways to keep the net profit the same.
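The volume argument above is plain arithmetic:

```python
def net_profit(units: int, profit_per_unit: int) -> int:
    """Net profit in rupees: units sold times profit per unit."""
    return units * profit_per_unit

# ABC at Rs. 5000 per camera, Rs. 500 profit each:
print(net_profit(1000, 500))    # 500000  -> Rs. 5,00,000
# ABC at Rs. 3000 per camera, Rs. 300 profit each, ten times the volume:
print(net_profit(10000, 300))   # 3000000 -> Rs. 30,00,000
```

Even though the per-unit profit fell by 40%, a tenfold rise in volume multiplied the net profit six times over.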

The ultimate aim of the camera manufacturing companies can be put this way: “even a moron should take pictures like a professional photographer.” As we all know, we have a huge number of amateurs and very few good photographers. To improve the market size of the costly DSLR (Digital Single Lens Reflex) camera, industries should target the huge amateur base. But the general public has neither the patience nor the time to become like a professional. To bridge the skill gap, a lot of intelligence is added to the camera.

Market need satisfying algorithms
Face detection algorithms help amateurs shoot proper pictures. Earlier this feature was available only in point-and-shoot cameras; nowadays it is extended to professional models like DSLRs. Most of us are unable to set the proper ISO, aperture and shutter speed for the required shot; that is why auto-focus and auto-exposure cameras sprang up, though there is still a lot of scope for improvement. Next, amateurs' hands are not stable at the time of taking a shot, which invariably results in shaky pictures. This can be corrected using the “image restoration” class of image processing algorithms. Sometimes enough lighting may not be available at the time of shooting, or extraneous light may fall on the subject. These errors can be partially corrected using image editing software like Photoshop and GIMP. Photoshop is the most popular commercial image editing software, and GIMP (GNU Image Manipulation Program) is free and open software. Many more image processing algorithms will be deployed in the coming intelligent cameras.

Source
1. How Digital Cameras Work, [Available Online], http://www.astropix.com/HTML/I_ASTROP/HOW.HTM
2. Digital Processing Techniques, [Available Online], http://www.astropix.com/HTML/J_DIGIT/TOC_DIG.HTM
3. ZEISS Microscopy Online Campus | Microscopy Basics | Understanding Digital Imaging, [Available Online], http://zeiss-campus.magnet.fsu.edu/articles/basics/digitalimaging.html
4. Bedabrata Pain - Wikipedia, the free encyclopedia. [Available Online], http://en.wikipedia.org/wiki/Bedabrata_Pain


Thursday, 31 October 2013

Television and Movies – Visual Fast Food?

Every day we encounter a lot of pictures. Pictures appear on television, in cinema, in newspapers and magazines, and on the Web. Pictures are used to convey emotions and messages. Except for a few, most of us take things for granted and spend our scarce resource (thinking) only on odd or rare events. For example, until Sir Isaac Newton, the falling of an apple from a tree was considered the norm, and people simply consumed the fallen apple. Likewise, viewing pictures is taken as a very usual thing, and we skirt thinking about it. In this post, the discussion will be on the “Role of pictures in our life.”

It will be better to define terms first and then get into the essay. Pictures can be classified into eye-captured pictures, device-captured pictures and synthetic pictures. If I physically visit the Amazon jungle and enjoy its beauty through my own eyes, I call that an eye-captured picture. If I see Niagara Falls in a movie, on television or in a magazine, I call it a device-captured picture. A picture created by artistic rendition, with or without computers, is called a synthetic picture.
Two hundred years back, pictures meant almost exclusively eye-captured pictures. Only rich people had the opportunity to own synthetic pictures (paintings). Commoners living in big cities like Rome would have enjoyed Michelangelo's paintings on the ceiling of the Sistine Chapel. Colour photographs started appearing after 1861. They made it possible to capture a person's portrait or a natural landscape with far less effort and time; in earlier times painters performed this task, so the human painter was substituted by the colour camera. But it was by no means an easy task to produce multiple copies. The first colour illustrations appeared in a newspaper in 1934 in the UK. To have a glimpse of old colour photographs, refer to [1]. Colour television emerged in the USA in the 1950s. After 1980 the world started seeing a lot of device-captured pictures. I can fairly assume everyone watches TV for two hours per day. The amount of pictures in the print medium (newspapers, magazines) is relatively less; on the Web it is more than in print but less than on TV. Let us conclude that, in a day, device-captured pictures are viewed for two and a half hours (two hours of TV plus half an hour of Web and print). Within a span of two hundred years, the time spent on device-captured pictures rose from near zero to 150 minutes.

A cursory glance at “150 minutes of device-captured picture viewing” makes it look like trivia. At most it may amuse people and make them feel proud of technological superiority. But broadcast media (TV and movies) is a medium that transcends distance. For example, seeing a war, seeing the piranha fish of the Amazon rivers, or seeing skiing in the Alps with the naked eye is a rarity for a common man living in India. Thus one is able to have a near-real experience of battlefields, jungles and skyscrapers without moving from one's physical place.

A coin has two sides. Likewise, the ability to “transcend distance and take part in important events in the world” has profound positive and negative effects. Without pictures, visualizing the Amazon jungle from a textual description is nearly impossible. Our knowledge has tremendously increased with the rise of access to pictures. People in India know very well about US President Barack Obama, Osama bin Laden, Bruce Lee, Hollywood celebrities, the kangaroo, Niagara Falls and the Eiffel Tower, all because of pictures. Learning medicine, architecture, archaeology and many other fields has become easier because of the availability of pictures. Forensic experts are able to identify criminals without physically visiting the crime spot. Surveillance cameras capture pictures that help us to prevent crime as well as to catch criminals.

The negative sides are that “we are conditioned to see what we want to see” and that the gap between device-captured and eye-captured pictures is very high. When viewing TV or a movie, we see the world through the eyes of the content creator (the director of the movie). In one sense our freedom is lost. As watching movies acts as a medium of escape, we voluntarily subject ourselves to this loss of freedom. Thus it becomes easier to mass brain-wash the so-called modern man than his ancestors. Next, a person in India can, via TV, live in America for a few hours per day. So the distinction between real and reel shrinks, and this confuses a person's thinking ability.
The important point is that we see extremes in device-captured images. The first principle of journalism states that “if a dog bites a man, it is not news; if a man bites a dog, it is news.” Mathematically it means: the lower the probability of occurrence, the higher the probability of being published. That is why we see six-pack males like Arnold Schwarzenegger and Sylvester Stallone, the handsome Leonardo DiCaprio and beautiful hour-glass females. Fig. 1 contains a still from the movie Titanic, and it is very romantic. Seeing this kind of romantic encounter with the naked eye is almost an impossibility. Thus for 150 minutes we see what is not possible to see with the naked eye.

Figure 1. A romantic scene from the movie Titanic     Courtesy: Internet  
Studies have established that long hours of TV viewing affect children's ability to learn, their retention capability and their socializing skills. The impact of device-captured pictures on humans has yet to be documented with scientific data. Seeing a picture is not an independent task of the eye alone; it is an outcome of close coordination between eye and brain. Thus it is better to say “we perceive” than “we see.” When we encounter optical illusions, our brain fails to interpret the incoming visual signal from the eye properly. Whenever meditation or prayer is performed, we normally close our eyes. This helps us to cut down distraction as well as to reduce the workload of the brain. Seeing something makes our brain give priority to processing the incoming visual signal. That is why sitting in a park or watching TV makes us feel as if we are getting rid of our problems: the brain starts to process visual signals rather than pondering the problem.

We have a high intake of processed food (fast food) compared to our ancestors, and the widespread prevalence of lifestyle diseases like diabetes and obesity is linked to processed food. Along similar lines, we have a high intake of device-captured pictures compared to our ancestors. Will it create any problems for us?

Before we wind up, let us have a quick recap of what we have discussed in this post.

  • Eye-captured, device-captured and synthetic pictures
  • Device-captured pictures are different from eye-captured pictures
  • Device-captured pictures date from the 19th century, transcend distance and capture extreme events
  • We view device-captured pictures for about 150 minutes per day; 200 years back it was almost zero minutes.
  • With device-captured images, voluntary brain-washing is carried out.
  • The cognitive load on the brain due to device-captured pictures is high.
  • Device-captured picture = fast food

Source
1.  Colour images from 1930s unveiled – Daily Record [Available Online] http://www.dailyrecord.co.uk/news/scottish-news/colour-images-from-1930s-unveiled-1276428

Acknowledgement
 Grammatical correction was carried out by a final year engineering student.