Friday, 31 January 2014

Abstract DIP Model - Part III (Science of colour)

An abstract model for DIP came to my mind. It contains four sections, viz. Acquire, Transfer, Display and Interpret. This post is the third part of the series; the first and second parts discussed the Acquire and Transfer blocks. Please refer to the November 2013 post in this blog for a detailed introduction.

A colour picture is mapped into a matrix of numbers, compressed and stored as an image file. At the time of display on a computer screen, the file is decompressed and the numbers are remapped into pixels; in the case of printing, the numbers are remapped into dots. Thus the Display block is concerned with the remapping of numbers into pixels. To understand the functioning of this block one should know the following: the science of colour, screen technologies, printing technologies and mapping algorithms. In this post only the science of colour will be discussed.

Science of Colour
Light enters the eye via the cornea, is focused by the lens and reaches the retina, which lies on the inside of the eye wall. The retina contains two types of photoreceptive cells, called cones and rods. Six to seven million cones lie near the central portion of the retina called the fovea. Their maximum absorption occurs at about 430, 530 and 560 nm, so they can be referred to as blue, green and red cones respectively. Approximately 65% of cones are sensitive to red, 33% to green and the remaining 2% to blue; the blue cones, however, are extremely sensitive. Rods are around 70 million in number and are spread all over the retina. Rods are very sensitive to light, and low levels of illumination are sufficient for them to function; cones require bright light. That is why a flower that is brightly coloured in daylight appears colourless in moonlight.

The visible range extends from 400 nm to 700 nm. The visible spectrum can be divided into 10 nm wide bands and the spectral power recorded in each band; a collection of 31 such bands can describe a particular spectral power distribution. Alternatively, one can simply use red, green and blue primaries to represent a colour. Isaac Newton said, "Indeed rays, properly expressed, are not colored." In the real world only Spectral Power Distributions (SPDs) exist; colour is perceived by our eyes and brains [1]. In image processing we use simple red, green and blue values rather than a 31-band SPD to describe colour.

Ideally, any image acquisition device (still or video camera) as well as any image display device should have the same spectral response as the human eye. With prevailing technologies, device spectral responses can only be made close to that of the human visual system. Radiance is the amount of light emitted from a source; it is an objectively measurable quantity. Luminance is the light intensity as perceived by the eye. Infra-red (IR) radiance can be measured, but the luminance of an IR ray is zero, as our eyes cannot perceive IR.

CIE MODEL
The science of colorimetry tries to find the relationship between an SPD and the perceived colour. Back in 1931 the International Commission on Illumination (in French, Commission Internationale de l'Éclairage, CIE) developed a tristimulus model that has X, Y and Z primaries, in which Y corresponds to the luminance of the light. These XYZ tristimulus primaries were derived from colour matching experiments. X, Y and Z form a colour volume. If the luminance component is ignored, the volume reduces to a two-dimensional picture. A representation of colour without the luminance component is created in the following way.

x = X/(X+Y+Z) and y = Y/(X+Y+Z)

A plot of y against x takes the shape of a shark fin, as in Fig. 1. This is called the chromaticity diagram. It is the desired two-dimensional plot without the luminance component.
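As a quick illustration, here is a minimal Python sketch of the projection above. The D65 white-point XYZ values used in the example are the standard published figures; everything else follows directly from the two equations.

def chromaticity(X, Y, Z):
    """Project XYZ tristimulus values onto the 2-D chromaticity plane."""
    s = X + Y + Z
    if s == 0:
        return 0.0, 0.0            # black: chromaticity undefined, return origin
    return X / s, Y / s

# Example: the CIE D65 white point (XYZ normalized so that Y = 1)
x, y = chromaticity(0.9505, 1.0000, 1.0890)
print(round(x, 4), round(y, 4))    # approx. 0.3127 0.3290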

Figure 1. The chromaticity diagram (shark-fin shaped) contains the colour gamut (triangle) of a device.
The Camellia red is marked as a white spot. Image courtesy: Wikipedia

Gamut: A triangle inside the chromaticity diagram gives a fair idea of the colour reproduction capability of a display or capture device. The gamut is a device-dependent entity and its size varies among devices; a larger gamut is always preferable. At present no device is capable of capturing or displaying all the colours shown in the chromaticity diagram. For example, the red of the Camellia flower shown in Fig. 2 lies outside the gamut of the given device [2]; it is marked in Fig. 1 as a white spot. As a result, the display device fails to reproduce the original colour of the Camellia flower faithfully and uses the nearest equivalent colour instead.

Figure 2. The Camellia flower
 Image courtesy: www.techmind.org
RGB MODEL
Later, red, green and blue were adopted as primaries, giving the RGB model. This colour space is used for display devices. Any colour can be created on a display by adding red, green and blue in some proportion; this is called additive mixing. The normalized values of the primaries range from 0 to 1, so the model forms a perfect colour cube. The model is device dependent: when the same RGB value is passed to different display devices, there is no guarantee that they will produce exactly the same colour, and slight variation is expected. Using the RGB colour space it is not possible to create all visible colours, but the colours the RGB model produces are sufficient for practical purposes.

A variety of capture and display devices follow a power-law equation, i.e. light intensity is proportional to some power of the signal amplitude or pixel value. The exponent varies from about 1.8 to 2.5. Pre-correcting the signal with the inverse of this exponent is called gamma correction. Without gamma correction, an image on a Cathode Ray Tube (CRT) screen appears darker than the original. The introduction of gamma makes the RGB colour space non-linear. To differentiate linear from non-linear RGB, a prime is added to each colour component; thus R' represents the gamma-corrected red component. Many books miss this important difference and use R and R' interchangeably.
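Below is a minimal Python sketch of gamma correction, assuming pixel values normalized to [0, 1] and an exponent of 2.2. Note that real transfer functions such as sRGB's are actually piecewise (a linear toe followed by a power segment), so a pure power law is a simplification.

import numpy as np

GAMMA = 2.2   # assumed display exponent, within the 1.8 to 2.5 range above

def gamma_encode(linear):
    """R -> R': pre-correct linear intensity with the inverse exponent."""
    return np.power(linear, 1.0 / GAMMA)

def gamma_decode(encoded):
    """R' -> R: what the display effectively does (power law)."""
    return np.power(encoded, GAMMA)

linear = np.linspace(0.0, 1.0, 5)            # sample linear intensities
encoded = gamma_encode(linear)               # values sent to the display
print(np.round(encoded, 3))                  # mid-greys are lifted: 0.25 -> ~0.533
print(np.round(gamma_decode(encoded), 3))    # the round trip recovers the input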

Hewlett-Packard (HP) and Microsoft developed a new colour space, sRGB, specifically suited for operating systems and the Internet. In sRGB the gamma value is 2.2. It is not as complete a colour space as CIE XYZ, but it is representative of the majority of devices on which the average computer user views colour. The new colour system helps to create a substantial degree of consistency among the various devices [3].

Figure 3.  (a) Additive colours (Light) (b) Subtractive colours (Pigments)
CMYK MODEL
The RGB colour model, based on additive mixing of colours, is not suited for printing purposes. A printing press produces an image by reflected light, which falls into the category of colour generation by a subtractive process; refer to Fig. 3 for further understanding. The secondary colours used in printing are cyan, magenta and yellow. A cyan pigment absorbs red light and reflects all other colours; likewise, magenta absorbs green and yellow absorbs blue. Black can in principle be produced by mixing cyan, magenta and yellow, but the result is a muddy-looking black. Moreover, the carbon black used to produce black in printing is cheaper than coloured pigments. These two reasons make the four-colour model the standard; the colours are cyan, magenta, yellow and black (CMYK) [4].
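For illustration, a naive R'G'B'-to-CMYK conversion can be sketched as below, assuming inputs normalized to [0, 1]. This textbook formula only captures the subtractive idea; real printing pipelines rely on device ICC profiles rather than a fixed formula.

def rgb_to_cmyk(r, g, b):
    k = 1.0 - max(r, g, b)           # black ink replaces the common grey component
    if k >= 1.0:
        return 0.0, 0.0, 0.0, 1.0    # pure black
    c = (1.0 - r - k) / (1.0 - k)    # cyan absorbs red
    m = (1.0 - g - k) / (1.0 - k)    # magenta absorbs green
    y = (1.0 - b - k) / (1.0 - k)    # yellow absorbs blue
    return c, m, y, k

print(rgb_to_cmyk(1.0, 0.0, 0.0))    # red  -> (0, 1, 1, 0)
print(rgb_to_cmyk(0.2, 0.2, 0.2))    # grey -> (0, 0, 0, 0.8)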

OTHER MODELS
RGB and CMYK are the primary colour models used for display and printing. A few other models, like the YUV, YIQ, CIELAB, CIELUV, HSI, HLS and HSV colour models, are used for specific applications.

YUV MODELS
Television broadcasting carries colour pictures from one place to another and projects them on television screens. Colour reproduction on screens is performed by additive mixing, so the obvious choice of colour space would be RGB. But television and video systems seldom use RGB, as it is bandwidth inefficient. Moreover, when colour television broadcasting was introduced back in the 1950s, there were lots of B&W TV sets in use. To cater to monochrome TVs, the colour broadcast was made backward compatible: old B&W TVs could receive colour TV signals but displayed only the monochrome component. Our eyes are more sensitive to luminance variations than to chrominance variations. Thus RGB colours were transformed into the YUV colour space, where Y stands for luminance and U and V carry the colour information. Video engineers piggy-backed the colour information onto the TV composite signal to conserve bandwidth, so colour TV signals consumed the same bandwidth as monochrome signals (6 to 7 MHz). This was an engineering marvel. The YUV space is used by PAL (Phase Alternating Line; Europe and Asia), NTSC (National Television System Committee; USA) and SECAM (the French system) colour TVs [5], [6].

The conversion equations between R'G'B' and YUV are given below.

Y = 0.299R' + 0.587G' + 0.114B'

U = -0.147R' - 0.289G' + 0.436B'

V = 0.615R' - 0.515G' - 0.100B'
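These equations can be applied directly as a matrix multiplication. A minimal Python sketch follows; the matrix is copied from the equations above and the inputs are assumed to be gamma-corrected values in [0, 1].

import numpy as np

RGB_TO_YUV = np.array([
    [ 0.299,  0.587,  0.114],   # Y
    [-0.147, -0.289,  0.436],   # U
    [ 0.615, -0.515, -0.100],   # V
])

def rgb_to_yuv(rgb):
    """Convert one gamma-corrected R'G'B' triple to YUV."""
    return RGB_TO_YUV @ np.asarray(rgb)

print(np.round(rgb_to_yuv([1.0, 1.0, 1.0]), 3))   # white: Y = 1, U = V = 0
print(np.round(rgb_to_yuv([1.0, 0.0, 0.0]), 3))   # red:   Y = 0.299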

The YIQ colour space is a variant of YUV used in NTSC systems; here I stands for in-phase and Q stands for quadrature. The digital video standard ITU-R BT.601 (International Telecommunication Union Recommendation) uses YCbCr, which is again a scaled version of YUV; Cb and Cr represent the chrominance signals. High Definition Television (HDTV) uses the ITU-R BT.709 standard, whose primaries closely correspond to those of contemporary monitors.

Source
  1.  A Guided Tour of Color Space [Online] http://www.poynton.com/papers/Guided_tour/abstract.html
  2.  Introduction to colour science [Online] http://www.techmind.org/colour/
  3. sRGB: A Standard for Color Management, NEC SpectraView [Online] spectraview.nec.com.au/wpdata/files/40.pdf
  4. Rafael C. Gonzalez and Richard E. Woods, Digital Image Processing, 2nd edition, Pearson Education.
  5. Keith Jack, Video Demystified: A Handbook for the Digital Engineer, 4th edition, Newnes Publishers.
  6. Noor A. Ibraheem, Mokhtar M. Hasan, Rafiqul Z. Khan, Pramod K. Mishra, "Understanding Color Models: A Review", ARPN Journal of Science and Technology, vol. 2, no. 3, April 2012, pp. 265-275.


Tuesday, 31 December 2013

Abstract DIP model - Part II


An abstract model for DIP came to my mind. It contains four sections, viz. Acquire, Transfer, Display and Interpret. This post is the second part of the series. Please refer to the earlier post (November 2013) for a detailed introduction and a description of the Acquire block.

Transfer Block:
It sits between the Acquire and Display blocks. As in Figure 1, both the input and the output of the Transfer block are digital data. It acts as a data compressor on the transmitting side and as a data expander on the receiving side (the Display block), because sending digital data without compression is a bad idea. Figure 1 describes a scenario: a person shoots an image with a digital camera and transfers it to a hard disk in JPEG file format; later it is opened and displayed on the computer screen. Here the camera functions as the Acquire block as well as the transmitter side of the Transfer block, and the computer functions as the receiver side of the Transfer block as well as the Display block.
Figure 1 Digital Camera and Computer Interface
The transfer of digital data between the Transfer sub-blocks may occur through a communication channel, either wired (cables) or wireless. By nature all communication channels are noisy, meaning that data sent over the channel will be corrupted (a 1 becomes a 0, or a 0 becomes a 1). A channel is considered good if only one bit in a million goes corrupt (1 out of 1,000,000). Various measures are taken to minimize or eliminate the impact of noise; adding extra bits to detect and correct the corrupted bits is one such measure. In CDs and DVDs, Reed-Solomon codes are used extensively. For further information, search Google with the keyword "Channel Coding." One may wonder how a CD becomes a communication channel. Normally the transfer of data from point A to point B takes place via a channel in finite time (of the order of milliseconds). If the transfer takes near-infinite time, the data can be considered stored. From this viewpoint, transmission of data and storage (on CD or DVD) are functionally the same.
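To make the idea of adding extra bits concrete, here is a minimal Python sketch of the simplest possible channel code: a 3x repetition code with majority-vote decoding. The Reed-Solomon codes used on optical discs are far stronger, and the 5% flip probability is an assumption chosen just to make errors visible; the principle is the same.

import random

def encode(bits):
    return [b for b in bits for _ in range(3)]       # send each bit three times

def noisy_channel(bits, flip_prob=0.05):
    return [b ^ (random.random() < flip_prob) for b in bits]   # random bit flips

def decode(bits):
    triples = [bits[i:i + 3] for i in range(0, len(bits), 3)]
    return [int(sum(t) >= 2) for t in triples]       # majority vote fixes single flips

message = [1, 0, 1, 1, 0, 0, 1, 0]
received = noisy_channel(encode(message))
print(decode(received) == message)   # usually True: most single-bit errors corrected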

Why compression?
Photons are converted into charge by the image sensor. The charges are read out and together form an electrical signal. This analog signal is sampled and quantized to produce a digital signal. Because of spatial sampling, the resultant data is voluminous: each sensor element's photon accumulation is represented as the digital value of one pixel.

An image is made up of rows and columns of pixels. The rows and columns of data can be represented in matrix form; programmers treat an image as an array. Each pixel may require anywhere from one bit to multiple bytes. A pixel from a two-colour image (say black and white) requires only one bit. A grayscale image uses 8 bits per pixel to represent shades of gray; black-and-white TV images fall under this category. A colour image is composed of red, green and blue components, and each component requires 8 bits, so 24 bits (3 bytes) are required per pixel. An HDTV-size frame (image) has 1920 x 1080 pixels and requires 6075 KB (1920 x 1080 x 3 bytes) of storage. One minute of video requires about 8900 MB (6075 KB x 25 frames x 60 seconds). Thus half a minute of video will gobble up one DVD, and a single movie would require around 170 DVDs. One may wonder how an entire Hollywood movie (nearly 90 to 100 minutes) fits inside a single DVD. The answer is compression. The main objective of this example is to make us realize the mammoth size of raw image data.
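The arithmetic above can be checked with a few lines of Python. The 4.7 GB DVD capacity is the nominal single-layer figure, treated here as 4700 MB to match the text's rounding.

width, height = 1920, 1080
bytes_per_pixel = 3                  # 8 bits each for R, G and B
fps, seconds = 25, 60

frame_kb = width * height * bytes_per_pixel / 1024       # one HDTV frame
minute_mb = frame_kb * fps * seconds / 1024              # one minute of raw video
dvd_mb = 4700                                            # nominal 4.7 GB DVD
print(f"{frame_kb:.0f} KB per frame")                    # 6075 KB
print(f"{minute_mb:.0f} MB per minute")                  # ~8900 MB
print(f"{90 * minute_mb / dvd_mb:.0f} DVDs for a 90-minute movie")   # ~170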

Solution: Remove Redundancy 
A high degree of correlation exists between pixels in continuous-tone images (the typical output of a digital camera). Thus one can guess a pixel's value by knowing the values of its neighbours; put another way, the difference between a pixel and its neighbours is very small. Engineers exploit this property to compress an entire Hollywood movie onto a DVD.

Redundancy can be classified into interpixel redundancy, temporal redundancy and psychovisual redundancy. Temporal (time) redundancy exists only in video, not in still images. Our eyes are more sensitive to grayscale variation than to colour variation; this is an example of psychovisual redundancy. By reducing redundancy, high compression can be achieved. Transform coding converts spatial-domain signals (the image) into spatial-frequency-domain signals. In the spatial frequency domain, the first few coefficients have large amplitudes and the rest have very small amplitudes. In Figure 2, bar height represents the value: white bars represent pixels and pink bars represent DCT coefficients. The real compression occurs through proper quantization of the coefficient amplitudes. The low-frequency components, i.e. the first few coefficients, are mildly quantized; the high-frequency coefficients are severely quantized, and the outcome reaches near-zero values. High-frequency signals are thus highly attenuated (suppressed) but not eliminated. The feeling of image crispness arises from the presence of high spatial frequency components; once they are removed, an image becomes blurred. In JPEG, colour images are mapped into one luminance and two chrominance layers (YCbCr), and the Cb and Cr layers are quantized heavily (a psychovisual saving) to achieve high compression.
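A minimal sketch of this transform-coding idea on a single 8x8 block is given below, using SciPy's DCT routines. The single uniform quantization step of 40 is my simplification; real JPEG uses standard 8x8 quantization tables with finer steps for the low frequencies.

import numpy as np
from scipy.fftpack import dct, idct

def dct2(block):
    """2-D DCT applied along rows then columns (orthonormal)."""
    return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

def idct2(coeffs):
    return idct(idct(coeffs, axis=0, norm='ortho'), axis=1, norm='ortho')

rng = np.random.default_rng(0)
ramp = np.tile(np.arange(0, 256, 32), (8, 1)).astype(float)  # smooth gradient
block = ramp + rng.normal(0, 2, (8, 8))                      # plus mild sensor noise

coeffs = dct2(block)                  # energy piles up in the first coefficients
step = 40.0
quantized = np.round(coeffs / step)   # severe quantization: most terms become 0
print(np.count_nonzero(quantized), "of 64 coefficients survive")

restored = idct2(quantized * step)    # dequantize and invert the transform
print(f"max reconstruction error: {np.abs(block - restored).max():.1f} grey levels")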

Figure 2 Spatial domain and Spatial Frequency domain. Courtesy [1] hdtvprimer.com
The quantized coefficients are coded using a Variable Length Code (VLC) and then sent to the receiver or put into a storage device. In a VLC, frequently occurring symbols are allotted fewer bits and rarely occurring symbols are allotted more bits. A very good example of a VLC is Morse code. The well-known Save Our Souls (SOS) signal is represented as dot dot dot dash dash dash dot dot dot (...---...). In English, S and O occur frequently, so they were given shorter codes; less frequent letters like X and U have longer codes. Huffman code is a VLC that provides very good compression.
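Here is a minimal Python sketch of Huffman coding built with a heap; as with Morse code, the most frequent symbol receives the shortest codeword. The sample sentence is arbitrary.

import heapq
from collections import Counter

def huffman_codes(text):
    """Repeatedly merge the two lightest subtrees, prefixing 0/1 onto codes."""
    heap = [[freq, [sym, ""]] for sym, freq in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)
        hi = heapq.heappop(heap)
        for pair in lo[1:]:
            pair[1] = "0" + pair[1]   # left branch
        for pair in hi[1:]:
            pair[1] = "1" + pair[1]   # right branch
        heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    return {sym: code for sym, code in heap[0][1:]}

codes = huffman_codes("this is an example of a huffman tree")
for sym in sorted(codes, key=lambda s: len(codes[s])):
    print(repr(sym), codes[sym])      # the space, most frequent, gets the shortest code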

On the receiver side the VLC is decoded, the quantization is reversed (dequantization), and the transform coefficients are converted back into spatial signals to produce the reconstructed image. The more severe the quantization, the smaller the file size and the lower the image quality. Quantization causes an irrecoverable loss of signal, i.e. it is impossible to recover the original signal from the quantized one. Yet to our eyes a compressed JPEG image and the original are practically indistinguishable.

Compression of images by quantizing spatial frequency coefficients is called lossy compression. This method is not permitted for medical images or scanned legal documents, so lossless compression is used there. An image with a 100 KB file size may compress to 5 KB under lossy compression, whereas lossless compression may achieve only about 35 KB. Both lossy and lossless compression are possible with JPEG. The advanced version of JPEG is JPEG2000, which uses the Wavelet transform instead of the Discrete Cosine Transform (DCT).
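The size difference is easy to observe with the Pillow library, as in the sketch below. Here photo.png is a hypothetical input file, and actual sizes depend heavily on image content.

import os
from PIL import Image

img = Image.open("photo.png").convert("RGB")

img.save("lossy.jpg", quality=30)    # JPEG: severe quantization, small file
img.save("mild.jpg", quality=95)     # JPEG: mild quantization, larger file
img.save("lossless.png")             # PNG: lossless (LZ77-based DEFLATE)

for name in ("lossy.jpg", "mild.jpg", "lossless.png"):
    print(name, os.path.getsize(name) // 1024, "KB")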

Spatial Domain Compression
Transform coding performs poorly on cartoon images with limited colours and on line art. The exploitation of correlation can instead be carried out in the spatial domain itself. A VLC can be used to compress this sort of image, but the underlying source probabilities of the image are required for efficient compression. To overcome this problem, dictionary codes are used; all ZIP compression applications use dictionary coding. This coding method was developed by Lempel and Ziv back in 1977 and named LZ77; LZ78 arrived the next year. Later Welch modified LZ78 to make it much more efficient, and the result was named LZW. In 1987 the Graphics Interchange Format (GIF) was introduced on the Internet, and it used LZW extensively. A few years later people came to know that LZW was a patented technique, which sent jitters through Web developers and users. Programmers came out with an alternative image standard called Portable Network Graphics (PNG) to subdue GIF's dominance; PNG uses the LZ77 technique and is patent free. In dictionary coding, encoders search for patterns and then the patterns are coded: the longer the patterns, the better the compression.
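A minimal Python sketch of LZW compression follows. The classic test string shows how repeated patterns shrink the output; details such as code widths and the final bit packing are omitted here.

def lzw_compress(data):
    table = {chr(i): i for i in range(256)}   # start with single-byte entries
    pattern, out = "", []
    for ch in data:
        if pattern + ch in table:
            pattern += ch                     # grow the match as long as possible
        else:
            out.append(table[pattern])        # emit code for the longest match
            table[pattern + ch] = len(table)  # add the newly seen pattern
            pattern = ch
    if pattern:
        out.append(table[pattern])
    return out

codes = lzw_compress("TOBEORNOTTOBEORTOBEORNOT")
print(len(codes), "codes for 24 characters")  # 16: repeats shrink the output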

Knowledge of information theory is required to understand and evaluate the various VLCs. Information theory is an application of probability theory. What is information? If a man bites a dog, it is news: the chance of the event occurring is very low, which instills interest in reading it; put another way, its information value is very high. (Please don't confuse this with the computer scientist's usage of "information.")
Digital content duplication is a simple affair, so content creators were forced to find ways and means to curb piracy. Digital watermarking is one such solution: copyright information is stored inside the image itself. The TV logo present on television programmes is a very good example, and invisible watermarking schemes are also available. Steganography is the art of hiding text inside images. Functionally, digital watermarking and steganography are similar, but their objectives are totally different.

Note 
The objective of this post is to give an overview of the Transfer block. For more information, please Google the highlighted phrases.


Source

1. What exactly is ATSC? [Online]. Available: http://www.hdtvprimer.com/issues/what_is_atsc.html

Saturday, 30 November 2013

Abstract DIP model - Part I

A comprehensive, all-encompassing abstract model for Digital Image Processing (DIP) came to my mind. Let me put forth my thoughts at length, spanning several posts. This post is the first part of the series.

Introduction
Normally DIP is studied as a stand-alone subject. Learners misunderstand the subject and associate it with compression, compression and compression. I personally feel it should be studied as "part of a whole"; only then can the real face of DIP be perceived. My abstract model is an outcome of this "part of a whole" philosophy. As it lacks academic rigour, the model is not suited for scholarly publication, but it may help to gain insights and dispel myths about DIP.
Engineers are expected to make products that improve the quality of human life, and they are expected to use scientific knowledge in making them. Products are made in industry and sold in the market, yet the required level of knowledge about industry and the market is not taught in the curriculum. This severely hampers engineers' thinking. Hard-liners may counter-argue as follows: "part of a whole" thinking will dilute engineering; if a student wants to learn about the market, let him do an MBA.

The abstract model contains four sections, viz. Acquire, Transfer, Display and Interpret. In practice images are captured and then either stored or transferred. Later they are either printed on paper or shown on a screen, and the images are interpreted by the human brain with the help of the eyes. What is new in this model is that the human brain is brought to the fore, not the human eye. One may wonder why the human eye is not given its due credit, or put another way, why the brain's role in seeing is given such importance in this model. Is this a sensational article written to draw more visitors? Please read further, and I assure you your anxieties will cease.

Acquire
The responsibility of the Acquire section extends from gathering the light reflected from the subject being shot to the conversion of the captured light into electrical signals. It has four subsections, viz. lens, sensor, read-out electronics and A-to-D converter. Lenses collect the light reflected from the subject and focus it on the sensor. An array of sensors ('n' rows x 'm' columns) is used in a camera to capture images; the number of sensors in the array is directly proportional to the resolution of the image. Sensors can be categorized into CMOS and CCD types. We all know that a light ray comprises numerous photons. A photon impinges on the sensor's photosite (i.e. the light-sensitive area), and electrons in the valence band move to the conduction band; this flow of electrons forms a current in the sensor. The phenomenon is called the photoelectric effect. The charge-storing sensor elements can be treated as tiny capacitors (just as the junction capacitance of a diode can be treated as a capacitor). In a sensor, only about 40% of the area is covered by photosensitive material; the remaining area is filled with amplifiers and noise-reduction circuits [1]. The charge stored in the tiny capacitors (sensors are actually built using MOS transistors) has to be read out before it discharges (similar to the working of dynamic RAM). Faster read-out is required for higher resolution images. The read-out voltage signals are then amplified and converted into digital signals (data). I guess higher resolution demands less A-to-D conversion time per pixel. For a detailed discussion refer to [2], [3]. Figure 1 beautifully explains the concept of read-out [3]. Line sensor arrays (1 x m) are used in photocopying (Xerox) machines; here a stick containing a row of sensors moves from the top of the page to the bottom to collect the pixel information. In thermal systems only single-pixel sensors (1 x 1) are available.

Figure 1. Photon collection by photosite and read-out 

The above paragraph describes the capture of light only superficially; technical details are trimmed to a minimum so as to highlight the principle of light capture. Knowledge of optics and machining is very important in fabricating lenses; the power of a DSLR camera hinges on powerful lenses. Good knowledge of microelectronics is absolutely essential to understand the functioning of the sensor, read-out amplifier and A-to-D converter. To design and fabricate a reasonably good resolution acquisition subsystem, sound knowledge of Very Large Scale Integration (VLSI) and of the related software tools is essential. In reality, subjects like optics, microelectronics and VLSI are taught without even a veiled reference to camera or scanner systems.

Technology has reached such a stage that even an entry-level (low-priced) camera is capable of taking 10-megapixel images. When film-based cameras reigned, photography was a costly hobby, so very few bought a camera. Acquiring a digital colour image requires three filters, namely red, green and blue. Using three filtered sensors is costly, so a single sensor overlaid with a 'Bayer pattern' of filters is used instead to cut down the cost. When Bedabrata Pain [4] and his team developed an affordable CMOS active pixel sensor, the digital camera became affordable, and today every mobile phone is embedded with a camera.
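To make the single-sensor idea concrete, here is a minimal Python sketch of how a Bayer pattern samples a scene, one colour per photosite. The RGGB arrangement used below is one common layout (an assumption, as layouts vary); a real camera then demosaics, i.e. interpolates the two missing colours at each pixel.

import numpy as np

def bayer_mosaic(rgb):
    """rgb: H x W x 3 array in [0, 1]; returns the H x W single-channel mosaic."""
    h, w, _ = rgb.shape
    mosaic = np.zeros((h, w))
    mosaic[0::2, 0::2] = rgb[0::2, 0::2, 0]   # R at even rows, even columns
    mosaic[0::2, 1::2] = rgb[0::2, 1::2, 1]   # G
    mosaic[1::2, 0::2] = rgb[1::2, 0::2, 1]   # G (greens dominate, matching the eye)
    mosaic[1::2, 1::2] = rgb[1::2, 1::2, 2]   # B
    return mosaic

frame = np.random.rand(4, 4, 3)               # stand-in for a captured scene
print(bayer_mosaic(frame).round(2))           # one recorded value per photosite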

Product Market
The next level of innovation will be in improving the usability of cameras, not in cost cutting. As the cost comes down heavily, the profit per unit also comes down; to maintain overall profit, industries go for volume. Say a camera company named ABC sells 1000 cameras at a price of Rs. 5000 with a profit of Rs. 500 each; the net profit is Rs. 5,00,000 (1000 cameras x Rs. 500). If the same company sells 10,000 cameras at a price of Rs. 3000 with a profit of Rs. 300 each, the net profit is Rs. 30,00,000 (10,000 cameras x Rs. 300). The profit has increased many fold. This logic works well until everyone owns a camera; after that, ABC has to find innovative ways to keep the net profit the same.

The ultimate aim of camera manufacturers can be put this way: "even a moron should take pictures like a professional photographer." As we all know, there is a huge number of amateurs and very few good photographers. To improve the market for the costly DSLR (Digital Single Lens Reflex) camera, the industry should target this huge amateur base. But the general public has neither the patience nor the time to become professional, so to bridge the skill gap a lot of intelligence is added to the camera.

Market need satisfying algorithms
Face detection algorithms help amateurs shoot proper pictures. Earlier this feature was available only in point-and-shoot cameras; nowadays it is extended to professional models like DSLRs. Most of us are unable to set the proper ISO, aperture and shutter speed for a required shot; that is why auto-focus and auto-exposure cameras sprang up, though there is still a lot of scope for improvement. Next, amateurs' hands are not stable at the time of taking a shot, which invariably results in shaky pictures; this can be corrected using the "image restoration" class of image processing algorithms. Sometimes enough light may not be available at the time of shooting, or extraneous light may fall on the subject; such errors can be partially corrected using image editing software like Photoshop and GIMP. Photoshop is the most popular commercial image editor, and GIMP (GNU Image Manipulation Program) is free and open software. A lot of image processing algorithms will be deployed in the coming intelligent cameras.

Source
1. How Digital Cameras Work, [Available Online], http://www.astropix.com/HTML/I_ASTROP/HOW.HTM
2. Digital Processing Techniques, [Available Online], http://www.astropix.com/HTML/J_DIGIT/TOC_DIG.HTM
3. ZEISS Microscopy Online Campus | Microscopy Basics | Understanding Digital Imaging, [Available Online], http://zeiss-campus.magnet.fsu.edu/articles/basics/digitalimaging.html
4. Bedabrata Pain - Wikipedia, the free encyclopedia. [Available Online], http://en.wikipedia.org/wiki/Bedabrata_Pain


Thursday, 31 October 2013

Television and Movies – Visual Fast Food?

Every day we encounter lots of pictures. Pictures appear on television, in cinema, in newspapers and magazines, and on the Web. Pictures are used to convey emotions and messages. Except for a few, most of us take things for granted and spend our scarce resource (thinking) on odd or rare events. For example, until Sir Isaac Newton, the falling of an apple from a tree was considered the norm, and people simply consumed the fallen apple. Likewise, viewing pictures is taken as a very usual thing, and we avoid thinking about it. In this post, the discussion will be on the "role of pictures in our life."

It will be better to define terms first and then get into the essay. Pictures can be classified into eye-captured pictures, device-captured pictures and synthetic pictures. If I physically visit the Amazon jungle and enjoy its beauty through my own eyes, I call it an eye-captured picture. If I see the Niagara Falls in a movie, on television or in a magazine, I call it a device-captured picture. A picture created by artistic rendition, with or without computers, is called a synthetic picture.
Two hundred years back, pictures meant almost exclusively eye-captured pictures. Only rich people had the opportunity to own synthetic pictures (paintings), though commoners living in big cities like Rome could enjoy Michelangelo's paintings on the ceiling of the Sistine Chapel. Colour photographs started appearing after 1861. They helped capture the portrait of a person or a natural landscape with less effort and time; earlier, painters performed this task. Thus the human painter was substituted by the colour camera, but producing multiple copies was by no means an easy task. The first colour illustrations appeared in a newspaper in 1934 in the UK. For a glimpse of old colour photographs, refer to [1]. Colour television emerged in the year 1960 in the USA. After 1980 the world started seeing lots of device-captured pictures. I can fairly assume everyone watches TV for two hours per day. The amount of pictures in print media (newspapers, magazines) is relatively small; on the Web it is more than in print but less than on TV. Let us conclude that, in a day, the average time spent viewing device-captured pictures is two and a half hours (two hours of TV plus half an hour of Web and print). Within a span of two hundred years, the time spent on device-captured pictures rose from near zero to 150 minutes.

A cursory glance at "150 minutes of device-captured picture viewing" makes it look like trivia; at most it may amuse people and make them feel proud of technological superiority. But broadcast media (TV and movies) transcend distance. For example, seeing a war, seeing the piranha fish of the Amazon rivers, or seeing skiing in the Alps with the naked eye is a rarity for a common man living in India. Through pictures one can have a near-real experience of battlefields, jungles and ski slopes without moving from one's physical place.

A coin has two sides; likewise, the ability to "transcend distance and take part in important world events" has profound positive and negative effects. Without pictures, visualizing the Amazon jungle from a textual description is nearly impossible. Our knowledge has increased tremendously with the rise of access to pictures. People in India know US President Barack Obama, Osama bin Laden, Bruce Lee, Hollywood celebrities, the kangaroo, the Niagara Falls and the Eiffel Tower all because of pictures. Learning medicine, architecture, archaeology and many other fields has become easier because of the availability of pictures. Forensic experts are able to identify criminals without physically visiting the crime spot. Surveillance cameras capture pictures that help us prevent crime as well as catch criminals.

The negative sides are that "we are conditioned to see what we want to see" and that the gap between device-captured and eye-captured pictures is very wide. When viewing TV and movies, we see the world through the eyes of the content creator (the director of the movie); in one sense our freedom is lost. As watching movies acts as a medium of escape, we voluntarily subject ourselves to this loss of freedom. Thus it becomes easier to mass brain-wash the so-called modern man than his ancestors. Next, a person in India can, via TV, live in America for a few hours per day; the distinction between the real and the reel shrinks and confuses a person's thinking ability.
The important point is that we see extremes in device-captured images. The first principle of journalism states that "if a dog bites a man it is not news; if a man bites a dog it is news." Mathematically, the lower the probability of occurrence, the higher the probability of being published. That is why we see six-pack males like Arnold Schwarzenegger and Sylvester Stallone, handsome Leonardo DiCaprio and beautiful hour-glass females. Fig. 1 contains a still from the movie Titanic, and it is very romantic; seeing this kind of romantic encounter with the naked eye is a near impossibility. Thus for 150 minutes a day we see what is not possible to see with the naked eye.

Figure 1. A romantic scene from the movie Titanic     Courtesy: Internet  
Studies have established that long hours of TV viewing affect children's ability to learn, their retention capability and their socializing skills. The impact of camera-captured pictures on humans has yet to be documented with scientific data. Seeing a picture is not an independent task of the eye alone; it is an outcome of close coordination between eye and brain. Thus it is better to say "we perceive" than "we see." When we encounter optical illusions, our brain fails to interpret the incoming visual signal from the eye properly. Whenever meditation or prayer is performed, we normally close our eyes; this cuts down distraction and reduces the workload of the brain. Seeing something makes our brain give priority to processing the incoming visual signal. That is why sitting in a park or watching TV makes us feel as if we are getting rid of our problems: the brain starts processing visual signals rather than pondering the problem.

We have a high intake of processed food (fast food) compared to our ancestors, and the widespread prevalence of lifestyle diseases like diabetes and obesity is linked to processed food. Along similar lines, we have a high intake of camera-captured pictures compared to our ancestors. Will it create any problems for us?

Before we wind up, here is a quick recap of what we have discussed in this post.

  • Eye-captured, camera-captured and synthetic pictures
  • Camera-captured images are different from eye-captured pictures
  • Camera-captured pictures date from the 19th century, transcend distance and capture extreme events
  • Viewing of camera-captured pictures is about 150 minutes per day; 200 years back it was almost zero.
  • With camera-captured images, voluntary brain-washing is carried out.
  • The cognitive load on the brain due to camera-captured pictures is high.
  • Camera-captured picture = fast food

Source
1.  Colour images from 1930s unveiled – Daily Record [Available Online] http://www.dailyrecord.co.uk/news/scottish-news/colour-images-from-1930s-unveiled-1276428

Acknowledgement
 Grammatical correction was carried out by a final year engineering student.

Sunday, 29 September 2013

Second Avatar of Stereoscopy


Recently I was watching a promotional video of a palatial hotel on YouTube. The video clip had a 3D viewing option; it stirred my curiosity and I selected the option. All I could see was a blurred video, as I did not possess the required 3D glasses. I searched Google and found that, from 2012 onwards, YouTube automatically converts short video clips that have a resolution of 1080p (1920 x 1080, progressive mode) to 3D [1]. Had I watched with 3D glasses, I would have virtually visited the hotel rather than merely seeing it. The depth perceivable in 3D creates a new experience. The right technical word for a 3D movie is a stereoscopic movie.

The YouTube video clip made me nostalgic. My first stereoscopic experience was way back in 1984, when as a boy I had the opportunity to view the movie "My Dear Kuttichathan" (in Tamil, 'kutti' means small and 'chathan' means shaitan or genie). Exhibitors collected an extra fee for the 3D glasses and collected them back at the end of the movie. I was really shocked when arrows from the silver screen tried to poke my eyes, and pleasantly surprised when bunches of roses and cone ice-creams popped out of the screen. No doubt the movie was a blockbuster. After a gap of 25 years I watched another 3D movie: none other than James Cameron's Avatar, with my children as fellow viewers. My children enjoyed it to the core. I liked the theme of the movie, but the 3D effects did not create an 'awe' in me. I realized I have become old.

After 'My Dear Kuttichathan' a few more 3D movies came along to tap the emerging 3D market. I saw one or two; I was not impressed, and the public shared my opinion. Slowly 3D's popularity declined. The production cost of stereoscopic movies was high compared to normal movies, and the production workflow had to be modified to suit 3D [2]. Stereoscopic movies required two projectors instead of one, and both had to be synchronized. Directors were not able to use 'depth' effectively to convey their stories to the audience. Consumers had to wear glasses, and their safe return was the viewer's duty. The whole lot of extra effort from all stakeholders for a few pop-ups was not worthwhile. After a lull of 25 years, the Avatar movie created a frenzy. One may be perplexed: why was there a long gap of a quarter century, and why is there a 3D frenzy now? Answers to these questions will come out when we delve into the past and do some reasoning.

As far as India is concerned, before colour TV penetration 'movie going' was the prime pastime. Entire families went to movie halls, and films were produced to cater to the needs of the entire family. In the late 1990s TV and satellite broadcasting glued families to the drawing room. Youth (15 to 30 years of age) became the major customer base for films, and movie content was automatically made to suit this audience. Youngsters like stunning visuals; thus 3D became an apt tool to draw youth to the theatres.

The next threat to the theatre came in the form of the VCD (Video Compact Disc). Prior to the VCD, Video Home System (VHS) was the norm. It used magnetic tape to store analogue signals. VHS players had a lot of mechanical components and required regular maintenance. Copying from a master tape to another was cumbersome, and the quality of the copied content was inferior to the master; thus piracy was kept at bay, and VHS never challenged the dominance of theatres. In contrast, the VCD carried digital signals, and VCD players had more electronics and fewer mechanical components. As the VCD market expanded, player prices started falling. Making pirated VCDs was a simple task; just-released movies became available on pirated VCDs, and families watched them on their televisions. This was a death blow to theatre owners and, in turn, to the movie industry.

The VCD threat was countered by producing movies with spectacular visuals (e.g. the Matrix fight scenes were the talk of the town) and with surround sound systems like Dolby. This discouraged movie patrons from viewing movies on their television sets. Stereoscopy likewise produces stunning visuals that draw the crowd to the theatres, and it curtails piracy.
 
A still from the movie 'Avatar'
The Avatar movie grossed a box office collection of two billion dollars, a huge sum even in Hollywood [3]. The right technology and a sizable market emerged in the late 2000s, and the arrival of Avatar ushered in a new chapter in the stereoscopic movie industry. Cinema producers realized 3D was an untapped potential and started releasing animation movies. The number of 3D theatres started exploding: in 2007 it was 1300, in 2009 it reached 9000, and at present it is 45,000, with most of the new theatres constructed in China. In 2005 Hollywood produced only five 3D movies, in 2009 it was 20, and in 2012 the number almost doubled [4]. In 2012, out of the 15 highest-grossing films, nine were stereoscopic. A quarter of the revenue is generated in the USA and three quarters comes from the rest of the world; rising economies like China and India contribute a lot. As the movie industry falls under the 'high risk, high reward' category, 3D technology has become a safe bet.

Seven reasons for the rise of 3D
  1. It introduces an illusion of depth, which produces a new experience. In normal movies, shadows act as a surrogate for depth.
  2. It suits youngsters well: they prefer stunning visuals to an emotional roller-coaster, and they are experience conscious, so price is not a barrier.
  3. The amount of money grossed by a stereoscopic movie is huge and the failure rate is low. Thus a 3D movie is a safe bet for film producers.
  4. A stereoscopic movie ticket costs about 30 percent more than a normal one. This burns a hole in the patron's pocket but helps theatre owners fill their coffers.
  5. Stereoscopic movies curtail piracy.
  6. Digital projection technology goes well with 3D movies.
  7. Digital production is very cost effective for 3D movies.
Problems with 3D
Film is a visual art that helps tell a story. A good story makes the audience get hooked on the characters of the movie; that is why classics like Ben Hur, The Five Man Army, Mackenna's Gold and The Bridge on the River Kwai still touch our hearts. A good movie should have a judicious mix of stunning visuals and emotions (e.g. valour, sacrifice). Yesteryear directors were not sure of the 3D medium's effectiveness in storytelling and simply avoided the medium.

Sometimes the presence of depth may become a source of distraction. For example, in nude photography there are photographers who still use B&W (black and white) film stock instead of colour. They claim the B&W medium helps viewers appreciate the shape of the female body; the faithful reproduction of flesh tones by colour film distracts viewers, and photographers are unable to convey their intention. A visual effects artist once commented in a public meeting that "when science gets in, art goes out (of movies)." Over-indulgence in technology may actually spoil story-telling capability.

3D movies favour themes based on mythology, magic (e.g. Harry Potter), adult content, horror and cartoons. Movie-goers are thus transported away from reality for 90 minutes. Hence 3D can be regarded as an entertainment medium rather than a visual art medium.

Our eyes have to focus properly on the screen to feel the depth. Those who fail to focus get headaches and other related ailments. Visual discomfort and visual fatigue are being studied extensively by scientists to improve the 3D movie-going experience [5].

Summary
  1. 3D is a visually rich medium, and toning it down may be necessary to tell a story in a compelling way.
  2. It is genre limited: well suited to mythology, magic, horror and cartoons.
  3. Visual fatigue and visual discomfort have to be studied well for wide acceptance among the public.
Source
[1] Official Blog: How we’re making even more 3D video available on YouTube [Online] http://youtube-global.blogspot.in/2012/04/how-were-making-even-more-3d-video.html
[2] Casting a magic spell, [Online] http://www.thehindu.com/thehindu/mp/2003/05/15/stories/2003051500260100.htm
[3] Avatar (2009 film) - Wikipedia, the free encyclopedia, [Online] http://en.wikipedia.org/wiki/Avatar_(2009_film)
[4] 3 Signs That 3D Movies Are The Way Of The Future | Business Insider India [Online] http://readbi.in/st9nJY
[5] M. Lambooij and W. IJsselsteijn , “Visual Discomfort and Visual Fatigue of Stereoscopic Displays: A Review,” Journal of Imaging Science and Technology, vol. 53, no. 3 pp. 030201–030201-14, Mar. 2009. [Download]
http://www.cs.sfu.ca/CourseCentral/820/li/material/source/papers/Visual-discomfort-09.pdf