Monday, 31 December 2012

Video Compression Basics

In the old days, video transmission and storage were in the analog domain. Popular analog transmission standards were NTSC, PAL and SECAM, and video tapes following the VHS and Betamax standards were used as storage media. Later, video transmission and storage moved to the digital domain. Digital signals are more immune to noise and need less power to transmit than analog signals, but they require more bandwidth, and in communication engineering power and bandwidth are scarce commodities. Compression is employed to reduce the bandwidth requirement by removing the redundancy present in the digital signal; from a mathematical point of view, it decorrelates the data. The following case study highlights the need for compression. Digitized NTSC video requires a data rate of about 165 Mbit/s, so a 90-minute uncompressed NTSC video generates about 110 gigabytes [1]. Around 23 DVDs would be required to hold this huge amount of data, yet one comes across DVDs that contain four 90-minute movies. This is possible only because of efficient video compression techniques.
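As a quick check of those numbers: 165 Mbit/s x 90 min x 60 s/min = 891,000 Mbit, which is about 111 gigabytes, in line with the roughly 110 GB quoted above; dividing 110 GB by the 4.7 GB capacity of a single-layer DVD gives about 23 discs.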

Television (TV) signals are a combination of video, audio and synchronization signals. When the general public says 'video', they usually mean TV signals; in the technical literature, TV signals and video are different things. If 30 still images (each slightly different from the next) are shown within a second, they create an illusion of motion in the eyes of the observer. This phenomenon is called 'persistence of vision'. In video technology a still image is called a frame. Eight frames per second are enough to create an illusion of motion, but about 24 frames per second are required for smooth motion, as in movies.

Figure 1 Two adjacent frames (top) and the temporal-redundancy-removed image (bottom)

Compression techniques fall into two broad categories: transform coding and statistical (entropy) coding. In transform coding, the Discrete Cosine Transform (DCT) and wavelet transforms are extensively used for image and video compression. In entropy coding, Huffman coding and arithmetic coding are extensively used. Transform coding is applied to the digital video signal first, and entropy coding is then applied to the coefficients of the transform-coded signal. This strategy is common to both image and video compression. For further details read [2].
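To make the first stage of that pipeline concrete, here is a minimal sketch in plain C of the 8x8 DCT used in JPEG-style intra coding. It is only illustrative: the input block values are made up, and the quantization and entropy-coding stages that would follow are omitted.

    #include <math.h>
    #include <stdio.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    #define N 8

    /* Forward 8x8 DCT-II, as used in JPEG-style intra coding. */
    void dct8x8(double in[N][N], double out[N][N])
    {
        for (int u = 0; u < N; u++) {
            for (int v = 0; v < N; v++) {
                double sum = 0.0;
                for (int i = 0; i < N; i++)
                    for (int j = 0; j < N; j++)
                        sum += in[i][j]
                             * cos((2 * i + 1) * u * M_PI / (2.0 * N))
                             * cos((2 * j + 1) * v * M_PI / (2.0 * N));
                double cu = (u == 0) ? 1.0 / sqrt(2.0) : 1.0;
                double cv = (v == 0) ? 1.0 / sqrt(2.0) : 1.0;
                out[u][v] = 0.25 * cu * cv * sum;   /* 2/N scaling for N = 8 */
            }
        }
    }

    int main(void)
    {
        double block[N][N], coeff[N][N];

        /* A made-up, smoothly varying block: most of the energy
           should end up in the DC coefficient coeff[0][0]. */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                block[i][j] = 128.0 + i + j;

        dct8x8(block, coeff);
        printf("DC coefficient: %.1f\n", coeff[0][0]);
        return 0;
    }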

In video compression both intra-frame and inter-frame coding are employed. Intra-frame coding is similar to JPEG coding, while inter-frame coding exploits the redundancy present among adjacent frames. Five to fifteen frames form a Group of Pictures (GOP). In Figure 2 the GOP size is seven, and it contains one Intra (I) frame, two Predicted (P) frames and four Bi-directionally predicted (B) frames. In an I frame spatial redundancy alone is exploited, very much as in JPEG compression; in P and B frames both spatial and temporal (time) redundancy are removed. Figure 1 shows an image with the temporal redundancy removed. In Figure 2, P frames are present in the 4th and 7th positions. The P1 frame in the fourth position contains the difference between the I frame and the fourth frame; only this difference, or prediction error, is coded. To regenerate the fourth frame, both the I frame and the P1 frame are required. Likewise, the second frame (B1) is predicted from both the I frame and the P1 frame, and only its prediction error is coded. The decoding order therefore differs from the display order: for this GOP it is I, P1, B, B, P2, B, B.

Figure 2 Group of Pictures (GOP)
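As a toy illustration of 'coding only the prediction error' described above (a sketch, not the actual MPEG algorithm), the snippet below forms a residual as the difference between a current frame and a reference frame, then rebuilds the current frame from the reference plus the residual; the 4x4 'frames' are made-up values.

    #include <stdio.h>

    #define W 4
    #define H 4

    int main(void)
    {
        /* Made-up 4x4 "frames"; real frames would be full images. */
        unsigned char ref[H][W] = {{10,10,10,10},{10,50,50,10},
                                   {10,50,50,10},{10,10,10,10}};
        unsigned char cur[H][W] = {{10,10,10,10},{10,10,52,52},
                                   {10,10,52,52},{10,10,10,10}};
        int residual[H][W];
        unsigned char rebuilt[H][W];

        /* Encoder side: residual = current - reference
           (the temporal redundancy has been removed). */
        for (int y = 0; y < H; y++)
            for (int x = 0; x < W; x++)
                residual[y][x] = cur[y][x] - ref[y][x];

        /* Decoder side: reconstruct the current frame from reference + residual. */
        for (int y = 0; y < H; y++)
            for (int x = 0; x < W; x++)
                rebuilt[y][x] = (unsigned char)(ref[y][x] + residual[y][x]);

        printf("rebuilt[1][2] = %d (should equal cur[1][2] = %d)\n",
               rebuilt[1][2], cur[1][2]);
        return 0;
    }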

One may wonder why a GOP is limited to about 15 frames, since more P and B frames give more efficient compression. The flip side is that if there is an error in the I frame, the dependent P and B frames cannot be decoded properly. The result is a partially decoded still image (the I frame) shown to the viewer for the entire duration of the GOP; for a 15-frame GOP that is about half a second, and beyond this duration the viewer is annoyed at looking at a still image. Increasing the GOP size also increases decoding time, which adds to latency, and real-time systems require minimal latency.

In a typical soap-opera TV episode, scene changes within a given duration are few. Take two adjacent frames: objects (a face, a car, etc.) in the first frame will have moved only slightly in the second frame. If we know the direction and amount of the motion, we can shift the objects of the first frame accordingly to recreate the second frame. The idea is simple to comprehend, but the implementation is taxing. Each frame is divided into a number of macroblocks, each containing 16x16 pixels (in JPEG an 8x8 group of pixels is called a block, which is why a 16x16 group is called a macroblock). Macroblocks are chosen one by one in the current frame (in our example, the second frame in Figure 1) and the 'best matching' macroblock is searched for in the reference frame (the first frame in Figure 1). The difference between the best-matching macroblock and the chosen macroblock is the motion-compensated prediction error, and the positional difference between the two blocks is represented by a motion vector. The process of searching for the best-matching macroblock is called motion estimation [3].
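The sketch below shows the core of a full-search motion estimation for a single 16x16 macroblock, using the sum of absolute differences (SAD) as the matching cost. It is only a sketch: the frame size, search range and synthetic image content are arbitrary choices, and real encoders use much faster search strategies than an exhaustive search.

    #include <stdio.h>
    #include <stdlib.h>

    #define W   64   /* frame width  (arbitrary for this sketch) */
    #define H   64   /* frame height (arbitrary for this sketch) */
    #define MB  16   /* macroblock size */
    #define R    8   /* search range in pixels: +/- R */

    /* Sum of absolute differences between the macroblock of `cur` at (cx,cy)
       and a candidate macroblock of `ref` at (rx,ry). */
    static long sad(unsigned char cur[H][W], unsigned char ref[H][W],
                    int cx, int cy, int rx, int ry)
    {
        long s = 0;
        for (int y = 0; y < MB; y++)
            for (int x = 0; x < MB; x++)
                s += abs((int)cur[cy + y][cx + x] - (int)ref[ry + y][rx + x]);
        return s;
    }

    /* Full search over a +/-R window; returns the motion vector (mvx, mvy). */
    static void motion_estimate(unsigned char cur[H][W], unsigned char ref[H][W],
                                int cx, int cy, int *mvx, int *mvy)
    {
        long best = -1;
        for (int dy = -R; dy <= R; dy++) {
            for (int dx = -R; dx <= R; dx++) {
                int rx = cx + dx, ry = cy + dy;
                if (rx < 0 || ry < 0 || rx + MB > W || ry + MB > H)
                    continue;                       /* stay inside the frame */
                long cost = sad(cur, ref, cx, cy, rx, ry);
                if (best < 0 || cost < best) {
                    best = cost;
                    *mvx = dx;
                    *mvy = dy;
                }
            }
        }
    }

    int main(void)
    {
        static unsigned char ref[H][W], cur[H][W];

        /* Synthetic data: a bright 16x16 square shifted by (3,2) between frames. */
        for (int y = 20; y < 36; y++)
            for (int x = 20; x < 36; x++)
                ref[y][x] = 200;
        for (int y = 22; y < 38; y++)
            for (int x = 23; x < 39; x++)
                cur[y][x] = 200;

        int mvx = 0, mvy = 0;
        motion_estimate(cur, ref, 23, 22, &mvx, &mvy);  /* macroblock at (23,22) in current frame */
        printf("motion vector: (%d, %d)\n", mvx, mvy);  /* expected (-3, -2) */
        return 0;
    }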


Figure 3 Motion Vector and Macroblocks

       
         A closer look at the first and second frames in Figure 1 offers the following inferences: (1) there is a slight colour difference between the first and second frames, and (2) the pixel located at (3,3) in the first frame is the (0,0) pixel in the second frame.
         In Figure 3 a small portion of a frame is taken and its macroblocks are shown; there are 16 macroblocks, in four rows and four columns.

A group of macroblocks is combined to form a slice.

Further Information:  
  • Display systems like TVs and computer monitors use additive colour mixing; the primary colours are Red, Green and Blue. Printing uses subtractive colour mixing, and the primary colours are Cyan, Magenta, Yellow and Black (CMYK).
  • The human eye is more sensitive to brightness variation than to colour variation. To exploit this, the YCbCr model is used: Y -> Luminance, Cb -> Chrominance Blue, Cr -> Chrominance Red. Please note that Chrominance Red ≠ Red.
  • To conserve bandwidth, analog TV systems use Vestigial Sideband Modulation, a variant of Amplitude Modulation (AM), and incorporate interlaced scanning.

Note: This article is written to give the reader some idea about video compression within a short span of time. It has been written carefully, but accuracy cannot be guaranteed, so please read books to understand the concepts properly.
Sources:
[2]  Salent-Compression-Report.pdf, http://www.salent.co.uk/downloads/Salent-Compression-Report.pdf  (PDF, 1921 KB)
[3]  Iain E. G. Richardson, “H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia,” John Wiley & Sons Ltd, 2003. (Examples are very good. ISBN 0-470-84837-5)

Wednesday, 28 November 2012

Multicore Software Technologies


Powerful multicore processors have arrived in the market, but programmers with enough hardware knowledge to harness their full potential are in short supply. The obvious solution is to produce a good number of programmers with sufficient knowledge of multicore architecture. Another solution is to create software that converts code meant for a single processor into multicore-compatible code. In this article the second method is discussed in detail; the reasons why the first method is not feasible are left to the reader as an exercise.

It is a well-known fact that user friendliness and code efficiency do not go hand in hand. For example, a factorial program written in assembly language will produce the smallest executable or binary file (in Windows, a .exe file). The same algorithm implemented in higher-level languages (Fortran, MATLAB) produces a much larger file, while C gives a moderate binary size. On the other hand, writing a program in assembly language is tedious, whereas writing it in MATLAB is easy.



A graph showing the relationship between the effort required to write a program and the computational speedup achieved by that programming language is shown above; it captures the essence of the previous paragraph nicely. Multicore software tools like Cilk++, Titanium and VSIPL++ offer user friendliness and at the same time are able to produce efficient applications. Is it not a 'have your cake and eat it too' situation? Let us hope it will not take much time to reach the coveted 'green ellipse' position.

OpenMP (Open Multi-Processing) is an open standard and it is supported by major computer manufacturers. Code written in Fortran and C/C++ can be converted into code that makes use of multicore processors. OpenMP-capable compilers are available for Windows, Linux and Apple Mac operating systems. The advantages of OpenMP are that it is easy to learn and is compatible with different multicore architectures. Software tools like Unified Parallel C, Sequoia, Co-array Fortran, Titanium, Cilk++, pMatlab and Star-P are available as alternatives to OpenMP. CUDA, Brook+ and OpenGL are available to cater to Graphics Processing Unit (GPU) based systems.

Name                          Developed by                   Keywords      Language extension
Unified Parallel C (UPC)      UPC Consortium                 shared        C
Sequoia                       Stanford University            inner, leaf   C
Co-Array Fortran              ---                            ---           Fortran
Titanium                      ---                            ---           Java
pMatlab                       MIT Lincoln Laboratory         ddense, *p    MATLAB
Star-P                        Interactive Supercomputing     ---           MATLAB, Python
Parallel Computing Toolbox    MathWorks Inc                  spmd, end     MATLAB

--- Data not available

Multicore-compatible code is developed in the following way. The code is written in a high-level language and compiled; this helps to rectify any errors in the program. Next, the code is analyzed, and wherever parallelism is exhibited that section of code is marked with a special keyword (see the table above) by the programmer. It is then compiled again with the multicore software tool, which automatically inserts the code needed to take care of memory management and data movement. In a multicore environment, the operating system creates a master thread at the time of execution. The master thread takes care of the execution, and wherever a special keyword is spotted, threads are forked (i.e. created) and given to separate cores. After completion of the job, the threads are terminated.

  1.  #pragma omp parallel for \
  2.  private(n) \
  3.  shared(B, y, v)
  4.  for (n = 0; n < K; n++)
  5.  y[n] = B*cos(v*n);

The few-line sample code presented above has syntax similar to C. Steps 4 and 5 alone are sufficient to generate a cosine signal; the first three lines are there to parallelize steps 4 and 5. In step 1, '#pragma' is a pre-processor directive in C/C++, 'omp' stands for OpenMP, and 'parallel for' states that the for loop is going to be parallelized. In step 3, 'shared' lists the variables placed in the global (shared) space so that all the cores can access them: the amplitude 'B', the array 'y' and the frequency term 'v'. Each core maintains its own 'n' in its private space so that a proper cosine signal is generated.
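For completeness, here is one way the fragment above could sit inside a full program (a sketch only: the values of K, B and v are made up). With gcc it would be built with the -fopenmp flag.

    #include <stdio.h>
    #include <math.h>

    #define K 1024

    int main(void)
    {
        double B = 2.0;      /* amplitude (made-up value) */
        double v = 0.1;      /* frequency term (made-up value) */
        double y[K];
        int n;

        /* Loop iterations are shared out among the available cores;
           each core keeps its own private copy of n. */
        #pragma omp parallel for private(n) shared(B, y, v)
        for (n = 0; n < K; n++)
            y[n] = B * cos(v * n);

        printf("y[0] = %f, y[%d] = %f\n", y[0], K - 1, y[K - 1]);
        return 0;
    }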

A program tailored to multicore generally has two sections: one section performs the task, and the other holds target-architecture-specific information such as the number of cores. All such software assumes a model; for further details read [1]. Multicore hardware architectures can be classified into two broad categories, homogeneous (e.g. x86 cores) and heterogeneous (e.g. GPU), and software tools are built to suit one of the two architectures only.

Middleware

Gone are the days when only supercomputers had multi-processor systems. Today even embedded systems (e.g. smartphones) use multicore hardware. An embedded system is a combination of hardware and software: typically any change in hardware requires some changes in software, and likewise an upgrade in software may require changes in hardware. To overcome this problem, the concept of 'middleware' was introduced.

MIT Lincoln Laboratory developed a middleware named Parallel Vector Library (PVL) suited to real-time embedded signal and image processing systems. Likewise, VSIPL++ (Vector Signal and Image Processing Library) was developed and is maintained by the High Performance Embedded Computing Software Initiative (HPEC-SI). VSIPL++ is suited to homogeneous architectures; for heterogeneous architectures, PVTOL (Parallel Vector Tile Optimizing Library) is used. Here also programs are compiled and then mapped to multicore systems.

Source

1. Hahn Kim, and R. Bond. “Multicore software technologies” IEEE Signal Processing Magazine, Vol. 26 No.6, 2009 pp. 80-89. http://hdl.handle.net/1721.1/52617 (PDF,629 KB)

2. Greg Slabaugh, Richard Boyes, Xiaoyun Yang, "Multicore Image Processing with OpenMP", Appeared in IEEE Signal Processing Magazine, March 2010, pp134-138. But it can be downloaded from  http://www.soi.city.ac.uk/~sbbh653/publications/OpenMP_SPM.pdf (PDF, 1160 KB)

Tuesday, 30 October 2012

Multicore Processors and Image Processing - II (GPU)



       The evolution of the GPU had humble origins. In earlier days, multichip 3D rendering engines were developed and used as add-on graphics accelerator cards in personal computers. Slowly, all the functionality of the engine was fused into a single chip. Processing power steadily increased, and in 2001 these devices blossomed into the Graphics Processing Unit (GPU). NVIDIA, a pioneer in GPUs, introduced the GeForce 3 in 2001; then came the GeForce 7800, and in 2006 the GeForce 8800 was available in the market. Present-day GPUs are capable of performing 3D graphics operations like transforms, lighting, rasterization, texturing, depth testing and display.

        NVIDIA has introduced the Fermi GPU in its GPGPU fold. It consists of multiple streaming multiprocessors (SMs), supported by a cache, host interface, GigaThread scheduler and DRAM interfaces. Each SM contains 32 cores, each of which can execute one floating-point or integer operation per clock. Each SM is also supported by 16 load/store units, four special function units, a 32K-entry register file and 64 KB of on-chip RAM. The Fermi GPU adheres to the IEEE 754-2008 floating-point standard, which means it offers high-precision results, and it supports the fused multiply-add (FMA) feature. [2]


GPU Market

    Most technical blogs and websites cite Jon Peddie Research as the source of their data, and I have done the same. In this post two data sources [4, 5] are given and, as usual, their figures do not match each other. In ref. [5] the market share is based on discrete add-on boards alone; if graphics integrated with the CPU were counted, Intel would also have a share of the market.

The total number of units sold in the 2nd quarter (April to June) of 2012 was 14.76 million [5].

Company     Market share (%)
AMD              40.3
NVIDIA           39.3
MATROX            0.3
S3                0.1

Differences between conventional CPU and GPU
  • A typical CPU banks on speculative techniques such as caching and branch prediction. These speculative optimizations pay off for code with high data locality, an assumption that does not hold for all algorithms.
  • A CPU maximizes single-threaded performance by increasing the raw clock speed. This results in hotter transistors, more current leakage from the transistors and higher manufacturing cost.
  • The metric used for a conventional CPU is raw clock speed. If performance is measured with metrics like GFLOPS (giga floating-point operations per second) per dollar, or power usage in watts, the results are not impressive. For example, a Tesla GPU is eight times more powerful than an Intel Xeon processor in terms of GFLOPS, but they cost more or less the same.
  • In a CPU, most of the chip area is devoted to supporting speculative execution. The quad-core Core i7 processor, based on Intel's Nehalem microarchitecture, is fabricated in 45 nm technology; only about 9% of its chip area is occupied by integer and floating-point execution units, with the rest devoted to the DRAM controller, L3 cache and so on. In a GPU, by contrast, most of the chip area is devoted to execution units.
  • A GPU can never replace the CPU. Code that exhibits parallelism can be ported to the GPU and executed there efficiently.

Intel's multicore chips like the Atom, Core 2, Core i7 (Nehalem architecture) and Xeon W5590 (quad-core, also based on the Nehalem architecture) are all optimized for speculative execution. [2]

Programming GPU

General-purpose GPUs (GPGPUs) are graphics-optimized GPUs commissioned to perform non-graphics processing. One needs in-depth knowledge of the GPU hardware as well as software skill to run an algorithm on a GPGPU, a feat not possible for typical software programmers. To enable programmers to exploit the power of the GPU, NVIDIA developed the CUDA (Compute Unified Device Architecture) toolkit, which helps software developers focus on their algorithm rather than spending their valuable time mapping the algorithm to the hardware, thus improving productivity. CUDA is available for the C and Fortran programming languages. The next-generation CUDA architecture (code-named Fermi) supports languages like C, C++, Fortran, Java, MATLAB and Python. The CUDA toolkit is taught in more than 200 colleges throughout the world, and NVIDIA says it has sold more than 100 million CUDA-capable chips.

A GPU has several hundred cores; the NVIDIA Tesla C2070, for example, has 448 cores. An algorithm to be executed on a GPU is partitioned into 'host code' and 'device code'. The host code has one thread which persists throughout the execution of the algorithm. Wherever many operations can be performed in parallel, that portion is marked as device code by the programmer. When this region is executed, multiple threads are created (the technical term is forked) and the GPU cores execute the code chunk; after completion, the threads are destroyed automatically. In GPU literature, the function that runs on the device is called a kernel, and it is executed by many threads organized into thread blocks, warps (groups of 32 threads executed together) and grids (refer to page 19 of [2]).

NVCC is the CUDA C compiler developed by NVIDIA, and the Portland Group (PGI) has developed a CUDA Fortran compiler. GPU programming is not confined to the CUDA toolkit alone; software developers have come up with their own packages to handle GPUs. OpenCL is an open GPU programming standard developed by the Khronos Group, the same group that developed OpenGL. DirectCompute is a Microsoft product. The HMPP (Hybrid Multicore Parallel Programming) workbench was developed by the French company CAPS enterprise.

Image Processing Case Study

In a CT scan, a series of X-ray projections is taken around the human body, and 3D images are reconstructed from these two-dimensional X-ray images. Reconstruction is highly computation intensive, so GPUs are an obvious choice for the job. It is reported that NVIDIA's GeForce 8800 GPU processes 625 projections, each of size 1024 x 768, to produce a 512x512x340 reconstructed volume in 42 seconds; with a medical-grade GPU this can be reduced to 12 seconds. I presume 512x512x340 means 340 slices, each of 512x512 pixels. A medical-grade GPU should have 32-bit precision end to end to produce accurate results. [3]

 Source

[1] White paper on “Easy Start to GPU Programming” by Fujitsu Incorporation, http://globalsp.ts.fujitsu.com/dmsp/Publications/public/wp-Easy-Start-to-GPU-Programming.pdf, (PDF, 280KB).
[2]  A white paper on “NVIDIA’s Fermi: The First Complete GPU Computing Architecture”  by Peter N. Glaskowsky,  http://cs.nyu.edu/courses/spring12/CSCI-GA.3033-012/Fermi-The_First_Complete_GPU_Architecture.pdf, (PDF, 1589 KB).
[3] White Paper on “Current and next-generation GPUs for accelerating CT reconstruction: quality, performance, and tuning” http://www.barco.com/en/products-solutions/~/media/Downloads/White papers/2007/Current and next-generation GPUs for accelerating CT reconstruction quality performance and tuning.pdf . 122KB
[4] http://www.jonpeddie.com/publications/market_watch/
[5] http://www.techspot.com/news/49946-discrete-gpu-shipments-down-in-q2-amd-regains-market-share.html



Sunday, 30 September 2012

Multicore Processors and Image Processing - I


         Quad-core based desktops and laptops have become the order of the day. These multi-core processors, after long years of academic captivity, have come into the limelight. The study of multi-core and multi-processor systems comes under the field of High Performance Computing.
          When two or more processor cores are fabricated in a single package, it is called a multi-core processor. Multi-processor systems and multi-processing (a single processor running multiple applications simultaneously) are different from multi-core processors. Quad-core (four) and dual-core (two) processors are the ones commonly used by the general public. With the prevailing fabrication technology, the race to push raw clock speed beyond 3 GHz is nearing its end; further increases in computing power are possible only by deploying parallel computing concepts. This is the reason why all the manufacturers are introducing multi-core processors.

           Image processing algorithms exhibit a high degree of parallelism. Most algorithms have loops that iterate over a pixel, a row or an image region. Consider an example loop that has to iterate 200 times: a single processor iterates 200 times, while in a quad-core machine each core iterates only 50 times, so the quad-core obviously finishes the task much faster. To achieve this speed-up, programs have to be slightly modified and multicore-aware compilers are required.
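A minimal sketch of that 200-iteration example using OpenMP (see the 'Multicore Software Technologies' post above): the loop body here just counts how many iterations each of four threads handles; built with gcc's -fopenmp flag, the 200 iterations come out split roughly 50 per thread.

    #include <stdio.h>
    #include <omp.h>

    #define N 200

    int main(void)
    {
        int count[4] = {0, 0, 0, 0};   /* iterations handled by each thread */

        omp_set_num_threads(4);        /* pretend we are on a quad-core machine */

        /* The 200 iterations are divided among the 4 threads. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++) {
            /* ... the per-pixel work of the image algorithm would go here ... */
            count[omp_get_thread_num()]++;   /* each thread updates only its own slot */
        }

        for (int t = 0; t < 4; t++)
            printf("thread %d processed %d iterations\n", t, count[t]);
        return 0;
    }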


Image size      Time on single-core (ms)    Time on multi-core (ms)
256x256                    18                          8
512x512                    65                         21
1024x1024                 260                         75

       The time needed to apply a twist to the Lenna image is given in the table above: at 1024x1024 the operation takes 260 ms on a single core but only 75 ms on a multi-core processor. In both cases the execution time grows roughly in proportion to the image size, but the multi-core processor is consistently about three times faster [1].

        Algorithms exhibit fine-grain, medium-grain or coarse-grain parallelism. Smoothing, sharpening, filtering and convolution operate on the entire image and are classified as fine-grain. Medium-grain parallelism is exhibited by the Hough transform, motion analysis and other functions that operate on part of an image. Position estimation and object recognition come under the coarse-grain class, where the parallelism exhibited is very limited [2]. Algorithms can also be split into memory-bound and CPU-bound ones; for CPU-bound algorithms, the number of cores helps to achieve near-linear speed-up, subject to Amdahl's law [3].
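Amdahl's law puts a ceiling on that speed-up: if a fraction p of an algorithm can be parallelized across N cores, the overall speed-up is 1 / ((1 - p) + p/N). As an illustrative example, with p = 0.9 and N = 4 the speed-up is 1 / (0.1 + 0.225) ≈ 3.1 rather than 4, and even with an unlimited number of cores it can never exceed 1 / (1 - p) = 10.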

       Multicore processors developed by Intel and AMD are very popular. The contents of the following table are taken from [3].
Multicore Processor       No. of cores    Clock speed
Intel Xeon E5649               12            2.53 GHz
AMD Opteron 6220               16            3.00 GHz
AMD Phenom X4 (9550)            4            2.20 GHz

        The Intel Xeon was launched in 2002. Intel introduced simultaneous multithreading capability and named it Hyper-Threading. It has also launched the Core 2 Duo T7250, a low-voltage mobile processor running at 2 GHz. Sony's PlayStation 3 has a very powerful multi-core processor called the Cell Broadband Engine, developed jointly by Sony, Toshiba and IBM. It has a 64-bit PowerPC processor connected by a circular bus to eight RISC (Reduced Instruction Set Computing) co-processors with 128-bit SIMD (Single Instruction, Multiple Data) units. SIMD is well suited to exploiting fine-grain parallelism.


In part two of this series, the GPU and the OpenMP Application Program Interface (API) will be discussed.


Source

[1] Greg Slabaugh, Richard Boyes, Xiaoyun Yang, "Multicore Image Processing with OpenMP", Appeared in IEEE Signal Processing Magazine, March 2010, pp134-138. But it can be downloaded from  http://www.soi.city.ac.uk/~sbbh653/publications/OpenMP_SPM.pdf (PDF, 1160KB )
[2] Trupti Patil, "Evaluation of Multi-core Architecture for Image Processing,"  MS Thesis,  year 2009, at Graduate School of Clemson University (PDF, 1156KB) www.ces.clemson.edu/~stb/students/trupti_thesis.pdf

[3] OpenMP in GraphicsMagick,  http://www.graphicsmagick.org/OpenMP.html


Note
  • Please read [1]; it is insightful and the language used is simpler than in standard technical articles.
  • The warped Lenna image and the above table were adapted from [1], not copied. GIMP was used to create the warp effect on the Lenna image.

Saturday, 22 September 2012

RoboRetina


        The RoboRetina™ image sensor is capable of producing satisfactory images even under non-uniform illumination, whereas existing cameras require uniform illumination to produce satisfactory results. Photographers vary the camera's shutter speed to capture brightly illuminated or poorly lit scenes: the amount of light falling on the image sensor (in the old days, film) is proportional to the time the shutter stays open, so sun-lit scenes need a short shutter opening. Under natural light, both bright regions and poorly lit shadow regions are present simultaneously, but a camera can be made to capture either the bright region or the dark region, not both. Our eyes adjust to natural, non-uniform light with such ease that the feat goes almost unnoticed, until an article like this one points it out.



A surveillance camera system that monitors an airport can fail to detect persons lurking in the shadows because of non-uniform illumination. Intrigue Technologies Incorporated, headquartered in Pittsburgh, Pennsylvania, USA, has come out with the RoboRetina™ image sensor, which tries to mimic the human eye. They have developed a prototype with a resolution of 320x240 that is capable of seeing things in the shadows, built using a standard CMOS fabrication process. The 320x240 resolution is sufficient for a robot fitted with RoboRetina to navigate in cloudy weather. The brightness adaptation is carried out without traditional number crunching, which may amaze us all, conditioned as we are to think only in terms of digital processing. An array of photoreceptors is called an image sensor; in the prototype, each photoreceptor is paired with an analog circuit that is stimulated by the incident light and controls the functioning of the photoreceptor.

Silicon-based integrated chips that try to mimic the working of the eye are called neuromorphic vision sensors. The term 'neuromorphic engineering' was coined by Carver Mead in the mid-1980s, when he was working at the California Institute of Technology in Pasadena, USA. Analog-circuit-based vision chips were also developed by the University of Pennsylvania in Philadelphia, USA, and Johns Hopkins University in Baltimore, USA. In those chips, the analog circuit varies the sensitivity of a detector depending on the light falling on it. RoboRetina extends this concept: the light falling on the surrounding detectors also plays a vital role in adjusting a photodetector's sensitivity. The success of RoboRetina depends on accurate estimation of the illumination field.

As early as 2005, Intrigue's technology was available as an Adobe Photoshop plug-in, aptly named 'Shadow Illuminator'. Medical X-ray images were taken as input and the software was able to reveal unclear portions of the image. Photographers use this software to correct their photos, which is technically called 'enhancement'. Software that does not use the RoboRetina technique produces a 'halo' effect at sharp discontinuities.

        The CEO of Intrigue, Vladimir Brajovic, is an alumnus of The Robotics Institute at Carnegie Mellon University, Pittsburgh, USA. RoboRetina received the Frost & Sullivan Technology Innovation Award for the year 2006. Frost & Sullivan [4] is a growth consulting company with more than 1000 clients all over the world, so this award is a feather in the cap for Intrigue Technologies. RoboRetina is the first breakthrough since the emergence of the neuromorphic sensor concept. Let us hope it leads to an Autonomous Vision System revolution that will greatly enhance the performance of automotive systems, surveillance systems and unmanned systems.


Source: 

[1] Intrigue Technologies, The Vision Sensor Company, Press Release, http://www.intriguetek.com/PR_020407.htm
[2] Robotic Vision gets Sharper by Prachi Patel Predd - IEEE Spectrum March 2005  http://spectrum.ieee.org/biomedical/imaging/robotic-vision-gets-sharper
[3] Photo Courtesy: https://www.intrigueplugins.com/moreInfo.php?pID=CBB
[4] Frost & Sullivan, http://www.frost.com

Thanks:
       I want to personally thank Mr. B. Shrinath for emailing me the 'RoboRetina' article that was published in Spectrum Online.

Wednesday, 12 September 2012

Biscuit Inspection Systems

         We all know that if a company wants to stay in business, it has to manufacture quality products. Visual inspection of products is a well-known method of checking quality. In earlier days, trained human beings did the inspection; nowadays machine vision systems are employed, for many reasons. A machine can work day and night without a sign of fatigue, it can surpass human inspection speed, and it can use a wide-dynamic-range camera that helps to differentiate even a small change in colour. With human inspection it is not cost effective to check every manufactured product: a few samples are taken from a batch, quality testing is carried out on the samples, and statistical methods are used to estimate the number of failed products from the failed samples. With online machine inspection, every product can be checked individually [1].

Consumers expect high-quality biscuits to have consistent size, shape, colour and flavour. Size and shape improve the aesthetics of the biscuit, while colour and flavour affect its taste. An electronic nose can be used to detect flavour; its use in the tea processing industry has been reported in scientific papers, and articles on the use of electronic noses in biscuit manufacturing can be found on the Internet. It is common knowledge that an over-baked biscuit will be dark brown in colour while an under-baked one will be light brown; the relationship is technically called the 'baking curve'. Image processing techniques are used to find the shades of biscuit colour, and classification is carried out by artificial neural networks. This method was developed way back in 1995 by Leonard G. C. Hamey [8]. In a typical cream-sandwich biscuit, the top and bottom layers are biscuits and the middle layer is a filling such as cream or chocolate. The filling costs more than the biscuit, so over-filling means less profit for the company, and much care is taken to maintain the correct biscuit size and filling height.


In a typical production line, 30 rows of baked biscuits (120 biscuits per row) pass on a conveyor every minute, and all of them have to be inspected. This amounts to checking 3600 biscuits per minute. Length, width and thickness are measured with an accuracy of ±0.17 mm, and in addition checks for cracks and splits are carried out. If a biscuit fails to qualify, it is discarded.

In a typical biscuit inspection system, three cameras grab images of the moving biscuits, which are illuminated by special fluorescent lights, and the grabbed images are processed to obtain size and shape. A fourth camera, placed at a 45-degree angle, captures the laser light falling on the biscuits; when multiple laser-line images are combined, they give the 3D shape of the biscuit [2, 4]. For sample inspection pictures, go to [7] and download the PDF file. The cameras are required to operate at an ambient temperature of 45 degrees centigrade. The captured images are transferred via GigE to the inspection room, which is 100 m away from the baking system. Special software displays the captured images on the computer screen with the necessary controls, and the images are stored for four years.

                                   
List of Vision system manufacturers
o Machine Vision Technology in United Kingdom [2] ,
o Hamey Vision Private Limited in Australia [4]
o Q-Bake from EyePro systems [6]

In India, way back in 2002, CEERI (Central Electronics Engineering Research Institute), located in the CSIR Madras Complex, developed a biscuit inspection system with a budget of Rs. 20.7 lakh (1 lakh = 100,000, and Rs. 50 is approximately 1 US$). It was funded by the Department of Science and Technology, Government of India, and partnered with Britannia Industries, Chennai, to gather the requirements [5].

Source
3. Biscuit Bake Colour Inspection System - Food Colour Inspection, http://www.hameyvision.com.au/biscuit-colour-inspection.html
4. Simac Masic,  http://www.simac.com 
5. CMC News, July – December, 2002, http://www.csirmadrascomplex.gov.in/jd02.pdf 
6. Q-Bake, Inspection Machine for Baked Goods, http://www.eyeprosystem.com/q-bake/index.html 
8. Pre-processing Colour Images with a Self-Organising Map:Baking Curve Identification and Bake Image Segmentation,  http://www.hameyvision.com.au/hamey-icpr98-som-baking.pdf

Courtesy
I want to thank Dr. A. Gopal, Senior Principal Scientist, CEERI, CSIR Madras Complex, Chennai, who gave a lecture on biscuit inspection systems at the national-level workshop on “Embedded Processors and Applications” held at SRM University, Chennai, on 31-Aug-2012. He inspired me to write this article.






Thursday, 30 August 2012

Mirasol Display


Qualcomm engineers have developed the MEMS (Micro Electro Mechanical System) based Mirasol display. It mimics the way butterfly wings and peacock feathers produce brilliant, iridescent, shimmering colours. Light reflected from a surface, for instance paper, is more appealing to the human eye than a backlit display [1]. Because it works by reflection, the mirasol display's readability does not diminish even in direct sunlight. A report from Pike Research [2] states that mirasol is more energy efficient than other display technologies, and mirasol displays are capable of showing video. The only hiccup is cost; let us hope it will come down in future. I was introduced to the mirasol display by an article published in MIT's Technology Review magazine [7].

Microprocessors and memory chips diligently follow Moore's law, so we get phenomenal capacity improvements in a short time; in the case of display systems, improvement moves at a snail's pace [4]. Mature technologies like Liquid Crystal Displays (LCD) and Light Emitting Diode (LED) displays are lit from behind. The display market is dominated by backlit (60%) and transflective (40%) LCDs; a combination of backlit and reflective technology is called transflective. The remaining displays use OLEDs and constitute about 5 percent of the entire display market [5]. All the technologies discussed above consume more energy than reflective types. E Ink's Triton is another reflective display technology that rivals mirasol [3], with good colour capability; earlier e-ink based e-readers like Kindles and Nooks were limited to black and white only.


The Interferometric Modulator (IMOD) is the building block of the mirasol display. An IMOD is made up of a thin film on top and a height-adjustable (deformable) reflective membrane, supported by a transparent substrate. Incident light is reflected from the thin film as well as from the reflective membrane. Depending on the gap height (the distance between the thin film and the reflective membrane), constructive or destructive interference occurs, so some colours are reinforced while others are diminished. If red light undergoes constructive interference, for example, that spot appears red. The arrangement can be thought of as an optical resonator. An IMOD can take only two states: the height can be switched between a minimum and a maximum level by applying a voltage between the reflective layers. When all the RGB subpixels are in the minimum position, only ultraviolet light is reflected and the other colours are lost to interference; as humans cannot perceive ultraviolet, the pixel appears black. The deformation required is in the range of a few hundred nanometres and the switching time is in the range of microseconds, which is why displaying video is perfectly possible.
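As a rough way to see why the gap height selects a colour (a simplified picture that ignores the phase shifts introduced at the two reflecting layers): at near-normal incidence the two reflections reinforce each other when twice the gap height equals a whole number of wavelengths, 2d ≈ m·λ. On that simplification, a gap of about 320 nm would favour red light near 640 nm (m = 1), while smaller gaps would favour green or blue; the actual IMOD design is of course more involved.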

A typical mirasol display is 5.3 inches (measured along the diagonal of the screen) with 800x480 resolution at 223 pixels per inch. Screens of the same size but with XGA resolution are also available in the market, but the cost is not pocket friendly. The following products use mirasol displays: the Kyobo e-reader, Hanvon C18, Bambook Sunflower, Koobe Jin Yong reader, and Bichrome displays. One blog mentioned that the Kyobo e-reader has stopped using mirasol; one has to check its veracity.

The environmental friendliness of a product is not measured by its power consumption during use alone: the energy used over the entire lifecycle, from mining the ore for minerals through manufacturing, assembly, packaging and shipping to final disposal, is monitored and noted. In both usage and lifecycle analysis, IMOD-based mirasol displays outperform conventional LCDs and LEDs [5]. It is estimated that in 2008 there were four billion mobile devices with LCD and OLED displays; if all of them switched to IMOD displays, 2.4 terawatt-hours of energy could be conserved per year. It is also noted that in an LCD only about 10% of the light generated reaches the human eye, the rest being absorbed by components within the system itself.

Source:


  1.  Mirasol display, http://www.mirasoldisplays.com
  2.  Pike Research, http://www.pikeresearch.com 
  3. Qualcomm's Mirasol Display Could Mean New Color Nooks and Kindles, by Sascha Segan, http://www.pcmag.com/article2/0,2817,2400889,00.asp
  4. Mirasol Display Technology Could Be the Screens of the Future, http://www.tomshardware.com/news/mirasol-mems-e-ink-display-screens,14867.html
  5.  Energy Efficient Displays for Mobile Devices, Published 4Q 2009, (Pike Research - Energy Efficient Displays_Final.pdf) http://www.mirasoldisplays.com/sites/default/files/resources/doc/Pike%20Research%20-%20Energy%20Efficient%20Displays_Final.pdf
  6. (Picture Courtesy) Qualcomm Mirasol display for color e-readers inspired by butterflies http://www.robaid.com/bionics/qualcomm-mirasol-display-for-color-e-readers-inspired-by-butterflies.htm 
  7. MIT Technology Review magazine, http://www.technologyreview.com/magazine/


Saturday, 25 August 2012

GigE Vision

       GigE Vision is a camera standard for real-time machine vision. The Automated Imaging Association (AIA) developed this standard, and it was released in May 2006. Within a span of four years the number of units shipped was comparable to those of the rival FireWire and Camera Link standards (Camera Link is also from the AIA, and FireWire is Apple's version of the IEEE 1394 standard). After the inception of GigE Vision, revisions 1.1 and 1.2 were released, and in 2011 GigE Vision 2.0 was released. Version 2.0 supports 10 GigE, the IEEE 1588 Precision Time Protocol, and the JPEG, JPEG 2000 and H.264 image compression standards.

GigE Vision Merits

  • It supports the common camera control interface GenICam, developed by the European Machine Vision Association (EMVA).
  • It offers plug & play operation, a high data transfer rate and low-cost cabling, all of which help system integrators a lot.
  • A wide range of cameras is available for various applications.
  • The supported cable length is around 100 m, a feat not possible with other standards like FireWire, USB 3.0, Camera Link and CoaXPress.


Camera capture system
          Real-time applications do not necessarily need ultra-fast acquisition, but images should be acquired and processed within the stipulated time. The reliability of a real-time system depends on parameters like jitter and latency. Latency is normally understood as a time delay; here it means the time taken to complete a task from start to finish. Jitter gives the variation in that time when the same task is repeated multiple times.

         A camera capture system consists of a PC with a Network Interface Card (NIC), a camera and an Ethernet link connecting the two. A hardware or software trigger can make the camera capture an image; as expected, a hardware trigger has lower latency. The camera head processes the trigger and starts the sensor, which accumulates the incoming light and converts it into electrical charge. The accumulated charge is converted into digital form and placed in the camera's buffer memory, a process called 'readout'. The entire buffer content is then transferred to the PC by breaking it into small chunks and adding an Ethernet header to each chunk. The NIC receives each packet and raises an interrupt to the CPU; if the CPU is not busy, it processes the packet and puts the chunk into the computer's memory. The latency is measured from the start of the trigger to the reception of the last packet of the image.

GigE Standard
          A single GigE camera can be connected to the PC via a direct Ethernet link, or multiple GigE cameras can be connected to the PC through an Ethernet switch. Avoid using a hub for multiple cameras.

          A dedicated wire or electronic signal connected directly to an input pin of the camera can act as a hardware trigger. To avoid false starts, a trigger debouncing method is incorporated; the price paid for this safety is about one microsecond of latency. Application software can also send a trigger via the camera configuration channel, but it is less responsive than the camera pin. If a software trigger comes from an application running on a non-real-time operating system (e.g. Microsoft Windows), the jitter may vary from a few hundred microseconds to a few milliseconds, so it is better to avoid software trigger mode. There are three types of exposure, namely free-running mode, horizontal synchronous mode and reset mode, and the jitter varies from one frame to one pixel depending on the type of exposure. The latency of the camera depends on the exposure time and the sensor readout time, with readout being the biggest contributor: a 60 frame-per-second camera takes about 16 ms to do the readout.

           The normal size of an Ethernet frame (a packet at the physical layer is called a frame) is 1500 bytes; jumbo frames with sizes of 9000 to 16000 bytes are also available. The chunk carried inside a frame is called the payload. A GVSP (GigE Vision Stream Protocol) header, a UDP (User Datagram Protocol) header, an IP header and finally an Ethernet header are added to the payload, and an appended four-byte Cyclic Redundancy Check (CRC) helps to detect any errors that crept in while the packet was in transit. An 8000-byte packet takes about 16.3 microseconds to be transferred over the network.
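As a rough illustration of the packet counts involved (assuming an 8-bit monochrome image from the 1400 x 1024 camera mentioned at the end of this post): one frame is about 1.4 MB, which needs roughly 1000 standard 1500-byte frames but only about 180 jumbo frames carrying 8000-byte payloads, so there are far fewer packets, and therefore far fewer interrupts, for the receiving PC to service.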

            Transfer of data from the NIC to memory without involving the CPU can be accomplished using a frame grabber, which contains a powerful Direct Memory Access (DMA) engine that helps to reduce latency and jitter to a minimum. Fortunately or unfortunately, the GigE standard does not include a frame grabber; the GigE software driver takes over the role of the frame grabber, so the choice of GigE software driver plays a vital role in performance.

 Performance Improvement Tips
  • A few network adapters allow 'interrupt moderation': instead of raising an interrupt for every packet that arrives, the adapter waits for a certain number of packets and then raises a single interrupt. This helps to reduce CPU load.
  • 9000-byte jumbo packets are best even though some networks support 16000-byte jumbo packets, because the CRC becomes less reliable for frames larger than about 9000 bytes.
  • Increase the receive buffer size as much as possible; this in turn reduces CPU usage.


A typical GigE camera has physical dimensions of about 5 cm x 3 cm x 7 cm, offers a 1400 x 1024 image resolution at up to 75 frames per second (fps), and has an image exposure duration of 100 microseconds. The data can be transported over 100 m using CAT-5e or CAT-6 cables. Such a camera weighs around 120 grams. Monochrome, colour and high-speed cameras are available.

Source:

Monday, 13 August 2012

Digital Visual Interface


         Digital video generated by a computer is converted into analog signals (red, green and blue video signals) by the video graphics card and fed to a CRT monitor. Since present-day plasma and LCD flat panels are digital in nature, the generated analog signals are converted back into digital form before being fed to the display device. This method is inefficient for two reasons. First, the digital-to-analog and analog-to-digital conversions cause a loss of image quality. Second, a digital interface can make the entire conversion process, as well as the associated hardware, redundant. A low-cost, universally accepted and versatile digital interface evolved, called the Digital Visual Interface (DVI); it was later extended for high-end devices as the High-Definition Multimedia Interface (HDMI).


Before getting into the details of DVI technology we have to learn about the need for the technology.

Resolution Name                         Pixel Resolution
Video Graphics Array (VGA)             640 x 480
WVGA                                              854 x 480
Super VGA (SVGA)                         800 x 600
Extended Graphics Array (XGA)     1024 x 768
WXGA                                           1280 x 768
Super XGA (SXGA)                      1280 x 1024
WSXGA                                        1600 x 1024
Ultra XGA                                     1600 x 1200
High Definition TV (HDTV)           1920 x 1080
Quad XGA (QXGA)                     2048 x 1536

Table 1. Resolution Name and Pixel Resolution (Ref. [1], [3])

Resolution names and other details are specified by the Video Electronics Standards Association (VESA). The common monitor refresh rates are 60 Hz, 75 Hz and 85 Hz; a higher refresh rate is always better. Now we will calculate the amount of data the digital interface has to carry from the computer to the display device.

Data rate = number of horizontal pixels x number of vertical pixels x refresh rate x blanking overhead

A monitor with SXGA resolution and an 85 Hz refresh rate generates about 55 megapixels (Mp) of data per second for one colour, or about 155 Mp per second for three colours. This amounts to a whopping 1.6 Gbit/s data rate (155 Mp per second, with each pixel represented by 10 bits; yes, 10 bits). Beyond about two Gbit/s it is not possible to send the data through twisted pairs, a limit called the 'copper barrier'. The data generated by a QXGA monitor at an 85 Hz refresh rate is about 350 Mp per second; the required bit rate exceeds the copper barrier, so two links are used instead of one. Coaxial cable and waveguide are other transmission media that can handle a two Gbit/s data rate with ease, but they are expensive compared to twisted pair. The DVI 1.0 specification does not mention the term 'twisted pair' explicitly; the term is used in the reference material [1].




                   
In April 1999 the DVI 1.0 specification was released by the Digital Display Working Group (DDWG), whose promoters are Intel, Compaq, Fujitsu, HP, IBM, NEC and Silicon Image. The Transition Minimized Differential Signaling (TMDS) technology used in DVI was developed by Silicon Image Inc., and the connectors were developed by Molex Inc. The first digital display standard, Plug and Display, was developed by the Video Electronics Standards Association (VESA); a few years later, the Digital Flat Panel interface was developed by a consortium of Compaq and its associates. For various reasons neither standard was very successful. DVI is backward compatible with analog VGA, Plug and Display, and Digital Flat Panel.


DVI has two types of connectors, namely DVI-Integrated (DVI-I) and DVI-Digital (DVI-D). The 29-pin DVI-I connector allots five pins to analog video and 24 pins to two digital video links. The analog video pins are red, green, blue, horizontal sync and analog ground. The digital video pins can be grouped into data channels and control signals: there are six pairs of data channels to carry two sets of R', G', B' colour signals (the difference between RGB and R'G'B' will be discussed in an upcoming blog post), and the remaining 12 pins carry clock signals and other functions. The 24-pin DVI-D connector is designed to carry digital video only.

TMDS is the electrical technology used to transmit data from the computer to the display device. Twisted pairs are susceptible to noise and electromagnetic interference (EMI). In differential signaling, ones and zeros are encoded not in absolute terms but in relative terms, which makes them immune to noise. A sharp spike in one twisted pair can create EMI in an adjacent twisted pair, so it becomes necessary to reduce steep transitions in the signals; this is done at the cost of a 25 percent larger bit representation (10 bits instead of 8). Before TMDS, Low Voltage Differential Signaling (LVDS) was used in digital interface standards. LVDS was developed by National Semiconductor to transfer data between a notebook computer's CPU and its LCD display, and it is optimized for short cable lengths, not long ones.

References
  1. “White paper on DVI”, by Infocus Incorporation, available Online from http://electro.gringo.cz/DVI-WhitePaper.pdf
  2.  DVI specification from DDWG,  available Online from http://www.ddwg.org/lib/dvi_10.pdf
  3.  Keith Jack, “Video Demystified: A handbook for the digital engineer”, 5th edition, Publishers:- Newnes , 2007. ISBN: 978-0-7506-8395-1, Indian reprint 978-81-909-3566-1. Rs. 800.
  4.  Pin diagrams of DVI, available Online from http://www.te.com/catalog/Presentations/dvipresentation.pdf

Tuesday, 24 July 2012

Super Hi-Vision

         The British Broadcasting Corporation will test-broadcast London 2012 Olympics footage in the latest Super Hi-Vision (SHV) television format. This format has 16 times the resolution of the existing High Definition TV format, together with 22.2 multichannel surround sound. It provides amazing picture quality, and viewers feel a strong sense of reality. One can watch the Olympics in Super Hi-Vision theatres at BBC Broadcasting House in London, BBC Pacific Quay in Glasgow and the National Media Museum in Bradford. SHV was developed by NHK (Nippon Hōsō Kyōkai), the Japan Broadcasting Corporation.


This ultra-high-definition television format has over 4000 scanning lines and contains 7680x4320 pixels per frame. The SHV camera uses 8-megapixel CCDs with four channels, Green1, Green2, Red and Blue, instead of the traditional Red, Green and Blue channels, so around 32 megapixels of data are generated for every frame. It uses the MPEG-2 video compression format with 4:2:2 sampling, and the AVC/H.264 codec has been reconfigured to transport SHV signals to mobile devices. SHV offers a viewing angle of 40 to 50 degrees, which gives the sense of realness. It uses 60 frames per second instead of the conventional 25 frames of PAL and 30 frames of the American NTSC colour system.

It needs 24 speakers to create a 3D spatial impression for the viewer, which helps to augment the sense of reality: nine speakers in the upper layer, ten in the middle layer, three in the lower layer, and two for low-frequency effects. The audio uses a 48 kHz sampling rate with 24-bit Pulse Code Modulation, which results in about 28 Mbit/s, or about 7 Mbit/s with Dolby-E systems.

Researchers at NHK used psychological methods to figure out the relationship between the sense of reality and the viewing angle. They found that the wider the viewing angle, the stronger the feeling of reality: viewing angles below 40 degrees fail to provide any feeling of reality, from 40 to 80 degrees the sense of reality increases with the viewing angle, and beyond 80 degrees it does not increase dramatically.

In 1953 televisions had a screen size of 12 inches. Fourteen-inch colour TVs emerged after 1960, the size grew to 20 inches in 1975, and it reached 29 inches by 1990. Beyond this size a conventional Cathode Ray Tube (CRT) cannot be manufactured, so 50-inch LCD and LED flat screens were developed in 2006. Viewing distance and resolution have a direct relationship: the optimal viewing distance of a conventional TV is four to six times its diagonal, so a TV with a 21-inch diagonal needs a viewing distance of about seven feet. Thus the size of the room limits the size of the TV screen, and it becomes necessary to increase the resolution to have bigger screens at home.

An uncompressed SHV signal requires around 50 Gbit/s; with digital compression the bit-rate requirement is between 200 Mbit/s and 400 Mbit/s. The present 12 GHz systems can handle up to 52 Mbit/s, so broadcasters are moving to the 21 GHz frequency range (21.4 GHz to 22.0 GHz). At this frequency, however, rain acts as a spoilsport, and non-real-time broadcasting is used to combat rain attenuation effects.
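The 50 Gbit/s figure is easy to sanity-check: assuming, for illustration, three samples per pixel at 8 bits each, 7680 x 4320 pixels x 60 frames/s x 24 bits comes to roughly 48 Gbit/s, and a higher bit depth with 4:2:2 sampling lands in the same neighbourhood.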

 The Kyushu National Museum's SHV theatre was inaugurated in October 2005, the first time an SHV system was deployed for public use.

SOURCE:
  • Steps Towards the Practical use of Super Hi-vision by M. Maeda et al from NHK Science and technical Research laboratories.
  • "Super Hi-Vision -  research on future ultra HDTV system", article by Masayuki Sugawara, NHK, EBU Technical Reveiw - 2008 Q2
  • S.Sakaida, N. Nakajima, A. Ichigaya, and M. Kurozumi, "The Super Hi-Vision Codec," Proceedings of ICIP 2007, pp. 21-24.
  • Transmission Techniques for Broadcast Satellites in the 21-GHz Band aiming for "Super Hi-Vision" Broadcasting, Broadcast Technology No.24, Autumn 2005 pp 8--13
  • http://www.bbc.co.uk/blogs/bbcinternet/2012/07/super_hi_vision_ultra_hd.html

Saturday, 14 July 2012

Embedded Vision Systems

       The Microsoft Kinect is used as a novel input interface for the Xbox 360 game console. Kinect is a perfect example of an embedded vision system: eight million units were sold within just two months of launch, which shows the power of embedded vision.
     Embedded vision can be defined as a microcontroller-based system that incorporates a vision sensor (e.g. a camera) and is able to understand its environment through that sensor. A digital camera is a microcontroller-based system that contains a vision sensor, and its outcome is pictures; but a camera is incapable of interpreting the pictures it takes, so a digital camera is NOT an embedded vision system. A System on Chip (SoC), Graphics Processing Unit (GPU), Digital Signal Processor (DSP) or Field Programmable Gate Array (FPGA) can be used in place of the microcontroller; a general-purpose personal computer is strictly a no-no. Smartphones, tablet computers and surveillance systems can be upgraded to embedded vision systems.
Applications:
  • To find a child that is struggling in a swimming pool.
  • To find intruder(s).
  • To detect whether a lane change has occurred and, if so, warn the driver of the automobile.
An embedded vision system carries out the following three functions.
1. Image acquisition and optimization
  • Noise reduction, image stabilization and colour space correction.
  • The outcome of the optimization stage need not be aesthetically pleasing pictures, but they should be easy for the later stages to process.
2. Building objects from pixels
  • First-level operations used are image filtering, Haar filters, edge detection, histograms, optical flow, erosion and dilation, and thresholding (a small sketch of two of these is given after this list).
  • Second-level operations used are connected-component labelling, contour tracing, clustering and the Hough transform.
3. Object analysis and interpretation
  • Object movement tracking, classification and obstacle detection.
  • Kalman filters as predictive filters, hidden Markov models, correlation, finite-state models and neural networks. All the above operations are computation intensive, and DSP algorithms are used extensively.
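To give a flavour of the first-level operations listed above, here is a minimal C sketch of thresholding followed by a 3x3 erosion on a tiny made-up image; production code would work on full camera frames and would normally use an optimized library.

    #include <stdio.h>

    #define W 8
    #define H 8

    int main(void)
    {
        /* Made-up 8x8 grayscale image: a bright 4x4 blob on a dark background. */
        unsigned char img[H][W] = {{0}};
        unsigned char bin[H][W], eroded[H][W];

        for (int y = 2; y < 6; y++)
            for (int x = 2; x < 6; x++)
                img[y][x] = 200;

        /* Thresholding: a pixel becomes 1 if it is brighter than 128, else 0. */
        for (int y = 0; y < H; y++)
            for (int x = 0; x < W; x++)
                bin[y][x] = (img[y][x] > 128) ? 1 : 0;

        /* 3x3 erosion: a pixel stays 1 only if its whole 3x3 neighbourhood is 1.
           Border pixels are simply set to 0 in this sketch. */
        for (int y = 0; y < H; y++) {
            for (int x = 0; x < W; x++) {
                eroded[y][x] = 0;
                if (y == 0 || y == H - 1 || x == 0 || x == W - 1)
                    continue;
                int keep = 1;
                for (int dy = -1; dy <= 1 && keep; dy++)
                    for (int dx = -1; dx <= 1; dx++)
                        if (!bin[y + dy][x + dx]) { keep = 0; break; }
                eroded[y][x] = (unsigned char)keep;
            }
        }

        /* Print the result: the 4x4 blob should shrink to 2x2. */
        for (int y = 0; y < H; y++) {
            for (int x = 0; x < W; x++)
                printf("%d", eroded[y][x]);
            printf("\n");
        }
        return 0;
    }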
     The Embedded Vision Alliance is an organisation that looks into every aspect of embedded vision. Its website address is http://www.embeddedvision.com/. On the website, go to 'Industry Analysis' and then to 'Market Analysis'; this section seems very informative, and the 'News' section gives a reasonable amount of information. The 'Technical Articles' section needs registration. Most of the website content points to where information is available rather than providing it directly. There is no advertisement section, the website has a professional look, and it is worth visiting.

     The following reference shows how much importance the IEEE gives to embedded vision technology.
    A note on DSP
    Linear filtering is a convolution operation. After the advent of the Fast Fourier Transform (FFT), it became desirable to transform the signal into the frequency domain, multiply it by the desired frequency response (whose time-domain counterpart is the impulse response) and transform the result back into the time domain. For an image, the signal is transformed to the spatial-frequency domain and, after multiplication, the resulting image is converted back to the spatial domain. The FFT was proposed by Cooley and Tukey in 1965, but the original presentation was very mathematical; in 1967 Tom Stockham and Charlie Rader gave a flow-graph representation of the FFT, which I think is called the 'butterfly diagram' nowadays.
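As a reminder of what that means in code, here is a direct time-domain convolution of a made-up signal with a three-tap moving-average filter; for long filters the same result is usually obtained faster by going through the FFT, multiplying in the frequency domain and transforming back, exactly as described above.

    #include <stdio.h>

    #define N 16    /* signal length (made-up) */
    #define M  3    /* filter length           */

    int main(void)
    {
        double x[N];
        double h[M] = {1.0 / 3, 1.0 / 3, 1.0 / 3};  /* moving-average impulse response */
        double y[N + M - 1] = {0};

        /* Made-up input: a step signal. */
        for (int n = 0; n < N; n++)
            x[n] = (n < N / 2) ? 0.0 : 1.0;

        /* Direct (time-domain) linear convolution: y = x * h. */
        for (int n = 0; n < N + M - 1; n++)
            for (int k = 0; k < M; k++)
                if (n - k >= 0 && n - k < N)
                    y[n] += h[k] * x[n - k];

        for (int n = 0; n < N + M - 1; n++)
            printf("y[%2d] = %.3f\n", n, y[n]);
        return 0;
    }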
    Courtesy:
    • Eye Robot: embedded vision the next big thing in DSP, by Brian Dipert and Amit  Shoham,  IEEE  Solid State Circuits Magazine,  Vol. 4, No. 2, Spring 2012. [doi : 10.1109/MSSC.2012.2193077]
    • A note on DSP from the above magazine issue page number 36
    • Special thanks to Mr. B. Srinath