Sundar Pichai, Chief Executive Officer (CEO) of Google, introduced a new technology called Google Lens at the Google I/O developer conference, held from 17 to 19 May 2017 in Mountain View, California, United States of America. Google I/O is an annual event. It showcases in-depth technical sessions focused on building web, mobile, and enterprise applications with Google products and open-source platforms such as Android, Chrome, and Chrome OS [1].
During the demo of Google Lens, Mr. Sundar Pichai captured a photo of a flower with a Lens-enabled smartphone [2]. Lens was able to identify the flower. Armed with a Lens-enabled smartphone, an English speaker can translate a Japanese billboard and understand its content. Take a snapshot of a business's storefront, and Google Lens will give the associated information about the store. In essence, Google Lens can understand a picture and act accordingly. Thus Google has provided a new way of interacting with the mobile device. Google is planning to integrate this technology into Google Assistant. It is expected to be released as an app in upcoming smartphones.
Fig. 1. Snapshot of a Google Lens enabled smartphone. Courtesy [2]
The Google Lens technology grabbed my attention, and I searched for more technical information. But to my surprise, the content in print and digital media reads like a transcript of Sundar Pichai's speech. His speech was not intended to educate the general public but to instill interest in the Lens technology. I strongly feel the media has to make the content palatable to the general public rather than just disseminating it. Deconstruction of technology is very important; otherwise the public will treat technology as magic. Then they may equate technologists with mythical heroes. That will not help the scientific spirit (rationality, asking questions like why, how, and what) to flourish. The original intention of this post was to deconstruct the Lens technology. With limited resources (time, knowledge, and presentation skills), that seems near impossible. So I decided instead to trace the evolution of the Internet as it pertains to the Google Lens technology.
The Defense Advanced Research Projects Agency (DARPA) is an agency of the U.S. Department of Defense; it funded ARPANET, the computer network that evolved into the Internet. The technology was transferred to US universities, where computer science students liked to work on computer networking. As the general public branded them 'nerds', they formed a virtual society and posted their content on the network. Tim Berners-Lee, working at the CERN laboratory, came up with a proposal called the World Wide Web (WWW). He was able to connect documents across computers. For this he developed a concept called 'hypertext'. In electronic documents, hypertext is typically underlined and, almost invariably, rendered in blue.
The falling price of computers and the emergence of the WWW created a craze among the public, and many people started posting their own content. But unless I know the address of a document, I cannot access it. This created a need to index the documents on the net. Initially, scientists tried to mimic the library cataloguing system. It failed miserably. So they moved to a concept called the 'search engine'. A search engine accesses a document on the Internet and tries to figure out the important words or phrases in it. These important words are called 'keywords'. The keywords and the associated documents are stored in a large database. Thousands of Web pages may be devoted to a popular actress. If a user types the name of the actress (the keyword for the search engine), the search engine cannot present all thousand pages in its database; instead it offers the best pages among them. This technique is called 'page ranking'. PhD students Larry Page and Sergey Brin developed a search engine called 'Google' at Stanford University. Users felt that Google's page ranking scheme works well and made it the de facto search engine. Now people say "Google it" instead of "search for it on the Internet".
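The keyword index and ranking idea above can be sketched in a few lines of Python. This is a toy illustration only: real search engines use far richer ranking signals (link structure, freshness, and so on), and the documents here are made up.

```python
# Toy sketch of a search-engine index: each keyword maps to the set of
# documents containing it, and results are ranked by a simple match count.
from collections import defaultdict

documents = {
    "doc1": "actress wins national film award",
    "doc2": "interview with the actress about her new film",
    "doc3": "cricket match ends in a draw",
}

# Build the inverted index: keyword -> set of document ids.
index = defaultdict(set)
for doc_id, text in documents.items():
    for word in text.lower().split():
        index[word].add(doc_id)

def search(query):
    """Return document ids ranked by how many query words they contain."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for doc_id in index.get(word, set()):
            scores[doc_id] += 1
    # Highest score first: the 'best pages' are shown before the rest.
    return sorted(scores, key=scores.get, reverse=True)

print(search("actress film"))  # doc1 and doc2 match; doc3 does not appear
```

Typing a query thus becomes a lookup in a precomputed database rather than a scan of the whole Web, which is what makes results come back in a fraction of a second.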
The success of Google lies in the monetization of its search facility. A Google search result contains sponsored links along with the list of Web links pertaining to the keyword typed by the user. Whenever a user clicks a sponsored link, Google gets a commission from the sponsor of the link. Companies were able to focus on their intended audience. It was a good case of a win-win situation. Instead of limiting itself to providing a list of documents associated with the user's search keyword (say, an actress's name), Google started offering links that contain images and videos of the actress. When Google accesses a Web page, it parses the text to find the keywords and at the same time collects the images present in the page. It makes a low-resolution version of each image and tries to associate it with the text. That is why, when someone types the name of the actress, Google automatically displays photos in a rectangular grid.
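The "low-resolution version" mentioned above is essentially downsampling. A minimal sketch, representing a grayscale image as a plain grid of pixel values and averaging each 2x2 block into one pixel (real systems use proper image libraries and smarter resampling filters):

```python
# Toy sketch of thumbnail creation: shrink an image (a grid of grayscale
# pixel values, 0-255) by averaging each non-overlapping 2x2 block.
def downsample(image):
    """Halve width and height by averaging 2x2 pixel blocks."""
    small = []
    for r in range(0, len(image), 2):
        row = []
        for c in range(0, len(image[0]), 2):
            block = [image[r][c], image[r][c + 1],
                     image[r + 1][c], image[r + 1][c + 1]]
            row.append(sum(block) // 4)  # average of the four pixels
        small.append(row)
    return small

big = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [50, 50, 90, 90],
    [50, 50, 90, 90],
]
print(downsample(big))  # [[10, 200], [50, 90]]
```

The thumbnail keeps the overall appearance of the image while taking a fraction of the storage, which matters when you are indexing billions of pictures.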
As research progressed, scientists were able to parse an image like text and extract important features. This method is called the 'bag of visual words' [3]. For feature extraction, the Scale Invariant Feature Transform (SIFT) is extensively used. (Processing an image and extracting features falls under computer vision, not digital image processing (DIP). Processes like converting a high-resolution image into a low-resolution one, or a colour image into a grayscale image, fall within the domain of DIP.) The extracted visual features, the associated low-resolution image, and links to the images are stored in a database. So when we upload an image as a search keyword to Google, it can offer similar images as the result. This facility is called 'reverse image search'. Much of the public is unaware of it. One can actually upload an unknown female singer's photo (say, a Turkish singer) and get details about her (why not a male singer?). Try it once!
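The bag-of-visual-words idea can be sketched as follows. In practice the features would be 128-dimensional SIFT descriptors and the vocabulary would be learned by clustering descriptors from many images; the 2-D vectors and tiny vocabulary below are made-up stand-ins so the mechanics stay visible.

```python
# Toy sketch of 'bag of visual words': each local feature vector is
# assigned to its nearest 'visual word' in a fixed vocabulary, and an
# image is summarised as a histogram of visual-word counts. Two images
# with similar histograms are treated as visually similar.
import math

vocabulary = [  # pretend cluster centres learned from many images
    (0.0, 0.0),  # visual word 0
    (1.0, 0.0),  # visual word 1
    (0.0, 1.0),  # visual word 2
]

def nearest_word(feature):
    """Index of the vocabulary entry closest to the feature vector."""
    return min(range(len(vocabulary)),
               key=lambda i: math.dist(feature, vocabulary[i]))

def bag_of_words(features):
    """Histogram of visual-word counts for one image."""
    hist = [0] * len(vocabulary)
    for f in features:
        hist[nearest_word(f)] += 1
    return hist

image_features = [(0.1, 0.1), (0.9, 0.1), (0.1, 0.9), (0.05, 0.95)]
print(bag_of_words(image_features))  # [1, 1, 2]
```

The histogram plays the same role for an image that a list of keywords plays for a text document, which is what lets image search reuse the text-search machinery described earlier.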
In the meantime, the cellular phone became popular. Then the smartphone arrived and grabbed the attention of technophiles. A smartphone is actually a small computer with communication capability (data and voice). Later, smartphones were fitted with a lot of frills like GPS (Global Positioning System), which makes it possible to navigate with the smartphone. The Google Maps service with GPS helps a lot in navigating comfortably in an unknown city. Google introduced the 'Android' operating system (OS) meant for smartphones. It is based on the Linux kernel, a UNIX-like OS, and it became very popular. Google started developing apps (applications, i.e. executable programs) for its OS and encouraged third parties to develop apps. Google started the 'Play Store' platform for third-party vendors to upload their apps.
Google then started focusing on improving the interaction between the mobile (i.e. smartphone) user and the mobile. It developed an app called 'Google Assistant' which can interact with users and offer timely inputs. For this, Google extensively uses Artificial Intelligence (AI) techniques. AI algorithms are computationally very intensive, so Google has developed special-purpose chips (its Tensor Processing Units). It is not going to sell the chips; instead, it offers AI services to third parties for a fee.
Over the course of 50 years, Internet technology has changed its functionality a lot. First it was a technology used to protect defense assets (information). Later it became a curious toy for programming geeks. It also helped to create a virtual community among educated youngsters; social media owes a lot to them. Around the year 2000, the Web became a much sought-after investor destination. After the arrival of the smartphone, 3G (Third Generation) mobile standards and falling prices created a new wave. People with limited reading and writing skills now actively share photos, content, and views using apps like Facebook and Twitter. The Internet has become a necessity like water and electricity for the common man. Now image recognition and AI are integrated into the smartphone. Let us wait a few more years to see what changes this brings to society.
References
[1] Google I/O - Wikipedia | https://en.wikipedia.org/wiki/Google_I/O
[2] Google Lens announced at Google I/O 2017 [Video] | https://www.youtube.com/watch?v=mh0ifHt3LEM
[3] Recognizing and Learning Object Categories | http://people.csail.mit.edu/torralba/shortCourseRLOC/index.html