Thursday, June 16, 2016

Highlights and new discoveries in computer vision, machine learning, and AI (May 2016)

In the fifth issue of this monthly digest series you can learn about who or what Parsey McParseface is, highlights from ICRA and ACM CHI, the next generation of Siri, and much more.

Viv: Beyond Siri

At TechCrunch Disrupt, Siri creator Dag Kittlaus gave the first public demonstration of Viv, a new AI voice assistant intended to surpass his famous earlier program in both complexity and understanding. Kittlaus promised that the virtual system would "breathe life into the inanimate objects of our life through conversation."

Unlike Siri, Viv will not be tied to the iPhone alone, but could potentially feature in everything from cars to TVs, with Viv Labs already partnered with approximately 50 other companies, including Uber and GrubHub. Kittlaus hopes it will be "the intelligent interface for everything".

Kicking off the presentation with a clichéd, general question about the weather, Kittlaus soon moved on to putting complex questions to Viv, such as "will it be warmer than 50 degrees near the Golden Gate Bridge after 5pm the day after tomorrow?", which it handled with ease. Not only did it answer these queries quickly, it also processed a request to pay $20 to one of his friends through Venmo.

Tech giants Apple, Google, Amazon, Facebook and Microsoft have all invested heavily in artificially intelligent assistants in recent years, giving us a preview of what is to come. Viv is being marketed as the new frontier in how we will use our devices and interact with the digital world. While no timeline for a full Viv roll-out was given, early integrations can be expected later in the year.

Sources: TechCrunch, NewsTalk.

ICRA 2016

This year's IEEE International Conference on Robotics and Automation (ICRA 2016) was held 16–21 May in Stockholm, Sweden. ICRA is one of the leading international forums for robotics researchers to present their work. As usual, the latest robotic inventions were often tightly coupled with AI and machine learning capabilities. The plenary talks featured Tamar Flash (Weizmann Institute of Science) on "Controlling motor behavior: humans, brains and robots"; Fredrik Gustafsson (Linköping University) on "Project Ngulia: from Phone to Drone"; Roberto Cingolani (Istituto Italiano di Tecnologia) on "Nanotechnology and materials science for humanoids"; Leslie Pack Kaelbling (MIT) on "Intelligent Robots Redux"; and Claire Tomlin (UC Berkeley) on "Safe Learning in Robotics".

Other highlights included Karl Iagnemma (MIT) discussing the implications of moving autonomous driving research from university labs to corporate product development timelines. Iagnemma also presented a case study about the formation of nuTonomy, an MIT spin-off focused on software development for fully autonomous passenger vehicles. Ken Goldberg (UC Berkeley) addressed the problem of robust grasping in robotics, and how deep learning and cloud robotics could be leveraged to close the vast gap between human and robot dexterity. Lastly, Hong Qiao (Chinese Academy of Sciences) spoke about brain-inspired robotic vision, motion and planning. Qiao has developed a visual recognition system that can be combined with a model for movement control calibration to achieve robust visual recognition, fast response times, high-precision movements, and flexible learning capabilities.

Source: ICRA.

SyntaxNet: Parsey McParseface

Google added a new tool to its SyntaxNet framework to help computers parse and understand English sentences, dubbed "Parsey McParseface" by its engineers.

SyntaxNet is a framework for what's known in academic circles as a syntactic parser, which is a key first component in many natural language understanding (NLU) systems. Given a sentence as input, it tags each word with a part-of-speech (POS) tag that describes the word's syntactic function, and it determines the syntactic relationships between words in the sentence, represented in the dependency parse tree. These syntactic relationships are directly related to the underlying meaning of the sentence in question.

The structure shown in the image above encodes the facts that Alice and Bob are the subject and object, respectively, of "saw"; that Alice is modified by a relative clause with the verb "reading"; that "saw" is modified by the temporal modifier "yesterday"; and so on. The grammatical relationships encoded in dependency structures allow us to easily recover the answers to various questions, for example "whom did Alice see?", "who saw Bob?", "what had Alice been reading about?" or "when did Alice see Bob?".
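To make the question-answering idea concrete, here is a minimal Python sketch of a dependency parse stored as a plain list of tokens, each pointing at its head. It is not SyntaxNet's API or output format: the sentence, the part-of-speech tags, the relation labels (nsubj, dobj, rcmod, tmod and so on) and the helper names are all assumptions chosen for illustration, with the sentence reconstructed from the questions listed above.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    word: str
    pos: str   # illustrative part-of-speech tag
    head: int  # index of the head token, -1 for the root
    rel: str   # illustrative dependency relation to the head


# Hand-written parse of an assumed example sentence (punctuation omitted).
SENTENCE: List[Token] = [
    Token("Alice", "NOUN", 7, "nsubj"),
    Token("who", "PRON", 4, "nsubj"),
    Token("had", "AUX", 4, "aux"),
    Token("been", "AUX", 4, "aux"),
    Token("reading", "VERB", 0, "rcmod"),
    Token("about", "ADP", 4, "prep"),
    Token("SyntaxNet", "NOUN", 5, "pobj"),
    Token("saw", "VERB", -1, "root"),
    Token("Bob", "NOUN", 7, "dobj"),
    Token("in", "ADP", 7, "prep"),
    Token("the", "DET", 11, "det"),
    Token("hallway", "NOUN", 9, "pobj"),
    Token("yesterday", "NOUN", 7, "tmod"),
]


def index_of(tokens: List[Token], word: str) -> int:
    """Return the position of the first token with the given surface form."""
    return next(i for i, t in enumerate(tokens) if t.word == word)


def dependents(tokens: List[Token], head: int, rel: str) -> List[int]:
    """Return indices of tokens attached to `head` with relation `rel`."""
    return [i for i, t in enumerate(tokens) if t.head == head and t.rel == rel]


def words(tokens: List[Token], indices: List[int]) -> List[str]:
    return [tokens[i].word for i in indices]


saw = index_of(SENTENCE, "saw")
who_saw = words(SENTENCE, dependents(SENTENCE, saw, "nsubj"))    # who saw Bob?
whom_seen = words(SENTENCE, dependents(SENTENCE, saw, "dobj"))   # whom did Alice see?
when = words(SENTENCE, dependents(SENTENCE, saw, "tmod"))        # when did Alice see Bob?

# What had Alice been reading about? Follow Alice -> rcmod -> prep -> pobj.
alice = index_of(SENTENCE, "Alice")
reading = dependents(SENTENCE, alice, "rcmod")[0]
about = dependents(SENTENCE, reading, "prep")[0]
reading_topic = words(SENTENCE, dependents(SENTENCE, about, "pobj"))

print(who_saw, whom_seen, when, reading_topic)
# ['Alice'] ['Bob'] ['yesterday'] ['SyntaxNet']
```

Each question reduces to following one or more labeled arcs from a known word, which is why a correct parse is such a useful first step for downstream language understanding.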

One of the main problems that makes parsing so challenging is that human languages show remarkable levels of ambiguity. SyntaxNet applies neural networks to the ambiguity problem. An input sentence is processed from left to right, with dependencies between words being incrementally added as each word in the sentence is considered. At each point in processing many decisions may be possible—due to ambiguity—and a neural network gives scores for competing decisions based on their plausibility.
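The left-to-right, decision-by-decision process described above can be sketched with a toy transition-based parser. This is a hedged illustration, not SyntaxNet's implementation: it uses the standard arc-standard actions (SHIFT, LEFT_ARC, RIGHT_ARC), picks the single best-scoring action at each step, and replaces the neural network with a small hand-written plausibility table. The table values, the constant SHIFT score and all function names are assumptions.

```python
from typing import Dict, List, Tuple

# Toy plausibility that `head` governs `dependent`.
# Assumption: this table stands in for the neural network's scores.
PLAUSIBILITY: Dict[Tuple[str, str], float] = {
    ("saw", "Alice"): 0.9,
    ("saw", "Bob"): 0.8,
    ("saw", "yesterday"): 0.7,
}
SHIFT_SCORE = 0.5  # assumed constant score for reading the next word


def score_actions(words: List[str], stack: List[int], buffer: List[int]) -> Dict[str, float]:
    """Score every action that is legal in the current parser state."""
    scores: Dict[str, float] = {}
    if buffer:
        scores["SHIFT"] = SHIFT_SCORE
    if len(stack) >= 2:
        below, top = words[stack[-2]], words[stack[-1]]
        scores["LEFT_ARC"] = PLAUSIBILITY.get((top, below), 0.0)   # top governs below
        scores["RIGHT_ARC"] = PLAUSIBILITY.get((below, top), 0.0)  # below governs top
    return scores


def greedy_parse(words: List[str]) -> Tuple[List[Tuple[int, int]], List[str]]:
    """Parse left to right, always taking the highest-scoring legal action.

    Returns (head, dependent) index pairs and the sequence of actions taken.
    """
    stack: List[int] = []
    buffer: List[int] = list(range(len(words)))
    arcs: List[Tuple[int, int]] = []
    actions: List[str] = []
    while buffer or len(stack) > 1:
        scores = score_actions(words, stack, buffer)
        action = max(scores, key=scores.get)
        actions.append(action)
        if action == "SHIFT":
            stack.append(buffer.pop(0))
        elif action == "LEFT_ARC":
            dependent = stack.pop(-2)  # second-from-top attaches to the top
            arcs.append((stack[-1], dependent))
        else:  # RIGHT_ARC
            dependent = stack.pop()    # top attaches to the word below it
            arcs.append((stack[-1], dependent))
    return arcs, actions


arcs, actions = greedy_parse(["Alice", "saw", "Bob", "yesterday"])
print(arcs)     # [(1, 0), (1, 2), (1, 3)]
print(actions)  # ['SHIFT', 'SHIFT', 'LEFT_ARC', 'SHIFT', 'RIGHT_ARC', 'SHIFT', 'RIGHT_ARC']
```

A greedy parser like this commits to each decision immediately, so an early mistake on an ambiguous sentence cannot be undone. Keeping several candidate action sequences alive at once (beam search) is a common way to soften that limitation.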

Parsey McParseface achieves state-of-the-art performance on POS tagging, dependency parsing and sentence compression, as outlined in Andor et al. (arXiv pre-print). The code is available as part of SyntaxNet on GitHub.

Sources: Google Research Blog, VR-Zone.


A team of researchers at the University of Washington developed an artificial hand that is built by robotics and trained by artificial intelligence. So far, it looks like it may be the most sophisticated artificial hand ever created.

Amazon released its Deep Scalable Sparse Tensor Network Engine (DSSTNE, pronounced "Destiny") as open source. Amazon engineers built DSSTNE to solve deep learning problems at Amazon's scale. It is designed for production deployment of real-world deep learning applications, emphasizing speed and scale over experimental flexibility. (GeekWire)

Summer is also a time for student challenges and competitions. The FIRST (For Inspiration and Recognition of Science and Technology) Championship was a 4-day event with more than 20,000 students from around the globe competing in front of more than 40,000 spectators. (RoboHub)

ACM CHI, the premier Human-Computer Interaction conference, was held 7–12 May in San Jose, in the heart of Silicon Valley. The conference included panels and papers on robotics and AI, as well as the next generation of interfaces that could impact robotics and the internet of things.

Google revealed an alternative to CPUs and GPUs, dubbed a Tensor Processing Unit (TPU), which it says will advance its machine learning capability seven years into the future. (PCWorld)

It appears drones will be utilised further in public health, potentially revolutionising transportation networks and connecting people to what they need. According to a TED blog post, Matternet, UNICEF and Malawi's Ministry of Information hosted hands-on sessions and a Community Demo Day so that locals could see how the drones work and learn what they would be carrying. These demos also stressed the importance of HIV awareness, getting tested, and taking antiretrovirals. (TED)

Google has partnered up with Fiat Chrysler Automobiles (FCA) to build 100 custom versions of their hybrid minivans for Google's self-driving systems, including the computers that hold Google's self-driving software, and the sensors that enable the software to see what's on the road around the vehicle. (Google)

At roughly the same time, Google was reportedly close to selling Boston Dynamics, the robotics company it bought in 2013, to Toyota. Since Andy Rubin, the exec who led the robotics division, left Google in 2014, the company has struggled to find a new permanent leader for the division and to fulfill Rubin's ambitious vision of creating the first wave of consumer robotics products. (BusinessInsider)

IBM announced that it will offer NVIDIA's Tesla M60 GPU accelerator as a cloud instance, in a move aimed at speeding up virtual desktop applications for computer-aided drafting and similar graphics-intensive software. The partnership between IBM and NVIDIA revolves around workloads for analytics, deep learning and artificial intelligence, as well as graphics-intensive applications. (ZDNet)

And finally, an exciting collaborative initiative in autonomy and robotics between MIT and Lockheed Martin was launched. Teams will focus on innovations and how to improve human/machine teaming and navigation in complex environments. This could be a great indication of more industry/academic joint partnerships in robotics to come. (RoboHub)

Next month

Anything I missed? Sound off in the comment section! Have something of interest or want your discovery to be considered for next month's issue? Let me know via mbeyeler (at) uci (dot) edu.