The idea of a personal robot assistant, able to effortlessly understand spoken (and unspoken) human intents and efficiently act on them while delivering a breezy quip, has been a staple of science fiction.
The 1940s had Zolo to scare away officious mailmen and refresh bouquets, while the Jetsons had Rosie to deal with prickly bosses. HAL 9000, the most evil red light in filmdom, may not have been keen to “open the pod bay doors” but it could still belt out a mean rendition of Daisy Bell.
Last week at its Worldwide Developers Conference (WWDC), Apple announced a raft of new features for its software-based intelligent personal assistant, Siri – a real-life approximation of this once-imagined future.
Apple lauded Siri as the standout feature of its iPhone 4S last October, showcasing several uses, including setting reminders and appointments, searching the web and answering the age-old question: “Should I carry an umbrella today?”
With its seemingly effortless grasp of natural spoken language, the app stole a march on the stilted voice-command interfaces of every mobile platform until then, including Apple's own iOS.
A year before its star turn, Siri had actually debuted on the iOS platform as a standalone app that integrated with various web services and made it possible to locate restaurants and book tables with spoken language commands.
Siri’s functionality can be roughly broken down into three parts:
- speech recognition
- reasoning
- delegation
Speech recognition involves making sense of voice patterns and converting them into words and phrases. This means separating the user’s voice from background noise and accurately transcribing it into words of a language.
Reasoning not only requires recognising the intent of the words but also the context in which they were spoken. A simple command such as “Give Mum a ring” requires the assistant to understand that the action required is to make a phone call to a contact called “Mum” and not to present an actual ring.
Delegation requires firing a specific handler – in this case, the app that actually makes the phone call – with the task of executing the action.
If things go wrong – for example, there is no contact called “Mum” – the assistant should be able to inform its human with a simple response and try to get more data to fulfil the request.
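The three-stage pipeline described above can be sketched in miniature. In this toy example the speech-recognition step is stubbed out (the "audio" is already text), "reasoning" is a single pattern match, and delegation dispatches to a handler with a fallback when the contact is missing. All names here are illustrative assumptions, not Siri's actual internals.

```python
import re

# A hypothetical contacts store; the number is a placeholder.
CONTACTS = {"mum": "+61 400 000 000"}

def recognise(audio):
    # Stand-in for a real speech recogniser: assume the audio has
    # already been transcribed to text, and just normalise it.
    return audio.lower().strip()

def reason(phrase):
    # Map a phrase to an intent: "give X a ring" means "call X",
    # not "present X with an actual ring".
    match = re.match(r"give (\w+) a ring", phrase)
    if match:
        return ("call", match.group(1))
    return ("unknown", phrase)

def delegate(intent, argument):
    # Fire the handler for the intent, or ask the user for more data.
    if intent == "call":
        number = CONTACTS.get(argument)
        if number is None:
            return f"I couldn't find a contact called '{argument}'. Who should I call?"
        return f"Calling {argument} on {number}..."
    return "Sorry, I didn't understand that."

print(delegate(*reason(recognise("Give Mum a ring"))))
# Calling mum on +61 400 000 000...
```

A real assistant replaces each stage with statistical models rather than hand-written rules, but the flow — recognise, reason, delegate, and recover gracefully on failure — is the same.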
Speech recognition and reasoning require matching voice patterns against databases and running extensive statistical analysis – tasks that demand computing power and memory in excess of what the processor in the iPhone 4S provides.
Therefore, Siri requires an active wireless internet connection over which it transmits data to Apple’s servers where most of the processing is performed. Naturally, data usage is higher on a Siri-enabled iPhone.
Siri is able to carry a conversation with its users and provide the semblance of a human personality. It’s also seemingly able to engage in witty repartee to answer questions about the meaning of life (“42”), suggest places to hide a body and report its owner to the Intelligent Agents' Union for harassment.
Siri has been anthropomorphised, by Apple as well as others, as a female and is repeatedly referred to as “she”. On the flip side, Siri has sometimes been dismissed as a gimmick, and has been the subject of lawsuits that accuse Apple of overselling the capabilities of a feature that is still officially beta.
At its introduction, Siri was able to interact with the iPhone’s native apps as well as external apps such as the “answer engine” Wolfram Alpha for executing web searches. At this year’s WWDC event, Apple announced the additional features it was bringing to Siri as part of its iOS version 6 upgrade.
These include actions such as opening any app on the phone, updating Facebook statuses and sending tweets, making reservations for restaurants in the US, and providing information about sports scores in the US. But there is still no method for non-Apple developers to use Siri in their apps.
Apple’s competitors have not been resting either. Samsung recently introduced S-Voice, a Siri-like feature, as part of its new Galaxy SIII smartphone while other apps such as Evi aim to provide competing services on the iPhone.
Will Siri’s achievements disappear in time, like tears in rain?
Siri’s features are built on decades-long research in computer science in areas such as natural language processing, machine learning, distributed computing and artificial intelligence.
These are still active areas of investigation and recent advances should enable future software assistants to improve their knowledge of context in spoken language and be able to engage in effortless conversation without resorting to canned responses. Apple itself uses the data gathered from Siri users to continually improve its capabilities.
Intelligent assistants are important to help users navigate and work in a world that’s drowning in information pouring in, like tears in rain, from a vast array of sources. Their role will become more prominent as their capabilities improve.
The science fiction vision of sentient, knowledgeable robot helpers is not impossible to realise. When that happens, we may have to speak about their dreams as well.
Srikumar Venugopal is a Lecturer in Computer Science and Engineering at the University of New South Wales. This article was originally published on The Conversation on June 19. Republished with permission.