In less than two months, Apple’s virtual assistant, Siri, has insinuated herself into western culture. This has been less because of Apple’s marketing than because of the public’s general interest in the concept and in Siri’s potential.
Of course, the fact that Siri refused to tell people where to find abortion clinics helped increase her notoriety. Apple described Siri’s reproductive health oversight as an unintentional lapse; comedian Stephen Colbert took it as a sign that Siri is actually ultra-conservative.
Colbert backs up his claim by pointing out that Siri can’t understand “foreign” accents.
Since Siri’s launch, developers have managed to reverse-engineer the way Siri communicates with Apple’s servers. This has given them some insight into how Siri works, as well as a way to extend her capabilities.
Recently, developer Pete Lamonica created a piece of software called SiriProxy. Once SiriProxy is installed on a computer connected to the local network, an iPhone can be reconfigured to talk to SiriProxy, which sits between the phone and Apple’s servers, instead of talking to Apple’s servers directly.
SiriProxy can then intercept replies the servers send back to the phone and carry out a whole range of activities, from switching lights off and on in a room (see video below) to unlocking and starting a car.
Lamonica’s work shows that most of Siri’s processing takes place on Apple’s servers. Siri packages up the audio you record when you ask her a question and sends it to the server for interpretation. This is why Siri (along with the phone’s other speech recognition features) will not even start without an active internet connection.
Because the heavy lifting happens on the server, Siri does not need much processing power on the phone itself. Apple’s decision not to make Siri available on the iPhone 4 and iPhone 3GS therefore seems to be more about marketing (making the iPhone 4S more desirable) than about a lack of processing power. (On a slight tangent: because Siri sends your voice over the internet, heavy use while roaming internationally might not be advisable.)
On receiving the audio, the server sends text commands back, telling Siri what to display and what to say. Siri has text-to-speech capabilities and can also interact with a limited range of applications.
SiriProxy has a range of “plugins” that can intercept the commands Apple sends back and then run custom code to carry out a seemingly limitless range of actions.
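To make the plugin idea concrete, here is a minimal Python sketch of the interception pattern: each plugin registers a phrase pattern and some custom code, and the proxy checks the recognised text before deciding whether to substitute its own reply or pass the server’s reply through. This is not SiriProxy’s actual code or plugin API; the names `Plugin`, `ProxyDispatcher`, `register`, `handle_server_reply` and `switch_lights` are hypothetical, and the sketch assumes the server’s reply has already been decoded into plain text, ignoring the real protocol entirely.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical handler signature: receives the recognised text and the regex
# match, and returns the reply that should be shown/spoken on the phone.
Handler = Callable[[str, re.Match], str]


@dataclass
class Plugin:
    """A hypothetical plugin: a phrase pattern plus the custom code to run."""
    pattern: re.Pattern
    handler: Handler


@dataclass
class ProxyDispatcher:
    """Sketch of a proxy step that inspects a reply before the phone sees it."""
    plugins: List[Plugin] = field(default_factory=list)

    def register(self, regex: str, handler: Handler) -> None:
        # Match case-insensitively, because recognised text may vary in case.
        self.plugins.append(Plugin(re.compile(regex, re.IGNORECASE), handler))

    def handle_server_reply(self, recognised_text: str, original_reply: str) -> str:
        # Try plugins in registration order; the first matching pattern wins.
        for plugin in self.plugins:
            match = plugin.pattern.search(recognised_text)
            if match:
                return plugin.handler(recognised_text, match)
        # No plugin claimed the request, so pass the server's reply through.
        return original_reply


# Usage: a toy "lights" plugin that only updates an in-memory dictionary
# (a stand-in for real home-automation hardware, which is not modelled here).
lights: Dict[str, bool] = {"kitchen": False}


def switch_lights(text: str, m: re.Match) -> str:
    state, room = m.group(1).lower(), m.group(2).lower()
    lights[room] = state == "on"
    return f"Turning the {room} lights {state}."


dispatcher = ProxyDispatcher()
dispatcher.register(r"turn (on|off) the (\w+) lights", switch_lights)

print(dispatcher.handle_server_reply("Turn on the kitchen lights", "Original reply"))
# prints "Turning the kitchen lights on."
print(dispatcher.handle_server_reply("What's the weather?", "Original reply"))
# prints "Original reply"
```

The design choice worth noting is the fall-through: anything no plugin recognises is handed back unchanged, so the assistant keeps working normally for every request the plugins don’t claim.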
In a conference talk, Tom Gruber, the original technical architect of Siri the company (before Apple bought it), explained Siri’s origins and the way the application works. Remarkably, Apple had foretold Siri fairly accurately back in 1987, with a concept called the “Knowledge Navigator”.
The Knowledge Navigator video (see above) portrayed an academic talking to his personal assistant. All the essential elements of Siri’s current capabilities appear in it. The academic gets a list of his appointments and details of waiting messages.
He then uses the assistant to help prepare his afternoon lecture on deforestation of the Amazon rainforest (itself a prescient choice, given the climate change debate that followed). The preparation includes collaborating with a colleague over a videoconference and analysing and visualising data in real time. That last capability, sadly, is still science fiction.
Apple faces a Herculean challenge in developing Siri further. Tom Gruber’s presentation, given two years ago, already demonstrated essentially all of the Siri functionality found in the iPhone 4S. Even bringing the Siri features available in the US market to the rest of the world presents a significant challenge, and the difficulties involved are not necessarily the obvious ones.
Locating services, as the glitch with abortion clinics has shown, is fraught with nuance. This nuance is not just a question of language and geography. Translating text is one thing; interpreting language in the context of local society and culture is much more difficult. Avoiding offending your customers and governments is harder still.
The effort of bringing Siri in world markets up to parity with its US features will make it that much harder to innovate. In this respect, opening Siri’s back-end so that other developers can provide services is Apple’s only hope of realising Siri’s potential.
Competitors are not about to catch up, either. As the video above dramatically illustrates, Microsoft’s TellMe (at least) is so hopeless in comparison that Microsoft will really need to go back to the drawing board or look for existing technology elsewhere.
In the meantime, comedians are busy exploring Siri’s potential capabilities. The College Humor site plays out one scenario in a sketch (warning: some bad language) in which Siri gets between a husband and wife having an argument. Playing the discreet English butler, Siri tries to calm the couple down, but sadly fails.
There’s no doubt Siri’s capabilities, even in the area of marriage guidance, will only get better.