…or is it “are devices listening” to you?
Well yes. There’s always a simple answer to these questions and it’s yes. Breaking it down to the basics, pretty much all of our devices are listening because we’ve enabled Siri or Google or Alexa or… to respond to a keyword command like “Hey Siri” so that they can present us, in the case of Siri, with a spooky HAL 9000ish blob to indicate it’s listening (because it wasn’t before, honest) and ready to perform our bidding.
Sometimes things can go amiss with the design itself as we have seen in the press.
In May 2018 poor Danielle from Seattle (ironically the home of Amazon) found that a conversation she was having was recorded and sent to a contact.
Essentially what really happened was the Amazon Echo woke up because it heard something that sounded like Alexa. Like, oh I don’t know, her daughter’s name Alexa? I don’t know if that’s the case but Alexa is actually a pretty common normal name to use as the trigger to activate the artificial intelligence driven slave toy.
Anyhow, a series of unfortunate misunderstandings followed where Alexa misheard a “send message” request and then misheard answers to the questions the Echo was posing out loud to indicate a contact and confirm the entire command. Seems unlikely, but if you’re having a conversation and Alexa is just eavesdropping, waiting for a word that sounds phonetically similar to anything in your entire contact list, there is a good chance it’ll hear what it wants to hear.
The confirmation word was simply… “right” or perhaps “yes” or anything positive sounding which would certainly be a part of normal speech.
I think the real issue is the activation word. The devices have to assume they are involved in the conversation if they have been directly addressed. After this, it’s just far too easy to say words that sound like instructions if it happens to be officially listening.
Poor Danielle. She told KIRO-TV that she felt “invaded” and said she’s “never plugging that device in again because she can’t trust it.” It’s a robot, Danielle. To quote Kyle Reese in The Terminator, “It can’t be bargained with. It can’t be reasoned with. It doesn’t feel pity, or remorse, or fear. And it absolutely will not stop, ever, until it has ordered you a taxi, or played some soft jazz.”
You’ll notice that all of the activation words are 3 syllables or perhaps more. Alexa, Hey Siri, Ok Google are just a few and there is a reason for this.
There was a paper released in August last year called “Skill Squatting Attacks on Amazon Alexa” from the University of Illinois. It’s a somewhat technical read but the highlights are fascinating.
What is skill squatting? The paper is an analysis of misinterpretations in Amazon Alexa to consider how an adversary could leverage these systematic interpretation errors.
What is a “skill”? Think of a skill as an app on your phone that responds to voice input. The example they use is Lyft (L-Y-F-T), as in the competitor to Uber. Their Alexa skill allows you to say “Alexa, ask Lyft for a ride” and it will use the skill to tell you where the nearest driver is, much like the app would visually.
Skill squatting relies on the study of dialect and associated phonemes. Phonemes are essentially the bits of sound that make up pronunciation.
A personal example I noticed is something I have a bit of fun with my wife about. I’m Canadian and she’s British. I noticed one day that the two words source (as in what is the source of the noise) and sauce (this is a delicious mushroom sauce) sound the same when she says them. The source of the sauce sounds the same as the sauce of the source.
That’s the problem with single syllable words. As part of the study they analysed and quantified a dictionary of words ranging in complexity.
Words like Dandelion, Forecast and Serenade had almost 100% accuracy in how the speech recognition interpreted them, while single syllable words like Bean (that’s like coffee bean), Calm and Coal had 0% accuracy. Coal (as in a coal miner) is one they singled out as specifically troubling. If you have a skill designed to Call, the wrong accent, or specifically a British accent, would sound like Coal, and a skill associated with that word could squat on your Alexa to reroute your request to a malicious entity.
Keep in mind they only used speakers from across North America where, as much as there is variation in accent and dialect, they didn’t even get started tackling Glaswegian, where Apple’s Siri was notoriously useless when it first launched.
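To make the Call and Coal problem a bit more concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the `SKILLS` registry, the `MISHEARD` table and the `route` function are hypothetical and are not how Alexa actually works. It only shows the idea that the request goes to whichever skill matches what the device thinks it heard.

```python
# Hypothetical skill registry: invocation word -> who ends up handling it.
SKILLS = {
    "call": "legitimate calling skill",
    "coal": "squatter's skill",
}

# Assumed, made-up mishearing: spoken "call" is transcribed as "coal".
MISHEARD = {"call": "coal"}


def route(spoken, misheard=MISHEARD):
    """Return the skill that handles a spoken invocation word."""
    # The device only ever sees its own transcription, not what you meant.
    heard = misheard.get(spoken, spoken)
    return SKILLS.get(heard, "no matching skill")


print(route("call"))               # goes to the squatter
print(route("call", misheard={}))  # heard correctly, goes to the real skill
```

The squatter never has to touch your device. They only need to register the word the recogniser is likely to get wrong.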
Here’s a few examples of possible squats.
Skill – Squatted Skill
Boil an Egg – Boyle an Egg
Main Site Workout – Maine Site Workout
Quick Calm – Quick Com
Bean Stock – Been Stock
Test Your Luck – Test Your Lock
Comic Con Dates – Comic Khan Dates
Mill Valley Guide – No Valley Guide
Full Moon – Four Moon
It all comes down to the problem of Homonyms or to be more precise Homophones or words that sound alike. Speech recognition is getting better at working out known homophones but can struggle at ones which are derived from changes in dialect or regional pronunciation.
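If you want to see the difference between the “known” homophones and the dialect ones, a rough sketch is to run the pairs from the list above through Soundex, a very old spelling-based phonetic code. This is my own illustration and not the method used in the paper, and the function names `soundex` and `sounds_alike` are just ones I picked.

```python
# Classic American Soundex letter groups.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(word):
    """Return the four character Soundex code for a single word."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    out = [letters[0].upper()]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            out.append(code)
        if ch not in "hw":  # h and w do not separate repeated codes
            prev = code
    return "".join(out)[:4].ljust(4, "0")


def sounds_alike(a, b):
    """Compare two phrases word by word using Soundex."""
    return [soundex(w) for w in a.split()] == [soundex(w) for w in b.split()]


PAIRS = [
    ("Boil an Egg", "Boyle an Egg"),
    ("Main Site Workout", "Maine Site Workout"),
    ("Quick Calm", "Quick Com"),
    ("Bean Stock", "Been Stock"),
    ("Test Your Luck", "Test Your Lock"),
    ("Comic Con Dates", "Comic Khan Dates"),
    ("Mill Valley Guide", "No Valley Guide"),
    ("Full Moon", "Four Moon"),
]

for skill, squat in PAIRS:
    print(skill, "/", squat, "->", sounds_alike(skill, squat))
```

Under this crude code only the spelling variants (Boil and Boyle, Main and Maine, Bean and Been, Luck and Lock) come out as matches. Pairs like Calm and Com or Full and Four do not, which fits the point that mishearings driven by accent are harder to predict from spelling alone.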
So hopefully Danielle can understand, if she were listening to this podcast, how jumping head first into speech recognition based laziness might currently backfire.
Now down to the major question though. Are these devices listening to us when we haven’t summoned up our digital butler?
There have been a few pieces of research on this. Of course there was even the latest FaceTime debacle where you could create a conference call and activate the 2nd party’s audio and video even if they denied the call. That’s a different thing altogether and just a major bug in the software design and not anything intentionally surreptitious. I find myself defending software quite a bit because I know the complexity of it; however, there are ways to determine risks in architecture and it’s not like Apple don’t have the budget.
Getting back to the final point, do our devices listen to us even when we haven’t issued the magic trigger phrase? Yes.
If you can imagine, the phone needs to be listening in a sort of stand-by mode so that it can activate and do our bidding. Imagine it like a real butler who is around all the time while you’re discussing your personal feelings with your partner, discussing a holiday with friends or singing into your hair brush at full volume.
You might even think of it as a small army of butlers or servants. Downton Abbey style. All of whom know far more about you than you may acknowledge. Apps like Facebook for example have access to this “non-triggered” data.
You’ll notice that the iPhone is extremely clever at adding events or contact details found in emails. It’s a bit like this.
Of course, Facebook denies this but Google are a bit more open about it. There was a Vice article by Sam Nichols on June 4th 2018 in which this was put to the test.
The author, Sam, repeated some phrases near the phone for almost a week like “I’m thinking about going back to uni” in addition to “I need some cheap shirts for work”. Seems pretty obvious as potential for advertising triggers. If you’d written that into a status, you can bet you’d start seeing related adverts.
He started seeing adverts for university courses almost the next day. A further conversation about running out of data resulted in 20GB data plan adverts as well.
The experiment seemed to prove that, if this was Downton Abbey, our servants would conveniently erect a pop-up marmalade shop as we were leaving the estate to buy marmalade, or perhaps a discount waistcoat and top hat combination just prior to the approaching social. Ok, I admit to having never watched Downton Abbey but you get the idea.
So far the general consensus seems to be that, just like our Facebook status, photos, websites we visit and general online existence, which have already given up such a clear profile of who we are, trigger words used near the phone are also feeding the machine.
Keep in mind though it’s probably only really clear words based on the study mentioned at the beginning of this. Words like Gigabit, Holiday and University are clear and accurate and will probably be reflected in tailored advertising campaigns.
How do we avoid this? Well… do we really care at the moment? Is it a slippery slope, or is it a loophole that will soon be closed? The trend is that with new technology comes later regulation, so there is every possibility that, should abuse or negligence surround such activity, rules will be made to prevent or control it. Until then, at least you know where you stand, which is: if you want the perks of technology like smart phones and the internet and social media, remember that, and we’ve said this so many times before, you are the product being sold to advertisers with whom we have a symbiotic relationship. You need stuff, companies with stuff need you.
If you want context for what it would be like without such personally specific advertising, try watching normal television. Not Netflix or a streaming service. Watch normal TV with advertising. How many of those adverts do you actually care about? It’s so noticeably off target in comparison with Facebook or Google adverts it almost makes you wonder why anybody still pays for that kind of advertising.