Some thought-provoking ideas about how LinkedIn figures out who to suggest to you
This question came up recently among my colleagues: how does LinkedIn choose who to recommend to you as a connection? It was prompted when LinkedIn suggested that one of them connect with someone they had only exchanged emails with, about an Airbnb opportunity. So how did LinkedIn know to make that recommendation? There were no connections in common, no looking at each other’s profiles, no other apparent interaction…
Check out this account by someone familiar with algorithms for preferences (not someone from LinkedIn, but some very good guesses).
Roughly, LinkedIn’s connections algorithm may work something like this.
Anything you do on LinkedIn’s site is tracked.
You look at Joe Smith’s profile (regardless of whether you try to connect with him): you get “10 points” in the score box for “Jane knows Joe Smith”. You also get “1 point” in the score box for “Jane knows someone with the name Joe Smith”.
You upload your contact list to LinkedIn, you get 100 points for each person in the list.
These are first order estimates – they depend on your actions on the site.
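The first-order bookkeeping described above can be sketched in a few lines of Python. This is only an illustration of the guess, not LinkedIn's actual code: the point values come from the account above, while the function names and the way guesses are keyed are made up for the example.

```python
from collections import defaultdict

# Point values from the account above; everything else here is hypothetical.
PROFILE_VIEW_POINTS = 10
NAME_MATCH_POINTS = 1
CONTACT_UPLOAD_POINTS = 100

# scores[(member, guess)] -> points accumulated for that guess
scores = defaultdict(float)

def record_profile_view(viewer, viewed):
    """Viewing a profile supports two guesses about the viewer."""
    scores[(viewer, f"knows {viewed}")] += PROFILE_VIEW_POINTS
    scores[(viewer, f"knows someone named {viewed}")] += NAME_MATCH_POINTS

def record_contact_upload(uploader, contacts):
    """Every uploaded contact is strong evidence that the uploader knows them."""
    for contact in contacts:
        scores[(uploader, f"knows {contact}")] += CONTACT_UPLOAD_POINTS

record_profile_view("Jane Doe", "Joe Smith")
record_contact_upload("Jane Doe", ["Joe Smith", "Dave"])
```

After these two events, the guess "Jane Doe knows Joe Smith" holds the points from both the profile view and the upload.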
Ok, now 2nd order LinkedIn suggestion effects:
When you looked at Joe Smith’s profile, you got points. Joe – through no action of his own – also got points – let’s say 5 points for “Joe Smith knows Jane Doe” and 0.5 points for “Joe Smith knows someone named Jane Doe”.
Oh, both Joe and Jane live in Los Gatos CA — give them each a point in their buckets.
Hey, they both link to Toastmasters, or mention Toastmasters on their profile, another point.
Wow – they both worked at XYZ in 2001, and 2002 – give them a point for each year.
At some point, the “guess” that “Jane Doe knows Joe Smith” has enough points – so they float it by you in the “People you might know” column.
If you click it – more points. If you don’t click it, remove a point. If you continue not to click it, the points keep going down and it disappears from the list.
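Here is a rough sketch of these second-order effects: points for things two profiles have in common, a cut-off for showing the guess, and feedback from clicks. The 5 points for being viewed and the 1 point per match come from the account above. The threshold of 25 and the click bonus of 5 are assumptions chosen for the example, as are all the names.

```python
VIEWED_POINTS = 5     # Joe's side of the guess after Jane views his profile
SHOW_THRESHOLD = 25   # assumed cut-off for "People you might know"
CLICK_BONUS = 5       # assumed; the account only says "more points"
IGNORE_PENALTY = 1    # "remove a point"

def shared_attribute_points(a, b):
    """One point per thing two profiles have in common."""
    points = 0
    if a["city"] == b["city"]:
        points += 1
    points += len(a["groups"] & b["groups"])   # shared groups or interests
    points += len(a["jobs"] & b["jobs"])       # shared (employer, year) pairs
    return points

def apply_feedback(score, clicked):
    """Raise the score on a click, lower it when the suggestion is ignored."""
    return score + CLICK_BONUS if clicked else score - IGNORE_PENALTY

jane = {"city": "Los Gatos CA", "groups": {"Toastmasters"},
        "jobs": {("XYZ", 2001), ("XYZ", 2002)}}
joe = {"city": "Los Gatos CA", "groups": {"Toastmasters"},
       "jobs": {("XYZ", 2001), ("XYZ", 2002), ("ABC", 2005)}}

overlap = shared_attribute_points(jane, joe)
joe_side = VIEWED_POINTS + overlap

# A guess that starts just above the threshold and is ignored three times.
score = 26.0
shown = []
for _ in range(3):
    shown.append(score >= SHOW_THRESHOLD)
    score = apply_feedback(score, clicked=False)
```

With these assumed numbers, the suggestion is shown twice and then drops off the list once its score falls below the cut-off.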
Now things start to get interesting – now we can throw some “Machine Learning” into it.
You can imagine some processor thinking – hey, when we show Jane a Toastmaster, she clicks on it 30% of the time, vs only 12% of the time for non-Toastmasters. She must be really active in this Toastmaster thing – so instead of giving her a point for “toastmasters” in common with someone, let’s use two points.
Oh, and she never clicks on “XYZ people” – let’s only use 1/2 point there.
Hmm, and she never seems to click on “guesses” with less than 30 points, so instead of showing guesses at 25 points (which is good for lots of people) we’ll use a cut-off of 30 for Jane.
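One very simple way to get per-person weights like these is to compare click rates with and without a given feature and clamp the ratio. This is a toy stand-in for whatever "Machine Learning" is really used; the clamping range, the function names, and the reuse of the 12% baseline for the XYZ case are all assumptions for the sketch.

```python
DEFAULT_THRESHOLD = 25  # "good for lots of people"

def personal_weight(ctr_with, ctr_without, low=0.5, high=2.0):
    """Weight a feature by how much more (or less) often this member clicks
    suggestions that have it, clamped to the range [low, high]."""
    if ctr_without <= 0:
        return 1.0  # no baseline to compare against, keep the default weight
    return max(low, min(high, ctr_with / ctr_without))

def personal_threshold(clicked_scores, default=DEFAULT_THRESHOLD):
    """Never show guesses scoring below the lowest one this member clicked."""
    if not clicked_scores:
        return default
    return max(default, min(clicked_scores))

weights = {
    "toastmasters": personal_weight(0.30, 0.12),  # clicks 30% vs 12%
    "xyz": personal_weight(0.0, 0.12),            # never clicks XYZ people
}
threshold = personal_threshold([30, 42, 35])       # scores of clicked guesses
```

With Jane's numbers this gives two points for Toastmasters, half a point for XYZ, and a personal cut-off of 30.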
Now let’s add 3rd order effects for LinkedIn suggestions.
Dave is now a member of LinkedIn.
Dave knows Joe Smith. Dave knows Jane. That’s support for the idea of “Jane knows Joe Smith” – let’s give that idea a point.
Oh! Jackpot! Dave uploaded his contact list! Joe Smith is there, Jane is there – they gotta know each other!!
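A sketch of this third-party support: every pair of people known by the same member gets a point, and every pair appearing in the same uploaded contact list gets a much bigger boost. The 1 point is from the account above; the 50 points for a shared contact list is an assumed value (the account only says "Jackpot!"), and the function and parameter names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def third_party_support(known_by, contact_lists, known_points=1, upload_points=50):
    """Score pairs of people using evidence supplied by a third member."""
    support = defaultdict(float)
    # A member who knows both A and B lends a little support to "A knows B".
    for known in known_by.values():
        for a, b in combinations(sorted(set(known)), 2):
            support[(a, b)] += known_points
    # Two names in the same uploaded contact list is much stronger evidence.
    for contacts in contact_lists.values():
        for a, b in combinations(sorted(set(contacts)), 2):
            support[(a, b)] += upload_points
    return support

support = third_party_support(
    known_by={"Dave": ["Joe Smith", "Jane"]},
    contact_lists={"Dave": ["Joe Smith", "Jane"]},
)
```

Pairs are stored in alphabetical order, so the guess about Jane and Joe lives under the key `("Jane", "Joe Smith")`.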
Let’s improve the algorithm with some clustering effects:
Jane knows Dave, Louis, Alan, and Helen.
Joe Smith knows Dave, Alan, and Helen.
Alan knows Jane, Louis, Helen, and Dave.
That’s a pretty good cluster — 1 point for the idea that Jane knows Joe.
And, hey, while I’m at it, 1 point for Louis knows Joe.
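A crude stand-in for this kind of clustering is to count mutual acquaintances: if two people who are not yet linked share enough of them, give the pair a point. The sketch below uses exactly the "knows" statements above, treats them as two-way, and takes `min_common` as an assumed tuning knob; real clustering would likely be more sophisticated.

```python
from collections import defaultdict
from itertools import combinations

stated = {
    "Jane": ["Dave", "Louis", "Alan", "Helen"],
    "Joe": ["Dave", "Alan", "Helen"],
    "Alan": ["Jane", "Louis", "Helen", "Dave"],
}

# Treat "knows" as mutual: build an undirected acquaintance graph.
neighbors = defaultdict(set)
for person, known in stated.items():
    for other in known:
        neighbors[person].add(other)
        neighbors[other].add(person)

def cluster_points(neighbors, min_common):
    """One point for each unlinked pair with at least min_common mutual acquaintances."""
    points = {}
    for a, b in combinations(sorted(neighbors), 2):
        if b in neighbors[a]:
            continue  # already linked, nothing to guess
        if len(neighbors[a] & neighbors[b]) >= min_common:
            points[(a, b)] = 1
    return points

strict = cluster_points(neighbors, min_common=3)
loose = cluster_points(neighbors, min_common=1)
```

With a bar of three mutual acquaintances, Jane and Joe qualify. Louis and Joe share only Alan, so the "while I'm at it" point for them corresponds to a looser bar such as `min_common=1`.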
Making your head spin yet? No? Ok, 4th order effects:
Jane knows Helen. Helen knows Mark. Mark went to the same high school as Joe.
Another point for Jane knows Joe.
Hey, Mark just edited his profile, he went to both high school and college with Joe!
That’s another point for Jane knows Joe.
More history effects for LinkedIn suggestions:
Remember I uploaded my contact list, 10 paragraphs ago?
Well, Luddite Louis SC was in that list — but wasn’t a member of LinkedIn. But he just joined! Yummy! Since Dave knows Louis, let’s ask Louis if he knows Dave (this is how they can make good guesses immediately after you join the site). Oh, and next time Dave shows up at the site, remember to tell him that Louis is now a member!
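The "history" trick amounts to remembering uploaded contacts who are not members yet, then acting on that record when they join. A minimal sketch follows, with made-up function names and plain names used as keys (a real system would presumably match on email addresses):

```python
from collections import defaultdict

members = {"Dave"}
pending = defaultdict(set)  # non-member -> members who uploaded them as a contact

def upload_contacts(uploader, contacts):
    """Remember contacts who are not members yet."""
    for contact in contacts:
        if contact not in members:
            pending[contact].add(uploader)

def join(new_member):
    """On sign-up, return instant suggestions plus notices for the uploaders."""
    members.add(new_member)
    uploaders = pending.pop(new_member, set())
    suggestions = sorted(uploaders)  # "do you know Dave?"
    notices = {u: f"{new_member} is now a member" for u in uploaders}
    return suggestions, notices

upload_contacts("Dave", ["Louis", "Jane"])
suggestions, notices = join("Louis")
```

This is why a brand-new member can see good guesses right away: the evidence was collected before they ever signed up.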
Ok, 5th order effects – take 4th order, and add another person to the chain.
Decrease the point scores, ‘cus it’s getting to be real fuzzy around here.
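One way to picture the higher-order effects is to measure how many hops separate two people and shrink the points as the chain grows. In the sketch below, the three-hop chain Jane, Helen, Mark, Joe earns the full point from the 4th-order example, and each extra person halves it. The halving, the function names, and the extra person "Nina" are assumptions for illustration.

```python
from collections import deque

def hops_between(graph, start, goal):
    """Breadth-first search: number of links on the shortest chain, or None."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, dist = queue.popleft()
        if person == goal:
            return dist
        for nxt in graph.get(person, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def chain_points(hops, full_points_hops=3, decay=0.5):
    """Full point up to full_points_hops links, then decay per extra link."""
    if hops is None:
        return 0.0
    return decay ** max(0, hops - full_points_hops)

fourth_order = {"Jane": ["Helen"], "Helen": ["Mark"], "Mark": ["Joe"]}
fifth_order = {"Jane": ["Helen"], "Helen": ["Mark"], "Mark": ["Nina"], "Nina": ["Joe"]}

p4 = chain_points(hops_between(fourth_order, "Jane", "Joe"))
p5 = chain_points(hops_between(fifth_order, "Jane", "Joe"))
```

The longer chain still counts as evidence, just less of it.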
Hmm, Jane let us know her Twitter handle. We buy the data feed from Twitter, let’s see who she mentions in her tweets – those are good clues about who she knows.
(While some people might consider this spying, it’s all disclosed as part of the terms and conditions in the click-through license that she agreed to when signing up for LinkedIn and Twitter – so it must be OK, right? Geez, it’s right there on page 23 of 543, that’s practically on the cover sheet!)
Oh, WOW, you mean I can just fetch a page from Facebook and get names of people Jane knows?
Hey, Jane just sent a contact request to Peter (via the LinkedIn mail interface) – and she mentioned the name Joe Smith! Wow, she’s practically MARRIED to Joe! I gotta ask her if she knows him!
Let’s take the person in question, who wondered “How did LinkedIn know that I exchanged emails with the Airbnb person?” Ok, so I offer my place on Airbnb, and Random Jane contacts me about it. What’s the first thing I do? Check Random Jane’s reputation – likely on LinkedIn. LinkedIn will remember “Dave was interested in Random Jane” and conclude “Maybe Random Jane knows Dave”.
I doubt LinkedIn is spying on your email — except for email you send USING LinkedIn’s site.
However, if you give LinkedIn permission to access your contact list on Google, Yahoo, Hotmail, etc., they likely keep a copy of your password – and may actually fetch the list again a month later. (And, almost certainly, this would be spelled out in the Terms of Service that no one reads.)
And even if you told Facebook “don’t share my info with LinkedIn” — bugs show up all the time, particularly with Facebook privacy controls – and once the info is copied, it’s “in the wild” and will be preserved by someone.
So that’s a rough account of how LinkedIn finds people to recommend to you for connecting… What did you learn from this? I, for one, am wondering how all the rest of my social media worlds are connected – it raises a lot of interesting questions…
UPDATE: June 13, 2014 LinkedIn has lost a court ruling, which allows a case to proceed over its sending a sequence of 3 emails to a person’s contacts to solicit them to join LinkedIn. The judge indicated that it was reasonable for a person to believe they were agreeing to a single invitation, but not to 2 more follow-up invitations to their contacts. This comes on top of a Facebook lawsuit about using people’s names without their permission – they lost that one as well.
However, LinkedIn won the round in this case regarding some of the possible sources of connection referrals mentioned in my blog entry here. A quote from the article by MediaPost mentioned below:
“The users also alleged that the social networking service violated a federal wiretap law by “hacking” into their email accounts to harvest their contacts’ addresses. Koh sided with LinkedIn on that point, ruling that users were able to “opt out of the harvesting process” before LinkedIn uploaded the information.”
UPDATE: February 13, 2015 Looks like LinkedIn was unable to get all the charges in their class action suit dismissed and they’re in the middle of settling:
The social networking company and lawyers for a group of consumers say in a status report that they “have accepted a mediator’s proposal for a class action settlement subject to reaching agreement on remaining material terms and execution of a written settlement agreement.”
“The agreement calls for the company to pay $10 to each LinkedIn user who submits a claim; if more people than expected file claims, LinkedIn will add up to another $750,000 to the fund.”
LinkedIn will be changing some of its practices as a result of the settlement, including more disclosure about what it does when you click on allowing LinkedIn access to your email address books. Buyer beware from this point on…