7,500 Faceless Coders Paid in Bitcoin Built a Hedge Fund’s Brain
Richard Craib is a 29-year-old South African who runs a hedge fund in San
Francisco. Or
rather, he doesn’t run it. He leaves that to an artificially intelligent
system built by several thousand data scientists whose names he doesn’t
know.
Under the banner of a startup called Numerai, Craib and his team have built
technology that masks the fund’s trading data before sharing it with a vast
community of anonymous data scientists. Using a method similar to homomorphic
encryption, this tech works to ensure that the
scientists can’t see the details of the company’s proprietary trades,
but also organizes the data so that these scientists can build machine
learning models that analyze it and, in theory, learn better ways of
trading financial securities.
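To make the idea concrete, here is a minimal sketch of one classic homomorphic scheme, the Paillier cryptosystem, in which multiplying two ciphertexts yields an encryption of the sum of the hidden numbers. This is an illustration only: the article says Numerai uses "a method similar to" homomorphic encryption and does not say which, so nothing here should be read as the company's actual technique. The tiny primes and the helper names (`encrypt`, `decrypt`, `add_encrypted`) are assumptions chosen for readability, and the parameters are far too small to be secure.

```python
import math
import random

# Toy Paillier cryptosystem. The primes are tiny and purely illustrative;
# real deployments use primes hundreds of digits long.
P, Q = 17, 19
N = P * Q
N_SQ = N * N
G = N + 1
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p-1, q-1)
MU = pow(LAM, -1, N)  # modular inverse; needs Python 3.8+


def encrypt(message):
    """Encrypt an integer in [0, N) under the public key (N, G)."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:  # the random blinding value must be coprime to N
            break
    return (pow(G, message, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(ciphertext):
    """Recover the plaintext using the private values LAM and MU."""
    x = pow(ciphertext, LAM, N_SQ)
    return ((x - 1) // N * MU) % N


def add_encrypted(c1, c2):
    """Multiplying ciphertexts adds the hidden plaintexts (mod N)."""
    return (c1 * c2) % N_SQ


secret_a, secret_b = 20, 22
total = add_encrypted(encrypt(secret_a), encrypt(secret_b))
print(decrypt(total))  # 42
```

Whoever performs the multiplication never sees 20 or 22, only the ciphertexts, yet the holder of the private key can decrypt the correct sum. That is the general property that lets outsiders compute on data they cannot read.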
“We give away all our data,” says Craib, who studied
mathematics at Cornell University in New York before going to work for
an asset management firm in South Africa. “But we convert it into this
abstract form where people can build machine learning models for the
data without really knowing what they’re doing.”
He doesn’t know these data scientists because he recruits them online
and pays them for their trouble in a digital currency that can preserve
anonymity. “Anyone can submit predictions back to us,” he says. “If they
work, we pay them in bitcoin.”
So, to sum up: They aren’t privy to his data. He isn’t privy to them. And
because they work from encrypted data, they can’t use their machine learning
models on other data—and neither can he. But Craib believes the blind can lead
the blind to a better hedge fund.
Numerai’s fund has been trading stocks for a year. Though Craib declines to say
just how successful it has been, citing government regulations around the
release of such information, he does say it’s making money. And an increasingly
large number of big-name investors have pumped money into the company, including
the founder of Renaissance Technologies, an enormously successful “quant” hedge
fund driven by data analysis. Craib and company have just completed their first
round of venture funding, led by the New York venture capital firm Union Square
Ventures. Union Square has invested $3 million in the round, with an additional
$3 million coming from others.
Hedge funds have been
exploring the use of machine learning algorithms for a while now, including
established Wall Street names like Renaissance and Bridgewater Associates as
well as tech startups like Sentient Technologies and Aidyia. But Craib’s venture
represents a new effort to crowdsource the creation of these algorithms.
Others are working on
similar projects, including Two Sigma, a second data-centric New York hedge
fund. But Numerai is attempting something far more extreme.
The company comes across as some sort of Silicon Valley gag: a tiny startup
that seeks to reinvent the financial industry through artificial intelligence,
encryption, crowdsourcing, and bitcoin. All that’s missing is the virtual
reality. And to be sure, it’s still very early for Numerai. Even one of its
investors, Union Square partner Andy Weissman, calls it an “experiment.”
But others are working on similar technology that can help build machine
learning models more generally from encrypted data, including
researchers at Microsoft. This can help companies like Microsoft better
protect all the personal information they gather from customers. Oren Etzioni,
the CEO of the Allen Institute for AI, says the approach could be particularly
useful for Apple, which is pushing into machine learning while taking a hardline
stance on data privacy. But such tech can also lead to the kind of AI
crowdsourcing that Craib espouses.
On the Edge
Craib dreamed up the idea while working for that financial firm in South
Africa. He declines to name the firm, but says it runs an asset management fund
spanning $15 billion in assets. He helped build machine learning algorithms that
could help run this fund, but these weren’t all that complex. At one point, he
wanted to share the company’s data with a friend who was doing more advanced
machine learning work with neural networks, and the company forbade him. But its
stance gave him an idea.
“That’s when I started looking into these new ways of encrypting data—looking
for a way of sharing the data with him without him being able to steal it and
start his own hedge fund,” he says.
The result was Numerai. Craib put a million dollars of his own money in the
fund, and in April, the company announced $1.5 million in funding from a group
that included Howard Morgan, one of the founders of Renaissance Technologies.
Morgan has invested again in the Series A round alongside Union Square and First
Round Capital.
It’s an unorthodox play, to be sure. This is obvious as soon as you visit the
company’s website, where Craib describes the company’s mission in a short video.
He’s dressed in black-rimmed glasses and a silver racer jacket, and the video
cuts him into a visual landscape reminiscent of The Matrix. “When we
saw those videos, we thought: ‘this guy thinks differently,’” says Weissman.
As Weissman admits, the question is whether the scheme will work. The trouble
with homomorphic encryption is that it can significantly slow down data analysis
tasks. “Homomorphic encryption requires a tremendous amount of computation time,”
says Ameesh Divatia, the CEO of Baffle, a company that’s building encryption
similar to what Craib describes. “How do you get it to run inside a business
decision window?” Craib says that Numerai has solved the speed problem with its
particular form of encryption, but Divatia warns that this may come at the
expense of data privacy.
According to Raphael Bost, a visiting scientist at MIT’s Computer Science and
Artificial Intelligence Laboratory who has explored the use of machine learning
with encrypted data, Numerai is likely using a method similar to the one
described by Microsoft, where the data is encrypted but not in a completely
secure way. “You have to be very careful with side-channels on the algorithm
that you are running,” he says of anyone who uses this method.
Turning Off the Sound at a Party
In any event, Numerai is ramping up its effort. Three months ago, about 4,500
data scientists had built about 250,000 machine learning models that drove about
7 billion predictions for the fund. Now, about 7,500 data scientists are
involved, building a total of 500,000 models that drive about 28 billion
predictions. As with the crowdsourced data science marketplace Kaggle, these
data scientists compete to build the
best models, and they can earn money in the process. For Numerai, part of the
trick is that this is done at high volume. Through a statistics and machine
learning technique called stacking or ensembling, Numerai can
combine the best of myriad algorithms to create a more powerful whole.
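As a rough illustration of why combining models helps, the sketch below averages the predictions of three hypothetical models and compares the error of the blend with the error of each model on its own. Plain averaging is only the simplest form of ensembling; the article does not describe how Numerai actually weights or stacks submissions, and the numbers and names here (`model_a`, `ensemble_average`, and so on) are invented for the example.

```python
def ensemble_average(predictions):
    """Average several models' predictions, one row at a time."""
    return [sum(row) / len(row) for row in zip(*predictions)]


def mean_squared_error(predicted, actual):
    """Mean of the squared differences between predictions and targets."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical targets and three hypothetical model outputs.
# Each model is wrong in a different place.
actual = [1.0, 0.0, 1.0, 0.0]
model_a = [0.9, 0.4, 0.6, 0.1]
model_b = [0.6, 0.1, 0.9, 0.4]
model_c = [0.7, 0.3, 0.7, 0.3]

combined = ensemble_average([model_a, model_b, model_c])

for name, preds in [("a", model_a), ("b", model_b), ("c", model_c),
                    ("combined", combined)]:
    print(name, round(mean_squared_error(preds, actual), 4))
```

Because the three models make their largest mistakes on different rows, the errors partly cancel in the average, and the blend scores better than any single model in this toy case. That cancellation is the intuition behind pooling many independent submissions.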
Though most of these data scientists are anonymous, a small handful are not,
including Phillip Culliton of Buffalo, New York, who also works for a data
analysis company called
Multimodel Research, which has a grant from the National Science Foundation.
He has spent many years competing in data science competitions on Kaggle and
sees Numerai as a more attractive option. “Kaggle is lovely and I enjoy
competing, but only the top few competitors get paid, and only in some
competitions,” he says. “The distribution of funds at Numerai among the top 100
or so competitors, in fairly large amounts at the top of the leaderboard, is
quite nice.”
Each week, one hundred scientists earn bitcoin, with the company paying out
over $150,000 in the digital currency so far. If the fund reaches a billion
dollars under management, Craib says, it would pay out over $1 million each
month to its data scientists.
Culliton says it’s more difficult to work with the encrypted data and draw
his own conclusions from it, and another Numerai regular, Jim Fleming, who helps
run a data science consultancy called the Fomoro Group, says much the same
thing. But this isn’t necessarily a problem. After all, machine learning is more
about the machine drawing the conclusions.
In many cases, even when working with unencrypted data, Culliton doesn’t know
what it actually represents, but he can still use it to build machine learning
models. “Encrypted data is like turning off the sound at the party,” Culliton
says. “You’re no longer listening in on people’s private conversations, but you
can still get very good signal on how close they feel to one another.”
If this works across Numerai’s larger community of data scientists, as
Richard Craib hopes it will, Wall Street will be listening more closely, too.
https://www.wired.com/2016/12/7500-faceless-coders-paid-bitcoin-built-hedge-funds-brain/