RICHARD CRAIB IS a 29-year-old South African who runs a hedge fund in San Francisco. Or rather, he doesn’t run it. He leaves that to an artificially intelligent system built by several thousand data scientists whose names he doesn’t know.

Under the banner of a startup called Numerai, Craib and his team have built technology that masks the fund’s trading data before sharing it with a vast community of anonymous data scientists. Using a method similar to homomorphic encryption, this tech works to ensure that the scientists can’t see the details of the company’s proprietary trades, but also organizes the data so that these scientists can build machine learning models that analyze it and, in theory, learn better ways of trading financial securities.
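To make the homomorphic idea concrete, here is a toy sketch using textbook (unpadded) RSA, which happens to be multiplicatively homomorphic: multiplying two ciphertexts gives the encryption of the product of the two plaintexts. It illustrates the general property only. It is not Numerai's scheme, which the article does not specify, and the tiny classroom key and the function names below are assumptions chosen for illustration; nothing about this key is secure.

```python
# Toy illustration only: textbook RSA with a tiny classroom key (insecure).
# Assumption: this is NOT Numerai's method; it only shows what
# "computing on encrypted data" means for one operation (multiplication).

P, Q = 61, 53            # small primes, for illustration only
N = P * Q                # public modulus (3233)
E = 17                   # public exponent
D = 2753                 # private exponent: (E * D) % ((P - 1) * (Q - 1)) == 1


def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < N with the public key."""
    return pow(m, E, N)


def decrypt(c: int) -> int:
    """Decrypt a ciphertext with the private key."""
    return pow(c, D, N)


def multiply_encrypted(c1: int, c2: int) -> int:
    """Multiply two ciphertexts without ever decrypting them."""
    return (c1 * c2) % N


if __name__ == "__main__":
    a, b = 7, 3
    c = multiply_encrypted(encrypt(a), encrypt(b))
    # The party doing the multiplication never sees a or b.
    print(decrypt(c) == a * b)  # holds whenever a * b < N
```

Schemes that support both addition and multiplication on ciphertexts are far more expensive than this toy, which is the speed concern raised about homomorphic encryption in general.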

“We give away all our data,” says Craib, who studied mathematics at Cornell University in New York before going to work for an asset management firm in South Africa. “But we convert it into this abstract form where people can build machine learning models for the data without really knowing what they’re doing.”

He doesn’t know these data scientists because he recruits them online and pays them for their trouble in a digital currency that can preserve anonymity. “Anyone can submit predictions back to us,” he says. “If they work, we pay them in bitcoin.”

So, to sum up: They aren’t privy to his data. He isn’t privy to them. And because they work from encrypted data, they can’t use their machine learning models on other data—and neither can he. But Craib believes the blind can lead the blind to a better hedge fund.

Numerai’s fund has been trading stocks for a year. Though he declines to say just how successful it has been, due to government regulations around the release of such information, he does say it’s making money. And an increasingly large number of big-name investors have pumped money into the company, including a founder of Renaissance Technologies, an enormously successful “quant” hedge fund driven by data analysis. Craib and company have just completed their first round of venture funding, led by the New York venture capital firm Union Square Ventures. Union Square has invested $3 million in the round, with an additional $3 million coming from others.

Hedge funds have been exploring the use of machine learning algorithms for a while now, including established Wall Street names like Renaissance and Bridgewater Associates as well as tech startups like Sentient Technologies and Aidyia. But Craib’s venture represents new efforts to crowdsource the creation of these algorithms. Others are working on similar projects, including Two Sigma, a second data-centric New York hedge fund. But Numerai is attempting something far more extreme.

On the Edge

The company comes across as some sort of Silicon Valley gag: a tiny startup that seeks to reinvent the financial industry through artificial intelligence, encryption, crowdsourcing, and bitcoin. All that’s missing is the virtual reality. And to be sure, it’s still very early for Numerai. Even one of its investors, Union Square partner Andy Weissman, calls it an “experiment.”

But others are working on similar technology that can help build machine learning models more generally from encrypted data, including researchers at Microsoft. This can help companies like Microsoft better protect all the personal information they gather from customers. Oren Etzioni, the CEO of the Allen Institute for AI, says the approach could be particularly useful for Apple, which is pushing into machine learning while taking a hardline stance on data privacy. But such tech can also lead to the kind of AI crowdsourcing that Craib espouses.

Craib dreamed up the idea while working for that financial firm in South Africa. He declines to name the firm, but says it runs an asset management fund spanning $15 billion in assets. He helped build machine learning algorithms that could help run this fund, but these weren’t all that complex. At one point, he wanted to share the company’s data with a friend who was doing more advanced machine learning work with neural networks, and the company forbade him. But its stance gave him an idea. “That’s when I started looking into these new ways of encrypting data—looking for a way of sharing the data with him without him being able to steal it and start his own hedge fund,” he says.

The result was Numerai. Craib put a million dollars of his own money in the fund, and in April, the company announced $1.5 million in funding from Howard Morgan, one of the founders of Renaissance Technologies. Morgan has invested again in the Series A round alongside Union Square and First Round Capital.

It’s an unorthodox play, to be sure. This is obvious just when you visit the company’s website, where Craib describes the company’s mission in a short video. He’s dressed in black-rimmed glasses and a silver racer jacket, and the video cuts him into a visual landscape reminiscent of The Matrix. “When we saw those videos, we thought: ‘this guy thinks differently,’” says Weissman.

As Weissman admits, the question is whether the scheme will work. The trouble with homomorphic encryption is that it can significantly slow down data analysis tasks. “Homomorphic encryption requires a tremendous amount of computation time,” says Ameesh Divatia, the CEO of Baffle, a company that is building encryption similar to what Craib describes. “How do you get it to run inside a business decision window?” Craib says that Numerai has solved the speed problem with its particular form of encryption, but Divatia warns that this may come at the expense of data privacy.

According to Raphael Bost, a visiting scientist at MIT’s Computer Science and Artificial Intelligence Laboratory who has explored the use of machine learning with encrypted data, Numerai is likely using a method similar to the one described by Microsoft, where the data is encrypted but not in a completely secure way. “You have to be very careful with side-channels on the algorithm that you are running,” he says of anyone who uses this method.

Turning Off the Sound at a Party

In any event, Numerai is ramping up its effort. Three months ago, about 4,500 data scientists had built about 250,000 machine learning models that drove about 7 billion predictions for the fund. Now, about 7,500 data scientists are involved, building a total of 500,000 models that drive about 28 billion predictions. As with the crowdsourced data science marketplace Kaggle, these data scientists compete to build the best models, and they can earn money in the process. For Numerai, part of the trick is that this is done at high volume. Through a statistics and machine learning technique called stacking or ensembling, Numerai can combine the best of myriad algorithms to create a more powerful whole.
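The intuition behind ensembling can be sketched with synthetic data. In the minimal example below, each "model" is simulated as the true value plus its own independent noise, and the ensemble is a plain average of all the models' predictions. This is an assumption-laden illustration, not Numerai's actual method, which the article does not describe; the function names and parameters are hypothetical.

```python
# A minimal sketch of ensembling by simple averaging, on synthetic data.
# Assumptions: the "models" are simulated as truth plus independent noise;
# Numerai's actual combination method is not described in the article.
import random


def mse(preds, truth):
    """Mean squared error of one set of predictions."""
    return sum((p - t) ** 2 for p, t in zip(preds, truth)) / len(truth)


def average_ensemble(model_preds):
    """Average the predictions of several models, example by example."""
    n_models = len(model_preds)
    return [sum(col) / n_models for col in zip(*model_preds)]


def simulate(n_models=25, n_examples=2000, noise=1.0, seed=0):
    """Return (list of individual MSEs, MSE of the averaged ensemble)."""
    rng = random.Random(seed)
    truth = [rng.uniform(-1, 1) for _ in range(n_examples)]
    models = [[t + rng.gauss(0, noise) for t in truth] for _ in range(n_models)]
    individual = [mse(m, truth) for m in models]
    combined = mse(average_ensemble(models), truth)
    return individual, combined


if __name__ == "__main__":
    individual, combined = simulate()
    print(f"mean individual MSE: {sum(individual) / len(individual):.3f}")
    print(f"ensemble MSE:        {combined:.3f}")
```

Averaging is the simplest form of ensembling. Stacking goes a step further and trains a second-level model on the first-level predictions, so that better models receive more weight. The gain shown here depends on the models' errors being at least partly independent.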

Though most of these data scientists are anonymous, a small handful are not, including Phillip Culliton of Buffalo, New York, who also works for a data analysis company called Multimodel Research, which has a grant from the National Science Foundation. He has spent many years competing in data science competitions on Kaggle and sees Numerai as a more attractive option. “Kaggle is lovely and I enjoy competing, but only the top few competitors get paid, and only in some competitions,” he says. “The distribution of funds at Numerai among the top 100 or so competitors, in fairly large amounts at the top of the leaderboard, is quite nice.”

Each week, one hundred scientists earn bitcoin, with the company paying out over $150,000 in the digital currency so far. If the fund reaches a billion dollars under management, Craib says, it would pay out over $1 million each month to its data scientists.

Culliton says it’s more difficult to work with the encrypted data and draw his own conclusions from it, and another Numerai regular, Jim Fleming, who helps run a data science consultancy called the Fomoro Group, says much the same thing. But this isn’t necessarily a problem. After all, machine learning is more about the machine drawing the conclusions.

In many cases, even when working with unencrypted data, Culliton doesn’t know what it actually represents, but he can still use it to build machine learning models. “Encrypted data is like turning off the sound at the party,” Culliton says. “You’re no longer listening in on people’s private conversations, but you can still get very good signal on how close they feel to one another.”

If this works across Numerai’s larger community of data scientists, as Richard Craib hopes it will, Wall Street will be listening more closely, too.



It took Francis Galton several years to figure out that correlation and regression are not two concepts – they are different perspectives on the same concept: whenever the correlation between two scores is imperfect, there will be regression to the mean …

Causal explanations will be evoked when regression is detected, but they will be wrong because the truth is that regression to the mean has an explanation but does not have a cause.
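A short simulation makes the point visible. The sketch below draws pairs of scores with an imperfect correlation, picks out the people who scored high the first time, and compares their average first and second scores. The standard-normal scores, the correlation of 0.5, and the cutoff of 1.0 are all assumptions chosen purely for illustration.

```python
# Simulation sketch of regression to the mean for two imperfectly
# correlated scores. Assumptions: standard-normal scores and a
# correlation of 0.5, both chosen purely for illustration.
import math
import random


def simulate_pairs(r=0.5, n=20000, seed=1):
    """Return (first, second) score pairs with correlation about r."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = r * x + math.sqrt(1 - r * r) * rng.gauss(0, 1)
        pairs.append((x, y))
    return pairs


def top_group_means(pairs, cutoff=1.0):
    """Mean first and second score among those whose first score > cutoff."""
    top = [(x, y) for x, y in pairs if x > cutoff]
    mean_x = sum(x for x, _ in top) / len(top)
    mean_y = sum(y for _, y in top) / len(top)
    return mean_x, mean_y


if __name__ == "__main__":
    mean_first, mean_second = top_group_means(simulate_pairs())
    print(f"top group, first score:  {mean_first:.2f}")
    print(f"top group, second score: {mean_second:.2f}")
```

The high scorers' second average falls back toward zero by roughly the factor r, even though nothing in the simulation acts on them between the two scores. The drop is built into the imperfect correlation itself.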

Diarmid Weir Says:

September 10, 2011 at 11:04 am

Each individual will have different opportunity costs; predicting their choices is not possible.

They don’t need to be predicted for competitive pressure to bring the price down to the point at which they bite.

Given that competition exists. Given that competition lowers prices. We are still left with individuals who will make choices. You cannot predict those choices.

…that window of opportunity grows ever smaller, measured in hours perhaps.

I think you sadly underestimate the wiles of capitalists and the limits to most individual consumers’ capacity to process information!

Social media hardly concerns itself with complex information. Social media is gossip. Gossip is easy to process as it is vacuous. However, in the marketplace where products gain or lose reputation, gossip trumps advertising. Gossip originates from friends and family, and we know they are trustworthy and well informed!

Egalitarianism is a myth, and not even desirable, never mind achievable.

Can a myth be desirable or achievable?

It can certainly be desired.

I think this statement is a bit incoherent, to say the least. What do you mean by egalitarianism, anyway? Do you believe that any level of inequality should be quite acceptable to everybody, as long as it occurs without violating ‘property rights’?

Individuals are not equal. That is partly genetic, partly environment. Either way, some will always outperform others and gain proportionately more. As long as they do not violate the property rights of others pursuing this end, that is perfectly acceptable, and will in point of fact improve conditions for the less capable along the way.

Does it actually matter what you believe given that violent redistribution will have taken place long before one man (and it would of course be a man!) owns and controls everything?

We have already discussed the proposition that the free market can provide all the checks and balances currently performed by government, far more efficiently and equitably than government.

As this is clearly not possible, the calculation of a case probability is not possible.

Ouh, you Keynesian, you!

There you go, trying to hurt my feelings again.

I did put probability in quotes for a reason. You can’t tell me that a serious entrepreneur doesn’t apply some sort of subjective probability to any risky undertaking.

Of course he does. He can call it whatever he wants. However unless it is based on a class probability, it is not a probability at all. I must have missed the “”.

Otherwise he can have no special skill worth considering. He/she is just more impulsive than the rest of us. Now this probability may be nowhere near accurate, but it does help to determine his/her target profit. To find out if that’s fair or not will take a bit of negotiation – which requires equality of power and information on both sides.

There is no fair/unfair. If he achieves his target, or exceeds his target, then obviously he has forecast accurately/luckily and obtains his profit. If it is truly excessive, which if he is the first to market with a new product is likely/possible, competition will soon enter the market reducing the profit margins. This is the nature of the free market, unless of course government intervenes and grants him a patent which extends his high profit margin into a monopoly gain.

Note that your acceptance of this sort of uncertainty coupled with your scepticism over social/economic prediction completely removes any possibility whatsoever of claiming any benefit for a purely free-market system. You are yet again left with only your a priori property rights for solace.

Not at all.

Then both LIDL and Tesco are in trouble.

So you see – adjusting prices to production costs does make a difference.

Of course it makes a difference to production. But not to value. It was value that we were talking about.

The Kirk Report completed a survey of Hopes & Fears with the results:



Mr Kirk also had some conclusions.

In my experience, usually the things we fear the most are the things that impact the market the least. I’ve seen this happen time and time again. Much like life, the things we don’t know to worry about is usually what we should be worried about the most. Likewise, the things we hope from the market, usually work against us like they have for many who’ve been sidelined or short since last March and who are hoping for a correction now.

I would have to totally disagree with Mr Kirk. In this case, the group has pretty much hit the nail on the head.

Government is the central player, first in creating the crisis, second in prolonging the crisis through debt, debasement, and economic mismanagement. As such, the market is all about government. Score one for the crowd.

As to hopes, the crowd is at least consistent: a correction would certainly increase volatility.

Expectancy is your profit percentage per win multiplied by your win rate, minus your loss percentage per loss multiplied by your loss rate. I will use an example of expectancy from Dr. Van K. Tharp’s book, Trade Your Way to Financial Freedom:

Expectancy = (Probability of Win * Average Win) - (Probability of Loss * Average Loss)
Expectancy = (PW * AW) - (PL * AL)

PW is the probability of winning and PL is the probability of losing.
AW is the average gain (win) and AL is the average loss.

So let’s do an example using another basic approach (assume $12,500 per position, a $100,000 portfolio using 1% equity risk):

If my trades are successful 40% of the time and I realize an average profit of 20% but I lose an average of 5%, my expectancy is $625 per trade.

(0.4 * $2,500) – (0.6 * $625) = $1,000-$375 = $625

I lose 60% of the time yet I show a profit of $625 per trade. If I have a system that produces 65 trades per year, I would realize an annual gain of $40,625 (hypothetical scenario). That is roughly a 40% gain on the original $100,000 (before commissions, fees, taxes and compounding).

Let’s look at the calculation one more time using only percentages:
PW: 40%
AW: 20%
PL: 60%
AL: 5%
(40% * 20%) – (60% * 5%) = 5.00%

What this tells me is that I have a positive expectancy of 5% or $625 per trade from the original $12,500. It doesn’t mean that I will make $625 on every single trade but my system will average a profit of $625 per trade over the course of a year with a combination of winners and losers. I can always make more trades or fewer trades in a year so my total profit will be adjusted accordingly.
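The same arithmetic can be written as a small function, which makes it easy to try other win rates and payoffs. This is a minimal sketch of the formula above; the function name is hypothetical, and it assumes every trade is either a win or a loss, so that PL = 1 - PW.

```python
# Sketch of the expectancy formula from the text:
#   Expectancy = (PW * AW) - (PL * AL)
# Assumption: every trade is either a win or a loss, so PL = 1 - PW.


def expectancy(p_win: float, avg_win: float, avg_loss: float) -> float:
    """Expected result per trade, in the same units as avg_win/avg_loss."""
    p_loss = 1.0 - p_win
    return p_win * avg_win - p_loss * avg_loss


if __name__ == "__main__":
    position = 12_500  # dollars per position, from the example above
    pct = expectancy(0.40, 0.20, 0.05)  # as a fraction of the position
    dollars = expectancy(0.40, 0.20 * position, 0.05 * position)
    print(f"{pct:.2%} per trade")                  # 5.00% per trade
    print(f"${dollars:,.2f} per trade")            # $625.00 per trade
    print(f"${dollars * 65:,.2f} over 65 trades")  # $40,625.00 over 65 trades
```

Expectancy is an average over many trades, so a system with a low win rate can still be profitable as long as the average win is large enough relative to the average loss.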