


Richard Craib is a 29-year-old South African who runs a hedge fund in San Francisco. Or rather, he doesn’t run it. He leaves that to an artificially intelligent system built by several thousand data scientists whose names he doesn’t know.

Under the banner of a startup called Numerai, Craib and his team have built technology that masks the fund’s trading data before sharing it with a vast community of anonymous data scientists. Using a method similar to homomorphic encryption, this tech works to ensure that the scientists can’t see the details of the company’s proprietary trades, but also organizes the data so that these scientists can build machine learning models that analyze it and, in theory, learn better ways of trading financial securities.
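Numerai has not published its masking scheme, but the general flavor of “abstracted” data can be sketched in a few lines: transform each feature so that its scale, units, and name reveal nothing about the underlying securities, while preserving enough structure for model fitting. The transformer, column names, and data layout below are assumptions for illustration only, not Numerai’s actual method.

```python
# Illustrative sketch only: a structure-preserving masking step, NOT Numerai's
# actual (unpublished) encryption. Assumes a pandas DataFrame of raw features.
import pandas as pd
from sklearn.preprocessing import QuantileTransformer

def abstract_features(raw: pd.DataFrame, target_col: str) -> pd.DataFrame:
    features = raw.drop(columns=[target_col])
    # Map every feature onto a uniform [0, 1] distribution so its original
    # scale and units no longer hint at what the column represents.
    masked_values = QuantileTransformer(output_distribution="uniform").fit_transform(features)
    # Replace descriptive column names with anonymous ones.
    masked = pd.DataFrame(masked_values,
                          columns=[f"feature_{i}" for i in range(masked_values.shape[1])])
    masked["target"] = raw[target_col].to_numpy()
    return masked
```

An outside data scientist could fit any model to the masked frame and send back predictions without ever learning which securities or signals the columns describe.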

“We give away all our data,” says Craib, who studied mathematics at Cornell University in New York before going to work for an asset management firm in South Africa. “But we convert it into this abstract form where people can build machine learning models for the data without really knowing what they’re doing.”

He doesn’t know these data scientists because he recruits them online and pays them for their trouble in a digital currency that can preserve anonymity. “Anyone can submit predictions back to us,” he says. “If they work, we pay them in bitcoin.”


So, to sum up: They aren’t privy to the details of his trading data. He isn’t privy to their identities. And because they work from encrypted data, they can’t use their machine learning models on other data, and neither can he. But Craib believes the blind can lead the blind to a better hedge fund.

Numerai’s fund has been trading stocks for a year. Though he declines to say just how successful it has been, citing government regulations around the release of such information, he does say it’s making money. And a growing number of big-name investors have pumped money into the company, including a founder of Renaissance Technologies, an enormously successful “quant” hedge fund driven by data analysis. Craib and company have just completed their first round of venture funding, led by the New York venture capital firm Union Square Ventures. Union Square has invested $3 million in the round, with an additional $3 million coming from others.

Hedge funds have been exploring the use of machine learning algorithms for a while now, including established Wall Street names like Renaissance and Bridgewater Associates as well as tech startups like Sentient Technologies and Aidyia. But Craib’s venture represents new efforts to crowdsource the creation of these algorithms. Others are working on similar projects, including Two Sigma, a second data-centric New York hedge fund. But Numerai is attempting something far more extreme.

On the Edge

The company comes across as some sort of Silicon Valley gag: a tiny startup that seeks to reinvent the financial industry through artificial intelligence, encryption, crowdsourcing, and bitcoin. All that’s missing is the virtual reality. And to be sure, it’s still very early for Numerai. Even one of its investors, Union Square partner Andy Weissman, calls it an “experiment.”

But others are working on similar technology that can help build machine learning models more generally from encrypted data, including researchers at Microsoft. This can help companies like Microsoft better protect all the personal information they gather from customers. Oren Etzioni, the CEO of the Allen Institute for AI, says the approach could be particularly useful for Apple, which is pushing into machine learning while taking a hardline stance on data privacy. But such tech can also enable the kind of AI crowdsourcing that Craib espouses.

Craib dreamed up the idea while working for that financial firm in South Africa. He declines to name the firm, but says it runs an asset management fund spanning $15 billion in assets. He helped build machine learning algorithms that could help run this fund, but these weren’t all that complex. At one point, he wanted to share the company’s data with a friend who was doing more advanced machine learning work with neural networks, and the company forbade him. But its stance gave him an idea. “That’s when I started looking into these new ways of encrypting data—looking for a way of sharing the data with him without him being able to steal it and start his own hedge fund,” he says.

The result was Numerai. Craib put a million dollars of his own money in the fund, and in April, the company announced $1.5 million in funding from Howard Morgan, one of the founders of Renaissance Technologies. Morgan has invested again in the Series A round alongside Union Square and First Round Capital.

It’s an unorthodox play, to be sure. That much is obvious the moment you visit the company’s website, where Craib describes the company’s mission in a short video. He’s dressed in black-rimmed glasses and a silver racer jacket, and the video cuts him into a visual landscape reminiscent of The Matrix. “When we saw those videos, we thought: ‘this guy thinks differently,’” says Weissman.

As Weissman admits, the question is whether the scheme will work. The trouble with homomorphic encryption is that it can significantly slow down data analysis tasks. “Homomorphic encryption requires a tremendous amount of computation time,” says Ameesh Divatia, the CEO of Baffle, a company that is building encryption similar to what Craib describes. “How do you get it to run inside a business decision window?” Craib says that Numerai has solved the speed problem with its particular form of encryption, but Divatia warns that this may come at the expense of data privacy.

According to Raphael Bost, a visiting scientist at MIT’s Computer Science and Artificial Intelligence Laboratory who has explored the use of machine learning with encrypted data, Numerai is likely using a method similar to the one described by Microsoft, where the data is encrypted but not in a completely secure way. “You have to be very careful with side-channels on the algorithm that you are running,” he says of anyone who uses this method.

Turning Off the Sound at a Party

In any event, Numerai is ramping up its effort. Three months ago, about 4,500 data scientists had built about 250,000 machine learning models that drove about 7 billion predictions for the fund. Now, about 7,500 data scientists are involved, building a total of 500,000 models that drive about 28 billion predictions. As with the crowdsourced data science marketplace Kaggle, these data scientists compete to build the best models, and they can earn money in the process. For Numerai, part of the trick is that this is done at high volume. Through a statistics and machine learning technique called stacking or ensembling, Numerai can combine the best of myriad algorithms to create a more powerful whole.
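Stacking itself is a standard technique. A minimal scikit-learn sketch on synthetic data looks roughly like this; the base models, parameters, and data are placeholders, not Numerai’s production pipeline.

```python
# Minimal stacking sketch: several base models are trained, then a meta-model
# learns how to combine their out-of-fold predictions into one stronger signal.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("boost", GradientBoostingClassifier(random_state=0)),
    ],
    final_estimator=LogisticRegression(),  # the meta-model that blends the base predictions
    cv=5,
)
stack.fit(X_train, y_train)
print("held-out accuracy:", stack.score(X_test, y_test))
```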

Though most of these data scientists are anonymous, a small handful are not, including Phillip Culliton of Buffalo, New York, who also works for a data analysis company called Multimodel Research, which has a grant from the National Science Foundation. He has spent many years competing in data science competitions on Kaggle and sees Numerai as a more attractive option. “Kaggle is lovely and I enjoy competing, but only the top few competitors get paid, and only in some competitions,” he says. “The distribution of funds at Numerai among the top 100 or so competitors, in fairly large amounts at the top of the leaderboard, is quite nice.”

Each week, one hundred scientists earn bitcoin, with the company paying out over $150,000 in the digital currency so far. If the fund reaches a billion dollars under management, Craib says, it would pay out over $1 million each month to its data scientists.

Culliton says it’s more difficult to work with the encrypted data and draw his own conclusions from it, and another Numerai regular, Jim Fleming, who helps run a data science consultancy called the Fomoro Group, says much the same thing. But this isn’t necessarily a problem. After all, machine learning is more about the machine drawing the conclusions.

In many cases, even when working with unencrypted data, Culliton doesn’t know what it actually represents, but he can still use it to build machine learning models. “Encrypted data is like turning off the sound at the party,” Culliton says. “You’re no longer listening in on people’s private conversations, but you can still get very good signal on how close they feel to one another.”

If this works across Numerai’s larger community of data scientists, as Richard Craib hopes it will, Wall Street will be listening more closely, too.

 


Back in the days of printed newspapers, magazines, and newsletters, the acquisition of news and information was easier, or so it seemed. The reason it seemed easier is that there was much less of it. Today, with the internet, 24-hour financial media, blogs, and every conceivable method of acquisition, information is overwhelming. Once I realized that some information was actionable and most of the rest was merely observable, things became greatly simplified. Hopefully this article will shed some light on how to separate actionable information from the much larger pool of observable information. As you can see from the Webster definitions below, they do not initially seem that different.

Actionable – able to be used as a basis or reason for doing something or capable of being acted on.

Observable – possible to see or notice or deserving of attention; noteworthy.

However, when real money gets involved, the difference can be significant. Let me give you my definition and then follow up with some scenarios. The world is full of observable information being dispensed as if it were actionable. The experts, television pundits, talking heads, economists (especially them), most newsletter writers, and most blog authors – in fact, most of what you hear in regard to the markets – rarely offer anything actionable. Actionable means that you, upon seeing it, can make a decision to buy, sell, or do nothing – period.

I’ll start by mentioning Japanese candle patterns, a subject I have beaten to death in this blog over the past few months. I have never stated anything other than the fact that Japanese candle patterns should never be used in isolation; you should always use them in concert with other technical tools. Hence, Japanese candle patterns for me are observable information, not actionable. Only when backed up by Western technical tools can they become actionable. I demonstrated this in my article Candlestick Analysis – Putting It All Together.

Too often I hear the financial media discussing economic indicators and how they affect the stock market. They seem to forget that the stock market is itself one of the components of the index of LEADING indicators; in other words, the stock market is better at predicting the economy than the other way around. Economics can never be proved right or wrong, since it is an art, just like technical analysis. Economic data is primarily monthly, often quarterly, and occasionally weekly. It gets rebased periodically and often gets adjusted for seasonal effects and everything else you can think of. It just cannot reliably provide any valuable information to a trader or investor. However, boy does it sound good when someone creates a great story around it and how, at one time in the past, it occurred at a market top; it is truly difficult to ignore. Ignore it you should! The beauty of the data generated by the stock market, mainly price, is that it is an instantaneous view of supply and demand. I have said this a lot on these pages, but it needs to be fully understood: price, and price alone, reflects the decisions and actions of buyers and sellers. The analysis of price is at least a first step toward obtaining actionable information. Using technical tools that help you reduce price into information you can rely upon is where the actionable part surfaces.

I also seriously doubt anyone relies totally upon one technical tool or indicator. If they do, it is probably not for long. I managed a lot of money using a weight-of-the-evidence approach, which means I used a group of indicators drawn from price, breadth, and relative strength (I called it PBR – see graphic). Each individual indicator could be classified as observable, but when used in concert with the others, THEY became actionable.
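As a rough illustration of the idea, and only that, here is a toy composite in Python; the indicators, thresholds, and column names are hypothetical and are not the actual PBR model.

```python
# Toy "weight of the evidence" composite (hypothetical indicators and columns,
# not the actual PBR model): no single signal is acted on, only the combination.
import pandas as pd

def weight_of_evidence(df: pd.DataFrame) -> pd.Series:
    """df holds daily readings: close, advancers, decliners, sector_return, market_return."""
    signals = pd.DataFrame({
        "price": df["close"] > df["close"].rolling(200).mean(),      # price above its 200-day average
        "breadth": df["advancers"] > df["decliners"],                # more issues advancing than declining
        "rel_strength": df["sector_return"] > df["market_return"],   # relative strength versus the market
    })
    # The composite score (0 to 3) is what would be treated as actionable.
    return signals.sum(axis=1)
```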

I think the point of this entire article is to alert or remind you that there is a giant amount of information out there and that most of it is not actionable; it is only observable. Sometimes it is difficult to tell the difference, so just imagine putting real money into a trade based upon what you hear or read. The prospect of risking real money keeps a lot of people from acting on observable information, no matter how convincing it is.

I am really looking forward to speaking at ChartCon 2016.  The schedule shows me on at 10:30am PT where I’ll talk about the marketing of Wall Street disguised as research and show a couple of things about Technical Analysis that annoy me.

Dance with the Actionable Information,

Greg Morris

 


At its core, the blockchain is a technology that permanently records transactions in a way that cannot be later erased but can only be sequentially updated, in essence keeping a never-ending historical trail. This seemingly simple functional description has gargantuan implications. It is making us rethink the old ways of creating transactions, storing data, and moving assets, and that’s only the beginning.
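That “never-ending historical trail” can be illustrated with a toy hash chain: each block commits to the previous block’s hash, so altering any earlier record breaks every later link. This is only a sketch of the core idea; real blockchains add consensus, digital signatures, and a peer-to-peer network on top of it.

```python
# Toy append-only chain: rewriting history is detectable because every block
# commits to the hash of the block before it. Sketch only, not a real blockchain.
import hashlib
import json

def block_hash(transactions, prev_hash):
    payload = json.dumps({"transactions": transactions, "prev_hash": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def make_block(transactions, prev_hash):
    return {"transactions": transactions, "prev_hash": prev_hash,
            "hash": block_hash(transactions, prev_hash)}

def verify_chain(chain):
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block["transactions"], block["prev_hash"]):
            return False                      # block contents were altered after the fact
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False                      # the link to the previous block is broken
    return True

genesis = make_block(["Alice pays Bob 5"], prev_hash="0" * 64)
chain = [genesis, make_block(["Bob pays Carol 2"], prev_hash=genesis["hash"])]
print(verify_chain(chain))   # True
chain[0]["transactions"] = ["Alice pays Bob 500"]
print(verify_chain(chain))   # False: rewriting history breaks the trail
```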

The blockchain cannot be described just as a revolution. It is a tsunami-like phenomenon, slowly advancing and gradually enveloping everything along its way by the force of its progression. Plainly, it is the second significant overlay on top of the Internet, just as the Web was that first layer back in 1990. That new layer is mostly about trust, so we could call it the trust layer.

Blockchains are enormous catalysts for change that affect governance, ways of life, traditional corporate models, society and global institutions. Blockchain infiltration will be met with resistance, because it is an extreme change.

Blockchains defy old ideas that have been locked in our minds for decades, if not centuries. Blockchains will challenge governance and centrally controlled ways of enforcing transactions. For example, why pay an escrow to clear a title insurance if the blockchain can automatically check it in an irrefutable way?

Blockchains loosen up trust, which has been in the hands of central institutions (e.g., banks, policy makers, clearinghouses, governments, large corporations), and allows it to evade these old control points. For example, what if counterparty validation can be done on the blockchain, instead of by a clearinghouse?

An analogy would be when, in the 16th century, medieval guilds helped to maintain monopolies on certain crafts against outsiders, by controlling the printing of knowledge that would explain how to copy their work. They accomplished that type of censorship by being in cahoots with the Catholic Church and governments in most European countries that regulated and controlled printing by requiring licenses. That type of central control and monopoly didn’t last too long, and soon enough, knowledge was free to travel after an explosion in printing. To think of printing knowledge as an illegal activity would be unfathomable today. We could think of the traditional holders of central trust as today’s guilds, and we could question why they should continue holding that trust, if technology (the blockchain) performed that function as well or even better.

Blockchains liberate the trust function from outside existing boundaries in the same way as medieval institutions were forced to cede control of printing.

It is deceptive to view the blockchain primarily as a distributed ledger, because it represents only one of its many dimensions. It’s like describing the Internet as a network only, or as just a publishing platform. These are necessary but not sufficient conditions or properties; blockchains are also greater than the sum of their parts.

Blockchain proponents believe that trust should be free, and not in the hands of central forces that tax it, or control it in one form or another (e.g., fees, access rights, or permissions). They believe that trust can be and should be part of peer-to-peer relationships, facilitated by technology that can enforce it. Trust can be coded up, and it can be computed to be true or false by way of mathematically-backed certainty, that is enforced by powerful encryption to cement it. In essence, trust is replaced by cryptographic proofs, and trust is maintained by a network of trusted computers (honest nodes) that ensure its security, as contrasted with single entities who create overhead or unnecessary bureaucracy around it.

If blockchains are a new way to implement trusted transactions without trusted intermediaries, soon we’ll end up with intermediary-less trust. Policy makers who regulated “trusted” institutions like banks will face a dilemma. How can you regulate something that is evaporating? They will need to update their old regulations.

Intermediary-controlled trust came with some friction, but now, with the blockchain, we can have frictionless trust. So, when trust is “free” (even if it still needs to be earned), what happens next? Naturally, trust will follow the path of least resistance, and will become gradually decentralized towards the edges of the network.

Blockchains also enable assets and value to be exchanged, providing a new, speedy rail for moving value of all kinds without unnecessary intermediaries.

As back-end infrastructure, blockchains are metaphorically the ultimate, non-stop computers. Once launched, they never go down, because of the incredible amount of resiliency they offer.

There is no single point of failure. Bank systems have gone down and cloud-based services have gone down, but bona fide blockchains keep computing.

The Internet was about replacing some intermediaries. Now the blockchain is about replacing other intermediaries once again. But it’s also about creating new ones. And so was the Web. Current intermediaries will need to figure out how their roles will be affected, while others are angling to take a piece of the new pie in the race to “decentralize everything.”

The world is preoccupied with dissecting, analyzing and prognosticating on the blockchain’s future; technologists, entrepreneurs, and enterprises are wondering if it is to be considered vitamin or poison.

Today, we’re saying blockchain does this or that, but tomorrow blockchains will be rather invisible; we will talk more about what they enable. Just like the Internet or the Web, and just like databases, the blockchain brings with it a new language.

From the mid-1950s forward, as IT evolved, we became accustomed to a new language: mainframes, databases, networks, servers, software, operating systems, and programming languages. Since the early 1990s, the Internet ushered in another lexicon: browsing, website, Java, blogging, TCP/IP, SMTP, HTTP, URLs, and HTML. Today, the blockchain brings with it yet another new repertoire: consensus algorithms, smart contracts, distributed ledgers, oracles, digital wallets, and transaction blocks.

Block by block, we will accumulate our own chains of knowledge, and we will learn and understand the blockchain, what it changes, and the implications of such change.

Today, we Google for everything, mostly information or products.

Tomorrow, we will perform the equivalent of “googling” to verify records, identities, authenticity, rights, work done, titles, contracts, and other valuable asset-related processes. There will be digital ownership certificates for everything. Just like we cannot double spend digital money anymore (thanks to Satoshi Nakamoto’s invention), we will not be able to double copy or forge official certificates once they are certified on a blockchain. That was a missing piece of the information revolution, which the blockchain fixes.

I still remember the initial excitement around being able to track a shipped package on the Web when FedEx introduced this capability for the first time in 1994. Today, we take that type of service for granted, but this particular feature was a watershed use case that demonstrated what we could do on the early Web. The underlying message was that a previously enclosed private service could become openly accessible by anyone with Internet access. A whole host of services followed: online banking, filing taxes, buying products, trading stocks, checking on orders, and many others. Just as we access services that search public databases, we will search a new class of services that will check blockchains to confirm the veracity of information. Information access will not be enough. We will also want to ask for truth access, and we will ask if modifications were made to particular records, expecting the utmost transparency from those who hold them. The blockchain promises to serve up and expose transparency in its rawest forms.

The old question “Is it in the database?” will be replaced by “Is it on the blockchain?”

Is the blockchain more complicated than the Web? Most definitely.

The blockchain is part of the history of the Internet. It is at the same level as the World Wide Web in terms of importance, and arguably might give us back the Internet, in the way it was supposed to be: more decentralized, more open, more secure, more private, more equitable, and more accessible. Ironically, many blockchain applications also have a shot at replacing legacy Web applications, at the same time as they will replace legacy businesses that cannot loosen their grips on heavy-handed centrally enforced trust functions.

No matter how it unfolds, the blockchain’s history will continue to be written for a very long time, just as the history of the Web continued to be written well after its initial invention. But here’s what will make the blockchain’s future even more interesting: you are part of it.


Reprinted from The Business Blockchain: Promise, Practice, and Application of the Next Internet Technology by William Mougayar (foreword from Vitalik Buterin) with permission from John Wiley & Sons, Inc. Copyright (C) William Mougayar, 2016.


In the search for the new and different indexes that will power a new and different ETF, back-testing plays a critical role. Index providers, including S&P, Dow Jones, MSCI, Russell, Zacks and others can index just about anything. You want to rank the companies in the S&P 500 by earnings growth, then take the 50 top firms and weight them equally? Weight them by market capitalization? Go long the top 50 and short the bottom 50? They can build it. And then they back-test it. Indexes that look good in hindsight have a shot to become ETFs. Those that don’t, don’t.
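The construction rules themselves are simple to express. A hedged sketch of the “top 50 by earnings growth” example, under assumed column names and without the point-in-time and survivorship corrections a real back-test requires, might look like this:

```python
# Sketch of a rules-based index (assumed columns 'earnings_growth', 'market_cap';
# a real back-test needs point-in-time, survivorship-free data).
import pandas as pd

def build_index(members: pd.DataFrame, cap_weighted: bool = False) -> pd.Series:
    """members: one row per S&P 500 constituent, indexed by ticker."""
    top50 = members.nlargest(50, "earnings_growth")
    if cap_weighted:
        weights = top50["market_cap"] / top50["market_cap"].sum()
    else:
        weights = pd.Series(1.0 / len(top50), index=top50.index)   # equal weight
    # A long/short variant would assign +1/50 to the top 50 and -1/50 to the bottom 50.
    return weights   # ticker -> portfolio weight, summing to 1
```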

Given our quantitative roots, we are sympathetic to the fact that backtests are often used as an input into making investment decisions. But past returns, as we all know, do not predict the future. And we think backtested results may be particularly problematic today. Very little fundamental data for US equities extends back more than 30 years, and the last 30 years were a period marked by two related phenomena: increasingly easy monetary policy and falling interest rates. In particular, the wave of liquidity and stimulus provided in the wake of the Tech Bubble coincided with unprecedented levels of credit expansion, rising asset correlations and record earnings volatility.

I think the creator of the chart was not that thorough in his research on the research, but it’s a pretty chart nonetheless.

Statistically, according to this sample, we suck.

Statistics from CXO Research

Are prominent stock market bloggers in aggregate able to predict the market’s direction? The Ticker Sense Blogger Sentiment Poll “is a survey of the web’s most prominent investment bloggers, asking ‘What is your outlook on the U.S. stock market for the next 30 days?'” (bullish, bearish or neutral) on a weekly basis. The site currently lists 20 active prognosticators. Participation has varied over time. Based on results from Guru Grades and other stock market sentiment studies, we hypothesize that blogger sentiment: (1) tends to react to what just happened in the stock market; and, (2) does not predict stock market behavior. Using the 114 aggregate measurements from the poll since inception, we find that…

Because Ticker Sense collects data weekly, we look at weekly measurements and changes in weekly measurements. Because the poll question asks for a 30-day outlook, we test the forecasts against stock market behavior four weeks into the future. We use the S&P 500 index to represent the U.S. stock market. Because polling takes place Thursday-Sunday, we use the coincident Friday close to represent the state of the stock market for each poll. For example, the close of 1239 on Friday, 7/11/08 coincides with poll results for Monday, 7/14/08. We use [% Bullish] minus [% Bearish] as the net sentiment measure for each poll.
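The mechanics of that test reduce to a few lines of pandas. This is a sketch of the methodology as described, not CXO’s actual code, and the column names are assumptions.

```python
# Sketch of the described test (assumed columns 'pct_bullish', 'pct_bearish',
# 'sp500_close'; weekly rows indexed by the coincident Friday date).
import pandas as pd
from scipy.stats import pearsonr

def forward_test(poll: pd.DataFrame, weeks_ahead: int = 4):
    net_sentiment = poll["pct_bullish"] - poll["pct_bearish"]
    # The 30-day outlook is scored against the S&P 500 change four weeks later.
    future_return = poll["sp500_close"].shift(-weeks_ahead) / poll["sp500_close"] - 1
    paired = pd.concat([net_sentiment, future_return], axis=1).dropna()
    r, _ = pearsonr(paired.iloc[:, 0], paired.iloc[:, 1])
    return r, r ** 2   # R-squared here is simply the squared Pearson correlation
```

The same squaring explains the figures quoted below: a Pearson correlation of 0.33 corresponds to an R-squared of about 0.11, and -0.05 to essentially zero.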

The following chart compares the coincident S&P 500 index and net blogger sentiment over the past 118 weeks (there were no surveys for 11/27/06, 1/1/07, 11/19/07 and 7/7/08). Net blogger sentiment is mostly bearish during July 2006-May 2007, mostly bullish during June 2007-April 2008 and mostly bearish since. On these visually comparable scales, blogger sentiment is much more volatile than the stock market. The fairly large week-to-week swings in net blogger sentiment suggest either that the bloggers are very sensitive to changes in market conditions, or that participation in polling varies considerably across weeks. A persistent change in participation could explain switches between mostly bullish and mostly bearish outlooks.

For a more precise test of the relationship, we look at poll-to-poll changes in net blogger sentiment versus associated stock market returns.

The following scatter plot relates poll-to-poll changes in net blogger sentiment to weekly changes in the S&P 500 index for concurrent intervals. If bloggers as a group react to what just happened in the stock market, a best-fit line would run from the lower left to the upper right. Based on 113 poll-to-poll changes, there is some support for this hypothesis. The Pearson correlation for these two series is 0.33. The R-squared statistic for the relationship is 0.11, indicating that the change in the stock market over the past week explains 11% of the change in blogger sentiment during that week.

How well does net blogger sentiment predict future stock returns?

The next scatter plot relates the 4-week future change in the S&P 500 index to net blogger sentiment.

If net blogger sentiment forecasts stock market behavior, a best-fit line would run from the lower left to the upper right.
If net blogger sentiment is a contrary indicator for stock market behavior, a best-fit line would run from the upper left to the lower right.
If net blogger sentiment does not predict stock market behavior at all, the plot would show no pattern and a best-fit line would be flat.
Based on 110 observations, the data indicate that bloggers in aggregate cannot predict the direction of the stock market. The Pearson correlation for the distribution is -0.05, and the R-squared statistic is 0.00. Blogger sentiment explains none of the stock returns over the next month. As sample size has grown, these statistics have varied (generally trending from contrary to non-predictive).

In summary, analysis of Ticker Sense Blogger Sentiment Poll results indicates that aggregate blogger sentiment is non-predictive for future stock market direction.
