One fine afternoon, I was scrolling through LinkedIn and stumbled upon a post about a cryptocurrency hackathon. I quickly checked the problem statement and the event location. The problem statement was a 100% match for me, as it involved developing a fintech ML solution, and the event was happening just 400 m from my home. I signed up without thinking twice.
This was going to be my 2nd hackathon. The first one had left me with a sour experience. We were asked to use the organiser's API to collect data and build a predictive model on it. It was a 2-day hackathon and the API was broken for much of that time. When the API finally started working, the data was so scarce that nobody could build any ML model on it. The only relief was that the office space was great and the food was tasty. Also, we were a team of 3: me (a data scientist), a friend (another data scientist) and an economist (who was an ideator and not a coder). Having 2 data scientists in one team led to chaos and duplicated work. Having a pure ideator on the team is of no value in a hackathon. You need people who can come up with ideas that can be developed in a limited time-frame and who can also build them. No wonder we had a disaster.
This time, I was sure I did not want another data scientist on the team. Ideally, I wanted a full-stack developer with a practical mindset. The only such person I knew was my roommate. I felt that the two of us were enough. In hackathons, less is more. Trust me.
We had 4 problem statements to choose from:
- Arbitrage trading – I understood the concept of arbitrage but had never implemented it. One of my friends has deployed such a trading system and makes around $1000 every month. To actually prove that it works, you have to set up accounts on 2 exchanges and define trading rules so that the price gap between them exceeds the transaction costs. You also need to account for slippage so that you don't end up making a loss. It's all very straightforward and doesn't require any ML. Maybe you could train an ML model to spot arbitrage opportunities, but I was not sure. Hence I passed on this problem statement.
- Sentiment analysis – It would be hard to find a data scientist who has not worked on or read about sentiment analysis. It is everyone's darling project. I have worked extensively in NLP myself and could easily have taken on this problem statement. But there were issues. Firstly, I know for a fact that it is not easy to collect the data required to train the model. As far as I knew, there was no publicly available financial text data labelled with positive and negative sentences. Interestingly, I do have such data. I work at Morningstar, a finance research company, and we have tons of textual and numerical data. I had recently trained a TensorFlow model for this very task on the cleanest tagged data possible. I could have used it and tried to correlate price fluctuations with sentiment. But that would have been a breach of data privacy, so I dropped the idea. There was also no other way to collect such clean tagged data from anywhere else. I made a conscious decision that I would rather lose the competition than spoil my name and my organisation's name by acting as an unethical data scientist.
- Portfolio management – This is an old problem. You select a bunch of cryptos and use a library to run an interior-point optimisation and trace the Markowitz efficient frontier. I had also recently been reading about portfolio optimisation using Reinforcement Learning, but it requires tons of computational resources and you cannot be sure that the solution will converge. I didn't want to work on a boring old idea and I felt uncomfortable with RL. Hence I was not keen on this problem, but it was my 2nd choice.
- Trend forecasting – This is almost like predicting the unpredictable. But I have heard of a lot of people working on ML models for predicting the direction of price movement. Also, if you can predict the size of the returns, you know how much money to bet, i.e. if you expect a return of 5% after time t instead of 2%, you will bet more money because the signal is stronger. At first, I was not sure what time horizon to use for the prediction. After I got the data, I analysed the distribution of returns over 1 hr, 2 hr and 3 hr. From what I know about cryptos, you cannot predict far ahead in time, and there is no point in predicting too close either, because very short horizons are noisy and the returns are very small. Based on this logic, 3 hours seemed like a good value. Also, from a trading perspective, you need to make a lot of trades to average out the prediction errors. If you trade only a few times, you might just end up lucky or unlucky. A time-frame of 3 hours still allowed me to make many trades. Hence, I ended up selecting 3 hours as the prediction window, i.e. I built an ML model which looks at historical data up to time t and predicts the return at time t + 3 hr.
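To make the prediction target concrete, here is a minimal sketch of how a forward return over a fixed horizon could be computed from hourly closing prices. It is only an illustration under stated assumptions, not the hackathon code: the function name `forward_returns`, the plain-list input and the sample prices are all made up.

```python
from typing import List, Optional


def forward_returns(closes: List[float], horizon: int) -> List[Optional[float]]:
    """For each time t, return closes[t + horizon] / closes[t] - 1.

    The last `horizon` entries are None because their future price is unknown.
    """
    out: List[Optional[float]] = []
    for t, price in enumerate(closes):
        future = t + horizon
        if future < len(closes):
            out.append(closes[future] / price - 1.0)
        else:
            out.append(None)  # no label available yet for this row
    return out


# Hypothetical hourly closes, for illustration only.
hourly_closes = [100.0, 101.0, 99.0, 102.0, 104.0, 103.0]

# With hourly candles, horizon=3 corresponds to the t + 3 hr target.
targets = forward_returns(hourly_closes, horizon=3)
```

Rows whose target is `None` have to be dropped before training, otherwise the model would be fitted on labels that do not exist yet.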
Once we were sure what we had to do to come up with a good solution, we created the whole data science pipeline:
- Collected data using the Binance API
- Technical analysis feature engineering
- Train – Validation – Test data: We used data from July 2017 to April 2018 for training and validation. We tested on May-June 2018 data. We used BTCUSDT, ETHUSDT, LTCUSDT and XRPBTC as they are among the most heavily traded pairs.
- A pipeline for algorithm selection, feature selection and hyperparameter tuning with cross-validation
- The algorithm traded ~8% of the time, used a maximum of ~$700 and made ~$1400 in a 33-day backtest.
- I was happy to see that out of the ~1000 trades the algo made, it lost on only one. In other words, it predicted the wrong direction of price movement only once.
- We were able to achieve this high accuracy in predicting direction, and the resulting profits, because of the special trade entry and exit conditions we had enforced.
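Two of the pipeline steps above can be sketched in a few lines: a technical-analysis feature and a chronological train/test split. This is a hedged illustration, not the pipeline we built. The simple moving average stands in for whichever indicators were actually used, and the names `simple_moving_average` and `chronological_split`, the `Candle` type and the sample candles are all assumptions made for the example.

```python
from datetime import datetime
from typing import List, Optional, Tuple

Candle = Tuple[datetime, float]  # (open time, close price), hypothetical layout


def simple_moving_average(closes: List[float], window: int) -> List[Optional[float]]:
    """Trailing mean of the last `window` closes; None until enough history exists."""
    out: List[Optional[float]] = []
    running = 0.0
    for i, price in enumerate(closes):
        running += price
        if i >= window:
            running -= closes[i - window]  # drop the value that left the window
        out.append(running / window if i >= window - 1 else None)
    return out


def chronological_split(
    candles: List[Candle], test_start: datetime
) -> Tuple[List[Candle], List[Candle]]:
    """Rows before `test_start` go to train/validation, the rest to test (no shuffling)."""
    train = [c for c in candles if c[0] < test_start]
    test = [c for c in candles if c[0] >= test_start]
    return train, test


# Made-up candles around the train/test boundary, for illustration only.
candles = [
    (datetime(2018, 4, 29), 10.0),
    (datetime(2018, 4, 30), 12.0),
    (datetime(2018, 5, 1), 14.0),
    (datetime(2018, 5, 2), 16.0),
]
train, test = chronological_split(candles, test_start=datetime(2018, 5, 1))
sma = simple_moving_average([close for _, close in candles], window=2)
```

Splitting by date rather than at random matters for price data: a shuffled split would let the model see candles from the test period during training and make the backtest look better than it is.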
An AI evangelist and a multi-disciplinary engineer. Loves to read business and psychology during leisure time. Connect with him any time on LinkedIn for a quick chat on AI!