If you are here, I believe that you have a strong interest in understanding what it takes to become a Data Scientist.
I am writing this post because I see a tremendous number of people stuck in this dilemma, and there is hardly any useful information out there, just countless articles on which online courses you should take. People from almost every background (IT, mechanical, electrical, electronics, energy, chemical and civil), holding degrees from B.Tech, M.Tech, B.Sc and M.Sc to even Ph.D, with anywhere from no experience to 5 years of experience, both in my own circle and outside it, have asked me the same question: should I get into data science? So here I will try my best to share how you can make it, drawing on what we know of economics, psychology and study hacks.
To do or not to do.
It is my firm belief that you should know how this can work out from the economic point of view. When you start applying, people are going to judge you based on your past. Accordingly, they will evaluate your worth in the market and roll out an offer letter.
If you are a non-IT professional, it is going to take a hell of a lot of effort to learn it and you don’t want to be disappointed! So a safe number to say is that if you earn more than 7 lakhs*, you should be ready to be disappointed unless you hold a degree, Bachelors or Masters, from a Tier-1 college. What the hell did I just say -_- What I mean is you can start your journey of learning data science (DS), but it is going to be very difficult to accept an offer lower than what you already earn. Also, in our field, the lower the pay, the shittier the work. So if you are leaving a high-paying job where you could become an expert in the coming years, to start afresh in a new field where you will be treated like a newcomer, you should be god-damn sure your ego can take the hit. There is one catch though – if you are young (<25) and happen to earn more than 7 lakhs, you can choose to take this risk and make the switch. Financially speaking, your risk appetite should go down with time unless you absolutely hate your current job and want to change your field of work – this shit actually happened to me. I did a master's in energy (mechanical) and came to the realisation that IT is way better than mechanical. I don’t advise people to do what I did. It’s a high risk, high reward game which can go in any direction.
If you are an IT professional, you are in an advantageous position. In DS interviews we definitely give importance to people who have worked in IT, as we don’t have to teach them all the nitty-gritty of IT. No matter how fancy DS is, at the basic level it is still IT. We want people to know the IT stuff – how databases work, querying, ETL, testing, deployment and clean coding. People with a history in IT have exposure to this, and that boosts their resume. The only thing they need to take care of is the data science itself, which seems much more manageable to them than it does to non-IT people. Since their learning curve can be faster and they can leverage their past experience to get a better offer, their risk in exploring this direction is definitely lower. So unless you earn 7+ lakhs in India, you can go ahead and do it. Even if you end up getting less than 7, you will catch up much more easily in later years.
To be frank, getting into data science is about conquering your fears. In the beginning it will get overwhelming, with tons of acronyms and jargon, but you will have to get used to it. And if you were scared of equations back in college, it’s time to either break your fears or forget your dreams of getting in. The first step is to go through the topics mentioned in this article to get a sound grip of the basics. Most interviews will be a combination of basic questions and discussion around your projects (personal + professional). People with unclear basics don’t make a good impression, so it’s important to get them straight. This can get laboriously difficult. My initial days were filled with countless never-ending doubts. The learning curve was steep, and the confusion just kept on rising. After a while of making countless notes and revising them some 3 times, I was able to absorb the bigger picture.
Ideally, once you have gone through the basics, you should start interviewing to get a sense of how interviews are structured and to get comfortable failing at them. Remember, these interviews can get excruciatingly tough. My experience tells me that the better the company, the tougher the interview. The richness of the interview almost acts as a proxy for the strength of the team interviewing you. So if you are having a very easy interview, chances are that you are going to end up doing some low-quality Excel or scraping stuff. And if you want to get a good job, you should be great at the nuances of navigating an interview. Once you get into this process of interviewing, some companies will give you coding assignments. Here, it’s important for you to write your own code and review it with the aim of finding flaws in it. I cannot tell you how much I have screwed up in these assignments. I made projects with technical flaws and poor coding practices. But with each failure I found ways to do it better. I look back at those scripts sometimes and realise how far I have come. One thing to note here: don’t get it done by your friends. You can consult them if you want, but get into the habit of cracking problems on your own. It is easier said than done, but it will build character.
On journeys like this, it’s always good to have a companion. I found people through Facebook, WhatsApp and Telegram groups and learnt immensely by pairing up with them on projects. Work on the same project, push code to GitHub and discuss. This will keep you rolling and expand your approaches. You will be surprised at how many ways there are to solve the same problem. A good data scientist is essentially someone who has made enough mistakes to know what will not work. Hence, pair up with people and work on different ideas. Check out solutions to Kaggle problems. In case you have no ideas initially, just download data and available code from Kaggle and rewrite it line by line. My first clustering project happened exactly like this. I just rewrote existing code and tried to make sense of each line and the maths behind it. Later I started writing my own with help from Stack Overflow. Now, if I am working on a problem I have tackled before, I know what to do without any guidance or tutorial. It’s a god damn journey. Also, you will hardly remember the syntax unless you are doing the same thing every day. So don’t worry about it. Just open the documentation or a tutorial and start writing.
Try to invest time in your LinkedIn profile from the beginning. It serves 2 purposes. Not only will you start networking with people in the industry, but you will also get to know about DS projects and the latest advancements in the field. DS is evolving so quickly that you need some source of updates, and that is where LinkedIn comes in. This Facebook group is also very active and you can use it to find people with similar interests.
With time you will realise how little you remember of the things you read. Hence, invest time in making notes. I used to pause Andrew Ng's videos and make notes. It took almost thrice as long as just watching the videos, but I ended up learning more, which is what you need in the beginning.
Try answering other people's doubts even if you are not an expert. This will give you deeper clarity on the topics. Some of these doubts can also end up being your interview questions.
There are a lot of courses, and many approach the same topics in different ways. Initially, I was of the opinion that you should do only one course, which means finding the best one and sticking to it, and I selected Andrew’s. Later, when I was placed and had time, I checked out the course on Udacity. I found that it is a more practical course on the pros and cons of algos, something that was not discussed much in Andrew’s course. So it seems different experts have different content to talk about. Hence, if you want to start a course, just do it. All of them are available for free. Stop the course if you feel uncomfortable with the style or content. I agree that Andrew’s course is a bit dense and requires you to watch it more than once. But that’s how you learn to not give up and to learn what is required. If data science were easy, everyone would be doing it, supply would exceed demand and people would not be paid so well. So start with any course and don’t give up easily.
Many people ask how much time it will take to prepare and get a job. Since time is a function of your current knowledge and grasping ability, I would rather define it in terms of projects. Doing all the basic courses and around 8 supervised and 2 unsupervised projects can easily take 4-6 months of dedicated (10 h/day) effort. If you are doing it part-time, it can easily take 8-12 months. This includes the time spent finding companies and interviewing with them.
The day of cracking your first job in data science will be etched forever in your memory.
Things to know to increase your chances of cracking interviews.
- Statistics – A lot of companies ask about Bayes' theorem and the normal distribution
- Machine/Deep learning basics – Algorithm pros/cons and working
- Strong coding skills – Python + Competitive coding
- Database – Minimum required is SQL skills. Good to know both SQL and NoSQL databases.
- Cloud computing – A huge add-on but not an absolute necessity. Learn AWS (Amazon Web Services)
- GitHub – Displaying good work on GitHub shows confidence and enthusiasm, which is what the best companies look for
- Blog – Blogging leads to self-clarity on your topics of interest. Also, since I learnt a lot by reading the blogs of others, I always feel like giving back by sharing my own learnings.
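To give a flavour of the kind of Bayes' theorem question that tends to come up, here is a minimal sketch in Python. The scenario and its numbers (1% of people have a condition, the test catches 95% of true cases and wrongly flags 5% of healthy people) are made up for illustration, and `bayes_posterior` is just a hypothetical helper name, not something from a library or a real interview.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) for a yes/no hypothesis H using Bayes' theorem.

    prior               = P(H)
    likelihood          = P(E | H)
    false_positive_rate = P(E | not H)
    """
    # Total probability of seeing the evidence, whether or not H is true.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Illustrative (assumed) numbers: 1% prevalence, 95% sensitivity,
# 5% false positive rate.
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(f"P(condition | positive test) = {posterior:.3f}")
```

With these assumed numbers the answer comes out at roughly 16%, far lower than most people guess, and being able to explain why (the condition is rare, so false positives outnumber true positives) is usually the real point of the question.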
How I judge companies:
- The tougher the interview, the better the team, work and pay.
- I check out the profiles of team members and team leaders on LinkedIn. I look at their work history and current work descriptions. Sometimes people write vague descriptions or say "I do scraping". This is a strong indicator for me to stay away from the company.
- Data science is god damn huge. If you want to learn quickly, join a team where the smart people are. Sometimes they are in startups and sometimes in MNCs. The question of startups vs MNCs is a debate worth another blog post. There are practices to learn from both of them. Startups have agility and MNCs have resources.
- Try to get into a company which is a market leader in at least one thing and has a research-oriented mindset. It shouldn’t be a company which does data science just for cost reduction, but one which does it because it’s their bread and butter. Such companies are rare though.
- At the end of the interview, I ask about the age of the team, its size and the average experience of its members. I don’t hate startups or small teams but I just like to know the metrics. I also probe into what problems they are currently working on, but most of them will not answer due to privacy.
- I have LinkedIn Premium, so I check out the company's hiring growth over the last 3 months, 6 months and 1 year. I especially do this for small companies and startups. Growth of the team is directly related to the health of an organisation. This is good to have but not a necessary criterion.
- Check out reviews of the company on Glassdoor.com. Be sure to confirm that the work environment is healthy; otherwise just cancel the process. If things look good, have an idea of what kind of CTC they might roll out. You can also check the numbers in advance to see if they fit in your range.
Yes, all this is one long story. Just give your best. Earn it.
Let me know your thoughts on the article, and it would be great if you could also share your own experience so far.
*The median salary of a data scientist in India is ~7 lakhs. Unless you are from a Tier-1 college, have more than 1 year of experience in DS, or have rich IT experience with decent DS knowledge, you should not expect more than this.
An AI evangelist and a multi-disciplinary engineer. Loves to read business and psychology during leisure time. Connect with him any time on LinkedIn for a quick chat on AI!