
How can you become a Data Scientist?

Posted on March 9th, 2016 in Research

We have been getting a lot of requests lately from students at various colleges who want to know what it takes to pursue a career in data science. After all, you might have heard this before: data science is the sexiest job of the 21st century. During my short stint at an analytics company in Bangalore, I worked with people from all kinds of academic backgrounds. Not everyone there was an engineer or a statistician; people came from Biotechnology, Mechanical Engineering, Biology and the Basic Sciences. The point is, you just need that internal knack to be a data scientist. Nothing else matters. So put your foot on the pedal, shift into gear and head for your destination, where you will be a proud and happy data scientist.

Now, we aren’t masters of data science ourselves, but with a small background and an interest in the field, we’ll try to gather the best possible resources in this blog post to help you board the ship of data science. After that, how you sail is up to you. But of course, we will always be there for you in case you need guidance, help writing or debugging code, or anything else.

Essential Requirements

To start off: install Anaconda right away. Whether you are on Windows, Linux or Mac OS, Anaconda is the one thing that WILL help you the most. Otherwise, you will end up installing one package after another and never escape the hassle of resolving all the dependencies (unless, of course, you are already well versed in setting up a development environment, but even then I would suggest going for Anaconda).
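Once Anaconda is installed, a quick sanity check is to ask Python whether it can find the libraries you expect to use. The sketch below is only an illustration: the package list is our assumption about what you will want (these are import names of libraries commonly bundled with Anaconda), and `check_packages` is a helper name made up for this post.

```python
import importlib.util

# Assumed list of import names for libraries commonly bundled with Anaconda;
# adjust it to whatever you actually plan to use.
PACKAGES = ["numpy", "pandas", "scipy", "matplotlib", "sklearn"]


def check_packages(names):
    """Map each top-level import name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(PACKAGES).items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

If anything shows up as MISSING, you are probably running a Python interpreter other than the one Anaconda installed.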

A small disclaimer or a pre-requisite

And now, I will scare you off. I mean, you ought to know a little math. You should be able to understand probability, statistics, linear and matrix algebra, and some multivariate calculus. With even this much mathematical understanding, you will be able to grasp almost all of the machine learning techniques. For anything you are not very confident about, please head over to YouTube (watch the NPTEL lectures; there are also several MOOCs and other resources). The second thing on the list: you must know how to code. Even a little bit will do, because then you will be able to pick up Python and other important languages quite easily. But please keep in mind that Python is a very important language for a data scientist.

Dump the pre-requisites in the trash bin. No, I mean it, seriously!

Having said all of that, you must not worry even if you have little to no knowledge of anything written above. Just remember, where there is a will, there is a way. There are ample resources online and offline to help you.

Continuing on building the essential requirements

Ok, so continuing: you could also install Sublime Text, which is a good editor. You could also learn probability and statistics (with heavy application in Python) from here (it’s a free PDF of the book).

Done with the installations, now what?

Now, most of you will wonder at this point: which is the best MOOC, online course or video to follow? There are plenty. Some good options are: Machine Learning (Stanford), Andrew Ng’s course on Coursera (which, by the way, is one of the most popular courses); the Probability and Statistics course (Harvard University); the Data Science course from Harvard University (you will learn about Python libraries for data science, along with a thorough insight into machine learning algorithms); and the Data Analyst Nanodegree Program by Udacity. There are several other courses apart from these which are equally good. You can refer to more in-depth reviews of each of these to make a proper judgement call and select the appropriate one.

Finished with the videos as well?! I bet you are surfing on a high tide!

After that, I am pretty sure you will feel a nudge to do something hands-on beyond what was taught or assigned in the online courses and videos mentioned above. So head over to Kaggle, one of the best data science playgrounds to hone your skills. You could easily start with some simple getting-started tasks. Some of their top getting-started tasks are: Titanic: Machine Learning from Disaster, Bike Sharing Demand, and Sentiment Analysis on Movie Reviews. All of them are extremely helpful in getting you started and will teach you a lot. Besides, there is a huge community that has already worked on these problems, which means many people have already run into all kinds of difficulties. So you need not worry if you get stuck at any point: someone has most likely faced a similar problem and posted a solution to it. So, no worries!! Once you are through with all of these, you can venture out into the open, solving real-world problems and making this world an even better, decision-driven place to live ;) Have a look at this link: What are some good toy problems in data science.
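A common first step on any getting-started classification task is a baseline that always predicts the most frequent label, so you have a score to beat. Here is a minimal sketch of that idea in plain Python. The rows are made up purely for illustration (they are not taken from any Kaggle dataset), and `majority_label` and `accuracy` are helper names we invented for this post.

```python
from collections import Counter

# Made-up toy rows for illustration only; NOT taken from any Kaggle dataset.
# Each row is (feature, label); the feature is ignored by this baseline.
train = [("a", 0), ("b", 0), ("c", 1), ("d", 0)]
test = [("e", 0), ("f", 1)]


def majority_label(rows):
    """Return the most common label among the given rows."""
    counts = Counter(label for _, label in rows)
    return counts.most_common(1)[0][0]


def accuracy(rows, predicted_label):
    """Fraction of rows whose true label equals the constant prediction."""
    hits = sum(1 for _, label in rows if label == predicted_label)
    return hits / len(rows)


baseline = majority_label(train)
print("baseline label:", baseline)
print("baseline accuracy:", accuracy(test, baseline))
```

Any model you build afterwards should do better than this constant guess; if it does not, something is probably wrong with the features or the training setup.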

It’s time to dive and swim in the real world. Wanna know how?

Now, towards the end of this post, I would like to share a few links that you will probably be most interested in: a guide to job and internship opportunities in data science (some or most of these might be a little far-fetched if you are an absolute beginner, but they are still very interesting reads).

  1. How do I prepare for a data science interview? The best thing about this link is that people have also shared the exact questions an applicant can expect. You will also find links to blog posts by interviewers. Amazing!
  2. How should I prepare for statistics-based questions for a data science interview?
  3. This is better searched for on LinkedIn, AngelList and various other job posting sites. But still, have a look at this link: Companies having data science internships.

Courtesy call and final remarks

Now, this blog post was heavily inspired by, and aggregates resources primarily from, The Official Data Science FAQ. That’s why you will mostly find Quora links.

Wrapping up, the final pro tip: the best way to become a data scientist is to do and practice data science. As simple as that!! And if you want any help, and I quote again, “any help”, in pursuing that, just let us know. We will reach out to you with the best help we can offer, without any obligation. PERIOD.

Stay tuned for a lot more :)
