What exactly is data science? Some might answer with all the complex technical terminology of the field, which will surely bounce off you if you are a novice.
So instead, let me tell you what data science is in plain layman’s terms. It is the process of asking interesting questions and then answering those questions using data. As the name suggests, data is the central ingredient in the recipe of data science.
A simple data science workflow looks like this:
- Ask a question.
- Gather data that might help you answer that question.
- Clean the data.
- Explore, analyze, and visualize the data.
- Build and evaluate a machine learning model.
- Communicate the results.
Demand in the data science field: 59% of all Data Science and Analytics (DSA) job demand is in Finance and Insurance, Professional Services, and IT.
Choosing the right role in data science is hugely important.
To name a few, the following are some of the most common job titles in data science:
- Business Intelligence Analyst: This role analyzes business and market trends using the organization’s historical data.
- Data Mining Engineer: This role is not confined to analyzing data; it also involves developing sophisticated algorithms that support further analysis of the data.
- Data Architect: This role determines database structural requirements by analyzing the customer’s operations and applications, and then provides database support by coding utilities and resolving problems.
Other roles in the field include data visualization expert, machine learning expert, and so on. Depending on your background and work experience, choosing a relevant role makes it easier to get started. So first and foremost, figure out what you want (your areas of interest) and what your strengths and weaknesses are, and then choose the role that suits you best. Analytics Vidhya published a descriptive comparison a few months back of what it is like to be a Data Scientist vs a Data Engineer vs a Statistician.
How can I build my career in data science?
Educational Requirements:
After choosing a suitable role for yourself, the next step is to put in the dedication needed to understand it. Various MOOCs are freely available, as are accreditation programs, to help you embark on the data science journey. The main criterion should be whether the course covers the basics and brings you to a level from which you can push on further. Go through the course actively: sincerely follow the coursework, the assignments, and the discussions throughout. Some good MOOCs to look for include Analytics Edge on edX and Machine Learning by Andrew Ng.
The field of data science requires knowledge of a programming language and the ability to work with data in that language.
Get comfortable with a Tool/Language:
Python and R are both great choices as programming languages for data science. R tends to be more popular in academia, and Python tends to be more popular in industry, but both languages have a wealth of packages that support the data science workflow. If you want to master just one of the two, Python is the recommended choice.
If you're looking for a course to help you learn Python, here are a few recommendations:
- Python Jumpstart by Building 10 Apps is an excellent video course taught by Michael Kennedy (host of the "Talk Python To Me" podcast).
- DataCamp and Dataquest both offer short, interactive courses in beginning Python.
- Introduction to Python is a more substantial course in beginning Python that feels like an interactive textbook.
- Google's Python Class is best for people with some programming experience, and includes lecture videos and downloadable exercises.
To explore and work with data, you should learn how to use the pandas library. pandas provides a high-performance data structure (called a "DataFrame") that is suitable for tabular data with columns of different types, similar to an Excel spreadsheet or SQL table. It includes tools for reading and writing data, handling missing data, filtering data, cleaning messy data, merging datasets, and visualizing data.
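To make the DataFrame idea concrete, here is a minimal sketch of a few of the operations mentioned above: handling missing data, merging, filtering, and aggregating. The two small tables, their column names, and the choice to treat a missing quantity as zero are all made up for illustration; real data would normally come from a file or a database.

```python
import pandas as pd

# Hypothetical sales records; one quantity is missing on purpose.
sales = pd.DataFrame({
    "product_id": [1, 2, 3, 1],
    "quantity": [10.0, None, 5.0, 2.0],
})

# Hypothetical product lookup table mixing text and numeric columns.
products = pd.DataFrame({
    "product_id": [1, 2, 3],
    "name": ["pen", "notebook", "stapler"],
    "price": [1.5, 3.0, 8.0],
})

# Handle missing data: assume (for this sketch) an unknown quantity means zero.
sales["quantity"] = sales["quantity"].fillna(0)

# Merge the two tables on their shared key, much like a SQL join.
merged = sales.merge(products, on="product_id", how="left")

# Derive a new column, then filter rows with a boolean mask.
merged["revenue"] = merged["quantity"] * merged["price"]
nonzero = merged[merged["revenue"] > 0]

# Aggregate: total revenue per product name.
revenue_by_product = nonzero.groupby("name")["revenue"].sum()
print(revenue_by_product)
```

Whether filling missing values with zero is appropriate depends entirely on the data; dropping the rows with `dropna()` is an equally common choice.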
The next aspect of data science is to understand Machine Learning (ML) thoroughly. Building ML models to predict the future or to automatically extract insights from data is the most fascinating and demanding part of data science. scikit-learn is the most popular library for ML in Python. This is because scikit-learn provides a clean and consistent interface to a large number of different models, offers many tuning parameters for each model, and, most importantly, helps you understand the models as well as how to use them properly.
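The "clean and consistent interface" is easiest to see in code. The sketch below trains two different scikit-learn models with exactly the same `fit` and `predict` calls. The choice of the bundled iris dataset, the two models, and the parameter values are assumptions made for illustration, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small example dataset that ships with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows so the models are scored on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Two unrelated model types, each with its own tuning parameters.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=3, random_state=0),
}

# Every estimator exposes the same fit/predict methods.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    scores[name] = accuracy_score(y_test, predictions)
    print(f"{name}: accuracy = {scores[name]:.2f}")
```

Because the interface is shared, swapping in another classifier usually means changing only the entry in the `models` dictionary.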
ML is a complex field that requires both experience and further study, and scikit-learn is an advanced, in-depth tool for it. For a novice in ML, here is one recommendation: read An Introduction to Statistical Learning (available as a PDF and on Amazon). It will help you gain both a theoretical and a practical understanding of many important methods for regression and classification, without requiring a background in advanced mathematics. The authors also released 15 hours of high-quality videos to supplement the book.
Group/Peer study:
It is always better to study and discuss in groups where you can interact with each other in person. Even if you don’t have this kind of peer group, you can still have meaningful technical discussions over the internet. There are online forums that provide this kind of environment.
A few of them are:
- Analytics Vidhya
- StackExchange
Concentrate on the practical applications and examples in the course you are taking rather than on the theory alone. This will help you apply the concepts in reality. Do all the exercises and assignments sincerely so that you understand the applications. Try to work on a few open data sets and apply the concepts you are learning. Even if you don’t understand the math behind a technique at first, understand its assumptions, what it does, and how to interpret the results. You can always develop a deeper understanding at a later stage.
Take a look at solutions by people who have worked in the field. They can point you to the right approach faster.
Keep Learning and Practicing
Practice, practice, and more practice. This is the only way to improve your knowledge in this ever-evolving field. Find data sets that interest you, such as sports or stocks, and practice on them.
Kaggle competitions are a great way to practice data science without coming up with the problem yourself. Don't worry about how high you place; just focus on learning something new with every competition.
- Create your own data science projects and share them on GitHub. This way others can see that you know how to do reproducible data science.
- Read data science blogs to stay up to date with recent developments.
- To experience the Python community, attend PyCon conferences.
- WildML
- NYU
- KDnuggets News
Sound technical knowledge alone does not suffice. Strong verbal and written business communication skills are a must as well.
Learn and Network
After all the learning, you can go on to attend industry events and conferences, join popular meetups, and participate in hackathons in your area, even if you know only a little. You never know who will help you out, or when and where!
A meetup is especially valuable when it comes to making your mark in the data science community. You get to meet people in your area who work actively in the field, which gives you networking opportunities, and establishing relationships with them will in turn help you advance your career. A networking contact might:
- Give you inside information on what’s happening in your field of interest.
- Offer mentorship support.
- Help you search for a job, either through job-hunting tips and leads or through direct employment opportunities.
Hope this helps you.