According to Glassdoor, the top 5 skills in Data Science job openings are:
- Python
- R
- SQL
- Hadoop
- Java
Most Java developers today know SQL, Hadoop and Java reasonably well. That leaves two important skills, Python and R, for a Java developer / architect / manager to learn if s/he wants to work in the Data Science area. In this article you will find a structured, step-by-step approach to learning programming in R.
- To start with, install R and familiarize yourself with the R Console and R Script interfaces. You can run commands in both, but it’s best to write multiple commands and try them out in the script editor. Use the shortcut CTRL + R to run your commands in the R Script editor.
- Explore menu options like Packages -> Install / Load / Choose CRAN Mirror. Many commands for statistics, visualization, etc. are available in R by default. A large set of packages ships with R and hundreds more are available. Select any mirror to download new packages, then install and load them step by step. You will need a working internet connection. Learn how to set the working directory. You can list the packages installed with R using library()
- From there, move on to the various R objects / data types. Explore data frames, vectors, lists, matrices, arrays and factors, and try out examples of each.
- Next, learn to read and write datasets with commands like read.csv / write.csv. You can also read / write Excel sheets, but for that you will need additional packages. Look at basic commands like summary, str (structure) & fix to analyze / edit your dataset
- Next, go through the various categories of operators (logical, arithmetic, relational) and concepts like the pipe %>% (provided by packages such as magrittr and dplyr), constants and the rules for naming identifiers, followed by the statistical functions directly available in R. You can get help on a command by using ?<COMMAND>. Also, learn to create functions and use conditions like if
- By now you should revise the basics of statistics and the common visualization charts, which are typically taught in Year 1 / Semester 1 of an MBA. Explore the statistics commands built into R. Some examples: mean, var (variance), sd (standard deviation), etc.
- Learn to manipulate datasets using subset from base R and the dplyr package, which has commands like select, filter, sample_n & sample_frac among others
- Check out the default visualization commands in R for charts, like barplot & pie. After this, learn how to use the ggplot2 package
- You can find many datasets at kaggle.com and on other websites such as those of stock exchanges (NSE / BSE), the RBI and the Open Data portals of various governments
- Explore the top 20 R packages, categorized by area, as given below.
An advantage of learning R is that you will become better at statistics & data science. It’s much simpler than Java in terms of syntax and structure, and it is influenced by open source scripting environments such as those found on Linux.
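As a rough illustration of the early steps in the list above (data frames, reading and writing CSV files, inspecting a dataset, built-in statistics, subsetting and a simple function with if), here is a minimal sketch in base R. The data, the column names and the grade function are made up for this example; they are assumptions and do not come from any real dataset.

```r
# Hypothetical toy data; names and values are invented for illustration.
scores <- data.frame(
  name  = c("Asha", "Ben", "Chen", "Dev"),
  dept  = c("HR", "IT", "IT", "HR"),
  marks = c(72, 85, 90, 64)
)

# Write the data frame to a temporary CSV file and read it back.
csv_path <- tempfile(fileext = ".csv")
write.csv(scores, csv_path, row.names = FALSE)
loaded <- read.csv(csv_path)

# Inspect the dataset: column types and per-column summaries.
str(loaded)
summary(loaded)

# Statistics functions built into R.
avg_marks <- mean(loaded$marks)
var_marks <- var(loaded$marks)
sd_marks  <- sd(loaded$marks)

# Filter rows and pick columns with base R's subset().
it_only <- subset(loaded, dept == "IT", select = c(name, marks))

# A small user-defined function using if / else.
grade <- function(mark) {
  if (mark >= 80) {
    "A"
  } else {
    "B"
  }
}

# Apply the function to every value in the marks column.
loaded$grade <- sapply(loaded$marks, grade)

print(it_only)
print(loaded)
```

You can paste this into the script editor and run it line by line. Once it feels familiar, the same filtering step is a good first exercise to redo with dplyr's filter and select.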
Reach out to me at neil@techandtrain.com if you want to discuss R, arrange training in R for MBA / BE / MCA / MSc students, or conduct a workshop for your managers / executives on Data Science / R / Java / etc.
References:
Top 10 skills for Data Science – Glassdoor Economic Research