programming polls

1. A big increase in SAS user participation in 2014, perhaps partly driven by growth and change in the composition of KDnuggets readers, and likely also by increased visibility of this poll among SAS users. SAS users had a high percentage of "lone" votes: in 2014, 58% of them said they used only SAS, compared to 26% in 2013. The fraction of "lone" votes in 2014 was 20.5% for R, 14% for Python, and only 4.5% for SQL.
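As a rough illustration of how a "lone" vote fraction like the ones above can be computed, here is a minimal Python sketch. The ballots are made up for the example (the raw poll responses are not part of this article), the function name `lone_vote_share` is hypothetical, and the sketch assumes the fraction is taken relative to the users of each language.

```python
# Hypothetical ballots: each set holds the languages one voter selected.
# These are NOT the real poll responses, only illustrative data.
ballots = [
    {"SAS"},
    {"SAS"},
    {"SAS", "SQL"},
    {"R", "Python"},
    {"R"},
    {"R", "Python", "SQL"},
]


def lone_vote_share(ballots, language):
    """Fraction of a language's users who selected only that language."""
    users = [b for b in ballots if language in b]
    if not users:
        return 0.0
    lone = sum(1 for b in users if len(b) == 1)
    return lone / len(users)


for lang in ("SAS", "R", "Python", "SQL"):
    print(f"{lang}: {lone_vote_share(ballots, lang):.1%}")
```

With real data, each ballot would be the set of options one voter ticked in the poll form.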

2. Consolidation among the top 4 languages: R, SAS, Python, and SQL. 91% of all voters have used at least one of them. Almost all other languages declined in popularity for data mining tasks, including Java, Unix shell, MATLAB, C/C++, Perl, Octave, Ruby, Lisp, and F#.

Here is a Venn diagram that shows significant overlap between R, Python, and SQL. The percentages indicate how many voters chose that option, e.g. 20% of all voters have used both R and Python, while 10% have used R, Python, and SQL. The areas of the circles and intersections approximately correspond to the fraction of voters.

KDnuggets 2014 Poll - Overlap between languages for Analytics/Data Mining: R, Python, and SQL

Here is a similar Venn diagram showing overlap between R, Python, and SAS. We see that SAS is much more independent from R and Python, with about 2/3 of SAS users not using R or Python.

KDnuggets 2014 Poll - Overlap between languages for Analytics/Data Mining: R, Python, and SAS 
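The overlap percentages in the two Venn diagrams can be thought of as set operations over individual ballots. The sketch below uses made-up ballots and two hypothetical helper functions (`overlap_share` and `independent_share`); it only shows the kind of calculation involved and is not the method actually used to draw the diagrams.

```python
# Hypothetical ballots (illustrative only, not the real poll responses).
ballots = [
    {"R", "Python"},
    {"R", "Python", "SQL"},
    {"SAS"},
    {"SAS", "SQL"},
    {"SAS", "R"},
    {"Python"},
]


def overlap_share(ballots, languages):
    """Fraction of ALL voters who selected every language in `languages`."""
    wanted = set(languages)
    if not ballots:
        return 0.0
    return sum(1 for b in ballots if wanted <= b) / len(ballots)


def independent_share(ballots, language, others):
    """Fraction of `language` users who selected none of `others`."""
    users = [b for b in ballots if language in b]
    if not users:
        return 0.0
    return sum(1 for b in users if b.isdisjoint(others)) / len(users)


print(f"R and Python: {overlap_share(ballots, ['R', 'Python']):.1%}")
print(f"SAS without R or Python: {independent_share(ballots, 'SAS', {'R', 'Python'}):.1%}")
```

Note the two different denominators: the intersection percentages are shares of all voters, while a statement like "2/3 of SAS users" is a share of SAS users only.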

3. Languages with the highest growth in 2014 were
  • Julia, 316% growth, from 0.7% share in 2013 to 2.9% in 2014
  • SAS, 76% growth, from 20.8% in 2013 to 36.4% in 2014
  • Scala, 74% growth, from 2.2% in 2013 to 3.9% in 2014
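The growth figures above are relative changes in share, i.e. (new share - old share) / old share. Here is a minimal Python sketch of that calculation using the rounded shares quoted in the list; the function name `relative_change` is a hypothetical helper. Because the inputs are rounded, the results can differ by a few points from the published percentages, which were presumably computed from unrounded vote counts.

```python
def relative_change(old_share, new_share):
    """Percent change from old_share to new_share, relative to old_share."""
    if old_share == 0:
        raise ValueError("relative change is undefined when the old share is 0")
    return (new_share - old_share) / old_share * 100.0


# Rounded shares (percent of voters) as quoted in the text: (2013, 2014).
shares = {
    "Julia": (0.7, 2.9),
    "SAS": (20.8, 36.4),
    "Scala": (2.2, 3.9),
}

for lang, (old, new) in shares.items():
    print(f"{lang}: {relative_change(old, new):+.0f}%")
```

The same function returns negative values when a language loses share, so it applies equally to the declines.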

 
4. The languages with the largest decline in share of usage were
  • F#, 100% decline, from 1.7% share in 2013 to zero in 2014
  • C++/C, 60% decline, from 9.3% in 2013 to 3.6% in 2014
  • GNU Octave, 57% decline, from 5.6% in 2013 to 2.4% in 2014
  • MATLAB, 50% decline, from 12.5% in 2013 to 6.3% in 2014
  • Ruby, 44% decline, from 2.2% in 2013 to 1.3% in 2014
  • Perl, 41% decline, from 4.5% in 2013 to 2.6% in 2014

 
Here is the table with more details: 

What programming/statistics languages you used for an analytics / data mining / data science work in 2014?
| Language used (voters in 2014) | % voters in 2014 (719 total) | % voters in 2013 (713 total) | % voters in 2012 (579 total) |
|---|---|---|---|
| R (352) | 49.0% | 60.9% | 52.5% |
| SAS (262) | 36.4% | 20.8% | 19.7% |
| Python (252) | 35.0% | 38.8% | 36.1% |
| SQL (220) | 30.6% | 36.6% | 32.1% |
| Java (89) | 12.4% | 16.5% | 21.2% |
| Unix shell/awk/sed (63) | 8.8% | 11.1% | 14.7% |
| Pig Latin/Hive/other Hadoop-based languages (61) | 8.5% | 8.0% | 6.7% |
| SPSS (58) | 8.1% | not asked | not asked |
| MATLAB (45) | 6.3% | 12.5% | 13.1% |
| Scala (28) | 3.9% | 2.2% | 2.4% |
| C/C++ (26) | 3.6% | 9.3% | 14.3% |
| Julia (21) | 2.9% | 0.7% | 0.3% |
| Other low-level languages (20) | 2.8% | 5.9% | 11.4% |
| Perl (19) | 2.6% | 4.5% | 9.0% |
| GNU Octave (17) | 2.4% | 5.6% | 5.9% |
| Ruby (9) | 1.3% | 2.2% | 3.8% |
| Lisp/Clojure (5) | 0.7% | 1.0% | 4.3% |
| F# (0) | 0% | 1.7% | not asked |
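Each 2014 percentage in the table is the number of voters for that language divided by the 719 total voters. The short Python sketch below recomputes a few of them from the vote counts given in the table; the helper name `share` is hypothetical.

```python
TOTAL_VOTERS_2014 = 719  # total number of voters, from the table header

# Vote counts for a few languages, taken from the table.
votes_2014 = {"R": 352, "SAS": 262, "Python": 252, "SQL": 220, "Julia": 21}


def share(votes, total):
    """Share of voters as a percentage, rounded to one decimal place."""
    return round(100.0 * votes / total, 1)


for lang, votes in votes_2014.items():
    print(f"{lang}: {share(votes, TOTAL_VOTERS_2014)}%")
```

The shares add up to more than 100% because each voter could select several languages.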


Among other programming languages, William Dwinnell mentioned Compiled BASIC (PowerBASIC).

Regional participation was
  • US/Canada: 51.6%
  • Europe: 26.7%
  • Asia: 13.3%
  • Latin America: 3.7%
  • Africa/Middle East: 3.5%
  • AU/NZ: 2.0%