Top 100 tags and contributors on StackOverflow
statistics April 28th, 2012
Big data is everywhere, and StackOverflow is another wonderful source of a large dataset. As of this moment, the site has more than 10 million questions and around 1.3 million users. StackOverflow publishes all questions, answers, and voting information through its Creative Commons data dump. The dump consists of a number of XML files, and I consider it a very rich and interesting dataset.
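To give a feel for working with the dump, here is a minimal sketch of streaming one of the XML files. It assumes each post is a `<row>` element with attributes such as `Id`, `PostTypeId`, `OwnerUserId`, and a `Tags` string of the form `<c#><.net>`; that layout, the sample data, and the helper names `parse_tags` and `iter_posts` are assumptions for illustration, so check them against the files you actually download.

```python
import io
import re
import xml.etree.ElementTree as ET

# Tiny stand-in for a posts file; the attribute names are assumed, not verified.
SAMPLE = b"""<posts>
  <row Id="1" PostTypeId="1" OwnerUserId="10" Tags="&lt;c#&gt;&lt;.net&gt;" />
  <row Id="2" PostTypeId="2" ParentId="1" OwnerUserId="11" />
</posts>"""


def parse_tags(raw):
    """Split a tag string such as '<c#><.net>' into ['c#', '.net']."""
    return re.findall(r"<([^<>]+)>", raw or "")


def iter_posts(source):
    """Yield the attributes of each <row> without loading the whole file."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row":
            yield dict(elem.attrib)  # copy before clearing the element
            elem.clear()             # release memory held by processed rows


posts = list(iter_posts(io.BytesIO(SAMPLE)))
for post in posts:
    print(post["Id"], parse_tags(post.get("Tags")))
```

Streaming with `iterparse` matters here because the real files are far too large to parse into memory in one go.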
I wanted to build a suggestion engine on top of this dataset. The idea is to suggest other geeks in the StackOverflow community whose technical interests and expertise are similar to those of a given user (assuming that the user has a StackOverflow account). My features are based entirely on the tags in which a user has participated (through questions, comments, or answers). While building the tool, I thought it might be fun to post the top contributors for each tag.
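As a rough illustration of the suggestion idea, the sketch below compares users by their per-tag participation counts. Cosine similarity is my assumption for this example, not necessarily what the final tool uses, and the `profiles` data and the `cosine` and `suggest` helpers are hypothetical.

```python
import math

# Hypothetical per-user tag profiles: tag -> number of participations.
profiles = {
    "alice": {"c#": 10, ".net": 5},
    "bob": {"c#": 8, ".net": 4, "linq": 1},
    "carol": {"python": 7, "django": 3},
}


def cosine(a, b):
    """Cosine similarity between two sparse tag-count vectors."""
    dot = sum(a[tag] * b[tag] for tag in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest(user, profiles, k=3):
    """Return up to k other users, most similar first."""
    scored = [
        (cosine(profiles[user], profile), other)
        for other, profile in profiles.items()
        if other != user
    ]
    # Sort by descending score, breaking ties by name for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(other, score) for score, other in scored[:k]]


print(suggest("alice", profiles, k=2))
```

Users who share no tags score zero, so in this toy data `carol` ranks below `bob` for `alice`.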
I measure contribution as the number of instances of participation: asking a question, answering, and commenting each count as one instance. Yes, I heard you: this does double count when the same person participates multiple times on a given question.
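The counting rule above can be sketched as follows. The `events` and `question_tags` structures are hypothetical stand-ins for the parsed dump, and the sketch assumes every answer and comment has already been mapped back to the question it belongs to, so that it inherits that question's tags.

```python
from collections import Counter, defaultdict

# Hypothetical inputs: tags per question, and (user, question_id, kind) events.
question_tags = {1: ["c#", ".net"], 2: ["python"]}
events = [
    ("alice", 1, "question"),
    ("bob", 1, "answer"),
    ("bob", 1, "comment"),   # second participation on question 1: counted again
    ("carol", 2, "answer"),
    ("carol", 2, "comment"),
    ("bob", 2, "comment"),
]


def count_contributions(events, question_tags):
    """Count one contribution per event for every tag on the event's question."""
    counts = defaultdict(Counter)
    for user, question_id, _kind in events:
        for tag in question_tags.get(question_id, []):
            counts[tag][user] += 1
    return counts


def top_contributors(counts):
    """Map each tag to its (user, count) pair with the highest count."""
    return {tag: users.most_common(1)[0] for tag, users in counts.items()}


top = top_contributors(count_contributions(events, question_tags))
for tag, (user, count) in sorted(top.items()):
    print(tag, user, count)
```

To avoid the double counting, one option would be to collect distinct `(user, question_id)` pairs first and count those instead of raw events.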
Below is the list of the top 100 tags and contributors:
You can also download the full set of 30K tags and contributors to review offline.
| Tags | Display Name | Contribution Count | Gravatar |
|---|---|---|---|
| c# | Jon Skeet | 8542 | |
| android | CommonsWare | 5103 | |
| jquery | Nick Craver | 4614 | |
| java | BalusC | 4558 | |
| python | Alex Martelli | 3639 | |
| .net | Jon Skeet | 3106 | |
| php | Pekka | 2935 | |
| asp.net-mvc | Darin Dimitrov | 2844 | |
| sql | OMG Ponies | 2263 | |
| sql-server | gbn | 2232 | |
| jsf | BalusC | 2081 | |
| c++ | Jerry Coffin | 2067 | |
| javascript | Nick Craver | 2043 | |
| asp.net | Darin Dimitrov | 1833 | |
| jsp | BalusC | 1790 | |
| django | Daniel Roseman | 1770 | |
| asp.net-mvc-3 | Darin Dimitrov | 1593 | |
| cocoa | Peter Hosey | 1565 | |
| xslt | Dimitre Novatchev | 1564 | |
| git | VonC | 1561 | |
| silverlight | AnthonyWJones | 1478 | |
| c | R.. | 1422 | |
| wcf | marc_s | 1400 | |
| linq | Jon Skeet | 1391 | |
| google-app-engine | Nick Johnson | 1347 | |
| html | Quentin | 1331 | |
| mysql | Quassnoi | 1294 | |
| winforms | Hans Passant | 1277 | |
| maven-2 | Pascal Thivent | 1236 | |
| css | thirtydot | 1234 | |
| servlets | BalusC | 1233 | |
| entity-framework | Ladislav Mrnka | 1212 | |
| swing | camickr | 1187 | |
| eclipse | VonC | 1185 | |
| iphone | TechZen | 1178 | |
| jqgrid | Oleg | 1172 | |
| bash | Dennis Williamson | 1132 | |
| hibernate | Pascal Thivent | 1125 | |
| scala | Daniel C. Sobral | 1105 | |
| asp.net-mvc-2 | Darin Dimitrov | 1104 | |
| objective-c | Dave DeLong | 1079 | |
| wpf | H.B. | 1057 | |
| flex | www.Flextras.com | 1020 | |
| xml | Dimitre Novatchev | 1003 | |
| windows-phone-7 | Matt Lacey | 993 | |
| regex | Tim Pietzcker | 989 | |
| tsql | gbn | 971 | |
| matlab | gnovice | 964 | |
| perl | Sinan Ünür | 956 | |
| oracle | Gary Myers | 945 | |
| delphi | Mason Wheeler | 872 | |
| ms-access | Remou | 859 | |
| r | Dirk Eddelbuettel | 830 | |
| sql-server-2005 | gbn | 829 | |
| nhibernate | Diego Mijelshon | 811 | |
| mod-rewrite | Gumbo | 800 | |
| spring | skaffman | 800 | |
| core-data | TechZen | 793 | |
| xpath | Dimitre Novatchev | 784 | |
| jsf-2.0 | BalusC | 761 | |
| ruby | the Tin Man | 757 | |
| f# | Tomas Petricek | 750 | |
| jpa | Pascal Thivent | 747 | |
| orm | Pascal Thivent | 713 | |
| sql-server-2008 | gbn | 701 | |
| security | Rook | 669 | |
| xhtml | Jitendra Vyas | 658 | |
| drupal | googletorp | 648 | |
| ruby-on-rails | apneadiving | 646 | |
| powershell | Keith Hill | 630 | |
| vb.net | Hans Passant | 624 | |
| mercurial | Ry4an | 615 | |
| entity-framework-4 | Ladislav Mrnka | 611 | |
| .htaccess | Gumbo | 611 | |
| generics | Jon Skeet | 609 | |
| windows-mobile | ctacke | 608 | |
| visual-studio | JaredPar | 603 | |
| rest | Darrel Miller | 591 | |
| haskell | Don Stewart | 589 | |
| shell | Dennis Williamson | 585 | |
| compact-framework | ctacke | 554 | |
| web-services | John Saunders | 550 | |
| ruby-on-rails-3 | AnApprentice | 543 | |
| multithreading | Jon Skeet | 540 | |
| linux | Ignacio Vazquez-Abrams | 539 | |
| emacs | Trey Jackson | 537 | |
| magento | clockworkgeek | 510 | |
| jaxb | Blaise Doughan | 499 | |
| performance | Mike Dunlavey | 494 | |
| winapi | Hans Passant | 493 | |
| windows | Hans Passant | 486 | |
| apache | Gumbo | 479 | |
| version-control | VonC | 464 | |
| entity-framework-4.1 | Ladislav Mrnka | 457 | |
| cakephp | deceze | 452 | |
| database | HLGEM | 450 | |
| postgresql | Frank Heikens | 449 | |
| wordpress | songdogtech | 447 | |
| ajax | Darin Dimitrov | 445 |

