Counting by multiple groups, sometimes called crosstab reports, can be a useful way to look at data ranging from public opinion surveys to medical tests. For example, how did people vote by gender and age group? How many software developers who use both R and Python are men vs. women?

There are a lot of ways to do this kind of counting by categories in R. Here, I’d like to share some of my favorites.

For the demos in this article, I’ll use a subset of the Stack Overflow Developers survey, which polls developers on dozens of topics ranging from salaries to technologies used. I’ll whittle it down to columns for languages used, gender, and whether they code as a hobby. I also added my own LanguageGroup column for whether a developer reported using R, Python, both, or neither.
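As a rough illustration of how a column like LanguageGroup could be derived, here is a minimal sketch. It assumes the raw survey stores languages as one semicolon-separated string per response, and it uses a small made-up data frame named `responses`. The label values "Both" and "R" are also assumptions, since only "Python" and "Neither" appear in the data shown here, and this is not necessarily the exact code used to build the data set.

```r
# Toy stand-in for the survey data. The semicolon-separated layout
# is an assumption about the raw file, not taken from this article.
responses <- data.frame(
  LanguageWorkedWith = c("HTML/CSS;Java;Python", "R;SQL", "Python;R", "C;C++;Rust"),
  stringsAsFactors = FALSE
)

# Split each response into its individual language names
language_list <- strsplit(responses$LanguageWorkedWith, ";", fixed = TRUE)

# Exact matches on whole tokens avoid counting "Rust" or "Ruby" as R
uses_r <- vapply(language_list, function(x) "R" %in% x, logical(1))
uses_python <- vapply(language_list, function(x) "Python" %in% x, logical(1))

# Label values here are illustrative assumptions
responses$LanguageGroup <- ifelse(
  uses_r & uses_python, "Both",
  ifelse(uses_r, "R",
         ifelse(uses_python, "Python", "Neither"))
)
```

Matching on split tokens rather than searching the raw string for the letter "R" is the main design choice, since a plain substring search would also match languages such as Rust.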

If you’d like to follow along, the last page of this article has instructions on how to download and wrangle the data to get the same data set I’m using.

The data has one row for each survey response, and all four columns are character strings.

str(mydata)
'data.frame':	83379 obs. of  4 variables:
 $ Gender            : chr  "Male" "Male" "Male" "Male" ...
 $ LanguageWorkedWith: chr  "HTML/CSSJavaJavaScriptPython" "C++HTML/CSSPython" "HTML/CSS" "CC++C#PythonSQL" ...
 $ Hobbyist          : chr  "Yes" "No" "Yes" "No" ...
 $ LanguageGroup     : chr  "Python" "Python" "Neither" "Python" ...

I filtered the raw data to make the crosstabs more manageable, including removing missing values and keeping only the two largest gender categories, Male and Female.
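The filtering described above might look something like the following in base R. This is only a sketch under stated assumptions: the `raw` data frame and the "Non-binary" label are hypothetical stand-ins, and the actual wrangling code may differ.

```r
# Hypothetical raw data standing in for the downloaded survey file
raw <- data.frame(
  Gender = c("Male", "Female", NA, "Non-binary", "Male"),
  Hobbyist = c("Yes", "No", "Yes", "Yes", NA),
  stringsAsFactors = FALSE
)

# Drop any row that has a missing value in any column
filtered <- raw[complete.cases(raw), ]

# Keep only the two largest gender categories
filtered <- filtered[filtered$Gender %in% c("Male", "Female"), ]

# Quick sanity check: a one-variable count of what remains
table(filtered$Gender)
```

Running `complete.cases()` first means the later `%in%` filter never has to deal with `NA` values in Gender.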