Embedding GitHub Repositories: A Comparative Study of the Python and Java Communities

As the largest open-source platform on which contributors develop software toolkits, GitHub has enabled extensive research on social coding. However, previous research has not explored whether and how different programming languages lead to distinct collective coding patterns. Our research investigates the linguistic relativity hypothesis for programming languages: the idea that the choice of programming language influences programmers’ development practices. We first build repository representations from source code, README text, and co-contribution networks, and then use them to investigate differences in the development practices of the Python and Java communities on GitHub. We conduct sub-studies of each community’s socio-functional mapping, programmers’ contribution diversity, and repository popularity. Our findings indicate that Python, a language that emphasizes flexibility and code reusability, rewards contributors’ ability to produce code suited to many different functional needs. In contrast, Java, which is primarily used to build complete projects for passive end-users, fosters a community of programmers with highly specialized skills. In the Java community, functionally similar repositories exist within tight-knit social clusters, and this social affiliation is predictive of a repository’s popularity. Conversely, consistent with the Python community’s focus on broadly applicable scripting, the popularity of a Python repository relates more to its function than to its social context.
