It has been 2 years since I published “CamelCase vs underscores: Scientific showdown”, and it is still easily the most visited article on this blog. Yesterday alone it got 2,614 views thanks to a forum post on Y Combinator, completely dwarfing my normal visit rates. What is it that makes it such a hot topic? Honestly, it doesn’t interest me that much anymore, since there are many more important ways to make your code more readable. Note that it is code comprehension we are talking about here, not how fast you can write code! Previously I outlined how the entire discussion could be made obsolete by moving away from a textual representation of code, and in my previous post I related software design principles, viewed as an act of communication, to the cooperative principle in linguistics. Nonetheless, given the immense interest this article seems to be getting, I feel it’s my obligation to report on follow-up research to the previously discussed paper “To camelcase or under_score” by Binkley et al. (2009) (PDF).
“An Eye Tracking Study on camelCase and under_score Identifier Styles” by Sharif and Maletic (2010) (PDF) replicates the previous study but deviates from it on a few points:
- Only programmers are used as subjects.
- All of the subjects had experience with both styles, and their style preferences were approximately evenly split among the groups.
- Most of the subjects were historically trained in the underscore style. (The opposite was true in the study by Binkley et al.)
- Eye tracking is used to measure fixation count and rate. Results from previous eye tracking studies in the domain of cognitive psychology imply that camel-cased identifiers should be more difficult to read compared to underscored identifiers.
No difference in accuracy was reported (as opposed to Binkley et al.), but on average, camel-cased identifiers took 932 ms (20%) longer to read than underscored identifiers, in line with the 13.5% longer reported by Binkley et al. The eye tracking results also give some insight into visual effort: camel-cased identifiers require a higher average duration of fixations.
If you are interested in the details of the studies, don’t forget to read the papers yourself. I linked to them for your convenience, but if the links break you can easily find them by looking up their titles on Google Scholar.
In general, the subject seems to have received more research attention over the past 2 years. You can find relevant resources yourself by checking out the ‘Citing Documents’ of the discussed papers, but here are a few interesting ones:
- Women and men – Different but equal: On the impact of identifier style on source code reading by Sharafi et al. (2012) (PDF)
- Context and Vision: Studying Two Factors Impacting Program Comprehension by Soh, Z. (2011)
- Can Better Identifier Splitting Techniques Help Feature Location? by Guerrouj et al. (2011) (PDF)
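To make concrete what “identifier splitting” means for the two styles discussed here, below is a minimal sketch. The helper `split_identifier` and its regular expression are hypothetical illustrations, not taken from any of the papers above, and the splitters studied in the research are likely far more sophisticated. It only shows that an underscore marks a word boundary explicitly, while camel case requires inferring the boundary from a change in capitalization.

```python
import re
from typing import List

# Hypothetical pattern for illustration only (not from the cited papers).
# Alternatives, tried in order:
#   1. a run of capitals followed by a capitalized word (acronym, e.g. "HTTP" in "HTTPResponse")
#   2. an optionally capitalized lowercase word (e.g. "camel", "Case")
#   3. a trailing run of capitals (e.g. "HTML")
#   4. a run of digits
_WORD = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|[0-9]+")


def split_identifier(name: str) -> List[str]:
    """Split a camelCase or under_score identifier into lowercase words."""
    # Underscores never match the pattern, so they simply act as separators.
    return [word.lower() for word in _WORD.findall(name)]


if __name__ == "__main__":
    for identifier in ("camelCase", "under_score", "parseHTTPResponse"):
        print(identifier, "->", split_identifier(identifier))
```

Both `camelCase` and `camel_case` come out as the same two words, which is the sense in which the two styles carry the same information and differ only in how the reader has to find the boundaries.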
10 thoughts on “CamelCase vs underscores: Revisited”
Your arguments are interesting, but Chris Done has [more concise and appealing arguments](http://chrisdone.com/posts/camelcase-vs-underscores-vs-hyphens), in my opinion. 🙂
I’d agree (and this is also in line with both studies), but unfortunately not an option in many languages.
Hello. Thank you for articles on case. I am not a coder. I am a project lead for records management. We have several options to name our records:
4. Camel case.
I want the e-file names to readily transfer from one platform to another. I am concerned that spaces will not transfer consistently and correctly. That is, we could lose part of the name.
Underscores get very low uptake. People just forget or don’t bother. In the world of two-finger typists, underscores are hard.
We want to be able to scan very long lists of documents and see how they are related (naming conventions) and be able to retrieve quickly.
Your articles provided some things to consider, but I rejected CamelCase due to its 20% increase in the time needed to identify a name. We literally have millions of items to name and potentially retrieve. 20% is considerable under those circumstances.
Thank you again for your posts. Enjoyed them immensely.
When I used Microsoft VB6 I enjoyed their built-in camelcase naming because it identified what type of variable I was looking at. I found that helpful. Example: btnSubmit. For whatever reason, I found that very readable. Perhaps because of the repetition of the prefix.
That second study you mention states “In this study, subjects were trained primarily in the underscore style (…)” without stating explicitly if or what has been done to account for this bias. Without a control group trained in no style or in camel case style it’s very hard to tell how this influenced the results. Which is why I’d still give more credit to the first one.
Interesting study. I am particularly interested in names of TEST methods only. I am not about to try to fight the idiomatic style in any language for production code. However, I do argue, repeatedly, that for test functions the context differs.
Test functions are:
1. rarely called explicitly
2. typically longer than production code methods
3. designed to be read
I do think this is a compelling use case for underscores, whereas idiomatic or dogmatic arguments might override readability in _normal code_:
public void a_user_must_have_a_valid_email_address()
The study has proven that CamelCase is slower and harder to read. The fight is supposed to be over, but as usual humanity will take decades to understand and apply this piece of info.
Lol, decades? More like centuries.