An Introduction to the Psychology of Programming

Tim Mattson

Introduction

As far as we know, all computer programmers are human. Hence, to understand programmers, you need to understand the human psychology behind the activity of programming.

Given the importance of computer programming and the need to make programmers more productive, one would expect a large body of research on the psychology of programming. This topic, however, is only being addressed by a handful of research groups. The literature is sparse and there are many holes in our understanding of this vital problem.

The reason for this lack of research on the psychology of programming is straightforward. It is very hard to carry out this research. It is very difficult to design experiments that expose the inner workings of a programmer's mind. Once the experiments are designed, finding statistically significant numbers of programmers to use as subjects is even more difficult. You can always work on a captive pool of subjects (i.e., students), but they are not really indicative of real-world, professional programmers.

The result is that we cannot paint a complete picture of the psychological issues involved with programming. We can, however, paint a picture that illuminates key issues that need to be addressed when trying to understand software development. This picture would contain many holes and include gray areas far short of a consensus view. The picture would be valuable, however, both to the developer of programming environments and to the psychology researcher.

In this paper, I will paint this picture based on a review of the literature. This picture is too complex to present in a single format. Hence, I will provide several projections of this review, with each projection serving a different primary audience. The first projection is for the developer of programming environments. This projection consists of a series of design rules to be used when creating new programming environments. The second projection is for scientists interested in a consensus view of the cognitive models underlying the psychology of programming. Finally, for the active researcher in this field, I provide an annotated bibliography of the literature related to the psychology of programming.

But before we can launch into the review and its projections for different audiences, let's define our context and consider the field as a whole. Establishing the context for this discussion will occupy the rest of this chapter.

Parallel Computing and the Psychology of Programming

The literature on the psychology of parallel programming is sparse. Actually, it's essentially a null set. There is a dissertation that addresses the topic [Kann97], but the author himself described it as inconclusive and of little use. The only other work I've been able to find is an unpublished report by Kann about his experiences teaching concurrent programming with Ada [Kann98]. This report was interesting, but not really about the psychology of parallel programming.

The result is that we are on our own in terms of the psychology of parallel programming.

Programming in general, however, has been studied by psychologists for decades. If we assume that the results from the general case apply to parallel computing, then there is quite a bit of material for us to work with.

Is this a reasonable assumption? I believe it is. The parallel programmer has to think in terms of multiple tasks executing simultaneously, which adds tremendous complexity. The parallel programmer, however, still has to understand a problem specification and design algorithms to meet its objectives, and still has to design data structures and code that safely manipulates them. In short, much of what the programmer does is the same whether writing sequential or parallel programs. Hence, I believe that conclusions based on the study of programmers of traditional sequential computers can be applied to programming parallel computers.

Overview of Research in the Psychology of Programming

Gee, it would be great to have a nice, 500-word summary of the field. It would provide so much context for the rest of this report. Maybe someday I will write it.

... but for now, I have more pressing work to worry about. If you want to read one book about this topic, take a look at [Hoc90]. This book is only slightly out of date. In it, you will learn about the major threads in the literature:

There's lots of good stuff to look at about sequential programming. Someday when I have some time, I'll summarize it.

Problems with Our Understanding of the Psychology of Programming

Humans are very hard to study. You just can't mess with them enough to learn what you need without breaking the law. Instead of informative but relatively intrusive techniques, you have to use interviews and external monitoring.

Because of this problem, there are huge holes in our understanding of the psychology of programming. We have made a lot of progress on understanding the way knowledge is represented by a programmer, but we lack good theories that describe programmer strategies. This situation holds because it's much easier to ask programmers what they know than to ask them how they know it.

So the experiments are hard, and as a result our knowledge is incomplete. There is another problem with much of the research on the psychology of programming. This issue is addressed in the important paper titled "Did anyone study any real programmers?" [Curtis86]. According to this paper, research on psychology and programming is dominated by studies of students. For example, a common type of study looks at expert vs. novice behavior. The problem is that in most cases, novices are defined as beginning undergraduates and experts are graduate students. I have nothing against graduate students, but they are still academically trained programmers with little or no industrial experience. They really aren't the experts we are interested in.

Therefore, while we can gain a lot from the literature on the psychology of programming, a healthy dose of skepticism is required. If a conclusion seems really strange or contradictory to your experience, it may be flawed.

Cognitive Dimensions

One of the most influential researchers in the field is Thomas Green, a professor at the University of Leeds in the UK. His more recent work has moved away from cognitive models and fancy theories and more toward methods to support informal discussion in productive ways. In his own words [Green97]:

"The way forward is not to make strong, simple claims about how cognitive processes work. The way forward is to study the details of how notations convey information."

He has developed a framework [Green96] to facilitate discussions about programming environments. He calls this framework the "Cognitive Dimensions Framework". This framework consists of 14 relatively independent features that characterize programming environments:

Viscosity: resistance to change. How hard is it to introduce small changes to the program?

Hidden Dependencies: important links between entities are not visible. A hidden dependency results when one part of a program changes another part in a way that is not overtly apparent in the program text.

Visibility and Juxtaposability: ability to view components easily. Is required material directly accessible without cognitive work?

Imposed Look-ahead: constraints on the order of doing things. Does the programming environment force the user to make a decision before the information is available?

Secondary Notation: extra information in means other than program syntax. Can statements or primitive elements be grouped or otherwise denoted to provide extra information to the programmer? For example, white space around blocks of code can show related statements.

Closeness of Mapping: representation maps to domain. How close is the mapping between the end-user visible problem domain and the programming domain?

Progressive Evaluation: ability to check while incomplete. Can parts of the program be tested before the entire program is written?

Hard Mental Operations: operations that tax working memory. Does the programming language lead one to tortured combinations of primitive operations that are painful to de-tangle?

Diffuseness/Terseness: succinctness of language.

Abstraction Gradient: amount of abstraction required, amount possible.

Role-expressiveness: purpose of a component is readily inferred.

Error-proneness: syntax provokes slips. Is it easy to make cognitive slips and introduce errors into the code?

Perceptual Mapping: important meanings conveyed by position, size, color, etc.

Consistency: similar semantics expressed in similar syntax. Can you infer one part of a language from another part?
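
To make one of these dimensions concrete, here is a minimal Python sketch of a hidden dependency. It is only an illustration and is not taken from [Green96]; the pricing scenario and all of the names (set_sale_mode, price_hidden, price_explicit) are hypothetical. In the first version, the result of one function depends on module-level state that another function changes, and nothing at the call site shows the link. In the second version, the same dependency appears in the function signature.

```python
# Illustrative sketch only: the scenario and all names are hypothetical,
# not drawn from [Green96].

# Hidden dependency: the link between set_sale_mode() and price_hidden()
# is not visible at the point where price_hidden() is called.
_discount_rate = 0.0  # module-level state shared by the two functions below


def set_sale_mode(active):
    """Silently changes what price_hidden() will return later."""
    global _discount_rate
    _discount_rate = 0.25 if active else 0.0


def price_hidden(base_price):
    """Result depends on _discount_rate, which the caller never sees."""
    return base_price * (1.0 - _discount_rate)


# Same computation with the dependency made explicit in the signature.
def price_explicit(base_price, discount_rate):
    """Everything the result depends on appears at the call site."""
    return base_price * (1.0 - discount_rate)


if __name__ == "__main__":
    before = price_hidden(100.0)
    set_sale_mode(True)           # nothing here mentions price_hidden()
    after = price_hidden(100.0)   # same call text, different answer
    print(before, after)
    print(price_explicit(100.0, 0.25))
```

The explicit version is slightly more diffuse, since every call carries an extra argument, but the reader no longer has to search the rest of the program to predict the result. Whether that trade is worthwhile depends on the program and its users.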

Individual dimensions aren't bad or good. The point is that it's a zero-sum game, and making these dimensions explicit in our discussions will help us manage the tradeoffs one must inevitably make. So far, Green has only applied these dimensions to comparisons of visual programming environments. I am planning to use them when we start comparing different solution stacks in the PAL project.

So what's in this set of documents?

I have completed my major push to understand the psychology of programming. I will continue studying this problem, but only as a "far back burner" effort.

The following documents summarize what I have found out during my study of the psychology of programming: