Accepted for publication by IEEE Transactions on Software Engineering.

Investigating Reading Techniques for Object-Oriented Framework Learning

Forrest Shull, Filippo Lanubile, and Victor R. Basili, Fellow, IEEE

Abstract--The empirical study described in this paper addresses software reading for construction: how application developers obtain an understanding of a software artifact for use in new system development. This study focuses on the processes that developers would engage in when learning and using object-oriented frameworks. We analyzed 15 student software development projects using both qualitative and quantitative methods to gain insight into what processes occurred during framework usage. The contribution of the study is not to test predefined hypotheses but to generate well-supported hypotheses for further investigation. The main hypotheses we produce are that example-based techniques are well suited to use by beginning learners while hierarchy-based techniques are not because of a larger learning curve. Other more specific hypotheses are proposed and discussed.

Index Terms--object-oriented frameworks, software reading, empirical study

I. INTRODUCTION

In almost any software development environment, the various work documents used (e.g. requirements documents, code and design plans) require continual review and modification throughout the lifecycle. This is due to the central role such documents play in many software engineering tasks (e.g. verification and validation, maintenance, evolution, and reuse). Software reading, i.e., the individual analysis of textual software work products aimed at achieving whatever degree of understanding is needed to accomplish a particular task, is thus a key technical activity for software development. A number of studies have examined reading techniques applied to a variety of software engineering tasks, such as detecting defects in requirements [3, 4, 41, 44, 49], assessing user interfaces [52], and reading Object-Oriented code [27]. The reading techniques presented in this paper are classified under Reading for Construction, which is aimed at answering the question: Given an existing artifact, how do I understand how to use it as part of my new system?

Reading for construction is important for comprehending what a system does, what capabilities exist and do not exist; it helps us abstract the important information in the system. It is useful for maintenance as well as for building new systems from reusable components and architectures [4].

Forrest Shull and Victor Basili are with the Computer Science Department, University of Maryland, College Park, MD 20742. E-mail: {fshull,basili}@cs.umd.edu.

Filippo Lanubile is with the Dipartimento di Informatica, University of Bari, Via Orabona 4, 70126 Bari, Italy. E-mail: lanubile@di.uniba.it.

We chose to focus on the understanding of object-oriented frameworks as the artifact to be used in system development. An object-oriented framework (hereinafter, simply framework) is "a reusable design of all or part of a system that is represented by a set of abstract classes and the way their instances interact" [22, 24]. From the perspective of the application developer, it is a skeleton application that can be customized to produce specific applications in some domain [14, 24]. Some of the best-known frameworks support the development of graphical user interfaces (e.g. MacApp, ET++, Interviews, MFC and AWT). Frameworks are also spreading into other domains such as communication software [38], manufacturing systems [8, 37], and banking applications [5].

The choice to focus on frameworks was motivated primarily by two reasons:

1. Frameworks are a promising means of reuse. Although class libraries are often touted as an effective means of building new systems more cheaply or reliably, these libraries provide only functionality at a low level. This forces the developer to provide the interconnections both between classes from the library and between the library classes and the system being developed. Greater benefits, such as faster application development [28], are expected from reusable, domain-specific frameworks that usefully encapsulate these interconnections themselves.

2. Frameworks have associated learning problems that affect their usefulness. The effort required to learn enough about the framework to begin coding is very high, especially for novices [32, 46]. Developing an application by using a framework is closer to maintaining an existing application than to developing a new application from scratch: in framework-based development, the static and dynamic structures must first be understood and then adapted to the specific requirements of the application. As in maintenance, for a developer unfamiliar with the system to obtain this understanding is a non-trivial task. Little work has yet been done on minimizing this learning curve.

Recognizing that one study cannot address issues for all types of frameworks, this paper concentrates on white-box frameworks. Frameworks of this type are tailored by deriving new classes through inheritance, and by writing application-specific methods. Black-box frameworks, in contrast, provide a set of application-specific components that are plugged together through polymorphic composition [22]. It has been suggested that frameworks evolve towards black-box as the system design for their application domain becomes better understood [22]. Conclusions about white-box frameworks are of interest primarily for two reasons:

1. White-box frameworks are in wide use [14].

2. Not every framework inevitably evolves into a black-box framework. For example, some frameworks are retired from use before evolving to black-box stage; other frameworks are in application domains that are not understood to a sufficient degree to support black-box frameworks [33].
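The distinction between white-box and black-box tailoring can be made concrete with a small sketch. The code below is illustrative only and is written in Python for brevity; every class name in it (DiagramView, OmtDiagramView, ComposedView, Box, Line) is hypothetical and does not correspond to any class in ET++ or in the frameworks cited above. The first half customizes behavior by deriving a subclass and writing application-specific methods; the second half obtains the same result by plugging ready-made components together.

```python
from abc import ABC, abstractmethod
from typing import List, Protocol


# --- White-box style: customize by inheritance --------------------------
class DiagramView(ABC):  # hypothetical framework class
    def render(self) -> List[str]:
        # The framework owns the control flow and calls back into
        # methods that the application developer overrides.
        return [self.title()] + self.shapes()

    def title(self) -> str:
        return "untitled"

    @abstractmethod
    def shapes(self) -> List[str]:
        """Application-specific method written by the developer."""


class OmtDiagramView(DiagramView):  # application-specific subclass
    def title(self) -> str:
        return "OMT diagram"

    def shapes(self) -> List[str]:
        return ["class box", "association line"]


# --- Black-box style: customize by composition --------------------------
class Drawable(Protocol):  # common interface shared by pluggable parts
    def draw(self) -> str: ...


class Box:  # hypothetical ready-made component
    def draw(self) -> str:
        return "class box"


class Line:  # hypothetical ready-made component
    def draw(self) -> str:
        return "association line"


class ComposedView:  # hypothetical framework class
    def __init__(self, title: str, parts: List[Drawable]) -> None:
        self._title = title
        self._parts = parts

    def render(self) -> List[str]:
        # Any object with a draw() method can be plugged in.
        return [self._title] + [part.draw() for part in self._parts]
```

In the white-box half the developer must understand DiagramView's internal control flow before overriding anything, which mirrors the learning cost discussed above; in the black-box half only the components' interfaces need to be known.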

II. RESEARCH QUESTIONS

Since we approached this study from the viewpoint of software reading, our primary focus was on the processes developers would engage in, as they attempted to discover enough information about the framework to be able to use it effectively. We reasoned that the best approach would be to observe a number of different approaches or techniques and their effects in practice. From this information, we hoped to determine what kinds of strategies could be used, and for which situations they were likely to be particularly well- or ill-suited. Ultimately we hoped to gain an understanding of the deeper principles involved in framework usage by studying the interaction between the different techniques and the specific task undertaken.

Since our study took place in the context of a classroom assignment, we felt it necessary to give our students a starting point for using frameworks. In the absence of any empirical evidence or general agreement in the literature on the best way to teach developers how to use a framework, we selected two promising approaches. Each approach was the basis for a set of guidelines that was taught to half of the class, so that the strengths and weaknesses of the approaches could be compared. (We discuss the guidelines themselves and the process of their creation in section V.) At the same time, we did not want to prevent the students from using work practices that they already knew, or discovered during the project, to be effective. Therefore, we allowed the students to modify these guidelines as desired. Our intention was to study the work practices of the students to determine when our guidelines were used, what other work practices were used, and how effective they were for particular tasks. We did not wish to constrain our subjects in any way to an artificial procedure, but to study and understand what they felt the most suitable approaches were for the problem. Our main research questions can be phrased as:

Can strategies for learning frameworks be identified? What are their characteristics?

As it turned out, one of the learning approaches was viewed by subjects as too cumbersome for the environment of this study, and was not used by any subject for the duration of the study. Although aspects of this approach were incorporated into the work practices of the subjects (these practices are described in some detail in section VII.A), a straightforward quantitative comparison of the results of using each approach was not possible. Instead, most of the results presented in this paper come from a quantitative and qualitative analysis of the student experiences with learning and using the framework over the course of the semester. This type of information is useful for giving us a deeper understanding of what is important in learning to use frameworks.

III. RELATED WORK

A survey of the literature on frameworks shows that relatively little has been written on using frameworks (as opposed to building or designing them). Most of the work on using and learning frameworks tends to concentrate on strategies for framework designers to use in documenting their work. The primary weaknesses of this approach are that, first, the results are only applicable to frameworks for which the prescribed documentation has been constructed (that is, they do not directly contribute to general guidelines that would help developers in approaching any framework) and secondly, that usually very little empirical evidence is presented in order to demonstrate that the prescribed method is as effective as claimed. We present some of the main areas of framework documentation here and discuss representative papers for each. We would like to reiterate that we in no way see our study as competitive with these works. Our aim is not to suggest that these other approaches are right or wrong, or to present an alternative approach which we argue to be superior. Rather, we aim to provide a more low-level indication of what sources of information or types of activities are important in framework use - information which may be used to help identify weaknesses in higher-level techniques and focus them on aspects of the framework which are most important.

1. Patterns and recipes: Beck and Johnson [6, 23] advocate the use of “patterns” (interlocking descriptions of problem/solution pairs, similar to object-oriented design patterns [17] or cookbook recipes, e.g. [1]) to describe frameworks. Each pattern describes a functionality supported by the framework, demonstrates how to implement the functionality, and discusses the impact of the implementation on the system. This seems a promising approach, because it seems capable of both showing the developer only as much detail as he or she needs for the current task and directing the developer’s attention to only the most relevant portions of the framework. Patterns have in fact been used as the sole form of documentation for the HotDraw framework. However, the only evidence presented as to their effectiveness is an informal study in which subjects were asked to learn HotDraw using patterns and provide feedback [23]. This study seems to have been very successful at its primary goal of helping the patterns’ authors debug their work, but does not provide much detail as to how the learning process was influenced. Thus the presentation leaves unanswered questions as to whether the observed effectiveness was a function of the specific project undertaken or would be true in any environment.

A related approach is the use of “hooks,” which are meant to be similar to Beck and Johnson’s patterns, although more structured, more uniform, and less narrative in style [16]. Like patterns, each hook provides only the information necessary to solve a specific, focused problem. They are produced by the framework developer to illustrate how the framework is intended to be used.

Johnson states [23] that it “would probably be worthwhile to try out the patterns in a controlled setting where it would be possible to watch how people use the patterns and what aspects of [the framework] are hard to learn.” We feel that our study examines framework usage in exactly this way, although as we did not want to make a priori assumptions about the effectiveness of one method of documentation over another we did not work in an environment documented using patterns.

2. Formal and/or searchable specifications of behavior: Another tactic has been to formalize descriptions of the behavior of framework components, which then allows the creation of a search mechanism for finding useful components given a query. One such example is the prototype framework browser constructed at the University of Quebec [30], which is especially promising in that it concentrates on finding a general solution which can be applied to any existing framework, regardless of the level of documentation supplied. However, Gangopadhyay and Mitra point out two major stumbling blocks for query-based learning of frameworks: first, searching for and reusing one component at a time does not allow the potentially subtle connections between components to be understood, and second, it is a very difficult problem to match a query which has been specified in a way meaningful to the developer with the description of the framework components [18].

3. Architectural approaches: Gangopadhyay and Mitra recommend instead a top-down approach to learning frameworks, by which they mean a concentration on the framework architecture rather than on individual components [18]. They recommend the development of exemplars, executable visual models that consist of instances of concrete framework classes along with explicit representations of their collaborations. An exemplar should contain at least one concrete subclass for each abstract class in the framework. This approach might prove difficult to use for frameworks that do not conform to good design style issues, such as having only a few abstract classes, and using abstract classes to implement important sites for customization in the framework. It is also unclear how helpful the exemplar approach would be in cases in which the developer wants to make a modification the framework designer has not anticipated (i.e. a modification at a location that is not represented as an abstract class in the framework).

4. Tutorials: Other work has focused on tutorials created for users to follow which will presumably guide users through the most important points of the framework. (Two examples are [47] for Unidraw and [15] for ET++.) An interesting example of work in this area is Rosson et al.’s tutorial for learning Smalltalk [34] which applies Minimalist instruction techniques [9] and seems to corroborate the benefits that may result from a well-designed tutorial course (claiming to allow new users to develop code for interactive applications after only four hours). Like Johnson, Rosson undertakes some testing which is aimed not at testing hypotheses but at helping to debug the documentation. However, no study has been undertaken to examine the breadth of knowledge achieved, although this becomes an exceptionally pertinent question when the goal is radical decreases in learning time for a constant breadth of knowledge.

An important weakness which is shared by all of these approaches is that they assume that the framework developer will be able to anticipate future uses of the framework adequately to provide enough patterns (or exemplars, or tutorial lessons) in sufficient detail. An alternate research approach would be to avoid making any such assumptions about framework usage in order to undertake an empirical study of how developers go about performing the necessary tasks. Such an approach is not new, and has in fact proven useful in understanding how developers perform related tasks such as understanding code [48] or performing maintenance [43]. Similar methods can be used to study the process of learning frameworks, since white-box framework understanding is a specialized kind of program understanding. A framework can be thought of as a set of object classes that collaborate to carry out a set of responsibilities; the developer needs to gain an understanding of what the various classes are and the functionality they provide. Since classes cannot be reused in isolation, it is also necessary to understand how these classes interact with each other. Understanding the dependencies between the components is a difficult task and the source of many complaints about framework complexity [24].

Example applications play a key role in the documentation of frameworks, by showing what the framework is good for and pointing out features that the framework provides. However, examples do not explicitly show how these features are provided [23]. The problem remains that it is difficult to understand the interactions between objects using source code.

Other authors [11] who have applied an empirical approach to studying framework usage in industrial environments agree that, in most cases, framework customization will be more complex than just making modifications at a limited number of predefined spots. Documentation that assumes this is possible will be too constraining and will not provide support for many realistic development problems, which far from requiring isolated changes may sometimes even require changes to the underlying framework architecture.

Our study belongs in this category of empirical study of practical framework use. It is similar in type to the study undertaken by Schneider and Repenning [39], which draws conclusions about the process of software development with frameworks from fifty application-building efforts supervised by the authors. The large number of projects followed allowed the authors to examine both successful and unsuccessful projects, and their observation of (and sometime participation in) the process allowed them to both characterize the usual process and to identify some conditions that contribute to situations where the process breaks down and leads to unsuccessful projects. Our results complement and in some cases extend the results from the Schneider and Repenning study, and we consequently discuss them in greater detail later in section VIII.E.

IV. DEFINITIONS

We include the following definitions to clarify and illustrate some terminology that we use often in this discussion. It is our hope that these definitions will help to make clear our model of framework usage and to keep certain concepts distinct throughout the discussion. For example, we should first be careful to differentiate the framework developer (the developer(s) who designed and implemented the framework) from the application developer (the developer(s) who design and implement a new system using the framework to provide certain key functionality), who is usually referred to as simply the “developer” in this discussion.

example application: an application which has been constructed using the framework. Such examples may be created by the framework developer to illustrate how to produce some functionality using the components provided by the framework, or may be a “real-life” application in whose development process the framework happened to be used.

functionality supported by framework: a particular functionality is either provided by an example application, or there is a one-to-one mapping between the functionality and a framework component at some level of granularity (e.g. subsystem, class, method, ...). An example might be the behavior of radio buttons (primitive GUI objects would be likely to be supported by a class in a GUI framework) or a linked list (most frameworks provide classes that encapsulate reusable abstract data types).

functionality provided by examples: an example application exhibits dynamic behavior which corresponds to the required functionality. The functionality may be implemented in one component or a combination of components which contain some code specific to the example (i.e. some but not all of the functionality may be inherited from the framework).

functionality supported by object model: the required functionality can be implemented as part of the system represented by the object model without too much change to the object model. Obviously the phrase “too much change” is unacceptably vague, but we do not yet have a useful way of characterizing the concept of when adjustment to an object model becomes excessive. Although there are some high-level studies of software architecture underway, we look upon the effort to provide guidance as to how much adjustment is possible in a given situation as an important and promising area of future research.

V. INITIAL DESIGN OF FRAMEWORK LEARNING TECHNIQUES

In the absence of a definitive approach to framework learning, we turned to the literature to identify useful approaches. Our first step was to identify helpful models of the framework, that is, helpful ways of thinking about the framework that would highlight the truly important features and could be used for finding particular functionality. The most common description of a framework uses a class hierarchy to describe the functionality supported by the framework and an object model to describe how the dynamic behavior is implemented. Most of the common descriptions of a framework in the literature (e.g. [28, 46]) present a model of the framework similar to this one. To teach subjects how to use this model, we created a set of guidelines that could be used to gain an understanding of the class hierarchy. The guidelines help developers understand the functionality provided by the framework by concentrating on abstract classes in the hierarchy. The developer is guided first through the broad classes of functionality, then through deeper and deeper levels of concrete classes to find the most specific instantiation. We refer to this procedure as the Hierarchy-Based (HB) procedure, to emphasize that the underlying model of the framework is the class hierarchy.
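The top-down walk at the heart of the HB procedure can be sketched in code. The following is a rough illustration under stated assumptions, not the guidelines that were taught to subjects: the hierarchy is a small hypothetical tree (the class names are invented and are not ET++ classes), each class is tagged with the functionality it provides, and a simple tag lookup stands in for the developer's judgment about which branch to follow.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ClassNode:
    name: str
    abstract: bool
    provides: Set[str]  # hypothetical functionality tags for this class
    subclasses: List["ClassNode"] = field(default_factory=list)


def most_specific(root: ClassNode, needed: str) -> List[str]:
    """Walk from the root toward the deepest class that provides `needed`.

    Returns the list of class names visited, or an empty list when the
    hierarchy does not offer the functionality at all.
    """
    if needed not in root.provides:
        return []
    path = [root.name]
    node = root
    while True:
        # Follow the first subclass that still offers the functionality.
        match = next((c for c in node.subclasses if needed in c.provides), None)
        if match is None:
            return path
        path.append(match.name)
        node = match


# A tiny invented hierarchy: broad abstract classes first, concrete leaves last.
hierarchy = ClassNode(
    "VisualObject", True, {"buttons", "text"},
    [
        ClassNode("Control", True, {"buttons"},
                  [ClassNode("RadioButton", False, {"buttons"})]),
        ClassNode("TextItem", False, {"text"}),
    ],
)

path = most_specific(hierarchy, "buttons")
```

Here `path` lists the classes a developer would read in order, from the broadest abstract class down to the most specific concrete one. In a real framework the branch choice depends on reading documentation and code, which is where the learning cost of this approach arises.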

[Figure 1 diagram not reproduced. Recoverable labels: Abstractions of Information (class hierarchy & dynamic model, OR set of illustrative examples): provide a detailed procedure for tracing functionality through code. Uses of Information (find reusable functionality): provide a detailed procedure for identifying relevant functionality. Procedure (for finding functionality in the framework that can be used in the new system): ask subjects to step through the relevant models, following guidelines for finding functionality which are tailored to the model and level of detail.]

Figure 1: Producing focused, tailored procedures

As an alternative model, we decided to look at the framework through a set of example applications which, taken together, were meant to illustrate the range of functionality and behavior provided by the framework.

Although a detailed examination of learning frameworks by means of examples has not been undertaken, learning by example also seemed a promising approach. Sets of example applications have been used to document some frameworks (the framework we used came with such a set) and the approach has been recommended for similar types of activities, such as learning effective techniques for problem solving [10], or learning how to write programs in a new programming language [26, 34]. It has also been argued that learning by examples is well-suited for “domains where multiple organizational principles and irregularities in interaction exist” [7], which may be a fair assessment of the large hierarchy of classes in a framework. The framework we used in this study came with a set of examples at varying levels of complexity that was constructed to demonstrate the important concepts of the framework. To help subjects use these examples to learn the framework we created a set of guidelines that would guide exploration through the example set, to particular examples, to particular objects in the implementation, to particular lines of code in the object. This procedure is referred to as the Example-Based (EB) procedure.

Once we had identified suitable models, we constructed detailed guidelines for identifying functionality that would be relevant to the system being developed. To do this we concentrated on identifying similarities between the classes that were specified by subjects in their original object models, and the classes in the framework. These guidelines were then also tailored to the models, and integrated with the procedures for understanding the framework (see figure 1). The final guidelines were intended to be step-by-step procedures that could be taught to the students and used to find functionality in the framework.

VI. DESCRIPTION OF THE STUDY

To undertake an exploratory analysis into framework usage, we ran a study as part of a software engineering course at the University of Maryland. Our class of 43 upper-level undergraduates and graduate students was divided into 15 two- and three-person teams. Teams were chosen randomly and then examined to make certain that each team met certain minimum requirements (e.g. no more than one person on the team with low C++ experience) for the class. Each team was asked to develop an application during the course of the semester, going through all stages of the software lifecycle (interpreting customer requirements into object and dynamic models, then implementing the system based on these models). The application to be developed was one that would allow a user to edit OMT-notation diagrams [36]. That is, the user had to be able to graphically represent the classes of a system and the different types of relations between them, to be able to perform some operations (e.g. moving, resizing) directly on these representations, and to be able to enter descriptive attributes (class operations, object names, multiplicity of relations, etc.) that would be displayed according to the notational standards. The project was to be built on top of the ET++ framework [50], which assists the development of GUI-based applications. ET++ provides a hierarchy of over 200 classes that provide windowing functionality such as event handling, menu bars, dialog boxes, and the like. ET++ was considered an attractive choice for this study because it possesses an “outstanding design” and is available from the public domain [32]. The design of ET++ is of sufficient quality that it was the source of seventeen of the design patterns in [17].

Before implementation began on the project, the class was randomly divided into two groups and each was taught only one of the two framework models and its corresponding guidelines for use. During the implementation, we then monitored the activities undertaken by the students as much as possible in order to understand if our learning techniques were used, what other learning approaches were applied, and which of these were effective. To do this, we asked the students to provide records of the activities they undertook so that we could monitor their progress and the effectiveness of the techniques, and augmented this with interviews after the project was completed. (These data collection mechanisms are discussed below.) Although we felt our learning techniques made good starting points, we did not constrain students to use the techniques exactly as they were taught. We wanted to leave the students the flexibility to modify their development methods if the techniques were not well suited to a particular part of the implementation, or if they had personal techniques that would work better for them.

Since the analysis was carried out both for individuals and the teams of which they were part, we were able to treat the study as an embedded case study [51]. Over the course of the semester, we used a number of different methods to collect a wide variety of data, each of which we discuss briefly below. Most of our collection methods are mentioned by Singer and Lethbridge in their discussion of the pros and cons of various methods for studying maintenance activities [43], and we respond to some of their comments where appropriate. We hope this study provides an additional illustration of their conclusion that, in order to obtain an accurate picture of the work involved, a variety of methods must be used at different points of the development cycle in order to balance out the advantages and disadvantages of each method.

1. Questionnaires were used at the beginning (to report previous programming experience) and end (to report effort spent during the last week of implementation and the level of completion for each functional requirement for the project) of the semester. Although the information reported on the beginning questionnaires could not be verified, the end questionnaires were verified against the executables submitted for each team. The unit of analysis for the beginning questionnaires was the individual student, while the end questionnaires were filled out for the entire team. Both were mandatory although self-reported, and did not impact the students’ grades.

2. Exam grades were recorded for certain questions on the midterm that could be used by us to gauge the students’ level of understanding of framework concepts. These grades were recorded for the individual students, were assigned by us after evaluating the students’ responses for the level of understanding exhibited, and were mandatory (as they constituted part of the students’ grades for the course).

3. Progress reports were to be submitted by each team for each week of the implementation phase. They consisted of an estimate of the number of hours worked by the team for the week in implementing the project, and a list of which functional requirements had been begun and which completed. As the students were told that the progress reports had no bearing on their grades, many teams opted to submit them only sporadically or not at all. (In some ways, these reports were similar to Singer and Lethbridge’s idea of logbooks, which allow the developer to record information at certain times throughout the development process. Singer and Lethbridge concentrate on the dangers of making the report too time-consuming, but we have noticed a quite opposite phenomenon: if the experimenter makes the report too minimal, the developer may assume that the information to be collected cannot be truly important and thus make completing the report a very low priority.)

4. Problem reports were requests for clarification or for help with ET++ that the students submitted (via email) to the course instructors. A record was kept of the general subject of each request, and by which team it had been submitted. In this way we hoped to maintain a record of the kinds of difficulties encountered by teams during the course of the project. Problem reports were obviously not mandatory and had no effect on student grades, but were a resource that the students knew could be made use of at their discretion. (Singer and Lethbridge focus on the inaccuracies of retrospective reports, but our problem reports were actually an excellent way to get an accurate picture of where teams were having problems at the time they were having them - which may be, admittedly, unique to the classroom environment.)

5. Implementation score was assigned by us to each team at the end of the semester. Projects were graded by assessing how well the submitted system met each of the original functional requirements (on a 6 point scale based upon the suggested scale for reporting run-time defects in the NASA Software Engineering Laboratory [45]: “required functionality missing”, “program stops when functionality invoked”, “functionality cannot be used”, “functionality can only partly be used”, “minor or cosmetic deviation”, “functionality works well”). The score for each functional requirement was then weighted by the subjective importance of the requirement (assigned by us) and used to compute an implementation score that reflects the usefulness and reliability of the delivered system. The weights were chosen in such a way that if each functionality worked well, an implementation score of 100 would be obtained. Scores less than 100 provided a rough indication of what percentage of system functionality had been implemented. (Because extra credit was awarded in rare instances that functionality beyond what was required was implemented, it was also possible for implementation scores to be slightly greater than 100.)

6. Final reports were collected from each team at the end of the semester. These reports consisted of documentation for the submitted system (object models and use cases) as well as records of the activities undertaken while implementing the project (object models of examples that had been studied, lists of classes that had been examined for some functionalities). Additionally, in-class presentations were given by each team in which they could present interesting details of the functionality available in their system, their experiences and difficulties with the techniques they used, and/or their general approach to implementation. The completeness of the final reports counted toward each team’s grade, although their conformance to any particular technique did not.

7. Self-assessments were mandatory ratings in which each student was asked to rate the effectiveness of each member of his or her team (including him- or herself) as well as the team performance as a whole. Partly this was to detect whether every team member had done their share of the work, and partly it was to ask students to think about what they had done well and poorly during the course of the implementation. Although it was mandatory that each student return a self-assessment, the self-assessments did not count directly toward the student grade (although in some cases, evidence from the self-assessments and the interviews led to individual grades being slightly adjusted).

8. Interviews were mandatory “debriefing” sessions at the end of the semester. Each team would come as a group to the course instructors, to be asked questions about what kinds of activities they did during the course of the semester, which of these they found particularly useful or useless, and what parts of the project were easiest and hardest. A set of base questions was asked in every interview, although additional questions were asked dynamically. That is, we steered the interview in new directions as unforeseen but interesting themes were raised.

Table 1 summarizes the types of data we collected along with the collection methods we used. In the interest of space, we do not present the actual data in this paper, but maintain a website that contains them in a form suitable for downloading [42].
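The implementation score described in item 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paper does not say how the six rating levels map to numbers, so the sketch assumes a linear mapping from 0 ("required functionality missing") to 5 ("functionality works well"). The requirement names, weights, and the `extra_credit` parameter are hypothetical.

```python
# Rating levels from the 6-point scale, ordered worst to best.
# ASSUMPTION: each level is scored by its index (0..5); the paper does not
# state the numeric mapping.
RATING_SCALE = [
    "required functionality missing",
    "program stops when functionality invoked",
    "functionality cannot be used",
    "functionality can only partly be used",
    "minor or cosmetic deviation",
    "functionality works well",
]


def implementation_score(ratings, weights, extra_credit=0.0):
    """Weighted score that equals 100 when every requirement works well.

    ratings: requirement name -> index into RATING_SCALE (0..5)
    weights: requirement name -> importance weight; weights must sum to 100
    extra_credit: hypothetical bonus for functionality beyond the requirements
    """
    if set(ratings) != set(weights):
        raise ValueError("ratings and weights must cover the same requirements")
    if abs(sum(weights.values()) - 100.0) > 1e-9:
        raise ValueError("weights must sum to 100")
    top = len(RATING_SCALE) - 1
    for name, rating in ratings.items():
        if not 0 <= rating <= top:
            raise ValueError(f"rating for {name!r} is outside the scale")
    base = sum(weights[name] * ratings[name] / top for name in ratings)
    return base + extra_credit


# Hypothetical requirements and weights, for illustration only.
example_weights = {"links": 40, "dialog boxes": 35, "views": 25}
example_ratings = {"links": 5, "dialog boxes": 3, "views": 5}
print(implementation_score(example_ratings, example_weights))
```

With this mapping, a team whose functionality all "works well" scores exactly 100, and extra credit is the only way to exceed 100, which matches the behavior described for the scores.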

VII. ANALYSIS

We then analyzed this mix of qualitative and quantitative data to gain some insight into what was going on within each team. By comparing and contrasting teams, we began to see implications that addressed our research questions. Since little work has yet been spent on understanding this area of framework use, our focus was on using this information to look for tentative but reasonable hypotheses and not on testing known hypotheses. The process of building theories from empirical research was first proposed in the social science literature [13, 19] but it is also followed in the software engineering discipline [40].

A. Development Processes

The analysis approach we used was primarily a mix of qualitative and quantitative methods, in order to understand in detail the development strategies our subjects undertook. Our first step was to get an overview of what development processes teams had used. (By “development processes” we mean how the team had been organized, what techniques they had used to understand the framework and implement the functionality, whether they based their implementation on an example or started from scratch, and what tools they had used to support their techniques.) To this end, we performed a qualitative analysis of the explanations given by members of the teams during the interviews and in the final reports, and of the self-assessments. We first focused on understanding what went on within each of the teams during the implementation of the project. We identified important concepts by classifying the subjects’ comments under progressively more abstract themes, then looked for themes that might be related to one another (an example is given below). Once we felt we had a good understanding of what teams did, we made comparisons across groups to begin to hypothesize what the relevant variables were in general. This allowed us to look for variations in team effectiveness that might be the result of differences in those key variables, as well as to rule out confounding factors.

In order to illustrate this analysis technique better, let us consider a small example. While trying to categorize the types of remarks students made during the final interviews, we noticed a lot of comments (some made spontaneously, some with prompting by the interviewers) concerning how teams spent most of their time during the implementation phase of the project. We grouped some of these remarks into a general category that deals with what kinds of activities students found useful in implementation; combined with other categories (which deal with, for example, what kinds of tools students found useful or what kinds of examples were helpful to examine) we began to get a better idea of what techniques students developed to help them in implementation. To get an accurate picture of what teams did over the entire course of the semester, however, we needed to look at other categories of remarks, concerning for example which of the two initial procedures (HB or EB) the team was taught, what their experiences were with the technique, and what parts of the technique were found not to be useful and were discarded. By making such abstractions from the students’ comments, we built an understanding of what each separate team did.

With this understanding of the processes at work within teams, we compared experiences across teams to try to identify themes that emerged. For example, we noticed that most teams who began by modifying an example tended to do better than teams who began implementing from scratch. Our next step of the analysis was to test these provisional hypotheses, again by making comparisons across teams to find possible refuting evidence or confounding factors. For example, suppose Team X started their implementation from an example but turned in a very poor implementation in the end. Does this refute our provisional hypothesis? Perhaps it signifies that the trend we thought we had observed was simply a fluke, or perhaps we may notice a confounding factor - say, Team X was also very poorly organized - that may account for the seemingly anomalous results and needs to be taken into account as part of the analysis.


Aspect of Interest | Measures | Form of Data | Unit of Analysis | Collection Methods
Development Processes | Techniques used | Qualitative | team | interviews, final reports
Development Processes | Tools used | Qualitative | team | interviews
Development Processes | Team organization | Qualitative | team | interviews, self-assessments
Development Processes | Starting point for implementation | Qualitative | team | interviews, final reports
Development Processes | Difficulties encountered with technique | Qualitative | team | problem reports, self-assessments, final reports
Product | Degree of implementation for each functionality | Quantitative | team | implementation score, final reports
Other Factors Influencing Effectiveness | Effort | Quantitative | team | progress reports, questionnaires
Other Factors Influencing Effectiveness | Level of understanding of technique taught | Quantitative | individual | exam grades
Other Factors Influencing Effectiveness | Previous experience | Quantitative | individual | questionnaires

Table 1: Types of measurements and means for collecting.

Category | Number of Teams | Number Originally Taught EB, HB | Description
EB | 5 | 5, 0 | Students in this category used the EB technique as it was taught, following the guidelines as closely as they could.
EB/HB | 5 | 0, 5 | This is a hybrid approach that focuses on using examples to identify classes important in the implementation of a particular functionality. The main difference from the EB technique is that, in the hybrid technique, the student does not always begin tracing the functionality through the code, but may instead use the example to suggest important classes and then return to the framework hierarchy to focus on learning related classes.
ad hoc EB | 4 | 2, 2 | This was an ad hoc approach that emphasized the importance of learning the framework via examples, but ignored the detailed guidelines given. The primary difference between these techniques and the EB category is that ad hoc EB techniques are missing a consistent mechanism for selecting examples and tracing code through them.
EB/scratch | 1 | 1, 0 | The team used the EB technique to identify basic structure and useful classes, but implemented the functionality mostly from scratch.

Table 2: Description of development processes observed in the study.

Although this is not a common method of analysis in computer science, it is a recommended approach for social sciences and other fields that require the analysis of human behavior [13, 29]. It is well suited for our purposes here because our variables of interest are heavily influenced by human behavior and because we are not attempting to prove hypotheses about framework usage, but rather to begin formulating hypotheses about this process, about which we currently know little.

We found that teams used development processes that fell into one of four categories (Table 2). While no team used the HB technique for the entire semester (although these guidelines were partly incorporated into some of the other techniques that were used), the EB technique did enjoy consistent use, both as taught and in combination with other techniques. It can be noted that even teams who were taught HB and not exposed to EB tended to reinvent a technique similar to EB on their own. (Some teams were even contrite about this. “We didn’t realize at the time that this was the technique taught to the other part of the class, but it seemed the natural thing to do.”)

B. Potentially Confounding Factors

We also undertook quantitative analyses to gauge the effects of potentially confounding factors in the study. Due to the small sample size and the exploratory nature of this study, we used an α-level of 0.20 for the statistical tests reported in this paper, which is higher than standard levels. Although not common, this α-level has been used in similar hypothesis-building studies, e.g. [2]. We realize that statistical tests at this significance level do not provide strong evidence of a relationship, but instead see their contribution as helping detect patterns in the data that can be specifically tested in future studies. The relatively high α-level makes it more likely that we err in the direction of finding false correlations than of inadvertently missing significant relationships. However, the low number of data points in this study means that the second case is still a possibility.

We noticed that teams that had been taught the HB technique experienced many early difficulties, particularly in tracking flow of control in the framework, and in finding and correctly parameterizing reusable classes from the hierarchy. We wanted to test whether the progress of these teams had been adversely affected by the reading technique. We therefore undertook an analysis of the number of person-hours spent by different groups in order to understand whether the amount of effort a team spent on implementation was largely dependent upon the technique they had been taught. The test for differences between the two groups could not be conclusive, as only six teams reported their overall effort data. The analysis did, however, show no significant difference between the average amount of effort spent on the project over the course of the semester by teams who had been taught each of the different techniques (t-value of 0.5916 and p-value of 0.5859). Student remarks from the interviews tended to support this. Most teams who had been taught the HB technique reported that they switched their development approach, usually after trying to apply HB for the first 1 to 2 weeks. Regardless of the technique a team had been taught, the heaviest investments of effort for almost all teams came toward the end of the implementation phase.

We also wished to examine whether students actually had a different initial understanding of the framework based on the models and procedures we had taught them. We gauged their initial understanding by means of two questions that appeared on their midterm, after they had been taught one or the other of the framework models, but before they had actually had any experience using the ET++ framework. The first question was intended to measure how well the students grasped the concept of the framework hierarchy of classes (Table 3 shows the distribution of grades on this question by procedure taught). The second question measured understanding of the model of interaction (Table 4). We tested whether response rates were independent of the procedure taught by using Wilcoxon rank sum tests. The results of neither test were significant (Z = -0.6933, p = 0.4881 and Z = 0.6343, p = 0.5259 for the first and second questions, respectively), indicating no detectable difference in how well the questions were answered with respect to the technique taught. This suggests that, although taught different procedures for using the framework, neither group of subjects started out at a disadvantage to the other in terms of their understanding of the framework itself.
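The two Wilcoxon rank sum results can be checked with a short sketch. It assumes the large-sample normal approximation with midranks for ties, a tie-corrected variance, and a 0.5 continuity correction; the paper does not say which variant was used, but this one gives values close to the reported Z and p. The grade counts are those of Tables 3 and 4 (EB and HB columns, grades A through D), listed here from lowest grade to highest; the function name is hypothetical.

```python
import math


def rank_sum_z(counts_a, counts_b):
    """Wilcoxon rank sum Z and two-sided p for two groups of ordinal data.

    counts_a and counts_b give, for each category ordered lowest to highest,
    how many observations each group has in that category.
    ASSUMPTION: normal approximation with tie and continuity corrections.
    """
    n_a, n_b = sum(counts_a), sum(counts_b)
    n = n_a + n_b
    rank_sum_a = 0.0  # sum of midranks belonging to group A
    tie_term = 0.0    # sum of t^3 - t over each block of tied observations
    ranked = 0        # observations that already have ranks
    for c_a, c_b in zip(counts_a, counts_b):
        t = c_a + c_b
        midrank = ranked + (t + 1) / 2
        rank_sum_a += c_a * midrank
        tie_term += t ** 3 - t
        ranked += t
    expected = n_a * (n + 1) / 2
    variance = n_a * n_b / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    diff = rank_sum_a - expected
    # Continuity correction: move the difference 0.5 toward zero.
    diff = math.copysign(max(abs(diff) - 0.5, 0.0), diff)
    z = diff / math.sqrt(variance)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


# Grade counts ordered D, C, B, A; group A is EB, group B is HB.
hierarchy_eb, hierarchy_hb = [6, 5, 5, 5], [6, 4, 3, 9]        # Table 3
interaction_eb, interaction_hb = [6, 2, 3, 10], [5, 4, 7, 6]   # Table 4

for label, eb, hb in [("class hierarchy", hierarchy_eb, hierarchy_hb),
                      ("model of interaction", interaction_eb, interaction_hb)]:
    z, p = rank_sum_z(eb, hb)
    print(f"{label}: Z = {z:.4f}, p = {p:.4f}")
```

The sign of Z depends on which group's rank sum is used and on the direction in which grades are ordered; here the EB rank sum is used with A as the highest grade.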

Technique Taught | Grade A | Grade B | Grade C | Grade D
EB | 5 | 5 | 5 | 6
HB | 9 | 3 | 4 | 6

Table 3: Distribution of grades on exam question dealing with the framework class hierarchy.

Technique Taught | Grade A | Grade B | Grade C | Grade D
EB | 10 | 3 | 2 | 6
HB | 6 | 7 | 4 | 5

Table 4: Distribution of grades on exam question dealing with the framework model of interaction.

A final concern in all studies of this type has to do with the experience of the subjects. We were concerned that the effectiveness of our teams might have more to do with the level of experience the team members had with implementing similar projects than with any of the variables under study in our experiment. We removed one team (which had had extreme organizational difficulties) from our analysis as an extreme outlier (as defined by [31]). We then used the Pearson correlation coefficient [21] to measure the strength of the linear relationship between experience and implementation score (with correlation values close to 1 or -1 representing an exact linear relationship and values close to zero representing no linear relationship). We found no correlation between the total amount of experience of a team with programming in an academic environment and its effectiveness at the implementation of the project (Pearson’s correlation coefficient of -0.0015), but did uncover a correlation between total experience programming in industry and effectiveness at implementation (Pearson’s correlation coefficient of 0.55, Figure 2). However, an R-square value of 0.31 for the model means that industrial experience accounted for only 31% of the observed variation in the implementation score. This low correlation implies that the previous experience of our subjects is not sufficient to explain the observed results, and thus that the way in which this implementation was undertaken has affected the quality of implementation.

[Figure 2 plot: Implementation Score (vertical axis) against Team Industrial Experience (horizontal axis).]

Figure 2: The total industrial experience of a team (in years) and its correlation with its effectiveness at implementation of the project (the team with implementation score of 44 has been removed from analysis as an outlier).
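To make the relationship between the correlation coefficient and the R-square value concrete, the following sketch computes Pearson's r and the share of variance explained (r squared). The per-team data are not listed in the paper, so the experience and score values below are hypothetical and serve only to exercise the function.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# HYPOTHETICAL data: years of industrial experience and implementation scores.
experience_years = [0.0, 0.5, 1.0, 2.0, 3.5]
implementation_scores = [70, 85, 80, 95, 90]

r = pearson_r(experience_years, implementation_scores)
print(f"r = {r:.2f}, variance explained = {r * r:.0%}")
```

For a simple linear model with one predictor, R-square is the square of r. Squaring the reported coefficient of 0.55 gives about 0.30; the reported 0.31 would be consistent with an unrounded coefficient slightly above 0.55.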

VIII. RESULTS

In this section we present the hypotheses that result from our study of framework usage. Along with each we present the relevant indications from our study, so that the reader may judge the strength of the evidence which supports these hypotheses.

A. Hypotheses About Our Models of the Framework

In order to understand the process of learning a framework by means of examples, we can generalize not only from the EB procedure itself but from its example-based derivatives as well. From Table 2, it can be seen that all students who were taught EB ended up using this technique (or a close derivative) throughout the implementation phase. Perhaps more importantly, all students who were taught the other technique ended up employing a less rigorous example-based technique on their own. It seemed, therefore, that not only our EB technique, but example-based learning in general, was a natural way to approach learning such a complicated new system. This leads us to formulate:

HYPOTHESIS 1: Example-based techniques are well-suited to use by beginning learners.

In contrast, subjects tried to use the HB procedure but eventually abandoned it. Qualitative analysis was necessary to understand why this happened. What was wrong with the HB procedure that made it not useful in this environment? We analyzed the student remarks from the problem reports, self-assessments, and final reports to see if characteristic difficulties had been reported. A common theme (remarked upon by half of the teams who had been taught HB) was that the technique gave subjects no idea which piece of functionality provided the best starting place for implementation, or where in the massive framework hierarchy to begin looking for such functionality.

Teams also registered complaints about the time-consuming nature of the HB technique. Especially compared to an example-based approach, where implementation can begin much more rapidly, hierarchy-focused approaches seem to require a much larger investment of effort before any payoff is observed. One team pointed out that they had explicitly compared their progress against other (as it happened, Example-Based) teams: “We talked to other groups, and they seemed to be getting done faster with examples. So after the first week we started going to examples, too.”

Despite these difficulties, students reported that they felt that the HB technique would have been very effective if they had had both sufficient documentation to support it and more time to use it. “The Hierarchy-Based procedure would be helpful if you have the time,” said one group in the final interviews, “but on a tight schedule it doesn’t help at all.” Another opinion was expressed by the group who said, “It’s the technique I normally use anyway - and it would have been especially good here when the examples are not enough for implementing the functionality.” There seemed to be a consensus that it would have allowed them to escape from the limitations of the example-based approach and engage in greater customization of the resulting system, but simply wasn’t effective in the current environment. Five teams were able to create effective strategies that were hybrids of the hierarchy-based and example-based methods (EB/HB). However, the lack of guidance as to how to get started, and the time required to learn the necessary information to use it effectively, meant that no development teams used it exclusively for a significant portion of the implementation phase. (By no means was this a completely negative development, as we now have more detail on techniques that minimize that crucial learning curve.)

HYPOTHESIS 2: A hierarchy-focused technique is not well-suited to use by beginners under a tight schedule.

B. Practical Implications for Using an Example-Based Procedure

The analysis in the last section should not be taken to imply that our EB procedure was without problems. We also undertook a qualitative analysis of subject satisfaction with the Example-Based procedure in order to understand where it could be strengthened for future use. We found that there were also characteristic problems with EB that were encountered by beginners.

Although the teams using this technique usually managed to get the functionality working in the end, almost all (4 out of 5) of the groups in the study who used our EB technique reported difficulties in finding the necessary functionality within the example set. The problems with finding it in the first place seemed especially acute when the functionality needed was a very small part of a much larger example (e.g., a characteristic way of displaying items onscreen, or the dialog boxes discussed under “key functionalities”, below). This indicates a problem with our EB technique: further guidance is required to assist developers in finding and extracting small pieces of functionality embedded within larger examples. As there is currently little guidance available in this regard, more work will have to be done in this area to enable effective example-based techniques.

This study also provides indications that characteristics of the example set can influence the performance of an example-based approach as well. One-fourth of all the teams in our study had trouble making use of the examples because the example set provided did not conform to a consistent organization or structure. Some examples were based on Model-View-Controller interaction [20] while others were not constrained by any such separation of functionality, and different examples seemed to achieve the same functionality by using different classes from the framework hierarchy. As others [35] have pointed out, learning how to implement functionality from existing applications is difficult because the rationales for design choices, which explain why the finished implementation looks the way it does, are usually not included in the documentation. When attempting to reuse functionality from existing applications, developers are implicitly asked to reconstruct the choices that led to the finished implementations they are studying. This situation can actually be made worse in a framework-based environment, where effective reuse requires the developer to understand the rationales behind a number of applications, not just one. This is a problem that will have to be addressed by any example-based technique.

HYPOTHESIS 3: The effectiveness of a technique for adapting framework-based applications depends on breadth of functionality and other characteristics of the existing applications.

This hypothesis is very high-level, and further work is required to understand better the effect of characteristics of the example set. However, hypotheses 4 through 6 in the next section are concrete applications that may demonstrate the usefulness of this hypothesis.

C. Hypotheses About the Level of Specificity in the Procedures

This study provides us with an excellent opportunity to understand whether the procedure we created was at a useful level of specificity. Because some teams followed the procedure exactly while others followed it only to a certain extent, we can distinguish between the following two types of example-based processes:

• Strictly adopted: The EB technique is considered “strictly adopted” when followed in a step-by-step fashion. This procedure focuses entirely on guiding the developer to find useful functionality in the example applications, which can then be tailored to the current system.

• Ad hoc adopted: The EB/HB, ad hoc EB, and EB/scratch techniques are considered to fall in this general category. Subjects who used these techniques are still considered to have adopted the basic approach behind the EB technique (viz. learning from examples). However, the subjects have augmented this basic approach to a greater or lesser degree with other techniques that were found to be effective.

We can compare the experiences of strictly adopted to ad hoc adopted teams to understand if our technique, at its current level of detail, is useful.

To perform this analysis, we focused on certain key functionalities, that is, certain requirements for which there was a large degree of variation between teams in terms of the quality of the implementation. Recall that we had graded each required piece of functionality for each project on a 6-point scale. To select key functionalities we looked for functionalities for which there was enough variation among all 15 of the teams on the rating scale (regardless of what type of technique they used) that teams could be easily divided into at least two groups based on their score. We then analyzed whether the level of detail at which the procedure was followed had any correlation with whether the team was able to implement a particular functionality in a more complete or sophisticated way.

To back up this qualitative analysis, we attempted to use statistical tests to verify the correlation. Since both of the variables in this analysis (technique followed and result from implementation) are on a nominal scale (i.e. expressed as categories) we can organize data into contingency tables where each dimension corresponds to a variable and numbers represent frequencies. We want to test whether the proportion of teams in each type of implementation varies due to the type of technique that was used. Specifically, the null hypothesis is that the proportion of teams who achieved each type of implementation is independent of the technique that was used to perform the implementation. In order to test this difference we apply the chi-square test of probability (see footnote 1). Due to our small sample sizes and the exploratory nature of this study, we again used an α-level of 0.20. We also present the product moment correlation coefficient, r, as a measure of the effect size [25]. (An r-value of 0 would show no correlation between the variables, whereas a value of 1 shows a perfect correlation.) We realize that these tests do not provide strong statistical evidence of any relationship, but instead see their contribution as helping detect patterns in the data that can be specifically tested in future studies.

We identified 4 such key functionalities: links, dialog boxes, deletion, and multiple views. They illustrate 3 types of situations that may arise when functionality is being sought in examples:

1. The examples don’t provide all of the functionality desired. Key functionality 1 (links) fits into this category. OMT notation requires that links be drawn between classes in the model which interact in some way with each other. The specifications for our OMT editor stated certain other specifications for handling links, such as how they should be entered into the diagram, and that they should automatically update when their associated classes are moved. Provided with the example set was the “er” entity-relation diagram editor which provided similar functionality that allowed two linked objects to be represented by means of a line that connected their centers. Although useful as a starting point, this implementation was not sophisticated enough for the project, because the same two classes in an OMT diagram may be connected by multiple links. If these are all represented by lines drawn from center to center, every link between these classes will overlap. There is no example provided with ET++ that shows functionality that explicitly addresses this concern.

About half of the teams implemented the link functionality as in the er example. Of the teams that implemented a more sophisticated version that allowed links to be uniquely represented (i.e. multiple links between two classes do not overlap), there were a number of solutions: randomly displacing links by a small offset, recording the point where the user drew the link across the boundary of the class, and breaking the line down into a series of line segments which may then be individually positioned.

Almost all (4/5) of the teams who used the EB technique implemented the less sophisticated version of the functionality found in the er example. By comparison, less than half (4/10) of the teams who used a modified version of EB turned in the less sophisticated implementation (Table 5). A chi-square test of independence was undertaken to test whether the level of sophistication was dependent on the type of technique used, although we recognize that the small number of data points involved can lead to some inaccuracies in the results [31]. The test resulted in a p-value of 0.143 (χ² = 2.143), which is statistically significant at the selected α-level. An r-value of 0.38 confirms that this shows a moderate correlation between level of sophistication and type of technique [21]. From this example we hypothesize that:

Footnote 1: We base our use of the chi-square test, rather than the adjusted chi-square test, on [12], which argues that the adjusted test “tends to be overly conservative.”

HYPOTHESIS 4: A detailed Example-Based procedure can cause developers to not go beyond the functionality that is to be found in the example set.

Result of Implementation | Strictly adopted | Ad hoc adopted
Sophisticated | 1 | 6
Simple | 4 | 4

Table 5: Number of teams achieving sophisticated versus simple implementation of links, whether using strictly or ad hoc adopted techniques.

2. The functionality was completely contained in (perhaps multiple) examples. Key functionalities 2 (dialog boxes) and 3 (deletion) provide evidence that the EB technique performed about the same as the variant techniques in this case.

For dialog boxes, examples existed which showed how to create dialog boxes containing graphical devices (e.g. text fields, radio buttons) and how to use them to display and store information. The difficulty was that this functionality was spread piecemeal over multiple examples and students had a hard time finding and integrating all of the functionality they needed. About half of the class (7/15) managed to get the dialog box functionality working correctly and interfaced with the rest of the system (Table 6). All techniques were distributed roughly equally between the teams who did and did not get this functionality working correctly. The chi-square test here yielded a p-value of 0.714 (χ² = 0.134), for which the related r-value is 0.10. This suggests that response levels are effectively equal between the two categories.

For deletion, there was at least one example that clearly contained functionality to delete classes and links from a diagram. All teams were able to implement this functionality. Getting the functionality to support the ability to undo or redo a deletion was apparently more challenging, however, although the examples covered this as well. Partly, this may have been due to students simply forgetting to implement this part of the functionality since it was not explicitly mentioned in the requirements (although adequate testing should have revealed the need to interface correctly with this standard ET++ functionality!).

Teams basically implemented deletion in one of three ways:

   1. The ability to undo or redo was integrated fully with deletion.

   2. The ability to undo or redo was not implemented for deletion, but deletion was implemented in such a way that a later call to undo or redo would not cause a core dump (this minimally satisfied the requirements for the project, in letter if not in spirit!).

   3. The ability to undo or redo was not implemented at all for deletion; deletion could be used but a later use of undo or redo could cause a core dump if the program tried to manipulate an object that had been deleted.

Again, the different techniques for learning frameworks seem pretty equally distributed among these three categories (Table 7). The chi-square test for this functionality yielded a p-value of 1 (χ² = 0.000), with an associated r-value of 0, which is not significant and indicates that the observed response rates are exactly equal regardless of the type of technique used.

From these two examples we hypothesize:

HYPOTHESIS 5: When the functionality sought is contained in the example set, Example-Based techniques will perform about the same, regardless of the level of detail provided.

Result of Implementation | Strictly adopted | Ad hoc adopted
Fully correct | 2 | 5
Incomplete | 3 | 5

Table 6: Number of teams who achieved fully correct versus incomplete implementations of dialog boxes, whether using strictly or ad hoc adopted techniques.

                            Technique Followed
Result of Implementation    strictly adapt.    ad hoc adapt.
Impl. 1                     1                  2
Impl. 2                     2                  4
Impl. 3                     2                  4

Table 7: Number of teams achieving each of the three levels of implementation for deletion, whether using strictly or ad hoc adopted techniques.

3. The examples provide a more sophisticated implementation than is required. Key functionality 4 (views) fits here. The requirements for the OMT editor stated that the program must provide multiple views of the currently opened document. Examples existed which satisfied the project’s requirements about views (that they update automatically, be independently scrollable, and allow resizing). However, there were also examples that gave an even more sophisticated implementation that allowed views to be dynamically added and deleted.

All but three teams chose the more sophisticated implementation. Two of the three teams that turned in less functionality used the EB technique (Table 8). The chi-square test resulted in a p-value of 0.171 (χ2 = 1.875), which is significant at the selected α-level. An r-value of 0.35 shows a moderate correlation between the variables and allows us to hypothesize:

HYPOTHESIS 6: When the example set contains functionality beyond what is required for the system, a sufficiently detailed Example-Based procedure can help focus developers on just what is necessary.

                            Technique Followed
Result of Implementation    strictly adapt.    ad hoc adapt.
Sophisticated               3                  9
Simple                      2                  1

Table 8: Number of teams achieving sophisticated versus simple implementation of views, whether using strictly or ad hoc adopted techniques.
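The statistics quoted for deletion and for views can be cross-checked with a short calculation. The sketch below takes the counts as 1/2, 2/4, and 2/4 teams (strictly/ad hoc) for the three deletion levels in Table 7, and 3/9 (sophisticated) and 2/1 (simple) for Table 8. It assumes an uncorrected Pearson chi-square and assumes that the r-value is the square root of χ2 divided by the number of teams; neither choice is stated in the text, but both reproduce the reported figures. The function names are illustrative.

```python
import math


def chi_square(table):
    """Uncorrected Pearson chi-square for a table of counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, n


def p_value(stat, df):
    """Upper-tail chi-square probability; closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only covers df = 1 or df = 2")


# Columns: strictly adopted, ad hoc adopted.
table7 = [[1, 2], [2, 4], [2, 4]]   # deletion: Impl. 1, Impl. 2, Impl. 3
table8 = [[3, 9], [2, 1]]           # views: sophisticated, simple

stat7, df7, n7 = chi_square(table7)
p7 = p_value(stat7, df7)
r7 = math.sqrt(stat7 / n7)

stat8, df8, n8 = chi_square(table8)
p8 = p_value(stat8, df8)
r8 = math.sqrt(stat8 / n8)

print(f"Table 7: chi2={stat7:.3f}, p={p7:.3f}, r={r7:.2f}")
print(f"Table 8: chi2={stat8:.3f}, p={p8:.3f}, r={r8:.2f}")
```

For Table 7 every observed count equals its expected count, which is why the statistic is exactly zero. With only 15 teams, several expected counts fall below 5, so the chi-square approximation is rough here and the results are best read as indicative.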

The practical consequences of this analysis may be that the level of detail that is appropriate in an Example-Based procedure is strongly dependent on the breadth of the example set provided. We hypothesize that the more detailed a procedure is, the more it focuses the developer on using only functionality provided by the examples. This may be because developers become too reliant on the examples and do not understand the system at a sufficient level of detail to implement effectively from scratch when necessary. Alternatively, it may be that integrating functionality written from scratch becomes more and more difficult when more and more of the system is taken from examples that someone else has written.

Of course, there are other benefits to providing additional detail in such procedures. Among the most important may be packaging experience to guide new developers. For example, in our study we noticed that half of the teams who used a variant of the EB procedure wasted time and effort during the course of the implementation phase by having to re-implement some functionality that they had implemented previously in a short-sighted way. Since only 1 of the 5 teams who used our EB procedure exactly reported the same difficulty, we feel that a definite benefit of the procedure was that it helped guide our subjects to focus more on features which were important to the implementation as a whole, rather than just the portion they happened to be working on at a particular time.

D. Hypotheses About Beginning Implementation

Because we did not constrain the students as to how they went about implementing the functionality, we could also study whether the way in which the implementation was begun had an effect on the rest of the implementation process. Some teams decided to start from scratch, beginning with no functionality on top of the framework and slowly growing the code as they determined how to implement specific functionalities. Other teams chose to start with an example which seemed to contain some of the functionality they would need in the finished system. They then modified the functionality that was there and added whatever else was required to produce the finished system. We wanted to see if one approach tended to get more functionality working more reliably (as measured by the team’s implementation score).

All teams who started from an example chose the same one, a simple entity-relation diagram editor (known as “er”). It was similar to the OMT editor to be developed, but much simpler: as specified for the OMT editor, the ER diagram editor allowed simple shapes to be added to an editable document, and allowed these shapes to be selected, moved, and associated with one another. However, more sophisticated functionality, such as entry of attributes via dialog boxes and the maintenance of separate regions within a particular shape, was lacking.

[Figure: box plot of implementation score (vertical axis, 40 to 110) against starting point (horizontal axis: example, scratch).]

Figure 3: Team implementation scores for teams starting from scratch versus teams starting by modifying the “er” example. 90%, 75%, 50%, 25%, and 10% quantiles are shown. The team with implementation score of 44 has been removed from analysis as an outlier.

We used a t-test to determine whether teams starting from the “er” example tended to perform significantly better than teams starting from scratch (Figure 3). One point, representing a team which experienced severe organizational difficulties that were primarily responsible for a very low implementation score, was removed from this analysis as an extreme outlier (according to the definition given in [31]).

The test yielded a p-value of 0.15 (t = 1.538), which is significant at the 0.20-level and provides some evidence that teams who started by modifying an example tended to be more effective than those starting from scratch. This indicates that, overall, the benefits of relying on an existing example as a starting point (which may include being able to exploit an existing file structure and to model new classes on similar ones which already exist in the example) outweigh the negatives (the extra work involved in identifying relevant functionality and removing irrelevant code).
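The reported p-value can be cross-checked from the t statistic alone. The sketch below assumes a pooled-variance two-sample t-test with 12 degrees of freedom (14 teams after removing the outlier, minus 2) and a two-sided alternative; neither detail is stated in the text, so both are assumptions. It uses the standard closed-form series for the Student t distribution that holds when the degrees of freedom are even, and the function name is illustrative.

```python
import math


def two_sided_p_even_df(t, df):
    """Two-sided p-value of Student's t for even df, via the finite series
    P(|T| <= t) = sin(theta) * (1 + 1/2 cos^2 + (1*3)/(2*4) cos^4 + ...),
    where theta = atan(t / sqrt(df)) and the last power is cos^(df - 2)."""
    if df < 2 or df % 2:
        raise ValueError("this closed form requires an even df >= 2")
    theta = math.atan(abs(t) / math.sqrt(df))
    cos_sq = math.cos(theta) ** 2
    term = 1.0
    series = 1.0
    for k in range(1, df // 2):
        term *= (2 * k - 1) / (2 * k) * cos_sq
        series += term
    central_mass = math.sin(theta) * series
    return 1.0 - central_mass


t_reported = 1.538   # value quoted in the text
df_assumed = 12      # assumption: 14 teams, two groups, pooled variance
p_two_sided = two_sided_p_even_df(t_reported, df_assumed)
print(f"two-sided p = {p_two_sided:.3f}")
```

Under these assumptions the result agrees with the reported 0.15, which suggests that the test was two-sided; a one-sided test of "example is better than scratch" would give half that value.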

HYPOTHESIS 7: For implementing a set of requirements in a framework-based environment, if a suitable example application can be found, then adapting that application is a more effective strategy than starting from scratch.

It was especially noteworthy that every one of the teams who started from an example used the “er” editor as a starting point. There was, in fact, a second example application which could conceivably have served as a starting point also. This second important example application was a small drawing editor that seemed to provide most of the functionality needed for the OMT editor, as well as much extraneous functionality. It is very interesting that no teams chose the drawing editor as a starting point. Of the teams who commented on this decision during the final interviews, most seemed to agree with the reasoning that the drawing editor just had too much “extra functionality and complicated code that we do not need”, and that the large amount of extraneous functionality would be too confusing to enable the code responsible for the functionalities of interest to be easily separated out. If we had been able to compare the performance of teams who had started from each of the examples, we might have been able to draw useful conclusions about what attributes of an example make it a better or worse starting point for implementation. This is a promising direction for future work.

E. Hypotheses About the Importance of the Object Model

From the qualitative analysis of interviews and self-assessments, there were some general indications about the importance of the object model:

• 6 of our development teams mentioned, without being prompted by us, the importance of their object models as guides to implementation. 3 of these teams reported that they had been able to stay fairly close to their original object model of the system during the course of implementation. All of these teams ranked in the top half of the class with regard to implementation score. The remaining 3 teams reported problems; they had strayed from their original model during implementation. It seems that this inability to follow the model had some negative effects, as all 3 were ranked in the bottom half of the class. (Since 5 of these 6 teams had all received the same grade on the original model, it seems unlikely that the variation in performance could have been caused by factors outside the implementation phase, such as the quality of the model itself.)

• 2 of the 3 teams who were not able to follow their object model also reported difficulties due to the inconsistent nature of the examples.

Throughout this study we have observed the central importance of the object models in development. Although we assume that object models are important and necessary parts of any software development effort, their interaction with framework development seems even more noteworthy. When using frameworks, there is the important question of whether to modify the object model of the system, so as to exploit a piece of functionality offered by the framework that might not exactly fit the original plan, or to keep the object model “as is”, even if it makes implementing the application on top of the framework harder.

The general indications from the interviews and self-assessments lead us to:

HYPOTHESIS 8: Overuse of framework functionality can lead to negative effects (e.g., rework) which might overcome the positive ones (e.g., short cycle time).

It seems that one cause of trouble may have been that these teams did not understand the examples well enough to adapt the functionality illustrated by the examples to the system being developed. Teams who implemented incorrect functionality may have been simply too willing to make modifications to their planned system to accommodate the examples more easily.

Schneider and Repenning [39] present an interesting study of framework use that comes to similar conclusions. What we describe as straying from the original object model, they identify as projects for which “the application design process had been driven by features of the framework,” a condition which is described as being present in some development efforts which “ended up with really messy designs, or were cancelled.” As we noticed through observation of our teams, they identify the likelihood that a team will have to reimplement some functionality as one of the major negative consequences arising from this situation: “Premature design decisions made during the feature-driven phase can corrupt application system architecture or require abandonment of much work.” They further point out that the developer’s temptation is to deal with the easier problem of adapting functionality provided by the framework first, leaving more difficult functionality (which is not guaranteed to fit nicely into the framework-based development) for later and making breakdowns all the more devastating when they occur.

Some of the successes of our EB technique may also have come from its treatment of the object model. We noticed in our analysis of problems that a much higher percentage of teams using modified versions of EB reported wasted effort due to incorrect implementations than teams using EB. We attribute this to the EB technique’s focus on the object model as a guide for implementation, which may not have been so evident in ad hoc variants.

These results create an interesting tension with our experiences with example-based techniques, which seem to imply that reuse of framework functionality is always a positive thing. Taken together, our conclusions indicate that, within proper bounds, exploiting functionality from the framework and example set is the most helpful direction; outside those bounds, it can be very bad indeed. Schneider [39] draws a similar conclusion: although overuse of framework functionality can lead to negative effects, as described above, exploitation of the low-level framework features is a sensible trend that presumably pays off, when used within bounds. Discerning just where these bounds may lie is a crucial question that awaits further study; for now, we conclude only that development difficulties are liable to appear unless the object model of the system is well supported by the framework and its example set (i.e., the framework permits an implementation of the system that does not require major deviations from the object model).

IX. ANSWERS TO RESEARCH QUESTIONS

In this section, we relate our observations to our specific research questions for the study.

A. Can strategies for learning frameworks be identified?

Hypotheses 1 and 2 address this question.

The example-based technique was identified as an effective strategy in our environment. While we cannot say in all cases that an example-based learning approach would be superior to one based on the class hierarchy and model of interaction, the indication of this study was that for novice users, the examples were a more effective way to learn.

Within the category of example-based techniques, we further differentiate “strictly adopted” from “ad hoc adopted.” Although these techniques share many common features, our analysis of their use in practice discovered certain characteristic differences. This leads us to classify them as separate strategies.

Although the hierarchy-based approach cannot be deemed effective in our environment, as it was abandoned and not used to implement any of the semester projects, we cannot assume that a hierarchy-based approach is always inferior. The most important environmental condition appeared to be the subjects’ familiarity with the particular framework being used. Several of our subjects had recognized benefits of the HB technique but were unable to apply it due to their lack of familiarity. They expressed this in comments during the interviews such as: “The HB procedure was more similar to what I normally do, but…” or “I found the examples limiting in some ways and thought the HB procedure would address this problem, but…” It is possible that, if our subjects had had more experience with the framework, the HB procedure would have proven better suited to their needs. Indications are that the hierarchy-based procedure required more experience with the framework to be used effectively.

B. What are the characteristics of these learning strategies?

Since the subjects of our study only had significant experience with the EB technique, we can report only on the characteristics of example-based strategies. (Our observations on this subject were recorded in hypotheses 3 through 8.) We identified two main types of example-based strategies: strictly adopted and ad hoc adopted, each with its own strengths and weaknesses. The relative effectiveness of each seems to be most strongly determined by how closely the object model of the system to be developed corresponds with the existing applications.

Hypothesis 5 expresses our observation that when the functionality called for by the object model is well-contained in the set of existing applications, just about any example-based technique should be helpful. However, as illustrated by hypothesis 4, a strictly adopted technique can’t take the developer far beyond what is provided by the existing applications themselves. In a situation in which the set of applications is sparse and does not contain the necessary functionality, an ad hoc technique may be more appropriate. As hypothesis 6 indicates, if the set of applications is particularly large, then a strict adaptation technique may be most helpful. Despite its weaknesses, such a technique in procedural form was shown to guide the developer toward implementing the object model “as is” and away from “gold-plating,” or spending time providing extra features that seem nice but are not necessary.

From the experiences of our beginning learners, we also have evidence about other characteristics that are required by example-based techniques in order to be successful. Hypothesis 3 indicates that future studies need to be undertaken to determine if we can add better guidance for helping developers find functionality in existing example applications. Hypotheses 7 and 8, respectively, may indicate that example-based techniques should guide developers to begin their implementation from an existing application, if a suitable one can be found, and to stay close to the original object model once implementation has begun. (It is possible that, as developers get more experience with the framework, they may be able to synchronize the design of the system more closely to the framework infrastructure from the beginning, thereby minimizing the problem of mismatch between the system design and the framework. Our study did not address this possibility.)

X. THREATS TO VALIDITY

There are three tests which can be considered to evaluate the quality of any empirical study: construct validity, internal validity, and external validity [25].

A. Construct Validity

Construct validity aims to assure that the study correctly measures the concepts of interest. The main problem is that variables never measure only the construct of interest but also other extraneous sources of variation. One tactic to enhance construct validity is triangulation: the use of multiple sources aimed at corroborating the same fact or phenomenon [51, pp. 90-94].

In our study we applied data triangulation, by including multiple measures for the same aspect of interest and different collection methods for the same measure (Table 1).

B. Internal Validity

Internal validity aims to establish correct causal relationships between variables as distinguished from spurious relationships. Although a case study cannot have the same internal validity as a controlled experiment, because the investigator has little control over events, there are analysis techniques that can strengthen the internal validity, even for exploratory studies like this.

We made inferences using the qualitative analytic technique described in [13]. It consists of performing a within-case analysis, to gain familiarity with each case and find emerging patterns, followed by cross-case analysis, to look for similarities and differences between cases. Although this is not a common method of analysis in computer science, it is a recommended approach for social sciences and other fields that require the analysis of human behavior [13, 29]. It is well suited for our purposes here because our variables of interest are heavily influenced by human behavior and because we are not attempting to prove hypotheses about framework usage, but rather to begin formulating hypotheses about this process, about which we currently know little.

A specific threat to internal validity might be that we have constructed one of the techniques incorrectly, which would explain the differences in performance. As regards the example-based technique we used, EB, we attempted to minimize the odds of making this mistake by basing the specific technique on the example set which is provided by the framework's authors themselves. For the hierarchy-focused technique, HB, we based our model of the framework on the most important facets of the framework definition. We feel that the use of the inheritance hierarchy means that our model is complete because all functionality provided by the framework must be encapsulated in one of the classes of the class hierarchy, with the inheritance relations showing how the functionalities provided are related to one another. Thus, while there may be other models more adept at supporting framework learning, we feel confident that our model is adequate to the job.

We also undertook quantitative analyses to test the effects of potentially confounding factors (such as differences in effort spent, understanding achieved, or previous experience) which could be rival explanations to our findings. As pointed out in section VII.B, the relatively low number of data points in this study means that these results should be seen, not as providing a definitive list of the important factors, but as identifying potential factors likely to have some impact on framework learning. It remains for further study to verify this impact, a topic we return to in the next section.

C. External Validity

External validity aims to assure that the findings of the study can be generalized beyond the immediate study. Although generalization can be achieved only through replication in multiple studies, we believe that our findings are relevant for a larger population than this single study.

A first threat to external validity might be that professional developers would have behaved differently than the students that we used as the subjects of the study. Certainly, this is always a danger in studies of this sort. However, in this case we feel that this difference would not be a strongly significant one. Although the level of industrial experience in the class was not high, all students had experience both programming in the language used (C++) and in object-oriented techniques. More importantly, even professional developers would almost certainly have been novices in terms of the use of the ET++ framework, so that the most immediately applicable experience would not have significantly varied in either case.

A second threat to external validity might be that our findings are tied to the framework we used, ET++. Although we cannot completely rule out this threat to validity, ET++ has been thoroughly tested and improved from the initial version and it incorporates seventeen of the design patterns in [17]. From this point of view, we consider ET++ representative of the class of sophisticated white-box frameworks that pose learning problems, which can be major inhibitors against their use.

XI. CONCLUSIONS AND FUTURE RESEARCH

This paper has formulated a set of well-motivated hypotheses concerning white-box frameworks based upon direct observation of white-box framework use in development. As the research community builds up confidence as to the validity (or falsity) of such hypotheses, a more objective basis can be constructed for activities such as tool support and training for this class of framework. For example, hypotheses 1 and 2 are the result of evidence showing that learning by example (as opposed to gaining familiarity with the framework itself first) is useful for helping beginning learners produce working systems quickly. While the general benefits of example-based approaches may or may not seem intuitive, this study has provided some evidence, based on empiricism rather than intuition, that such approaches can be of use on non-trivial development projects using frameworks. Since an organization that uses frameworks will tend to build up a set of related applications, all based on the same underlying structure, hypotheses 1 and 2 indicate that reuse of components or even whole applications from this set can be especially beneficial in a framework environment. Frameworks that come packaged with a set of example applications can provide an analogous benefit.

Hypotheses 4 through 6 (summarized in hypothesis 3) qualify the benefits of example-based learning techniques by pointing out some of their limitations. A direct implication of these hypotheses is that the emphasis placed on adapting examples should vary according to the relation between the example set and the application to be developed. Hypothesis 7 implies that reusing as much previous functionality as possible (even including whole previous applications) is a useful strategy in this kind of application environment. Finally, hypothesis 8 provides more evidence for the benefits of a general software engineering principle (viz. that a system’s design should be closely followed during implementation) by showing that it also applies to development using framework technology.

As we have emphasized, the results of this study are hypotheses for further study, not definitive conclusions. This is partly due to a number of factors (which we presented and discussed in section X) that affect the strength of the conclusions that can be drawn from our observations. However, we hope that this paper has made a useful contribution by identifying an initial set of factors that should be controlled, monitored, or tested in later studies.

One lesson learned from this study is the importance of process conformance; subjects are rarely malicious but are almost always loath to use processes they are uncomfortable with to achieve a result they know can be reached in another manner. This discomfort can be the result of a steep learning curve for the new technique, or the result of the unsuitability of the technique for the current environment (as with the HB technique in this study). The experimenter must make the important decision of whether:

• To constrain subjects to use a specific process, in order to draw conclusions about that process, or

• To allow subjects the freedom to use processes they feel are useful, while monitoring the processes undertaken. (This strategy was the one used in our study.)

In the latter case, qualitative data collection and analysis are especially important. Without qualitative analysis in this study, we could only have concluded that the HB technique was unsuitable to the environment; through our analysis of interviews and problem reports, we have some confidence that the contributing factor was low subject experience with the framework.

Another result of the qualitative analysis was the identification of another factor influencing framework learning, namely, the level of specificity at which an example-based technique is followed. This factor, which contributes to our hypotheses 4, 5, 6, and 8, seems nearly impossible to assess without the use of qualitative methods. Because our analysis detected distinct differences between subjects using different levels of specificity (in our terminology, between “strictly adopted” and “ad hoc adopted”) this factor should be assessed in future studies as well.

As shown in section VII.B, a third important factor identified by this study was the previous experience of the subjects. The correlation between subject experience and effectiveness at implementation shows that it is necessary to assess the effect of experience on the use of new techniques. In any case, subject experience can be expected to have an effect on the outcome of software engineering practices and it is necessary to check that it does not overshadow other factors, such as the use of the technique under study.

It is our hope that future studies in this area can use our results as a beginning for verifying, bounding and extending these hypotheses. Certainly, studies in the framework domain have the potential for verifying the practical implications of these hypotheses, such as whether training professional developers to concentrate too much on reuse will result in systems of lower quality (as it did for our student subjects, reflected in hypotheses 4 and 8). Studies can bound the hypotheses by discovering contexts in which the techniques are more or less effective, e.g., EB is effective for beginning framework users but HB is more effective for advanced framework users. But it is to be hoped that studies will be undertaken to test the generalizability of our results to other areas as well. For example, the indications from this study have already proven useful for our study of software reading techniques. In this study, we saw that the level of specificity at which a technique is followed can have a distinct effect on the outcome; we have since run experiments on other reading techniques in which the level of specificity was explicitly varied [41]. These experiments have helped us conclude that specificity is an important variable in software reading research, although its specific effects may vary from case to case.

Other potential areas of generalization exist as well. One example (among many) is that hypotheses 1 and 2, concerning the effectiveness of example-based learning, need not be limited to framework-based environments. Evaluations of example-based learning in related areas would be of great interest. For example, could these results be taken to imply that an effective way to learn to program would be to first study existing programs (i.e. that the first step in learning to write good programs is learning how to read them)?

In order to facilitate replication or review of this study we have set up a web site containing as much as possible of our experimental materials and data. This web site may be found at [42].

XII. ACKNOWLEDGEMENTS

Our thanks to Gianluigi Caldiera for his invaluable assistance designing and running this experiment as a part of the course CMSC 435 (Fall 1996) at University of Maryland, College Park. Our thanks also go to the students of CMSC 435 for their cooperation and hard work.

This work has been supported by UMIACS and NSF grant CCR9706151.

XIII. REFERENCES

[1] MacApp Programmer’s Guide. Apple Computer, 1986.

[2] V. Basili, R. Reiter, Jr., “A Controlled Experiment Quantitatively Comparing Software Development Approaches”, IEEE Trans. Software Engineering, vol. SE-7, pp. 299-320, May 1981.

[3] V. Basili, S. Green, O. Laitenberger, F. Lanubile, F. Shull, S. Soerumgaard, and M. Zelkowitz, “The Empirical Investigation of Perspective-Based Reading”, Empirical Software Engineering: An International Journal, vol. 1, no. 2, pp. 133-1, 1996.

[4] V. Basili, G. Caldiera, F. Lanubile, and F. Shull, “Studies on Reading Techniques”, in Proc. of the Twenty-First Annual Software Engineering Workshop, Dec. 1996, pp. 59-65.

[5] D. Baumer, G. Gryczan, R. Knoll, C. Lilienthal, D. Riehle, and H. Zullighoven, “Framework Development for Large Systems”, Communications of the ACM, vol. 40, no. 10, pp. 52-59, October 1997.

[6] K. Beck, R. Johnson, “Patterns Generate Architectures”, in Proc. ECOOP’94, 1994.

[7] D. S. Brandt, “Constructivism: Teaching for Understanding of the Internet”, CACM, vol. 40, pp. 112-117, Oct. 1997.

[8] D. Brugali, G. Menga, and A. Aarsten, “The Framework Life Span”, Communications of the ACM, vol. 40, no. 10, pp. 65-68, October 1997.

[9] J. Carroll, The Nurnberg Funnel: Designing Minimalist Instruction for Practical Computer Skill. Cambridge, MA: MIT Press, 1990.

Accepted for publication by IEEE Transactions on Software Engineering.[10]M. Chi, M. Bassok, M. Lewis, P. Reimann, and R.

Glaser, “Self-Explanations: How Students Study andUse Examples in Learning to Solve Problems”,University of Pittsburgh, Technical ReportUPITT/LRDC/ONR/KBC-9, Nov. 1987.

[11]W. Codenie, K. DeHondt, P. Steyaert, A. Vercammen,

“From Custom Applications to Domain-SpecificFrameworks”, CACM, vol. 40, pp. 71-77, Oct. 1997.[12]Conover, Practical Nonparametric Statistics, 2nd

Edition. NY: John Wiley & Sons, 1980.

[13]K. Eisenhardt, “Building Theories from Case Study

Research”, Academy of Management Review, vol. 14,no. 4, pp. 532-550, 19.

[14]M. A. Fayad, and D. C. Schmidt, \"Object-Oriented

Application Frameworks\Communications of theACM, vol. 40, no. 10, pp.32-38, October 1997.[15]C. Frei and H. Schaudt, ET++ Tutorial: Eine

Einführung in das Application Framework. SoftwareSchule Schweiz, Bern, 1991.

[16]G. Froehlich, H. Hoover, L. Liu, and P. Sorenson,

“Hooking into Object-Oriented ApplicationFrameworks”, in Proc. of the 19th International

Conference on Software Engineering, May 1997, pp.491-501.

[17]E. Gamma, R. Helm, R. Johnson, and J. Vlissides,

Design Patterns: Elements of Reusable Object-Oriented Software. Reading, MA: Addison-Wesley,1995.

[18]D. Gangopadhyay, S. Mitra, “Understanding

Frameworks by Exploration of Exemplars”, in Proc.of 7th International Workshop on CASE, July 1995,pp. 90-99.

[19]H. G. Glaser, A. L. Strauss. The Discovery of

Grounded Theory: Strategies for QualitativeResearch. Hawthorne, NY: Aldine PublishingCompany, 1967.

[20]A. Goldberg, Smalltalk-80: The Interactive

Programming Environment. Reading, MA: Addison-Wesley, 1984.

[21] L. Hatcher and E. J. Stepanski, A Step-by-Step Approach to Using the SAS® System for Univariate and Multivariate Statistics. Cary, NC: SAS Institute Inc., 1994.

[22] R. E. Johnson and B. Foote, “Designing Reusable Classes”, Journal of Object-Oriented Programming, vol. 1, no. 5, pp. 22-35, June/July 1988.

[23] R. Johnson, “Documenting Frameworks with Patterns”, in Proc. OOPSLA ’92, October 1992, pp. 63-76.

[24] R. E. Johnson, “Frameworks = Patterns + Components”, Communications of the ACM, vol. 40, no. 10, pp. 39-42, October 1997.

[25] C. M. Judd, E. R. Smith, and L. H. Kidder, Research Methods in Social Relations, sixth edition. Fort Worth: Holt, Rinehart and Winston, Inc., 1991.

[26] P. Koltun, L. Deimel Jr., and J. Perry, “Progress Report on the Study of Program Reading”, ACM SIGCSE Bulletin, vol. 15, pp. 168-176, Feb. 1983.

[27] O. Laitenberger and C. Atkinson, “Generalizing Perspective-based Inspection to Handle Object-Oriented Development Artifacts”, in Proc. ICSE’99. (To appear.)

[28] T. Lewis, L. Rosenstein, W. Pree, A. Weinand, E. Gamma, P. Calder, G. Andert, J. Vlissides, and K. Schmucker, Object Oriented Application Frameworks. Greenwich: Manning Publications Co., 1995.

[29] M. Miles, “Qualitative Data as an Attractive Nuisance: The Problem of Analysis”, Administrative Science Quarterly, vol. 24, no. 4, pp. 590-601, 1979.

[30] H. Mili, H. Sahraoui, and I. Benyahia, “Representing and Querying Reusable Object Frameworks”, in Proc. of the Symposium on Software Reusability, May 1997.

[31] R. Ott, An Introduction to Statistical Methods and Data Analysis. Belmont, CA: Duxbury Press, 1993.

[32] W. Pree, Design Patterns for Object-Oriented Software Development. Reading, MA: ACM Press & Addison-Wesley Publishing Co., 1995.

[33] D. Roberts and R. Johnson, “Patterns for Evolving Frameworks”, Pattern Languages of Program Design, R. C. Martin et al. (eds.), Software Patterns Series, Addison Wesley, 1997.

[34] M. B. Rosson, J. M. Carroll, and R. K. E. Bellamy, “SmallTalk Scaffolding: A Case Study of Minimalist Instruction”, in Proc. CHI ’90, April 1990, pp. 423-429.

[35] S. Rugaber, S. B. Ornburn, and R. J. LeBlanc, Jr., “Recognizing design decisions in programs”, IEEE Software, vol. 7, pp. 46-54, Jan. 1990.

[36] J. Rumbaugh, M. Blaha, W. Premerlani, F. Eddy, and W. Lorensen, Object-Oriented Modeling and Design. Englewood Cliffs, NJ: Prentice Hall, 1991.

[37] H. A. Schmid, “Creating Applications from Components: a Manufacturing Framework Design”, IEEE Software, vol. 13, no. 6, pp. 67-75, November 1996.

[38] D. C. Schmidt, “Applying Patterns and Frameworks to Develop Object-Oriented Communication Software”, in P. Salus, ed., Handbook of Programming Languages, vol. 1, MacMillan Computer Publishing, 1997.

[39] K. Schneider and A. Repenning, “Deceived by Ease of Use: Using Paradigmatic Applications to Build Visual Design Environments”, in Proc. of the Symposium on Designing Interactive Systems, Aug. 1995.

[40] C. B. Seaman and V. R. Basili, “An Empirical Study of Communication in Code Inspection”, in Proc. ICSE’97, May 1997, pp. 96-106.

[41] F. Shull, Developing Techniques for Using Software Documents: A Series of Empirical Studies. Ph.D. thesis, University of Maryland, College Park, December 1998.

[42] F. Shull, “Reading Techniques for Object-Oriented Frameworks”, http://www.cs.umd.edu/projects/SoftEng/ESEG/manual/sbr_package/manual.html.

[43] J. Singer and T. C. Lethbridge, “Methods for Studying Maintenance Activities”, in Proc. of 1st International Workshop on Empirical Studies of Software Maintenance, Nov. 1996, pp. 105-110.

[44] S. Sørumgård, Verification of Process Conformance in Empirical Studies of Software Development. Ph.D. thesis, Norwegian University of Science and Technology, February 1997.

[45] Software Engineering Laboratory, Recommended Approach to Software Development, Revision 3. National Aeronautics and Space Administration, Software Engineering Laboratory, SEL-81-305, 1992.

[46] Taligent, Inc., The Power of Frameworks. New York: Addison-Wesley, 1995.

[47] J. Vlissides, Unidraw Tutorial I: A Simple Drawing Editor. Stanford University, 1991.

[48] A. von Mayrhauser and A. M. Vans, “Industrial Experience with an Integrated Code Comprehension Model”, IEEE Software Engineering Journal, pp. 171-182, Sept. 1995.

[49] A. Porter, L. Votta Jr., and V. Basili, “Comparing Detection Methods for Software Requirements Inspections: A Replicated Experiment”, IEEE Transactions on Software Engineering, vol. 21, no. 6, pp. 563-575, June 1995.

[50] A. Weinand, E. Gamma, and R. Marty, “Design and Implementation of ET++, a Seamless Object-Oriented Application Framework”, Structured Programming, vol. 10, no. 2, 19.

[51] R. Yin, Case Study Research: Design and Methods. London: Sage Publications, 1994.

[52] Z. Zhang, V. Basili, and B. Shneiderman, “Perspective-based Usability Inspection: An Empirical Validation of Efficacy”, International Journal of Empirical Software Engineering, special issue on Human-Computer Interaction. (To appear.)
