This testing seeks to exercise the transitions among the states of objects based upon the identified inputs. For this testing, a finite-state machine (FSM) or state-transition diagram is built to represent the possible states of the object and how its state transitions occur. In addition, state-based testing generates test cases that check whether each method is able to change the state of the object as expected.
If any method of the class does not change the object state as expected, the method is said to contain errors. To perform state-based testing, a number of steps are followed, which are listed below.
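As a hedged illustration of state-based testing, the sketch below uses a hypothetical TrafficLight class, its states, and a small test harness (all our own example, not drawn from the text): a test derived from the FSM drives the object through every modelled transition and checks the resulting state after each call.

```java
// Hypothetical class under test: a simple traffic light with three states.
enum LightState { RED, GREEN, YELLOW }

class TrafficLight {
    private LightState state = LightState.RED;

    LightState getState() { return state; }

    // Each call is expected to move the object to the next state
    // in the cycle RED -> GREEN -> YELLOW -> RED.
    void advance() {
        switch (state) {
            case RED:    state = LightState.GREEN;  break;
            case GREEN:  state = LightState.YELLOW; break;
            case YELLOW: state = LightState.RED;    break;
        }
    }
}

// A state-based test checks every transition of the FSM,
// not just the final output of a single call.
class TrafficLightStateTest {
    static void assertState(LightState expected, TrafficLight light) {
        if (light.getState() != expected) {
            throw new AssertionError("expected " + expected + " but was " + light.getState());
        }
    }

    public static void main(String[] args) {
        TrafficLight light = new TrafficLight();
        assertState(LightState.RED, light);      // initial state
        light.advance();
        assertState(LightState.GREEN, light);    // RED -> GREEN
        light.advance();
        assertState(LightState.YELLOW, light);   // GREEN -> YELLOW
        light.advance();
        assertState(LightState.RED, light);      // YELLOW -> RED closes the cycle
        System.out.println("All transitions behave as modelled in the FSM.");
    }
}
```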
Fault-based testing is used to determine or uncover a set of plausible faults. In other words, the focus of the tester in this testing is to detect the presence of possible faults. Fault-based testing starts by examining the analysis and design models of the OO software, as these models may provide an idea of the problems in the implementation of the software. With knowledge of the system under test and experience in the application domain, the tester designs test cases in which each test case targets particular faults.
The effectiveness of this testing depends highly on the tester's experience in the application domain and with the system under test: if the tester fails to perceive the real faults in the system as plausible, testing may leave many faults undetected.
However, examining the analysis and design models may enable the tester to detect a large number of errors with less effort. As testing only proves the existence and not the absence of errors, this testing approach is considered an effective method and hence is often used when the security or safety of a system is to be tested.
Integration testing applied to OO software targets the possible faults in both operation calls and the various types of messages, such as a message sent to invoke an object. These faults may be unexpected outputs, incorrect messages or operations, and incorrect invocations. The faults can be recognized by determining the behaviour of all operations performed to invoke the methods of a class.
This is the concept of abstraction, which is incredibly useful in all areas of engineering and is applied to great effect in object-oriented programming.
Example: In OOP, we might have a class defined to represent the human body. These details are completely hidden in the implementation of the walk and eatFood body functions and are therefore abstracted away from the end user.

If a class inherits from another class, it automatically obtains much of the same functionality and properties of that class and can be extended to contain separate code and data.
Consider two classes: one being the superclass, or parent, and the other being the subclass, or child. The child class will inherit the properties of the parent class, possibly modifying or extending its behaviour. Example: In the animal world, an insect could be represented by an Insect superclass.
All insects share similar properties, such as having six legs and an exoskeleton. Subclasses might be defined for grasshoppers and ants. Because they inherit or are derived from the Insect class, they automatically share all insect properties.
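A minimal sketch of this relationship, using hypothetical Insect, Grasshopper, and Ant classes (the method names and behaviours are illustrative assumptions, not taken from the text); the loop at the end also previews the polymorphism discussed next, since both subclasses are handled uniformly through their Insect type.

```java
// Superclass capturing what all insects share.
class Insect {
    int legs() { return 6; }                 // all insects have six legs
    boolean hasExoskeleton() { return true; }
    String move() { return "crawls"; }
}

// Subclasses inherit the shared properties and extend or override behaviour.
class Grasshopper extends Insect {
    @Override
    String move() { return "jumps"; }        // specialised behaviour
}

class Ant extends Insect {
    String carryFood() { return "carries food to the nest"; }  // added behaviour
}

class InsectDemo {
    public static void main(String[] args) {
        // Polymorphism: both objects are handled uniformly as Insects.
        Insect[] insects = { new Grasshopper(), new Ant() };
        for (Insect i : insects) {
            System.out.println(i.getClass().getSimpleName()
                    + ": " + i.legs() + " legs, " + i.move());
        }
    }
}
```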
In OOP, polymorphism allows classes in a hierarchy to be treated uniformly.

This research question entails measuring the coverage provided by the set of the most popular metrics for each language and providing the optimal set of tools that can compute those metrics.
The search strategy involves the selection of the search resources and the identification of the search terms. The formulation of the search strings is crucial for the definition of the search strategy of the SLR and follows the guidelines defined by Kitchenham et al. In this phase, all the researchers collaboratively selected several pilot studies. The selected pilot studies, presented in Table 1, are related to the target research domain. These studies were selected to verify the goodness of the research queries: the researchers should revise the queries if the pilot studies are not present in the results after the refining phase.
The starting keywords identified were software, maintainability, and metrics. Our results include articles published within the considered time frame. This first search pointed out that adding code as a synonym of the keyword software added a large number of papers to the results. Also, the following keywords were excluded from the search to reduce the number of unfitting papers in the results: (i) defect and fault, to avoid considering manuscripts more related to verification and validation, error-proneness, and software reliability prediction than to code maintainability; (ii) co-change, to avoid considering manuscripts more related to code evolution; (iii) policy-driven and design, to avoid considering manuscripts more related to the definition and usage of metrics used to design software, rather than to the evaluation of existing code.
Table 2 reports the search queries before and after excluding the keywords listed above, for each of the chosen digital libraries. The final phase of the study selection uses the studies obtained by applying the final search queries detailed below.
After defining the review protocol in the planning phase, the conducting phase involves its actual application, the selection of papers by application of the search strategy, and the extraction of relevant data from the selected primary studies.
This phase consisted of gathering all the studies by applying the search strings formulated and discussed in Section 2.
To this end, we leveraged the Publish or Perish (PoP) tool [ 17 ]. To aid the replicability of the study, we report that we performed the last search iterations at the end of October. After the application of the queries and the removal of duplicate papers across the four considered digital libraries, the unique papers gathered are reported in Table 3.
The result of this phase is a list of candidate papers that must be subjected to the exclusion and inclusion criteria. This step allows a final verdict on their selection as primary studies for our SLR. We exported the mined papers to a CSV file with basic information about each extracted manuscript. The authors of this SLR carried out the paper selection process independently. To analyze the papers, we used a 5-point Likert scale instead of simply dividing them between fitting and unfitting.
We performed the following point assignment: (i) one point to papers that matched exclusion criteria and did not match any inclusion criteria; (ii) two points to papers that matched some exclusion criteria and some inclusion criteria; (iii) three points to papers that did not match any criteria, neither exclusion nor inclusion; (iv) four points to papers that matched some, but not all, inclusion criteria; (v) five points to papers that matched all inclusion criteria.
We analyzed the studies in two steps: first, we read the title and abstract to check the immediate compliance of the paper with the inclusion and exclusion criteria. For papers that received 3 points after reading the title and abstract, the full text was read, with particular attention to the possible usage or definition of metrics throughout the body of the article.
At the end of the second read, none of the uncertain studies were evaluated as fitting with our research needs, and hence, no other primary study was added to our final pool.
During this phase, we also applied snowballing, which refers to using the reference lists of the included papers to identify additional papers [ 18 ]. For this specific SLR, snowballing did not lead to any additional paper being taken into consideration. In this phase, we read each identified primary study again to mine relevant data for addressing the formulated RQs. We created a spreadsheet form to be compiled for each of the considered papers, containing the data of interest subdivided by the RQ they concurred to answer.
The data extraction phase, again, was performed by all the authors of the paper independently. For each paper, we collected some basic context information: (i) year of publication; (ii) number of full-text views and number of citations; (iii) authors and their locations.
To answer RQ1, we extracted the following data for each paper: (i) the list of metrics and metric suites utilized in each paper; (ii) the programming languages and the family of programming languages considered. We also took into consideration the opinion of the authors on each of the metrics studied in their papers. This allowed us to evaluate whether a metric is considered useful in most papers. This analysis also allowed us to take into consideration the popularity of the metrics, by counting the difference between positive and negative citations by the authors.
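A minimal sketch of how such a popularity tally could be computed is shown below. The recording of each citation as positive, negative, or neutral, and the convention that a neutral mention still contributes to the score, are our own assumptions for illustration; they are consistent with, but not explicitly stated as, the paper's procedure.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative tallying of metric popularity: every citation counts towards the
// total mentions, while the score adds 1 for positive or neutral citations and
// subtracts 1 for negative ones.
class MetricPopularity {
    enum Opinion { POSITIVE, NEGATIVE, NEUTRAL }

    private final Map<String, Integer> mentions = new HashMap<>();
    private final Map<String, Integer> score = new HashMap<>();

    void record(String metric, Opinion opinion) {
        mentions.merge(metric, 1, Integer::sum);
        int delta = (opinion == Opinion.NEGATIVE) ? -1 : 1;
        score.merge(metric, delta, Integer::sum);
    }

    public static void main(String[] args) {
        MetricPopularity p = new MetricPopularity();
        p.record("LOC", Opinion.NEUTRAL);
        p.record("LOC", Opinion.NEGATIVE);
        p.record("CC", Opinion.POSITIVE);
        p.record("CC", Opinion.NEUTRAL);
        System.out.println("mentions: " + p.mentions + ", score: " + p.score);
    }
}
```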
To answer RQ2, for each paper that mentioned tools we gathered the following information: (i) the list of tools described, used, or cited by each paper; (ii) when possible, the list of metrics that can be calculated by each tool; (iii) the list of programming languages on which the tool can operate; (iv) the type of the tool, i.e., whether it is open source or closed source.
Finally, to answer RQ2, we identified the optimal set of tools by finding the tool or tools covering the metrics that proved to be the most popular among the selected primary studies.

In this phase, we elaborated the data extracted and synthesized previously to obtain a response to each of our research questions. Having all the data we needed, in the shape of one form per analyzed paper, we proceeded with the data synthesis. We gathered all the metric suites and metrics we found into tables, keeping track of the papers mentioning them.
We computed aggregate measures on the popularity value assigned to each metric.

This section describes the results obtained to answer the research questions described in Section 2. To improve the readability of this manuscript, the appendices report the complete tables with the extracted data.
At the end of this phase, we collected a final set of 43 primary studies for the subsequent phases of our SLR. Figure 1 reports the distribution of the selected papers over the considered time frame, and Figure 2 shows the distribution of the authors of the related studies over the world.
We report the selected papers in Table 4. The statistics seem to suggest that the interest in software maintainability metrics has grown over time and has further increased in the latest years (see the bar plot in Figure 1). The papers selected as primary studies for our SLR cite many different metrics.
We report all the metrics in Table 5 in the appendix. The table reports (i) the metric suite (empty if the metric is not part of any specific suite); (ii) the metric name (acronym, if existing, and a full explanation, if available); and (iii) the list of papers that mention the metric. The last two columns report, respectively, (iv) the total number of papers mentioning the metric and (v) the resulting score. By examining these last two columns, it can be seen that they are most of the time identical.
This is because the majority of the papers we found simply utilize the metrics without commenting on them, either positively or negatively. It is immediately evident that some suites and metrics are taken into consideration much more often than others. The boxplots in Figure 3 show, in red, the distributions of the total number of mentions and of the score for all the considered metrics.
It is evident from the boxplots that the difference between the two distributions is rather limited, confirming that the vast majority of opinions are neutral or positive when the metrics are referenced in a research paper. In general, however, it is worth underlining that a low score does not necessarily mean that the metric is of lesser quality, but rather that it is less known in the related literature.
Another interesting aspect to point out is that we did not find any particular metric that received many negative scores. Since our analysis was aimed at finding the most popular metrics, in order to extract a set of them to be adapted to different languages, we were interested in finding metrics mentioned by multiple papers.
In Table 6, we report the metrics that were used by at least two papers among the selected primary studies. This operation allowed us to reduce the noise caused by metrics that were mentioned only once (possibly in the papers where they were originally defined).
After applying this filter, only 43 metrics remained. The boxplots in Figure 3 show, in green, the distributions of the total number of mentions and of the measured score for this set of metrics.
On these distributions, the rounded median value for the total number of mentions is 3, and the rounded median score is also 3. Since our final aim in answering RQ1 was to identify the most popular metrics, we kept only the metrics whose total number of mentions and score were higher than or equal to these medians. With this additional filtering, we obtained a set of 13 metrics and 2 metric suites, which are reported in Table 7. Two suites were included in their completeness (namely, the Chidamber and Kemerer suite and the Halstead suite) because all of their metrics had a number of total mentions and a score higher than or equal to the median. For them, the table reports the lowest number of mentions and score among those of the contained metrics.
Instead, for the Li and Henry suite, only the MPC (message passing coupling) metric obtained a number of mentions and a score above the median and was hence included in our set of selected most popular metrics. A brief description of each of the selected most popular metrics is reported in the following.
Cyclomatic complexity (CC) was developed by McCabe in [ 56 ] and is a metric meant to calculate the complexity of code by examining the control flow graph of the program. The assumption is that the complexity of the code is correlated with the number of execution paths of its flow graph. Such a relationship is independent of the programming language and code paradigm used [ 57 ].
Each node in the flow graph corresponds to a block of code in which the flow is sequential; the arcs correspond to the branches that can be taken by the control flow during the execution of the program.
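As a brief illustration, the sketch below shows a hypothetical Java method of ours (not taken from any of the selected studies) annotated with its decision points; it assumes the common counting rule in which each loop condition, if statement, and short-circuit boolean operator adds one decision point, although tools differ slightly on these conventions.

```java
// Hypothetical method used only to illustrate how cyclomatic complexity is counted.
class DiscountCalculator {
    // Decision points: the loop condition (1), the 'if' (2), and the '&&' (3).
    // Cyclomatic complexity = decision points + 1 = 4
    // (equivalently V(G) = E - N + 2P on the corresponding flow graph).
    double totalPrice(double[] prices, boolean member, double threshold) {
        double total = 0.0;
        for (double p : prices) {          // decision point: loop condition
            total += p;
        }
        if (member && total > threshold) { // decision points: 'if' and '&&'
            total *= 0.9;                  // 10% discount
        }
        return total;
    }
}
```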
Another metric among those selected measures how many data types the analyzed class utilizes, apart from itself. The metric takes into consideration the known type inheritance, the interfaces implemented by the class, the types of the parameters of its methods, the types of the declared attributes, and the types of the used exceptions.

A further selected metric is a change metric, which measures how many lines of code are changed between two versions of the same class. This metric is hence not defined on a single version of the software project, but is tailored to analyze the evolution of the source code. The assumption behind the usage of this metric is that if a class is continuously modified, it can be a sign that it is hardly maintainable. Generally, three types of changes can be made to a line of code: additions, deletions, or modifications.
In the literature, there is typically accordance about how to count modifications: a modification typically counts twice as much as an addition or a deletion, since it is considered as a deletion followed by an addition. For example, a revision with 3 added, 2 deleted, and 4 modified lines would count 3 + 2 + (2 × 4) = 13 changed lines. Most of the time, comments and blank lines are not considered in the computation of the changed LOCs during the evolution of the software code.
The Chidamber and Kemerer (CK) suite is one of the best-known sets of metrics and was introduced in [ 58 ]. This suite was designed with the object-oriented approach in mind. DIT, depth of inheritance tree, is defined as the length of the maximal path from a leaf node to the root of the inheritance tree of the classes of the analyzed software.
Inheritance helps to reuse code and therefore increases maintainability. Its side effect is that classes deeper within the hierarchy tend to have increasingly complex behaviour, making them difficult to maintain. Having one, two, or even three levels of inheritance can help maintainability, but increasing the depth further is deemed detrimental.
NOC, number of children, is the number of immediate subclasses of the analyzed class. As the NOC increases, the maintainability of the code increases. CBO, coupling between objects, is the number of classes with which the analyzed class is coupled. Two classes are considered coupled when methods declared in one class use methods or instance variables defined by the other class. Thus, this metric gives an idea of how interlaced the classes are with each other and hence of how much influence the maintenance of a single class has on the other ones.
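A minimal sketch of DIT, NOC, and CBO on a hypothetical hierarchy follows; the class names and values are purely illustrative, and conventions differ slightly across tools (for example, on whether java.lang.Object counts towards DIT).

```java
// Hypothetical hierarchy used only to illustrate DIT, NOC, and CBO;
// the DIT values below ignore the implicit java.lang.Object root.
class Vehicle { }                        // DIT = 0, NOC = 2 (Car and Truck)
class Car extends Vehicle {              // DIT = 1, NOC = 1 (SportsCar)
    int doors() { return 4; }
}
class Truck extends Vehicle {            // DIT = 1, NOC = 0
    double loadCapacity() { return 3.5; }
}
class SportsCar extends Car { }          // DIT = 2, NOC = 0

class Garage {
    // Garage calls methods defined by Car and Truck, so CBO(Garage) = 2.
    int summary(Car car, Truck truck) {
        return car.doors() + (int) truck.loadCapacity();
    }
}
```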
RFC, response for class, is defined as the size of the set of methods that can potentially be executed in response to a message received by an object of that class. Also in this case, the greater the returned value, the greater the complexity of the class. LCOM, lack of cohesion in methods, is defined as the difference between the number of method pairs having no attributes in common and the number of method pairs having common attributes. Several other versions of this metric have been provided in the literature. High values of LCOM indicate that the methods of the class are relatively disparate in nature.
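A small worked example of this counting, on a hypothetical class of ours (not taken from the selected studies):

```java
// Hypothetical class used only to illustrate how LCOM is counted.
class Account {
    private double balance;
    private String owner;

    void deposit(double amount) { balance += amount; }   // uses {balance}
    void withdraw(double amount) { balance -= amount; }  // uses {balance}
    String label() { return owner; }                     // uses {owner}
}
// Method pairs: (deposit, withdraw) share 'balance'      -> 1 pair with common attributes (Q = 1)
// (deposit, label) and (withdraw, label) share nothing   -> 2 pairs without common attributes (P = 2)
// LCOM = P - Q = 2 - 1 = 1
// (in the original CK definition, LCOM is set to 0 whenever Q exceeds P).
```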
CLOC, comment lines of code, is the metric that gives the number of lines of code containing textual comments. Empty comment lines are not counted. In contrast to the LOC metric, the higher the value CLOC returns, the more comments there are in the analyzed code; the code should therefore be easier to understand and to maintain.
The Halstead suite was introduced in [ 59 ] and is a set of statically computed metrics that tries to assess the effort required to maintain the analyzed code, the quality of the program, and the number of errors in the implementation. To compute the metrics of the Halstead suite, the following indicators must be computed from the source code: n1, the number of distinct operators; n2, the number of distinct operands; N1, the total number of occurrences of operators; and N2, the total number of occurrences of operands.
Operands are the objects that are manipulated, and operators are all the symbols that represent specific actions.
Operators and operands are the two types of components that form all the expressions. By definition, the Vocabulary constitutes a lower bound for the Length, since each distinct operator and operand has at least one occurrence.
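For reference, the derived measures of the Halstead suite are computed from these four indicators; the formulas below follow Halstead's original definitions and are not quoted verbatim from the selected studies:

    Vocabulary: n = n1 + n2
    Length:     N = N1 + N2
    Volume:     V = N * log2(n)

Since each distinct operator and operand occurs at least once, N >= n, which is exactly the lower-bound relation between Length and Vocabulary mentioned above.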
JLOC is a metric specific to Java code, defined as the number of lines of code to which JavaDoc comments are associated. It is similar to other metrics discussed in the literature that measure the number of comments in the source code. In general, a high value of the JLOC metric is deemed positive, since it suggests better documentation of the code and hence better changeability and maintainability.
LOC, lines of code, is a widely used metric, often adopted for its simplicity. It gives an immediate measure of the size of the source code. Among the most popular metrics, LOC was the only one to have two negative mentions in other works in the literature. These comments are related to the fact that there appears to be no single, universally adopted definition of how this metric is computed [ 14 ]. Some works count all the lines in a file, while others (the majority) remove blank lines from the computation; moreover, if there is more than one instruction on a single line, or a single instruction is split across several lines, there is ambiguity about whether to count the number of lines (physical lines) or the actual number of instructions involved (logical lines).
Thus, it is of the utmost importance that the tools that calculate the metrics specify exactly how they compute the values they return, or that they are open source, hence allowing an analysis of the tool source code to derive such information. Although LOC seems to be poorly related to the maintenance effort [ 14 ] and there is more than one way to calculate it, this metric is used within the maintainability index, and it seems to be correlated with many other metric measures [ 60 ].
The assumption is that the bigger the LOC metric, the less maintainable the analyzed code is.

LCOM2 equals the percentage of methods that do not access a specific attribute, averaged over all attributes in the class. If the number of methods or attributes is zero, LCOM2 is undefined and displayed as zero. A low value of LCOM2 indicates high cohesion and a well-designed class.
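As a small worked example of this definition (the numbers are ours and purely illustrative): consider a class with four methods and two attributes, where the first attribute is accessed by three of the four methods and the second attribute by only one of them. Then:

    attribute 1: 1 of 4 methods does not access it  -> 25%
    attribute 2: 3 of 4 methods do not access it    -> 75%
    LCOM2 = (25% + 75%) / 2 = 50%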
MI, the maintainability index, is a composite metric proposed as a way to assess the maintainability of a software system. There are different definitions of this metric, which was first introduced by Oman and Hagemeister in [ 61 ]. There are two different formulae to calculate the MI: one utilizes only three metrics, namely the Halstead volume (HV), the cyclomatic complexity (CC), and the number of lines of code (LOC), while the other also takes into consideration the number of comments. Despite the metric being quite popular, Ostberg and Wagner express doubts about its effectiveness, claiming that it does not give information about the maintainability of the code, since it is based on metrics considered not suited for that task, and that the result of the metric itself is not intuitive [ 14 ].
In contrast, Sarwar et al. take a more favourable position on the metric. A returned value above 85 means that the code is easily maintainable; a value between 85 and 65 indicates that the code is not so easy to maintain; below 65, the code is difficult to maintain.
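For reference, one widely cited formulation of the three-metric variant, reported here from the general literature rather than from the selected studies, is:

    MI = 171 - 5.2 * ln(HV) - 0.23 * CC - 16.2 * ln(LOC)

where HV is the average Halstead volume, CC the average cyclomatic complexity, and LOC the average number of lines of code per module; the comment-aware variant adds a term of the form 50 * sin(sqrt(2.4 * perCM)), with perCM the percentage of comment lines. The logarithmic terms grow with size and complexity, which explains why large projects can push the value towards zero or below, as noted next.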
The returned value can reach zero, and even become negative, especially for large projects.

MPC, message passing coupling, is a metric from the Li and Henry suite (the only metric of that suite to have a score above the rounded median), and it is defined as the number of send statements defined in a class [ 62 ], i.e., the number of method invocations the class performs on objects of other classes. Another selected metric returns the number of methods in a class that are declared as public. A further one counts the number of statements in a method.
Different variations of the metric have been proposed in the literature, which differ on whether statements in named inner classes, interfaces, and anonymous inner classes are also counted.
For instance, Kaur et al. adopt one of these variations. WMC, weighted methods per class, is a measure of complexity that sums the complexity of all the methods implemented in the analyzed code. A simplified variant of this metric, called WMC-unweighted, simply counts each method as if it had unitary complexity; this variant corresponds to the NOM (number of methods) metric.

In Table 8, we report all the tools that were identified while reading the papers. The columns report, respectively: the name of the tool, as it is presented in the studies; the studies using it; and a web source from which the tool can be downloaded.
In the topmost section of the table, we report the papers for which we could not identify the tool used. In the second and third sections of the table, we divide the tools according to their release nature, i.e., whether they are closed source or open source. The table reports information about a total of 38 tools: 19 were not found, 6 were closed source, and 13 were open source. The majority of the tools we found are mentioned by only one study; three are cited by two studies, and only one, CKJM, is quoted by five papers.
It is immediately evident that the open-source tools are more than twice as numerous as the closed-source ones.