PSP Final Report Guidelines

This page provides guidelines for producing the PSP Final Report with the help of ProcessPAIR.

The requirements for the PSP Final Report are described in the official SEI’s “Assignment Kit for Final Report” of the “PSP for Engineers” course.

ProcessPAIR provides important advantages:

  • it lets you compare your performance data with the performance data of a large number of PSP students (more than 3,000);
  • it significantly automates the identification of your performance problems and root causes;
  • it reduces the effort needed to produce the PSP Final Report;
  • it prevents errors in the analysis;
  • it lets you focus your analysis on the identification of deeper causes (not present in the data collected) and remedial actions.

The next sections are structured according to the analysis questions described in the assignment kit. For each question, it is explained how ProcessPAIR helps answer the question. It is assumed that you have recorded your performance data with Process Dashboard or the SEI’s PSP Student Workbook. Before proceeding to the next sections, please open ProcessPAIR and upload your performance data file, selecting the “PSP Performance Model for PSP Final Report” model (see the initial instructions in the Introduction to ProcessPAIR tutorial). It is important that you read the guidelines sequentially, because several features and techniques are explained on their first occurrence.

Analysis of size estimating accuracy

What are the average, maximum, and minimum actual sizes of your programs in LOC to date?

In the “Indicator View”, “Base Measures” group, select the “Actual Size” item to obtain the required information, as illustrated below (student 13). You can copy the chart to your report, using the “Copy Image” button at the top of the screen or the usual print screen facilities, and add any pertinent comments. In this example, the average, maximum, and minimum actual sizes in LOC to date are 140.5, 252 and 56, respectively. Please notice that these are added and modified lines of code (as indicated in the bottom left).

Screenshot 2016-05-01 10.57.29

Excluding assignment 1, what percentage over or under the actual size was the estimated size for each program (for example, if estimated/actual is in %, 85% is 15% under, 120% is 20% over)?  What are your average, maximum, and minimum values for these?

In the “Indicator View”, “Performance Indicators” group, select the “Size Estimation Accuracy” item to obtain the required information, as illustrated below.

Screenshot 2016-05-01 10.57.03

Please notice that ProcessPAIR measures the size estimation accuracy as the ratio of actual to estimated size, with 1 being the optimal value. To obtain the percentage size estimation error, subtract 1 and multiply by 100%. In the example above, the 2.06 accuracy corresponds to a 106% (under) estimation error, and the 0.52 accuracy corresponds to a -48% (over) estimation error. The (simple) average is 1.108 (10.8%).

ProcessPAIR computes the weighted average of a ratio by dividing the total value of the numerator for all projects by the total value of the denominator for all projects. In this case, it is the total actual size of programs 2 to 6 divided by the total estimated size of programs 2 to 6.  The weighted average is 0.851 (-14.9%). This value is of less interest in this case.
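
If you want to reproduce these figures outside the tool, the calculation is simple, as in the following minimal Python sketch (the estimated and actual values are hypothetical, not taken from the student’s data, and the variable names are ours):

# Minimal sketch (not ProcessPAIR code) of size estimation accuracy, percentage
# estimation error, and the simple vs. weighted averages. Values are hypothetical.
estimated = [80, 120, 150, 100, 200]   # estimated added & modified LOC, programs 2-6
actual    = [165, 100, 130, 52, 170]   # actual added & modified LOC, programs 2-6

accuracy = [a / e for a, e in zip(actual, estimated)]   # ratio of actual to estimated; 1 is optimal
errors_pct = [(r - 1) * 100 for r in accuracy]          # > 0 means under-estimation, < 0 over-estimation

simple_avg = sum(accuracy) / len(accuracy)              # average of the per-program ratios
weighted_avg = sum(actual) / sum(estimated)             # total actual size / total estimated size

for program, (r, e) in enumerate(zip(accuracy, errors_pct), start=2):
    print(f"Program {program}: accuracy {r:.2f} -> {e:+.0f}% estimation error")
print(f"Simple average: {simple_avg:.3f} ({(simple_avg - 1) * 100:+.1f}%)")
print(f"Weighted average: {weighted_avg:.3f} ({(weighted_avg - 1) * 100:+.1f}%)")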

ProcessPAIR also shows in the chart the recommended performance ranges, calibrated based on the performance data of many PSP students. The green range (good performance) corresponds to the 1/3 best values in the calibration data, and goes from 0.85 to 1.22 (-15% to 22%). The red range (poor performance) corresponds to the 1/3 worst values in the calibration data. The yellow range is in between. By comparing your data with the control limits shown in the chart, you can make informed comments about your own performance. In the example above, the student wasn’t able to reach a good size estimation performance (only two points in the green range), as compared to the population used for calibration.

You can also check in the “Report View” if ProcessPAIR identified any clear or potential (moderate) problem regarding your size estimation performance, as illustrated below.

Screenshot 2016-05-01 10.54.58

By pressing the “Show Statistical Distribution Chart” button, you can get another view of how your performance compares with the population used for calibration, as illustrated below (bottom left).

Screenshot 2016-05-01 10.55.51

By selecting both the “Actual Size” and “Estimated Size” in the list of indicators, you can also visually compare the base measures from which the Size Estimation Accuracy is computed, as illustrated below.

Screenshot 2016-05-01 10.56.37

How often was my actual program size within my 70% statistical prediction interval (when you used methods A or B)?

The relevant chart is obtained by selecting the Actual Size, Size UPI (upper prediction interval) and Size LPI (lower prediction interval), as shown below (student 15). In this example, a prediction interval was computed only for program 6. The actual size (round symbol) is within the 70% statistical prediction interval (between the LPI and UPI).

Screenshot 2016-05-01 10.58.42
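
If you want to check the interval yourself, the sketch below illustrates the calculation, assuming the standard linear-regression (PROBE) 70% prediction interval; the historical data and the current estimate are hypothetical, and in practice the interval is computed for you by the estimating tool:

# Sketch of a 70% prediction interval around a size projection, assuming the usual
# PROBE linear-regression formula. All numbers below are hypothetical.
import math
from statistics import mean

hist_est    = [70, 110, 150, 210, 260]   # historical estimated (proxy) sizes
hist_actual = [85, 130, 160, 240, 300]   # historical actual sizes
xk = 180                                 # estimated (proxy) size of the current program

n = len(hist_est)
xbar, ybar = mean(hist_est), mean(hist_actual)
sxx = sum((x - xbar) ** 2 for x in hist_est)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(hist_est, hist_actual)) / sxx
b0 = ybar - b1 * xbar
projection = b0 + b1 * xk

# standard deviation of the regression residuals (n - 2 degrees of freedom)
sigma = math.sqrt(sum((y - (b0 + b1 * x)) ** 2
                      for x, y in zip(hist_est, hist_actual)) / (n - 2))

t70 = 1.190   # t(0.85, n-2 = 3); e.g. scipy.stats.t.ppf(0.85, 3)
half_range = t70 * sigma * math.sqrt(1 + 1 / n + (xk - xbar) ** 2 / sxx)
lpi, upi = projection - half_range, projection + half_range

actual_size = 205
inside = lpi <= actual_size <= upi
print(f"Projection {projection:.0f} LOC, 70% interval [{lpi:.0f}, {upi:.0f}], "
      f"actual {actual_size} -> {'inside' if inside else 'outside'}")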

Do I have a tendency to add/miss entire objects?

The relevant charts are shown below. In ProcessPAIR, missing objects are objects that were actually added or modified, but were not planned. Extraneous objects are objects that were planned, but were not actually added or modified. In this example, missed or extraneous objects occurred only in the first project, so it is fair to conclude that there is no tendency to add/miss entire objects.

Screenshot 2016-05-01 10.59.48

Screenshot 2016-05-01 11.00.55

Currently, ProcessPAIR has no benchmarks for these two indicators, because of the lack of historical data, but obviously the closer the values are to 0%, the better. A range of 0 to 20% can usually be considered acceptable.
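
One plausible way to compute these percentages yourself (by object count; the tool may weight objects by size, and the object names below are purely illustrative) is:

# Hypothetical sketch: percentage of missing and extraneous objects in one program.
planned = {"Matrix", "LinearSystem", "Main"}              # objects identified in planning
developed = {"Matrix", "LinearSystem", "Main", "Parser"}  # objects actually added or modified

missing = developed - planned      # actually added/modified, but not planned
extraneous = planned - developed   # planned, but not actually added/modified

pct_missing = 100 * len(missing) / len(developed)
pct_extraneous = 100 * len(extraneous) / len(planned)
print(f"Missing: {pct_missing:.0f}%  Extraneous: {pct_extraneous:.0f}%")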

Do I have a tendency to misjudge the relative size of objects?

The relevant chart is shown below. This chart refers to objects that were correctly identified in the planning phase (i.e., missing or extraneous objects are not included here).

Screenshot 2016-05-01 11.01.23

Currently, ProcessPAIR has no benchmarks for this indicator, because of the lack of historical data, but it is reasonable to use ranges similar to the Size Estimation Accuracy, i.e., a green range between 0.85 and 1.22. In this example, although a tendency for improvement can be observed, there is an oscillatory behavior, with room for improvement towards the green range.

By viewing the Objects Estimation Accuracy and the Size Estimation Accuracy together, we can check that they are closely related, as expected, as shown below. After program 2, the small deviations occur because size estimates sometimes have a statistical adjustment.

Screenshot 2016-05-01 11.01.52

Based on my historical size-estimating accuracy data, what is a realistic size-estimating goal for me?

ProcessPAIR lets you compare your performance to the performance achieved by other people, and hence helps establish realistic goals. If you already have a good performance (green range), then probably you just have to keep your current performance. Otherwise, moving to the next range may be a realistic goal (from red to yellow, or from yellow to green).

How can I change my process to meet that goal?

Before thinking of process changes, you should first identify the root causes of your current performance issues. To help identify root causes, ProcessPAIR organizes the performance indicators in a tree-like structure, where the child nodes represent factors that may affect the parent nodes, as illustrated below.

Screenshot 2016-05-01 11.12.17

So, by drilling down from top-level indicators to child indicators, you can identify the problematic factors (root causes) and subsequently devise improvement actions (process changes) to address them, as discussed below.

In the case of size estimation accuracy, the child indicators are related to the causes addressed in the previous questions: the tendency to add/miss entire objects, and the tendency to misjudge the relative size of objects. In the example shown, the relevant cause is the tendency to misjudge the relative size of objects. A tendency for improvement can be observed, with a very good accuracy in the last project (0.96, or a -4% error), but the oscillations are still high. Usually such oscillations are reduced with continued practice.

In case you found deeper or different causes for the size estimation problems, you should devise actions for addressing them.

Analysis of time estimating accuracy

What are the average, maximum, and minimum times of your assignments to date?

In the “Indicator View”, “Base Measures” group, select the “Actual Time” item to obtain the required information, as illustrated below (student 13). You can copy the chart to your report, using the “Copy Image” button or the usual print screen facilities, and add pertinent comments. In this example, the average, maximum, and minimum times in the assignments to date are 227.2, 322 and 78 minutes, respectively.

Screenshot 2016-05-01 11.28.16

What percentage over or under the actual time was the estimated time for each program (for example, if estimated/actual is in %, 85% is 15% under, 120% is 20% over)?  What are your average, maximum, and minimum values for these?

In the “Indicator View”, “Performance Indicators” group, select the “Time Estimation Accuracy” item to obtain the required information, as illustrated below.

Screenshot 2016-05-01 11.28.39

Similarly to the size estimating accuracy, please notice that ProcessPAIR measures the time estimation accuracy as the ratio of actual to estimated time, with 1 being the optimal value. To obtain the percentage time estimation error, subtract 1 and multiply by 100%. In the example above, the 3.46 accuracy corresponds to a 246% (under) estimation error, and the 0.66 accuracy corresponds to a -34% (over) estimation error. The (simple) average is 1.430 (43% under estimation error). The weighted average is 1.086 (8.6% under estimation error).

ProcessPAIR also shows in the chart the recommended performance ranges, calibrated based on the performance data of many PSP students. The green range (good performance) corresponds to the 1/3 best values in the calibration data, and goes from 0.87 to 1.20 (-13% to 20%). By comparing your data with the control limits shown in the chart, you can make informed comments about your own performance. In the example above, there was a significant improvement after the first project, but the student wasn’t yet able to produce good estimates (inside the green range).

You can also check in the “Report View” if ProcessPAIR identified any clear or potential problem regarding your time estimation performance, as illustrated below. In this example, ProcessPAIR indicates a clear performance problem, together with some causes (to be investigated later in this section).

Screenshot 2016-05-01 11.29.10

By selecting both the “Actual Time” and “Estimated Time” in the list of indicators, you can also visually compare the base measures from which the Time Estimation Accuracy is computed, as illustrated below.

Screenshot 2016-05-01 11.29.43

How often was my actual development time within my 70% statistical prediction interval (when you used methods A or B)?

The relevant chart is obtained by selecting the Actual Time, Time UPI and Time LPI, as shown below (student 14). In this example, a prediction interval was computed only for program 6. The actual time (round symbol) is well within the 70% statistical prediction interval (between the lower and upper prediction intervals).

Screenshot 2016-05-01 11.30.16

What are the average, maximum, and minimum values for productivity per program to date in LOC/hr.?

In the “Indicator View”, “Performance Indicators” group, select the “Productivity” item to obtain the required information, as illustrated below.

Screenshot 2016-05-01 11.32.06

Please notice that the weighted average is computed as the ratio between the total size of the programs developed to date and the total time spent in developing those programs.
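
The calculation itself is straightforward, as in the sketch below (the per-program values are hypothetical, chosen only to be consistent with the averages quoted earlier for this student):

# Sketch of per-program productivity and the weighted average (total size / total time).
# Per-program values are hypothetical.
sizes_loc = [56, 140, 252, 120, 98, 177]     # actual added & modified LOC per program
times_min = [78, 230, 322, 250, 210, 273]    # actual total time per program, in minutes

productivity = [loc / (t / 60) for loc, t in zip(sizes_loc, times_min)]   # LOC/hour
weighted_avg = sum(sizes_loc) / (sum(times_min) / 60)                     # total LOC / total hours

print([f"{p:.1f}" for p in productivity])
print(f"Weighted average productivity: {weighted_avg:.1f} LOC/hour")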

Similarly to other performance indicators, ProcessPAIR also shows in the chart the recommended performance ranges, calibrated based on the performance data of many PSP students. The green range (good performance) contains the 1/3 best values in the calibration data, and corresponds to a productivity greater than or equal to 35.7 LOC/hour. By comparing your data with the control limits shown in the chart, you can make informed comments about your own performance. In the example above, the productivity is almost always in the green range.

You can also check in the “Report View” if ProcessPAIR identified any clear or potential problem regarding your productivity. In this example, the Report View indicates a potential performance problem with Productivity, as well as the problematic phases, as illustrated below.

Screenshot 2016-05-01 11.45.55

By selecting both the “Actual Size” and “Actual Time” in the list of indicators, you can also visually compare the base measures from which the Productivity is computed, as illustrated below (with Scatter mode selected at the bottom). As expected, larger sizes are associated with higher development effort (linear correlation coefficient of 0.84).

Screenshot 2016-05-01 11.39.06

Is my productivity stable?  Why or why not?

You can check the productivity stability by inspecting the “Productivity” indicator, or by inspecting the “Productivity Stability” indicator computed by ProcessPAIR, as illustrated below.

Screenshot 2016-05-01 11.46.26

In ProcessPAIR, the “Productivity Stability” of a project measures how close the productivity in that project is to the historical productivity (of previous projects), and is given by the ratio of the productivity in that project to the historical productivity, with 1 being the optimal value.
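
A minimal sketch of this calculation follows, assuming that “historical productivity” means the combined productivity of all previous projects (the tool may use a slightly different definition, and the data is hypothetical):

# Sketch of the Productivity Stability indicator: each project's productivity divided
# by the historical productivity of the previous projects. Data is hypothetical.
sizes_loc = [56, 140, 252, 120, 98, 177]
times_min = [78, 230, 322, 250, 210, 273]

stability = []
for i in range(1, len(sizes_loc)):
    current = sizes_loc[i] / (times_min[i] / 60)                   # this project's LOC/hour
    historical = sum(sizes_loc[:i]) / (sum(times_min[:i]) / 60)    # previous projects combined
    stability.append(current / historical)                         # 1.0 means perfectly stable

print([f"{s:.2f}" for s in stability])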

Similarly to other performance indicators, ProcessPAIR also shows in the chart the recommended performance ranges, calibrated based on the performance data of many PSP students. In this case, the green range corresponds to a productivity stability between 0.80 and 1.19. By comparing your data with the control limits shown in the chart, you can make informed comments about your own performance. In the example above, the productivity stability is almost always inside the green range.

In PSP training there is usually a productivity decrease in the middle of the training, when some process changes are introduced, followed by a productivity recovery as the new processes are practiced. Since the process changes usually affect specific phases, ProcessPAIR also analyzes the Productivity Stability of each process phase.

The simplest way to identify the phases that are causing productivity instability problems is to look at the “Report View”, as illustrated below. In this example, there is a moderate problem with the overall productivity stability, caused by instability in the Design Review, Design, Unit Test and Plan phases (in decreasing order of importance). This is a typical situation in PSP training, because of the changes introduced in the design phase (introduction of design templates), design review phase (introduction of design verification techniques), and plan phase (introduction of size estimation, introduction of quality planning, etc.). The instability in the Unit Test phase usually occurs because of the decrease in defects entering the unit test phase.

Screenshot 2016-05-01 11.41.43

To manually identify the problematic phases causing productivity instability problems you can look at different charts in the “Indicator View”, e.g., by switching the X-Axis to “Phase” (first figure below), or by inspecting the child indicators of the Productivity Stability (second figure below, for the Unit Test phase).

Screenshot 2016-05-01 11.44.41

Screenshot 2016-05-01 15.22.08

How can I stabilize my productivity?

As explained before, ProcessPAIR helps identify the problematic process phases that are causing productivity instability problems. Based on that information, you are in a better position to devise relevant improvement actions. In many cases, instability is caused by process changes and consequently can be addressed by repeated practice with a stable (unchanged) process.

How much are my time estimates affected by the accuracy of my size estimates?

The simplest way to answer this question is to look at the Report View, as illustrated below (for student 13, with “Show only leaf causes” unchecked, to visualize intermediate causes).

Screenshot 2016-05-01 11.42.59

The causal analysis conducted by ProcessPAIR follows this rationale: problems with the Time Estimation Accuracy may be caused by problems with Size Estimation Accuracy or by problems with Productivity Estimation Accuracy (discrepancies between estimated and actual productivity). In the example above, a clear performance problem with Time Estimation Accuracy is identified, caused mainly (‘high possibility’) by problems with Size Estimation Accuracy, and secondarily (‘moderate possibility’) by problems with Productivity Estimation Accuracy. Hence, we conclude that, in this example, time estimation accuracy is highly affected by the accuracy of size estimates.

ProcessPAIR further drills down the causal analysis according to the following rationale: productivity estimation problems may occur because of productivity instability problems (making historical productivity unreliable for estimation) or because historical productivity is not used in the estimates. In turn, productivity stability problems may be caused by productivity instability in specific phases.

Based on my historical time-estimating accuracy data, what is a realistic time-estimating goal for me?

ProcessPAIR lets you compare your performance to the performance achieved by other people, and hence helps establish realistic goals. If you already have a good performance (green range), then probably you just have to keep your current performance. Otherwise, moving to the next range may be a realistic goal (from red to yellow, or from yellow to green).

How can I change my process to meet that goal?

The simplest way to answer this question is to first look at the causes suggested in the Report View, as illustrated below (for student 13, “Show only leaf causes” checked).

Screenshot 2016-05-01 15.22.43

In the example above, the first two causes are much more important than the other ones, so it is a good idea to focus on those causes. To address the first one, the historical productivity of previous projects should be used in time estimating for future projects (with the appropriate PROBE method). To address the second one, the relevant actions should have already been indicated in a previous section of the report.

In case you found deeper or different causes for the time estimation problems, you should devise actions for addressing them.

Defect and yield analysis

Which defect type accounts for the most time spent in compile?  In test?  In which phase was each type of defect injected most often?

In the “Indicator View”, “Base Measures” group, select the “Fixtime of Defects Removed in Compile” and “Fixtime of Defects Removed in Unit Test” items, and “Defect Type” for the X-Axis, to obtain the answers to the first two questions, as illustrated below.

Screenshot 2016-05-01 15.24.46

In this example, there are no defects found in the Compile phase. In the Unit Test phase, the defects of type Function take most of the time, followed by the defects of type Interface. Please notice that the chart shows the average fix time across all projects.

In this example, to obtain the phases where defects of type Function and Interface were injected most often (third question), select the chart below. Those types of defects are injected mostly in the Code phase, followed at a large distance by the Design and Unit Test phases.

Screenshot 2016-05-01 15.26.26
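
The aggregation behind these two charts can be reproduced from the defect log, as in the hypothetical sketch below (the defect entries are illustrative, not the student’s data):

# Hypothetical defect-log entries, aggregated the way the charts above summarize them:
# fix time per defect type in a given removal phase, and injection phases per defect type.
from collections import defaultdict

defects = [  # (type, phase injected, phase removed, fix time in minutes) - illustrative only
    ("Function", "Code", "Unit Test", 12),
    ("Interface", "Design", "Unit Test", 7),
    ("Function", "Code", "Code Review", 3),
    ("Assignment", "Code", "Unit Test", 2),
]

fixtime_in_test = defaultdict(int)   # total fix time in Unit Test, per defect type
injected_in = defaultdict(int)       # where Function/Interface defects were injected
for dtype, injected, removed, fix_minutes in defects:
    if removed == "Unit Test":
        fixtime_in_test[dtype] += fix_minutes
    if dtype in ("Function", "Interface"):
        injected_in[injected] += 1

print(dict(fixtime_in_test))
print(dict(injected_in))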

What type of defects do I inject during design and coding?

The following chart provides an answer in terms of the number of defects. In this example, the most important type is Function, followed by Interface and Assignment.

Screenshot 2016-05-01 15.27.02


What trends are apparent in defects per size unit (e.g., KLOC) found in reviews, compile, and test?

The following charts show the relevant information. In this example there is no Compile phase.

Screenshot 2016-05-01 15.27.34

Screenshot 2016-05-01 15.28.38

In this example, there is a decreasing trend in the defects per size unit found in unit test, with 0 defects in the last two projects. This may happen because defects are being found in earlier phases or because fewer defects are being injected. Usually, fewer defects found in unit test also means that fewer defects remain in the delivered program, so the delivered program is of higher quality.

Regarding defects per size unit found in code and design reviews, there is no clear trend.

What trends are apparent in total defects per size unit?

The relevant chart is shown below. In this example, there is a decreasing trend in the last two projects. In any case, the total defects per size unit are within the green region (less than 45 defects per KLOC) in all projects. So, the performance is good and improving.

Screenshot 2016-05-01 15.29.00
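
Defect density is simply the number of defects divided by the program size in KLOC, as in this hypothetical sketch:

# Sketch of total defects per size unit (defects/KLOC) per program. Counts are hypothetical.
total_defects = [9, 8, 6, 5, 2, 3]
sizes_loc     = [56, 140, 252, 120, 98, 177]

defects_per_kloc = [d / (loc / 1000) for d, loc in zip(total_defects, sizes_loc)]
print([f"{v:.0f}" for v in defects_per_kloc])   # compare each value against the calibrated ranges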

How do my defect removal rates (defects removed/hour) compare for design review, code review, compile, and test?

The relevant chart is shown below. In this example there is no Compile phase. The Code Review phase (with 3.85 defects found per hour on average) is slightly more efficient at finding defects than the Unit Test phase (with 3.69 defects found per hour on average). The Design Review phase is less efficient (with 1.38 defects found per hour).

Screenshot 2016-05-01 15.29.37
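
A defect removal rate is the total number of defects removed in a phase divided by the total hours spent in that phase, across all programs. The sketch below uses hypothetical per-phase totals, chosen only so that the resulting rates reproduce the averages quoted above:

# Sketch of defect removal rates: defects removed per hour in each phase.
# The per-phase totals are hypothetical reconstructions, not the student's raw data.
phase_totals = {                      # phase: (defects removed, minutes spent), all programs combined
    "Design Review": (4, 173.9),
    "Code Review":   (11, 171.4),
    "Unit Test":     (13, 211.4),
}
for phase, (defects, minutes) in phase_totals.items():
    print(f"{phase}: {defects / (minutes / 60):.2f} defects/hour")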

The following charts allow a better comparison with benchmarks. As indicated in the charts, a good performance for the defect removal rate is a value equal to or greater than 8.0 defects/hour in Code Review and a value equal to or greater than 4.9 defects/hour in Unit Test (which is less efficient than Code Review). In this example, compared to the benchmarks, the weighted average performance in Code Review (3.85 compared to 8.0) is worse than the weighted average performance in Unit Test (3.69 compared to 4.9). So there is much more room for improvement in the Code Review phase.

Screenshot 2016-05-01 15.29.59

Screenshot 2016-05-01 15.30.38

Screenshot 2016-05-01 15.30.57

What are my review rates (size reviewed/hour) for design review and code review?

The relevant charts are shown below. Please notice that in ProcessPAIR the Design Review Rate and Code Review Rate are also called Design Review Productivity and Code Review Productivity, respectively. In this example, the average Design Review Rate is 324 LOC/hour, which is within the green region.  The average Code Review Rate is 310 LOC/hour, which is also within the green region.  In both cases there is a trend towards getting close to the optimal value.

Screenshot 2016-05-01 15.31.28

Screenshot 2016-05-01 15.31.48

What are my defect-removal leverages for design review, code review, and compile versus unit test?

The relevant values can be read from the chart shown below. In this example there is no Compile phase. On average, the defect removal leverage is 0.374 for design review versus unit test and 1.042 for code review versus unit test. The interpretation was already provided in the answer to a previous question.

Screenshot 2016-05-01 15.32.18
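
The defect-removal leverage (DRL) is just the ratio between a phase’s defect removal rate and the unit test removal rate, as in the sketch below (it reuses the average rates quoted earlier; small differences from the chart come from rounding those rates):

# Sketch of defect-removal leverage: each phase's removal rate divided by the Unit Test rate.
removal_rate = {"Design Review": 1.38, "Code Review": 3.85, "Unit Test": 3.69}  # defects/hour

for phase in ("Design Review", "Code Review"):
    drl = removal_rate[phase] / removal_rate["Unit Test"]
    print(f"DRL({phase} vs Unit Test) = {drl:.2f}")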

Is there any relationship between yield and review rate (size reviewed/hour) for design and code reviews?

The relevant charts are shown below. Please notice that the review yield is undefined when there are 0 defects entering the review phase. In this example, that happens with program 5 (for code and design reviews) and program 4 (for design review only). Regarding the Code Review Yield versus the Code Review Rate, there is a negative correlation (-0.87), as expected (because slower reviews are usually associated with higher yields). However, since there are only 3 data points, the correlation is not statistically significant. As for the Design Review Yield versus the Design Review Rate, the trend is different from what was expected (a slower review rate is associated with a smaller yield), but with only two data points the correlation coefficient is meaningless (always +1 or -1).

Screenshot 2016-05-01 15.33.01

Screenshot 2016-05-01 15.33.22

Is there a relationship between yield and A/FR?

The relevant chart is shown below. In this example, there is a very good positive correlation (0.95) between the Process Yield and the Appraisal to Failure Ratio (A/FR). The correlation is also statistically significant (at the 5% significance level, computed with a web calculator). This correlation is as expected, because spending more time in ‘appraisal’ activities (design and code reviews) is usually associated with a higher process yield (percentage of defects found before compile and test).

Screenshot 2016-05-01 15.33.42

You can also see the same information in a scatter chart (Scatter check box selected), as illustrated below.

Screenshot 2016-05-01 15.34.02
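
If you prefer to check the correlation and its significance programmatically instead of with a web calculator, a sketch with SciPy follows (the A/FR and yield values are hypothetical):

# Sketch of checking the Process Yield vs. A/FR correlation and its statistical significance.
# The data points are hypothetical.
from scipy.stats import pearsonr

afr       = [0.05, 0.10, 0.55, 0.60, 1.10, 1.40]   # Appraisal to Failure Ratio per program
yield_pct = [10, 15, 55, 60, 85, 100]              # process yield (%) per program

r, p = pearsonr(afr, yield_pct)
print(f"r = {r:.2f}, p = {p:.4f}")   # p < 0.05 means significant at the 5% level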

Quality analysis

How much did the quality of the programs entering unit test change?  Why?

In PSP, the quality of programs entering unit test is usually assessed based on the defects per size unit found in unit test, so the relevant chart is shown below. As already analysed, in this example there is a decreasing trend in the defects per size unit found in unit test, starting with an average of 27 defects/KLOC in the first two programs, then an average of 14 defects/KLOC in programs 3 and 4, and finally 0 defects/KLOC in the last two programs, which is a very good improvement trend.

Screenshot 2016-05-01 15.34.28

Am I finding my defects in design and code reviews?  Why or why not?

The relevant charts are the ones that show the process yield and, for more detailed information, the design review and code review yields, as shown below. Please notice that the yield is not defined when there are no defects entering a phase. In this example, regarding the process yield, there is a very good evolution from very small values in the first two programs (without review phases) to 60% in programs 3 and 4 (when reviews are introduced) and finally 100% in program 6 (in program 5 the yield is undefined because no defects were recorded). The Code Review Yield follows a similar trend. Regarding the Design Review Yield, it is undefined in programs 5 and 6, and the values observed in the previous two programs still have room for improvement (namely to reach the 78% boundary).

Screenshot 2016-05-01 15.34.46

Screenshot 2016-05-01 15.35.13

Screenshot 2016-05-01 15.35.30
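
A yield is the percentage of the defects entering a phase that are removed in that phase; when no defects enter the phase, the yield is undefined. The sketch below illustrates this with hypothetical counts:

# Sketch of a phase yield: defects removed in the phase divided by the defects entering it
# (removed there plus those that escape to later phases). Counts are hypothetical.
def phase_yield(removed_in_phase, escaped_to_later_phases):
    entering = removed_in_phase + escaped_to_later_phases
    return None if entering == 0 else 100 * removed_in_phase / entering

print(phase_yield(3, 2))   # 3 of the 5 defects entering code review were found there -> 60.0
print(phase_yield(0, 0))   # no defects entered the phase -> None (undefined)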


Based on my historical data, what are some realistic quality goals for me?

ProcessPAIR lets you compare your performance to the performance achieved by other people, and hence helps establish realistic goals. If you already have a good performance (green range), then probably you just have to keep your current performance. Otherwise, moving to the next range may be a realistic goal (from red to yellow, or from yellow to green).

In this example, it seems realistic to be in the green region for the defects found per size unit in unit test (<= 11 defects/KLOC) and process yield (>= 65% in the benchmarks used).

How can I change my process to meet those goals?

The simplest way to answer this question is to first look at the causes suggested in the Report View, as illustrated below (“Show only leaf causes” checked).

Screenshot 2016-05-01 15.35.57

Screenshot 2016-05-01 15.36.28

In this example, a potential performance problem is indicated for both the process yield and the defect density in unit test, in both cases with the same root causes: a potential performance problem with the Code Review Yield and a clear performance problem with the Design Review Yield.

Although the Code Review Yield is better than the Design Review Yield, the first priority indicated by ProcessPAIR (based on a cost-benefit estimate) is to improve the Code Review Yield. Since the Code Review Rate is already within the recommended range, the improvement actions should be based on other review best practices (e.g., improving review checklists to focus on the types of defects that escaped from code reviews).

In case you found deeper or different causes for the problems identified, you should devise actions for addressing them.
