**5. Results**

#### **5.1 Determination of the examiners' ZAE and initial tests**

Data were obtained for eleven (11) examiners from the Electricity Division. For each patent application of these examiners with at least one examination already carried out, all variables of interest were collected, totaling eight hundred and fourteen (814) patent applications, which make up the initial test sample. The data of the initial test sample were standardized according to Eq. (1). **Figure 4** shows the structure of the initial test sample with standardized data.

In the initial test sample, 95 main subclasses were found across the 814 patent applications analyzed. However, when considering only the examiners' ZAE (subclasses containing 5% or more of each examiner's applications), 25 main subclasses accounted for 636 applications, i.e., about 78% of the total evaluated. Because 3 of these 25 subclasses had very low occurrences, 22 subclasses were used in the set of interest for evaluation, equivalent to 619 applications (76% of the total). **Figure 5** shows the areas of expertise of each of the 11 examiners by IPC subclass. The gray area corresponds to the examiners' Specific Areas of Expertise (ZAE), while the white area corresponds to subclasses that, despite not being part of the examiner's ZAE, are part of the ZAE of some of the other examiners evaluated.

**Figure 4.** *Structure of the initial test sample with standardized data.*

**Figure 5.** *Examiners' areas of expertise by IPC subclass.*
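The 5% rule used to determine the ZAE can be illustrated with a minimal sketch. The function name `zae`, the example IPC subclasses, and the counts below are hypothetical and serve only to show the rule; they are not data from the study.

```python
from collections import Counter


def zae(subclasses, threshold=0.05):
    """Return the subclasses that hold at least `threshold` of an examiner's applications."""
    counts = Counter(subclasses)
    total = len(subclasses)
    return {subclass for subclass, n in counts.items() if n / total >= threshold}


# Hypothetical examiner with 40 applications (main IPC subclass of each application).
applications = ["H04L"] * 20 + ["H04W"] * 15 + ["G06F"] * 3 + ["H02J"] + ["H01L"]

examiner_zae = zae(applications)

# Share of this examiner's applications that fall inside the ZAE.
share_in_zae = sum(s in examiner_zae for s in applications) / len(applications)

print(sorted(examiner_zae), share_in_zae)
```

In this made-up example, the two subclasses with a single application each (2.5% of the examiner's total) fall below the threshold and stay outside the ZAE.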

**Figures 6** and **7** show the results obtained in the PCA.

**Figure 6** shows that the criterion of eigenvalues greater than one, used alone, would select only the first four (4) components, which account for about 65% of the total variance. The eigenvalues also drop sharply at component 6 (0.73), while the first five components together explain 75.05% of the total variance. Hence, the first five components were selected for the next steps.
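The two selection criteria mentioned in this chapter (eigenvalues greater than one, and a cumulative-variance target) can be compared with a short sketch. The eigenvalues below are hypothetical values chosen for illustration, not those of Figure 6, and the 70% target is borrowed from the criterion used later for the IGC; the function name `select_components` is likewise an assumption.

```python
def select_components(eigenvalues, variance_target=0.70):
    """Compare two PCA retention criteria on eigenvalues sorted in descending order.

    Returns (count with eigenvalue > 1, count needed to reach the target, cumulative shares).
    """
    total = sum(eigenvalues)
    kaiser_count = sum(1 for ev in eigenvalues if ev > 1.0)

    cumulative_shares = []
    running = 0.0
    for ev in eigenvalues:
        running += ev / total
        cumulative_shares.append(running)

    # Smallest number of leading components whose cumulative share reaches the target.
    variance_count = next(
        i + 1 for i, share in enumerate(cumulative_shares) if share >= variance_target
    )
    return kaiser_count, variance_count, cumulative_shares


# Hypothetical eigenvalues for ten standardized variables (they sum to 10).
eigenvalues = [3.0, 1.5, 1.2, 1.1, 0.9, 0.7, 0.6, 0.5, 0.3, 0.2]

kaiser_count, variance_count, shares = select_components(eigenvalues)
print(kaiser_count, variance_count, [round(s, 2) for s in shares])
```

With these illustrative values the eigenvalue criterion retains four components, whereas the variance target requires a fifth, which mirrors the kind of trade-off discussed above.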

In **Figure 7**, the significant coefficients for each variable (close to or above 0.4) are shaded in gray. A first analysis shows that component Y1 stands apart from the others. In addition to explaining virtually 30% of all variance, this component is associated with variables directly related to the volume of data (pages of description, of claims, and of figures, in addition to the number of independent and dependent claims). This is consistent with initial hypothesis 1, related to the volume variables. As for components Y2 and Y3, although they may be slightly related to the volume of data, they mainly represent the influence of the variables year of filing, number of inventors, and priorities. These components appear to be associated with development strategies, management of the applicants, and maturity of the technology involved. Components Y4 and Y5, in turn, complement the others by being associated with the variables number of subclasses and pages of third-party observations. These components appear to represent specific influences of the technological area of the patent applications. This result is consistent with initial hypothesis 2, related to the variables with complementary or indirect influences.

**Figure 6.** *Eigenvalues and variances.*

**Figure 7.** *Weighting coefficients.*

#### *A Methodology for Evaluation and Distribution of Patent Applications to INPI-BR Patent… DOI: http://dx.doi.org/10.5772/intechopen.98400*

By applying the proposed redistribution logic, a new configuration of samples by examiner was obtained. **Figure 8** compares the percentage of applications distributed to each examiner within their ZAE before and after the redistribution.

**Figure 8** shows that, for ten of the eleven examiners, there was a significant increase in the number of applications distributed to them within their own ZAE. Only examiner 9 had a small decrease (which could be corrected with a fine adjustment), because his data sample was significantly larger than that of the others. Thus, this new configuration seems to help examiners work within their specific fields of expertise and knowledge.

Finally, the IBD ratios of the original sample distribution and of its redistribution were calculated. Eq. (12) yielded an IBD of 0.86 for the original case and an IBD of 0.88 for the redistribution, i.e., the IBD increased with the new distribution, indicating that the medians of the examiners' applications after redistribution are closer to the general median of the division. This supports the conclusion that the new distribution tends towards a better balance in the volume of data/complexity of the applications distributed to the examiners.

#### **5.2 Model validation: simulations using standard sample with time**

As for the initial test sample, data were obtained for patent applications in the technological area of electrical engineering. However, since the aim in this case was to obtain a standard sample with time to serve as a reference, all data were taken from the time-for-examination form filled in by a single examiner. For each patent application of this examiner, all variables of interest were obtained, for a total of fifty (50) patent applications, all with first actions already published. Data were collected between January and July 2020, with all sample applications using data from previous searches by international offices.

Of the ten possible variables to be analyzed, the only one not considered in this case was the number of pages of third-party observations, given that no application in the sample contained such a document.

For the sensitivity analysis of the model, dozens of simulations were performed considering all cases, from the most complete one, with nine variables, to the simplest ones, with three variables. For every case, simulations were run in increments of ten sample applications, i.e., for each set of cases with three to nine variables, tests were performed considering 10, 20, 30, 40, and 50 patent applications of the standard sample. A minimum of ten test applications was chosen because, to apply the PCA method, it is recommended that the sample be at least larger than the number of variables; in theory, the larger the sample, the better for the model.

**Figure 8.** *Percentage of applications within the ZAE before and after the new distribution.*

During the simulations, the correlations of all variables and of the IGC ratios with time were verified. The IGC was calculated using both the criterion of 70% of the variance, referred to as IGC70%, and the criterion of eigenvalues greater than one, referred to as IGCλ>1. When IGC70% and IGCλ>1 are equal, we refer to the ratio simply as IGC. Finally, an IGC based only on the first principal component (Y1) of each case, the most significant component in terms of variance, was also calculated and is referred to as IGCY1.
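The idea of correlating a first-component index with examination time can be sketched as follows. This is only an illustration under explicit assumptions: the index computed here is the plain score on the first principal component of the standardized variables, used as a stand-in for IGCY1 (whose exact formula is defined elsewhere in the chapter and is not reproduced here); the eight applications, their values, and the names `exam_hours` and `index_y1` are hypothetical; and the first component is obtained by power iteration simply to keep the example dependency-free.

```python
import math
import statistics


def standardize(column):
    """Z-score a column using the sample standard deviation."""
    mean = statistics.fmean(column)
    sd = statistics.stdev(column)
    return [(x - mean) / sd for x in column]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def first_component(columns, iterations=200):
    """Unit-length dominant eigenvector of the correlation matrix, by power iteration."""
    p = len(columns)
    corr = [[pearson(columns[i], columns[j]) for j in range(p)] for i in range(p)]
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical data for eight applications (three volume variables plus time).
claim_pages = [2, 3, 5, 4, 8, 6, 3, 10]
total_pages = [20, 25, 40, 30, 70, 45, 22, 90]
total_claims = [8, 12, 20, 15, 35, 22, 10, 40]
exam_hours = [5.0, 6.0, 8.5, 7.0, 12.0, 9.0, 6.5, 14.0]

z = [standardize(col) for col in (claim_pages, total_pages, total_claims)]
weights = first_component(z)

# Score of each application on the first component (stand-in for IGCY1).
index_y1 = [sum(w * col[k] for w, col in zip(weights, z)) for k in range(len(exam_hours))]

r = pearson(index_y1, exam_hours)
print([round(w, 3) for w in weights], round(r, 3))
```

Repeating the same computation on the first 10, 20, 30, 40, and 50 applications of a sample would give one correlation value per sample size, which is the kind of sensitivity curve discussed in this section.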

After executing all the simulations and obtaining results for dozens of cases, the cases of greatest relevance and interest in terms of the variables analyzed and their correlations with time were selected. Namely:


**Figures 9** and **10** show the eigenvalues and cumulative variances for all five cases.

**Figures 9** and **10** show that:

• Case 9 Var: simulations using 10 and 20 applications deviate from the others, while those using 30 to 50 applications almost coincide, i.e., the eigenvalues of the components only tend to stabilize from the sample with 30 applications onwards. The same behavior can be observed in the cumulative variances of the samples. These results indicate that, for cases with nine variables, a sample of at least 30 patent applications, preferably 50, is recommended to obtain better performance of the PCA method;

**Figure 9.** *Eigenvalues of cases 1 to 5.*


**Figures 11** and **12** show the results of correlations of the IGC with time.

**Figure 11.** *Correlation of the IGC with the time for examination – cases 1 to 5.*

**Figure 12.** *Correlation of the IGC with the time for examination – cases 1 to 5 – by samples.*

The analysis of **Figures 11** and **12** indicates that:



0.80 and oscillates around that. It is also worth highlighting that both IGC curves show a good correlation with time even for a sample with only 10 applications, a result consistent with the profile of the eigenvalues and cumulative variances analyzed. Finally, although the correlation values of the three curves are close to each other, the IGCY1 (in this case also equal to the IGCλ>1) once again has an advantage over the others;


The analysis of **Figures 11** and **12** also indicates that the curves have quite similar profiles: the correlation of the IGC with time initially increases as the number of variables decreases from nine to five, and the curves then stabilize from five to three variables. These results reflect the previous analyses, which showed that selecting only the direct volume variables (in varying combinations) tends to yield higher and more stable correlations with time. Thus, although all nine variables of the study may contribute to the complexity of the patent application, in practice the direct volume variables already represent the examination effort/time well.

It should also be noted that the correlations of the IGCY1 with time remained high and almost constant for all cases analyzed with samples of twenty applications or more, i.e., they were quite stable with respect to sample size. Consequently, for this particular problem, the results converge in showing that the IGCY1 ratio seems the most suitable to represent the examination effort/time. The results suggest that Case 3 Var (2) offers the best cost–benefit relation for more specific practical tests, because it captures the influences of the main direct volume variables, because its data are simpler to obtain and collect (it does not require dividing the claims into independent and dependent), and because it has higher correlations of the IGC with time.

**Figure 13** shows the classifications of the standard sample applications by time and by the IGCY1. **Table 1** shows the applications for which the classifications diverged.

**Figure 13.** *Classification of applications of the standard sample (with time).*


**Table 1.** *Applications with divergent classifications.*

**Figure 13** shows that, when classified by time for examination, none of the applications in the sample was considered either very light or very heavy. Most applications were classified as moderate (36 or 72%), then light (8 or 16%), followed by heavy (6 or 12%). This result is consistent with the data obtained, as the standard sample is fairly homogeneous, contains applications of the same type of examination (using data from previous searches), and has a moderate coefficient of variation for the time variable (16.58%). Similarly, when classified by the IGCY1, none of the patent applications in the sample was considered either very light or very heavy. Most applications were classified as moderate (33 or 66%), then heavy (9 or 18%), followed by light (8 or 16%).
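The five-class scheme can be sketched as a simple threshold function on a standardized value. The cut-offs at ±1 between moderate and light/heavy follow the limits mentioned in this section; the cut-offs at ±3 for the very light and very heavy classes, as well as the treatment of values falling exactly on a boundary, are assumptions made for this illustration. The function name `classify` is hypothetical.

```python
def classify(z):
    """Map a standardized value (time or IGC) to one of five classes.

    Assumed cut-offs: -3, -1, +1, +3 (the outer pair is an assumption).
    """
    if z < -3:
        return "very light"
    if z < -1:
        return "light"
    if z <= 1:
        return "moderate"
    if z <= 3:
        return "heavy"
    return "very heavy"


# 0.995 is the IGCY1 value reported in the text for the application sitting
# just below the moderate/heavy limit; the other values are hypothetical.
for value in (-3.5, -1.5, 0.0, 0.995, 1.2, 3.5):
    print(value, classify(value))
```

Values very close to a cut-off, such as 0.995, show how a small model or measurement deviation can move an application into the adjacent class.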

**Table 1** shows a total of seven patent applications with conflicting classifications by time and by IGCY1. The classifications of the remaining 43 of the 50 applications (86% of the total) therefore converged, i.e., the two classifications are highly similar. The correlation of the IGCY1 with time for the case under analysis was 0.85, so the classification criterion proved efficient and followed the relation between the ratio and time. As for the differences found, four applications were newly classified as heavy according to the IGCY1 (applications 8, 11, 20, and 26). These applications have a higher than average number of claim pages, total number of pages, and total number of claims, a profile similar to that of the other five applications classified as heavy by both criteria, so their classification as heavy according to the IGCY1 is warranted. On the other hand, applications 20 and 26 showed IGC values close to one, i.e., close to the classification limit between the moderate and heavy ranges. Two factors may explain this: i) errors inherent in the mathematical model, which, although small, tend to occur depending on the variables, samples, and criteria adopted; and ii) measurement errors or deviations in time, which could shift a classification that lies close to the limit of the ranges.

Application 10, classified as light according to the IGCY1 and as moderate according to time, is indeed a quite short application that would, at first, tend to be classified as light. In this case, the standardized time was very near to minus one (standardized time = −0.91), i.e., quite near to the limit between the moderate and light classes. Unlike the conflicting heavy cases (in which the deviations probably occurred for reasons inherent in the model), in this case time measurement deviations are the more likely cause of the discrepancy.

Application 3, classified as moderate according to the IGCY1 and as light according to time, has few claims and few claim pages, but a quite high total page count. Depending on the specific examination procedure and on the need to better understand the description and the figures, the time may lead to either a moderate or a light classification. Hence, this type of application is difficult to classify *a priori* and, in this case, the model was more conservative, classifying it as moderate.

Finally, application 30, classified as moderate according to the IGCY1 and as heavy according to time, showed an IGCY1 virtually equal to one (IGCY1 = 0.995), right at the limit between the moderate and heavy classification ranges. This case is similar to the discrepancies of the heavy applications, being due to issues inherent in the mathematical model.

In short, the model represents the examination time/effort quite satisfactorily. Few discrepancies occur, they concern cases at the threshold of the criteria adopted, and they involve only adjacent ranges. In other words, the discrepancies are occasional, not severe, and reasonable given the limitations inherent in this kind of model and research.
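As a consistency check, the class counts by IGCY1 can be rebuilt from the counts by time and the seven divergent applications described in the text. In this small sketch, the time class of applications 8, 11, 20, and 26 is taken to be moderate, which is inferred from the statement that discrepancies occur only between adjacent ranges rather than read directly from Table 1.

```python
from collections import Counter

# Class counts by time for examination, as reported for the standard sample.
by_time = Counter({"light": 8, "moderate": 36, "heavy": 6})

# (application, class by time, class by IGCY1) for the seven divergent cases.
divergent = [
    (8, "moderate", "heavy"),
    (11, "moderate", "heavy"),
    (20, "moderate", "heavy"),
    (26, "moderate", "heavy"),
    (10, "moderate", "light"),
    (3, "light", "moderate"),
    (30, "heavy", "moderate"),
]

# Move each divergent application from its time class to its IGCY1 class.
by_igc = Counter(by_time)
for _, time_class, igc_class in divergent:
    by_igc[time_class] -= 1
    by_igc[igc_class] += 1

total = sum(by_time.values())
agreement = (total - len(divergent)) / total

print(dict(by_igc), agreement)
```

The rebuilt counts (33 moderate, 9 heavy, 8 light) and the agreement rate of 43 out of 50 match the figures reported above.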

#### **5.3 Analysis of the redistribution logic: simulations with the final redistribution sample**

Ten patent applications from each of ten examiners under analysis were selected to compose the final redistribution sample, amounting to one hundred (100) patent applications to be examined. The steps of the proposed methodology were strictly followed. However, because the purpose of this case was to obtain a sample to which the model validated with the standard sample with time (our reference) could be applied, all variables of interest were obtained from first examinations already published, collected between May and July 2020, and all sample applications also use data from previous searches by international offices (in the context of the "backlog combat plan", i.e., without executing a specific prior art search). The case chosen for the redistribution simulation was Case 5 – Case 3 Var (2), given that this case obtained the best results in the validation tests with the standard sample with time. **Figure 14** shows the classifications of the redistribution sample applications.

**Figure 14.** *Classifications of the redistribution sample applications.*

**Figure 15.** *Percentage of applications within the ZAE before and after the new distribution.*

None of the sample patent applications was classified as very light. Most applications were classified as moderate (71%), followed by light (16%), heavy (11%), and, finally, very heavy, of which there were only two (2%). It should be noted that the IGCY1 data followed a normal distribution, as did the time and the IGCY1 of the standard sample.

**Figure 15** indicates that the redistribution produced a better balance in the concentration of applications within the ZAE of each examiner. For six of the ten examiners, there was an increase in the number of applications distributed to them within their own ZAE, most notably for examiners 5, 6, 9, and 10, who had significant increases. Only examiner 8 remained with a low concentration (10%) of applications within his ZAE, which may be explained by the fact that he is one of the more "versatile" examiners of the division and does not have such a well-defined ZAE. Thus, the results suggest that this new configuration helps the examiners work within their specific fields of expertise and knowledge.

To complete the cycle of the methodology and make a last comparison between the distributions, the IBD ratios of the original sample distribution and of its redistribution were calculated. Eq. (12) resulted in an IBD of 0.83 for the original case and an IBD of 0.90 for the redistribution, i.e., the IBD increased with the new distribution, showing that the medians of the examiners' applications after redistribution are closer to the general median of the division. This supports the conclusion that the new distribution tends towards a better balance in the volume of data and time/effort of the applications distributed to the examiners.

#### **6. Final considerations**

In this study, ten possible variables relevant to the evaluation and distribution of patent applications to the examiners were identified. Among these variables, the ones directly related to the volume of a patent document, i.e., the volume of data that the examiner has to deal with when examining patents, were identified, namely: the number of pages of description, the number of claim pages, the number of pages of figures, the number of independent claims, and the number of dependent claims.

With the application of the PCA in a first data sample, referred to as Initial Test Sample, it was verified that the components were consistent with the initial hypotheses. Based on this initial sample, containing a large number of applications examined over two years, the examiners' Specific Areas of Expertise (ZAE) were determined, that is, the IPC subclasses (technological areas) they examine the most according to their knowledge and work experience. These ZAE are highly relevant, as these subclasses are one of the criteria used to distribute patent applications to the examiners, and their comparison before and after any redistribution is important.

The patent applications were also classified into up to five classes: very light, light, moderate, heavy, and very heavy. The classification used the IGC values as a reference, considering ranges equivalent to the average ratio plus one, three or more standard deviations. The applications were then redistributed with emphasis on the examiners' ZAE and on the classifications. The results show that the medians of the examiners' applications approached the general median of the division, suggesting that the new distribution is more balanced in volume of data than the original one. Moreover, with the new distribution, the examiners had the majority of their applications allocated within their respective ZAE, i.e., they would examine more applications in their specific areas of knowledge and preference, which also suggests that the new distribution contributes to better efficiency, quality, and motivation.

Additionally, the results obtained suggest that, although the five variables directly related to volume of data tend to be the ones that mostly impact the examination process, all ten variables selected, to some extent, influence the analysis of complexity of patent applications.

However, as complexity is relative, a sensitivity analysis of the model was performed to investigate whether this complexity indeed captures the examination time/effort, by verifying the correlations of the variables and of the IGC with time. This required a new sample of patent applications, referred to as Standard Sample, for which the examination time variable was additionally collected. In this context, simulations considering different variables and standard sample sizes were performed, with application of the PCA method and the model developed, including calculation of the IGC with different criteria and their correlations with time. The results suggest that, for our specific problem, the IGC with the greatest efficiency and stability was IGCY1, i.e., the one using only the first principal component, which is the most representative in terms of total data variance.

It is also worth noting that the case including only three variables (number of claim pages, total number of pages, and total number of claims) is the one recommended for more specific practical tests, because it captures the influences of the main direct volume variables, because its data are simple to acquire and collect (it does not require separating independent claims from dependent ones), and because it has consistently higher correlations of the IGCY1 with time, always close to 0.85.

Based on this new sample with the collection of the time for examination, the patent applications were once again classified into the five classes defined (very light, light, moderate, heavy and very heavy). Such classifications were carried out twice, the first time using the time for examination variable as a reference, i.e., the standard reference classification, and the second time using the IGCY1 ratio, i.e., the classification suggested by the model. Upon comparison between these classifications according to time and the classifications of the model, the results showed a strong similarity, as the model correctly classified 43 out of the 50 patent applications analyzed, a total of 86%.

After testing the mathematical model and the classification criteria against the correlations with time, the next step was to perform a first complete practical redistribution test. This required a third and final sample, referred to as Final Redistribution Sample, with 100 patent applications (10 applications from each of 10 different examiners), all using data from previous searches by international offices, so that the profile of this new sample was similar to that of the standard sample, our already tested reference. Based on this new sample, we determined the main central tendency statistics of the samples by examiner and calculated the Distribution Balancing Ratios (IBD) both for the original distribution and for the sample redistributed according to the IGCY1.

The results of the redistribution showed a better balance in the concentration of examinations within the ZAE of each examiner: in the samples of six of the ten examiners analyzed, there was an increase in the number of applications distributed to them within their own ZAE. Thus, there is evidence that this new configuration helps the examiners work within their specific fields of expertise and knowledge and, consequently, contributes to their efficiency and motivation. The redistribution also had a positive effect on the medians of the examiners' samples, quantified by the IBD, which was 0.83 for the original distribution and increased to 0.90 after redistribution.

In short, our results suggest that the mathematical model is able to represent quite satisfactorily the examination time/effort for patent applications. Also, the logic proposed managed to achieve the goal of better balancing the examiners' workload distribution.

## **Author details**

Cesar Vianna Moreira Júnior<sup>1</sup> \*, Daniel Marques Golodne<sup>2</sup> and Ricardo Carvalho Rodrigues<sup>1</sup>

1 Brazilian National Institute of Industrial Property (Intellectual Property Academy), Rio de Janeiro, Brazil

2 Brazilian National Institute of Industrial Property (Patent Office—DIRPA), Rio de Janeiro, Brazil

\*Address all correspondence to: cvmjunior@gmail.com

© 2021 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

