**3. Methodology**

This section describes the data, the choice of variables, and the econometric procedure employed throughout the study.

#### **3.1 Data source and variables**

Panel data on two endogenous variables make it possible to test the validity of the EKC for CO2 in Brazil: CO2 emissions (metric tons per capita) and GDP per capita growth (annual %). Carbon dioxide emissions are those stemming from the combustion of fossil fuels and the manufacture of cement. They include carbon dioxide produced by the consumption of solid, liquid, and gas fuels, as well as gas flaring. GDP per capita growth is the annual percentage growth rate of GDP per capita based on constant local currency. The totals are calculated using constant 2020 U.S. dollars. GDP per capita is gross domestic product divided by midyear population. GDP at purchaser's prices is the sum of gross value added by all resident producers in the economy, plus any product taxes and minus any subsidies not included in the value of the products. It is estimated without deducting the depreciation of manufactured assets or the depletion and degradation of natural resources. The dataset covers 1990 to 2018 because of data constraints, and its source is the World Bank's World Development Indicators (WDI). Both endogenous variables are transformed into natural logarithms.

#### *3.1.1 Model specification*

The functional form of the relationship under study is:

$$\text{CO}_2 = f\left(\text{GDP}, \text{GDP}^2, Z\right) \tag{1}$$

Eq. 1 forms the basic conceptual foundation for examining the link between variables [73, 74].

According to Stern [42, 52], the standard environmental Kuznets curve (EKC) model is specified as:

$$E_{it} = a_i + \gamma_t + \beta_1 Y_{it} + \beta_2 Y_{it}^2 + \varepsilon_{it} \tag{2}$$

where E is the natural logarithm of carbon dioxide emissions, Y is the natural logarithm of GDP per capita, and ε is the error term. The subscripts i and t index countries and time, respectively. Using logarithms constrains the predicted level of the dependent variable to be positive, which is appropriate.

The first two terms on the right-hand side of the model are the country and time effects. The country effects allow CO2 per capita to differ across units at any particular income level, while the sensitivity of emissions to income is assumed to be the same throughout Brazil at that level. The time effects capture time-varying omitted variables and random shocks that Brazil experiences.
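As a worked consequence of Eq. 2, and not a result reported in this section, the turning point implied by an inverted-U relationship follows from setting the derivative with respect to income to zero. The sketch assumes β1 > 0 and β2 < 0, the sign pattern under which the EKC hypothesis holds; Y* and GDP* are labels introduced here for illustration:

$$\frac{\partial E_{it}}{\partial Y_{it}} = \beta_1 + 2\beta_2 Y_{it} = 0 \quad\Longrightarrow\quad Y^{*} = -\frac{\beta_1}{2\beta_2}, \qquad \text{GDP}^{*} = \exp\left(-\frac{\beta_1}{2\beta_2}\right)$$

Because Y is measured in natural logarithms, the turning point in levels is the exponential of Y*. For example, hypothetical estimates β1 = 4 and β2 = -0.25 would give Y* = 8.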

#### *3.1.2 Lag length selection*

The first step in the cointegration analysis is to choose an appropriate lag length. We therefore conducted a joint lag-selection test, which indicates that two lags of each variable should be used (based on the AIC).
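The AIC-based lag selection described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the software or data used in the study: the series are synthetic (a simulated bivariate process with two lags), and the helper names `solve`, `residuals`, `det`, and `var_aic` are hypothetical. The criterion computed is the log determinant of the residual covariance matrix plus twice the number of estimated coefficients divided by the sample size.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def residuals(rows, y):
    """OLS residuals from regressing y on the regressors stored in rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]

def det(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n, d = len(m), 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            d = -d
        d *= m[col][col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= f * m[col][c]
    return d

def var_aic(data, p, max_lag):
    """AIC of a VAR(p) with intercept, fitted on the sample shared by all lags up to max_lag."""
    k = len(data[0])
    times = range(max_lag, len(data))
    rows = [[1.0] + [data[t - l][j] for l in range(1, p + 1) for j in range(k)]
            for t in times]
    resid = [residuals(rows, [data[t][j] for t in times]) for j in range(k)]
    n = len(rows)
    sigma = [[sum(u * v for u, v in zip(resid[i], resid[j])) / n for j in range(k)]
             for i in range(k)]
    return math.log(det(sigma)) + 2.0 * k * (k * p + 1) / n

# Synthetic two-variable data with genuine second-lag dynamics (assumption, not the study's data).
random.seed(1)
data = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(300):
    y1 = 0.2 * data[-1][0] + 0.5 * data[-2][0] + random.gauss(0, 1)
    y2 = 0.3 * data[-1][1] + 0.4 * data[-2][0] + random.gauss(0, 1)
    data.append([y1, y2])

max_lag = 4
aic = {p: var_aic(data, p, max_lag) for p in range(1, max_lag + 1)}
best_lag = min(aic, key=aic.get)  # lag order with the smallest AIC
print(aic, best_lag)
```

Every candidate order is fitted on the same observations, so the AIC values are comparable across lag lengths. In applied work an econometrics package would normally perform this step.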

#### *3.1.3 Vector auto-regression estimates*

The term autoregressive reflects the appearance of lagged values of the dependent variable on the right-hand side, and the term vector reflects the fact that the model includes a vector of two or more variables. By treating every variable in the model as endogenous and as a function of the lagged values of all endogenous variables in the system, the VAR approach avoids the need for structural modeling. The VAR is frequently used to forecast systems of interconnected time series and to analyze the dynamic influence of random disturbances on the system of variables. The VAR model is specified as:

$$\ln \text{CO}_{2,t} = a + \sum_{i=1}^{k} \beta_{1i} \ln \text{CO}_{2,t-i} + \sum_{j=1}^{k} \varphi_{1j} \ln \text{GDP}_{t-j} + \sum_{m=1}^{k} \phi_{1m} \ln \text{GDP}_{t-m}^2 + \mu_{1t} \tag{3}$$

$$\ln \text{GDP}_{t} = b + \sum_{i=1}^{k} \beta_{2i} \ln \text{GDP}_{t-i} + \sum_{j=1}^{k} \varphi_{2j} \ln \text{CO}_{2,t-j} + \sum_{m=1}^{k} \phi_{2m} \ln \text{GDP}_{t-m}^2 + \mu_{2t} \tag{4}$$

$$\ln \text{GDP}_{t}^2 = c + \sum_{i=1}^{k} \beta_{3i} \ln \text{GDP}_{t-i}^2 + \sum_{j=1}^{k} \varphi_{3j} \ln \text{GDP}_{t-j} + \sum_{m=1}^{k} \phi_{3m} \ln \text{CO}_{2,t-m} + \mu_{3t} \tag{5}$$

In each equation, the dependent variable is a function of its own lagged values and the lagged values of the other variables. Here k is the optimal lag length; a, b, and c are intercepts; the β, φ, and ϕ terms are the short-run dynamic coefficients of the model's adjustment toward long-run equilibrium; and μ1t, μ2t, and μ3t are the impulses, innovations, or shocks, often called the stochastic error terms.
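Eqs. 3 to 5 can also be read as a single vector system. The compact form below is a sketch that assumes the same lag order k in all three equations; the bold symbols for the variable vector, intercept vector, coefficient matrices, and error vector are introduced here for illustration and do not appear in the original specification:

$$\mathbf{y}_t = \mathbf{c} + \sum_{i=1}^{k} \mathbf{A}_i \mathbf{y}_{t-i} + \mathbf{u}_t, \qquad \mathbf{y}_t = \begin{pmatrix} \ln \text{CO}_{2,t} \\ \ln \text{GDP}_t \\ \ln \text{GDP}_t^2 \end{pmatrix}, \quad \mathbf{c} = \begin{pmatrix} a \\ b \\ c \end{pmatrix}, \quad \mathbf{u}_t = \begin{pmatrix} \mu_{1t} \\ \mu_{2t} \\ \mu_{3t} \end{pmatrix}$$

Each coefficient matrix is 3 × 3 and collects the coefficients on one lag, so a system with k = 2 has 3 + 2 × 9 = 21 coefficients to estimate.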

#### *3.1.4 Panel causality*

In the classical sense, regression does not imply causation. It is therefore worth investigating the direction of causal flow among the variables, which is appropriate given the test's predictive power. This study applies the widely used Granger causality test to the variables under investigation. When one variable, say X, Granger-causes another variable Y, the past values of X help forecast Y beyond what the past values of Y alone can explain. A bivariate relationship between X and Y can be expressed in a Granger causality test as:

$$X_t = p_0 + p_1 X_{t-1} + p_2 Y_{t-1} + e_t \tag{6}$$

$$Y_t = p_0 + p_1 Y_{t-1} + p_2 X_{t-1} + e_t \tag{7}$$
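The test implied by Eqs. 6 and 7 can be illustrated with a small sketch. It compares a restricted regression (own lags only) with an unrestricted one (own lags plus lags of the other variable) through an F statistic. The code uses only the Python standard library and synthetic data in which x drives y by construction; the helper names `solve`, `ols_rss`, and `granger_f` are hypothetical, and this is not the panel procedure or the software used in the study.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols_rss(rows, y):
    """Fit y on the regressors in rows by OLS and return the residual sum of squares."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

def granger_f(cause, effect, lags=1):
    """F statistic for H0: lags of `cause` do not help predict `effect`."""
    n = len(effect)
    y = effect[lags:]
    # Restricted model: intercept and own lags of the effect variable only.
    restricted = [[1.0] + [effect[t - l] for l in range(1, lags + 1)]
                  for t in range(lags, n)]
    # Unrestricted model: add the lags of the candidate cause.
    unrestricted = [r + [cause[t - l] for l in range(1, lags + 1)]
                    for r, t in zip(restricted, range(lags, n))]
    rss_r = ols_rss(restricted, y)
    rss_u = ols_rss(unrestricted, y)
    dof = len(y) - (2 * lags + 1)
    return ((rss_r - rss_u) / lags) / (rss_u / dof)

# Synthetic series (assumption): y depends on lagged x, but x does not depend on y.
random.seed(0)
x, y = [0.0], [0.0]
for _ in range(200):
    x.append(0.5 * x[-1] + random.gauss(0, 1))
    y.append(0.3 * y[-1] + 0.8 * x[-2] + random.gauss(0, 1))

f_xy = granger_f(x, y)  # tests whether x Granger-causes y
f_yx = granger_f(y, x)  # tests whether y Granger-causes x
print(f_xy, f_yx)
```

With one lag, `granger_f(y, x)` corresponds to testing p2 = 0 in Eq. 6 and `granger_f(x, y)` to testing p2 = 0 in Eq. 7. A large F statistic relative to the F(lags, dof) critical value rejects the null of no Granger causality.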

#### **3.2 Data analysis techniques**

The data analysis proceeds in four steps. First, before examining the nature of the link between carbon dioxide emissions and economic growth, the study determined the order of integration of the two variables. The ADF unit root test [75], the PP-Fisher test [76], and the Im, Pesaran, and Shin unit root test were used to check for stationarity. Second, a VAR model was used to assess the impact of each endogenous variable individually, and the Wald test was used to assess the joint impact of the variables, with the lag order for estimation chosen by the Akaike Information Criterion. Third, VAR stability checks were used to verify that the model is stable. Fourth and finally, based on the parameter stability findings from the third step, the study applied the Pairwise Dumitrescu Hurlin panel causality (PDHPC) test and the pairwise Granger causality test to test all the hypotheses.
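The stationarity check described above can be illustrated with the simplest member of the Dickey-Fuller family. The sketch below regresses the first difference of a series on its lagged level with an intercept and reports the t statistic on the lagged level. It omits the lagged differences that the augmented (ADF) version adds, uses synthetic series, and is not the panel procedure (PP-Fisher, IPS) applied in the study. The function name `dickey_fuller_tau` and the constant `CRITICAL_5PCT` are hypothetical, and the critical value is the approximate asymptotic 5% value for the case with a constant and no trend.

```python
import math
import random

def dickey_fuller_tau(series):
    """t statistic on rho in: diff(y)_t = alpha + rho * y_{t-1} + e_t."""
    lagged = series[:-1]
    diff = [b - a for a, b in zip(series[:-1], series[1:])]
    n = len(diff)
    mean_x = sum(lagged) / n
    mean_d = sum(diff) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (d - mean_d) for x, d in zip(lagged, diff))
    rho = sxy / sxx
    alpha = mean_d - rho * mean_x
    rss = sum((d - alpha - rho * x) ** 2 for x, d in zip(lagged, diff))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of rho
    return rho / se

# Synthetic series (assumption): a random walk and a stationary AR(1) process.
random.seed(2)
walk, stationary = [0.0], [0.0]
for _ in range(300):
    walk.append(walk[-1] + random.gauss(0, 1))                    # unit root
    stationary.append(0.5 * stationary[-1] + random.gauss(0, 1))  # no unit root

CRITICAL_5PCT = -2.86  # approximate asymptotic 5% value, constant and no trend
tau_walk = dickey_fuller_tau(walk)
tau_stationary = dickey_fuller_tau(stationary)
print(tau_walk, tau_stationary)
```

A statistic below the critical value rejects the null hypothesis of a unit root. The statistic must be compared with Dickey-Fuller critical values, not with the usual Student t values, because its distribution under the null is non-standard.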
