Total and Partial Differentials as Algebraically Manipulable Entities

*Maria Isabelle Fite and Jonathan Bartlett*

## **Abstract**

Differential operators usually result in derivatives expressed as a ratio of differentials. For all but the simplest derivatives, these ratios are typically not algebraically manipulable, but must be held together as a unit in order to prevent contradictions. However, this is primarily a notational and conceptual problem. The work of Abraham Robinson has shown that there is nothing contradictory about the concept of an infinitesimal differential operating in isolation. In order to make this system extend to all of calculus, however, some tweaks to standard calculus notation are required. Understanding differentials in this way actually provides a more straightforward understanding of all of calculus for students, and minimizes the number of specialized theorems students need to remember, since all terms can be freely manipulated algebraically.

**Keywords:** differentials, differential operators, derivatives, partial derivatives, total derivatives

## **1. Introduction**

Derivatives are usually written in a notation such as $\frac{\mathrm{d}y}{\mathrm{d}x}$, which implies that there are two distinct values, d*y* and d*x*, at play. Historically, d*y* and d*x* were considered infinitesimal values: values so small that they are practically zero, but not quite zero, and which often became real numbers when put in ratio with each other. This understanding was challenged by practitioners who thought that infinitesimal values were insufficiently rigorous to be used in mathematics.

This led to a reconsideration of derivatives using the concept of a limit. In the limit definition of the derivative, the d*y* and d*x* terms do not have independent existences, but exist only within the ratio itself. In this conception, the ratio is merely suggestive of how the derivative was originally produced but does not represent an actual quotient of two distinct values. The limit definition of the derivative has been reinforced by the fact that treating differentials as distinct values leads to contradictions in many cases.

However, the work of Abraham Robinson in the 1960s showed that there was no fundamental flaw in expanding the number system to include infinitesimals. The hyperreal numbers are an extension of the real numbers which allows infinitesimals and infinities to be constructed as rigorously as the real numbers themselves. Additionally, unlike other conceptions of infinities, the hyperreal numbers have the advantage that infinitesimals and infinities can be manipulated using arithmetic and algebraic operations.

However, if infinitesimals can be readily considered without contradiction, why does the notation for derivative operations often lead to contradiction? The flaw here is actually in the notation itself. Because the notation was not considered factual but merely suggestive, practitioners tended to ignore the problematic cases rather than solve them. By considering new and more rigorous approaches to notation, a better notation can be developed which includes infinitesimal values, removes the contradictions, and provides a more straightforward understanding of differential notation and formulas. In these new formulations, differentials such as d*y* and d*x* are fully independent, algebraically manipulable entities.

## **2. Problem of separating differentials in modern Leibniz notation**

While the problems that occur when trying to separate differentials in modern Leibniz notation are well known, it is worth revisiting them briefly. First of all, there are essentially no inconsistencies or contradictions when dealing with first-order total differentials. For instance, taking the equation $y = x^3$, the derivative is $\frac{\mathrm{d}y}{\mathrm{d}x} = 3x^2$. Since the derivative of the inverse function is $\frac{\mathrm{d}x}{\mathrm{d}y}$, it can be found simply by inverting both sides of the equation, so that $\frac{\mathrm{d}x}{\mathrm{d}y} = \frac{1}{\;\frac{\mathrm{d}y}{\mathrm{d}x}\;} = \frac{1}{3x^2}$. Likewise, integrating is often preceded by multiplying both sides by a differential, so that $\frac{\mathrm{d}y}{\mathrm{d}x} = 3x^2$ becomes $\mathrm{d}y = 3x^2\,\mathrm{d}x$.

The problems become more apparent with higher-order derivatives. The typical notation for the second derivative of $y = x^3$ is $\frac{\mathrm{d}^2y}{\mathrm{d}x^2} = 6x$. However, if the notation were taken seriously, this would be seen as a quotient of the higher-order differential $\mathrm{d}^2y$ and the square of d*x*. Doing this, however, would break the chain rule. For instance, if you had $x = t^2$, then you could calculate $\frac{\mathrm{d}^2y}{\mathrm{d}t^2}$ by simply multiplying $\frac{\mathrm{d}^2y}{\mathrm{d}x^2}$ by $\left(\frac{\mathrm{d}x}{\mathrm{d}t}\right)^2$. Doing so, however, yields an incorrect second derivative of $\frac{\mathrm{d}^2y}{\mathrm{d}t^2} = 24t^4$ rather than the correct $\frac{\mathrm{d}^2y}{\mathrm{d}t^2} = 30t^4$. The correct value is normally calculated using the chain rule for the second derivative (or, for higher derivatives, Faà di Bruno's formula [1]). While the second derivative chain rule works, it provides no algebraic intuition for why it works, and seems to be in conflict with the idea of treating differentials as separable values.
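The discrepancy can be checked with exact arithmetic. The following is a minimal sketch added for illustration, not part of the original derivation; the function names are hypothetical. It evaluates the naive product, the standard second-derivative chain rule, and the direct second derivative of $y = t^6$ at a sample point.

```python
from fractions import Fraction


def second_derivative_naive(t):
    """Treat d2y/dx2 as a plain quotient and multiply it by (dx/dt)**2."""
    x = t**2
    d2y_dx2 = 6 * x      # second derivative of y = x**3 with respect to x
    dx_dt = 2 * t        # derivative of x = t**2 with respect to t
    return d2y_dx2 * dx_dt**2


def second_derivative_chain_rule(t):
    """Standard second-derivative chain rule: y''(x) * x'(t)**2 + y'(x) * x''(t)."""
    x = t**2
    dy_dx = 3 * x**2
    d2y_dx2 = 6 * x
    dx_dt = 2 * t
    d2x_dt2 = 2
    return d2y_dx2 * dx_dt**2 + dy_dx * d2x_dt2


def second_derivative_direct(t):
    """Differentiate y = (t**2)**3 = t**6 twice directly."""
    return 30 * t**4


t = Fraction(3, 2)  # arbitrary sample point
print("naive product:", second_derivative_naive(t))
print("chain rule:   ", second_derivative_chain_rule(t))
print("direct:       ", second_derivative_direct(t))
```

The naive product agrees with $24t^4$, while the chain rule and the direct computation both agree with $30t^4$; the missing $6t^4$ is the contribution of the $\frac{\mathrm{d}y}{\mathrm{d}x}\frac{\mathrm{d}^2x}{\mathrm{d}t^2}$ term.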

Dealing with partial derivatives brings up innumerable problematic cases even for the first derivative. If *f* is a function of *x* and *y*, and *x* and *y* are both functions of *t*, then the total derivative of *f* with respect to *t* is $\frac{\mathrm{d}f}{\mathrm{d}t} = \frac{\partial f}{\partial x}\frac{\mathrm{d}x}{\mathrm{d}t} + \frac{\partial f}{\partial y}\frac{\mathrm{d}y}{\mathrm{d}t}$. Since *x* is a function of one variable, $\frac{\partial x}{\partial t} = \frac{\mathrm{d}x}{\mathrm{d}t}$ (likewise for *y*). Then the equation becomes $\frac{\mathrm{d}f}{\mathrm{d}t} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial t}$. Treating the partial differentials as distinct values, this reduces to $\frac{\mathrm{d}f}{\mathrm{d}t} = \frac{\partial f}{\partial t} + \frac{\partial f}{\partial t}$.<sup>1</sup> Now that it is expressed in terms of a single variable, $\frac{\mathrm{d}f}{\mathrm{d}t} = \frac{\partial f}{\partial t}$, so this yields $\frac{\mathrm{d}f}{\mathrm{d}t} = \frac{\mathrm{d}f}{\mathrm{d}t} + \frac{\mathrm{d}f}{\mathrm{d}t} = 2\frac{\mathrm{d}f}{\mathrm{d}t}$. Dividing both sides by $\frac{\mathrm{d}f}{\mathrm{d}t}$ yields the contradiction $1 = 2$.

<sup>1</sup> A possible objection is that the $\partial x$ in $\frac{\partial f}{\partial x}$ may not be the same infinitesimal as the $\partial x$ in $\frac{\partial x}{\partial t}$. However, the value of $\partial f$ depends on the value of the $\partial x$ in $\frac{\partial f}{\partial x}$, and the value of the $\partial x$ in $\frac{\partial x}{\partial t}$ depends on $\partial t$. So one could choose the $\partial x$s to be equal, and the values of $\partial f$ and $\partial t$ would adjust accordingly, leaving the values of $\frac{\partial f}{\partial x}$ and $\frac{\partial x}{\partial t}$ unchanged.
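A concrete instance makes the failure visible. The functions below are chosen purely for illustration and do not appear in the original text: take $f = xy$ with $x = t^2$ and $y = t^3$, so that $f = t^5$.

$$\begin{aligned}
\frac{\mathrm{d}f}{\mathrm{d}t} &= 5t^4 \\
\frac{\partial f}{\partial x}\frac{\mathrm{d}x}{\mathrm{d}t} &= y \cdot 2t = 2t^4 \\
\frac{\partial f}{\partial y}\frac{\mathrm{d}y}{\mathrm{d}t} &= x \cdot 3t^2 = 3t^4
\end{aligned}$$

The two terms sum to the correct $5t^4$, but neither term on its own equals $\frac{\mathrm{d}f}{\mathrm{d}t}$, which is what cancelling $\partial x$ and $\partial y$ in the standard notation would require.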

As will be described, the issues in these problematic cases stem from deficiencies in the notation, not deficiencies in the concept of differentials as infinitesimals nor in the idea that differentials can be considered independently of each other. By taking a more rigorous approach to the development of the notation of higher order derivatives and partial derivatives, a straightforward notation can be obtained which enables differentials to be considered as fully distinct values.

## **3. Historical formal definitions of the derivative**

The derivative of a function measures how the function changes as the independent variable varies. For instance, if the derivative of a function $f(x)$ is 3 when $x = 5$, that means $f(x)$ is increasing at a rate of 3 units up for every 1 unit across when *x* is 5. Another way to say the same thing is that the function's slope at $x = 5$ is $3/1 = 3$.

Normally, slope is defined with reference to two points. When measuring velocity, for instance, which is the ratio of the change in position to the change in time, one would measure two different times with their positions and compare them. The derivative attempts to calculate the slope using only one point together with an equation. Since only one point is used, the change in *x* is infinitely small, and so is the change in *y*. Different ways of dealing with these infinities lead to different formal definitions of the derivative.

## **3.1 Newton's definition**

Isaac Newton provided one of the first definitions of a derivative in his book *Methodus fluxionum et serierum infinitarum*, or "The Method of Fluxions and Infinite Series" in English [2, 3]. Newton thought of his graphs as being drawn over time, with the *x*-coordinate increasing at a constant speed while the rate of increase in the *y*-coordinate varied. A variable's rate of change with respect to time (what we would now call a derivative with respect to time) was called a "fluxion," which was denoted by placing a dot above a variable, such as $\dot{x}$ (which represents the derivative of *x* with respect to time) [3].

To avoid having to define an infinitely small quantity, Newton worked with full derivatives, ratios of infinitesimals. Since Newton assumed all his variables depended on time, he could then switch out the infinitesimal change in *x* and change in *y* for the change in *x* over time and the change in *y* over time, which were both real numbers. The ratio remained the same, and the infinities were avoided [3].

## **3.2 Leibniz's definition**

Unlike Newton, Gottfried Leibniz preferred to consider the change in *x* and the change in *y* separately. He used the notation d*x* for an infinitesimal difference in *x* and $\mathrm{d}y/\mathrm{d}x$ for a ratio of infinitesimals, which represented the slope of a curve at a point. Leibniz considered d an operator, with $\mathrm{d}x = \mathrm{d}(x)$ being the output of d acting on the variable *x*. This allowed him to apply d more than once, resulting in $\mathrm{d}^2x = \mathrm{d}(\mathrm{d}(x))$, $\mathrm{d}^3x = \mathrm{d}(\mathrm{d}(\mathrm{d}(x)))$, and so on. Just as d*x* was infinitely smaller than *x*, Leibniz said $\mathrm{d}^n x$ was infinitely smaller than $\mathrm{d}^{n-1}x$ [3].

Although his calculus relied on the concept of an infinitesimal, Leibniz regarded infinitesimals as only "purely ideal entities... useful fictions, introduced to shorten arguments and aid insight" [3]. However, Leibniz was never able to rigorously define his infinitesimals or how they behaved. Therefore, while they seemed to work well, the lack of clarity caused some skeptics to regard them with suspicion, ridiculing them as "ghosts of departed quantities" [4].

## **3.3 Delta-epsilon (limit) definition**

Concerns about the seemingly inconsistent nature of infinitesimals, which were treated like nonzero numbers when dividing but like zero when adding, led to the reformulation of calculus using the idea of limits. The limit of $f(x)$ as *x* approaches *a* is the value $f(x)$ approaches as *x* becomes closer to *a*.

More precisely, the limit of $f(x)$ as *x* approaches *a* is *L* if for any given positive number *ε* there is a corresponding positive number *δ* such that the difference between $f(x)$ and *L* is less than *ε* whenever *x* differs from *a* by less than *δ* (with *x* ≠ *a*) [5].

Limits can then be used to define the derivative of a function $f(x)$ as

$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} \tag{1}$$

When limits are used to define a derivative, it makes no sense to pull apart the change in *x* and the change in *y*, as both the limit of the numerator and the limit of the denominator evaluate to zero, and division by zero is undefined.
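The behavior of the quotient in (1) can be illustrated numerically. This is a minimal sketch with hypothetical names, added for illustration: it evaluates the difference quotient of $f(x) = x^3$ at $x = 5$ for shrinking *h* using exact rational arithmetic. The numerator and denominator each shrink toward zero, while their ratio settles toward 75.

```python
from fractions import Fraction


def difference_quotient(f, x, h):
    """The quotient inside the limit of equation (1), for a nonzero step h."""
    return (f(x + h) - f(x)) / h


def cube(x):
    return x**3


x0 = Fraction(5)
for k in range(1, 7):
    h = Fraction(1, 10**k)
    numerator = cube(x0 + h) - cube(x0)
    # Both numerator and h tend to zero; only their ratio stays informative.
    print(f"h={float(h):.0e}  numerator={float(numerator):.3e}  "
          f"quotient={float(difference_quotient(cube, x0, h)):.6f}")
```

For this particular function the quotient simplifies to $75 + 15h + h^2$, which is why it approaches 75 as *h* shrinks.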

## **4. Hyperreal numbers and the definition of the derivative**

While the limit definition of a derivative solves the philosophical problems of infinitesimals, it does not allow the change in *y* to be separated from the change in *x*. This led Abraham Robinson to return to Leibniz's infinitesimals in 1958, putting them on a new set-theoretic foundation and creating the field of nonstandard analysis [3].

While there are different ways to construct hyperreal numbers, the approach we will take here is based on the set theory approach described by Herrmann in [6], with many of the definitions taken from there as well. We will begin by describing hyperreal numbers (including infinitesimals), and then describe the differential operator as being an operator that can be applied using infinitesimals.

For defining the infinitesimals, the core idea is to take the set of all infinitely long sequences of real numbers, denoted $\mathbb{R}^{\mathbb{N}}$. Some of these sequences match other sequences so closely they can be considered equivalent. Each real number is then assigned to a set of equivalent sequences. Then, some of the remaining sets of equivalent sequences can be assigned to infinitesimals. Finally, all the operations normally done on real numbers can be translated to operations between sets of equivalent sequences.

## **4.1 Filters, the cofinite filter, and free ultrafilters: Defining big enough**

A *filter* provides a way to classify subsets of a set as either big enough or not big enough.

Let *X* be a nonempty set. A nonempty subset *F* of the set of all subsets of *X* is a proper filter on *X* if and only if:

$$\text{(i) for each } A, B \in F,\ A \cap B \in F \tag{2}$$

$$\text{(ii) if } A \subset B \subset X \text{ and } A \in F \text{, then } B \in F \tag{3}$$

$$\text{(iii) } \emptyset \notin F \tag{4}$$

The *cofinite filter C* is defined as

$$C = \{x \mid (x \subset X) \text{ and } (X - x) \text{ is finite}\} \tag{5}$$

where *X* is an infinite set. *C* is called the cofinite filter because a subset *x* of *X* gets to be in the filter *C* if and only if *X* without *x* is a finite set. *C* gives a mathematical way to define whether an infinite set is considered big enough.

For instance, if *C* is the cofinite filter on $\mathbb{R}$, the real numbers, the set of all integers $\mathbb{Z}$ is not big enough to be in *C*, even though it is an infinite subset of $\mathbb{R}$, because there are infinitely many real numbers that are not integers. However, $\mathbb{R} \setminus \{0\}$, the real numbers excluding zero, is big enough to be a member of *C*, because there is only one real number, zero, that is not in the real numbers excluding zero.
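One way to build intuition for the cofinite filter is to model each of its members by the finite set of elements it leaves out. The sketch below is an illustration under that assumption, with a hypothetical class name; it shows why condition (i) holds, since the elements missing from an intersection are just the union of two finite sets.

```python
class CofiniteSubset:
    """A member of the cofinite filter on an infinite set X,
    stored as the finite set of elements of X that it omits."""

    def __init__(self, missing):
        self.missing = frozenset(missing)

    def contains(self, element):
        return element not in self.missing

    def intersect(self, other):
        # An element is absent from the intersection exactly when it is
        # absent from at least one operand, so the omitted set stays finite.
        return CofiniteSubset(self.missing | other.missing)


nonzero = CofiniteSubset({0})           # the reals excluding zero
no_one_or_two = CofiniteSubset({1, 2})  # the reals excluding 1 and 2
both = nonzero.intersect(no_one_or_two)
print(sorted(both.missing))
```

A set such as the integers cannot be represented this way at all, because the set of reals it omits is infinite; that is the sense in which it is not big enough.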

An *ultrafilter* is a maximal proper filter on a given infinite set *X*, that is, a proper filter that is not contained in any strictly larger proper filter on *X*. An ultrafilter that has *C* as a subset is called a *free ultrafilter*.

## **4.2 Equivalence classes of $\mathbb{R}^{\mathbb{N}}$: Classifying equivalent sequences together**

Let $\mathbb{R}^{\mathbb{N}}$ represent the set of all sequences with domain $\mathbb{N}$ and range values in $\mathbb{R}$. Let *A* and *B* be two sequences in $\mathbb{R}^{\mathbb{N}}$. *A* is said to be equivalent to *B* ($A =_U B$) if a sufficiently large number of their elements match, or

$$A =_U B \Leftrightarrow \{n \mid A_n = B_n\} = S \in U \tag{6}$$

The free ultrafilter *U* determines whether the set of matching elements is big enough.

This relation $=_U$ is an equivalence relation on $\mathbb{R}^{\mathbb{N}}$, so it partitions $\mathbb{R}^{\mathbb{N}}$ into equivalence classes. Each equivalence class $[A]$ contains all the sequences in $\mathbb{R}^{\mathbb{N}}$ that are equivalent to *A*, including *A* itself.

The set of all these equivalence classes is called the set of the hyperreal numbers, denoted ${}^{*}\mathbb{R}$.

## **4.3 Connecting the real numbers to the hyperreals**

We can define a function *f* that takes each $x \in \mathbb{R}$ and gives the unique $[R]$ where $\{n \mid R_n = x\} \in U$. This function *f* assigns to each real number *x* a hyperreal number $[R]$, namely the set of all sequences in which a sufficiently large number of elements equal *x*. Often, $f(x)$ is represented by ${}^{*}x$. For instance, the hyperreal ${}^{*}3$ is the set of all sequences equivalent ($=_U$) to $\{3, 3, 3, \ldots\}$.

Most applications of math use real numbers, so it is helpful to define the subset of the hyperreals that corresponds to the real numbers. The image of a subset *X* of $\mathbb{R}$ under *f* is denoted ${}^{\sigma}X$. Each hyperreal number ${}^{*}x$ in ${}^{\sigma}X$ corresponds to a real number *x* in *X*. Since $\mathbb{R}$ is a subset of itself, ${}^{\sigma}\mathbb{R}$ is the subset of the hyperreals that corresponds to the real numbers.

## **4.4 Operations on the hyperreals**

In order for algebra in ${}^{*}\mathbb{R}$ to replace algebra in the real numbers, operations like $+$ and $\cdot$, among others, have to be defined between members of ${}^{*}\mathbb{R}$. It is also useful to define the relation $\le$ and the absolute value function.

Let *a*, *b*, and *c* be elements of ${}^{*}\mathbb{R}$, and let ${}^{*}{+} : {}^{*}\mathbb{R} \times {}^{*}\mathbb{R} \to {}^{*}\mathbb{R}$ be defined as

$$a \mathbin{{}^{*}{+}} b = c \Leftrightarrow \{ n \mid A_n + B_n = C_n \} \in U \tag{7}$$

for any sequences $A_n \in a$, $B_n \in b$, and $C_n \in c$. That is, the sum of two elements of ${}^{*}\mathbb{R}$, *a* and *b*, is equal to another element of ${}^{*}\mathbb{R}$, *c*, if and only if a sufficiently large number of the elements of the sequences $A_n + B_n$ and $C_n$ match, for any sequence $A_n$ in *a*, $B_n$ in *b*, and $C_n$ in *c*. Hyperreal multiplication (${}^{*}\cdot$) can be defined similarly.

To construct a hyperreal less-than-or-equal relation, for each $a = [A],\ b = [B] \in {}^{*}\mathbb{R}$ define

$$a \mathrel{{}^{*}{\le}} b \Leftrightarrow \{ n \mid A_n \le B_n \} \in U \tag{8}$$

That is, $a \mathrel{{}^{*}{\le}} b$ if and only if, given any sequence in *a* and any sequence in *b*, a sufficiently large number of elements in *a*'s sequence are less than or equal to their corresponding elements in *b*'s sequence.

These operations establish the structure $\langle {}^{*}\mathbb{R},\ {}^{*}{+},\ {}^{*}\cdot,\ {}^{*}{\le} \rangle$ as a totally ordered field, with $[0]$ as the identity for ${}^{*}{+}$ and $[1]$ as the identity for ${}^{*}\cdot$ ([6], p. 11).

Finally, the absolute value function can be defined for $a \in {}^{*}\mathbb{R}$ with

$${}^{*}|a| = |a| = b \Leftrightarrow \{ n \mid |A_n| = B_n \} \in U \tag{9}$$

The absolute value of a hyperreal number *a* is a hyperreal number *b* if and only if, given a sequence in *a* and a sequence in *b*, a sufficiently large number of elements in *b*'s sequence match the absolute value of their corresponding elements in *a*'s sequence.

In summary, $+$, $\cdot$, $\le$, and the absolute value function, which are defined on the real numbers, can be translated to operations on the hyperreal numbers.

## **4.5 Infinitesimals in the hyperreals**

Not all of the members of ${}^{*}\mathbb{R}$ correspond to real numbers, because not all sequences of real numbers are constant sequences. Some of the remaining hyperreals correspond to infinitesimals.

A hyperreal number *a* is infinitely large if

$${}^{*}x \mathrel{{}^{*}{<}} |a| \text{ for each } x \in \mathbb{R} \tag{10}$$

or in other words, if its absolute value is bigger than every hyperreal that corresponds to a real number.

A hyperreal number *b* is an infinitesimal, or as Newton stated, infinitely small, if

$${}^{*}0 \mathrel{{}^{*}{\le}} |b| \mathrel{{}^{*}{<}} {}^{*}x \text{ for each } 0 < x \in \mathbb{R}. \tag{11}$$

In other words, a hyperreal is an infinitesimal if its absolute value is bigger than or equal to ${}^{*}0$ and yet smaller than every hyperreal that corresponds to a positive real number.

Notice that ${}^{*}0$, which is the equivalence class that contains $\{0, 0, 0, \ldots\}$, is the trivial infinitesimal.

For a nontrivial example of an infinitesimal, consider the equivalence class *g* containing the sequence $0, 1, \frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \ldots$. "Then $g \neq {}^{*}0$. Now for each $x \in \mathbb{R}^{+}$ there is some $m \in \mathbb{N}$, $m \neq 0$ such that $0 < \frac{1}{m} < x$. Thus ${}^{*}0 < \frac{{}^{*}1}{{}^{*}m} < {}^{*}x$. ... [and] *g* is an infinitesimal" ([6], p. 17).
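The steps elided in the quotation can be filled in from the definitions above. The following is a sketch of that reasoning, writing *G* for the sequence $0, 1, \frac{1}{2}, \frac{1}{3}, \ldots$ (a label introduced here for convenience). Only the first entry of *G* is zero, and for a given real $x > 0$ with $\frac{1}{m} < x$, all but finitely many entries of *G* are smaller than *x*:

$$\begin{aligned}
\{n \mid G_n \neq 0\} \in C \subset U &\;\Rightarrow\; \{n \mid G_n = 0\} \notin U \;\Rightarrow\; g \neq {}^{*}0 \\
\{n \mid G_n < x\} \supset \{n \mid n > m\} \in C \subset U &\;\Rightarrow\; \{n \mid G_n < x\} \in U \;\Rightarrow\; g \mathrel{{}^{*}{<}} {}^{*}x
\end{aligned}$$

The first line uses conditions (i) and (iii): if both a set and its complement were in *U*, their intersection, the empty set, would be too. The second line uses condition (ii), since any superset of a member of *U* is itself in *U*.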

## **4.6 Division with infinitesimals**

If infinitesimals are smaller than every positive real number, can you still divide by them? Consider a nonzero infinitesimal, say *ε*, and a sequence in *ε*, say *A*. Even if some of *A*'s elements are zeros, $\varepsilon \neq {}^{*}0$, so the set of indices at which *A* is zero is not big enough to be in the ultrafilter *U*. Therefore the set of indices at which *A* is nonzero *is* in *U*, since *U* is an ultrafilter. It is then possible to define another sequence *B* where $B_n = \frac{1}{A_n}$ if $A_n \neq 0$ and $B_n = 0$ if $A_n = 0$. *B* satisfies the property $[A] \mathbin{{}^{*}\cdot} [B] = [1]$, and so $[B]$ is the multiplicative inverse of $[A]$.

In summary, even if there are sequences in *ε* with zeros, $\frac{[1]}{\varepsilon}$ is still defined, and so it is still possible to divide by *ε* ([6], p. 11).
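As an illustration, this construction can be applied to the infinitesimal *g* of Section 4.5, taking the sequence $A = 0, 1, \frac{1}{2}, \frac{1}{3}, \ldots$ as its representative (the labels *A* and *B* follow the paragraph above):

$$\begin{aligned}
A &= 0,\ 1,\ \tfrac{1}{2},\ \tfrac{1}{3},\ \tfrac{1}{4},\ \ldots \\
B &= 0,\ 1,\ 2,\ 3,\ 4,\ \ldots \\
A_n B_n &= 0,\ 1,\ 1,\ 1,\ 1,\ \ldots
\end{aligned}$$

The product differs from the constant sequence of ones at a single index, so it belongs to $[1]$. The inverse $[B]$ exceeds any given real number at all but finitely many indices, so by (10) it is infinitely large, as one would expect of the reciprocal of an infinitesimal.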

## **4.7 The standard and principal part functions**

Hyperreal expressions can be converted into real expressions using the standard part function, $\operatorname{st}()$, which yields the closest real number to the hyperreal expression. The standard part of an infinitesimal number is always zero. For infinite values, the standard part yields $+\infty$ or $-\infty$, a non-specific infinity indicating that the value is out of range of the real numbers.

The principal part function, $\operatorname{pt}()$, yields the most significant component of a hyperreal expression [7]. In a hyperreal expression, imagine *ω* representing a benchmark infinite value, with $\varepsilon = \frac{1}{\omega}$ representing an associated benchmark infinitesimal. The hyperreal expression $-2\omega^2 + \omega - 5 + 3\varepsilon$ represents four different orders of infinity. The most significant one is $-2\omega^2$, and thus it is the principal part. For the infinitesimal expression $5\varepsilon^2 + \varepsilon^3$, $5\varepsilon^2$ is the principal part.

The principal part of a hyperreal expression is important because non-principal parts, being infinitely less significant than the principal part by definition, do not affect the large-scale behaviors of smooth and continuous functions.
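For expressions that are finite sums of powers of *ω*, selecting the principal part amounts to picking the term with the highest power. The sketch below assumes that restricted form and uses a hypothetical dictionary representation (power of *ω* mapped to coefficient, with negative powers standing for powers of *ε*); it is not a general implementation of the function described in [7].

```python
def principal_part(terms):
    """Return (power, coefficient) of the most significant nonzero term.

    `terms` maps an integer power of omega to a real coefficient.
    Negative powers represent powers of epsilon = 1/omega.
    """
    nonzero = {power: coeff for power, coeff in terms.items() if coeff != 0}
    if not nonzero:
        raise ValueError("expression has no nonzero terms")
    power = max(nonzero)  # the highest power of omega dominates all others
    return power, nonzero[power]


# -2*omega**2 + omega - 5 + 3*epsilon
mixed = {2: -2, 1: 1, 0: -5, -1: 3}
# 5*epsilon**2 + epsilon**3
infinitesimal = {-2: 5, -3: 1}

print(principal_part(mixed))
print(principal_part(infinitesimal))
```

The two calls pick out the $-2\omega^2$ and $5\varepsilon^2$ terms, matching the examples in the text.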

## **4.8 Differentials and derivatives using hyperreals**

The derivative of a function $y = f(x)$ using the hyperreals is denoted $\frac{\mathrm{d}y}{\mathrm{d}x}$, the change in *y* divided by the change in *x*, just as in Leibniz's notation. However, we can actually define the differentials themselves as infinitesimals, without referring to ratios.

Many have a hard time conceiving of just what a differential is and means. It is easy enough to say that a differential is an infinitesimal, but how exactly are individual differentials defined, especially when not being examined in the context of a derivative? What exactly does the higher-order notation $\mathrm{d}^2y$ mean?

Let us first remember that, in order to be in a relation, two (or more) variables have to be related to each other in some way. Therefore, we can imagine some variable, let us call it *q*, not explicitly mentioned in the equation, which is in some sense the "ultimate" independent variable.

Note that this variable does not need to be explicitly defined. In fact, it is better if it is not defined explicitly. The reason for this is that defining *q* explicitly means that there is some chance that there exists yet another deeper, more fundamental variable. What we are looking for is the deepest, most fundamental, most independent variable. Keeping *q* as a hypothetical independent variable means that our reasoning will continue to hold in the face of finding more and more fundamental quantities. Our reasoning about an *actual* variable may fail to hold if it is found to not be the fundamental quantity. We will imagine *q* to be smoothly increasing by the infinitesimal *ε*.

Since *q* is the ultimate variable that relates every other variable in the equation, every variable can (theoretically) be written in terms of *q*. *y* is actually shorthand for $y(q)$, *x* is shorthand for $x(q)$, and so on. We can then define the differential of an expression (including just a variable) to be the simple difference between the expression at some value $q + \varepsilon$ and the expression at some value *q*. When taking the differential of a variable, we will use the shorthand d*y* to mean $\mathrm{d}(y)$.

$$\mathrm{d}y = \mathrm{d}(y) = y(q + \varepsilon) - y(q) \tag{12}$$

Note that d*y* is also a function of *q* (this fact will become useful when finding the second differential). Additionally, assuming that *y* is a smooth and continuous function of *q*, an infinitesimal change in *q* will lead to an infinitesimal change in *y*, so d*y* will also be infinitesimal.

We can also rearrange (12) and obtain

$$
y(q+\varepsilon) = y(q) + \mathrm{d}y \tag{13}
$$

These definitions provide a generic definition for the differential and consequent manipulation techniques that can be applied to any expression. Let us take the simple example $y = x^2$ (which is $y(q) = x(q)^2$) and apply this differential operator to it. We will also apply the principal part function at the end in order to simplify the expression to its most consequential portion.

$$\begin{aligned}
y &= x^2 \\
\mathrm{d}(y) &= \mathrm{d}\left(x^2\right) & \text{differential operator} \\
y(q+\varepsilon) - y(q) &= x(q+\varepsilon)^2 - x(q)^2 & \text{apply (12)} \\
\mathrm{d}y &= \left(x(q) + \mathrm{d}x\right)^2 - x(q)^2 & \text{apply (13)} \\
\mathrm{d}y &= x(q)^2 + 2x(q)\,\mathrm{d}x + \mathrm{d}x^2 - x(q)^2 & \text{simplifying} \\
\mathrm{d}y &= 2x(q)\,\mathrm{d}x + \mathrm{d}x^2 \\
\mathrm{d}y &= 2x(q)\,\mathrm{d}x & \text{principal part} \\
\mathrm{d}y &= 2x\,\mathrm{d}x & \text{shorthand}
\end{aligned}$$

The second differential follows the same process: it is merely the differential operator applied to an expression that already contains differentials. d*y* is actually $\mathrm{d}(y(q))$, but we will refer to it as $\mathrm{d}y(q)$, and to its value one step later as $\mathrm{d}y(q + \varepsilon)$, as a compromise between brevity and clarity. The notation $\mathrm{d}^2y$ will likewise be shorthand for $\mathrm{d}(\mathrm{d}(y(q)))$.

*Total and Partial Differentials as Algebraically Manipulable Entities DOI: http://dx.doi.org/10.5772/intechopen.107285*

$$\begin{aligned}
\mathrm{d}y &= 2x\,\mathrm{d}x \\
\mathrm{d}(\mathrm{d}y) &= \mathrm{d}(2x\,\mathrm{d}x) \\
&= 2x(q+\varepsilon)\,\mathrm{d}x(q+\varepsilon) - 2x(q)\,\mathrm{d}x(q) & \text{apply (12)} \\
&= 2\left(x(q) + \mathrm{d}x(q)\right)\left(\mathrm{d}x(q) + \mathrm{d}(\mathrm{d}x(q))\right) - 2x(q)\,\mathrm{d}x(q) & \text{apply (13)} \\
&= 2x(q)\,\mathrm{d}x(q) + 2x(q)\,\mathrm{d}(\mathrm{d}x(q)) + 2\,\mathrm{d}x(q)^{2} \\
&\quad + 2\,\mathrm{d}x(q)\,\mathrm{d}(\mathrm{d}x(q)) - 2x(q)\,\mathrm{d}x(q) & \text{simplifying} \\
&= 2x(q)\,\mathrm{d}(\mathrm{d}x(q)) + 2\,\mathrm{d}x(q)^{2} + 2\,\mathrm{d}x(q)\,\mathrm{d}(\mathrm{d}x(q)) \\
&= 2x(q)\,\mathrm{d}(\mathrm{d}x(q)) + 2\,\mathrm{d}x(q)^{2} & \text{principal part} \\
\mathrm{d}^{2}y &= 2x\,\mathrm{d}^{2}x + 2\,\mathrm{d}x^{2} & \text{shorthand}
\end{aligned}$$

This second differential will typically be a second order infinitesimal. The process can be further repeated for higher order differentials.

The $2x\,\mathrm{d}^2x$ term here may be surprising, but the reason for it will become clear in Section 5 when we eliminate the contradictions present in the standard notation for higher-order differentials.
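The role of the $2x\,\mathrm{d}^2x$ term can be checked with small finite steps standing in for the infinitesimal *ε*. The sketch below is an approximation for illustration only: it assumes a hypothetical dependence $x(q) = q^2$, uses exact rational arithmetic, and compares the second difference of $y = x^2$ against the formula with and without that term, scaling by $\varepsilon^2$ so that second-order quantities are visible.

```python
from fractions import Fraction


def d(expr, eps):
    """Differential operator of (12): maps expr to q -> expr(q + eps) - expr(q)."""
    return lambda q: expr(q + eps) - expr(q)


def x(q):
    return q**2  # hypothetical dependence of x on q, chosen only for illustration


def y(q):
    return x(q)**2


def compare(q, eps):
    """Return the exact second difference of y and the two candidate formulas."""
    dx = d(x, eps)
    d2x = d(dx, eps)
    d2y = d(d(y, eps), eps)
    exact = d2y(q)
    with_term = 2 * x(q) * d2x(q) + 2 * dx(q)**2
    without_term = 2 * dx(q)**2
    return exact, with_term, without_term


q = Fraction(3)
for k in (2, 4, 6):
    eps = Fraction(1, 10**k)
    exact, with_term, without_term = compare(q, eps)
    # Dividing by eps**2 exposes the second-order coefficients.
    print(float(exact / eps**2),
          float(with_term / eps**2),
          float(without_term / eps**2))
```

As the step shrinks, the first two columns converge to the same value, while the column that omits $2x\,\mathrm{d}^2x$ settles on a different one. The remaining gap between the first two columns is of third order in the step, which is what the principal part discards.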

Since all variables in the equation are related to each other, they also share some relationship to *q*. Therefore, the differential can be defined universally within an equation without taking into account the specifics of the variables encountered.

Ultimately, taking the differential of a function results in a d*y*, d*x*, or some other term. However, these terms' definitions are ultimately rooted in this ultimate independent variable *q*, and the results of incrementing it by some hyperreal infinitesimal *ε*.

The derivative, then, is simply a ratio of differentials defined in this way. While the terminology of "taking the derivative with respect to *x*" can still be used, there is no longer anything special about taking the derivative with respect to a variable as opposed to simply dividing by that variable's differential. Additionally, this expands the ability to take total differentials straightforwardly into multivariable situations, providing that all variables can be, in principle, tied back to some underlying construct like *q*.

## **5. Extending the total derivative's algebraic manipulability**

The hyperreal definition of the derivative has several advantages. Once hyperreal numbers are defined, the definition of the derivative arises naturally from considering the change in a function when its (theoretical) independent variable changes infinitesimally. Unlike the limit definition, the change in *y* and the change in *x* are separate entities. Using hyperreal numbers, we can rigorously define these entities so that they are manipulable using standard algebraic operators.

However, this requires that we rethink some of the notations from first principles. First of all, now that d*y* and d*x* are reified entities, they must be considered in applying such rules as the product rule and the quotient rule. This is straightforward, and the rules are identical to normal calculus rules. The differential of $x^2\,\mathrm{d}x$ is the result of applying the product rule to the product of $x^2$ and d*x*, namely $2x\,\mathrm{d}x^2 + x^2\,\mathrm{d}^2x$.
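Written out step by step, with d*x* treated as an ordinary factor in the product, this computation reads:

$$\begin{aligned}
\mathrm{d}\left(x^2\,\mathrm{d}x\right) &= \mathrm{d}\left(x^2\right)\mathrm{d}x + x^2\,\mathrm{d}(\mathrm{d}x) \\
&= \left(2x\,\mathrm{d}x\right)\mathrm{d}x + x^2\,\mathrm{d}^2x \\
&= 2x\,\mathrm{d}x^2 + x^2\,\mathrm{d}^2x
\end{aligned}$$

Here $\mathrm{d}x^2$ denotes $(\mathrm{d}x)^2$, following the usage in Section 4.8.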

When this is taken into account, differentials of any order become algebraically manipulable.

## **5.1 The second derivative**

Before taking this idea of algebraically manipulable differentials too far, we need to note that the standard notation for the second derivative, $\frac{\mathrm{d}^2y}{\mathrm{d}x^2}$, does not work in this manner. The problem here is that it implies an improper order of operations [8].

Order of operations is very important when doing derivatives. When doing a derivative, one *first* takes the differential and *then* divides by d*x*. The second derivative is the derivative of the first, so the next differential occurs *after the first derivative is complete*, and the process finishes by dividing by d*x* again.

However, what does it look like to take the differential of the first derivative? Basic calculus rules tell us that the quotient rule should be used:

$$\begin{split} \mathbf{d} \left( \frac{\mathbf{d}\mathbf{y}}{\mathbf{d}\mathbf{x}} \right) &= \frac{\mathbf{d}\mathbf{x}(\mathbf{d}(\mathbf{d}\mathbf{y})) - \mathbf{d}\mathbf{y}(\mathbf{d}(\mathbf{d}\mathbf{x}))}{\left(\mathbf{d}\mathbf{x}\right)^{2}} \\ &= \frac{\mathbf{d}^{2}\mathbf{y}}{\mathbf{d}\mathbf{x}} - \frac{\mathbf{d}\mathbf{y}}{\mathbf{d}\mathbf{x}} \frac{\mathbf{d}^{2}\mathbf{x}}{\mathbf{d}\mathbf{x}} \end{split}$$

Then, for the second step, this can be divided by d*x*, yielding:

$$\frac{\mathbf{d}\left(\frac{\mathrm{d}\mathbf{y}}{\mathrm{d}\mathbf{x}}\right)}{\mathrm{d}\mathbf{x}} = \frac{\mathbf{d}^2 \mathbf{y}}{\mathrm{d}\mathbf{x}^2} - \frac{\mathrm{d}\mathbf{y}}{\mathrm{d}\mathbf{x}} \frac{\mathrm{d}^2 \mathbf{x}}{\mathrm{d}\mathbf{x}^2} \tag{14}$$

This, in fact, yields a notation for the second derivative which is equally algebraically manipulable as the first derivative. It is not very pretty or compact, but it works algebraically.
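As a quick sanity check of (14), the sketch below compares the conventional ratio with the full form for *y* = *x*<sup>3</sup> when *x* itself depends on an underlying independent variable *q* through *x* = *q*<sup>2</sup>. The example functions, the name `second_derivative`, and the convention of passing each differential as its coefficient of d*q* or d*q*<sup>2</sup> (the powers of d*q* cancel in every ratio, and d<sup>2</sup>*q* is taken to be zero) are assumptions made for illustration only.

```python
from fractions import Fraction


def second_derivative(dy, d2y, dx, d2x):
    """Full form of Eq. (14): d²y/dx² - (dy/dx)(d²x/dx²)."""
    return d2y / dx**2 - (dy / dx) * (d2x / dx**2)


# Hypothetical example: x = q**2 and y = x**3 = q**6, with q independent.
q = Fraction(3)
x = q**2
dx, d2x = 2 * q, Fraction(2)      # coefficients of dq and dq²
dy, d2y = 6 * q**5, 30 * q**4     # coefficients of dq and dq²

naive = d2y / dx**2               # conventional notation read as a plain ratio
full = second_derivative(dy, d2y, dx, d2x)

print(naive)  # 135/2, which is not 6x
print(full)   # 54, which equals 6x at x = 9
```

Only the full form reproduces 6*x*, the second derivative of *x*<sup>3</sup>, because d<sup>2</sup>*x* is nonzero once *x* is no longer the independent variable.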

The chain rule for the second derivative fits this algebraic notation correctly, provided we replace each instance of the second derivative with its full form (cf. (30)):

$$\frac{\text{d}^2 \text{y}}{\text{d}t^2} - \frac{\text{dy}}{\text{d}t} \frac{\text{d}^2 t}{\text{d}t^2} = \left(\frac{\text{d}^2 \text{y}}{\text{d} \text{x}^2} - \frac{\text{dy}}{\text{d} \text{x}} \frac{\text{d}^2 \text{x}}{\text{d} \text{x}^2}\right) \left(\frac{\text{dx}}{\text{d}t}\right)^2 + \frac{\text{dy}}{\text{d} \text{x}} \left(\frac{\text{d}^2 \text{x}}{\text{d}t^2} - \frac{\text{dx}}{\text{d}t} \frac{\text{d}^2 t}{\text{d}t^2}\right) \tag{15}$$

This in fact works out perfectly algebraically.<sup>2</sup>

<sup>2</sup> Some may be concerned that, in the formula presented in (14), the ratio $\frac{\mathrm{d}^2 x}{\mathrm{d}x^2}$ reduces to zero. However, this is not necessarily true. The concern is that, since $\frac{\mathrm{d}x}{\mathrm{d}x}$ is always 1 (i.e., a constant), $\frac{\mathrm{d}^2 x}{\mathrm{d}x^2}$ should be zero. The problem with this concern is that we are no longer taking $\frac{\mathrm{d}^2 x}{\mathrm{d}x^2}$ to be the derivative of $\frac{\mathrm{d}x}{\mathrm{d}x}$. Using the notation in (14), the derivative of $\frac{\mathrm{d}x}{\mathrm{d}x}$ would be:

$$\frac{\text{d}\left(\frac{\text{d}\mathbf{x}}{\text{d}\mathbf{x}}\right)}{\text{d}\mathbf{x}} = \frac{\text{d}^2\mathbf{x}}{\text{d}\mathbf{x}^2} - \frac{\text{d}\mathbf{x}}{\text{d}\mathbf{x}}\frac{\text{d}^2\mathbf{x}}{\text{d}\mathbf{x}^2} \tag{16}$$

In this case, since $\frac{\mathrm{d}x}{\mathrm{d}x}$ reduces to 1, the expression is self-evidently zero. However, in (16), the term $\frac{\mathrm{d}^2 x}{\mathrm{d}x^2}$ is not itself necessarily zero, since it is *not* the second derivative of *x* with respect to *x*.

## **5.2 Higher order derivatives**

The notation for the third and higher derivatives can be found using the same techniques as for the second derivative. To find the third derivative of *y* with respect to *x*, one starts with the second derivative, takes the differential, and divides by d*x*:

$$\frac{\mathbf{d}\left(\frac{\mathbf{d}\left(\frac{\mathbf{d}\mathbf{y}}{\mathrm{d}x}\right)}{\mathrm{d}x}\right)}{\mathrm{d}x} = \frac{\mathbf{d}\left(\frac{\mathbf{d}^{2}y}{\mathrm{d}x^{2}} - \frac{\mathrm{d}y}{\mathrm{d}x}\frac{\mathrm{d}^{2}x}{\mathrm{d}x^{2}}\right)}{\mathrm{d}x} = \frac{\mathbf{d}^{3}y}{\mathrm{d}x^{3}} - \frac{\mathrm{d}y}{\mathrm{d}x}\frac{\mathrm{d}^{3}x}{\mathrm{d}x^{3}} - 3\frac{\mathrm{d}^{2}x}{\mathrm{d}x^{2}}\frac{\mathrm{d}^{2}y}{\mathrm{d}x^{2}} + 3\frac{\mathrm{d}y}{\mathrm{d}x}\frac{\left(\mathrm{d}^{2}x\right)^{2}}{\mathrm{d}x^{4}}\tag{17}$$
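The expansion in (17) can be spot-checked numerically. The sketch below evaluates its right-hand side term by term for the hypothetical case *y* = *x*<sup>3</sup> with *x* = *q*<sup>2</sup>, where *q* is assumed to be the underlying independent variable; each differential is passed as its coefficient of the matching power of d*q*. The function name and the example are illustrative assumptions, not part of the original derivation.

```python
from fractions import Fraction


def third_derivative(dy, d2y, d3y, dx, d2x, d3x):
    """Right-hand side of Eq. (17), term by term."""
    return (
        d3y / dx**3
        - (dy / dx) * (d3x / dx**3)
        - 3 * (d2x / dx**2) * (d2y / dx**2)
        + 3 * (dy / dx) * (d2x**2 / dx**4)
    )


# Hypothetical example: x = q**2, y = x**3 = q**6, with q independent.
q = Fraction(2)
dx, d2x, d3x = 2 * q, Fraction(2), Fraction(0)
dy, d2y, d3y = 6 * q**5, 30 * q**4, 120 * q**3

print(third_derivative(dy, d2y, d3y, dx, d2x, d3x))  # 6
```

The result matches the third derivative of *x*<sup>3</sup> with respect to *x*, which is the constant 6.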

Because the expanded notation for the second and higher derivatives is much more verbose than that for the first derivative, it is often useful for clarity and succinctness to write derivatives using a slight modification of Arbogast's *D* notation (see [9]) for the total derivative instead of writing them as algebraic differentials. Here, we also subscript the *D* with the variable with respect to which the derivative is being taken, and supply in the superscript the number of derivatives being taken. Therefore, where Arbogast would write simply *D*, this notation writes $D\_x^1$.

Below are the second and third derivatives of *y* with respect to *x*, written using both the enhanced Arbogast notation and as ratios of differentials.

$$D\_x^2 y = \frac{\text{d}^2 y}{\text{d}x^2} - \frac{\text{d}y}{\text{d}x} \frac{\text{d}^2 x}{\text{d}x^2} \tag{18}$$

$$D\_x^3 y = \frac{\mathbf{d}^3 y}{\mathbf{d}x^3} - \frac{\mathbf{d}y}{\mathbf{d}x} \frac{\mathbf{d}^3 x}{\mathbf{d}x^3} - 3 \frac{\mathbf{d}^2 x}{\mathbf{d}x^2} \frac{\mathbf{d}^2 y}{\mathbf{d}x^2} + 3 \frac{\mathbf{d}y}{\mathbf{d}x} \frac{\left(\mathbf{d}^2 x\right)^2}{\mathbf{d}x^4} \tag{19}$$

This gets even more important as the number of derivatives increases. Each one is more unwieldy than the previous one. However, each level can be converted to differential notation as follows:

$$D\_x^n y = \frac{\mathbf{d}\left(D\_x^{n-1} y\right)}{\mathbf{d}x} \tag{20}$$

The advantage of this modification of Arbogast's notation over Lagrangian notation is that it clearly specifies both the variable/expression whose derivative is being taken and the variable/expression with respect to which it is being taken.

Therefore, when a compact representation of higher order derivatives is needed, this paper will use Arbogast's notation for its clarity and succinctness. This notation can be easily expanded to its differentials when necessary for manipulation.

## **6. Extending the partial derivative's algebraic manipulability**

The derivative gives the rate at which a function *f* changes when *x* is increased. But what if *f* depends on both *x* and *y*? Imagine a hill where *f* is the distance above sea level, *x* is the distance east from the origin, and *y* is the distance north from the origin. To find how *f* is changing, a direction to measure the slope must be picked. Along the direction straight east, only *x* is changing while *y* stays constant. This slope is the partial derivative of *f* with respect to *x*, denoted $\frac{\partial f}{\partial x}$, the change in *f* over the change in *x* when *x* is the only variable allowed to change ([5], pp. 940–941). A derivative where all the independent variables are allowed to change is called a total derivative, like the two-dimensional derivative $\frac{\mathrm{d}y}{\mathrm{d}x}$. This partial derivative can be formally defined using limits or using hyperreals.

Using limits, the partial derivative of $f(x, y)$ at the point $(a, b)$ with respect to *x* is $\lim\_{h \to 0} \frac{f(a+h, b) - f(a, b)}{h}$ ([5], p. 941). Likewise, the partial derivative of $f(x, y)$ with respect to *x* is $\lim\_{h \to 0} \frac{f(x+h, y) - f(x, y)}{h}$. For more than two variables, the partial derivative of $f(x\_1, x\_2, \dots)$ with respect to $x\_1$ is

$$\frac{\partial f}{\partial \mathbf{x}\_1} = \lim\_{h \to 0} \frac{f(\mathbf{x}\_1 + h, \mathbf{x}\_2, \dots) - f(\mathbf{x}\_1, \mathbf{x}\_2, \dots)}{h} \tag{21}$$

As with the total derivative, using limits to define the partial derivative means the change in *f* and the change in *x* are not defined separately and must be kept together. Using hyperreals, the partial derivative of *f* with respect to $x\_1$ is

$$\frac{\partial f}{\partial \mathbf{x}\_1} = \frac{f(\mathbf{x}\_1 + d\mathbf{x}\_1, \mathbf{x}\_2, \dots) - f(\mathbf{x}\_1, \mathbf{x}\_2, \dots)}{d\mathbf{x}\_1} \tag{22}$$

Also, $\mathrm{d}x\_1$ can equal $\partial x\_1$, assuming both of them denote the smallest change in $x\_1$ possible. This is not an equation in the real numbers; it is an equation in the hyperreals.

Both the numerator and denominator of $\frac{\partial f}{\partial x\_1}$ have meaning on their own, and they both are specific hyperreals. So it should be possible to separate the fraction without problems.

However, the current notation for $\partial f$ does not distinguish between the change in *f* when $x\_1$ is allowed to change and the change in *f* when another variable, say $x\_2$, is allowed to change. In other words, the $\partial f$ in $\frac{\partial f}{\partial x\_1}$ is a different hyperreal from the $\partial f$ in $\frac{\partial f}{\partial x\_2}$, even though they both use the exact same symbol. This can cause problems if the notation is taken seriously (see the contradiction noted in Section 2). Adding more information to the notation resolves this issue.

The notation for the partial derivative should be changed from $\frac{\partial f}{\partial x}$ to $\frac{\partial(f, x)}{\mathrm{d}x}$ in order to preserve the information in the numerator when the fraction is separated.

This makes it clear that $\partial$ is an operator that takes as an argument not only *f* but also the choice of which variable to vary. The function that $\partial$ acts on, in this case *f*, is the first argument of $\partial$, and every argument after the first is a variable allowed to change. This can lead to expressions like $\partial(f, x, y)$, the change in *f* when both *x* and *y* are allowed to vary.

Using this notation, $\frac{\mathrm{d}f}{\mathrm{d}t}$ equals $\frac{\partial(f, x)}{\mathrm{d}t} + \frac{\partial(f, y)}{\mathrm{d}t}$, not $\frac{\mathrm{d}f}{\mathrm{d}t} + \frac{\mathrm{d}f}{\mathrm{d}t}$. The contradictions are resolved, and the partial derivative fraction can be separated. The numerator and denominator can be moved around just like any other algebraic expression, keeping in mind both of them are hyperreals, so technically any operations on them should be hyperreal operations.

Because the new notation can be algebraically manipulated without contradictions, it makes possible new equations where infinitesimals are not confined to ratios. For instance, the resolved contradiction proof gave the equation $\mathrm{d}f = \partial(f, x) + \partial(f, y)$. This is reminiscent of one of the conditions for differentiability, $\Delta f = f\_x(a, b)\,\Delta x + f\_y(a, b)\,\Delta y + \varepsilon\_1 \Delta x + \varepsilon\_2 \Delta y$, where for fixed *a* and *b*, $\varepsilon\_1$ and $\varepsilon\_2$ are functions that depend only on $\Delta x$ and $\Delta y$, with $(\varepsilon\_1, \varepsilon\_2) \to (0, 0)$ as $(\Delta x, \Delta y) \to (0, 0)$ ([5], p. 947).

Besides simplifying old equations, with the new notation it is possible to consider individual partial changes when building equations, just like considering individual total changes.

The new notation can also denote expressions like $\partial(f, x\_1, x\_2)$, the change in $f(x\_1, x\_2, x\_3)$ when $x\_1$ and $x\_2$ are allowed to vary, but $x\_3$ must stay constant. With the current notation $\partial f$, dealing with these situations is clumsy at best.

$\partial(f, x\_1)$ is an infinitesimal with meaning on its own. It can be defined analogously to Eq. 12:

$$\partial(f,\boldsymbol{x}\_{1}) = f(\boldsymbol{x}\_{1} + \mathbf{d}\boldsymbol{x}\_{1}, \boldsymbol{x}\_{2} \dots) - f(\boldsymbol{x}\_{1}, \boldsymbol{x}\_{2} \dots) \tag{23}$$

The total differential of *f* is usually defined as the combination of all of the changes in *f* depending on each variable. Typically, the total differential of a multivariate function is found using the sum of its partial *derivatives* multiplied by their respective differentials.

$$\operatorname{df}(\mathbf{x}\_1, \mathbf{x}\_2 \dots) = \frac{\partial f}{\partial \mathbf{x}\_1} \operatorname{d} \mathbf{x}\_1 + \frac{\partial f}{\partial \mathbf{x}\_2} \operatorname{d} \mathbf{x}\_2 + \dots \tag{24}$$

Using the new definition of the partial differential, we can rewrite the formula much more straightforwardly, where the total differential is simply a sum of its partial differentials.

$$\text{df}(\mathbf{x}\_1, \mathbf{x}\_2 \dots) = \partial(f, \mathbf{x}\_1) + \partial(f, \mathbf{x}\_2) + \dots \tag{25}$$
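Equations (23) and (25) can be illustrated with a small sketch. Python has no hyperreal type, so the code below substitutes dual numbers (values of the form *a* + *b*ε with ε<sup>2</sup> = 0). This is an assumption of the sketch rather than the construction used in this paper: dual numbers keep only the first-order part of each change and drop higher-order terms that a hyperreal difference would retain. The class `Dual`, the helper `partial`, and the example function are all hypothetical names.

```python
class Dual:
    """a + b*eps with eps**2 == 0: a first-order stand-in for an infinitesimal."""

    def __init__(self, real, eps=0):
        self.real, self.eps = real, eps

    @staticmethod
    def _lift(other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = self._lift(other)
        return Dual(self.real + other.real, self.eps + other.eps)

    __radd__ = __add__

    def __sub__(self, other):
        other = self._lift(other)
        return Dual(self.real - other.real, self.eps - other.eps)

    def __mul__(self, other):
        other = self._lift(other)
        return Dual(self.real * other.real,
                    self.real * other.eps + self.eps * other.real)

    __rmul__ = __mul__


def f(x, y):
    return x * x * y  # hypothetical example function


def partial(func, point, index, step):
    """Eq. (23), first-order part: only argument `index` moves by step*eps."""
    moved = [Dual(v, step if i == index else 0) for i, v in enumerate(point)]
    return (func(*moved) - func(*point)).eps


point, steps = (3, 5), (2, 7)  # (x, y) and the coefficients of dx and dy
total = (f(*[Dual(v, s) for v, s in zip(point, steps)]) - f(*point)).eps
parts = [partial(f, point, i, s) for i, s in enumerate(steps)]
print(parts, total)  # [60, 63] 123
```

The two partial differentials sum to the total differential, as (25) states, at least to first order in the infinitesimal steps.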

## **7. Building differential formulas**

Using the notation established in this paper, we can build standard calculus formulas in a clear, algebraic manner. The notation and the formulas will flow directly from the basic truths of calculus and the algebraic reasoning of differentials.

## **7.1 The inverse function theorem for second derivatives**

The standard inverse function theorem simply states that $\frac{\mathrm{d}x}{\mathrm{d}y} = \frac{1}{\;\frac{\mathrm{d}y}{\mathrm{d}x}\;}$. In other words, as implied by the algebraic arrangement of its terms, the derivative of *x* with respect to *y* is simply the inverse of the derivative of *y* with respect to *x*. Using the hyperreal understanding of derivatives allows for a more straightforward way of considering this fact.

More importantly, the new notation for the second derivative likewise allows for a straightforward algebraic construction of an inverse function theorem for the second derivative. Since the second derivative of *y* with respect to *x* is $D\_x^2 y = \frac{\mathrm{d}^2 y}{\mathrm{d}x^2} - \frac{\mathrm{d}y}{\mathrm{d}x} \frac{\mathrm{d}^2 x}{\mathrm{d}x^2}$, the second derivative of *x* with respect to *y* will likewise be $D\_y^2 x = \frac{\mathrm{d}^2 x}{\mathrm{d}y^2} - \frac{\mathrm{d}x}{\mathrm{d}y} \frac{\mathrm{d}^2 y}{\mathrm{d}y^2}$. Is there a way to construct a formula for converting one to the other? A simple multiplication by $-\left(\frac{\mathrm{d}x}{\mathrm{d}y}\right)^3$ yields

$$-D\_x^2 y \left(\frac{\mathbf{d}x}{\mathbf{d}y}\right)^3 = \frac{\mathbf{d}^2 x}{\mathbf{d}y^2} - \frac{\mathbf{d}^2 y}{\mathbf{d}y^2} \frac{\mathbf{d}x}{\mathbf{d}y}$$

Here, $\frac{\mathrm{d}x}{\mathrm{d}y}$ can be trivially recognized as $\frac{1}{D\_x^1 y}$, and the right-hand side of the equation can be recognized as $D\_y^2 x$. Therefore, this can be rewritten as

$$-D\_x^2 y \left(\frac{1}{D\_x^1 y}\right)^3 = D\_y^2 x \tag{26}$$

which is the inverse function theorem for the second derivative.
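A short numerical check of (26) is sketched below for the hypothetical case *y* = *x*<sup>3</sup>, whose inverse is *x* = *y*<sup>1/3</sup>. The function name is an illustrative choice, and the comparison value comes from differentiating the inverse twice by hand.

```python
from fractions import Fraction


def second_derivative_of_inverse(d1_y_x, d2_y_x):
    """Eq. (26): D_y^2 x = -(D_x^2 y) * (1 / D_x^1 y)**3."""
    return -d2_y_x * (1 / d1_y_x) ** 3


# Hypothetical example: y = x**3, so x = y**(1/3).
x = Fraction(2)
d1 = 3 * x**2   # D_x^1 y
d2 = 6 * x      # D_x^2 y

# Differentiating x = y**(1/3) twice by hand gives -2 / (9 * x**5).
expected = Fraction(-2) / (9 * x**5)
print(second_derivative_of_inverse(d1, d2), expected)  # -1/144 -1/144
```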

## **7.2 The chain rule for the second derivative**

The chain rule for the second derivative can also be easily derived from the new notation. Starting with the notation for the second derivative of *y* with respect to *x*, we can look at the transformations needed to generate a second derivative of *y* with respect to *t*. We will start by multiplying by $\frac{\mathrm{d}x^2}{\mathrm{d}t^2}$ in order to match the leading term to what is needed for the final result.

$$D\_x^2 y = \frac{\mathbf{d}^2 y}{\mathbf{d}x^2} - \frac{\mathbf{d}y}{\mathbf{d}x} \frac{\mathbf{d}^2 x}{\mathbf{d}x^2} \tag{27}$$

$$\left(D\_x^2 y\right) \left(D\_t^1 x\right)^2 = \frac{\mathbf{d}^2 y}{\mathbf{d}x^2} \frac{\mathbf{d}x^2}{\mathbf{d}t^2} - \frac{\mathbf{d}y}{\mathbf{d}x} \frac{\mathbf{d}^2 x}{\mathbf{d}x^2} \frac{\mathbf{d}x^2}{\mathbf{d}t^2} \tag{28}$$

$$\left(D\_x^2 y\right) \left(D\_t^1 x\right)^2 = \frac{\mathbf{d}^2 y}{\mathbf{d}t^2} - \frac{\mathbf{d}y}{\mathbf{d}x} \frac{\mathbf{d}^2 x}{\mathbf{d}t^2} \tag{29}$$

In (29) we see that the leading term is what we want, but the second term is problematic. However, it looks a little like the leading term of the second derivative of *x* with respect to *t* multiplied by the first derivative of *y* with respect to *x*. Adding that combination to our existing result will yield the desired effect.

$$\left(D\_x^2 y\right) \left(D\_t^1 x\right)^2 + \left(D\_x^1 y\right) \left(D\_t^2 x\right) = \frac{d^2 y}{dt^2} - \frac{d y}{d x} \frac{d^2 x}{dt^2} + \frac{d y}{d x} \frac{d^2 x}{dt^2} - \frac{d y}{d x} \frac{d x}{dt} \frac{d^2 t}{dt^2} \tag{30}$$

$$\left(D\_x^2 y\right) \left(D\_t^1 x\right)^2 + \left(D\_x^1 y\right) \left(D\_t^2 x\right) = \frac{\mathbf{d}^2 y}{\mathbf{d}t^2} - \frac{\mathbf{d}y}{\mathbf{d}t} \frac{\mathbf{d}^2 t}{\mathbf{d}t^2} \tag{31}$$

As is evident, the right-hand side is the desired result—the second derivative of *y* with respect to *t*.
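The left-hand side of (31) is easy to evaluate for a concrete composition. The sketch below assumes the hypothetical functions *y* = *x*<sup>3</sup> and *x* = *t*<sup>2</sup>, so that *y* = *t*<sup>6</sup> and the second derivative of *y* with respect to *t* is 30*t*<sup>4</sup>; the function name is illustrative.

```python
from fractions import Fraction


def chain_rule_second(d2_y_x, d1_y_x, d1_x_t, d2_x_t):
    """Left-hand side of Eq. (31): (D_x^2 y)(D_t^1 x)^2 + (D_x^1 y)(D_t^2 x)."""
    return d2_y_x * d1_x_t**2 + d1_y_x * d2_x_t


# Hypothetical example: y = x**3 and x = t**2, so y = t**6.
t = Fraction(2)
x = t**2
result = chain_rule_second(6 * x, 3 * x**2, 2 * t, Fraction(2))
print(result, 30 * t**4)  # 480 480
```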

## **7.3 The chain rule for multivariate derivatives**

Building the chain rule for multivariate derivatives is even more straightforward. Consider a function $f(x, y)$ where *x* and *y* are both functions of *t*. As noted in (25), the total change in *f*, d*f*, has two parts: the change due to *x* changing and the change due to *y* changing. So,

$$\mathbf{d}f = \partial(f, x) + \partial(f, y) \tag{32}$$

Dividing both sides by d*t*,

$$\frac{\text{df}}{\text{dt}} = \frac{\partial(f,\text{x})}{\text{dt}} + \frac{\partial(f,y)}{\text{dt}}\tag{33}$$

This is a valid equation, but it is difficult to calculate a value like $\frac{\partial(f, x)}{\mathrm{d}t}$ directly. To make it easier to work with, we can multiply the first term by $\frac{\mathrm{d}x}{\mathrm{d}x}$ and the second by $\frac{\mathrm{d}y}{\mathrm{d}y}$:<sup>3</sup>

$$\frac{\mathrm{d}f}{\mathrm{d}t} = \frac{\partial(f,\boldsymbol{x})}{\mathrm{d}t} \cdot \frac{\mathrm{d}\boldsymbol{x}}{\mathrm{d}\boldsymbol{x}} + \frac{\partial(f,\boldsymbol{y})}{\mathrm{d}t} \cdot \frac{\mathrm{d}\boldsymbol{y}}{\mathrm{d}\boldsymbol{y}}\tag{34}$$

$$=\frac{\partial(f,\mathbf{x})}{\mathbf{dx}}\cdot\frac{\mathbf{dx}}{\mathbf{dt}}+\frac{\partial(f,y)}{\mathbf{dy}}\cdot\frac{\mathbf{dy}}{\mathbf{dt}}\tag{35}$$

This is the standard chain rule for multivariate derivatives.
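The steps from (33) to (35) can be followed with concrete numbers. The sketch below assumes the hypothetical function *f*(*x*, *y*) = *x*<sup>2</sup>*y* with *x* = *t*<sup>2</sup> and *y* = *t*<sup>3</sup>, so that *f* = *t*<sup>7</sup>. Each differential is represented by its first-order coefficient with d*t* taken as the unit, which is a simplification made for this illustration.

```python
from fractions import Fraction

# Hypothetical example: f(x, y) = x**2 * y with x = t**2 and y = t**3,
# so f = t**7 and df/dt = 7 * t**6.
t = Fraction(2)
x, y = t**2, t**3
dt = Fraction(1)                      # convention: dt is taken as the unit
dx, dy = 2 * t * dt, 3 * t**2 * dt    # first-order coefficients of dx and dy

partial_f_x = 2 * x * y * dx          # first-order part of ∂(f, x)
partial_f_y = x**2 * dy               # first-order part of ∂(f, y)

eq33 = partial_f_x / dt + partial_f_y / dt
eq35 = (partial_f_x / dx) * (dx / dt) + (partial_f_y / dy) * (dy / dt)
print(eq33, eq35, 7 * t**6)  # 448 448 448
```

Multiplying by d*x*/d*x* and d*y*/d*y* only regroups the factors, so (33) and (35) give the same value, and both agree with the derivative of *t*<sup>7</sup>.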

## **8. Conclusion**

While treating derivatives as ratios of differentials has long been viewed as problematic, small changes in both the understanding and the notation of derivatives lead straightforwardly to algebraically manipulable differentials, both total and partial. These differentials provide a more straightforward basis for both doing calculus operations and deriving standard calculus rules. This approach eliminates exceptions and memorized formulas in favor of simply using algebra with differentials.

Our hope is that the flexibility and freedom of manipulation that this notation allows will both reduce the cognitive load of learning to use differential operators and make it easier for practitioners to explore new possibilities.

## **Acknowledgements**

The authors wish to thank Dr. Enrique Valderrama for his comments on early drafts of this manuscript.

<sup>3</sup> Technically, both $\frac{\mathrm{d}x}{\mathrm{d}x}$ and $\frac{\mathrm{d}y}{\mathrm{d}y}$ equal [1], not 1. But, since this is an equation in the hyperreals (with hyperreal multiplication), multiplying by the hyperreal multiplication identity does not change the value of the right side of the equation.

## **Author details**

Maria Isabelle Fite<sup>1†</sup> and Jonathan Bartlett<sup>2\*†</sup>

1 University of Tulsa, Tulsa, Oklahoma, United States

2 The Blyth Institute, Tulsa, Oklahoma, United States

\*Address all correspondence to: jonathan.bartlett@blythinstitute.org

† These authors contributed equally.

© 2022 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

*Total and Partial Differentials as Algebraically Manipulable Entities DOI: http://dx.doi.org/10.5772/intechopen.107285*

## **References**

[1] Johnson WP. The curious history of Faà di Bruno's formula. The American Mathematical Monthly. 2002;**109**(3):217-234. DOI: 10.1080/00029890.2002.11919857

[2] Newton I. The Method of Fluxions and Infinite Series; with its Application to the Geometry of Curve-Lines, (Translated by John Colson). London: Henry Woodfall and John Nourse; 1736

[3] Bell JL. Continuity and Infinitesimals. The Stanford Encyclopedia of Philosophy. Spring 2022 ed. Stanford, CA: The Metaphysics Research Lab, Philosophy Department, Stanford University; 2022

[4] Berkeley G. The Analyst: A Discourse Addressed to an Infidel Mathematician. London: J. and R. Tonson and S. Draper; 1734

[5] Briggs W, Cochran L, Gillett B, Schulz E. Calculus: Early Transcendentals. 3rd ed. New York: Pearson Education; 2019

[6] Herrmann RA. Nonstandard analysis: A simplified approach. arXiv. 2010;**math/0310351v6**:1-82

[7] Bartlett J, Gaastra L, Nemati D. Hyperreal numbers for infinite divergent series. Communications of the Blyth Institute. 2020;**2**(1):7-15. DOI: 10.33014/issn.2640-5652.2.1.bartlett-et-al.1

[8] Bartlett J, Khurshudyan AZ. Extending the algebraic manipulability of differentials. Dynamics of Continuous, Discrete and Impulsive Systems Series A: Mathematical Analysis. 2019;**26**:217-230

[9] Cajori F. A History of Mathematical Notations. Vol. II. Chicago: Open Court Publishing; 1929

## **Chapter 5**
