The Infinitesimal Calculus

Do you know what a removable singularity is? The chain rule introduces a change of variables in order to simplify the process of differentiating. This can, indeed, create a discontinuity. But all of these discontinuities are removable, as my algebra for the second derivative shows. And the underlying derivative that we are using the chain rule to help calculate doesn't have those discontinuities. This is so for the chain rules of the first, second, and really any order of derivatives.

Also, I am using limits; I'm just not explicitly calling them that. When I say that it's important that the "appreciable part" of the differential quotient not depend on the choice of value for the infinitesimal dx, or else we can't infer a unique slope for the tangent line by rounding off the infinitesimal part? I am invoking a limit concept that can be made fully rigorous via the hyperreals and the standard part function. It's just not the epsilon-delta conception.

(1821, when Cauchy published his Cours d'Analyse, is pretty modern, right?)

I'm not sure you want to invoke Cours d'Analyse here, considering that Cauchy founded that conception of calculus on infinitesimals. It's true that he used an epsilon-delta argument once or twice, but the definitions of such fundamental notions as continuity and differentiability were in terms of infinitesimals. Hell, he even gives an explicit delta function that is a proper function... with infinitesimal coefficients. Epsilon-delta limits as a foundation for calculus don't actually show up in Cauchy; they were pioneered by Bolzano and popularized by Weierstrass. Hell, even Cauchy's famous error with that one proof that only works if you assume uniform continuity and not pointwise continuity... well, it turns out that if you're in a hyperreal setting, Cauchy's definition of continuity is equivalent to uniform continuity.

It is true that I am invoking a certain degree of "generality of algebra", but unless you're going deep into the weeds of discontinuous functions as solutions to partial differential equations and the Fourier series needed to deal with them... then the functions involved are well behaved enough that you can get away with it. Ultimately this is a primer on the Infinitesimal Calculus, as practiced by the likes of Leibniz, the Bernoulli family, and Euler. The star of the show is not the derivative, which only became prominent after Lagrange's work on Taylor series, but the differential, and the integral is an infinite sum rather than the limit of a sum of rectangles as the mesh goes to zero.
 

The discontinuities in ddy/du^2 I mentioned in my previous post can't, in general, be removable. In fact, they can't even be bounded in any neighbourhood of the critical point if (ddy/du^2)(du/dv)^2 = ddy/dv^2 is to hold where ddy/dv^2 is non-zero. You can't just claim whatever attributes support your proposition.

Also, if you knew dy/dx is not dependent on dx, why are you treating dx as a variable instead of an arbitrarily chosen constant when you take the derivative of dy/dx?
 
Scroll 2
PART 7: IMPLICIT FUNCTIONS AND RELATED RATES​

In calculus class you may have learned about "implicit differentiation". In the infinitesimal calculus there is no such thing as implicit differentiation: you just apply the differential operator to the entire equation and solve for the desired differential quotient. Let's demonstrate with a simple implicit function: the circle.
LaTeX:
\[(x-x_0)^2+(y-y_0)^2=C \\ 2(x-x_0)dx+2(y-y_0)dy=0 \\ (x-x_0)dx+(y-y_0)dy=0 \\ (y-y_0)dy=-(x-x_0)dx \\ \frac{dy}{dx}=-\frac{x-x_0}{y-y_0}\]

Which one will note is the negative reciprocal of the slope from the center of the circle to the point of tangency. Since the product of the slopes of perpendicular lines is -1, this concurs with Euclid's finding that the tangent to a circle at a point is perpendicular to the radius from the center to that point.
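As a quick check of that claim, multiply the slope of the radius by the slope of the tangent we just found:
LaTeX:
\[ \frac{y-y_0}{x-x_0} \cdot \left(-\frac{x-x_0}{y-y_0}\right) = -1 \]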

Likewise, if we are interested in finding dy/dt or something, we can just divide both sides by dt in order to introduce the desired differential, since that's a thing you can just do in algebra.
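For instance, if x and y are both functions of time t (an illustrative case of my own, not from the original scroll), dividing the circle's differential equation by dt gives the familiar related-rates result:
LaTeX:
\[ (x-x_0)\frac{dx}{dt}+(y-y_0)\frac{dy}{dt}=0 \\ \frac{dy}{dt}=-\frac{x-x_0}{y-y_0}\frac{dx}{dt} \]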


PART 8: RECTIFICATION OF CURVES

A question that has occupied the minds of mathematicians for a very long time is the question of arclength. It is easy to measure a line, because a line is self-similar in ways that put even fractals to shame. But a curve? Curves seem impossible to do more than poorly approximate. The ancient Egyptians knew that the circumference of a circle was about 6.3 times the radius, but that is a fuzzy empirical measurement, lacking the certainty of a mathematical determination. The Greeks pondered this problem for a very long time. Only Archimedes made any notable progress.

So how does the Infinitesimal Calculus deal with rectification? Why with differentials of course. Letting s be arclength, and a and b the endpoints of a curve, we just need to evaluate a simple integral:
LaTeX:
\[ \int_a^bds \]

"That doesn't help at all!" I hear you cry. We don't have any idea of what ds is. Ah, but we do. ds is the infinitesimally small line segment that we use for the secant when we're finding the derivative. At such a high magnification, the line segment and the curve are indistinguishable. And we know how to measure the distance covered by a line. The pythagorean theorem is still true, even at this scale. Thus:

LaTeX:
\[ ds^2=dx^2+dy^2 \\ ds = +\sqrt{dx^2+dy^2}\]
From here we can pull out a dx, a dy, or even a dt if we want. Whichever would be most convenient to calculate. Thus the formulas for arclength are:
LaTeX:
\[ s=\int_a^b\sqrt{1+(\frac{dy}{dx})^2}dx \\ s=\int_a^b\sqrt{(\frac{dx}{dy})^2+1}dy \\ s=\int_a^b\sqrt{(\frac{dx}{dt})^2+(\frac{dy}{dt})^2}dt \]

Which will, as per usual, differ from the actual arclength by some infinitesimal, but the appreciable part will be the same for whatever dx, dy, or dt chosen, allowing us to infer what the actual arclength is by rounding off the infinitesimal error term.
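As a worked illustration (my own choice of curve, not part of the original scroll), take the semicubical parabola y = x^(3/2) from x = 0 to x = b:
LaTeX:
\[ \frac{dy}{dx}=\frac{3}{2}\sqrt{x} \\ s=\int_0^b\sqrt{1+\tfrac{9}{4}x}\,dx=\frac{8}{27}\left[\left(1+\tfrac{9}{4}b\right)^{3/2}-1\right] \]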

Note, however, that rectification of curves is much harder than just finding areas. Even for such simple curves as the conic sections, half of them cannot be rectified using only elementary functions. The circle and the parabola can, the ellipse and hyperbola cannot.

PART 9: DIMENSIONAL ANALYSIS​

Something that doesn't come up that often in pure math but is extremely important in practical applications such as physics and engineering is the matter of dimensionality. For you see, there is more than one kind of multiplication. There is multiplication as repeated addition, such as when you multiply a length by two by adding to it a length of equal size. But there is also the kind of multiplication that results in an area instead of a length. The first is multiplication by a pure number, and the other is multiplication by a dimensional unit.

The interesting thing about dimensional units is that they can be multiplied and divided but they cannot be added or subtracted. It makes sense to speak of apples per dollar, or distance per time, but if you add a distance to an area then they just stay separate and don't interact at all, even though if you were to multiply them you would get a volume. Obviously, if an equation is not dimensionally consistent, it cannot possibly be a valid description of a physical phenomenon. This seems like a truism, but it serves as a fantastic sanity check and it also dramatically restricts the solution space of physical problems.

So how does this interact with the infinitesimal calculus? The inventors of the calculus, as geometers, were keenly aware of problems of dimensionality. As such Leibniz would never have written something of the form y=x^2. Instead he would have written ay=xx, where a, y, and x are all lengths. So "a" is just an appropriately dimensioned constant, typically equal to unity.

The differential of a variable has the same units and dimensions as the variable itself. Differentiating causes no change in units. Constants of proportionality, including unit bearing ones such as a above, can be pulled in and out of a differential freely. Which is why dimensional analysis isn't that big of a deal in pure math. That said, the derivative is a quotient and the integral a sum of products, and multiplication and division do cause changes of units.

Work is the integral of Force with respect to distance, and so it has the dimensions of force times distance. Velocity is the derivative of distance with respect to time, so it has units of distance per time. These are not "dummy" variables, but vital parts of maintaining dimensional consistency. This is also why you see physicists write e^rt instead of e^t: r has units of one over time, so that the exponential function can take a pure number as its argument. Likewise for the omegas you see in the trig functions, they are there to maintain dimensional consistency. For an illustrative example of the importance of dimensional consistency in the infinitesimal calculus, work out the dimensionality of the arclength formula above.
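For those who want to check their work on that exercise, here is a sketch of how the dimensions play out when x and y are both lengths:
LaTeX:
\[ \left[\frac{dy}{dx}\right]=\frac{\mathrm{length}}{\mathrm{length}}=1 \qquad \left[\sqrt{1+\left(\frac{dy}{dx}\right)^2}\,dx\right]=1\cdot\mathrm{length}=\mathrm{length} \]
The 1 under the square root is only admissible because the differential quotient is a pure number, and the whole integral comes out in units of length, as an arclength should.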

I have ragged on the traditional notation for the second and higher derivatives in Leibniz notation, but it is very good at telling you what the dimensions are at a glance. This likely contributed to its survival in the face of the more compact Lagrange notation.

PART 10: CONTINUITY​

One of the most important ideas in analysis is that of continuity and discontinuity. In the Infinitesimal calculus, continuity is defined as follows:
LaTeX:
\[ f(x) \approx f(x+dx) \]
and this must hold for any and all infinitesimal values of dx. Where this is true, the function is continuous; where it is false, the function is discontinuous.

A simple example of a discontinuous function is the floor function, which returns the greatest integer less than or equal to the input. This is discontinuous at the integers, because while the value at a positive infinitesimal displacement is infinitely close to the value at the integer, the value at a negative infinitesimal displacement is an entire integer away.
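Written out with the definition above, using n for an integer and dx for a positive infinitesimal:
LaTeX:
\[ \lfloor n+dx \rfloor = n \approx \lfloor n \rfloor \\ \lfloor n-dx \rfloor = n-1 \not\approx \lfloor n \rfloor \]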

There is more than one type of continuity, of course. A function is regular continuous if the condition holds at all appreciable points, but it is uniformly continuous only if it holds at all points, appreciable and inappreciable alike. For instance, x^2 is not uniformly continuous.
LaTeX:
\[ \mathrm{let} \; H=\frac{1}{h} \\ \mathrm{then} \\ (H+h)^2=H^2+2Hh+h^2=H^2+2+h^2 \\ H^2 \not\approx H^2+2\]
Thus at infinite points the function f(x)=x^2 fails the continuity condition; it is therefore continuous but not uniformly continuous.
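For contrast, at any appreciable point x the same expansion causes no trouble, since 2x dx + dx^2 is itself infinitesimal when x is appreciable:
LaTeX:
\[ (x+dx)^2 = x^2+2x\,dx+dx^2 \approx x^2 \]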

PART 11: L'HOPITAL'S RULE​

For any continuous function it is by definition true that:
LaTeX:
\[f \approx f+df\]
but how does this help us? df is infinitesimal, and thus f completely dominates it, just as df would completely dominate ddf or df^2. After all, f is a real number and there are no infinitesimals among the reals.

Well, that's not entirely true. There is one infinitesimal among the real numbers, the most negligible of all: zero. So if f evaluated at a is equal to 0, then df will dominate. Of course, the appreciable part of any infinitesimal is also zero, it's only when one divides an infinitesimal by another that the appreciable part could be a non-zero real number.

So to use this we need two continuous functions, one divided by the other, both equal to zero when evaluated at the same point.

LaTeX:
\[ \mathrm{let} \;f|_a=g|_a=0 \\ \mathrm{then} \\ \frac{f}{g}|_a \approx \frac{f+df}{g+dg}|_a=\frac{0+df}{0+dg}|_a=\frac{df}{dg}|_a=\frac{df/dx}{dg/dx}|_a \\ \mathrm{thus} \\ \frac{f}{g}|_a \approx \frac{df/dx}{dg/dx}|_a \]

Note again that there are a lot of conditions on this. You need two continuous functions, in a ratio, and they have to both equal zero at the same point. If any of these conditions fail, then the entire thing fails to work. And of course if there is a sharp point or a self-intersection on either of the curves there, then there isn't enough information to evaluate the right-hand side, so the derivatives df/dx and dg/dx both have to exist as well to get anything useful out of this.

But even so, this is a big deal. Under these very restrictive conditions we are able to evaluate 0/0!

An example of this that comes up decently often in an engineering context is:
LaTeX:
\[ \frac{\sin (x)}{x}|_0 \]
Taking the derivative of both the top and bottom we get cos(0)/1=1/1=1.

Note that if you try this in a modern calc class the teacher will yell at you, because the epsilon-delta approach to calc requires you to solve this limit to figure out what the derivative of sin is. We, however, are free to use l'Hôpital's rule for this because our derivation didn't involve such a limit at all.

But what if we use l'Hôpital's rule and get 0/0 again? Well, if this 0/0 also satisfies the conditions we can just use l'Hôpital's rule again. We can use it as many times as it takes to either get a non-indeterminate form, or for one of the conditions to break down, in which case the limit probably doesn't exist.
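A standard example of repeated application (my own illustration):
LaTeX:
\[ \frac{1-\cos(x)}{x^2}\Big|_0 \approx \frac{\sin(x)}{2x}\Big|_0 \approx \frac{\cos(x)}{2}\Big|_0=\frac{1}{2} \]
Both the top and the bottom vanish at 0 after the first pass, so the conditions still hold and a second pass is allowed.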

Furthermore, 0/0 isn't the only indeterminate form, but all of the others can be massaged into it via various tricks and applications of the logarithm. So l'Hôpital's rule is one of the most powerful tools available for evaluating what would otherwise be indeterminate.
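One illustration of the logarithm trick (again my own example): the 1^∞ form that shows up in (1+x)^(1/x) near zero can be turned into a 0/0 ratio by taking the logarithm first:
LaTeX:
\[ \ln\left((1+x)^{1/x}\right)\Big|_0=\frac{\ln(1+x)}{x}\Big|_0 \approx \frac{1/(1+x)}{1}\Big|_0=1 \\ \Rightarrow (1+x)^{1/x}\Big|_0 \approx e^1=e \]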
 
I didn't like the treatment of the exponential and logarithmic functions in the OP, so I changed it; here's the original version for posterity:
That's for trig functions. Now for exponentials:
LaTeX:
\[ d(n^x)=n^{x+dx}-n^x\\=n^x(n^{dx}-1) \]
We are at something of an impasse now, since we have no way to reduce n^(dx)-1 any further. But we do know that n^x appears in the differential of n^x, which implies there's some chain rule stuff going around. So let e^(dx)=1+dx. This is, in some sense, the natural base. The above thus simplifies to
LaTeX:
\[ d(e^x)=e^xdx \]
The exact numerical value of e is not clear right now, but we can figure it out with some doing. What is important is that we now have a base for the natural logarithm.

The derivative of the inverse function is the reciprocal of the derivative. This makes sense because the derivative of y wrt x is the change in y divided by the change in x. Thus if you divide dx by dy instead, it's like flipping the axes.
LaTeX:
\[ y=e^x \Rightarrow \ln(y)=x \\ \frac{dy}{dx}=e^x \Rightarrow \frac{dx}{dy}=\frac{1}{e^x}\\ =\frac {1}{e^{\ln(y)}}=\frac{1}{y}\\ \Rightarrow d(\ln(u))=\frac{du}{u} \]
This finally lets us prove the power rule for all real numbers, rational and irrational:
LaTeX:
\[ d(x^n)=d(e^{n \ln (x)}) \\ =e^{n \ln (x)}d(n\ln(x)) \\ =x^n(\ln(x)dn+nd(\ln(x))) \\ =x^n(0+n\frac{dx}{x}) \\ =n\frac{x^n}{x}dx\\ =nx^{n-1}dx \]

and here's the new version for those who have already read it and have the thread watched:
The logarithm was invented for a practical purpose: addition is easier than multiplication, so if we had a way to turn multiplication into addition we could spend less time in tedious computation and more time doing the interesting stuff. As such a logarithmic function is any non-zero function that obeys the following relationship:
LaTeX:
\[f(xy)=f(x)+f(y)\]
This is where we bring back the idea that the Integral is an area and not just the reverse of differentiation. Back before the days of calculus, when a master was explaining his findings on the quadrature of the hyperbola y=1/x, his student noticed the following:
LaTeX:
\[ \int_1^{ab}\frac{1}{x}dx=\int_1^{a}\frac{1}{x}dx+\int_a^{ab}\frac{1}{x}dx \\ \mathrm{let} \; x=au, u=\frac{x}{a} \\ dx=adu \\ \int_a^{ab}\frac{1}{x}dx=\int_1^{b}\frac{1}{au}adu =\int_1^{b}\frac{1}{u}du \\ \mathrm{so} \\ \int_1^{ab}\frac{1}{x}dx=\int_1^{a}\frac{1}{x}dx+\int_1^{b}\frac{1}{u}du \]
Which is to say that the function for the area under the curve of the hyperbola, measured starting from x=1, is given by a logarithmic function. Presently we don't know the base of the logarithm, but we do know that the derivative of this logarithm must be 1/x. Which is interesting, because a logarithmic function in any base is a constant multiple of any other base's logarithm. So the lack of any multiple means that this logarithm is in some sense natural, and its corresponding exponential function must also be natural in the same sense. Let's call this "natural logarithm" ln(x).

The derivative of the inverse function is the reciprocal of the derivative. This makes sense because the derivative of y wrt x is the change in y divided by the change in x. So if you divide dx by dy instead, it's like flipping the axes. Thus, for the exponential function of the same base (call that base "e" for now):
LaTeX:
\[ y=e^x \\ log_e(y)=x \\ \frac{dy}{y}=dx \\ \frac{1}{y}=\frac{dx}{dy} \\ \frac{dy}{dx}=y=e^x \\ d(e^x)=e^xdx \]

The differentials of the inverse trig functions can be found in the same way, which is left as an exercise to the reader.
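A sketch of one case, in case the exercise proves stubborn: for the arcsine, run the same inversion argument, taking the positive root of the cosine:
LaTeX:
\[ y=\arcsin(x) \Rightarrow x=\sin(y) \\ dx=\cos(y)\,dy \\ d(\arcsin(x))=dy=\frac{dx}{\cos(y)}=\frac{dx}{\sqrt{1-x^2}} \]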

This finally lets us prove the power rule for all real numbers, rational and irrational:
LaTeX:
\[ d(x^n)=d(e^{n \ln (x)}) \\ =e^{n \ln (x)}d(n\ln(x)) \\ =x^n(\ln(x)dn+nd(\ln(x))) \\ =x^n(0+n\frac{dx}{x}) \\ =n\frac{x^n}{x}dx\\ =nx^{n-1}dx \]
 
@Altaigne

I have only scanned over some of your posts, but it seems to me that part of the miscommunication here may be because @Aranfan is trying to communicate the infinitesimal calculus as it was understood in the 18th century. Many of the issues you're raising seem valid based on a modern concept of rigour and limit-based calculus where you can't be quite as cavalier with your "dx"s. That seems to me like it may be causing some of the disconnect here?

Like, I don't think @Aranfan is saying that this is a 100% rigorous treatment in terms of modern limit-based calculus, he's trying to give a flavour of how mathematicians like Leibniz or Euler would have conceptualised things like the differential. If you've ever taken a history of mathematics module (I did because it was an easy course on my joint degree), you quickly learn that historical mathematicians were often happy to play pretty fast and loose with stuff in a way modern mathematicians are much more pernickety about.
 

Correct. Infinitesimals can be made rigorous to modern standards with the hyperreals and the standard part function. But this is more about the calculus as practiced in the 18th century, where people said "assuming dx constant" to mean "x is the independent variable", felt no compunction about swapping the differentials in the denominators of physics equations, and didn't bother to find out whether an infinite sum was convergent before evaluating it (Euler was especially notable for that one). It wasn't until Fourier demonstrated that you could use infinite trig series to approximate discontinuous and jagged functions, just like you can use infinite polynomial series to approximate smooth ones, that people had to get a lot more rigorous about things, because discontinuous and jagged functions are a lot less well behaved than the kinds of functions people had been dealing with up until then.

The thing about the notation for the second derivative? That actually mattered: Leibniz never wrote anything like the modern notation for the second derivative without the Latin for "assume dx/dy/ds[delete as appropriate] constant" showing up earlier on the page. Euler had rules for changing the "progression of the variables", and the Bernoullis and Leibniz argued that being able to take variables other than time as the independent variable was an advantage of the differential calculus over the calculus of fluxions. We have records of an argument between Leibniz and Huygens where Leibniz recasts Newtonian mechanics with space as the independent variable rather than time and argues that it makes more sense to do it that way even though the two are mathematically equivalent. Huygens thought time as the independent variable made more sense for physics, but he didn't dispute the mathematical equivalence.

And yes, this little essay is not aiming for full rigor. You'd be able to do physics and engineering with this level of rigor, but you would have problems with the sorts of pathological functions that come up in pure math.
 
I'm afraid that I'm going to have to side with @Altaigne on this one, @Aranfan.

Just to be clear, I do not say this because you made a mistake somewhere in your formalism; in fact, as far as the equations and treatment of infinitesimals go, you are broadly correct in treating them as algebraic representations of geometric quantities, as these symbols would have been understood in historical 17th-18th century Leibnizian infinitesimal calculus. People like Leibniz, Bernoulli, and Euler probably would've agreed that unless the progression of x was specified to be arithmetic (i.e. the dx sequence is constant), it is indeed true that:
Second-order differential (historical Leibnizian calculus):
\[ d(\frac{dy}{dx}) = \frac{dxddy - dyddx}{(dx)^2} \\ \Rightarrow \frac{d(\frac{dy}{dx})}{dx} = \frac{1}{dx}d(\frac{dy}{dx}) = \frac{ddy}{dx^2} - \frac{dy}{dx}\frac{ddx}{dx^2} \]

And they probably wouldn't have batted an eye at algebraic cancellations of differentials like the sort discussed on the previous page:
Of course this works :V:
\[ \textrm{Let } du \neq 0 \\ \frac{ddy}{du^2}\frac{du^2}{dx^2} = \frac{ddy}{dx^2} \]

Indeed, Bernoulli once explicitly wrote down the differential of a differential quotient of the form brought up in the first post in his book Opera Omnia (1742):


and likewise Euler, in his Institutiones Calculi Differentialis (1755):

where Euler notes that when dx "isn't constant", ddx does not become zero, in which case:
Euler's notation of the second order differential:
\[ ddy = pddx + qdx^2, \textrm{ where } p = \frac{dy}{dx} \textrm{ and } q=\frac{dp}{dx}=\frac{d(\frac{dy}{dx})}{dx} \\ \Rightarrow q = \frac{d(\frac{dy}{dx})}{dx} =\frac{ddy-pddx}{dx^2} = \frac{ddy}{dx^2} - \frac{dy}{dx}\frac{ddx}{dx^2} \]

Instead, the reason I say that @Altaigne is correct while you are wrong, @Aranfan, has very little to do with the formalism and everything to do with the semantics.

To be blunt, the statement that the Leibniz notation of the second derivative of y with respect to x is "wrong" is completely incorrect, because in historical Leibnizian infinitesimal calculus d(dy/dx)/dx is not the second derivative of y with respect to x. There is no unique historical naive Leibnizian notation of the second derivative y''(x).

In fact, derivatives for the most part didn't exist in naive Leibnizian calculus; they had quotients or ratios of differentials, which sometimes behave similarly, are written in a similar notation, and did evolve into the idea of the derivative, but which are in fact very different mathematical concepts. d(dy/dx)/dx does gain the semantic meaning equivalent to y''(x) if "dx is assumed constant", as noted prior, but without that assumption ddy/dx^2 can't mean y''.

So complaining about the d^2y/dx^2 notation of y''(x) being "wrong" really doesn't make sense. d^2y/dx^2 is a completely correct notation of y''(x) in standard analysis, as detailed by @Altaigne on the previous page. d^2y/dx^2 is also a completely correct notation of y''(x) in historical Leibnizian calculus, because the only way d^2y/dx^2 actually means y'' in historical Leibnizian notation is if you assume that dx is constant, in which case d^2y/dx^2 = y'' anyway. If you don't assume this, then d^2y/dx^2 can no longer be sensibly interpreted as y''.

Bos (1974) has a section on this in his seminal work which puts it best IMO (and which is, overall, an excellent work on historical naive Leibnizian calculus in general):



From Bos, H. J. M. "Differentials, Higher-Order Differentials and the Derivative in the Leibnizian Calculus." Archive for History of Exact Sciences 14, no. 1 (March 1974): 1–90. https://doi.org/10.1007/bf00327456, pp. 29–32.
 
Instead, the reason I say that @Altaigne is correct while you are wrong, @Aranfan, has very little to do with the formalism and everything to do with the semantics.

Okay, that's a fair objection. The idea of the derivative didn't really exist for Leibniz, just various differential quotients. So you have Leibniz saying the radius of curvature is given by:

LaTeX:
\[ r=\frac{dy}{d(\frac{dx}{ds})} \]

Which is just a complete head scratcher to interpret from a modern point of view.

The derivative didn't really exist as a concept until Lagrange did his work on Taylor series. While he was a contemporary of Euler, I probably shouldn't have been so free tossing around "derivative" in the essay itself. I just couldn't help myself when it came to the fundamental theorem, just because of how much space it saves, and from there it infected the entire work. Which led to my rant.

Ultimately the issue is that x and dx do not vary in concert, and thus must be treated as independent variables when differentiating. So the second derivative is really a partial rather than a total differential quotient.
 
Scroll 3: Taylor Series
I think it's a bit too early to go on a tangent; try to keep focused and move this to the end.

You know, if we go on enough tangents, we can actually get from point a to point b.

TAYLOR SERIES​

Here is a dirty secret that Big Math doesn't want you to know: humans are bad at math. The only math we know how to do is add, subtract, multiply, and divide. It is impossible to actually evaluate any other mathematical operation, outside of a few special cases where we have figured out the answer for specific values. We know that sin(0)=0 for instance, and cos(0)=1, but these are the exceptions, and we have no idea how to actually evaluate something like a sine function or a logarithmic function generally.

So we cheat using approximations. And one of the most useful approximations is the infinite polynomial series, named after Brook Taylor, a disciple of Newton, who was the first to publish an in-depth study of them. The reason they are so useful is that when using them you only need to evaluate the exotic function and its derivatives at one value, chosen to be easy, and the rest is just the four basic operations of arithmetic, which we know how to do. The infinitesimal calculus, specifically the fundamental theorem and integration by parts, provides a way to determine the approximating series for any sufficiently smooth and nicely behaved function.

So let's say you have an expression that can be plotted as a curve, such as y=f(x), that you want to evaluate at some point "b" even though we only know how to actually evaluate it at a select few points, of which b is not one. What we do is start with the fundamental theorem of calculus:
LaTeX:
\[ \int_a^b \frac{dy}{dx}dx = y|_b - y|_a \\ y|_b=y|_a+\int_a^b \frac{dy}{dx}dx \]

And then we apply integration by parts:
LaTeX:
\[ \int_a^b udv=uv|_b-uv|_a-\int_a^bvdu \\ \mathrm{let} \\ u=\frac{dy}{dx}, du=d(\frac{dy}{dx}) \\ v=-(b-x),dv=dx \\ y|_b=y|_a+(b-b)\frac{dy}{dx}|_b-(-(b-a)\frac{dy}{dx}|_a)-\int_a^b-(b-x) d(\frac{dy}{dx}) \\ y|_b=y|_a+(0)\frac{dy}{dx}|_b+(b-a)\frac{dy}{dx}|_a+\int_a^b(b-x) d(\frac{dy}{dx})\frac{dx}{dx} \\ y|_b=y|_a+(b-a)\frac{dy}{dx}|_a+\int_a^b(b-x) \frac{d(\frac{dy}{dx})}{dx}dx \]
And then we do it again. Again and again and again until we have enough terms to approximate with enough accuracy:
LaTeX:
\[ y|_b=y|_a+(b-a)\frac{dy}{dx}|_a+\int_a^b(b-x) \frac{d(\frac{dy}{dx})}{dx}dx \\ \mathrm{let} \\ u=\frac{d(\frac{dy}{dx})}{dx},du=d(\frac{d(\frac{dy}{dx})}{dx}) \\ v=-\frac{(b-x)^2}{2},dv=(b-x)dx \\ y|_b=y|_a+(b-a)\frac{dy}{dx}|_a+\frac{(b-a)^2}{2}\frac{d(\frac{dy}{dx})}{dx}|_a+\int_a^b\frac{(b-x)^2}{2} \frac{d(\frac{d(dy/dx)}{dx})}{dx}dx \\ \mathrm{let} \\ u=\frac{d(\frac{d(dy/dx)}{dx})}{dx},du=\frac{d(\frac{d(\frac{d(dy/dx)}{dx})}{dx})}{dx} \\ v=-\frac{(b-x)^3}{2*3},dv=\frac{(b-x)^2}{2}dx \\ y|_b=y|_a+(b-a)\frac{dy}{dx}|_a+\frac{(b-a)^2}{2}\frac{d(\frac{dy}{dx})}{dx}|_a+\frac{(b-a)^3}{2*3}\frac{d(d(dy/dx)/dx)}{dx}|_a+\int_a^b\frac{(b-x)^3}{2*3} \frac{d(\frac{d(d(dy/dx)/dx)}{dx})}{dx}dx \]

At this point two things become very clear:

1. The denominator of the terms grows very fast, and in a very clear way. It is the exponent of (b-a) multiplied by every positive integer below it.

This led to the factorial function being developed to clean up the expression of the Taylor Series and Taylor's Theorem.

2. The notation is growing increasingly unwieldy. We have to either come up with a new notation, or declare that dx is constant and can be moved in and out of the differential freely.

Lagrange invented the new notation f'(x), f''(x), etc., for the successive differential quotients, and called these differential quotients the "derived functions of x". This is where the term "derivative" comes from, and eventually the derivative would displace the differential as the fundamental concept in the calculus. Lagrange's notation has the advantage of being very compact, and it makes clear what value you are evaluating the given function at when you write f(a) or f'(a). However, the notation of Leibniz has the advantages of making it crystal clear what variable you are differentiating with respect to, emphasizing dimensional consistency at a glance, and allowing algebraic manipulation when working with first-order differentials and differential equations. Both notations are commonly in use, but Lagrange's notation is very much associated with the anti-infinitesimal movement that took over the integral and differential calculus starting with Weierstrass.


But anyway, we can thus formulate the following series to approximate any expression with a sufficiently well behaved curve:
LaTeX:
\[ \mathrm{assuming} \; dx \; \mathrm{constant} \\ y|_b=y|_a+(b-a)\frac{dy}{dx}|_a+\frac{(b-a)^2}{2}\frac{ddy}{dx^2}|_a+\frac{(b-a)^3}{3!}\frac{d^3y}{dx^3}|_a+...+\frac{(b-a)^n}{n!}\frac{d^ny}{dx^n}|_a+\int_a^b\frac{(b-x)^n}{n!}\frac{d^{n+1}y}{dx^{n+1}}dx \]

Where b is the value you want to evaluate y at, and a is chosen to be easy to evaluate in y and its various successive differential quotients.
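For comparison, and since the notation was just introduced, the same series in Lagrange's notation reads (writing f for y as a function of x):
LaTeX:
\[ f(b)=f(a)+(b-a)f'(a)+\frac{(b-a)^2}{2!}f''(a)+...+\frac{(b-a)^n}{n!}f^{(n)}(a)+\int_a^b\frac{(b-x)^n}{n!}f^{(n+1)}(x)dx \]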

There are, of course, a few caveats in regards to "sufficiently well behaved". All those derivatives have to actually exist: the absolute value has a kink at 0, and curves that intersect themselves have similar issues. If you are doing square roots you have to pick whether you're taking positive or negative roots and stick with it, because playing silly buggers with multivalued functions will always toss up nonsense. And, importantly, that integral term at the end needs to fall towards 0 with each iteration, or there is a hard cap on how accurate your approximation can be.

Consideration of the integral remainder, and especially its behavior with complex-valued functions, leads to the notion of "radius of convergence". If "b" is outside of the series' radius of convergence for "a", then you will get less accurate with each iteration. An example of this is the various logarithmic functions: because they have an asymptote at 0, any b between 0 and 2a will get more accurate per iteration, but any b greater than 2a will get less accurate per iteration.

Still, this is quite a useful tool to have. For instance, remember how we didn't know the base of the natural logarithm? We can figure out what it is to whatever degree of accuracy we want! It just depends on how patient we are. The inverse of the natural logarithm, e^x, has an infinite radius of convergence and is its own "derived function". So if we let a=0 and b=1, we'll find out what e is. Taking the Taylor series up to the n=3 term we have:
LaTeX:
\[ e^1=1+1+\frac{1}{2}+\frac{1}{6} \\ e=2+\frac{3+1}{6} \\ =2+\frac{4}{6} \\=2.66666... \]

The actual value of e to three figures of accuracy is 2.718, but how close one gets to the true value is only limited by one's patience for computation. And now you know how to compute the trig functions to arbitrary precision, the specifics of which are left as an exercise for the reader.
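To give a sense of the pace, here is my own continuation of that computation: the next two terms already close most of the gap,
LaTeX:
\[ 2+\frac{4}{6}+\frac{1}{24}=2.70833... \qquad 2+\frac{4}{6}+\frac{1}{24}+\frac{1}{120}=2.71666... \]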
 
The infinitesimal calculus, specifically the fundamental theorem and integration by parts, provides a way to determine the approximating series for any sufficiently smooth and nicely behaved function.
Something that I'm not sure was known at the time is that even for perfectly smooth (infinitely differentiable at every point) functions, where the Taylor series converges everywhere, it's not necessarily the case that it converges to the function in question at any point other than the point used to construct the series.

The standard function that demonstrates this is (courtesy of the Wikipedia article on bump functions)
LaTeX:
\[ \Psi(x) = \begin{cases} \exp\left( -\frac{1}{1 - x^2}\right), & \text{ if } x \in (-1,1) \\ 0, & \text{ if } x\in \mathbb{R}\setminus (-1,1) \end{cases} \]
The formula for the non-zero part is defined (and infinitely differentiable) at every point in (-1, 1), and both that function and its derivatives go to 0 at the boundary. That means the whole function and all its derivatives are defined everywhere and equal to zero at those boundary points. Therefore the Taylor series derived from those points is simply equal to 0 everywhere, even though this only correctly approximates the function on one side! Worse, we can add two bump functions together with a horizontal displacement and get a Taylor series that doesn't converge to the right value on either side!
 
Something that I'm not sure was known at the time is that even for perfectly smooth (infinitely differentiable at every point) functions, where the Taylor series converges everywhere, it's not necessarily the case that it converges to the function in question at any point other than the point used to construct the series.

It definitely wasn't. Lagrange got lucky that "analytic" functions got named after his book; he thought he was talking about all functions. This is because the typical undergraduate response to piecewise counterexamples, "but that's bullshit", is one the mathematicians of the 18th century shared. Euler and Lagrange thought of functions in terms of algebraic expressions, not in terms of "input gives output". Until Fourier broke everything, most of the functions dealt with by the calculus were complex differentiable, with most of the exceptions being solutions to certain partial differential equations.
 