# Chapter 0: Why Study Calculus?

## Introduction

The purpose of this chapter is to tempt you into learning some calculus.

To study calculus it is essential that you are able to breathe. Without that ability you will soon die, and be unable to continue.

Beyond that, you will need some familiarity with two notions: the notion of a number, and that of a function.

**Suppose I have forgotten everything I ever knew about numbers and functions?**

Do not worry. We will review their properties.

**And if I know everything there is to know about numbers and functions?**

Then you understand calculus already and need not continue.

Before reminding yourself about numbers and functions you might ask the following questions.

**What is calculus?**

**Why should I learn about it?**

Calculus is the study of how things change. It provides a framework for modeling systems in which there is change, and a way to deduce the predictions of such models.

**I have been around for a while, and know how things change, more or less. What can calculus add to that?**

I am sure you know lots about how things change. And you have a qualitative notion of calculus. For example the concept of speed of motion is a notion straight from calculus, though it surely existed long before calculus did and you know lots about it.

**So what does calculus add for me?**

It provides a way for us to construct relatively simple quantitative models of change, and to deduce their consequences.

**To what end?**

With this you get the ability to find the effects of changing conditions on the system being investigated. By studying these, you can learn how to control the system to make it do what you want it to do. Calculus, by giving engineers and you the ability to model and control systems, gives them (and potentially you) extraordinary power over the material world.

The development of calculus and its applications to physics and engineering is probably the most significant factor in the development of modern science beyond where it was in the days of Archimedes. And this was responsible for the industrial revolution and everything that has followed from it including almost all the major advances of the last few centuries.

**Are you trying to claim that I will know enough about calculus to model systems and deduce enough to control them?**

If you had asked me this question in 1990 I would have said no. Now it is within the realm of possibility, for some non-trivial systems, with your use of your laptop or desk computer.

**OK, but how does calculus model change? What is calculus like?**

The fundamental idea of calculus is to study change by studying "instantaneous" change, by which we mean changes over tiny intervals of time.

**And what good is that?**

It turns out that such changes tend to be lots simpler than changes over finite intervals of time. This means they are lots easier to model. In fact calculus was invented by Newton, who discovered that acceleration, which means change of speed of objects, could be modeled by his relatively simple laws of motion.

**And so?**

This leaves us with the problem of deducing information about the motion of objects from information about their speed or acceleration. And the details of calculus involve the interrelations between the concepts exemplified by speed and acceleration and that represented by position.

**So what does one study in learning about calculus?**

To begin with you have to have a framework for describing such notions as position, speed and acceleration.

Single variable calculus, which is what we begin with, can deal with motion of an object along a fixed path. The more general problem, when motion can take place on a surface, or in space, can be handled by multivariable calculus. We study this latter subject by finding clever tricks for using the one dimensional ideas and methods to handle the more general problems. So single variable calculus is the key to the general problem as well.

When we deal with an object moving along a path, its position varies with time, and we can describe its position at any time by a single number, which can be the distance in some units from some fixed point on that path, called the origin of our coordinate system. (We add a sign to this distance, which will be negative if the object is behind the origin.)

The motion of the object is then characterized by the set of its numerical positions at relevant points in time.

The set of positions and times that we use to describe motion is what we call a **function**. And similar functions are used to describe the quantities of interest in all the systems to which calculus is applied.

The course here starts with a review of numbers and functions and their properties. You are undoubtedly familiar with much of this, so we have attempted to add unfamiliar material to keep your attention while looking at it.

**I will get bogged down if I read about such stuff. Must I?**

I would love to have you look at it, since I wrote it, but if you prefer not to, you could undoubtedly get by skipping it, and referring back to it when or if you need to do so. However you will miss the new information, and doing so could blight you forever. (Though I doubt it.)

**And what comes after numbers and functions?**

A typical course in calculus covers the following topics:

1. How to find the instantaneous change (called the "derivative") of various functions. (The process of doing so is called **"differentiation"**.)

2. How to use derivatives to solve various kinds of problems.

3. How to go back from the derivative of a function to the function itself. (This process is called **"integration"**.)

4. Study of detailed methods for integrating functions of certain kinds.

5. How to use integration to solve various geometric problems, such as computations of areas and volumes of certain regions.

There are a few other standard topics in such a course. These include description of functions in terms of power series, and the study of when an infinite series "converges" to a number.

**So where does this empower me to do what?**

It doesn't really do so. The problem is that such courses were first designed centuries ago, and they were aimed not at empowerment (at that time utterly impossible) but at familiarizing their audience with ideas and concepts and notations which allow understanding of more advanced work. Mathematicians and scientists and engineers use concepts of calculus in all sorts of contexts and use jargon and notations that, without your learning about calculus, would be completely inscrutable to you. The study of calculus is normally aimed at giving you the "mathematical sophistication" to relate to such more advanced work.

**So why this nonsense about empowerment?**

This course will try to be different and to aim at empowerment as well as the other usual goals. It may not succeed, but at least will try.

**And how will it try to perform this wonder?**

Traditional calculus courses emphasize algebraic methods for performing differentiation and integration. We will describe such methods, but also show how you can perform differentiation and integration (and also solution of ordinary differential equations) on a computer spreadsheet with a tolerable amount of effort. We will also supply applets which do the same automatically with even less effort. With these applets, or a spreadsheet, you can apply the tools of calculus with greater ease and flexibility than has been possible before.

(There are more advanced programs that are often available, such as MAPLE and Mathematica, which allow you to do much more with similar ease. With them you can deduce the consequences of models of various kinds in a wide variety of contexts. Once you understand calculus they can make its use much easier, but they provide answers given inputs, which does not provide understanding of how they do it.)
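To give a rough feel for the kind of numerical computation meant here, the following is a minimal Python sketch rather than the spreadsheet or applets of this course. The function names `derivative` and `integral`, the step size `h`, and the strip count `n` are all illustrative assumptions; the methods used (a central difference and the trapezoid rule) are standard choices, not necessarily the ones developed later.

```python
def derivative(f, x, h=1e-6):
    """Estimate the instantaneous rate of change of f at x
    from its change over the tiny interval [x - h, x + h]."""
    return (f(x + h) - f(x - h)) / (2 * h)


def integral(f, a, b, n=1000):
    """Estimate the area under f between a and b by adding up
    n thin trapezoids (the trapezoid rule)."""
    width = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * width)
    return total * width


def square(x):
    return x * x


# For the square function the exact answers are 6 and 9.
slope_at_3 = derivative(square, 3.0)
area_0_to_3 = integral(square, 0.0, 3.0)
print(slope_at_3, area_0_to_3)
```

Both results are approximations: they come close to 6 and 9 but need not match them to the last digit.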

Also, we will put much greater emphasis on modeling systems. With ideas on modeling and methods for solving the differential equations they lead to, you can achieve the empowerment we have claimed.

**And I will be able to use this to some worthwhile end?**

Okay, probably not. But you might. And also you might be provoked to learn more about the systems you want to study or about mathematics, to improve your chances to do so. Also you might be able to understand the probable consequences of models a little better than you do now. Also you may get to love the concepts and ideas of calculus.

**Well, what is in the introductory chapter on numbers?**

We start with the natural numbers \((1,2,3,...)\) and note how the operations of subtraction, division and taking the square root lead us to extending our number system to include negative numbers, fractions (called rational numbers) and complex numbers. We also describe decimal expansions (which describe "real numbers") and examine the notion of countability. We mutter about complex numbers as well.

**And in the chapter about functions?**

We start with an abstract definition of a function (as a set of argument-value pairs) and then describe the standard functions. These are those obtained by starting with the identity function (value=argument) and the exponential function, and using various operations on them.

**Operations, what operations?**

These are addition, subtraction, multiplication, division, substitution and inversion.

**But what is the exponential function, and what are substitution and inversion?**

Here are one-sentence answers; if you want to know more, read the chapter!

The exponential function is mysteriously defined using calculus: it is the function that is its own derivative, defined to have the value 1 at argument 0. It turns out, however, to be something you have seen before. And it turns out to bear a close relation to the sine function of trigonometry.

**Substitution of one function g into another f** produces a new function, the function defined to have, at argument x, the value of f at an argument which is the value of g at argument x. This is simpler than it sounds. Suppose, for example, that \( y = x^2 \) and \( x = 2z \); then \( y(x(z)) \) is \( (2z)^2 \).

**An inverse of a function** is a function obtained by switching its values with its arguments. For example the square function, usually written as \( x^2 \), has the square root function as an inverse.
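The two ideas above can be tried out in a few lines of Python. This is only a sketch: the helper name `substitute` and the names `inner` and `outer` are made up for illustration, and the inverse is checked only for a non-negative argument, where the square root undoes squaring.

```python
import math


def square(x):
    return x * x


def double(z):
    return 2 * z


def substitute(inner, outer):
    """Return the function whose value at z is outer(inner(z))."""
    return lambda z: outer(inner(z))


# With y = x^2 and x = 2z, this is the function z -> (2z)^2.
y_of_z = substitute(double, square)
print(y_of_z(3))        # (2*3)^2

# Squaring and then taking the square root gets us back to the start
# (for a non-negative starting value).
round_trip = math.sqrt(square(3.0))
print(round_trip)
```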

In the immortal words of Father William to his son, as penned by Lewis Carroll, who was a mathematician:

I have answered three questions and that is enough,

Said the sage, don't give yourself airs.

Do you think I can listen all day to such stuff?

Be off or I'll kick you downstairs!

To add or subtract complex numbers (which are entities of the form \(a + bi\)) you do the appropriate thing to the real parts (the \(a\)'s) and the imaginary parts (the \(b\)'s) separately.

For example, we have

\[(4 + 3i) - (7 - i) = (4 - 7) + (3 - (-1))i = -3 + 4i\]

To multiply two complex numbers, you multiply out the terms in the two factors (using the linearity of multiplication, aka the distributive law) and use the fact that \(i^2\) is \(-1\).

For example, we get

\[ \begin{aligned} (4 + 3i) * (7 - i) &= 4 * 7 + 4 * (- i) + 3i * 7 + 3i * (-i)\\ &= 28 - 4i + 21i - 3i^2\\ &= 28 + 3 + 17i\\ &= 31 + 17i \end{aligned} \]
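If you want to check such computations, Python has complex numbers built in, writing `j` where we write \(i\). The sketch below repeats the two examples above; the helper `multiply` is a hypothetical name that spells out the distributive-law recipe on the real and imaginary parts.

```python
w = 4 + 3j
z = 7 - 1j

difference = w - z      # subtract real and imaginary parts separately
product = w * z         # built-in complex multiplication


def multiply(a, b, c, d):
    """Multiply (a + bi) by (c + di) using the distributive law
    and i^2 = -1; returns the (real, imaginary) parts."""
    real = a * c - b * d        # a*c + (b*d) * i^2
    imag = a * d + b * c        # the two cross terms
    return real, imag


print(difference, product, multiply(4, 3, 7, -1))
```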

Division is slightly trickier, because we want our answer to have the form \(a + bi\) and not that of a ratio of such things (though \(a\) and \(b\) can be ratios).

To get this we use the wonderful fact that any complex number multiplied by its complex conjugate (what you get by reversing the sign of its \(b\)) is a real number.

In symbols, this reads \((a + bi) * (a - bi) = (a^2 + b^2)\).

**How come?**

Multiply it out using the distributive law and see.

**What good is this?**

We rewrite this equation as \(a + bi = \frac{a^2 + b^2}{a - bi}\), which tells us that **multiplying by \((a + bi)\)** is the same as **multiplying by the real number \((a^2 + b^2)\), and dividing by \((a - bi)\).**

This means **dividing by \((a + bi)\)** is the reciprocal operation to that, which is **multiplying by \((a - bi)\)** and also dividing by the real number **\((a^2 + b^2)\)**.

So dividing by a complex number, say \((3 + 2i)\) is the same as multiplying by \((3 - 2i)\) and dividing the result by \(3^2 + 2^2\) which is \(9 + 4\) or \(13\).

So, for example, **\(\frac{7-i}{3+2i}\)** is \(\frac{(7-i)(3-2i)}{13}\) which is \(\frac{(21-2)-(14+3)i}{13}\) or **\(\frac{19-17i}{13}\).**
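The division recipe can be written out as a short Python sketch. The function name `divide` is an assumption made for illustration, and exact fractions are used so that the answer comes out as \(\frac{19}{13}\) and \(-\frac{17}{13}\) rather than as rounded decimals. It assumes integer inputs with a non-zero divisor.

```python
from fractions import Fraction


def divide(a, b, c, d):
    """Return the (real, imaginary) parts of (a + bi) / (c + di).

    Multiply the numerator by the conjugate (c - di), then divide
    by the real number c^2 + d^2. Assumes c + di is not zero.
    """
    magnitude_squared = c * c + d * d
    real = Fraction(a * c + b * d, magnitude_squared)
    imag = Fraction(b * c - a * d, magnitude_squared)
    return real, imag


real, imag = divide(7, -1, 3, 2)
print(real, imag)
```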

**We thus have rules for adding, subtracting, multiplying and dividing complex numbers**.

By the way, the quantity \(a^2 + b^2\) is called **the square of the magnitude** of the complex number \(a + bi\).

A complex number (\(a + ib\), with \(a\) and \(b\) real numbers) can be represented by a point in a plane, with \(x\) coordinate \(a\) and \(y\) coordinate \(b\).

This defines what is called the "complex plane". **It differs from an ordinary plane only in the fact that we know how to multiply and divide complex numbers to get another complex number**, something we do not generally know how to do for points in a plane.

This picture suggests that there is another way to describe a complex number. Instead of using its real and imaginary parts, which are its \(x\) and \(y\) coordinates, to describe it, we can use the distance from its point in the complex plane to the origin \((0,0)\), and **the angle formed by a line segment from the origin to that point, and the positive half of the \(x\) axis**. The **distance to the origin is usually denoted as \(r\)**; that angle is usually called \(\theta\) (theta). \(\theta\) is called the **"phase"** and sometimes the **"argument"** of the complex number. **The distance to the origin is called its "magnitude" and also its "absolute value".**

**How are these parameters, \(r\) and \(\theta\), related to \(a\) and \(b\)?**

We use the Euclidean definition of distance, for which the Pythagorean theorem holds. This tells us

**\[r^2 = a^2 + b^2 \enspace \text{and so} \enspace r = \sqrt{a^2 + b^2}\]**

As for \(\theta\), we use the standard trigonometric definitions of sines and cosines. The sine of the angle \(\theta\) is defined to be the ratio of the point's y-coordinate \(b\) to the length \(r\), and the cosine is the ratio of its x-coordinate \(a\) to \(r\). Thus \(\theta\) is an angle whose sine is \(\frac{b}{r}\), and whose cosine is \(\frac{a}{r}\).

This gives us the relations

**\[a = r\cos\theta \enspace \text{and} \enspace b = r\sin\theta\]**
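These relations are easy to try numerically. The sketch below takes \(3 + 4i\) as an arbitrary example and uses Python's `math.atan2` to pick out the angle whose sine is \(\frac{b}{r}\) and cosine is \(\frac{a}{r}\); that function is a convenience assumed here, not something introduced in the text. Angles are in radians.

```python
import math

a, b = 3.0, 4.0                 # the complex number 3 + 4i

r = math.hypot(a, b)            # sqrt(a^2 + b^2), the magnitude
theta = math.atan2(b, a)        # the phase, measured from the positive x axis

# Going back from (r, theta) to (a, b):
a_again = r * math.cos(theta)
b_again = r * math.sin(theta)

print(r, theta, a_again, b_again)
```

The recovered values agree with the originals only up to rounding, since the computer works with decimal approximations.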

**What good is this?**

Lots of good as we shall eventually see. But right now we can notice the following curious fact:

In terms of \(a\) and \(b\), called the real and imaginary parts of the complex number, addition and subtraction are easy to describe (add or subtract each part separately as if the other didn’t exist: \((a+bi) + (c+di) = (a+c) + (b+d)i\)), but multiplication and division are a bit ugly.

In terms of \(r\) and \(θ\), the magnitude and phase of a complex number, the opposite is true. That is, multiplication and division are easy to describe, while addition and subtraction are a bit ugly.

**How so?**

Well, **you can multiply two complex numbers together by multiplying their magnitudes together, and adding their phases**. You divide by dividing the magnitudes correspondingly, and subtracting the phase of the denominator from that of the numerator.

Explicitly, **the product of the complex number with magnitude \(r_1\) and phase \(\theta_1\) with the complex number with magnitude \(r_2\) and phase \(\theta_2\) is the complex number with magnitude \(r_1*r_2\) and phase \(\theta_1 + \theta_2\).**

(The rules for adding and subtracting in terms of magnitude and phase can be deduced from the rules in terms of real and imaginary parts, but are not particularly illuminating, because they are messy.)
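The multiplication and division rules can be checked numerically with Python's standard `cmath` module, whose `polar` function returns the magnitude and phase of a complex number and whose `rect` function goes the other way. The two numbers used are just the earlier examples \(4 + 3i\) and \(7 - i\); this is a sketch of a check, not a proof.

```python
import cmath

w = complex(4, 3)
z = complex(7, -1)

r1, theta1 = cmath.polar(w)     # magnitude and phase of w
r2, theta2 = cmath.polar(z)     # magnitude and phase of z

# Multiply magnitudes and add phases ...
product_polar = cmath.rect(r1 * r2, theta1 + theta2)
# ... or divide magnitudes and subtract phases.
quotient_polar = cmath.rect(r1 / r2, theta1 - theta2)

# Compare with the answers from real and imaginary parts.
print(product_polar, w * z)
print(quotient_polar, w / z)
```

The two columns agree up to tiny rounding differences.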

You can see all this on the following mathlet. You can move the complex numbers \(w\) and \(z\) around by clicking the left mouse button on the appropriate head and holding it down as you move. It allows you to examine the behavior of sums, products, differences and ratios of complex numbers as you change them. To see what you can do with this mathlet, click on "+ about" in its upper right hand corner.

Properties of the various kinds of numbers we will encounter are reviewed.

We have lots of kinds of numbers but they all start with the **natural numbers**, which are \(1, 2, 3\), and so on.

If you count your fingers and toes, you will come to \(20\) (most of you will), and that is a natural number. We can, in our imagination, consider that these natural numbers go on forever, past a million, a billion, a trillion, and so on.

In elementary school you studied not only these numbers, but how you can perform operations on them.

**What operations?**

There are **addition, subtraction, multiplication** and **division**.

You can **add** two natural numbers together, and you will always get another natural number, as in the famous fact that one and one are two.

Subtraction, on the other hand, is trickier. If you subtract a number, for example the number \(5\), from itself, you get something new, something that is not a natural number at all. We call it the number \(0\) or **zero**. And if you subtract a number, again say \(5\), from a smaller number, say \(3\), then you get something else that is new, namely a negative integer, which in this case is \(-2\), called **"minus two"**.

You can use numbers to count the number of pennies you have in your pocket. Thus you might have five pennies in your pocket. Zero is the number of pennies you would have if your pocket had a hole in it, and all those you put in immediately fell out again.

Now suppose you go to a store, and the storekeeper is foolish enough to give you credit. Suppose further that you had five pennies, and you bought some expensive item costing 11 pennies. Then the negative integer, \(-6\), represents the fact that not only do you have no pennies but if you got six more, you would be obligated to surrender them to pay for this item. Six here is the number of pennies you would owe your creditor, if you were to pay him your \(5\) pennies and he gave you the object, and lent you the rest of the money.

So to accommodate subtraction, and to be able to represent "amount owed" by numbers, we extend the natural numbers to include the number \(0\) and the negatives of the natural numbers. This entire set of numbers, the natural numbers, their negatives and \(0\), is called the set of **integers**, and is denoted by the letter **\(Z\).**

We can take any two members of **\(Z\)** and add them or subtract them and in either case get another member of **\(Z\).**

**I know all that, but I am very rusty on actual additions and subtractions. I get them wrong much of the time I try to do them.**

Most people will make a mistake roughly once in any ten additions or subtractions of single digits that they perform. This means that if they add or subtract numbers having many digits, like \(1234123\) and \(5432121\) they stand an excellent chance of getting the wrong answer.

Fortunately that is of no significance today. You can easily check additions and subtractions on a calculator or on a spreadsheet, and see if you get the same answer several different times. Unfortunately I usually make an error in keying in the numbers to add or subtract, or add instead of subtract or do something else equally absurd. All that means today is that I must do every calculation at least three times, to have a reasonable chance of correctness. True the amount of my effort is triple what it might be, but three times very little effort is still very little effort.

If you have this problem you will be best off adding or subtracting on a spreadsheet. Then you can look at your computation and use your judgment as to whether it makes sense. Here are some rules for checking for sense.

When you add positive numbers the result should be bigger than both of the two **"summands"** that you added. If one of the numbers is positive and one is negative, the magnitude (the value if you ignore any minus sign) of the sum should be smaller than the magnitude of the larger of the two, and the sign should be that of the summand with the larger magnitude.

Also, the least significant digits of your numbers should add or subtract correctly, if you ignore the rest. For example, if you subtract \(431\) from \(512\) then the last digit of the answer had better be \(1\) which is \(2\) minus \(1\).

If your checking produces something suspicious, try your computation again, being more careful, particularly with the input data.
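The sense checks above are mechanical enough to write down as code. Here is a minimal Python sketch for integers; the function name `looks_sensible` is made up for illustration, and passing the checks does not prove an answer right, it only fails to catch an error.

```python
def looks_sensible(x, y, claimed_sum):
    """Apply the rough checks from the text to a claimed value of x + y."""
    # The last digits should add correctly if you ignore the rest.
    if (x % 10 + y % 10) % 10 != claimed_sum % 10:
        return False
    # Two positive summands: the sum should be bigger than both.
    if x > 0 and y > 0:
        return claimed_sum > max(x, y)
    # One positive, one negative: the sum should be smaller in magnitude
    # than the larger summand, and should carry that summand's sign.
    if (x > 0 > y) or (y > 0 > x):
        bigger = x if abs(x) > abs(y) else y
        if abs(claimed_sum) >= abs(bigger):
            return False
        return claimed_sum == 0 or (claimed_sum > 0) == (bigger > 0)
    return True


# Subtracting 431 from 512 is the same as adding 512 and -431.
print(looks_sensible(512, -431, 81))    # passes the checks
print(looks_sensible(512, -431, 83))    # last digit is wrong
```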

The operation of subtracting \(5\) from another number **undoes** the operation of adding \(5\) to another number. Thus, if you do both operations, add five and then subtract five, or vice versa, you are back where you started from: \(3 + 5 - 5 = 3\).

Adding \(5\) and subtracting \(5\) are said to be **inverse** operations to one another, because of this property: **Performing them one after the other is equivalent to doing nothing.**

**By the way, why isn’t \(0\) a natural number?**

I have no idea. That’s the way people defined natural numbers long ago, and nobody has cared much for changing that definition.

Back in elementary school you also encountered the notion of **multiplication.** This is something you can do to two integers which will produce a third one called their **product.** You were (I hope) forced to learn a multiplication table which gives the product of each pair of single digit numbers and then learned how to use this table to multiply numbers with more digits.

**I was never very good at this.**

In olden days you had to be able to do these things, additions and multiplications, if only to be able to handle money and to perform ordinary purchases without being swindled.

Now you can use a calculator or computer spreadsheet to do these things, if you know how to enter integers and to push the \(+\) or \(-\) or \(*\) and = buttons as appropriate.

( *Unfortunately this fact has led pedagogues to believe they do not have to force pupils to go through the drudgery of learning the multiplication table.*

*This does much harm to those who don't bother to do so, because of the way our brains function. It turns out that the more time we spend on any activity as children, and even as adults, the bigger the area of the brain gets that is devoted to that activity, and the bigger it gets, the better we get at that activity.*

*Thus, your spending less time learning the multiplication table has the effect of reducing the area of your brain devoted to calculation, which impedes your further mathematical development.*

*Your skill at mathematics will be directly proportional to the amount of time you choose to devote to thinking about it. And that is up to you.* )

Once we are acquainted with multiplication, a natural question is: how can we undo multiplication? What is the inverse operation, say to multiplying by \(5\), so that multiplying and then doing it is the same as doing nothing? This operation is called **division.** So you learned how to divide integers. The **inverse operation to multiplying by \(x\) is dividing by \(x\)**, (unless \(x\) is \(0\)).

Now here comes a problem: if we try to divide \(5\) by \(3\) we do not get an integer. So, just as we had to extend the natural numbers to integers to accommodate the operation of subtraction, **we have to extend our numbers from integers to include also ratios of integers** , like \(\frac{5}{3}\), if we want to make division well defined for every pair of non-zero integers. And we want to be able to define division wherever we can.

Ratios of integers are called rational numbers, and you get one for any pair of integers, so long as the second integer, called the denominator, is not zero. Ratios like \(\frac{5}{3}\) which are not themselves integers are called **fractions.**

Once we have introduced fractions, we want to provide rules for adding and subtracting them and for multiplying and dividing them. These start to get complicated, but fortunately for us, we have calculators and spreadsheets that can do these things without complaining at all if we have the wit to enter what we want done.

There is one thing we cannot do with our rational numbers, and that is to divide by \(0\). Division, after all, is the action of undoing multiplication. But multiplying any number by 0 gives result \(0\). There is no way to get back from this \(0\) product what you multiplied \(0\) by to get it.

Of course adding and multiplying (and subtracting and dividing) fractions is more complicated than doing so for integers. To multiply say \(\frac{a}{b}\) times \(\frac{c}{d}\), the new numerator is the product of the old ones (namely \(ac\)) and the new denominator is the product of the old ones (\(bd\)), so the product is \(\frac{ac}{bd}\): \(\frac{a}{b}*\frac{c}{d} = \frac{ac}{bd}\).

The inverse operation of multiplying by \(\frac{c}{d}\) is multiplying by \(\frac{d}{c}\), and that inverse is by definition the operation of dividing by \(\frac{c}{d}\). The product of any number and its inverse is always \(1\). This means that \(\frac{d}{d}\) is always \(1\) for any \(d\) other than \(0\).

Thus \(\frac{a}{b}\) divided by \(\frac{c}{d}\) is \(\frac{a}{b}\) multiplied by the inverse of \(\frac{c}{d}\) which is \(\frac{a}{b}\) multiplied by \(\frac{d}{c}\). The answer is \(\frac{ad}{bc}\).

Adding is a bit trickier. The notion of addition can be applied to objects as well as to numbers, in the following sense. We know, for example, that \(3+5\) is \(8\). That means that if we have 3 radishes and dig up \(5\) more, we will have \(8\) radishes (assuming nobody has eaten the first \(3\)). And the same is true for any other objects in place of radishes. This tells us how to add fractions that have the same denominator. Thus \(\frac{3}{a} + \frac{5}{a}\) is \(\frac{8}{a}\) in which \(\frac{1}{a}\) has replaced a radish. We are applying the general rule for addition of like things to the object \(\frac{1}{a}\).

To add fractions with different denominators you must first change them so that the denominators are the same, then add the numerators like you were adding numbers. The easiest way to do this is to make the new denominator the product of the old ones. Thus to find \(\frac{a}{b} + \frac{c}{d}\) you first multiply the first term by \(\frac{d}{d}\), and the second by \(\frac{b}{b}\), getting \(\frac{ad}{bd} + \frac{cb}{bd}\) and the answer is \(\frac{ad+cb}{bd}\). You can do the same sort of thing for subtraction.
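The rules for multiplying, dividing and adding fractions can be tried out in a short Python sketch. The function names `multiply`, `divide` and `add` are illustrative choices, not spreadsheet functions; each takes the integers \(a, b, c, d\) and returns an unreduced (numerator, denominator) pair, which is then compared with Python's built-in `Fraction` type.

```python
from fractions import Fraction

def multiply(a, b, c, d):
    """(a/b) * (c/d) = (ac)/(bd), returned as an unreduced pair."""
    return a * c, b * d

def divide(a, b, c, d):
    """(a/b) / (c/d) = (a/b) * (d/c) = (ad)/(bc); c must not be 0."""
    return a * d, b * c

def add(a, b, c, d):
    """(a/b) + (c/d) = (ad + cb)/(bd), using bd as the common denominator."""
    return a * d + c * b, b * d

# Fraction reduces to lowest terms, so an unreduced answer still compares equal.
print(add(1, 2, 1, 3), Fraction(*add(1, 2, 1, 3)) == Fraction(1, 2) + Fraction(1, 3))
print(divide(2, 3, 5, 7), Fraction(*divide(2, 3, 5, 7)) == Fraction(2, 3) / Fraction(5, 7))
```

Because the common denominator is simply the product \(bd\), the results are usually not in lowest terms, which is harmless for checking purposes.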

You were probably forced to factor out common terms in the numerator and denominator in that answer in school, but you don’t have to do that in entering the answer in a spreadsheet, which makes addition of fractions much easier when you use spreadsheets.

We have a nice way to represent numbers including fractions, and that is as decimal expansions. Suppose we consider numbers like \(\frac{1}{10}\), \(\frac{2}{10}\), (which is the same as \(\frac{1}{5}\)), \(\frac{3}{10}\), and so on.

We write them as \(.1 , .2, .3\), and so on. The decimal point is a code that tells us that the digit just beyond it is to be divided by ten.

We can extend this to integers divided by one hundred, by adding a second digit after the decimal point. Thus \(.24\) means \(\frac{24}{100}\). And we can keep right on going and describe integers divided by a thousand or by a million and so on, by longer and longer strings of integers after the decimal point.

However we do not get all rational numbers this way if we stop. We will only get rational numbers whose denominators are powers of ten. A number like 1/3 will become \(.33333....\), where the threes go on forever. (This is often written as \(.3*\), the star indicating that what immediately precedes it is to be repeated endlessly)

To get all rational numbers using this decimal notation you must therefore be willing to go on forever. If you do so, you get even more than the rational numbers. The set of all sequences of digits starting with a decimal point give you all the rational numbers between 0 and 1 and even more. What you get are called the **real numbers** between 0 and 1. The rational numbers turn out to be those that repeat endlessly, like \(.33333....\), or \(.1000....\), or \(.14141414....\), (aka \(.(14)*\)).
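One way to see why the expansion of a rational number must repeat is to carry out long division and watch the remainders: there are only finitely many possible remainders, so one of them must eventually recur, and from then on the digits cycle. The following Python sketch does this; the function name `decimal_expansion` and its return format are assumptions made for illustration.

```python
def decimal_expansion(numerator, denominator):
    """Digits after the decimal point of numerator/denominator,
    assuming 0 <= numerator < denominator.
    Returns (digits_before_the_cycle, repeating_block) as strings."""
    digits = []
    first_seen = {}          # remainder -> position where it first occurred
    remainder = numerator
    while remainder not in first_seen:
        first_seen[remainder] = len(digits)
        remainder *= 10
        digits.append(str(remainder // denominator))
        remainder %= denominator
    start = first_seen[remainder]
    return "".join(digits[:start]), "".join(digits[start:])

for p, q in [(1, 3), (1, 10), (14, 99), (1, 6)]:
    head, block = decimal_expansion(p, q)
    print(f"{p}/{q} = .{head}({block})*")
```

A fraction whose expansion terminates, such as \(\frac{1}{10}\), shows up here with repeating block `0`.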

Now neither you nor I nor any computer are really going to go on forever writing a number so there is a sense of unreality about this notion of real numbers, but so what? In your imagination you can visualize a stream of numbers going on forever. That will represent a real number.

If you stop a real number after a finite number of digits, you get a rational number (because all its entries after where you stopped are zeroes). As a result, the rules of addition, subtraction, multiplication and division that work for rational numbers can be used to do the same things for real numbers as well. Fortunately, the digits that are far to the right of the decimal point in a number have little effect on computations when there are non-zero digits much closer to the decimal point.

Since we cannot in real life go on forever to describe a non-rational real number, to do so we have to describe it some other way. Here is an example of a different way to describe a number.

We define the number that has decimal expansion \(.1101001000100001....\); between **each consecutive pair of \(1\)'s there is a number of \(0\)'s that is one more than between the previous consecutive pair of \(1\)'s.** This number is not rational; it does not repeat itself.

We do not have to, but just for the fun of it, we will go one step further and extend our numbers once more, to complex numbers. This is required if you want to define inverses to the operation of squaring a number. (Complex numbers are entities of the form \(a+bi\) where \(a\) and \(b\) are real numbers and \(i\) squared is \(-1\).)

Among the operations of multiplication is that of squaring a number. This is the operation of multiplying a number by itself. Thus \(5\) times \(5\) is \(25\). We can ask for the inverse of this squaring operation. This is an operation that acting on \(25\) should give back \(5\). This operation has a name: it is called the **square root.** A square root of \(25\) is \(5\).

There are two wonderful complications here. The first is that \(-5\) times \(-5\) is also \(25\), so \(25\) has two square roots, \(5\) and \(-5\). And the same thing holds for any positive real number. Any positive real number has two square roots.

The second complication is: what on earth is the square root of a negative number?

Well no real number has square that is \(-2\) or \(-1\) or minus anything positive.

When we found that subtraction, which is something of an inverse operation to addition, among natural numbers led to non-natural numbers, we extended the natural numbers by defining the **integers** to include both the natural numbers and their negatives and zero as well.

When we considered division, which is an inverse operation to multiplication, we extended our numbers again to include **fractions.**

Well, to accommodate the inverse operation to squaring a number, we can also extend our numbers to include new entities among which we can find square roots of negative numbers.

It turns out to do this we need only introduce one new number, usually designated as **i**, which is defined to have square given by \(-1\). In other words, we define the new number i to obey the equation \(i * i = -1.\) We can get numbers whose squares are any other negative number, say \(-5\), by multiplying \(i\) by an appropriate real number, here by the square root of \(5\). The number **\(i\)** is definitely not a real number, so we call it **an imaginary number;** this nomenclature is in fact silly. Imaginary numbers have just as much existence in our imaginations as real numbers have. Of course they are not natural numbers or integers or even fractions, or real numbers at all.

It turns out that if we look at numbers of the form \(a + bi\) where \(a\) and \(b\) are real numbers, we get what are called **the complex numbers,** and we can define addition, subtraction, multiplication, and division for these just as we can for rational or real numbers.
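As a rough sketch of how these operations work, a complex number \(a + bi\) can be treated as the pair \((a, b)\), and the product and quotient follow from \(i * i = -1\). The helper names `c_mul` and `c_div` below are hypothetical; the results are compared with Python's built-in `complex` type.

```python
def c_mul(a, b, c, d):
    """(a + bi)(c + di) = (ac - bd) + (ad + bc)i, using i*i = -1."""
    return a * c - b * d, a * d + b * c

def c_div(a, b, c, d):
    """(a + bi)/(c + di): multiply top and bottom by (c - di).
    Assumes c and d are not both 0."""
    denom = c * c + d * d
    return (a * c + b * d) / denom, (b * c - a * d) / denom

print(c_mul(0, 1, 0, 1))                       # i times i
print(c_mul(1, 2, 3, 4), complex(1, 2) * complex(3, 4))
print(c_div(1, 2, 3, 4), complex(1, 2) / complex(3, 4))
```

Addition and subtraction are done component by component, so they need no special rule.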


So by numbers we will mean things like **the rational numbers, the real numbers or complex numbers,** among which the operations of addition, subtraction, multiplication and division are defined and have all the standard properties.

By the way, we often represent complex numbers by points in the plane. Real numbers correspond to points on the x-axis, and imaginary numbers can be considered points on the y-axis. The number \(i\) is a distance \(1\) above the origin on the y-axis. A general complex number has a real part that is described by its \(x\) component and an imaginary part described by its \(y\) component.

**A set is said to be countable, if you can make a list of its members**. By a **list we mean that you can find a first member, a second one, and so on, and eventually assign to each member an integer of its own**, perhaps going on forever.

**The natural numbers are themselves countable: you can assign each natural number to itself.** The set \(Z\) of integers is countable: make the odd entries of your list the positive integers, and the even entries the rest, with the even and odd entries ordered from smallest magnitude up. Here is how this particular sequence of numbers begins:

\[1, 0, 2, -1, 3, -2, 4, -3, ...\]

(If a set is countable you can list it in lots of ways.)
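The listing of the integers described above can be sketched as a Python generator. The name `integers` is an illustrative choice; the generator never stops, so we only look at its first few entries.

```python
from itertools import count, islice

def integers():
    """Yield 1, 0, 2, -1, 3, -2, ...: positive integers in the odd
    positions of the list, zero and the negatives in the even positions."""
    for k in count(1):
        yield k        # positions 1, 3, 5, ...
        yield 1 - k    # positions 2, 4, 6, ...

print(list(islice(integers(), 8)))
```

Every integer is reached after finitely many steps, which is all that being on the list requires.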

**The positive rational numbers are also countable**, and here is why. Take first all those whose numerator and denominator sum to \(1\), then \(2\), then \(3\), and so on. When several do so, order them by size. Here is how this list begins: \(\frac{0}{1}, \frac{1}{1}, \frac{1}{2}, \frac{2}{1}, \frac{1}{3}, \frac{2}{2}, \frac{3}{1}, \frac{1}{4}, \frac{2}{3}, \frac{3}{2}, \frac{4}{1}\), and so on. Every positive rational number appears on this list somewhere, and actually appears often on it. (This happens because \(\frac{1}{2}\) appears as \(\frac{1}{2}\) and also as \(\frac{2}{4}\) and \(\frac{3}{6}\) and so on.) But all fractions eventually appear, and appear over and over again.
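A minimal Python sketch of this listing follows. The generator name `rationals_by_sum` is hypothetical, and the code emits \(\frac{0}{1}\) first so that its output matches the list as written in the text; after that it walks through the sums \(2, 3, 4, \ldots\), taking numerators in increasing order, which is the same as ordering by size.

```python
from itertools import count, islice

def rationals_by_sum():
    """Yield (numerator, denominator) pairs grouped by numerator + denominator."""
    yield (0, 1)
    for total in count(2):
        # With the sum fixed, a larger numerator means a larger fraction.
        for numerator in range(1, total):
            yield (numerator, total - numerator)

print(["%d/%d" % pair for pair in islice(rationals_by_sum(), 11)])
```

Duplicates such as `2/2` are not removed, just as in the text.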

**The set of all positive and negative rational numbers is also countable,** and we can list them all together by taking alternately from lists of each separately, as done above for integers.

\[\frac{0}{1}, \frac{1}{1}, \frac{-1}{1}, \frac{1}{2}\]

Rational numbers are described by pairs of integers, and the arguments above generalize to imply that any collection of pairs of members of a countable set is countable. And this can be generalized to the statement that a countable set of countable sets is countable.

**It follows then that algebraic numbers**, which are all solutions to polynomial equations of finite degree with integer coefficients, are countable. There are a countable number of finite degrees, a countable number of sets of coefficients for each degree, and a finite (hence countable) number of solutions for each equation, so the algebraic numbers form a countable set of countable sets of countable sets, which is still countable.

**On the other hand, the set of all decimal expansions is not countable.**

**How come?**

Well, if we had a list of all decimal expansions, we could easily construct a number that cannot be on it. Just **make its entry \(j\) places beyond the decimal point differ by \(2\) from the entry in that place of the number that is \(j^{th}\) on your list** . Then the number we create differs from every number on your list and therefore cannot be on it. All of this means that we cannot list all the real numbers or decimal expansions!

Neither you nor I could actually do the infinite number of acts necessary to construct such a number, but we can imagine it done.

To visualize this imagine the first three numbers on your list were

\(.101010*\)

\(.314720*\)

\(.71324*\)

A number not on the list by the construction given would start by \(.335\) since this sequence differs from the first number by \(2\) in the first place, differs from the second number by \(2\) in the second place and from the third member by 2 in the third place. The number we ultimately get will definitely differ from the three numbers above. With the rest of its digits similarly determined in reference to the next numbers in sequence on the list, we can deduce that this number cannot be on the list anywhere, so long as the number of its digits is the same as the length of the list.
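The construction can be sketched in Python for the three expansions shown. The function name `diagonal_digits` is an illustrative choice, and wrapping around for the digits \(8\) and \(9\) (so that \(8\) becomes \(0\) and \(9\) becomes \(1\)) is an assumption the text does not spell out; any rule that changes the digit would serve.

```python
def diagonal_digits(listed):
    """Given the digit strings of the first few numbers on a list,
    return digits that differ from the j-th number in its j-th place."""
    new_digits = []
    for j, expansion in enumerate(listed):
        # Change the j-th digit of the j-th number by adding 2 (mod 10).
        new_digits.append(str((int(expansion[j]) + 2) % 10))
    return "".join(new_digits)

listed = ["101010", "314720", "71324"]
print("." + diagonal_digits(listed))
```

With more numbers on the list, the same loop simply produces more digits.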

This means that the set of all decimal expansions cannot possibly be listed. The decimal expansions are uncountable.

**Are decimal expansions the same as real numbers?**

Actually every real number between 0 and 1 has a decimal expansion, but some, namely those rational numbers that end with all \(0\)’s, appear twice as decimal expansions. The reason is that \(0.99999...\) is really the same as \(1.00000\). Since these are a subset of the rational numbers, this difference is of no particular importance.

**Exercise: Subtract \(.9*\) from \(1.0*\). What do you get?** If the \(9\)'s stopped somewhere, you could subtract and get a \(1\) in the next place and \(0\)'s everywhere else. But what happens when the \(9\)'s never stop?

*The aim of this chapter is to familiarize you with use of a spreadsheet in mathematics.* The first section describes what to do with one, and the later ones describe applications to investigating Fibonacci numbers, Binomial coefficients, and Areas of irregular figures.

A spreadsheet is a rectangular arrangement of boxes looking like a gigantic empty crossword puzzle.

Why bother with one?

You will see that you can do amazing things with them with very little effort. We’ll do some as illustrations. You should try doing them by yourself. When you are done, you will know enough about spreadsheets to use them productively to solve problems and check work. You can do anything doable on a graphing calculator, but you can see all the results and intermediate steps and correct anything any time.

How do you do things?

You enter things into the boxes. You can left click your mouse onto any box, and then type in your entry. (There are analogous ways to do this on a mobile device.) By the way, each box has a name given by its column letter (columns run from A to Z, then AA to AZ, then BA to BZ, etc.) and a row number (rows run from 1 into the thousands).

What can you enter?

**Ordinary prose (or poetry): to do so just type it in.**

**Numbers: type them in.**

**Any function you have ever heard of and lots more with variable the content of some other box:**

For example, typing =sin(A2) in B2 will put the sine of the number in A2 (given in radians) into box B2.

You must start by typing in an equal sign, then any function you know the name of, or can pick from the list of functions given by the spreadsheet. On my spreadsheet you can click on ‘formulas’ at the top of the page and see and choose the one you want. There are so many listed that you may get dizzy if you try to look at the list, but you will recover.

Of course you can also use parentheses and combine many functions to make your own complicated formulas. For example =sqrt(sin(A2)*exp(A3)/(1+atan(A5))) will give the square root of the product of the sine of what is in A2 and (e to the power that is in A3) all divided by (1 plus the angle whose tangent is in A5, described in radians between \(-\frac{\pi}{2}\) and \(\frac{\pi}{2}\)).

OK but you can do all this on a calculator.

The best features come from what happens when you copy what is in one box (or a rectangle of boxes) elsewhere.

**When what you have in box B2 refers to some other box, say A2, when you copy B2 somewhere else, the reference box moves with it.** Thus if in B2 you have put =sin(A2), and you copy B2 into, say, D2, then D2 will contain =sin(C2). Copying B2 into R7 will put =sin(Q7) there. If I remember my alphabet correctly, B comes right after A, D right after C, and R right after Q.

OK, how do I copy?

**You click on the box you want to copy, press Ctrl and c at the same time, and the entry will enter "the clipboard". You may then move the cursor to where you want to copy, and press Ctrl and v at the same time.** Try it and see. (by the way, if you have done something you didn’t want to do, then **pressing Ctrl and z at the same time undoes it**.)

Suppose I don’t want the reference to change when I copy something?

**All you have to do is put a dollar sign (a $) in front of the index (letter or number or both) that you do not want to change.** Thus, =sin($A2) will not change the column index which will stay A. Similarly =sin(A$2) anywhere will keep the reference in the second row, and putting dollar signs in front of both will keep the reference box A2 no matter where you copy it.
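The way references move when copied, and the way a dollar sign pins them, can be imitated in a short Python sketch. This is not how any spreadsheet is actually implemented; the function `copy_formula` and its helpers are hypothetical, and the pattern assumes function names are written in lower case (as in the examples here) so that only box names like A2 or $A$2 are matched.

```python
import re

REF = re.compile(r"(\$?)([A-Z]+)(\$?)(\d+)")   # optional $, column, optional $, row

def col_to_num(col):
    """A -> 1, ..., Z -> 26, AA -> 27, ..."""
    n = 0
    for ch in col:
        n = n * 26 + ord(ch) - ord("A") + 1
    return n

def num_to_col(n):
    """Inverse of col_to_num, for n >= 1."""
    col = ""
    while n > 0:
        n, r = divmod(n - 1, 26)
        col = chr(ord("A") + r) + col
    return col

def copy_formula(formula, source, target):
    """Rewrite formula as if it were copied from box source to box target."""
    s, t = REF.fullmatch(source), REF.fullmatch(target)
    dcol = col_to_num(t.group(2)) - col_to_num(s.group(2))
    drow = int(t.group(4)) - int(s.group(4))

    def shift(m):
        col_lock, col, row_lock, row = m.groups()
        if not col_lock:                       # no $ before the letter: column moves
            col = num_to_col(col_to_num(col) + dcol)
        if not row_lock:                       # no $ before the number: row moves
            row = str(int(row) + drow)
        return f"{col_lock}{col}{row_lock}{row}"

    return REF.sub(shift, formula)

for f in ["=sin(A2)", "=sin($A2)", "=sin(A$2)", "=sin($A$2)"]:
    print(f, "copied from B2 to R7 becomes", copy_formula(f, "B2", "R7"))
```

The sketch does not guard against references that would move off the edge of the sheet.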

Even better, you can copy a whole rectangle, or copy a single box everywhere in a rectangle.

How?

Suppose you want to copy the contents of box B2 into the rectangle with corners C5 and E100.

**First you click on box B2 and press Ctrl and c at the same time.**

The next step is selecting the target block of boxes. **To do this you move the cursor to C5, do a left click, and hold down the shift key while you move the cursor to E100. Then do a left click.** The boxes in the rectangle should then show that they have been "selected". Finally you press Ctrl and v. This should do it.

Try doing this a few times.

Is there an easier way to copy?

**Yes. You can copy material down or to the right (and using the menu you can copy to the left or up as well.)**

**To do this ‘select’ the material (all in one row) that you want to copy down along with the places below it that you want to copy them into.** (Select as described above.) **Next press Ctrl and d simultaneously. (you can hold down the Ctrl key and while it is down press on the d key.)**

**To copy to the right you select appropriately in one column and press Ctrl and r together similarly.**

To go up or to the left there is an icon near the top right on the home menu which brings down your choice of whatever direction you want to copy.

Spreadsheets today allow you to do so many things that they are scary. Old Excel (2007) has 7 columns of menu pages, each of which allows a choice among roughly 20 menus, which you can drop down to reveal many, many options. But you can ignore them all if you know how to enter functions and copy. Well, if you want to save what you have done in a file, you can press Ctrl and s together. You will then have to state what file you want to save to.

OK what can I do with this stuff?

The copying properties just discussed allow quick generation of functions, derivatives, and sums and integrals of functions, whatever these words mean.

As an example, let's look at the Fibonacci numbers. They were studied first by Fibonacci during the Middle Ages. They are defined by the following conditions:

**\(F(0) = 0, \: F(1) = 1\)**

and for all integer arguments, we have

**\(F(j+2) = F(j+1) + F(j)\)**

In words, each Fibonacci number is the sum of the previous two of them.

These numbers have lots of interesting properties, and we shall look at two of them.

Start by entering the words Fibonacci numbers in say box A1. (In case you ever want to look later at what you are doing now, having a label helps.)

Add the following labels: n in A9, F(n) in B9, Golden ratio in C9, Partial sums in D9, and F(-n) in E9.

Then enter \(0\) in A10 and =A10+1 in A11.

Now copy this down column A to A60.

What do you see? Not much; you see the integers from \(0\) to \(50\).

OK. Now in B10 enter \(0\) and in B11, enter \(1\). Then enter =B10+B11 in B12.

Copy B12 down column B to B60.

You will see the Fibonacci numbers in that column, from argument \(0\) up to \(50\).
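For comparison, here is a minimal Python sketch of what columns A and B compute. The function name `fibonacci_column` is an illustrative choice; the list index plays the role of column A, and each new entry is the sum of the two above it, just as =B10+B11 does when copied down.

```python
def fibonacci_column(last_index=50):
    """Return [F(0), F(1), ..., F(last_index)]."""
    column = [0, 1]                               # B10 and B11
    while len(column) <= last_index:
        column.append(column[-2] + column[-1])    # =B10+B11, copied down
    return column

column_b = fibonacci_column(50)
for n in (0, 1, 2, 3, 10, 50):
    print(n, column_b[n])
```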

Next let us look at the ratio of Fibonacci numbers to their immediate predecessors.

Do this by entering **=B12/B11** in C12, and copying it down to C60.

What do you see?

Let’s figure out what the number you are seeing is. Suppose what is in B41 is \(x\) times what is in B40, and what is in B42 is similarly approximately \(x\) times what is in B41 which is approximately \(x^2\)B40.

This means that \(x^2\) B40 is approximately B42, which equals B41 + B40, which is approximately \(x\) B40 + B40. Divide by B40 and we get the quadratic equation \(x^2 = x + 1\). So the ratio \(x\) that we are getting is a solution of this equation. The larger solution to this equation, which is what you see, is called the "Golden Ratio".
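The ratios in column C can be compared directly with the larger root of \(x^2 = x + 1\), which the quadratic formula gives as \(\frac{1+\sqrt{5}}{2}\). The following Python sketch does so; the variable names are illustrative, and ordinary floating-point division stands in for the spreadsheet's arithmetic.

```python
import math

def fibonacci(last_index):
    values = [0, 1]
    while len(values) <= last_index:
        values.append(values[-2] + values[-1])
    return values

fib = fibonacci(50)
# ratios[0] corresponds to C12 (=B12/B11), the last entry to C60.
ratios = [fib[n] / fib[n - 1] for n in range(2, 51)]
golden = (1 + math.sqrt(5)) / 2        # larger solution of x^2 = x + 1

for n in (2, 3, 5, 10, 20, 50):
    r = ratios[n - 2]
    print(n, r, abs(r - golden))       # the gap shrinks as n grows
```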

Now try the following: enter \(0\) in D10 and **=B11+D10** in D11. Copy D11 down column D to D60.

What you are getting in column D is the sum of the Fibonacci numbers up to the index (in column A) where you are. What can you say about this sum? Compare the entries in column B with those in column D, and describe their relation to each other. Also notice that the entry in D11, **=B11+D10**, copied down column D as done here, produces partial sums of the entries in column B. That means that the entry in D60, for example, is the sum of the first \(51\) Fibonacci numbers, \(F(0)\) through \(F(50)\).
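The running-total pattern of column D has a direct counterpart in Python's `itertools.accumulate`. The sketch below prints columns A, B and D side by side for the first few rows so that you can make the comparison yourself; the variable names are illustrative.

```python
from itertools import accumulate

def fibonacci(last_index):
    values = [0, 1]
    while len(values) <= last_index:
        values.append(values[-2] + values[-1])
    return values

column_b = fibonacci(50)
# Each entry is the previous total plus the current Fibonacci number,
# which is what =B11+D10 does when copied down column D.
column_d = list(accumulate(column_b))

print(" n  F(n)  partial sum")
for n in range(12):
    print(f"{n:2d} {column_b[n]:5d} {column_d[n]:6d}")
```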

Here’s something else you can do. The defining property of Fibonacci numbers is

\(F(a+2) = F(a+1) + F(a)\). We can also write that as \(F(a) = F(a+2) - F(a+1)\). This allows us to define Fibonacci numbers with negative arguments. Thus \(F(-1) = F(1)-F(0) = 1-0 = 1\), \(F(-2) = F(0) - F(-1) = -1\), and so on.

So put \(0\) in E10, put \(1\) in E11, and in E12 enter **=E10-E11.** Then copy E12 down column E to E60.

**The entries in column E will be the Fibonacci numbers with negative arguments: the row with \(n\) in column A holds \(F(-n)\).**
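Column E can be sketched in Python in the same way as column B, with subtraction in place of addition. The function name `negative_fibonacci` is an illustrative choice; entry \(n\) of the returned list corresponds to the row with \(n\) in column A.

```python
def negative_fibonacci(last_index=50):
    """Return the column E values for rows n = 0, 1, ..., last_index."""
    column = [0, 1]                               # E10 and E11
    while len(column) <= last_index:
        column.append(column[-2] - column[-1])    # =E10-E11, copied down
    return column

print(negative_fibonacci(12))
```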

What can you say about negative argument Fibonacci numbers?

By the way, the Fibonacci numbers with positive arguments count the number of different ways of inserting \(n-1\) identical dominoes into a \(2\) by \(n-1\) grid, so that each domino covers two adjacent boxes and no box is covered twice.


**Exercises:**

**2.1 Set this all up on your own machine.**

**2.2 Prove that the Fibonacci numbers count the number of different ways of inserting \(n -1\) dominoes into a \(2\) by \(n-1\) grid, so that each domino covers two adjacent boxes.**

**2.3 Make a definition of convergence of a sequence that reflects the property of the ratio of Fibonacci numbers to their predecessors that you see in column C.**

**2.4 This procedure produces a solution to the quadratic equation indicated above. Given any quadratic with integer coefficients, we can produce a recursion as above and by substituting it into B4 and copying it down, look at what happens to it. Try doing this with some quadratics, and find another for which we get a solution like we do for Fibonacci numbers, and one which we don't. What happens with the cubic \(x^3 = x + 1\)?**

This time we put integers at the left, as we did before, but also along the top.

Thus we set A3 to \(0\) and A4 to =A3+1 and copy A4 down to A13. We set C1 to \(0\) and D1 to =C1+1 and copy D1 to the right to M1.

Now, we set C3 to be =B2+C2. Copy C3 (using Ctrl c). Select the rectangle from C3 to M13. Paste in the rectangle (using Ctrl v). (By the way there are icons near the top left for copying and pasting that you can use instead of using the Ctrl c or Ctrl v.)

What do you see? You should see all \(0\)'s.

Now set C3 to be \(1\). You should now see a slanted Pascal triangle bordered by \(1\)'s in the selected area.

The content of any selected box is the binomial coefficient \(C(n,k)\) or \(\frac{n!}{k!(n-k)!}\) where \(n\) is the number in the A column of the box and \(k\) is the one in its first row.
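Here is a rough Python sketch of the same rectangle, under the assumption that empty boxes count as \(0\), which is how the spreadsheet treats them in sums. The function name `pascal_block` is hypothetical. Each box is the sum of the box above it and the box above and to its left, except the top left box, which is set to \(1\); the result is compared with `math.comb` (available in Python 3.8 and later).

```python
from math import comb

def pascal_block(size=11):
    """Rows n = 0..size-1 and columns k = 0..size-1 of the rectangle C3:M13."""
    rows = []
    above = [0] * (size + 1)      # the empty row 2; index 0 stands for column B
    for n in range(size):
        current = [0]             # column B is empty
        for k in range(size):
            if n == 0 and k == 0:
                current.append(1)                        # C3 set to 1
            else:
                current.append(above[k] + above[k + 1])  # up-left + up
        rows.append(current[1:])
        above = current
    return rows

block = pascal_block()
for row in block[:6]:
    print(row[:6])
print(all(block[n][k] == comb(n, k) for n in range(11) for k in range(11)))
```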

We shall next see how we can use a spreadsheet to calculate areas.

**We have not yet defined functions so this section is way ahead of us. If you encounter something that bothers you here, stop, go on to the next chapter and come back here later. If what you see below makes sense, then keep going.**

**Integration has a geometric meaning. Given a positive function \(f\), the Definite Integral of \(f\) between \(A\) and \(B\) means the area between the plot of the function \(f(x)\) and the x-axis, from a fixed starting value \(A\) of \(x\) to another value \(B\) with \(B > A\).**

If the function is constant, that area is just the product of \((B-A)\) (the length of the interval) with the constant value of \(f\), because the figure whose area we are computing is a rectangle, with sides at \(x = A\) and \(x = B\), top at \(y = f\), and bottom at \(y = 0\).

**Otherwise, we can divide the interval from \(A\) to \(B\) into slivers of length \(d\) and calculate the area by the sum of the areas in each sliver. (We count area below the x axis as negative when the function is negative and when \(B < A\) positive becomes negative and vice versa.) We will choose slivers all of length \(d\), and approximate the area in each sliver.**

There is an interesting question here: **what do you do to approximate the area in a sliver?**

**A sliver has width \(d\), and we choose an approximate height, so this question becomes: what height should we assign to the area between say \(s\) and \(s + d\)?**

There are three very simple ways to do this. One way is to use \(f(s)\), and another to use \(f(s+d)\) and another is to use their average.

These ways of estimating have names! They are, the **left hand rule, the right hand rule and the trapezoid rule.** The contribution to area from each sliver will be this estimate multiplied by \(d\).

Happily, the only difference between these in the answer you get for the area in question comes from the contributions \(f(A)d\) and \(f(B)d\). All other intermediate points contribute the same amount no matter which of these "rules" is used.

This happens because the end of one sliver is the beginning of the next, and the contribution to the sum from point \(s\) is \(f(s)d\) no matter which of these methods is used. If you use the value of \(f\) on the left side of intervals, then you get \(f(s)d\) from the interval starting at \(s\); if you use the right side value of \(f\) you get the same thing coming from the interval ending at \(s\); and if you use their average, you get half from either interval.

This means the only difference comes from the first and last intervals. With the "left rule", you get \(f(A)d\) but not \(f(B)d\), and vice versa for the "right rule", and \(\frac{(f(A)+f(B))d}{2}\) from the average or "Trapezoid Rule". In other words, in the trapezoid rule you get \(f(s)d\) for every interior point \(s\), and only \(\frac{f(A)d}{2}\) and \(\frac{f(B)d}{2}\) at the endpoints \(A\) and \(B\). The trapezoid rule turns out to be the best of the three.

So we will estimate the sum using \(f(s)d\) for values s between A and B **inclusive**, and subtract \(\frac{(f(A)+f(B))d}{2}\) from the total, and that will give us the answer supplied by the trapezoid rule. Later on we will see that this is much better than either of the others, because its error is proportional to \(d^2\) while the others each differ from the actual area by a linear term \(\frac{|f(A)-f(B)|d}{2}\) as well.
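The relation among the three rules can be sketched in Python as follows. The function name `area_estimates` and the choice of \(\sin x\) on the interval from \(0\) to \(\frac{\pi}{2}\) (where the exact area is \(1\)) are assumptions made for illustration. All three estimates start from the same inclusive sum and differ only in what is removed at the endpoints.

```python
import math

def area_estimates(f, a, b, steps):
    """Return (left, right, trapezoid) estimates using slivers of width d."""
    d = (b - a) / steps
    values = [f(a + j * d) for j in range(steps + 1)]
    inclusive = sum(values) * d                    # f(s)d for every s from A to B
    left = inclusive - values[-1] * d              # drop f(B)d
    right = inclusive - values[0] * d              # drop f(A)d
    trapezoid = inclusive - (values[0] + values[-1]) * d / 2
    return left, right, trapezoid

left, right, trap = area_estimates(math.sin, 0.0, math.pi / 2, 100)
print("left      error", left - 1)
print("right     error", right - 1)
print("trapezoid error", trap - 1)
```

Rerunning with a different number of steps gives a feel for how each error shrinks with \(d\).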

Calculating the sum of the contents of consecutive boxes in a column is what you did in column D with Fibonacci numbers. To get in column C the sum of what is in column B from row 5 on, you enter =B5+C4 into C5 and copy it down that column.

This will compute the left hand rule estimate for areas in column C. By putting =C5-(B$5+B5)/2 in D5 and copying it down, we convert the left hand rule to the trapezoid rule, which will be displayed at each intermediate point by what is in column D. The -B$5/2 takes away half of the contribution at \(x = A\), and the -B5/2 takes away half of the contribution at the other end.

We start by putting the choice for d in B2; putting the starting value for \(x\), \(A\), in B3.

We do this so we can easily change these when we want to.

Column A will contain the values of \(x\) from A on.

Entry Bk will contain \(d\) times the value of your function at \(x = \text{Ak}\).

As an illustration we will estimate the integral of the function \(\sin x\).

You can set this up starting at the fifth row by putting =B3 in A5. Then set A6 to =A5+B$2, and copy A6 down column A. That will be the value of your variable.

In B5 put =B$2*sin(A5) and copy this down column B.

In C5 put =B5+C4 and copy that down column C.

In D5 put =C5-(B$5+B5)/2 and copy down column D.

**If you do this, you can change d just by inserting a different value for it in B2. You can change the starting point by entering the new one in B3. You can change the function you want to integrate by replacing sin(A5) by your new f(A5) and copying =B$2*f(A5) down column B.**

The estimate of the area starting at x=A5 and ending at x=A5+kd using the Left Hand Rule will appear in column C in the row whose A value is **A5+(k-1)d**. (This box will have the sum of \(k\) terms of the form \(\sin(x)d\).)

The entries in column D convert the left hand rule to the Trapezoid rule. Thus what appears in the row with A value B3+kd will be the trapezoid rule estimate of the area between the x-axis, the sine curve and the lines x=B3 and x=B3+kd.
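The four columns can be imitated in Python to check a spreadsheet against. This is a sketch under the layout described above (\(d\) in B2, the starting value in B3, data from row 5 down); the function name `area_columns` and its default arguments are illustrative choices. The exact area under \(\sin x\) from \(1\) to \(2\) is \(\cos 1 - \cos 2\), which gives something to compare the last entry of column D with.

```python
import math

def area_columns(d=0.01, start=1.0, rows=101, f=math.sin):
    """Return lists for columns A, B, C, D, beginning with row 5."""
    col_a, col_b, col_c, col_d = [], [], [], []
    running = 0.0                                    # the empty box C4
    for j in range(rows):
        x = start + j * d                            # column A
        b = d * f(x)                                 # =B$2*sin(A5)
        running += b                                 # =B5+C4
        col_a.append(x)
        col_b.append(b)
        col_c.append(running)
        col_d.append(running - (col_b[0] + b) / 2)   # =C5-(B$5+B5)/2
    return col_a, col_b, col_c, col_d

a, b, c, d_col = area_columns()
print("x at last row      ", a[-1])
print("trapezoid estimate ", d_col[-1])
print("cos(1) - cos(2)    ", math.cos(1) - math.cos(2))
```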

This is an estimate to the area; we can do better and will, later on.

This is what the spreadsheet should look like with \(d = 0.01\) and \(A = 1\).


Now select columns A and B from A5 to B105, and insert an xy scatter chart. What do you see?

**How can we do better?**

If you add a column E which is like C except jumps by \(2\), that is in E5 put =2*B5+E3 and copy down, and correct this to the trapezoid rule in column F by putting in F5 =E5-(B$5+B5) and copying down, and finally putting =(4*D5-F5)/3 in column G you will get the Simpson’s rule estimate to the area in question in the odd entries of column G. (like rows \(5\), \(7\), \(9\), etc.) The even entries will be useless junk.

**What the devil is this?**

The odd entries in E and F repeat the previous calculation with \(d\) replaced by \(2d\). The error in the trapezoid rule behaves as \(d^2\); the term in that error that behaves as \(d^2\) will cancel out if you take **\(4\) times the \(d\) computation and subtract the \(2d\) one**. The result will be roughly \(3\) times the actual result. Thus **dividing 4D5-F5 by 3** gives a rule for the area whose error actually goes as \(d^4\). It is called **Simpson's rule**.
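Here is a small sketch in Python of the same cancellation, in case you want to watch it happen. The function names `trapezoid` and `simpson_from_trapezoids` are my own, and the interval from \(1\) to \(2\) with \(100\) steps is just an assumed example.

```python
import math

def trapezoid(f, a, b, n):
    """Trapezoid rule estimate of the area under f from a to b using n equal steps."""
    d = (b - a) / n
    inner = sum(f(a + i * d) for i in range(1, n))
    return d * (inner + (f(a) + f(b)) / 2)

def simpson_from_trapezoids(f, a, b, n):
    """Combine the step-d and step-2d trapezoid estimates as (4*T_d - T_2d) / 3."""
    if n % 2:
        raise ValueError("n must be even so that step 2d fits the interval")
    return (4 * trapezoid(f, a, b, n) - trapezoid(f, a, b, n // 2)) / 3

exact = math.cos(1.0) - math.cos(2.0)                 # area under sin x from 1 to 2
err_d = trapezoid(math.sin, 1.0, 2.0, 100) - exact    # step d = 0.01
err_2d = trapezoid(math.sin, 1.0, 2.0, 50) - exact    # step 2d = 0.02
simpson_err = simpson_from_trapezoids(math.sin, 1.0, 2.0, 100) - exact
print("trapezoid error ratio (2d versus d):", err_2d / err_d)
print("Simpson error:", simpson_err)
```

Doubling the step should multiply the trapezoid error by about \(4\), which is what an error behaving as \(d^2\) predicts, and the combined estimate should be far closer to the true area than either trapezoid estimate.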

This will be discussed in detail in Chapter 14.

Suppose we have a linear function, say, \(f(x) = 5x + 3\).

We now address the following questions:

*1. How can we evaluate this function at an arbitrary argument, \(x\), on a spreadsheet?*

*2. How can we evaluate it at a whole lot of arguments?*

*3. How can we plot it?*

We will see that once the first of these questions is addressed, the rest are quite easy to do. They were harder in the old days.

One nice feature of what you can do is that if you set this up once, you can change the linear function at will and watch how the plot changes instantly, as in the mathlet.

Just in case you want to keep what you are doing you will be wise to give it labels so at some future time you will know what you have.

So as a preliminary, you might enter in box A1 the title: *Linear Functions.*

Some more preliminaries: in A2 write the word *slope*, and in B2 enter the number 5. (Later on you can change this to anything else you want.)

In A3 enter the words: *y intercept,* and in B3 enter the number 3.

In A4 enter: *starting argument* and in B4 enter -1.

In A5 enter: *spacing* and in B5 enter .01.

(When you want to plot your function, you can only do it over a finite interval, and these last lines are useful for creating an interval.)

Now you are ready to start.

In A9 enter the symbol x and in B9 enter f(x). These are labels for the columns below them.

In A10 enter =B4

In B10 enter =B$2*A10 +B$3

You now have the answer to the first question. The number that appears in box B10 will be the value of your function at the argument given in B4 (at this point that argument is -1, and with function 5x + 3 the value in B10 should be -2.)

You can evaluate this function anywhere else you please, by changing the entry in B4 to whatever you please.

**Suppose I want to change the slope or the y intercept of my function?**

You can do that by changing the entries in B2 or B3. The value of the changed function at the argument in A10 will appear in B10.

**What are these funny dollar signs that I have put in B10?**

To answer the second and third questions above we are going to copy the instruction in B10 into other boxes as well. When we do that, the references which do NOT have dollar signs in front of them will change. Those with dollar signs will stay the same.

**How do the references change? What do you mean?**

Suppose we copy B10 to B11. Then what will appear in B11 will not be exactly what is in B10, but instead it will be =B$2*A11 +B$3. Because the A10 had no dollar sign in it, when we copied it down one row the 10 turned into an 11. The other terms did not change because we put dollar signs in front of them.

**What happens if you copy to a different column?**

The same kind of thing will happen. That is, if you copy what is in B10 to C11, you will get =C$2*B11 +C$3. All the column indices that do not have dollar signs in front of them will shift over one column, because you shifted over one column. The same goes for shifting any number of rows or columns.
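If the copying rule still seems mysterious, here is a toy sketch of it in Python. The function `copy_formula` is hypothetical and much cruder than what a real spreadsheet does: I assume single-letter column names and treat anything that looks like a letter followed by a row number as a reference.

```python
import re

# An optional $, one column letter, an optional $, then the row number.
_REFERENCE = re.compile(r"(\$?)([A-Z])(\$?)(\d+)")

def copy_formula(formula, rows_down, cols_right):
    """Return the formula as it would read after copying it to another box."""
    def shift(match):
        col_fixed, col, row_fixed, row = match.groups()
        if not col_fixed:                      # no $ before the column: it moves
            col = chr(ord(col) + cols_right)
        if not row_fixed:                      # no $ before the row: it moves
            row = str(int(row) + rows_down)
        return f"{col_fixed}{col}{row_fixed}{row}"
    return _REFERENCE.sub(shift, formula)

print(copy_formula("=B$2*A10 +B$3", 1, 0))   # B10 copied to B11
print(copy_formula("=B$2*A10 +B$3", 1, 1))   # B10 copied to C11
```

The two printed formulas should match the ones described above for B11 and C11.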

This property is what allows us to look at a function over a range and plot it by copying. Our plan is: **have the argument increase by d from row to row, which can be accomplished by putting one entry in A11 and copying it down the A column, then copy B10 down the B column.** That is all there is to answer the second question.

**OK, what goes into A11?**

We can enter =A10+B$5. This will increase the entry in column A in each row we copy it to by the amount in B5 over what it was in the previous row. If we do this in Column A, say down to row 500, and copy B10 also down to row 500, you will have a set of pairs for your function all ready to plot.
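For comparison, here is a minimal sketch in Python of what columns A and B hold after the copying. The name `linear_table` and its parameter names are my own; the values passed in are the ones entered in B2 through B5 above, and \(491\) rows corresponds to rows \(10\) through \(500\).

```python
def linear_table(slope, intercept, start, spacing, rows):
    """Return (x, f(x)) pairs like columns A and B, where f(x) = slope * x + intercept."""
    pairs = []
    x = start                                      # A10 holds =B4
    for _ in range(rows):
        pairs.append((x, slope * x + intercept))   # column B holds =B$2*A.. +B$3
        x = x + spacing                            # next A entry adds the spacing in B$5
    return pairs

table = linear_table(slope=5, intercept=3, start=-1, spacing=0.01, rows=491)
print(table[0])     # the pair in row 10
print(table[-1])    # the pair in row 500
```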

**OK, how do I copy?**

This varies somewhat from spreadsheet to spreadsheet. For many or most you do the following:

1. Highlight the box you want to copy.

2. Press [Ctrl] and c at the same time.

3. Highlight the boxes you want to copy to.

4. Press [Ctrl] and v at the same time.

There is another way that is easier if you are copying several columns down from the same row at once; it is called *fill* or *fill down* on the edit menu. Try it. You can also fill sidewise. (Here you could copy B10 into B11 and then fill A11 and B11 both together down to A500 and B500.) Experiment with these things until you get them to work. If you can't get them to work on your spreadsheet, ask someone how.

**OK, how do I get a graph of my function?**

Highlight columns A and B from row 10 or 11 to row 500 (or to wherever you copied to) and click on **"chart"** in the insert menu. You will get to another menu with lots of options. Click on **"x-y scatter"**, and you will get to your plot. You will be asked about inserting labels on it and asked where you want it. You can put it anywhere, but if you put it on the same sheet as your calculation, you can change the function or domain by changing what is in B2,...,B5 and see the results immediately. There are ways to adjust the size of the graph and where it is, that you have to figure out for yourself. I generally screw them up.

You can enter and plot lots of functions using the mathlet below, and you can plot curves defined parametrically as well.

**What is that?** Play with this mathlet and figure that out yourself!

The abstract definition of a function is described, along with properties of linear functions.

Functions are what we use to describe things we want to talk about mathematically. I find, though, that I get a bit tongue tied when I try to define them.

The simplest definition is: **a function is a bunch of ordered pairs of things (in our case the things will be numbers, but they can be otherwise), with the property that the first members of the pairs are all different from one another.**

Thus, here is an example of a function:

\[\{(1, 1), (2, 1), (3, 2)\}\]

This function consists of three pairs, whose first members are \(1, 2\) and \(3\).

It is customary to give functions names, like \(f, g\) or \(h\), and if we call this function \(f\), we generally use the following notation to describe it:

\[f(1) = 1, f(2) = 1, f(3) = 2\]

The first members of the pairs are called **arguments** and the whole set of them is called the **domain** of the function. Thus the arguments of \(f\) here are \(1, 2\) and \(3\), and the set consisting of these three numbers is its domain.

The second members of the pairs are called the **values** of the function, and the set of these is called the **range** of the function.

The standard terminology for describing this function f is:

The value of \(f\) at argument \(1\) is \(1\), its value at argument \(2\) is \(1\), and its value at argument \(3\) is \(2\), which we write as \(f(1) = 1, f(2) = 1, f(3) = 2\).

We generally think of a function as a set of assignments of values (second members of our pairs) to arguments (their first members).

The condition that the first members of the pairs are all different is the condition that a function assigns each argument in its domain a **unique** value in its range.
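If it helps to see this definition in action, here is a minimal sketch in Python using the three-pair function \(f\) from above. The helper `is_function` is my own name, and storing the pairs in a Python `dict` is just one assumed way to look up values by argument.

```python
def is_function(pairs):
    """True when no first member (argument) appears more than once in the list."""
    arguments = [argument for argument, _ in pairs]
    return len(arguments) == len(set(arguments))

f_pairs = [(1, 1), (2, 1), (3, 2)]
f = dict(f_pairs)               # lookup table: argument -> value
domain = set(f)                 # the first members
value_set = set(f.values())     # the second members (the range)

print(is_function(f_pairs), domain, value_set, f[3])
print(is_function([(1, 1), (1, 2)]))   # argument 1 would get two values
```

Notice that two different arguments may share a value, as \(1\) and \(2\) do here; it is only the arguments that must not repeat.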

**Exercise 3.1 Consider the function \(g\), defined by the pairs \((1, 1), (2, 5), (3, 1)\) and \((4, 2)\). What is its domain? What is the value of \(g\) at argument \(3\)? What is \(g(4)\)?**

If you stick a thermometer in your mouth, you can measure your temperature at some particular time. You can define a function \(T\), or temperature, which assigns the temperature you measure to the time at which you remove the thermometer from your mouth. This is a typical function. Its arguments are times of measurement and its values are temperatures.

Of course your mouth has a temperature even when you don't measure it, and it has one at every instant of time and there are an infinite number of such instants.

This means that if you want to describe a function \(T\) whose value at any time \(t\) is the temperature in your mouth at that time, you cannot really list all its pairs. There are an infinite number of possible arguments \(t\) and it would take you forever to list them.

Instead, **we employ a trick to describe a function \(f\):** we generally provide a rule which allows you, the reader, to choose any argument you like in \(f\)'s domain, and, by using the rule, to compute the value of your function at that argument. This rule is often called a **formula** for the function. The symbol \(x\) is often used to denote the argument you will select, and the formula tells you how to compute the function at that argument.

The simplest function of all, sometimes called **the identity function,** is the one that assigns as value the argument itself. If we denote this function as \(f\), it obeys

\[f(x) = x\]

for \(x\) in whatever domain we choose for it. In other words, both members of its pairs are the same wherever you choose to define it.

We can get more complicated functions by giving more complicated rules. (These rules are often called formulae, as we have noted already.) Thus we can define functions by giving any of the following formulae among an infinity of possibilities:

\[3x, x^2, x^2-1, \frac{3}{x}, x^3, \frac{x}{x^2 + 1}, 3x + 5, x^2 + 7x - 1\]

These represent, respectively, \(3\) times \(x\), \(x\) squared, \(x\) squared minus \(1\), \(3\) divided by \(x\), \(x\) cubed, \(x\) divided by the sum of the square of \(x\) and \(1\), and so on.

We can construct functions by **applying the operations of addition, subtraction, multiplication and division to copies of \(x\) and numbers in any way we see fit to do so.**

There are two very nice features of functions that we construct in this way, and the first applies to all functions.

We can draw a picture of a function, called its **graph,** on a piece of graph paper, or on a spreadsheet, chart, or with a graphing calculator. We can do it by taking argument-value pairs of the function and describing each by a point in the plane, *with \(x\) coordinate given by the argument and y coordinate given by the value for its pair.*

Of course it is impossible to plot all the pairs of a function that has an infinite domain, but we can get a pretty good idea of what its graph looks like by taking perhaps a hundred evenly spaced points in any interval of interest to us. This sounds like an impossibly tedious thing to do and it used to be so, but now it is not. On a spreadsheet, the main job is to enter the function **once** (with its argument given by the address of some other location). That and some copying is all you have to do, and with practice it can be done in \(30\) seconds for a very wide variety of functions.

The second nice feature is that **we can enter any function** formed by adding, subtracting, multiplying, dividing and performing still another operation, on the contents of some address **very easily on a spreadsheet** or graphing calculator. Not only that, these devices have some other built in functions that we can use as well.

Together, these two facts mean that we can actually look at any function formed by adding, subtracting, multiplying or dividing copies of the identity function \(x\), other built in functions, and any numbers we want, and see how it behaves, with very limited effort.

We will soon see that we can use the same procedure used for graphing functions to graph their derivatives (we have not defined these yet) as well, but that is getting ahead of the story. You should realize though that we can compute derivatives for most functions numerically with only a small amount of effort as well.

**Exercise 3.2 What is the value of the function \(x^2 + 3\) at \(x = 5\)? At argument \(10\)?**

**Would you please give some examples?**

The most fundamental function, the one that calculus is based upon, is the **linear function.** A linear function is a function whose graph consists of segments of one straight line throughout its domain.

Such a line is, you may remember, determined by any two points on it, say \((a, f(a)), (b, f(b))\). Thus, you can pick any \(a\) and any \(b\) in its domain and determine the line from the two values, \(f(a)\) and \(f(b)\).

**What is a formula for such a function?**

We can determine the linear function which takes value \(f(a)\) at \(a\) and \(f(b)\) at \(b\) by the following formula:

\[f(x) = f(a)\frac{x-b}{a-b} + f(b)\frac{x-a}{b-a}\]

The first term is \(0\) when \(x\) is \(b\) and is \(f(a)\) when \(x\) is \(a\), while the second term is \(0\) when \(x\) is \(a\) and is \(f(b)\) when \(x\) is \(b\). The sum of the two is therefore \(f(a)\) when \(x\) is \(a\) and \(f(b)\) when \(x\) is \(b\). And it is a linear function. **Linear functions have a term that is \(x\) multiplied by some constant, and may also have a constant term as well.**

A more convenient and suggestive form for this function can be gotten by putting the x terms together:

\[f(x) = mx + c = \frac{f(b) - f(a)}{b-a}x + \frac{f(a)b - f(b)a}{b-a}\]

The number **\(m\)** which occurs here is called the **slope** of this line. Notice that \(m\) is given by the ratio of the change of \(f\) between \(x = b\) and \(x = a\) to the change in \(x\) between these two arguments:

\[m = \frac{f(b) - f(a)}{b-a}\]

If \(f\) is plotted, the value at which its graph meets the \(y\) axis is what we call \(c\) here. It is called the **y-intercept** of this line, which is **the value of \(y\) when \(x\) is \(0\)**.
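Here is a minimal sketch in Python of these formulas, so you can try them on numbers of your own. The function `line_through` is a hypothetical helper; the two points used, \((0, 3)\) and \((2, 13)\), are an assumed example.

```python
def line_through(a, fa, b, fb):
    """Return (m, c, f) for the linear function with f(a) = fa and f(b) = fb, a != b."""
    m = (fb - fa) / (b - a)              # slope
    c = (fa * b - fb * a) / (b - a)      # y-intercept

    def f(x):
        # the two-term form: each term vanishes at one of the two given arguments
        return fa * (x - b) / (a - b) + fb * (x - a) / (b - a)

    return m, c, f

m, c, f = line_through(0, 3, 2, 13)
print(m, c, f(1), m * 1 + c)
```

Both ways of writing the function, the two-term form and \(mx + c\), should give the same value at every argument.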

There is a mathlet here which allows you to vary the slope \(m\) and y-intercept c and see what that does to a line. You should fiddle with this mathlet and from it get an idea what the slope \(m\) tells you about the line. Using it you can construct your own examples.

You can actually construct a spreadsheet that can do the same thing as this applet. You would be wise to do so. Directions on exactly how can be reached by **clicking here.**

**I know all this stuff. Why do you waste my time with it?**

All this may sound simple to you, but if you understand it, you are well on your way to understanding calculus. Realize that calculus consists of studying functions through studying the slopes of the straight lines they resemble near any given argument. Here are some exercises to help you get used to these things.

**Exercises:**

**3.3 Play with the applet until you get a feel for the geometric meaning of the slope of a line. Then take a piece of paper, draw x and y axes on it and put scales on them, and have a friend draw some straight lines on the paper. Without measuring, guess the slopes of the lines. Now measure the lines (change in y over change in \(x\)) and see how good your guesses were.**

**3.4 When is the slope of a line negative? When is it \(0\)? When is it \(1\)? When \(-1\)? If you use the same scale for \(x\) and \(y\), what does slope 10 look like? How about slope \(-\frac{1}{10}\)?**

**3.5 Follow the directions that you can get to above to construct a spreadsheet that can work as the applet here. Try it with the various slopes of the last question.**

**3.6 Construct the linear function, \(g\), with slope 2 satisfying \(g(1) = 1\); graph it. What is \(g(4)\)? Do the same for the linear function, \(h\), which satisfies \(h(1) = 4\), \(h(4) = 12\). What is the slope of \(h\)?**

A linear function, we have seen, is a function whose graph lies on a straight line, and which can be described by giving the slope and y intercept of that line.

There is a special kind of linear function, which has a wonderful and important property that is often useful. These are **linear functions whose \(y\) intercepts are \(0\)** (for example functions like \(3x\) or \(5x\)). This means their graphs pass right through the origin (the point with coordinates \((0, 0)\)). Such functions are called **homogeneous linear functions.** They have the property that *their value at any combination of two arguments is the same combination of their values at those arguments.* In symbols this statement is:

\[f(ax + bz) = af(x) + bf(z)\]

**Do ordinary linear functions have any such property?**

They sort of do. **Any linear function at all has the same property when \(b\) is \(1 -a\).** Thus for any linear function at all we have

**\[f(ax + (1 - a)z) = a f(x) + (1 - a) f(z)\]**

But be careful, **linear functions that are not homogeneous do not obey the general linearity property stated several lines above.**

**Either one of these conditions allows you to calculate the value of \(f\) at any \(y\) given its values at \(x\) and \(z\). If \(y\) is \(z + a(x-z)\) then \(f(y)\) is \(f(z) + a(f(x)-f(z))\).**

Properties like these mean that once you know the value of a linear function at two arguments you can easily find its value anywhere else it is defined.

The property here described is often called the property of linearity. This is not really a sensible way to describe it because perfectly good linear functions which have \(y\) intercept that is not \(0\) do not obey the more general version of the property (the first one above.)

Anyway, realize that functions that are not linear **DO NOT** have either of these properties.
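You can test these properties on numbers. Below is a minimal sketch in Python; the functions `homogeneous` and `general` and the particular values of `a`, `b`, `x` and `z` are assumed examples of my own.

```python
def homogeneous(x):
    return 3 * x          # y-intercept 0

def general(x):
    return 5 * x + 3      # y-intercept 3

a, b, x, z = 2.0, 7.0, 1.5, -4.0

# Homogeneous: f(ax + bz) = a f(x) + b f(z) for any a and b.
print(homogeneous(a * x + b * z), a * homogeneous(x) + b * homogeneous(z))

# Any linear function: the property holds when the weights are a and 1 - a.
print(general(a * x + (1 - a) * z), a * general(x) + (1 - a) * general(z))

# A linear function that is not homogeneous fails the general property.
print(general(a * x + b * z), a * general(x) + b * general(z))
```

The first two lines should each print a matching pair of numbers; the third should not.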

The word tangent comes from the Latin word *tangens*, which means touching. Two smooth curves are said to be tangent when they just touch at some point, usually without crossing one another. (For example, the functions \(y = 0\) and \(y = x^2\) are tangent at \(x = 0\); also \(y = 0\) and \(y = x^3\) are tangent at \(x = 0\) even though they cross, because they sort of hug each other before crossing.)

We describe the standard structure of formulae that we use to describe functions, review the properties of quadratic functions, and introduce the notion of the derivative.

Differential calculus is about approximating more complicated functions by linear functions. We now address the question, what more complicated functions do we want to deal with?

Almost all of the functions we will talk about can be formed by **starting with three basic functions,** and **applying the operations of addition, subtraction, multiplication, division, inversion (like in going from the square to the square root) and substitution to copies of them.**

We can define even more functions by using calculus, but these need not be investigated now.

The three basic functions are **the identity** function, the **sine function** and **the exponential** function. For the moment we will start with only the first, the identity function.

If we multiply copies of the identity function together, we get powers of it, like \(x * x\) (which is \(x\) squared), or \(x * x * x\), which is \(x\) cubed, and so on. Any function consisting of a positive power multiplied by a constant is called a **monomial.** If we add or subtract a finite number of these, we get what are called **polynomials.**

The simplest polynomials are the linear functions we have already mentioned. The next more complicated ones are **quadratic functions;** these have the form, \(ax^2 + bx + c\), where \(a, b\) and \(c\) are numbers. Cubic functions have a cube term in them, quartic functions a term like \(dx^4\), and so on.

We can evaluate and plot quadratic functions with very little more effort than we expended on linear functions. The only difference is that we should add a quadratic coefficient say in B6, and enter =B$6*A10*A10+B$2*A10+B$3 into B10 (and then copy this down column B.)

For example, try this, putting \(1\) in B6. After entering the instruction above in B10, you have to copy it into B11 through B500, and you can now plot any quadratic by changing your parameters.

When you do this you will find something that is sort of nice: **all quadratics look more or less alike except that some are upside down.**

That is, if you plot a quadratic and don't pay attention to the scales of your graph or which end is up, and where its peak or valley is, you cannot tell them apart. Quadratics with a given sign for the quadratic coefficient, are all alike except for scale and location of their high and low points.

A second nice fact about quadratics is that we know how to solve some equations of the form \(f(x) = 0\), when \(f\) is quadratic.

**What equations are those?**

Well, we know how to solve the equation

\[x^2 = A\] which means the same thing as: \[x^2 - A = 0\]

when A is a positive number. We can solve it because a solution is, by definition, **the square root of A**.

Actually we define \(\sqrt{A}\) (also written as \(A^{\frac{1}{2}}\)) to be the positive number whose square is \(A\), when \(A\) is positive, and the two solutions to this equation are \(\sqrt{A}\) and \(-\sqrt{A}\).

By arithmetic manipulations you can reduce any quadratic to this solvable form, and solve it, and you will get the famous quadratic formula for solutions.

**How is that and what is that?**

The equation \(ax^2 + bx + c = 0\) can be rewritten (when \(a\) is not \(0\), after dividing by \(a\)) as

\[x^2 + \frac{bx}{a} + \frac{c}{a} = 0\]

which is the same as

\[(x + \frac{b}{2a})^2 = \frac{b^2-4ac}{4a^2}\]
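The step between the last two equations is usually called completing the square. In case it went by too fast, here is a sketch of the algebra. Expanding the square gives

\[\left(x + \frac{b}{2a}\right)^2 = x^2 + \frac{bx}{a} + \frac{b^2}{4a^2}\]

so that \(x^2 + \frac{bx}{a}\) is \(\left(x + \frac{b}{2a}\right)^2 - \frac{b^2}{4a^2}\). Substituting this into \(x^2 + \frac{bx}{a} + \frac{c}{a} = 0\) and moving the constant terms to the right hand side gives

\[\left(x + \frac{b}{2a}\right)^2 = \frac{b^2}{4a^2} - \frac{c}{a} = \frac{b^2 - 4ac}{4a^2}\]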

Thus, the square root of the left hand side is plus or minus the square root of the right hand side here:

\[x + \frac{b}{2a} = \frac{\sqrt{b^2 - 4ac}}{2a}\]

or

\[x + \frac{b}{2a} = -\frac{\sqrt{b^2 - 4ac}}{2a}\]

This is a peculiar way to write the standard quadratic formula.
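Here is a minimal sketch in Python that follows these two equations literally. The name `solve_quadratic` is my own, and returning an empty result when \(b^2 - 4ac\) is negative is an assumption I have made, since we have only defined square roots of positive numbers.

```python
import math

def solve_quadratic(a, b, c):
    """Real solutions of a*x^2 + b*x + c = 0, from x + b/(2a) = +/- sqrt(b^2 - 4ac)/(2a)."""
    if a == 0:
        raise ValueError("a must not be 0")
    under_root = b * b - 4 * a * c
    if under_root < 0:
        return ()                        # no real square root to take
    root = math.sqrt(under_root) / (2 * a)
    shift = -b / (2 * a)
    return (shift - root, shift + root)

print(solve_quadratic(1, -5, 6))    # x^2 - 5x + 6 = 0
print(solve_quadratic(1, 0, -9))    # x^2 = 9
```

You can check any answer it gives by substituting it back into the quadratic and seeing that you get \(0\).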

**Exercise 4.1 Find two solutions to each of the following equations:**

**\[x^2 - 3x - 4 = 0\]**

**\[4x^2 - 3x - 1 = 0\]**

If you graph a quadratic you will notice that you do not get a straight line. On the other hand, if you were to look at your graph under a microscope, you might think it was a straight line. In the same sense, though the earth is round, as we walk down the street it looks pretty flat to us poor tiny creatures.

If you look at a quadratic function \(f\) at some particular argument, call it \(z\), and very close to \(z\), then \(f\) will look like a straight line. **The line f resembles at argument \(z\) is called the tangent line to \(f\)** at argument \(z\), and **the slope of this tangent line to \(f\) at \(z\) is called the derivative of \(f\) at argument \(z\).** This slope is often written as

\[f'(z), \text{ or as } \frac{df}{dz} \text{ or } \frac{df(z)}{dz}\]

The tangent line to the function \(f\) at a specific argument is the graph of a linear function. That function is **called the linear approximation to \(f\) at argument \(z\). Notice that it is a different function from \(f\) and is typically near \(f\) only when evaluated at an argument \(x\) that is near to \(z\).**

**The same exact words can be used to define the derivative of any function, \(f\), that looks like a straight line in some vicinity of argument \(z\). \(f\)'s derivative at argument \(z\), which we write as \(f'(z)\) or \(\frac{df(z)}{dz}\), will be the slope of that straight line.**
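You can imitate the microscope numerically. The sketch below, in Python, computes the slope of the straight line through two points of the graph that are a small distance \(h\) apart; the names `secant_slope` and `quadratic` and the particular quadratic are assumed examples of my own.

```python
def secant_slope(f, z, h):
    """Slope of the straight line through (z, f(z)) and (z + h, f(z + h))."""
    return (f(z + h) - f(z)) / h

def quadratic(x):
    return x * x - 3 * x + 2    # an arbitrary example quadratic

# Zoom in on argument z = 2 by letting the second point approach the first.
for h in (1.0, 0.1, 0.01, 0.001):
    print(h, secant_slope(quadratic, 2.0, h))
```

As \(h\) shrinks the printed slopes settle toward a single number, and that number is the slope of the tangent line at \(z = 2\).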

The derivative and tangent line mathlet allows you to enter any function you can construct into it, and look at the graph of its values, and its slopes, that is, its derivative on any interval you choose.

We will next see how to find the derivative of a quadratic function, or any polynomial function given its formula.

We describe the first great property of derivatives, and show how it allows us to calculate the derivative of any rational function. We also emphasize the need for checking any such calculation, and describe how to do so by differentiating numerically on a spreadsheet.

Here are some facts about derivatives in general.

1. Derivatives have two **great properties** which allow us to find formulae for them if we have formulae for the function we want to differentiate.

2. We can compute and graph the derivative of \(f\) as well as \(f\) itself for all sorts of functions, with not much work on a spreadsheet. (In fact, what work is needed to find the derivative as well as the function only has to be done once, and you can switch functions almost exactly as you would if you were only graphing the function, and get a plot of both together. We will see this explicitly soon.)

**What "great properties"?**

We already know the derivative of a **linear function.** It is its **slope.** A linear function is its own linear approximation. Thus the derivative of \(ax + b\) is \(a\); the derivative of \(x\) is \(1\). Derivatives kill constant terms, and replace x by 1 in any linear term.

The first great property is this: **if an argument, \(x\), occurs more than once** in a formula for its value \(f(x)\) at argument \(x\), then **you can find the derivative of \(f\) by looking at the derivative caused by each occurrence separately, treating the other occurrences as if they were mere constants as you do so; and then adding all these up. We call this the "Multiple Occurrence Rule".**

For example, consider the quadratic, \(a*x*x + b*x + c\). The argument \(x\) occurs three times in it. Taking the derivative of one single occurrence, that is, of any single \(x\) alone, changes that \(x\) to \(1\). If we do that to each occurrence separately, ignoring the others as we do so, we get three terms: \(a*1*x + a*x*1 + b* 1\), or \(2ax + b\), and this sum is the derivative of our quadratic.

Notice that the constant term, \(c\), has no effect on the derivative.
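One way to convince yourself of this rule is to give each occurrence of \(x\) its own slot and vary the slots one at a time. Here is a minimal sketch of that experiment in Python; the helper `slot_derivative`, the three-slot function `g`, and the values of `a`, `b`, `c` and `x` are all assumptions of mine for illustration.

```python
def slot_derivative(g, slots, i, h=1e-6):
    """Numerical derivative of g with respect to slot i, holding the other slots fixed."""
    up = list(slots)
    down = list(slots)
    up[i] += h
    down[i] -= h
    return (g(*up) - g(*down)) / (2 * h)

a, b, c = 2.0, -3.0, 7.0

def g(x1, x2, x3):
    # a*x*x + b*x + c with each occurrence of x given its own slot
    return a * x1 * x2 + b * x3 + c

x = 1.5
total = sum(slot_derivative(g, (x, x, x), i) for i in range(3))
print(total, 2 * a * x + b)
```

The sum of the three separate contributions should agree, up to rounding, with \(2ax + b\).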

This property allows us to calculate a formula for the derivative of any polynomial directly from the formula for the polynomial itself, as we shall soon see.

A special case of this basic rule is the statement that **taking the derivative is a linear operation.** This means that if \(f\) consists of two terms, you can find \(f\)'s derivative by adding the derivatives of each of its terms separately, computed in both cases as if the other term did not exist.

This statement can be written as:

**\[(f + g)' = f' + g'\]**

Another special case is **the formula for the derivative of the product of two factors.** If we have \(f = g*h\), then there will be contributions to the change in \(f\) from changes in \(g\) and from changes in \(h\), and these can be computed separately. The result is the statement:

**\[(g*h)' = g'*h + g*h'\]**

which is called the **product rule** for differentiation.

We can deduce, as a special case of this **product rule,** what the derivative of **the reciprocal** of a function \(f\) is. **The reciprocal of a function is \(1\) divided by that function;** which is usually written as \(\frac{1}{f}\) or \(f^{-1}\).

By the definition of the reciprocal we have \(f*\frac{1}{f} = 1\), throughout the domain of \(f\). The derivative of \(1\), which is a number and is the right hand side here, is \(0\); we can deduce that the derivative of the left hand side is also \(0\).

By the product rule we then get: \(f'*\frac{1}{f} + f*\left(\frac{1}{f}\right)' = 0\),

which we can divide by \(f\) and rearrange to tell us:

\[\left(\frac{1}{f}\right)' = \frac{-f'}{f^2}\]

Our first great property actually tells us all we need to find the derivative of any polynomial or any **rational function,** by which we mean the ratio of two polynomials. And these are all the functions we can get by applying the operations of addition, subtraction, multiplication, and division to the identity function.

The derivative of any positive integer power, say \(x^n\), is obtained by noticing that the contribution to the derivative from each of the n occurrences of \(x\) by itself is gotten by replacing that occurrence by \(1\), or in other words by dividing here by \(x\): the total result from all \(n\) of the factors \(x\), which is the derivative of \(x^n\), is then \(\frac{nx^n}{x}\), or if you prefer, \(nx^{n-1}\). (This statement applies to negative powers as well as positive ones, and to fractional and in fact to any power at all, as we shall soon see.)

This, and the rule for differentiating a sum given the derivatives of the summands, tells you how to differentiate any polynomial.
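Here is a minimal sketch in Python of that recipe. Representing a polynomial by the list of its coefficients, constant term first, is my own assumed convention, as are the names `poly_derivative` and `poly_value`.

```python
def poly_derivative(coeffs):
    """Differentiate c0 + c1*x + c2*x^2 + ... given as the list [c0, c1, c2, ...]."""
    # Each term c*x^n becomes n*c*x^(n-1); the constant term drops out.
    return [n * c for n, c in enumerate(coeffs)][1:]

def poly_value(coeffs, x):
    """Evaluate the polynomial with the given coefficients at argument x."""
    return sum(c * x ** n for n, c in enumerate(coeffs))

p = [5, 0, -4, 2]            # 5 - 4x^2 + 2x^3, an arbitrary example
dp = poly_derivative(p)      # coefficients of -8x + 6x^2
print(dp, poly_value(dp, 2))
```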

The reciprocal rule of the last equation above then **tells us how we can differentiate any rational function,** say \(\frac{p}{q}\) where \(p\) and \(q\) are polynomials. We apply the product rule and reciprocal rule, to get

\[ \begin{aligned} \left(\frac{p}{q}\right)' & = p'\frac{1}{q} + p\left(\frac{1}{q}\right)' \\ & = p'\left(\frac{1}{q}\right) - \frac{pq'}{q^2} \\ & = \frac{p'q - q'p}{q^2} \end{aligned} \]
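To see this formula at work, here is a minimal sketch in Python that applies it to one rational function and compares the result with a numerical slope. The function `quotient_rule` and the example \(p = x^2 + 1\), \(q = x + 2\) are assumptions of mine, chosen arbitrarily.

```python
def quotient_rule(p, dp, q, dq):
    """Return the function x -> (p'q - q'p) / q^2 built from p, p', q and q'."""
    def derivative(x):
        return (dp(x) * q(x) - dq(x) * p(x)) / q(x) ** 2
    return derivative

def p(x):
    return x * x + 1

def dp(x):
    return 2 * x

def q(x):
    return x + 2

def dq(x):
    return 1.0

def ratio(x):
    return p(x) / q(x)

d_ratio = quotient_rule(p, dp, q, dq)

x, h = 1.0, 1e-6
numeric = (ratio(x + h) - ratio(x - h)) / (2 * h)   # slope over a tiny interval
print(d_ratio(x), numeric)
```

The two printed numbers should agree to many decimal places.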

**Exercises:**

**5.1 Find the derivatives of the following polynomials:**

**a. \(3x - 7\)**

**b. \(x^2 - 7x + 4\)**

**c. \(3x^3 - 2x^2 + x + 1\)**

**d. \(x^4 - 7x^2 + 4\)**

**e. \(x^4 - x^3 + x^2 - x + 1\)**

**5.2 Find the derivatives of the following rational functions:**

**a. \(\frac{x+3}{x-1}\)**

**b. \(\frac{1}{x-1}\)**

**c. \(\frac{x^2 - x + 1}{x^2 - 3x + 3}\)**

You should practice finding the derivatives of polynomials and of rational functions using these rules until you feel comfortable with them. In fact you should practice until you can differentiate any rational function with \(100\)% accuracy.

**But no human being can do anything to \(100\)% accuracy and I certainly can't.**

In the age of computers, any little mistake at all can screw up everything. It is very important that you learn to do what you do with \(100\)% accuracy. This sounds hopeless, but it isn't. It's not that you have to do everything perfectly; far from it. You only have to learn to find your mistakes and fix them. You can make them by the dozen if you take the trouble to fix them all.

Most mistakes that you will make with a computer are so gross in their effects that you can see immediately that you have done something wrong, and find and fix whatever it is. A few mistakes are subtle enough that you might miss them. The key to getting perfect answers is to check whatever you do to see if it is right until it is right.

By the way, the most common subtle mistake by far consists of using incorrect input, which means, trying to solve the wrong problem. It is absolutely essential that you check to see that you have copied the input information correctly into your computation.

Suppose you find a formula for a derivative. Instead of stopping with the formula, you should check it to see if it looks right. The computer gives you an easy way to do this: you can compute the derivative numerically, and see if you get the same answer. If you do, you KNOW your answer is right.

If you don't get the same answer numerically that you got from the formula, you must find what went wrong. You do not have to be perfect the first time, or even the seventh time. But in the end, if you are dealing with machines, you MUST be perfect.

**How can I check my differentiations easily?**

One way is to compare the function you compute as derivative to the derivative as found by the **derivative applet** by entering your own function into it. Remember that in doing so the times sign is * and exponents are preceded by ^ so \(x^3\) is entered as x^3.

You can also check your derivative by using a spreadsheet to set up your own applet. The setup described in **Section 3A** for plotting a function, can be enhanced to allow you to plot not only your function, but also its numerical derivative and the answer you get to differentiating it, without your expending much effort.

Once you have this set up, all you need do is enter your function in one place and your answer for the derivative in another place, copy each appropriately, and you can look at your answer and the numerical one on your chart. If they are the same, your answer is correct. If not you have to de-bug your differentiation and/or your spreadsheet calculation. Becoming an expert means becoming proficient at de-bugging, through lots of experience.

**OK, how do I set this up?**

For explicit directions, see **Chapter 9**.

We introduce the notion of constructing complicated functions by substitution, and show how to differentiate such functions. The way to do so is called the chain rule. We also introduce the exponential function which is defined to be its own derivative.

Rational functions are an important and useful class of functions, but there are others. We actually get most useful functions by starting with two additional functions beyond the identity function, and allowing two more operations in addition to addition, subtraction, multiplication and division.

**What additional starting functions?**

The two are **the exponential function,** which we will write for the moment as **\(\exp(x)\),** and **the sine function,** which is generally written as **\(\sin(x)\).**

**And what are these?**

We will devote some time and effort to introducing and describing these two functions and their many wonderful properties very soon. For now, all we care about is that they exist, you can find them on spreadsheets and scientific calculators, and we can perform arithmetic operations (addition, subtraction, multiplication and division) on them. If you want just a hint, the sine function is the basic function of the study of angles, which is called trigonometry. The **exponential function** is defined in terms of derivatives. **It is the function whose value at argument 0 is 1, and whose derivative everywhere is the same as itself.** We have

\[\frac{d \, \exp(x)}{dx} = \exp(x) \,\text{or,}\, (\exp(x))' = \exp(x)\]

This definition may make the function a bit mysterious to you at first, but you have to admit that it makes it easy to differentiate this function.

**This exponential function has an important and interesting property: Namely,**

**\[\exp(x+y) = \exp(x)\exp(y)\]**

**(Idea of proof) Regard \(\exp(x+y)\) as a function of \(x\) with \(y\) held fixed. By the statements below about derivatives of substitutions, we can deduce that its derivative is itself. Its value at \(x = 0\) is \(\exp(y)\). So this function differs from \(\exp(x)\) only by having the value \(\exp(y)\) when \(x\) is \(0\) rather than the value \(1\). The product \(\exp(x)\exp(y)\) is also its own derivative with respect to \(x\), and it also has the value \(\exp(y)\) when \(x\) is \(0\). This means that \(\exp(x+y)\) is \(\exp(x)\) multiplied by \(\exp(y)\). We have ignored any possible dependence of \(y\) on \(x\). Doing so only means we were computing what is called the "partial derivative with respect to the variable \(x\) keeping the variable \(y\) fixed". Do not worry about this; it is one of the ways we handle calculus when there is more than one variable.)**

**The defining properties of \(\exp(x)\) allow us to deduce a power series representation of it.** \(\exp(x)\) has a constant term \(1\), and being its own derivative must have a linear term whose derivative is that \(1\), namely \(x\). Likewise it must have a quadratic term whose derivative is \(x\), namely, \(\frac{x^2}{2}\). Continuing this deduction forever gives us

\[\exp(x) = 1 + x + \frac{x^2}{2} + ... + \frac{x^k}{k!} + ...\]
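To see how quickly this series settles down, here is a small Python sketch that adds up its first few terms and compares the result with the library value. The function name `exp_partial_sum` and the sample argument \(0.5\) are assumptions made for illustration.

```python
import math


def exp_partial_sum(x, terms):
    """Sum the first `terms` terms of 1 + x + x^2/2! + ... + x^k/k! + ..."""
    total = 0.0
    term = 1.0  # the k = 0 term, x^0 / 0!
    for k in range(terms):
        total += term
        term *= x / (k + 1)  # turn x^k / k! into x^(k+1) / (k+1)!
    return total


for terms in (2, 5, 10, 20):
    print(terms, exp_partial_sum(0.5, terms), math.exp(0.5))
```

Each new term is obtained from the previous one by multiplying by \(x\) and dividing by the next integer, which is the same deduction as above run in reverse: the derivative of each term is the term before it.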

**And what additional operations are there?**

The two new operations that we want to use are **substitution,** and **inversion.**

**And what are these?**

If we have two functions, \(f\) and \(g\), with values \(f(x)\) and \(g(x)\) at argument \(x\), we can construct a new function, which we write as \(f(g)\), that is gotten by **taking the value of \(g\) at argument \(x\), as the argument of \(f\).**

**The value of \(f(g)\) at \(x\),** which we write as \(f(g(x))\), is the value of \(f\) at argument given by the value of \(g\) at \(x\); it is **the value of \(f\) at argument \(g(x)\).** We call this new function **the substitution of \(g\) into \(f\).** We'll get to inversion in Chapter 8.

Substitution is simpler than it sounds. Suppose you have a value for \(x\) on a spreadsheet in box A5, and you put =g(A5) in box B5, and =f(B5) in C5. Then C5 will contain \(f(g(x))\).

If you substitute a polynomial into a polynomial, you just get a polynomial, and if you substitute a rational function into a rational function, you still have a rational function. But if you substitute these things into exponentials and sines you get entirely new things (like \(\exp(-cx^2)\)) which is the basic function of probability theory.

Just as utilizing copies of the exponential or sine functions presents no problem to a spreadsheet or scientific calculator, substitution presents no real problem. We have seen that you can create g(A10) in B10, and then f(B10) in C10, and you have created the substituted value f(g(A10)) in C10. You can, by repeating this procedure, construct the most horrible looking combination of substitutions and arithmetical operations imaginable, and even worse than you could imagine, with very little difficulty, and you can find their numerical derivatives as well.
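The same column-by-column idea can be sketched in Python. Here the inner function plays the role of column B and the outer function the role of column C; the particular choice \(g(x) = -x^2\) and \(f(s) = \exp(s)\), and the helper names `compose` and `numerical_derivative`, are assumptions made for this illustration.

```python
import math


def g(x):
    return -x**2  # inner function (the spreadsheet's column B)


def f(s):
    return math.exp(s)  # outer function (the spreadsheet's column C)


def compose(outer, inner):
    """Return the substitution of `inner` into `outer`, the function x -> outer(inner(x))."""
    return lambda x: outer(inner(x))


def numerical_derivative(func, x, h=1e-5):
    """Estimate the derivative of func at x from two nearby values."""
    return (func(x + h) - func(x - h)) / (2 * h)


f_of_g = compose(f, g)  # values exp(-x^2)

for x in (0.0, 0.5, 1.0):
    print(x, f_of_g(x), numerical_derivative(f_of_g, x))
```

Because `compose` returns an ordinary function, you can feed its result back into `compose` again and build up substitutions as deep as you like.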

Before we go on to the last operation, we note that there is a great property associated with the operation of substitution. Just as we have found formulae above for finding the derivative of a sum or product or ratio of functions whose derivatives we know, we have **a neat formula for the derivative of a substitution function in terms of the derivatives of its constituents.** Actually it is about as simple a formula for this as could be.

The result is often called the **chain rule:**

The derivative of \(f(g(x))\) with respect to \(x\) at some argument \(z\), like any other derivative, is the slope of the straight line tangent to this function, at argument \(z\). This slope, like all slopes, is the ratio of the change in the given function to a change in its argument, in any interval very near argument \(z\). Thus the derivative of \(f\) with respect to \(g\) is the tiny change in \(f\) divided by the tiny change in \(g\). For the derivative with respect to \(x\), the denominator becomes the tiny change in \(x\) instead.

Suppose then, we make a very small change in the variable \(x\), very near to \(x = z\), a change that is sufficiently small that the linear approximation to \(g\) and to \(f(g)\) are extremely accurate within the interval of change. Let us call that change \(dx\). This will cause a change in \(g(x)\) of \(g'(z)dx\), (because the definition of \(g'(z)\) is the ratio of the change of \(g\) to the change of \(x\) for \(x\) very near to \(z\).)

If \(g'(z)\) is 0, then g will not change and neither will \(f(g(x))\), when \(f\) depends on \(x\) only in that its argument \(g\) depends on \(x\). (If f has other dependence on \(x\) the contribution to its derivative from that other dependence gets added to the contribution from the change coming from the change in \(g\) and is irrelevant here.)

If \(g'(z)\) is not \(0\), we can define \(dg\) to be \(\frac{dg}{dx}dx\), and use the fact that the change in \(f\) for arguments near \(g(z)\) is given by \(df = \frac{df}{dg}dg\) which becomes

\(\frac{df}{dg}\frac{dg}{dx}dx\), where \(\frac{df}{dg}\) is evaluated at \(g(z)\) and \(\frac{dg}{dx}\) is evaluated at \(x=z\).

It follows from this remark that the chain rule reads

**\[\frac{df(g(x))}{dx} = \frac{df(g)}{dg}\frac{dg(x)}{dx}\]**

In words, this means that the **derivative of the substituted function with values \(f(g(x))\), with respect to the variable \(x\) is the product of the derivatives of the constituent functions \(f\) and \(g\), taken at the relevant arguments: which are \(x\) itself for \(g\) and \(g(x)\) for \(f\).**

**How about some examples?**

We will give two examples, but you should work out at least a dozen for yourself.

**Example 1: Suppose we substitute the function \(g\) which has values given by \(g(x) = x^2 + 1\) into the function \(f\) which takes values \(f(x) = x^3 - 3\).**

The substituted function \(f(g)\) has values \(f(g(x)) = (x^2 + 1)^3 - 3\).

Let us compute the derivative of this function. The derivative of \(f(s)\) with respect to \(s\) is **\(3s^2\),** while the derivative of \(g(x)\) with respect to \(x\) is **\(2x\).**

If we set \(s = g(x)\) which is \(x^2 + 1\), and take the product of these two we get:

\[\frac{d((x^2 + 1)^3 - 3)}{dx} = (3s^2)(2x) = 3g(x)^2(2x) = 6x(x^2 + 1)^2\]

You could multiply the cube here out and then differentiate to get the same result, but that is much messier, and most people would make at least one mistake in doing it. You have a chance of getting such things right even the first time, if you do them by the chain rule. (Unfortunately, if you do it correctly, you will not get any practice debugging from it.)
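For the skeptical, here is that messier route written out, so the two answers can be compared term by term. Multiplying out the cube gives

\[(x^2 + 1)^3 - 3 = x^6 + 3x^4 + 3x^2 - 2\]

and differentiating each power separately gives

\[\frac{d(x^6 + 3x^4 + 3x^2 - 2)}{dx} = 6x^5 + 12x^3 + 6x\]

Expanding the chain rule answer gives the same thing:

\[6x(x^2 + 1)^2 = 6x(x^4 + 2x^2 + 1) = 6x^5 + 12x^3 + 6x\]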

**Example 2: Find the derivative of the function \(\exp(\frac{-x^2}{2})\).**

This is the function obtained by substituting the function \(\frac{-x^2}{2}\) into the exponential function.

The derivative of the function \(\frac{-x^2}{2}\) is the function \(-x\); the exponential function is its own derivative.

On applying the chain rule we find that the derivative of \(\exp(\frac{-x^2}{2})\) is \((-x)\exp(\frac{-x^2}{2})\), the latter factor being the derivative of the exponential function evaluated at \(\frac{-x^2}{2}\).

**Exercises:**

**6.1 Write an expression for the result of substituting \(g\) into \(f\) to form \(f(g)\) for the following pairs of functions, and find expressions for their derivatives using the chain rule.**

**a. \(f\) defined by \(f(x) = \frac{x^2+1}{x}\), \(g\) defined by \(g(x)= x^2 - 1\).**

**b. \(f\) defined by \(f(x) = -x\), \(g\) by \(g(x) = \exp(x)\).**

**c. \(f\) defined by \(f(x) = \exp(x)\), \(g\) by \(g(x) = -x\).**

**6.2 Check each of your results using the derivative applet.**

**6.3**

**a. Consider the function defined by the formula \(x^4 - 2x + 3\). Use the applet to plot it and see its derivative. Where is its minimum value, and what is it? What is its derivative at the minimum point? Estimate these things from the applet.**

**b. Find the maximum point for \(f\) and the value of \(f\) at that argument approximately for \(f\) defined by \(f(x) = x^2\exp(-x)\).**

**c. If a function \(f\) is differentiable in the interval from \(a\) to \(b\) and has a minimum value at a point \(c\) in between \(a\) and \(b\), what is its derivative at \(c\)?**

**6.4 Use the chain rule to show: \(\exp(x + y) = \exp(x)\exp(y)\).**

**OK, where am I now?**

At this point you have rules that enable you to differentiate all functions that you can make up using arithmetic operations and substitutions starting with the identity function (\(f(x) = x\)) or with the mysterious exponential function, \(f(x) = \exp(x)\).

**In the next section we will extend things so you can start with the sine function, \(f = \sin x\) as well and differentiate anything you can create. Finally we will extend the rules to differentiating inverse functions as well.**

**What is this \(\exp(x)\)?**

The number \(\exp(1)\) is called \(e\). The property: \(\exp(x + y)= \exp(x)\exp(y)\) implies that \(\exp(n)\) is \(e^n\), because \(\exp(n)\) is the product of \(\exp(1)\) \(n\) times. We have not as yet defined \(e^a\) when \(a\) is not an integer. When we do define it, we will find that \(\exp(z)\) is \(e^z\) for all real or complex numbers \(z\). Actually we will define \(e^x\) explicitly for rational values of \(x\), and show that it is \(\exp(x)\) and **then define \(e^x\) for irrational values to be \(\exp(x)\).**

**And what is \(e\)?**

One easy way to answer this question is to write \(e^x\) as a sum of powers of \(x\) multiplied by appropriate coefficients, and then set \(x = 1\). We can figure out the coefficients of each power in the sum by requiring that its derivative be the previous term.

Thus, we know by definition that \(\exp(0)\), the constant term in the sum, is 1. For \(\exp(x)\) to be its own derivative, it must contain something whose derivative is this constant term, \(1\). The term whose derivative is \(1\) is \(x\); the term whose derivative is \(x\) is \(\frac{x^2}{2}\); the term whose derivative is \(\frac{x^2}{2}\) is \(\frac{x^3}{3!}\), where \(n!\) is \(n\) multiplied by \((n-1)!\). And the general term in the sum that is \(e^x\) is \(\frac{x^n}{n!}\). (We already proved this but I like it so much I am repeating it.)

This tells us that \(e\) is \(1 + 1 + \frac{1^2}{2!} + ... + \frac{1}{n!} + ...\)

**Exercise 6.5 Sum the first 18 terms of this series using a spreadsheet.**

I get something like \(2.718281828459...\) for the number \(e\). It turns out that \(e\) is not rational, nor even a solution to any polynomial equation with integer coefficients. Such numbers are called **transcendental.**

**And how is \(e^x\) defined when \(x\) is not an integer?**

**When \(x\) is rational, say \(\frac{n}{m}\), \(e^x\) is the \(m^{th}\) root of \(e^n\). Otherwise it is defined by the endless power series derived above:**

**\[e^x = \sum_{n=0}^{\infty}\frac{x^n}{n!}\]**

We begin with a brief review of plane geometry. We then introduce the sine function, and then the notion of the vector of a line segment and the wonderful things vectors tell us. Finally we review trigonometry and find the derivatives of trigonometric functions.

This chapter is a review of all you should know about plane geometry, trigonometry and much more. I am sure you have seen the first half of it before so you can whiz through it.

Starting with 7.1b you may find new information worth knowing. What is relevant to calculus is the last section on derivatives of trigonometric functions.

This section is a review of plane geometry. Plane geometry is about properties of points, lines and figures in a plane (or more generally in other surfaces). The elementary concepts are points and straight lines. Lines that do not meet are said to be parallel.

Euclid, long ago, noticed the following things:

Every pair of points determines a unique straight line containing both.

Two lines are parallel if they never meet.

In a plane, every line and a point not on it determine a unique line parallel to the former containing the latter. (This is not true on all other surfaces.)

Every pair of lines that are not parallel have a unique point of intersection.

He derived all sorts of wonderful consequences from these statements.

If you cut a line in two at point \(C\), each half is called a **ray,** starting at \(C\).

If you cut at \(D\) any ray starting at \(C\), you get another ray starting at \(D\) and a **line segment** with ends at \(D\) and \(C\).

Two rays, \(a\) and \(b\) both starting at \(C\) and meeting only at \(C\), determine two angles at \(C\). Unless \(a\) and \(b\) are part of the same straight line, one of these is smaller than the other. We can call them \(aCb\) and \(bCa\). We can describe the angle \(aCb\) as corresponding to all the rays that emanate from \(C\) that are clockwise past \(a\) and before \(b\).

We can describe any point in the plane by its \(x\) and \(y\) coordinates. We always put the x coordinate first. Thus \((5,7)\) means the point whose \(x\) coordinate is \(5\) and \(y\) coordinate is \(7\). A line segment can then be described by the coordinates of its two ends. **If the two end points both have the same \(y\) coordinate, the length of the segment is the difference between their \(x\) coordinates, (and the same with \(x\) and \(y\) reversed).** Thus the distance between points with coordinates \((5,7)\) and \((2,7)\) is \(3\).

**We have to define the distance between points when the points differ from each other in both coordinates.** We want distance to be a meaningful concept and one that does not depend on the coordinate system being used. The definition we choose has the important and necessary property that the length of a segment is the same no matter what direction we choose as the x direction; that direction is our choice. (Length and distance would not be intrinsic to the segments if they changed with that choice.)

And this is the definition: *The square of the distance between two points is the sum of the squares of the differences in each coordinate. Distance is the positive square root of this sum.* Thus, the distance squared between points having coordinates \((1,1)\) and \((4,5)\) is \(3^2 + 4^2\) or \(9 + 16\) which is \(25\); the distance between these points is \(5\).

We call some particular length a unit length, and say that **a segment has length \(x\) if it is \(x\) times as long as that unit.** It really doesn't matter what the unit is; plane geometry is the same with any. (By the way, this is not so if we were dealing with the surface of a sphere instead of a plane.)
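The definition of distance translates directly into a few lines of Python. This is only a sketch; the function name `distance` and the representation of a point as an \((x, y)\) pair are assumptions made here.

```python
import math


def distance(p, q):
    """Distance between points p and q, each given as an (x, y) pair."""
    dx = q[0] - p[0]  # difference in x coordinates
    dy = q[1] - p[1]  # difference in y coordinates
    return math.sqrt(dx**2 + dy**2)  # positive square root of the sum of squares


print(distance((1, 1), (4, 5)))  # 5.0
print(distance((5, 7), (2, 7)))  # 3.0
```

The second call shows that the general definition agrees with the earlier special case of two points sharing a \(y\) coordinate.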

The next question is, **how do we similarly describe angles?** We can declare any angle to be a unit angle, and associate the size of any other angle to be whatever multiple of that unit angle it is. Traditionally, there are two commonly used unit angles, and it is wise to be familiar with both.

Euclid (or some other ancient person) defined the **angle obtained by going all the way around from one side of a ray to the other side of it, to be \(360\) degrees.** (Why? I think the answer is that it is easy to divide 360 by small numbers; in fact 360 is divisible by all numbers from 1 to 10 except 7, and it is itself not a very big number.)

With that definition, a "**straight line" angle**, which occurs when \(a\) and \(b\) (the sides of the angle) **point exactly opposite one another, is \(180\) degrees**. A right angle, which is half of a straight line angle, is \(90\) degrees, and so on. **Two segments or rays or lines that make a right angle** at \(C\) are said to be **perpendicular**.

The second commonly used measure of angle involves a unit circle. This is the set of points that are all a unit distance from the central point \(C\). Then we can measure an angle by the **length of the portion of the unit circle inside the angle.** The distance all the way around the unit radius circle is the circumference of the unit circle which is \(2\pi\). That means that a straight angle has size \(\pi\) and a right angle has size \(\frac{\pi}{2}\). The unit of angle here is called the **radian**.

**What is a radian?**

Well, \(\pi\) is close to \(\frac{22}{7}\). So going all the way around, which is \(2\pi\) radians, is near \(\frac{44}{7}\) or roughly \(6.28\) radians. If we divide \(360\) by \(6.28\) we get that a radian is something near \(57\) degrees.

To be more specific, \(\frac{22}{7}\) is \(3.142857\)... \(\pi\) is \(3.141593\)... \(1\) radian is \(\frac{360}{2\pi}\) which is \(57.29578\) degrees while \(\frac{360}{2*22/7}\) is \(57.27273\).

(You are best off not trying to remember these details. It is enough to remember that the angle change in going around a circle is \(360\) degrees, and also is \(2\pi\) radians. This means that a radian is \(\frac{360}{2\pi}\) degrees. If you do not want to use a machine to get the answer above, you can replace \(\pi\) by \(\frac{22}{7}\) and approximate \(1\) radian by \(\frac{360*7}{44}\) which is \(\frac{630}{11}\) and you will be wrong by a little less than one twentieth of one percent.)
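If you do want to use a machine, the arithmetic above takes only a few lines of Python. The function name `radians_to_degrees` is an assumption for this sketch (Python's standard library also provides `math.degrees`, which does the same job).

```python
import math


def radians_to_degrees(angle):
    """Convert radians to degrees using 2*pi radians = 360 degrees."""
    return angle * 360 / (2 * math.pi)


exact = radians_to_degrees(1)  # one radian, using the machine's value of pi
approx = 630 / 11              # one radian, using 22/7 in place of pi
relative_error = (exact - approx) / exact

print(exact)                  # about 57.29578
print(approx)                 # about 57.27273
print(100 * relative_error)   # percentage error, a bit under 0.05
```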

**What did Euclid deduce from his postulates?**

Here is one simple fact and its proof:

Fact: **When lines meet, opposite angles are the same, and the sum of any two adjacent angles is \(\pi\) radians.**

*Proof: Suppose lines \(a\) and \(b\) meet at \(C\) and denote the ray of \(a\) on one side of point \(C\) by \(a\) and on the other side by \(a'\), and do similarly for \(b\). Then there are 4 angles at \(C\): they are \(aCb, bCa', a'Cb'\) and \(b'Ca\).*

*Then any consecutive pair of these, including \(b'Ca\) and \(aCb\), form a straight line angle, if added together.*

*This means that, for example \(aCb\) and \(a'Cb'\) when added to \(bCa'\) both are the same, which implies that \(aCb\) and \(a'Cb'\) are the same.*

Before mentioning any more conclusions, we make one more definition. Suppose we have an angle \(\theta\) that is less than a right angle.

We choose a center point and draw a **unit circle** around it. (This is a circle whose radii have length \(1\).) We draw the angle in question at the center and choose one side of it to be the \(x\) axis (on which \(y\) is \(0\)). Let \(P\) be the point at which the other side of the angle meets the circle. Then we draw a line segment in the direction of the y-axis that goes between the x-axis and \(P\).

**The \(y\) coordinate of the point \(P\), which is the length of that perpendicular line, is called the sine of the angle \(θ\), written as \(\sin θ\).** Notice that the perpendicular line is a straight line down from P to x-axis, and so is shorter than the path from P to that axis along the circle, which is the size of the angle in radians. This means that **the sine is always less than the angle when the latter is measured in radians. When the angle is small, the sine is pretty close to the angle size, measured in radians, because the straight path and the path on the circle are almost the same.** Thus we have \(\sin 0 = 0\) and the derivative of \(\sin x\) at \(x = 0\) is \(1\) when angles are measured in radians.
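You can watch both of these claims happen numerically. The following Python sketch tabulates the sine against the angle for a few shrinking angles; the helper name `sine_to_angle_ratio` and the sample angles are assumptions chosen for illustration.

```python
import math


def sine_to_angle_ratio(x):
    """Return sin(x) / x for a nonzero angle x measured in radians."""
    return math.sin(x) / x


angles = [1.0, 0.5, 0.1, 0.01, 0.001]
for x in angles:
    # Columns: the angle, its sine, and the ratio of the two.
    print(x, math.sin(x), sine_to_angle_ratio(x))
```

In every row the sine is smaller than the angle, and the ratio creeps up toward \(1\) as the angle shrinks, which is what the statement about the derivative of \(\sin x\) at \(x = 0\) amounts to.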

Notice also that the sine starts at \(0\) at the angle \(0\), and increases to \(1\) when the angle becomes \(\frac{\pi}{2}\). We can define it the same way for larger angles as well, as the y coordinate of \(P\). After \(\frac{\pi}{2}\) the sine decreases as the angle increases, and reaches \(-1\) when we get three quarters of the way around the circle, at angle \(\frac{3\pi}{2}\). Then it goes up again as \(θ\) increases. Of course when the sine is negative it is minus the length of the perpendicular line, which corresponds to the fact that the \(y\) coordinate of \(P\) is negative there.

**Angles are often described (in radians) as going from \(-\pi\) to \(\pi\) as you go around the circle.** That way, \(0\) angle as usual corresponds to the positive x-axis, but angles below the x-axis have negative size. If so the angles for which the sine is negative are negative angles. The sine is an **odd function** of its angle argument which means:

\[\sin(-θ) = -\sin(θ)\]

The angle **"complementary" to \(θ\)** is the angle whose sides are the positive y-axis and the ray from the center of the circle (the origin here) through \(P\). The sine of the complementary angle to \(\theta\) is called the cosine of \(\theta\), written as \(\cos(θ)\). **This cosine is the \(x\) coordinate of the point \(P\) on the unit circle.** A glance at the picture shows that the cosine, the \(x\) coordinate of \(P\), is an even function of the angle \(\theta\). It is the same whether the angle is positive or negative.

**Triangles**

Another way to describe the sine of \(\theta\) when \(\theta\) is less than \(\frac{\pi}{2}\) is in terms of the triangle formed by **the two sides \(a\) and \(b\) of the angle and any line perpendicular to one side**.

This triangle has a right angle where that side \((b)\) meets the perpendicular line. The sine of \(\theta\) is then the length of the side of that **right triangle** opposite \(\theta\), divided by the length of the side opposite the right angle, which is called the hypotenuse of that triangle. (I think it is daffy to use Greek letters for angles, and to use words like hypotenuse for the longest side of a right triangle, but that’s what everyone does.)

The points on the unit circle all lie at distance \(1\) from the center, which means that the sum of the squares of the \(x\) component and \(y\) component of every point on it is \(1\). The \(y\) component is \(\sin(\theta)\); its square is \(\sin^2(\theta)\). We can deduce that the square of the \(x\) component, which is \(\cos^2(\theta)\), must be \(1- \sin^2(\theta)\). This tells us

**\[\sin^2(\theta) + \cos^2(\theta) = 1\]**

Suppose we pick three points not all on one line, and join them in pairs by line segments. They form a triangle. Each triangle has three sides and three interior angles. **Two triangles** are said to be **congruent when the three interior angles of one have the same values as the interior angles of the other, and the three side lengths of one match those of the other**. If the \(3\) angles are the same we call them **similar** triangles even when the lengths are different.

There are two interesting questions about triangles: First, **what restrictions are there on the three side lengths and three angle sizes for them to form a triangle?**

If we consider lengths alone, for them to form a triangle the **largest side must be smaller than the sum of the other two sides**. (Proof: The two smaller sides must meet opposite ends of the largest one, and if their lengths sum to less than its length they are not long enough to meet each other. If the sum of the lengths of the two smaller ones is exactly the length of the largest one, they all have to lie on one line which does not describe a triangle.) This condition is called the **Triangle Inequality.**

We can prove that the sum of the angles in a triangle is \(\pi\) radians, but it requires use of the "parallel postulate". (The parallel postulate says that, given a line and a point not on it, there is exactly one line through that point parallel to the given line.) This has to be, because the sum of the angles of a triangle is not \(\pi\) radians on the surface of a sphere. If you define each pair of antipodal points to be a single point, geometry on the surface of a sphere obeys all the others of Euclid's axioms and postulates.

If we consider angles alone, we have seen that in a right triangle, the other two angles are complementary, which means that the sum of their sizes, in radians, is \(\frac{\pi}{2}\). Thus the sum of all three angles of a right triangle is \(\pi\), the straight line angle.

This is so for any triangle:

**The sum of the interior angles in any triangle is \(\pi\) radians.**

*Proof: Suppose we start with a triangle \(ABC\) whose largest angle is at point \(A\). If we draw a line segment from \(A\) to \(BC\) perpendicular to the latter meeting it at point \(P\), we have divided our triangle into two right triangles, and the sum of the angles of these two is \(2\pi\) radians. This sum consists of the interior angles at \(A, B\) and \(C\), and a straight line angle at \(P\). Since the angle at \(P\) is \(\pi\) radians in size, the interior angles sum to \(\pi\) as well.*

**Exercise 7.11: Suppose the perpendicular were drawn so that \(P\) was outside of the original triangle. Prove by similar reasoning that the sum of the angles of that triangle is \(\pi\) radians.**

The second question is: **How many of the six parameters (angle sizes and side lengths) are needed to determine all the size parameters of a triangle?**

Euclid used constructions with ruler and compass to answer such questions, and these are lots of fun. But we can do even better using the concept of the sine. With it, we can actually figure out the missing information whenever the triangle is determined.

Obviously, if all we know of a triangle is one side length, there are lots of triangles that are not similar to one another that can have a side of that length. The same is true if we only know one angle. Knowing two angles tells us the third angle as well, since the sum of all three is \(\pi\) radians. That **means that all triangles with the same two angles are similar,** but we know nothing of their side sizes. **Knowing two angles and the side length between any particular pair of angles determines all three lengths,** as we shall see.

Knowing two side lengths alone does not determine angles at all; **knowing two side lengths and the angle between them does, and so does knowing all three sides**. There are elegant facts that allow us to determine all the missing information as we shall see.

Actually **knowing two side lengths \(A\) and \(B\) with \(A\) greater than \(B\), and knowing the angle where the \(B\) side meets the third \(C\) side, determines everything as well,** and we can find the missing information here also.

When the side lengths are \(A, B\) and \(C\), and \(A\) is greater than \(B\), and we know the angle \(\theta\) where the \(A\) side meets the \(C\) side, there are either \(0\) or two solutions, except for one special case (which happens when the side lengths \(A, B\) and \(C\) determine a right triangle so that \(A^2 = B^2 + C^2\).) To have a solution, \(B\) must be at least \(A \sin \theta\).

**Knowing three side lengths determines the triangle completely.** We will now prove all these statements by use of sines.

**How We Find Missing Triangle Parameters**

One tool for doing this is the **Law of Sines**. This is the statement that **the size of side \(A\), divided by the size of side \(B\), is the sine of the angle opposite side \(A\) (this is the angle where \(B\) and \(C\) meet) divided by the sine of angle opposite side \(B\).** If we know two angles we know the third, and their sines, so if we know any one side length, we know its ratio to all other side lengths and can calculate the other two side lengths.

*Proof of the Law of Sines: Given a triangle with side lengths \(A,B\) and \(C\), draw a line segment perpendicular to the \(C\) side from it to the vertex opposite it. Write \(\sin(AC)\) for the sine of the angle where the \(A\) and \(C\) sides meet, and \(\sin(BC)\) for the sine of the angle where the \(B\) and \(C\) sides meet. The length of that segment is \(A \sin(AC)\) and also is \(B \sin(BC)\), by the definition of the sine. This means \(\frac{A}{B}\) is \(\frac{\sin(BC)}{\sin(AC)}\) which is the statement above.*

**Exercise 7.12: Draw yourself a picture with vertex labels instead of segment length labels and verify these statements.**

**The law of sines tells us that if we know all the angles of a triangle, then we know their sines and hence we know all the ratios between side lengths in it. We can thus deduce that similar triangles have the same ratios of side lengths of corresponding sides.**
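Here is a minimal Python sketch of how the Law of Sines fills in a triangle from two angles and one side. The function name and the convention that each side carries the same letter as the angle opposite it are assumptions made for this sketch; the example uses the familiar triangle with sides \(3\), \(4\) and \(5\).

```python
import math


def solve_from_two_angles_and_side(angle_a, angle_b, side_a):
    """Fill in a triangle from two angles (in radians) and the side opposite angle_a.

    Returns (angle_c, side_b, side_c), where each side lies opposite the
    angle with the same letter.
    """
    angle_c = math.pi - angle_a - angle_b  # the three angles sum to pi radians
    if min(angle_a, angle_b, angle_c) <= 0:
        raise ValueError("the two angles must be positive and sum to less than pi")
    # Law of Sines: side / sin(opposite angle) is the same for all three sides.
    common_ratio = side_a / math.sin(angle_a)
    side_b = common_ratio * math.sin(angle_b)
    side_c = common_ratio * math.sin(angle_c)
    return angle_c, side_b, side_c


angle_c, side_b, side_c = solve_from_two_angles_and_side(
    math.asin(3 / 5), math.asin(4 / 5), 3.0
)
print(angle_c, side_b, side_c)  # close to pi/2, 4 and 5
```

Doubling `side_a` while keeping the angles fixed doubles the other two sides as well, which is the statement about similar triangles in numerical form.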

Before describing how to find missing parameters in a triangle when we know three sides only, or two sides and an angle, we make one more definition. We have used the notation \((5,7)\) to describe a point with \(x\) coordinate \(5\) and \(y\) coordinate \(7\). In these terms a line segment is described by giving the two coordinates of each of its endpoints. This is cumbersome. **For many purposes such as determining length, we don't really care where the segment starts; what is important to us is only the differences between each of its two coordinates at the two endpoints.** These determine the length and the orientation of the segment.

Thus, given the line segment whose endpoints are \((1,2)\) and \((3,6)\) the differences at the two endpoints are 2 in \(x\) coordinate and \(4\) in y coordinate. We write this information as \(2\hat{i} + 4\hat{j}\), where \(\hat{i}\) and \(\hat{j}\) are called unit vectors in the \(x\) and \(y\) directions and say that the given line segment has \(2\hat{i} + 4\hat{j}\) as its **vector.** Actually **this notation describes the directed segment with direction from \((1,2)\) to \((3,6)\); the segment directed oppositely is the negative of this one.**

In general, **each directed line segment, say one from \((a,b)\) to \((c,d)\), defines a vector, namely** \((c-a)\hat{i} + (d-b)\hat{j}\), with \(\hat{i}\) and \(\hat{j}\) unit vectors in the \(x\) and \(y\) directions respectively. This vector contains information relative to the segment important to us, but says nothing about where it starts, or what the \(x\) and \(y\) directions are, except that they are perpendicular to one another.

To see the use of this definition, suppose we have a triangle with vertex points \((1,2)\), \((3, 7)\) and \((6,2)\)

Each side of the triangle is described by two of these point descriptors, namely those of its two endpoints. And we often have no interest in where the triangle is located in the plane. Suppose we direct the segments to form a cycle.

The line \((1,2)\) to \((3,7)\) has vector \(2\hat{i} + 5\hat{j}\). The line \((3,7)\) to \((6,2)\) has vector \(3\hat{i} - 5\hat{j}\). The line \((6,2)\) to \((1,2)\) has vector \(-5\hat{i}\).

Notice that the sum of the vectors corresponding to this cycle is the \(0\) vector.

**In general, the sum of a bunch of vectors that correspond to the line segments of a directed path is the vector from the beginning of that path to its end. In the case of a cycle these are the same point and the sum is thus the \(0\) vector.**

*Proof: In forming the sum vector, the intermediate coordinates get added from their incoming vector and subtracted from their outgoing one, and so drop out. Only the contributions from the endpoints remain.*
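The cancellation in this proof is easy to watch in code. The following Python sketch computes the vectors of the segments along a path and adds them up; the function names and the representation of a vector as an \((x, y)\) pair of components are assumptions for illustration, and the points are those of the triangle above.

```python
def segment_vector(start, end):
    """Vector of the directed segment from start to end, as (x, y) components."""
    return (end[0] - start[0], end[1] - start[1])


def path_vector_sum(points):
    """Add up the vectors of the consecutive segments along a path of points."""
    total = (0, 0)
    for start, end in zip(points, points[1:]):
        v = segment_vector(start, end)
        total = (total[0] + v[0], total[1] + v[1])
    return total


cycle = [(1, 2), (3, 7), (6, 2), (1, 2)]  # the triangle, traversed as a cycle
print(path_vector_sum(cycle))      # (0, 0): a cycle ends where it began
print(path_vector_sum(cycle[:3]))  # (5, 0): the vector from (1, 2) to (6, 2)
```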

This information is implicit in the notation describing lines by points, but that notation has too much information, and is much harder to work with.

But the wonderful thing is, given two line segments we can easily extract important information from their vectors. The first bit of information is what is called their **dot product:** given \(a\hat{i} + b\hat{j}\), and \(c\hat{i} + d\hat{j}\), their dot product is \(ac + bd\). **You multiply together like components and add them up**. We have already seen that **the dot product of a vector with itself gives the square of the length of its line segment.** In general, as we will prove, the dot product **of two different vectors gives the product of the length of the two segments, multiplied by the cosine of the angle between them. The angle between them** is the angle you get if you line the two segments up with the same back vertex, directed away from it.

When the two segments form part of the boundary of a cycle triangle, the interior angle of the triangle is not the angle of size \(\theta\) between them, but instead has size \(\pi-\theta\), and the cosine of this angle is \(-\cos\theta\). **Draw pictures and use them to verify this claim.**

The proof of the evaluation of the dot product here comes from the fact that this product is an **invariant;** which means it does not depend on the orientation of the coordinate system.

**How do you know the dot product is an invariant?**

Claim: If we rotate our coordinates so that the unit vector \(\hat{i}\) is replaced by \(\hat{i}\cos\theta + \hat{j}\sin\theta\) and \(\hat{j}\) is replaced by \(-\hat{i}\sin\theta + \hat{j}\cos\theta\), the dot product between any two vectors does not change.

**Exercise 7.13: Prove this for a vector \(\vec{v}\) that points in the \(x\) direction, and a general \(\vec{w}\) vector.**

This means we can choose our coordinate system so that the first vector, \(\vec{v}\), whose length is \(|\vec{v}|\), points in the \(x\) direction, so that \(\vec{v}\) is \(|\vec{v}|\hat{i}\). The second vector \(\vec{w}\) similarly is \(|\vec{w}|(\cos\theta\hat{i} + \sin\theta\hat{j})\) when the angle between \(\vec{v}\) and \(\vec{w}\) is \(\theta\). The dot product of the two is \(|\vec{v}||\vec{w}|\cos\theta\) by its definition.
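A numerical check is not a proof, but it can be reassuring. The sketch below (with helper names `dot`, `length`, `angle_between` and `rotate` chosen just for this example) turns two vectors through the same angle, which changes their components the same way that turning the axes the other way would, and confirms that the dot product is unchanged and equals the product of lengths times the cosine of the angle between them.

```python
import math

def dot(v, w):
    """Multiply like components and add them up."""
    return v[0] * w[0] + v[1] * w[1]

def length(v):
    """The dot product of a vector with itself is its length squared."""
    return math.sqrt(dot(v, v))

def angle_between(v, w):
    """Angle between two nonzero vectors, from the cosine formula."""
    return math.acos(dot(v, w) / (length(v) * length(w)))

def rotate(v, theta):
    """Components of v after turning it counterclockwise by theta."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

v = (2, 5)
w = (3, -5)
print(dot(v, w))
print(dot(rotate(v, 0.7), rotate(w, 0.7)))  # agrees up to round-off
print(length(v) * length(w) * math.cos(angle_between(v, w)))
```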

**Law of Cosines: If three directed line segments form a cycle triangle, then their side lengths \(A\), \(B\) and \(C\) obey \(C^2 = A^2 + B^2 - 2AB \cos(\theta)\), where \(\theta\) is the interior angle of the triangle where the \(A\) and \(B\) segments meet.**

*Proof: We have seen that the sum of the vectors of all the sides of the triangle is the \(0\) vector. This means that the vector for the \(C\) segment is minus the sum of the \(\vec{A}\) and \(\vec{B}\) vectors.*

*The square of the length of \(C\), or \(C^2\), is then the square of the sum of the \(\vec{A}\) and \(\vec{B}\) vectors, which is the dot product of this sum with itself. This is \(A^2 + B^2 + 2AB\cos(\pi-\theta)\) (remember that when the interior angle where the \(A\) and \(B\) segments meet is \(\theta\), the angle between their vectors is \(\pi - \theta\)). The conclusion follows from the fact: \(\cos(\pi - \theta) = -\cos\theta\).*

**We can see immediately from this law that knowing \(A\) and \(B\) and \(\theta\) determines \(C\), and also knowing \(A\), \(B\) and \(C\) determines \(\cos(\theta)\).**

**This law of cosines therefore allows us to deduce all side lengths and angles of the triangles given either three side lengths or two side lengths and the angle between the two corresponding sides.**
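As a rough illustration, here are both uses of the law written as two small functions. The names `third_side` and `angle_from_sides` are hypothetical, and the sample side lengths come from the triangle with vertices \((1,2)\), \((3,7)\) and \((6,2)\) used earlier.

```python
import math

def third_side(a, b, theta):
    """Length of the side opposite the interior angle theta between sides a and b."""
    return math.sqrt(a * a + b * b - 2 * a * b * math.cos(theta))

def angle_from_sides(a, b, c):
    """Interior angle where sides a and b meet, given all three side lengths."""
    return math.acos((a * a + b * b - c * c) / (2 * a * b))

# Side lengths of the triangle from the text: vectors 2i+5j, 3i-5j and -5i.
A = math.sqrt(2**2 + 5**2)
B = math.sqrt(3**2 + 5**2)
C = 5.0

theta_ab = angle_from_sides(A, B, C)
theta_bc = angle_from_sides(B, C, A)
theta_ca = angle_from_sides(C, A, B)
print(theta_ab + theta_bc + theta_ca)  # the three interior angles add up to pi
print(third_side(A, B, theta_ab))      # recovers the length C
```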

The other case in which all the information can be deduced is when, in the formula above, we know \(C\) and \(A\) and \(\theta_{AB}\), which is the angle where \(A\) and \(B\) meet, and \(C\) is bigger than \(A\). Filling in the given information in the law of cosines yields a quadratic equation for \(B\). When \(A\) is less than \(C\), one of the two solutions to this equation is negative, so we can determine the unique solution by finding the one positive solution to the quadratic equation obtained.

**Exercise 7.14: Define lengths for \(A\) and \(C\) and an angle \(\theta_{AB}\), obtain the quadratic equation for \(B\) and find its positive solution. Verify that the other solution is negative.**

**Cross Products and Areas**

We have seen above how the dot product of two vectors (along with their dot products with themselves) conveys useful information about the segments they describe, namely the product of their lengths with the cosine of the angle between them.

There is another thing we can do with vectors in the plane called their **cross product**. This product depends not only on the directions of the line segments but on the order in which one places them. But it is quite simple.

In forming the dot product you multiply like components and add them. **In forming the cross product you multiply unlike components and subtract them.** Obviously the sign of what you get depends on which you subtract from which. This depends on you and not on the segments. But the magnitude of the cross product has real meaning. **It is the area of the parallelogram formed from the two line segments as adjacent sides. This is twice the area of the triangle with these line segments as sides.**

Proof: *The area of the parallelogram is its base length multiplied by its height. If the base has length \(a\) and the other side, with length \(b\), forms angle \(\theta\) with the base, the height is \(b \sin\theta\), and the area is \(ab \sin\theta\). That is exactly what the magnitude of the cross product is if the base points in the \(x\) direction. The conclusion follows from the invariance of the cross product under rotation of coordinates, which is proven exactly as one proves the invariance of the dot product.*

Suppose we have the vectors \(2\hat{i} + 3\hat{j}\) and \(4\hat{i} - 7\hat{j}\). Their cross product is (up to sign) \(3*4 -2*(-7) = 12 + 14 = 26\). Thus the parallelogram they form has area \(26\), and the triangle they form has area half of this, or \(13\).
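The same computation as a short sketch, with the sign convention "first \(x\) times second \(y\), minus first \(y\) times second \(x\)" picked arbitrarily for the hypothetical function `cross2d`:

```python
def cross2d(v, w):
    """Multiply unlike components and subtract; the sign depends on the order."""
    return v[0] * w[1] - v[1] * w[0]

v = (2, 3)
w = (4, -7)

parallelogram_area = abs(cross2d(v, w))
triangle_area = parallelogram_area / 2
print(cross2d(v, w), cross2d(w, v))  # equal magnitude, opposite sign
print(parallelogram_area, triangle_area)
```

Swapping the order of the two vectors flips the sign but leaves the area alone.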

**The cross product of a vector with itself is \(0\).**

By the way, dot and cross products can be formed in higher dimensions. In three dimensions, points have three components and so do vectors. **The dot product is defined the same way in any dimension as the sum of the products of like components, and has the same meaning in all.**

**The cross product in two dimensions involves both components. In higher dimensions it is formed by taking two dimensional cross products with each pair of coordinates.**

In three dimensions you can multiply \(x\) and \(y\) components and subtract, and can do the same with \(y\) and \(z\) components and also with \(z\) and \(x\) components. We make a sort of vector by making these in order the \(z\), \(x\) and \(y\) components of **the cross product vector.**

\[ (a_x\hat{i} + a_y\hat{j} + a_z\hat{k}) \times (b_x\hat{i} + b_y\hat{j} + b_z\hat{k}) = (a_xb_y - a_yb_x)\hat{k} + (a_yb_z-a_zb_y)\hat{i} + (a_zb_x-a_xb_z)\hat{j} \]

(The \(k\) term is the ordinary two dimensional \(x\), \(y\) cross product. You can determine the other terms by changing \(x\) to \(y\), \(y\) to \(z\) and \(z\) to \(x\), and also \(i\) to \(j\), \(j\) to \(k\), and \(k\) to \(i\), once, and also twice.)

**The cross product of two vectors in three dimensions points perpendicular to the plane of the segments that these vectors represent. Its magnitude is the area of any parallelogram whose sides are represented by these vectors.**

**Given three vectors \(A\), \(B\) and \(C\), the dot product of \(C\) with \(A \times B\) is (up to sign) the volume of a parallelepiped whose sides are described by these vectors.**
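Before trying to prove such statements it can help to see them hold on an example. This is a small sketch, with the function names `cross3d` and `dot3d` invented here, that follows the component formula above and checks perpendicularity and the unit cube's volume.

```python
def cross3d(a, b):
    """Cross product, returned as (x, y, z) components."""
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by,   # i component, from the y and z parts
            az * bx - ax * bz,   # j component, from the z and x parts
            ax * by - ay * bx)   # k component, the ordinary 2D cross product

def dot3d(a, b):
    """Sum of the products of like components."""
    return sum(p * q for p, q in zip(a, b))

a = (1, 2, 3)
b = (4, 5, 6)
n = cross3d(a, b)
print(n)
print(dot3d(a, n), dot3d(b, n))  # both zero: n is perpendicular to a and b

# Unit cube: its three edge vectors should give volume 1.
print(dot3d((0, 0, 1), cross3d((1, 0, 0), (0, 1, 0))))
```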

**Exercises: 1. Prove these two statements. (Hint: choose directions such that the vector \(A\) points in the \(x\) direction and \(B\) lies in the \(xy\) plane.)**

**2. Suppose two line segments are perpendicular to each other. What does all this imply about the dot product of their vectors? About the magnitude of their cross product?**

**Introduction**

In 7.1, we introduced lots of trigonometry without actually mentioning it.

Trigonometry is a long and off-putting name for what is really a fun subject. A **Trigon** is a fancy name for a triangle, analogous to the words octagon or pentagon. Metry refers to measurement. So trigonometry means either measuring triangles, or using triangles to measure other things, I’m not sure which; maybe both.

We will describe the remaining important trigonometric results. One of the mysteries of trigonometry is: Why does every one of the six ratios of side lengths in a right triangle have its own special name? Why for example does \(\frac{1}{\sin x}\) have a name of its own? When I studied trigonometry in school, (in prehistoric times) we were confronted with all six names, and quizzed on and expected to memorize which means which without any clues at all. This turned many of us off to trigonometry.

Suppose our angle \(\theta\), as in the picture here, lies between the \(x\) axis and the line \(OB\). The ancients drew a line segment that extends **from the point \(B\) tangent to the unit circle to the \(x\) axis at point \(C\). The length of this segment they called the tangent** of the angle \(\theta\). (When the line has a positive slope the tangent is taken to be negative.) Tangent is a Latin word that means 'touching', and that is what this line does to the circle, at point \(B\).

**The \(x\) coordinate of the point \(C\) where the tangent line meets the \(x\) axis is called the secant of \(\theta\)** (we are assuming that the origin is at the center of the unit circle). Secant is a Latin word meaning 'cutting', which is what this line does to the circle.

They also defined **the complement of an angle that is less than a right angle to be the difference between a right angle and it.** This got them to define **the cosine, cotangent and cosecant as the sine, tangent and secant of the complement of the original angle.**

Fortunately for us, all of these six functions are easily related to the sine function, which means that we need only really become familiar with the sine, and we can then figure out what the others are.

Here are the relations between these functions, all of which follow from the definitions and from the fact that **corresponding angles of similar triangles are equal.**

By definition, **\(\cos\theta\)** is **\(\sin\left(\frac{\pi}{2}-\theta\right)\).**

**From triangle \(BCD\), in which the hypotenuse is \(\tan \theta\) and the side adjacent to \(\theta\) is \(\sin \theta\), we get**

**\[(\cos \theta)(\tan \theta) = \sin \theta\]**

which means

**\[\tan\theta = \frac{\sin\theta}{\cos\theta} = \frac{\sin\theta}{\sin(\pi/2-\theta)}\]**

**The complementary version of this is:**

**\[\cot \theta = \frac{\cos \theta}{\sin \theta} = \frac{\sin(\pi/2-\theta)}{\sin \theta}\]**

From the triangle \(BCO\)

we similarly get

**\[(\sec \theta)(\sin \theta) = \tan \theta\]**

which means

**\[\sec \theta = \frac{1}{\cos \theta}\]**

and the complementary version is

**\[\csc \theta = \frac{1}{\sin \theta}\]**

So all this explains why every ratio of side lengths of a right triangle has a name of its own.
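To see that the sine really is enough, here is a minimal sketch that builds the other five functions from it using only the relations above. The function name `six_functions` is made up, and the sketch assumes an angle strictly between \(0\) and \(\frac{\pi}{2}\) so that nothing divides by zero.

```python
import math

def six_functions(theta):
    """All six trigonometric functions of theta, built from the sine alone."""
    s = math.sin(theta)
    c = math.sin(math.pi / 2 - theta)  # cosine is the sine of the complement
    return {
        "sin": s,
        "cos": c,
        "tan": s / c,
        "cot": c / s,
        "sec": 1 / c,
        "csc": 1 / s,
    }

values = six_functions(0.5)
for name, value in values.items():
    print(name, value)
```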

I like this picture so much, here it is again but as a mathlet.

**Exercises:**

**7.21. Draw this picture yourself, without looking at this one, for an angle \(\theta\) that is less than \(\frac{\pi}{2}\), showing all of these entities.**

**7.22. How many similar triangles do you see? Remember that the two angles other than a right angle in a triangle with a right angle are complementary.**

**7.23. What strange relations do you get if you use the definition of sine in the triangle \(OAC\)?**

We have, in the previous section, already discussed the three basic theorems of trigonometry that you should know. There are also useful "addition theorems" of trigonometry, which describe what sine and cosine of sums of arguments are. We also describe the derivatives of the sine and cosine, and their relation to exponentials.

**What were the basic Theorems of Trigonometry?**

1. **The Pythagorean Theorem**: This famous result states that **the square of the hypotenuse of a right triangle is the sum of the squares of its other two sides.** Translated to our definitions it says that for any angle, we have

\[(\sin\theta)^2 + (\cos\theta)^2 = 1\]

which implies that, up to sign, we have

\[\cos\theta = \sqrt{1-(\sin\theta)^2}\]

2. **The Law of Sines**: This states that in any triangle \(ABC\) the ratio of the sines of its angle at \(A\) to its angle at \(B\) is the ratio of the lengths of the side opposite \(A\) to the side opposite \(B\). If we describe these lengths as \(|BC|\) and \(|AC|\) respectively, we have

\[\frac{\sin A}{\sin B} = \frac{|BC|}{|AC|}\]

We proved this in section 7.1C.

3. **The Law of Cosines**: This statement gives the length of the side \(BC\) of a triangle in terms of the lengths of \(AB\) and \(AC\) and its angle at \(A\)

**\[|BC|^2 = |AB|^2 + |AC|^2 - 2 |AB||AC|\cos A\]**

This was also proven in 7.1C.

**Derivatives of Sines and Cosines**

Consider a point \(P\) on the unit circle, which is centered at the point \(C\) (which we take to be the origin). Let \(\theta\) be the angle clockwise from the line segment \(CP\) to the \(x\) axis.

We then have \(x = \cos\theta\) and \(y = \sin\theta\). We know that \((\cos\theta)^2 + (\sin\theta)^2\) is \(1\) for any \(P\) on the unit circle since this is the statement that \(x^2 + y^2\) for such \(P\) is \(1\), which is the definition of the unit circle.

This means that the derivative of \((\cos\theta)^2 + (\sin\theta)^2\) as we move around the unit circle is \(0\). This tells us

\[2\cos\theta\left(\frac{d\cos\theta}{d\theta}\right) + 2\sin\theta\left(\frac{d\sin\theta}{d\theta}\right) = 0\]

which means that the vector with components \(\cos\theta\) and \(\sin\theta\) has \(0\) dot product with the vector whose components are their corresponding derivatives.

It is, fortunately very easy to find all vectors in two dimensions that have \(0\) dot product with a given one: you reverse the components, change one of their signs, and multiply by any constant \(c\). Thus \((a,b)\) (which is a shorter way of writing \(a\hat{i} + b\hat{j}\)) has \(0\) dot product with \((cb, -ca)\).

This tells us \(\frac{d\sin\theta}{d\theta} = c\cos\theta\) and \(\frac{d\cos\theta}{d\theta} = -c\sin\theta\), for some constant \(c\).

We can determine the constant \(c\) by examining these statements at the point for which \(\theta = 0\).

If our angle is measured in radians, we have observed that on the unit circle, moving a distance \(d\theta\) from angle \(0\) changes the sine from \(0\) to almost \(d\theta\). Thus the constant \(c\) above is \(1\) at angle \(0\), and being a constant, is always \(1\).

We conclude \((\sin\theta)' = \cos\theta\), and \((\cos\theta)' = -\sin\theta\). (The latter relation actually follows from the former and the fact that \(\cos\theta\) is \(\sin(\frac{\pi}{2} - \theta)\).)

**Exercise 7.25: Deduce the derivatives of the secant and tangent from these facts.**

**Why not tell us the answers?**

**If we did this you would have trouble remembering them. If you figure them out yourself you will have trouble forgetting them.**

An interesting consequence is gotten by looking at the combination \(\cos x + i \sin x\). (\(i\) here is the square root of \(-1\).) Notice that its derivative is \(i\) times itself. And its value at \(x = 0\) is \(1\). We know what that means. A function whose derivative is \(q\) times itself, whose value at \(x = 0\) is \(1\), is \(\exp(qx)\).

We therefore find: \(\cos x + i \sin x = \exp(ix)\).

In general we can divide any function \(f\) into an odd part and an even part; the even part is \(\frac{f(x) + f(-x)}{2}\) and the odd part is \(\frac{f(x) - f(-x)}{2}\). The sum of the two parts is \(f(x)\).

Since \(\cos x\) is even and \(\sin x\) is odd, we can identify \(\cos x\) as the even part of \(\exp(ix)\), and \(i\sin x\) as its odd part.

The formal expressions are

**\[\cos x = \frac{\exp(ix) + \exp(-ix)}{2}\]**

and

**\[\sin x = \frac{\exp(ix) - \exp(-ix)}{2i}\]**
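These two expressions can be checked with complex arithmetic. The sketch below uses Python's standard `cmath` module; the function names `cos_from_exp` and `sin_from_exp` are just labels for this example.

```python
import cmath
import math

def cos_from_exp(x):
    """Even part of exp(ix)."""
    return ((cmath.exp(1j * x) + cmath.exp(-1j * x)) / 2).real

def sin_from_exp(x):
    """Odd part of exp(ix), divided by i."""
    return ((cmath.exp(1j * x) - cmath.exp(-1j * x)) / (2j)).real

x = 0.8
print(cos_from_exp(x), math.cos(x))
print(sin_from_exp(x), math.sin(x))
print(cmath.exp(1j * x), complex(math.cos(x), math.sin(x)))
```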

**Power series expansions of Sines and Cosines**

We have seen that \(\cos 0\) is \(1\). Since the derivative of the sine is the cosine, \(\sin x\), when written as a sum of powers of \(x\), must have a term in that sum whose derivative is \(1\). That has to be \(x\), so that the first term in the power series expansion for \(\sin x\) is \(x\). The cosine has derivative \(-\sin x\), so it must have a term in its power series expansion whose derivative is \(-x\), which term must be \(-\frac{x^2}{2}\). \(\sin x\) similarly must have a term whose derivative is that, namely \(-\frac{x^3}{3!}\); this forces a term in the cosine series whose derivative is \(+\frac{x^3}{3!}\), namely \(+\frac{x^4}{4!}\), etc.

**So what do we get?**

The sine has contributions from all the odd powers and their signs alternate:

**\[\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - ... + (-1)^k\frac{x^{2k+1}}{(2k+1)!} + ...\]**

while the cosine has similarly alternating sign terms from the even powers:

**\[\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - ... + (-1)^k\frac{x^{2k}}{(2k)!} + ...\]**
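If you want to watch these sums converge, here is a minimal sketch that adds up the first few terms of each series. The names `sin_series` and `cos_series`, and the default of ten terms, are choices made for this illustration only.

```python
import math

def sin_series(x, terms=10):
    """Partial sum of the sine series: odd powers with alternating signs."""
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
               for k in range(terms))

def cos_series(x, terms=10):
    """Partial sum of the cosine series: even powers with alternating signs."""
    return sum((-1) ** k * x ** (2 * k) / math.factorial(2 * k)
               for k in range(terms))

x = 1.0
for terms in (1, 2, 3, 5, 10):
    print(terms, sin_series(x, terms), cos_series(x, terms))
print(math.sin(x), math.cos(x))
```

For arguments of modest size the factorials in the denominators take over quickly, so a handful of terms already agrees with the built-in functions to many digits.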

By the way, \(\cosh x\) and \(\sinh x\) are the even and odd parts of \(\exp(x)\). Their power series expansions are similar to those of the cosine and sine except that all terms are positive.

There are power series expressions for all the trigonometric functions, but you can figure them out yourself if you ever want to do so, from their relations to the sine and cosine.

**Addition Theorems**

**What is an addition theorem?**

We have already noticed that the standard measure of angle, in terms of degrees or radians is additive: this measure of the sum of two angles is the sum of the same measures of each summand. This statement is not true for sines or cosines. The sine of the sum of two angles is **not** the sum of their sines. The addition theorems tell us how to compute the sine and cosine of the sum of two angles in terms of the sines and cosines of the two angles that are summed.

The easiest way to find or prove addition theorems for sines and cosines is to use their relations to exponentials. We already know the addition theorem for exponentials:

**\[\exp(cx + cy) = \exp(cx)\exp(cy)\]**

Since \(\exp(ix)\) is \(\cos x + i\sin x\), we find that \(\exp(i(x+y))\) is

\[(\cos x + i\sin x)(\cos y + i\sin y)\]

which is

**\[((\cos x)(\cos y) - (\sin x)(\sin y)) + i((\sin x)(\cos y) + (\cos x)(\sin y))\]**

**The real part of this last expression is \(\cos(x+y)\) and the imaginary part is \(\sin(x+y)\) and these are our addition theorems.**
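Written out, the real and imaginary parts give \(\cos(x+y) = (\cos x)(\cos y) - (\sin x)(\sin y)\) and \(\sin(x+y) = (\sin x)(\cos y) + (\cos x)(\sin y)\). A quick numerical sanity check, with function names invented for this sketch:

```python
import math

def cos_of_sum(x, y):
    """Real part of (cos x + i sin x)(cos y + i sin y)."""
    return math.cos(x) * math.cos(y) - math.sin(x) * math.sin(y)

def sin_of_sum(x, y):
    """Imaginary part of (cos x + i sin x)(cos y + i sin y)."""
    return math.sin(x) * math.cos(y) + math.cos(x) * math.sin(y)

x, y = 0.4, 1.3
print(cos_of_sum(x, y), math.cos(x + y))
print(sin_of_sum(x, y), math.sin(x + y))
```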

For the hasty:

\[ \frac{d(\tan x)}{dx} = \frac{d(\sin x/\cos x)}{dx} = \frac{\cos x}{\cos x} - \frac{\sin x(-\sin x)}{(\cos x)^2} = 1 + (\tan x)^2 \]

From triangle \(OBC\) in the picture at the top of section 7.2 we find that the last is \((\sec x)^2\).

\[\frac{d(\sec x)}{dx} = \frac{d(1/\cos x)}{dx} = -\frac{(-\sin x)}{(\cos x)^2} = (\tan x)(\sec x)\]

We describe the notion of the inverse of a function, and how such a thing can be differentiated. If \(f\) acting on argument \(x\) has value \(y\), the inverse of \(f\), acting on argument \(y\), has the value \(x\). The derivative of the inverse of \(f\) at argument \(f(x)\) is the reciprocal of the derivative of \(f\) at argument \(x\).

The inverse of a function \(f\) is another function \(f_{inv}\) defined so that \(f(f_{inv}(x)) = x\) and \(f_{inv}(f(x)) = x\) both hold.

In words, the inverse function to \(f\) acting on \(f\) produces the identity function, \(x\). Also \(f\) acting on its inverse function is the identity function.

We have encountered this notion before. A typical example of inversion is the square root. The square root function is the inverse of the square function.

This concept has three complications that you must learn to handle. First is the question of notation. We are tempted to use the notation \(f^{-1}\) for the inverse function to \(f\), and we often do this. But we shouldn't, and often we don't use that notation, because it is sometimes used to represent the reciprocal function, whose value at argument \(x\) is \(\frac{1}{f(x)}\).

The commonest inverse functions are **the inverses to powers like** \(x^k\), which are called **roots** and denoted as \(x^{\frac{1}{k}}\), and the inverse to the **exponent function, \(\exp(x)\),** which is called **the natural logarithm of \(x\)** and denoted as **\(\ln(x)\).**

The **inverse sine function** is called **the arcsine** and is denoted as **\(\arcsin(x)\).** On most spreadsheets it is written as =asin(B6), (if you want the arcsine of what is in box B6.) There are similarly, \(\arccos(x)\), \(\arctan(x)\), and so on.

The second complication is that **the inverse function is not in general defined everywhere.** A function like the exponent, \(\exp(x)\), or the square, **whose values are always non-negative,** will, upon interchanging values and arguments, **only be capable of definition for non-negative arguments.** All the other functions we have been considering so far, can be defined almost everywhere; inverse functions, however, often have restricted domains unless we want to extend our number system.

The final complication is **that many functions that we like to invert take on the same value for more than one argument.** The function, \(f\) with \(f(x) = x^2\), the function that squares, taking \(x\) to \(x^2\), is a good example of this.

\(5\) and \(-5\) have the same square. Which should be called the square root of \(25\)?

The sine function is periodic and repeats itself endlessly as you go around and around a circle, with period \(2\pi\) (measuring angles in radians.). Which of its many arguments for which sine has the same value should be taken as the value of the inverse function to the sine?

The answer to such questions is that in inverting a function \(f\) which takes on the same values more than once, we must **first restrict the domain of \(f\) so that this does not happen,** so **that \(f\) takes on each value at most once, in this restricted domain, if we want its inverse to be a single valued function.** The square function can be restricted to the non-negative numbers, or to the non-positive numbers, (or to appropriate mixtures). **After such restriction this problem disappears,** since \(f\) takes on each value only once in the restricted domain.

For roots we typically restrict the domain of the power being inverted to be the non-negative numbers, which means that the square root, which we write as \(x^{\frac{1}{2}}\), is always positive. In principle we could have chosen \(x^{\frac{1}{2}}\) to be negative instead, or negative over part of its domain and positive on the rest. We do not do this for two reasons: first it is an unnatural thing to do; second, the positive square root has the nice property that the square root of a product, say of \(xy\) is the product of their square roots; this is not true for negative square roots, since the product of two of them is positive, and is never a negative square root.

In general, what we have been saying means that the inverse function to \(f\) requires an added condition to be well defined, when \(f\) takes on some value more than once. To get a unique inverse function you must make a restriction of the domain of \(f\) to one in which \(f\) takes on each value at most once.

There are three observations to be made about inverse functions, two nice and the other less nice.

The first nice one is that it is **very easy to find the graph of an inverse function from that of the original function,** and therefore to decide on a domain for \(f\) (which becomes the range for \(f^{-1}\)). It is similarly easy to graph \(f^{-1}\) on a spreadsheet.

One way to find the graph of the inverse function is to rotate your paper (which has the graph on it) by \(\pi\) radians (\(180\) degrees) around the main diagonal (the line through the origin at angle \(\frac{\pi}{4}\) or \(45\) degrees counterclockwise from the \(x\) axis.)

You will then find that you have to look through your paper at the function but that can usually be done and if you start with the graph of \(f\) you are looking at the graph of the inverse function to \(f\).

For the spreadsheet, you can set up the spreadsheet you use to graph a function, and copy the column of arguments \(x\) beyond the column of values \(f(x)\), and then highlight and do an x-y scatter chart of the old \(f\) and new \(x\) columns. You will see the graph of the inverse to your function.
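The same column swap can be sketched outside a spreadsheet. In the example below the helper names `table`, `swap_columns` and `lookup` are hypothetical, and the linear interpolation in `lookup` is an addition of mine for reading values between rows; it assumes the swapped table is sorted by its first column, which holds when \(f\) is increasing on the chosen domain.

```python
import math

def table(f, lo, hi, steps):
    """Rows (x, f(x)) for equally spaced arguments, like two spreadsheet columns."""
    xs = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return [(x, f(x)) for x in xs]

def swap_columns(rows):
    """Interchange arguments and values: rows (f(x), x) describe the inverse."""
    return [(y, x) for (x, y) in rows]

def lookup(rows, arg):
    """Read a value off the table, interpolating linearly between rows."""
    for (a0, v0), (a1, v1) in zip(rows, rows[1:]):
        if a0 <= arg <= a1:
            return v0 + (v1 - v0) * (arg - a0) / (a1 - a0)
    raise ValueError("argument outside the tabulated range")

exp_rows = table(math.exp, -3.0, 2.0, 5000)
inverse_rows = swap_columns(exp_rows)

print(lookup(inverse_rows, 2.0), math.log(2.0))
print(lookup(inverse_rows, 0.5), math.log(0.5))
```

For a function that is not increasing on its whole domain, the swapped rows describe a "multiple valued function", and you would first cut the table down to a stretch on which each value occurs once.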

**Exercises:**

**8.1 Set up a spreadsheet that plots the exponent function in the domain from \(-3\) to \(2\). Copy the argument column after the value column for it and highlight the value column and the copied column and plot the inverse function to the exponent, which is the natural \(\log\) function. For what argument is \(\ln x\) \(0\)? For what argument is it \(1\)? \(-1\)?**

**8.2 The mathlet below allows you to enter functions and plot their inverses as well as themselves. Check your answer to 8.1 by finding the inverse to \(\exp x\) in the given domain with the mathlet.**

The not so nice observation is that there is no standard obvious way of finding the value of an inverse function at a particular argument \(x\). All the other functions we have discussed can be found by performing simple standard operations such as adding, dividing, multiplying, subtracting, and substituting. But there is no such procedure for inverses.

And in general, there cannot be one. This is because in general you have to choose the domain for the original function to make it single valued, and a means of calculating the inverse would have to know in advance what decision you will make in doing so, if it is to get the corresponding inverse. This is something it cannot in general do.

Of course most inverse functions that you will ever encounter, and perhaps all of them, are accessible as functions on your spreadsheet or calculator. You can compute them by pushing a button. This is because the maker of your machine and its programs has made the decision for you as to what domain to choose for the original function and hence what range to get for the inverse function to it, and has used some sneaky procedure for computing it once it has been determined.

The first good news is that even though there is no general way to compute the **value** of the inverse to a function at a given argument, there is a simple formula for the **derivative** of the inverse of \(f\) in terms of the derivative of \(f\) itself.

In fact, the derivative of \(f^{-1}\) **is the reciprocal of the derivative of \(f\), with argument and value reversed.**

This is more or less obvious geometrically. The derivative of the function \(y\) is \(\frac{dy}{dx}\), while that of any inverse function to \(y\) is \(\frac{dx}{dy}\) which will be the reciprocal of the former at the value \(x\) if evaluated at the \(y\) value \(y(x)\).

Let's prove this using algebra. All we have to do is to apply the chain rule to the defining property of \(f^{-1}\) namely \(f(f^{-1}(x)) = x\). By the chain rule we have \(\frac{df^{-1}(x)}{dx}\frac{df(y)}{dy} = 1\) evaluated at \(y = f^{-1}(x)\).

This means that the derivative of the inverse function is the reciprocal of the derivative of the function itself, evaluated at the value of the inverse function.

**The argument seems simple enough but it is confusing. Can you use this rule to actually find derivatives of inverses without going nuts?**

Let us see what this means for the exponential function and its inverse, \(\ln(x)\). The derivative of the exponent function is itself, \(\exp(x)\). Then the derivative of the logarithm function, \(y = \ln(x)\), is the reciprocal of the exponent, evaluated at \(\ln(x)\); this is \(\frac{1}{\exp(\ln(x))}\), which is \(\frac{1}{x}\). This latter claim follows from the definition of inverse, which tells us that \(\exp(\ln(x)) = x\).

Similarly, for the sine function, since its derivative at argument \(x\) is \(\cos(x)\), the derivative of \(\arcsin(x)\) is the reciprocal of the cosine of itself, or \(\frac{1}{\cos(\arcsin(x))}\).

You could leave it at that, but we generally reduce it to something slightly less ugly. A spreadsheet is as happy with this as with the result we end up with in the next paragraph. By the way, my spreadsheet gives the arccosine function of argument A6 wherever I enter =acos(A6).

Since as we have seen in Chapter 7, \(\cos x\) is \((1 - (\sin x)^2)^{\frac{1}{2}}\), and \(\arcsin(\sin x)\) is \(x\), we find that \(\cos(\arcsin x)\) is \((1-x^2)^{\frac{1}{2}}\), and the reciprocal of that is the derivative of \(\arcsin x\).

\[ (\arcsin x)' = \frac{1}{\cos(\arcsin x)} = \frac{1}{(1-(\sin(\arcsin x))^2)^{\frac{1}{2}}} = (1-x^2)^{-\frac{1}{2}} \]
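The rule is easy to try out numerically. In this sketch the function `inverse_derivative` is a made-up name; it takes the derivative of \(f\) and the inverse of \(f\) (here supplied by Python's `math` module) and returns the reciprocal of the derivative of \(f\), evaluated at the value of the inverse.

```python
import math

def inverse_derivative(f_prime, f_inverse, x):
    """Derivative of the inverse at x: 1 / f'(f_inverse(x))."""
    return 1.0 / f_prime(f_inverse(x))

x = 0.3
# Arcsine: f is the sine, so f' is the cosine.
print(inverse_derivative(math.cos, math.asin, x), (1 - x * x) ** -0.5)

# Natural logarithm: f is the exponent function, which is its own derivative.
print(inverse_derivative(math.exp, math.log, 2.5), 1 / 2.5)
```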

Similarly the derivative of \(x^k\) is \(kx^{k-1}\). This tells us that the derivative of \(x^{\frac{1}{k}}\) is the reciprocal of that evaluated at argument \(x^{\frac{1}{k}}\). This is

\[\frac{1}{k}(x^{\frac{1}{k}})^{1-k} = \frac{1}{k}x^{\frac{1}{k}-1}\]

This is exactly the same result, \((x^a)' = ax^{a-1}\), that holds for integer powers.

In fact for any rational power, \(a\), positive or negative, we have

\[(x^a)' = ax^{a-1}\]

We have already mentioned **another piece of good news about inverse functions.** Even though there is no obvious way to compute a particular value of one, at a particular argument, there is an easy way to compute the value of \(f^{-1}(x)\) with a spreadsheet that you can actually perform in about a minute, once you know how, assuming you know how to compute \(f\). All you have to do is reverse the order of the \(x\) and \(f(x)\) columns in doing an \(xy\) scatter chart. By doing this you can see that the result gives a "multiple valued function" rather than an ordinary function, and can pick out your favorite single valued range for the inverse.

**Exercises:**

**8.3 Use the fact proven above, \((x^{\frac{1}{k}})'= \frac{1}{k}x^{\frac{1}{k}-1}\) to find \((x^{\frac{j}{k}})'\). (you can use the multiple occurrence rule, or the product rule)**

**8.4 The tangent of an angle \(z\), denoted as \(\tan z\), is the ratio given by the sine divided by the cosine: \(\tan z = \frac{\sin z}{\cos z}\). What is the derivative of \(\tan z\)? From it find the derivative of \(\arctan z\) (called the arctangent of \(z\)), the inverse function to \(\tan z\), (when the domain of \(\tan z\) has been restricted to be from \(-\frac{\pi}{2}\) to \(\frac{\pi}{2}\).)**

The rules for differentiation discussed so far allow us to find formulae for derivatives of most functions you will encounter.

However there actually are other rules for differentiating that we will eventually discuss. We have not done so here because defining the functions to be differentiated involves concepts we have not yet discussed.

In particular, functions whose derivatives we have not yet considered are: **infinite sums**, and also **areas between a function's graph and the x-axis between two given \(x\) values.** Such latter things are called **definite integrals. If the given x values are both finite and the function is bounded from above and below, it is called a proper integral.**

The rules are quite simple when they work, so the only interesting thing about them is determining when they work.

**For an infinite sum you can apply the sum rule and just sum the derivatives of all the terms to get the derivative of the sum,** unless the sum becomes infinite in some way.

**The same is true for differentiating a proper integral of a function with respect to a parameter that appears in that function.** You can just differentiate the function with respect to that parameter and find the integral of the result, if it makes sense and there are no infinities lurking in the problem.

**You can also differentiate an integral with respect to a variable that is one of the endpoints of the area defining it.** The answer then, for the upper endpoint, is **the integrand, which is the function defining the integral, itself, evaluated there**, as we shall see soon. By the way, this statement is one direction of the Fundamental Theorem of Calculus, and it is very easy to prove, as we shall soon see. Here is the statement:

\[\frac{d}{dx} \int_{y=z}^{y=x} f(y) dy = f(x)\]
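If you would like to see this statement at work before it is proved, here is a rough numerical check in Python. It is only a sketch under assumptions of ours: the area is estimated by a midpoint sum, the integrand is \(\cos\), and the names `integral` and `area_slope` are made up for this illustration.

```python
import math

def integral(f, a, b, n=20000):
    """Midpoint-sum estimate of the area under f between a and b."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def area_slope(f, z, x, d=1e-3):
    """Difference quotient of the area with respect to its upper endpoint x."""
    return (integral(f, z, x + d) - integral(f, z, x - d)) / (2 * d)

z, x = 0.0, 1.2
slope = area_slope(math.cos, z, x)
# The slope of the area and the integrand at x should nearly agree.
print(slope, math.cos(x))
```

Changing the lower endpoint `z` should leave the answer essentially unchanged, since the right hand side of the statement does not involve \(z\) at all.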

**Exercise 8.5 Plot \(\tan x\) and \(\text{atan}\,x\) using the applet that takes inverses.**

**I am getting tired of this stuff.**

Well, we really are done with what is traditionally taught about how to differentiate functions. The multiple occurrence rule, the chain rule and the inverse rule tell us how to differentiate anything we can construct, starting from the three functions \(x\), \(\exp x\), and \(\sin x\), whose derivatives we know. It is easy to make mistakes when you find derivatives, so it is wise to have a way to check your answers.

**How?**

An easy way is to compare them to the results of differentiating numerically, which we next describe.

We discuss how you can numerically differentiate a function with high accuracy with little effort. One setup can allow you to do so for any function you can enter by doing so once, and doing some copying. We then indicate how one can estimate the derivative of your function at, say, a hundred points and graph it on your spreadsheet, and also how to test the accuracy of your estimates.

**How can we find a good approximation to the derivative of a function?**

The obvious approach is to pick a very small \(d\) and calculate \(\frac{f(x+d)-f(x)}{d}\), which looks like the definition of the derivative. Actually, this is not a great idea.

**Why?**

The problem is that your means of calculating are not infinitely accurate, particularly if \(f(x)\) or \(f(x+d)\) are irrational numbers. This implies that there will sometimes be small errors in your evaluations. \(f(x+d)\) and \(f(x)\) will differ from each other by something like \(f'(x)d\) when \(d\) is small, but your calculation errors in them will be roughly independent of \(d\). As a result, as you make \(d\) smaller, the ratio of your error to \(f'(x)d\) grows. Dividing the result by a very small \(d\) is the same as multiplying it by a very large \(\frac{1}{d}\), and that amplifies the error. When \(d\) becomes smaller than the size of your calculational errors, the estimate you get for the derivative will be mostly calculational error and will tell you very little about \(f'(x)\). The kind of error that arises in this way is usually called round-off error.
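You can watch round-off error take over with a few lines of Python. This is a sketch, not part of the spreadsheet setup: it applies the obvious estimate to \(\sin\) at \(x = 1\), where the true derivative is \(\cos(1)\), and the name `forward_difference` is ours.

```python
import math

def forward_difference(f, x, d):
    """The 'obvious' estimate (f(x+d) - f(x)) / d."""
    return (f(x + d) - f(x)) / d

x = 1.0
exact = math.cos(x)  # the true derivative of sin at x

errors = {}
for k in range(1, 17):
    d = 10.0 ** (-k)
    errors[k] = abs(forward_difference(math.sin, x, d) - exact)
    print(f"d = 1e-{k:02d}   error = {errors[k]:.2e}")
```

With ordinary double precision arithmetic the error shrinks at first as \(d\) shrinks, then grows again, and for the smallest \(d\) the computer cannot even tell \(x + d\) from \(x\).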

The upshot of this is that you really want to compute the derivative using only relatively large values of \(d\).

**Is this possible?**

The answer is yes! And doing it is lots of fun.

**How?**

Here is the basic idea: Suppose your function \(f\) is not only differentiable, but its derivative is also differentiable at argument \(x\), and so is its derivative, etc.

If so, the value of \(f(x+d)\) can be described by a power series,

\[f(x+d) = f(x) + f'(x)d + f''(x)\frac{d^2}{2} + f^{(3)}(x)\frac{d^3}{3!} + ...\]

(Here \(f^{(j)}(x_0)\) means the \(j^{th}\) derivative of \(f(x)\) evaluated at \(x = x_0\). To prove this, calculate the derivatives of both sides with respect to \(d\) at \(d = 0\), and the second derivatives, etc.)

We want \(f'(x)\), so we want to get rid of the \(d^2\) and further terms on the right.

If we form \(\frac{f(x+d)-f(x)}{d}\) we will get \(f'(x) + f''(x)\frac{d}{2} + f^{(3)}(x)\frac{d^2}{3!} + ...\) and there is an error term of \(f''(x)\frac{d}{2}\) that is proportional to \(d\).

On the other hand, if we instead form **\(\frac{f(x+d)-f(x-d)}{2d}\)** all the terms in the series above that have even powers of \(d\) disappear, and we get \(f'(x) + f^{(3)}(x)\frac{d^2}{3!} + f^{(5)}(x)\frac{d^4}{5!} + ...\) The error term in this expression is proportional to \(d^2\).
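A quick way to see the two error behaviors is to halve \(d\) and look at how much the error shrinks. The Python sketch below does this for \(\sin\) at \(x = 1\); the names `forward`, `symmetric` and `error_ratio` are ours, chosen only for this illustration.

```python
import math

def forward(f, x, d):
    return (f(x + d) - f(x)) / d

def symmetric(f, x, d):
    return (f(x + d) - f(x - d)) / (2 * d)

x = 1.0
exact = math.cos(x)  # true derivative of sin at x

def error_ratio(rule, d):
    """How much the error shrinks when d is halved."""
    return abs(rule(math.sin, x, d) - exact) / abs(rule(math.sin, x, d / 2) - exact)

for d in (0.1, 0.05, 0.025):
    print(f"d = {d}: forward ratio = {error_ratio(forward, d):.3f}, "
          f"symmetric ratio = {error_ratio(symmetric, d):.3f}")
```

An error proportional to \(d\) should shrink by a factor near \(2\) when \(d\) is halved, and an error proportional to \(d^2\) by a factor near \(4\).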

This is already a big improvement over the obvious estimate. Here the error decreases as \(d^2\) rather than as \(d\) as you reduce \(d\). The wonderful thing is, we can do even better, by eliminating the \(f^{(3)}\) term, and then the \(f^{(5)}\) term, and so on, as far as we want to go.

**How can we do that?**

Well, you can combine your evaluation of \(\frac{f(x+d)-f(x-d)}{2d}\) with one of \(\frac{f(x+2d)-f(x-2d)}{4d}\). The second of these will have the same \(f'(x)\) term in the power series, but \(4\) times more of the \(d^2\) term. Thus if we form \(4\) times the first of these and subtract the second, we will end up with three times \(f'(x)\), no \(d^2\) term at all, and only correction terms of order \(d^4\) and higher.

Thus if we define \(C(d)\) to be **\(\frac{f(x+d)-f(x-d)}{2d}\),** the combination \(\frac{4C(d)-C(2d)}{3}\) will yield \(f'(x)\) plus an error coming from the fifth derivative term in the original expansion rather than the third, and that error term will be proportional to \(d^4\) in our computation (plus terms proportional to \(d^6\), \(d^8\), etc.).

Call this combination \(D(d)\); then similarly, \(\frac{16D(d)-D(2d)}{15}\) (call this \(E(d)\)) will yield \(f'(x)\) plus an error that comes from the seventh derivative, and is proportional to \(d^6\). And you could go on to form \(64\) times this \(E(d)\) minus its value for twice \(d\), all divided by \(63\), to get an expression whose error will be proportional to \(d^8\).

What this means is that dividing \(d\) by \(2\) will reduce the error in \(E(d)\) by a factor of \(2^6\), which is \(64\), and the error in this last estimate by a factor of \(2^8\), which is \(256\).
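The three estimates translate directly into code. Here is a minimal Python sketch, with the function names `C`, `D` and `E` taken from the text and \(\sin\) at \(x = 1\) chosen by us as a test case.

```python
import math

def C(f, x, d):
    """Symmetric difference quotient; error proportional to d^2."""
    return (f(x + d) - f(x - d)) / (2 * d)

def D(f, x, d):
    """Removes the d^2 term; error proportional to d^4."""
    return (4 * C(f, x, d) - C(f, x, 2 * d)) / 3

def E(f, x, d):
    """Removes the d^4 term as well; error proportional to d^6."""
    return (16 * D(f, x, d) - D(f, x, 2 * d)) / 15

x = 1.0
exact = math.cos(x)  # true derivative of sin at x
d = 0.1
for name, rule in (("C", C), ("D", D), ("E", E)):
    print(name, abs(rule(math.sin, x, d) - exact))
```

Even with the rather large value \(d = 0.1\), each step of the ladder should cut the error by a couple of orders of magnitude.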

**This looks like a mess.**

But it isn't. It is very easy to do all this on a spreadsheet, and you can see what happens to each of the estimates above as you reduce \(d\) by a factor of \(2\) successively, for any function you can write down, and any argument \(x\).

Not only that, you can change the argument by changing only one entry, and change the function by changing only one entry and doing some copying.

**OK, how?**

We will set this up using the function \(\tan(x)\) to be specific.

**Preliminaries:**

*1. Put Calculating \(f'(x)\) in A1*

*2. Put the name of your function in A2*

*3. Put dstart in A3 and your starting value for \(d\) in B3 (I put \(1\) in B3)*

*4. Put the letter \(x\) in A4 and your value of \(x\) in B4 (I put \(1\) there also)*

*5. Label columns in row 5 as follows: in A5 put \(d\), in B5 \(x\), in C5 \(x+d\), in D5 \(x-d\), in E5 \(f(x)\), F5 \(f(x+d)\), G5 \(f(x-d)\), H5 \(C(d)\), in I5 \(D(d)\), in J5 \(E(d)\)*

**Setup**

*Now in A6 enter =B3, in B6 enter =B$4, in C6 enter =A6+B6, in D6 =B6-A6, in E6 enter =f(B6). For example =tan(B6)*

*Then copy E6 to F6 and G6. In H6 enter =(F6-G6)/2/A6*

*Copy B6 through H6 down their columns to row 50*

*Now enter in A7 =A6/2 and copy A7 down to row 50. In I7 enter =(4*H7-H6)/3 and copy I7 down to row 50*

*Finally in J8 enter =(16*I8-I7)/15 and copy J8 down to row 50.*

*What is all this?*

Column A will contain the difference d used in your computation. It will divide by \(2\) in going from one line to the next. Column B contains the value of \(x\) at which you are finding the derivative, C and D contain \(x+d\) and \(x-d\). E contains \(f(x)\) and F and G contain \(f(x+d)\) and \(f(x-d)\). H contains the estimate \(\frac{f(x+d)-f(x-d)}{2d}\), column I contains the improvement obtainable by taking four times the H estimate, subtracting the similar estimate for \(d\) replaced by \(2d\), and dividing this difference by \(3\). J contains the improvement obtained by similarly subtracting the \(2d\) estimate in I from \(16\) times the \(d\) estimate there, and dividing by \(15\).
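If you would rather check the spreadsheet against a program, the same table can be built row by row in Python. This is a sketch that mirrors columns A, H, I and J only; the function name `derivative_table` and the choice of 12 rows are ours. Just as on the spreadsheet, column I has no entry in the first row and column J has none in the first two.

```python
import math

def derivative_table(f, x, d_start=1.0, rows=12):
    """Rows of (d, C, D, E), halving d each row, like columns A, H, I, J."""
    table = []
    d = d_start
    prev_c = prev_d = None
    for _ in range(rows):
        c = (f(x + d) - f(x - d)) / (2 * d)                      # column H
        dd = None if prev_c is None else (4 * c - prev_c) / 3    # column I
        e = None if dd is None or prev_d is None else (16 * dd - prev_d) / 15  # column J
        table.append((d, c, dd, e))
        prev_c, prev_d = c, dd
        d /= 2
    return table

for d, c, dd, e in derivative_table(math.tan, 1.0):
    cells = [f"{d:.10f}", f"{c:.10f}"]
    cells += ["" if v is None else f"{v:.10f}" for v in (dd, e)]
    print("  ".join(cells))
```

Each row reuses the estimates of the row above it, which is exactly what the relative cell references do when you copy I7 and J8 down their columns.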

**Here is the result for the function \(\tan(x)\) at \(x = 1\).**

(The interactive table is not reproduced here. It was set to show 25 rows with 10 digits after the decimal point.)

Notice that the estimate \(E(d)\), in column J, is accurate to \(9\) places when \(d\) is around \(\frac{1}{100}\).

To change \(x\) you need only put the value you want in B4. To change functions you enter the new function in E6 with variable B6, copy to F6 and G6, and copy columns E, F, and G down to the \(50^{th}\) row. The other columns need not be changed at all.

**Exercises.**

9.1 Set this up yourself. What is the value of d when the difference between E and F reaches the ten digit accuracy?

9.2 Try finding the derivative of \(\tan(x)\) at \(x = 1\) using \(\frac{f(x+d)- f(x)}{d}\) instead of E as above. For what \(d\) value do you reach the correct answer to ten digit accuracy?

9.3 Find a function and value for which H above does not get the answer to ten digit accuracy.

The spreadsheet construction above gives the user the ability to find the derivative of a function at one specific argument. We want to do the same thing at many different arguments, which can be turned into a chart or graph of the derivative function.

This can be accomplished by picking a single value of \(d\), putting the structure described in Section 9.1 all on one row, and copying that row down. Now each row will correspond to an argument \(x\) that increases by a fixed step \(q\) from that of the previous row. If we compute both D and E we can compare them. The difference between what is in D and E is a measure of how bad the D estimate is. If it is too big for what we want we can reduce our \(d\) until we like the result.

**OK, how?**

Here is an outline of how to do this. It consists of a list of columns and what to put in them.

Suppose you want to graph the value and derivative of a function, say \(\sin(x)\) from \(x = 0\) to \(x = 5\).

You will probably want to put the information: Graphing \(f(x)\) and \(f'(x)\) in A1 and \(\sin(x)\) in A2. In A3 put: starting argument; and in B3 enter \(0\), in A4 enter ending argument; and in B4 enter \(5\). In A5 enter number of arguments; and in B5 put your favorite number, say \(100\). In C5 put =(B4-B3)/B5; In A6 enter \(d\) and in B6 enter \(0.01\). Also make the following entries: A7 \(0\), B7 \(1\), C7 \(2\), D7 \(4\), E7 \(-1\), F7 \(-2\), G7 \(-4\).

The idea is to put entries in columns as follows: The only entries you really need enter are two rows of column A, and one of each of columns H, O, R, and T; the rest is copying. Changing parameters only involves changing the data entered in the paragraph above. Changing the function only involves changing the data entry in column H and copying that into columns I through N and down \(100\) rows. (The factors of \(\frac{1}{2}\) in columns R, S, and T come from the fact that columns P and Q are convenient for copying, but are twice and four times the approximate derivatives.)

In A9, enter x

In B9, x+d

In C9, x+2d

In D9, x+4d

In E9, x-d

In F9, x-2d

In G9, x-4d

In H9, sin(x)

In I9, sin(x+d)

In J9, sin(x+2d)

In K9, sin(x+4d)

In L9, sin(x-d)

In M9, sin(x-2d)

In N9, sin(x-4d)

In O9, (sin(x+d)-sin(x-d))/(2d) which is the \(d\) approximation to the derivative

In P9, (sin(x+2d)-sin(x-2d))/(2d) which is \(2\) times the \(2d\) approximation

In Q9, (sin(x+4d)-sin(x-4d))/(2d) which is \(4\) times the \(4d\) approximation

In R9, (4O-P/2)/3 which is the estimate with error proportional to \(d^4\)

In S9, (4P-Q/2)/3 which is \(2\) times the estimate with error proportional to \(d^4\)

In T9, (16R-S/2)/15 which is the estimate with error proportional to \(d^6\)

In U9, A x data

In V9, H f(x) data

In W9, T f'(x) data

In X9, T-R accuracy check, if this number is small, error is small

Columns U,V, W and X are used for graphing our functions. If the maximum value in the X column is unacceptably large, \(d\) should be reduced.

Here are the entries that need be entered. Suppose we start in row 10 (remember to have A7 =0, B7 =1, C7=2, D7=4, E7=-1, F7=-2, G7=-4).

A10 =$B$3+A$7*$B$6

A11 =A10+$C$5

Copy A11 down column A until the ending argument in B4 is reached

Copy A10 to B10, … G10, and A11 to B11, … G11

Copy B11 through G11 down those columns as far as you have copied column A

H10 =sin(A10) copy across to I10, J10, K10, L10, M10, and N10

O10 =(I10-L10)/$B$6/2 copy to P10 and Q10

R10 =(4*O10-P10/2)/3 copy to S10

T10 =(16*R10 - S10/2)/15

The following are repetitions of previous defined columns done to make scatter plots:

U10 =A10 which is \(x\)

V10 =H10 which is \(f(x)\)

W10 =T10 which is the estimate of derivative of \(f(x)\)

X10 =T10-R10 which is the improvement in estimate from using T instead of R

Now copy row 10 from column H through X down as far as column A goes.

Make an \(xy\) scatter chart from the insert chart menu of the last 4 columns.

The parameters entered in B3-B6 can be changed there. The function can be changed in H10 and copied as above to I10 through N10 and down those columns.

If you have calculated the derivative of \(f\) you can create a column for it as well and see if the plot is (or values are) any different from the numerical derivative.
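The same layout can be written as a short Python function, which may help you check your spreadsheet entries. This is a sketch under our own naming: `derivative_columns` is hypothetical, the local names `O` through `T` echo the spreadsheet columns, and the returned tuples play the role of columns U through X. Plotting is left to whatever charting tool you prefer.

```python
import math

def derivative_columns(f, start, end, count, d):
    """Rows of (x, f(x), estimate of f'(x), accuracy check T - R)."""
    q = (end - start) / count                            # step between arguments, as in C5
    rows = []
    for i in range(count + 1):
        x = start + i * q
        O = (f(x + d) - f(x - d)) / (2 * d)              # the d approximation
        P = (f(x + 2 * d) - f(x - 2 * d)) / (2 * d)      # 2 times the 2d approximation
        Q = (f(x + 4 * d) - f(x - 4 * d)) / (2 * d)      # 4 times the 4d approximation
        R = (4 * O - P / 2) / 3                          # error proportional to d^4
        S = (4 * P - Q / 2) / 3                          # 2 times the d^4 estimate at 2d
        T = (16 * R - S / 2) / 15                        # error proportional to d^6
        rows.append((x, f(x), T, T - R))
    return rows

rows = derivative_columns(math.sin, 0.0, 5.0, 100, 0.01)
print("largest accuracy check:", max(abs(r[3]) for r in rows))
print("largest error against cos(x):", max(abs(r[2] - math.cos(r[0])) for r in rows))
```

The second printed number compares the estimate with the derivative we already know, which is the check suggested in the paragraph above.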

**Here is the result for the function \(\sin(x)\) from \(x = 0\) to \(x = 5\).**

(The interactive chart is not reproduced here. It was set to use 100 increments with 5 digits after the decimal point.)

**Exercises:**

Set this up and apply it to the function \(\tan x\) from \(x = 0\) to \(x = 1.5\). What happens if you make the upper limit \(1.6\)?

**Can we differentiate any function anywhere?**

Differentiation can only be applied to functions whose graphs look like straight lines in the vicinity of the point at which you want to differentiate. After all, differentiating is finding the slope of the line it looks like (the tangent line to the function we are considering). No tangent line means no derivative.

Also when the tangent line is straight vertical the derivative would be infinite and that is not good either.

**How and when does non-differentiability happen [at argument \(x\)]?**

Here are some ways:

1. The function jumps at \(x\), (is not continuous) like what happens at a step on a flight of stairs.

2. The function's graph has a kink, like the letter V has. The absolute value function, which is \(x\) when \(x\) is positive and \(-x\) when \(x\) is negative has a kink at \(x = 0\).

3. The function is unbounded and goes to infinity. The functions \(\frac{1}{x}\) and \(x ^{-2}\) do this at \(x = 0\). Notice that at the particular argument \(x = 0\), you have to divide by \(0\) to form this function, and dividing by \(0\) is not an acceptable operation, as we noted somewhere.

4. The function is totally bizarre: consider a function that is \(1\) for irrational numbers and \(0\) for rational numbers. This is bizarre.

5. The function can't be defined at argument \(x\). When we are talking about real functions the square root cannot be defined for negative \(x\) arguments.

6. The function can be defined and finite but its derivative can be infinite. An example is \(x^{1/3}\) at \(x = 0\).

7. The function can be defined and nice, but it can wiggle so much as to have no derivative. Try to differentiate \(\sin\left(\frac{1}{x}\right)\) at \(x = 0\). This kind of behavior is called **an Essential Singularity at \(x = 0\).**

These are the only kinds of non-differentiable behavior you will encounter for functions you can describe by a formula, and you probably will not encounter many of these.
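Two of these cases are easy to probe numerically by looking at the difference quotient from each side separately. The Python sketch below does this for the kink of the absolute value function and for \(x^{1/3}\) at \(x = 0\); the helper names are ours, and `cube_root` is written out by hand so that it accepts negative arguments.

```python
import math

def right_quotient(f, x, d):
    return (f(x + d) - f(x)) / d

def left_quotient(f, x, d):
    return (f(x) - f(x - d)) / d

def cube_root(x):
    """Real cube root, valid for negative arguments as well."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

for d in (1e-1, 1e-3, 1e-5):
    print(f"d = {d:g}")
    print("  |x| at 0:     left", left_quotient(abs, 0.0, d),
          " right", right_quotient(abs, 0.0, d))
    print("  x^(1/3) at 0: left", left_quotient(cube_root, 0.0, d),
          " right", right_quotient(cube_root, 0.0, d))
```

At the kink the two sides never agree, however small \(d\) is, and for the cube root both quotients keep growing as \(d\) shrinks. Note that the symmetric estimate \(\frac{f(x+d)-f(x-d)}{2d}\) gives \(0\) for the absolute value at \(0\), so a numerical estimate alone will not warn you that the derivative does not exist.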

Now you have seen almost everything there is to say about differentiating functions of one variable. There is a little bit more; namely, what goes on when you want to find the derivative of functions defined using power series, or using the inverse operation to differentiating. We will get to them later.

We next want to study how to apply this, and then how to invert the operation of differentiation.

This chapter contains some random comments and a summary of the rules for algebraic differentiation, for students.

**10.11 Where are we?**

**Numbers are numbers. Read the section on them again.**

**Functions are sets of (argument, value) pairs of numbers.** They are often described by formulae which tell us how to compute the value from the argument. Only one value is allowed for each argument. The formulae you will usually encounter start with the identity function, the exponential function and the sine function, and are defined by applying arithmetic operations, substitution and inversion in some manner to them.

**The derivative of a function at any argument is the slope of the straight line it resembles near that argument, if that slope is finite.** The straight line it resembles near that argument is called **the tangent line to the function at that argument** and the function describing that line is called the **linear approximation to the function at that argument**. If the function does not look like a straight line near an argument, (has a kink or a jump or crazy behavior there) it is not differentiable at that argument.

There are straightforward rules for calculating derivatives of the identity, sine and exponential functions, and for computing derivatives of combinations of these obtained by applying arithmetic operations, substitution and inversion in some manner to them.

Thus we have means to obtain formulae for the derivative of all functions of the kind described above. The rules appear below. If you are not comfortable with them, practice!

Armed with a spreadsheet, you can plot functions and determine their derivatives with great accuracy, most of the time, with little effort.

**What else should I know at this point?**

First, you should feel comfortable with calculating or computing derivatives numerically.

So far, all we have said about the exponential function is that its value at argument \(0\) is \(1\), and that it is its own derivative everywhere. And the sine function is \(0\) at argument \(0\) and has as its derivative the cosine, which is the sine of the complementary argument.

You would be well advised to review the properties of the sine and the other trigonometric functions and the exponential. These are described in section T.

**OK, what can we do with this?**

The two major applications of differentiation are modeling phenomena, and solving equations.

**Do I really expect to do these things?**

You cannot ever be called on to do either of these things if you have no idea how to do them. Similarly you will only rarely be asked to cross a road if you never learned how to walk. Once you know about these things, all sorts of possibilities open up that you can begin to handle.

Once a model of a phenomenon has been constructed, you want to be able to deduce the consequences of the model. This involves getting back from derivatives or equations involving derivatives to the functions whose derivatives they are.

The process of going from a derivative back to a function is sometimes (rarely) called **anti-differentiation**, and usually called **integration** or **quadrature** (also a rare name). Going from an equation involving derivatives to the original function is called **solving** (or integrating) **a differential equation.**

In the next section we will describe a way to use differentiation to solve non-linear equations involving one variable, and other methods for doing so as well. Then we will discuss integration and you will learn how to do it, where possible, both numerically and by formula. We will then give examples of use of derivatives in modeling real world situations. Finally we will examine how to solve differential equations numerically, and so discover the implications of such models.

**Is this all I have to know about calculus?**

The answer depends on your goals.

If you seek only a qualitative notion of what calculus is about, you can quit when you are satisfied that you have one. At this point we have only discussed differentiation. The inverse operation to taking the derivative of a function is of equal interest and is yet to come.

If your goal is to understand the language of science, in which models of change appear everywhere, this is a good start but there is more, in two directions.

First, we live in a world in which it takes three numbers to describe the location of a point in space; six numbers to describe the location of two points, and so on; and people often want to model motion in space. Thus we need to be able to examine change when we are dealing with several or many variables at a time. So we need to be able to extend the notion of differentiation to the analog of functions which depend on more than one variable. Doing this means extending the notion of derivative to sets of argument-value pairs for which the arguments and/or the values are sequences of numbers rather than single numbers. The study of such things is called Multi-Variable Calculus.

Fortunately it is possible to make the desired extension in a way which allows you to exploit your ability to differentiate in one dimension to get results in higher dimensions. You have to learn some new concepts but the work of differentiating is the same. This subject largely consists of the introduction of new multi-dimensional concepts, and description of how they can be calculated or computed by the techniques of one dimensional calculus.

Second, there is a large amount of lore about differential equations that has developed over the years as people have studied equations that arise in real world applications. In the past, numerical methods, like those you can now apply, were completely impractical, and special methods were found to solve many classes of equations. These methods were also valuable for allowing people to get an idea of the solutions of more complicated equations without actually solving them.

The fact that these methods are adequate for solving very important problems in a number of fields, and that they provide intuition about many other equations means that they are still of interest and worth studying today.

Perhaps the first goal that is well worth your pursuing is to gain the possibility of understanding scientific literature. Papers in science and engineering use notions and notations of derivatives and integrals incessantly, and if these buffalo you, you can get nowhere with reading the literature. Once you are comfortable with the concepts of calculus and their notations, this difficulty disappears.

Enough vague nonsense!

**10.12 Algebraic Rules For Differentiation.** (And how to deduce them)

**Facts 0:** The derivative of a straight line function \(ax\) is the slope of the line it represents, which is \(a\). A constant function has derivative \(0\). This means that, in the original formula, the \(x\) is replaced by a \(1\) and any constant term is omitted. By definition we have \(\frac{de^x}{dx} = e^x\) and we have \(\frac{d\sin(x)}{dx} = \cos(x) = \sin(x + \frac{\pi}{2})\).

**Basic Rule 1:** To compute the derivative of a function having several occurrences of the variable (let it be \(x\)), take the derivative contribution from each occurrence separately, treating the others as constant, and add all these up.

Consequences:

**Sum Rule:** \(\frac{d(f(x)+g(x))}{dx} = \frac{df(x)}{dx} + \frac{dg(x)}{dx}\)

**Product Rule:** \(\frac{d(f(x)g(x))}{dx} = \left(\frac{df(x)}{dx}\right)g(x) + f(x)\left(\frac{dg(x)}{dx}\right)\)

**Power Rule:** \(\frac{d(x^k)}{dx} = kx^{k-1}\) (\(k\) different \(x\)'s are replaced by \(1\) separately and summed)

**Quotient Rule:** \(\frac{d(1/f(x))}{dx} = -\frac{df(x)/dx}{f(x)^2}\). (differentiate both sides of the equation \(f(x)\left(\frac{1}{f(x)}\right) = 1\).)
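Since it is easy to make mistakes with these rules, here is a small Python sketch of the kind of check recommended earlier: compare what a rule predicts with a numerical estimate. The choices \(f = \sin\), \(g = \exp\), \(k = 5\) and \(x = 0.7\) are ours, and `central` is just the symmetric difference quotient.

```python
import math

def central(f, x, d=1e-5):
    """Symmetric difference estimate of the derivative of f at x."""
    return (f(x + d) - f(x - d)) / (2 * d)

x = 0.7
f, fprime = math.sin, math.cos
g, gprime = math.exp, math.exp

# Product rule: (f g)' = f' g + f g'
product_numeric = central(lambda t: f(t) * g(t), x)
product_rule = fprime(x) * g(x) + f(x) * gprime(x)

# Power rule with k = 5: (x^5)' = 5 x^4
power_numeric = central(lambda t: t ** 5, x)
power_rule = 5 * x ** 4

# Quotient rule as stated above: (1/f)' = -f' / f^2
reciprocal_numeric = central(lambda t: 1 / f(t), x)
reciprocal_rule = -fprime(x) / f(x) ** 2

print(product_numeric, product_rule)
print(power_numeric, power_rule)
print(reciprocal_numeric, reciprocal_rule)
```

Each printed pair should agree to many digits. If a pair of yours does not, the formula you typed is the first suspect.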

**Basic Rule 2:** The derivative of a function of a function, \(f(g(x))\), is the product of \(\frac{df}{dg}\) evaluated at \(g = g(x)\) and \(\frac{dg(x)}{dx}\). This is called the **chain rule**, and it follows directly from the definition of the derivative when expressed as a ratio of changes.

Consequences:

**The Inverse Rule.** The inverse is defined by: If \(y = f(x)\) then \(x = f^{-1}(y)\),

Since \(\frac{dy}{dx} = f'(x)\), \(\frac{dx}{dy} = \frac{1}{f'(x)}\), which means (after switching variable names) \(\frac{df^{-1}(x)}{dx} = \frac{1}{f'(y)}\), evaluated at \(y = f^{-1}(x)\).
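As a concrete instance, take \(f = \exp\), whose inverse is the natural logarithm. The rule then says the derivative of the logarithm at \(x\) is \(\frac{1}{\exp(y)}\) at \(y = \ln x\), which is \(\frac{1}{x}\). The Python sketch below compares this with a numerical estimate; the example and the helper `central` are our choices.

```python
import math

def central(f, x, d=1e-5):
    """Symmetric difference estimate of the derivative of f at x."""
    return (f(x + d) - f(x - d)) / (2 * d)

x = 2.0
y = math.log(x)                         # y = f^{-1}(x) for f = exp
inverse_rule = 1 / math.exp(y)          # 1 / f'(y), using f' = exp
numeric = central(math.log, x)          # direct estimate of the inverse's derivative

print(inverse_rule, numeric, 1 / x)
```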

**The Fundamental Theorem: The derivative of a definite integral with respect to its upper limit is the integrand evaluated there.**

If you are comfortable with these facts, are not cowed by numerical computation, and make efforts to study your mistakes so that you have hope of not making them again, you are where you want to be concerning differential calculus in one dimension.

**Exercise: Imagine you are teaching a course in calculus. Make a list of 10 questions that you would find hardest to answer with regard to the material in the first 9 chapters. I believe that making up questions is a more challenging endeavor than is answering them.**

Newton’s method for solving equations is described along with directions for easy implementation on a spreadsheet. There is also an applet that allows you to enter an equation and apply the method from slider determined starting points.

If we have a linear equation, such as \(5x - 3 = 0\), there is a straightforward procedure for solving it. You apply "the golden rule of equations": do unto the left side exactly what you do unto the right side. And you do it until all you have on the left is \(x\).

Thus with this example you would add \(3\) to both sides, getting rid of the \(-3\) on the left, and then divide by \(5\), with the result: \(x = \frac{3}{5}\).

Suppose however, we have a more complicated equation, such as

\[\sin(x) - \exp(x) + 2 = 0\]

Our task here is to find a solution, or all the solutions, of such an equation. **We are assuming that the functions in our equation are continuous and differentiable in the domain of interest to us.**

First note that it is always a good idea to plot the left hand side here and observe, crudely, where it changes sign or comes very near to \(0\). This will tell you roughly where it becomes \(0\).

In the old days this was an extremely tedious task, in general, and people tried to solve equations without plotting, which is a bit like flying blind. It’s OK if you can do it, but why try if you don't have to do so?

There is a standard technique for solving such equations that apparently goes back to Newton. And here it is.

You start with a guess for the solution you seek, picking an argument, call it \(x_0\). You then find the linear approximation to your function, \(f\), at argument \(x_0\), and solve the equation that states that this linear approximation is \(0\). Call the argument for which the linear approximation is \(0\), \(x_1\).

Now you do exactly the same thing, starting at \(x_1\): you find the linear approximation to \(f\) at \(x_1\) and solve the equation that this linear approximation is \(0\) to determine \(x_2\). And you continue this as long as you need to.

In the old days this was an extremely tedious thing to do, for any function. Finding \(x_{j+1}\) from \(x_j\) is quite easy, but doing it over and over again is a real bore.

Now with a spreadsheet, you can set this up and find solutions, with practice, in under a minute. You only have to do each step once, and copy.

**How?**

First let's see how to get \(x_{j+1}\) from \(x_j\).

The linear approximation to \(f\) at \(x_j\) is given by

\[Lf_j(x) = f(x_j) + (x-x_j) f'(x_j)\]

If we set this to \(0\) at argument \(x_{j+1}\) we get

\[f(x_j) + (x_{j+1} - x_j) f'(x_j) = 0\]

which has solution, obtained by dividing and subtracting from both sides appropriately

\[x_{j+1} = x_j - \frac{f(x_j)}{f'(x_j)}\]
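In code, one step of the method is a single line. Here is a minimal Python sketch, with the name `newton_step` chosen by us. As a sanity check it is applied to the linear equation \(5x - 3 = 0\) from earlier: a linear function is its own linear approximation, so a single step from any starting guess should land on \(\frac{3}{5}\), up to round-off.

```python
def newton_step(f, fprime, xj):
    """Solve 'linear approximation at xj equals 0' for the next guess."""
    return xj - f(xj) / fprime(xj)

def line(x):
    return 5 * x - 3

def line_slope(x):
    return 5.0

print(newton_step(line, line_slope, 10.0))
print(newton_step(line, line_slope, -4.0))
```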

**So what do I do on a spreadsheet?**

Suppose we put our first guess in box A1. We will put it and subsequent guesses in column A starting say, with A3 (just to leave room for labels).

We can then put \(f\) in column B and \(f'\) in column C.

To do this we need make the following entries:

In A3, enter =A1 (this puts starting guess \(x_0\) in A3)

In B3, =f(A3) (this computes \(f(x_0)\))

In C3, =f'(A3) (this computes \(f'(x_0)\))

In A4, =A3-B3/C3 (this applies the algorithm to get the new guess)

If you now copy A4 (not A3!) and B3 and C3 down the A, B and C columns, you have implemented the algorithm.

You can change your starting guess by changing A1, and change your function by changing B3 and C3 appropriately, and copying the results down.

**Does this really work?**

This method converges very rapidly most of the time. If you start near a \(0\) of \(f\), and are on "the good side" it will always converge. Otherwise it stands a good chance of doing so, but strange things can happen.

**What is the "good side"?**

Suppose you start above the solution, call the solution \(z\), so \(x_0\) is greater than \(z\). Then if \(f\) and the second derivative of \(f\) are both positive between \(z\) and \(x_0\), you are on the good side.

**Why?**

That the second derivative of \(f\) is positive, between \(z\) and \(x_0\), means that the first derivative of \(f\) is increasing between \(z\) and \(x_0\), which means that the slope of \(f\) is biggest, in the range between \(z\) and \(x_0\), right at \(x_0\).

All this means that the linear approximation to \(f\) at \(x_0\) will dive down to \(0\) faster than \(f\) does as you near the solution \(z\), so that \(x_1\) will lie somewhere between \(z\) and \(x_0\). And each successive \(x_j\) will lie between z and the previous one. As we get closer to \(z\), \(f\) will look more and more like a straight line, which will mean it will look more and more like its linear approximation, so you will get closer and closer to \(z\) faster and faster.

**Suppose we want to solve the equation \(\sin x - e^x + 2 = 0\), and we start with \(x = 0.3\) as a guess.** The derivative of the left side is \(\cos x - e^x\).

Our spreadsheet instructions when filled in should look as follows:

In A1, enter 0.3

In A2, enter xj. In B2, f(xj). In C2, f'(xj).

In A3, =A1. In B3, =sin(A3)-exp(A3)+2. In C3, =cos(A3)-exp(A3)

In A4, =A3-B3/C3. In B4, =sin(A4)-exp(A4)+2. In C4, =cos(A4)-exp(A4)

Copy down columns A, B, and C

(The interactive table is not reproduced here. It was set to show 25 steps with 10 digits after the decimal point.)
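The three spreadsheet columns can be reproduced with a short loop. This Python sketch follows the entries above for \(\sin x - e^x + 2 = 0\); the function name `newton_table` is ours, and each row it returns corresponds to one spreadsheet row \((x_j, f(x_j), f'(x_j))\).

```python
import math

def f(x):
    return math.sin(x) - math.exp(x) + 2

def fprime(x):
    return math.cos(x) - math.exp(x)

def newton_table(x0, steps=25):
    """Rows of (xj, f(xj), f'(xj)), like columns A, B and C."""
    rows = []
    x = x0
    for _ in range(steps):
        rows.append((x, f(x), fprime(x)))
        x = x - f(x) / fprime(x)   # same formula as cell A4
    return rows

for xj, fx, fpx in newton_table(0.3):
    print(f"{xj:.10f}  {fx:.10f}  {fpx:.10f}")
```

Notice that the first step jumps well past the solution before the guesses settle down. Changing the argument of `newton_table` plays the role of changing A1.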

What happens when you start at \(5\) instead of \(0.3\)? At \(0\)? At \(10\)?

**Exercises:**

**11.1 Suppose \(f\) is negative at \(x_0\) and \(x_0\) is bigger than \(z\). What condition on \(f''\) between \(z\) and \(x_0\) will mean you are on the good side? What is the condition when \(f\) is positive at \(x_0\) but \(x_0\) is less than \(z\) for you to be on the good side as discussed here?**

**11.2 What will happen if \(f''\) has the wrong sign but the same sign between your guess and \(z\)?**

Still and all, the method can do bizarre things. If \(f' = 0\) at a guess, the iteration won't even make sense because you will divide by \(0\) in it. If \(f'\) is very near \(0\), the new guess will be very far from the old one, and it can zip around weirdly.

The following applet allows you to plot and view the method just by entering the function (which is only slightly simpler than starting from scratch with a spreadsheet).

**Exercises:**

**11.3 What happens if you look for the solution to \(x^{-\frac{1}{3}} - 0.001 = 0\), and you try to use this method directly? How about \(\tan x = 1\)?**

**11.4 Find all solutions to \(\sin (x) - \exp(x) + 3 = 0\) for \(x\) positive, accurate to ten decimal places.**

**Do I have to differentiate \(f\) to apply this algorithm?**

**No! You can choose a value of d that is very small relative to the scale of your function and put =(f(A3+d)-f(A3-d))/(2*d) in C3 instead of =f'(A3).**

This will do just about as well as the regular Newton's method, practically always.
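Here is what that looks like as a Python sketch. The function name `newton_numeric` and the default \(d = 10^{-6}\) are assumptions of ours; the only change from the regular method is that the slope comes from the symmetric difference quotient instead of a formula for \(f'\).

```python
import math

def newton_numeric(f, x0, d=1e-6, steps=25):
    """Newton's method with f' replaced by (f(x+d) - f(x-d)) / (2d)."""
    x = x0
    for _ in range(steps):
        slope = (f(x + d) - f(x - d)) / (2 * d)
        x = x - f(x) / slope
    return x

def f(x):
    return math.sin(x) - math.exp(x) + 2

root = newton_numeric(f, 0.3)
print(root, f(root))
```

A slightly wrong slope only changes how quickly the guesses approach the solution. Where they end up is still decided by \(f\) itself being \(0\).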

**Exercise 11.5 Redo exercise 11.4 with the entry =(f(A3+B$1)-f(A3-B$1))/(2*B$1) in C3. How is your answer affected?**

**What can go wrong?**

For one thing, it is possible that our equation has no real solutions. In that case neither method can ever find one. Plotting \(f(x)\) against \(x\) will confirm this.

Another problem arises when your equation has more than one solution. Then the one you get depends on where you start. The Applet illustrates that.

**In general, if you arrive at a point \(x_j\) at which \(f'(x_j)\) is near zero, while \(f\) is not near zero, \(x_{j+1}\) will be very far away and the successive values of \(x\) could easily zoom around like crazy, even when you were once near the solution you were looking for.**

But the method is fun to use anyway, and you can easily tell when it is not working.

**Divide and Conquer**

There is another method for solving an equation that gets closer to a solution by a factor of \(2\) at each iteration, **if you can find \(2\) arguments at which your function has opposite signs**. You then look at the midpoint between them and replace the endpoint in which the function has the same sign as it has at the midpoint, with the midpoint.

**Exercise: Figure out how to implement this method on a spreadsheet. (hint: you can enter things like =if(D5*F5>0,C5,A5) which gives C5 if D5 and F5 have the same sign and A5 otherwise.)**
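For comparison with whatever spreadsheet you come up with, here is a rough Python sketch of the same halving idea. The name `bisect`, the fixed count of 60 halvings and the test equation \(x^2 - 2 = 0\) are assumptions made for illustration.

```python
def bisect(f, lo, hi, steps=60):
    """Halve [lo, hi] repeatedly; f(lo) and f(hi) must have opposite signs."""
    if f(lo) * f(hi) > 0:
        raise ValueError("f must have opposite signs at the two starting arguments")
    for _ in range(steps):
        mid = (lo + hi) / 2
        # Replace the endpoint at which f has the same sign as it has at the midpoint.
        if f(lo) * f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical test equation: x^2 - 2 = 0 on the interval from 1 to 2.
root = bisect(lambda x: x * x - 2, 1.0, 2.0)
print(root)
```

Each pass through the loop plays the role of one spreadsheet row: the interval that still contains a sign change is half as wide as the one before.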

The anti-derivative is an operation that undoes differentiation. Since the derivative of a constant is 0, the anti-derivative gives an answer that says nothing about constant terms. The rules for differentiation each give rise to information on calculating anti-derivatives.

The antiderivative is the name we sometimes (rarely) give to the operation that goes backward from the derivative of a function to the function itself. Since the derivative does not determine the function completely (you can add any constant to your function and the derivative will be the same), you have to add additional information to go back to an explicit function as anti-derivative.

Thus we sometimes say that the antiderivative of a function is a function plus an arbitrary constant. Thus the antiderivative of \(\cos x\) is \((\sin x) + c\).

The more common name for the antiderivative is the indefinite integral. This is the identical notion, merely a different name for it.

A wavy line is used as a symbol for it. Thus the sentence "the antiderivative of \(\cos x\) is \((\sin x) + c\)" is usually stated as: the indefinite integral of \(\cos x\) is \((\sin x) + c\), and this is generally written as

\[\int \cos x \; dx = (\sin x) + c\]

Actually this is bad notation. The variable \(x\) that occurs on the right is a variable and represents the argument of the sine function. The symbols on the left merely say that the function whose antiderivative we are looking for is the cosine function. You will avoid confusion if you express this using an entirely different symbol (say \(y\)) on the left to denote this. The proper way to write this is then

\[\int \cos y \; dy = (\sin x) + c\]

**Why use this peculiar and ugly notation?**

We do so out of respect for tradition. This is the notation people have used for centuries. We will see why they did so in the next section.

The first question we address is: if you give me a function, say \(g\), and ask me to find its indefinite integral, how do I do it?

The basic answer to this question is: there are no new gimmicks for doing this. You can work backwards from the rules for differentiation, and get some rules for integration, and that is essentially all you can do. But that allows you to integrate (find the antiderivative of) lots of useful functions.

The antiderivative of a sum of several terms is the sum of their antiderivatives. This follows from the fact that the derivative of a sum is the sum of the derivatives of the terms. And similarly, multiplying a function by a constant multiplies its antiderivative by the same constant.

Using these facts we can find the antiderivative of any polynomial.

**How?**

The fact that the derivative of \(x^k\) is \(kx^{k-1}\) is equivalent to the statement that the antiderivative of \(kx^{k-1}\) is \(x^k + c\). Replacing \(k\) by \(k+1\) and dividing by \(k+1\), this means that the antiderivative of \(x^k\) is \(\frac{x^{k+1}}{k+1} +c\) (for \(k \ne -1\)).

**What’s with this \(+c\) stuff?**

It is a reminder that the derivative of a constant is \(0\) so an anti-derivative as an inverse operation to a derivative is not completely determined. You can add any constant to an anti-derivative and get another one. Some believe that it was invented by pedants to torture students by penalizing them for occasionally ignoring this boring fact.

We can apply this to each term in a polynomial, and find its anti-derivative.

Thus, the anti-derivative of

\[3x^3 - 4x^2 - x + 7\]

is

\[\frac{3x^4}{4} - \frac{4x^3}{3} - \frac{x^2}{2} + 7x + c\]

Students typically find this so easy that when they are forced to find such an anti-derivative on a test, often their minds are already focused on the next question, and they absent-mindedly differentiate instead of anti-differentiating one or perhaps all terms. Please avoid this error.
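One way to guard against that slip is to differentiate your answer and see whether you get the original polynomial back. Here is a small Python sketch of that check using exact fractions; the coefficient-list representation and the function names are assumptions made for this illustration.

```python
from fractions import Fraction

def antiderivative(coeffs):
    """coeffs[k] is the coefficient of x^k; the arbitrary constant c is taken as 0."""
    return [Fraction(0)] + [Fraction(a, k + 1) for k, a in enumerate(coeffs)]

def derivative(coeffs):
    """Differentiate a polynomial given in the same coefficient format."""
    return [k * a for k, a in enumerate(coeffs)][1:]

p = [7, -1, -4, 3]         # 7 - x - 4x^2 + 3x^3, the example above
P = antiderivative(p)      # 7x - x^2/2 - 4x^3/3 + 3x^4/4
print(P)
print(derivative(P) == p)  # differentiating the answer recovers p
```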

**Exercises:**

**Find antiderivatives of each of the following functions:**

**12.1 \(x^3 - 3x^2 + 6\)**

**12.2 \(\cos (x)\)**

**12.3 \(\sin (2x)\)**

**12.4 \(\exp (2x)\)**

**12.5 \(x^{-\frac{1}{2}}\)**

**(check your answer by differentiating it.)**

We will now study the area of very irregular figures. In particular, if we have a curve defined by some function, we will consider the (signed) area between that function and the x axis, between specified values of x. Area above the axis counts as positive area, and that below the axis counts as negative area. We will be concerned with the definition, nomenclature and notation for such areas, and with means of evaluating them, both exactly and numerically. We will also relate such areas to anti-derivatives and describe tricks for evaluating integrals, which is what these areas are called.

13.1 Areas: Definition, Names, and Notations

13.2 The Fundamental Theorem of Calculus and Determining Areas

We start with the area of a rectangle with side lengths \(A\) and \(B\). As you know, this area is \(AB\). Our first task is to use this fact to provide a means of finding areas of irregular figures.

To do so we must first define precisely what we are trying to do.

Suppose we have some function, for example the sine function, and have an interval, say from \(0\) to \(1\), on the \(x\) axis. We can then plot the curve defined by

\[y(x) = \sin(x)\]

and ask for the area in the region whose sides are: the lines \(x = 0, x = 1, y = 0\) and the given curve.

This area is called **the definite integral of the function \(\sin(x)\) from lower limit \(0\) to upper limit \(1\).** The word definite is sometimes left out, and the area is then called the integral from \(0\) to \(1\).

The standard notation for it is:

\[\int_{x=0}^{x=1} \sin(x)dx\]

**Why this ugly notation? Why the weird \(\int\) thing?**

We use this notation because everyone else does. Cheer up! You will be able to recognize and read statements involving these symbols. They are not a threat. Imagine it as a weird S, standing for Sum. The integral is a weird kind of sum.

**What is \(dx\)?**

It indicates that we divide the interval between the endpoints of the integral into tiny slivers of length \(dx\). The part of the area lying in the sliver containing the value \(x\) is almost a rectangle. The area of that rectangle is its height, the \(y\) difference between its ends, which is \(f(x)\), multiplied by its width, the \(x\) difference, which for each sliver is \(dx\). Thus \(f(x)dx\) is the area in that sliver, and its sum over all the slivers is the area we seek. If \(f(x)\) is \(\sin x\), the average value of \(\sin x\) in each sliver, multiplied by \(dx\), is what we sum over all the slivers to get the indicated integral.

In describing your integral you can often leave out the \(x\)’s when describing its endpoints, writing it as

\[\int_0^1 \sin(x)dx\]

Sometimes you want to describe the endpoints by giving the values of something other than \(x\). You can describe the lower and upper endpoints of the \(x\) interval by values of some other function \(g(x)\), indicating the endpoints by \(g(x)=a\) and \(g(x)=b\). For example, the integral here could be described just as well as the integral from \(x^3=0\) to \(x^3=1\).

We call this the integral from \(0\) to \(1\) of the sine function.

**It is the area between the four boundaries \(x = 0, x = 1, y = 0\), and \(y = \sin(x)\) counting any area below the \(x\) axis as negative.**

In general the area or integral from \(a\) to \(b\) of "the integrand" \(z(x)\) times \(dx\), is the area bounded by \(x = a, x = b, y = z(x)\) and \(y = 0\). Area below \(y = 0\) is counted negatively.

**What happens if the "lower limit" \(a\) is bigger than the "upper limit" \(b\)?**

**The area between \(x = a\) and \(x = b\) plus the area between \(x = b\) and \(x = c\) is, when \(a\) is less than \(b\) and \(b\) less than \(c\), merely the area between \(a\) and \(c\).**

This is such a wonderful property that we define the integral in the case you mention to make it hold true for all \(a, b\) and \(c\). This means the area from \(a\) to \(b\) plus the area from \(b\) back to \(a\) must be the area from \(a\) to \(a\), which is nothing at all. To make this happen we define the area from bigger to smaller to be minus the area from smaller to bigger. With this definition, the integral from \(a\) to \(b\) plus that from \(b\) to \(c\) is the integral from \(a\) to \(c\) no matter the numerical order of the numbers \(a, b\) and \(c\).

**And what good is all this?**

Our key task is to figure out how to determine what these areas are. And we have a mighty tool for doing this.

**Eh?**

**First notice that the notion of integral here gives us a new way to define a function**. We can make the upper limit of our integration vary, call it \(t\), and consider the resulting integral as a function of \(t\).

For example, we can write

\[g(t) = \int_0^t \sin(x)dx\]

And now we can ask, what is the derivative of the function \(g\) defined this way, as a function of \(t\)?

We are interested in the derivative of the integral

\[g(t) = \int_0^t \sin(x)dx\]

with respect to the upper limit, \(t\).

We can compute this derivative, roughly, by evaluating \(\frac{g(t+d)-g(t)}{d}\) for very small \(d\).

But \(g(t + d)-g(t)\) is just

\[\int_t^{t+d} \sin(x)dx\]

The region between \(x = t\) and \(x = t + d\) is just a sliver, in which \(\sin(x)\) is very near \(\sin(t)\). So the area in this sliver between \(y = \sin(t)\) and \(y = 0\) is just \(d \sin(t)\), where \(d\) is the width of the sliver and \(\sin(t)\) its height, to a first approximation.

This tells us that the derivative of \(g(t)\), the derivative of the integral of the sine function at argument \(t\), is this area divided by \(d\), which is \(\sin(t)\).

**Exactly the same result holds for any function whose values for arguments sufficiently close to \(t\) are as close as you like to its value at \(t\), for all \(t\) between the limits of integration. (These are called continuous functions.)**

This result is **called the fundamental theorem of calculus**. It says: **If you differentiate the integral of a function, \(f\), that is continuous at argument \(t\) in the closed interval including the endpoints of integration** (this is the condition that \(f\)'s values are as close as you like to \(f(t)\) at arguments sufficiently near \(t\)) **you get back the value of the integrand, \(f\), at argument \(t\).**

Another way to say this is: **the integral with upper limit as variable,** an area as we have just defined it, is **an antiderivative of its integrand, when that integrand is continuous.**

This means that **integrating a function and then differentiating the result with respect to upper limit, gives back the function.**
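You can watch this happen numerically. The sketch below is only an illustration: the helper `integral` (a sum over slivers, with the integrand evaluated at the middle of each), the sample argument \(t = 0.7\) and the step \(d = 0.001\) are all assumptions chosen for the example.

```python
import math

def integral(f, a, b, n=2000):
    """Estimate the area under f from a to b by summing over n thin slivers."""
    w = (b - a) / n
    return w * sum(f(a + (i + 0.5) * w) for i in range(n))

def g(t):
    """The area under the sine curve from 0 to t, as a function of t."""
    return integral(math.sin, 0.0, t)

t, d = 0.7, 1e-3
slope = (g(t + d) - g(t)) / d    # difference quotient of the area function
print(slope, math.sin(t))        # the two values should be close
```

The agreement is only approximate because \(d\) is small but not zero; shrinking \(d\) brings the two numbers closer together.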

**We can also make the same statement about applying these operations in the opposite order.**

Suppose we start with a differentiable function, \(f\), and form its derivative, \(f'(x)\), and integrate this derivative between somewhere, say \(a\), and \(t\).

In other words suppose we form

\[g(t) = \int_a^t f'(x)dx\]

The fundamental theorem then tells us: \(g(t) = f(t) - f(a)\).

To see this, recall that if \(f\) is differentiable at argument \(x\) then for \(d\) sufficiently small, we have, to any desired accuracy:

\[f' = \frac{f(x+d)-f(x)}{d} \enspace\text{or}\enspace f'd = f(x+d)-f(x)\]

If we chop the interval between \(a\) and \(t\) up into slices of width \(d\), we can sum up the contribution from either side of the equation \(f'd = f(x + d) - f(x)\) over all the slices. We use the same value \(d\) for each slice.

The sum of the positive and negative terms in the last equation above will give us the sum of the areas in the little slices. This sum will "telescope". The left term from one slice will be the right term from the previous slice with the opposite sign; the two will cancel each other out, and we will get contributions only from the first and last slices. This means:

\[f(t) - f(a) = \int_a^t f'(x)dx\]

This is the standard form for the fundamental theorem.

**And what good is this "fundamental theorem"?**

The uses of this theorem, and of its analogues in higher dimensions, have been so significant in history that they cannot be exaggerated. We will ignore these here. For our purposes, the main use of this theorem is in allowing us to **evaluate integrals, that is, areas under curves**, for vast numbers of integrands.

**What integrands?**

For starters, we can integrate **any integrand that we can recognize as a derivative.**

For example, the sine is the derivative of minus the cosine. Applying the last equation above to this fact, we get

\[\int_a^t \sin(x)dx = \cos(a) - \cos(t)\]

The original area we used as an example was the integral of the sine from \(0\) to \(1\). This is \(\cos(0) - \cos(1)\) or \(1 - \cos(1)\).
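As a sanity check, you can add up \(f'(x)d\) over many thin slices and compare with \(f(t) - f(a)\). This is a minimal Python sketch; the slice count of 100,000 is an arbitrary assumption, and the variable names are chosen only for this example.

```python
import math

a, t, n = 0.0, 1.0, 100_000
d = (t - a) / n

# Sum f'(x) * d over the slices, where f = -cos, so that f' = sin.
slice_sum = sum(math.sin(a + i * d) * d for i in range(n))

exact = math.cos(a) - math.cos(t)    # f(t) - f(a), which is 1 - cos(1)
print(slice_sum, exact)
```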

**What else can we recognize?**

1. Any power of \(x\) such as \(x^a\), and therefore any polynomial or sum of powers.

2. The exponent function, \(\exp(x)\) and therefore \(\exp(kx)\) for any \(k\).

3. The derivative of the arctangent, of the tangent and arcsine, and lots more.

**Exercises: Calculate the integrals defined as follows:**

**13.1 Integrand \(\sin(x)\cos(x)\) from \(0\) to \(2\).**

**13.2 Integrand \(x^2 + 3x - 7\) from \(1\) to \(4\).**

**13.3 Integrand \((1 + x^2)^{-1}\) from \(0\) to \(+\infty\).**

**13.4 Integrand \((2 + x)^{-1}\) from \(0\) to \(1\).**

**13.5 Write down some horrible function. Differentiate it. Now ask some friend (former friend?) to integrate your result. You will know the answer!**

**13.6 Remember the separate occurrence rule for this one. Differentiate (with respect to t): \(g(t) = \int_0^t \sin(x-t)dx\).**

The techniques of integration are basically those of differentiation looked at backwards.

The rule for differentiating a sum: **It is the sum of the derivatives of the summands**, gives rise to the same fact for integrals: **the integral of a sum of integrands is the sum of their integrals.**

**13.3.1 The Product Rule Backwards**

The product rule says that the **derivative of a product is the sum gotten by differentiating each factor as if the other were constant and adding up the results.**

We can read this backwards as a way to handle an integrand of the form \(fg'\), when we know how to handle the integrand \(f'g\). For, we can write the product rule as

\[fg' = (fg)'-f'g\]

and integrating both sides tells us

\[ \begin{aligned} \int_a^b f(x)g'(x)dx &= \int_a^b \frac{d(f(x)g(x))}{dx}dx - \int_a^b f'(x)g(x)dx \\ \thinspace &= \int_{x=a}^{x=b} d(f(x)g(x)) - \int_a^b f'(x)g(x)dx \\ \thinspace &= f(b)g(b) - f(a)g(a) - \int_a^b f'(x)g(x)dx \end{aligned} \]

This statement is called **"integrating by parts"** and is useful for integrands like \(x^k\exp(x)\) or \(\ln(x)\) or \(x\ln(x)\).

For example, to integrate \(\ln(x)\), set \(f(x) = \ln(x)\) and \(g'(x)= 1\). Then we have \(f'(x) = \frac{1}{x}\) and \(g(x) = x\). Thus \(f(x)g(x)\) is \(x\ln(x)\) and \(f'g\) is \(1\).

We can conclude that the integral of \(\ln(x)\) from \(a\) to \(b\) is \(b\ln(b) - a\ln(a) - (b-a)\). This answer is often written as

\[(x(\ln x)-x)\vert_a^b\]
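A quick numerical comparison can confirm this formula. In the sketch below, the limits \(a = 1\) and \(b = 3\) and the helper `midpoint` are assumptions picked for illustration.

```python
import math

def midpoint(f, a, b, n=10_000):
    """Sum f over n thin slivers, evaluating f at the middle of each."""
    w = (b - a) / n
    return w * sum(f(a + (i + 0.5) * w) for i in range(n))

a, b = 1.0, 3.0
numeric = midpoint(math.log, a, b)
by_parts = (b * math.log(b) - b) - (a * math.log(a) - a)   # (x ln x - x) from a to b
print(numeric, by_parts)
```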

**Exercise 13.7 Do the other integrals mentioned just above: with integrands \(x^k \exp(x)\) for \(k = 1\) and \(k = 2\), and also \(x\ln(x)\).**

**13.3.2 The Chain Rule Backwards**

The chain rule tells us how to differentiate \(f(g(x))\) and the answer is \(\frac{df}{dg}\frac{dg}{dx}\).

This tells us that if we can recognize an integrand as having the form \(\frac{df}{dg}g'\), we can integrate it over \(dx\) from \(a\) to \(b\) to get \(f(g(x))\) evaluated at \(b\) less its evaluation at \(a\).

What can we recognize this way?

Here are examples you should mull over: \((\sin x)^6\cos x\), and \(\frac{\ln(x)}{x}\). Try guessing what to choose for \(g(x)\) in each case and see if you can get it to work. If you fail, try again.

Using the chain rule backwards is sometimes called **the method of substitution**.

We will not dwell on this topic. To actually learn to use any of these methods you must practice them perhaps a dozen times each. This is very valuable experience for the substitution rule, something like solving puzzles. It can be fun but at first it seems like drudgery.

We do note that, by an appropriate magical substitution, you can turn any rational function of sines and cosines into a rational function of a single variable, which you can actually integrate, with enough effort. In the old days tables of integrable functions with their integrals were very useful. Such things are now readily available on the web.

**Is there anything we cannot integrate?**

Yes, definitely. The integrands \(\frac{e^{-x}}{x}\) and \(\exp(-x^2)\) are examples for which there is no solution that can be expressed in terms of combinations obtained from the identity function, the sine and the exponent. Actually, modern spreadsheets often include their integrals as non-standard functions that are nevertheless available in the spreadsheet.

(The Error Function, or erf for example,

\[\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}dt\]

is a function available in Excel 7 among its Engineering Formulae, and it can be entered as part of the content of any location in the spreadsheet.)
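Python's standard library also provides this function, as `math.erf`. The sketch below compares it with a direct numerical evaluation of the defining integral; the helper `erf_numeric`, which weights the sample points by \(1, 4, 2, 4, \ldots, 4, 1\), and its default of 1000 intervals are assumptions made for this illustration.

```python
import math

def erf_numeric(x, n=1000):
    """Estimate (2/sqrt(pi)) * integral of exp(-t^2) from 0 to x; n must be even."""
    h = x / n
    total = math.exp(0.0) + math.exp(-x * x)          # the two endpoints, weight 1
    for i in range(1, n):
        weight = 4 if i % 2 else 2                    # odd points 4, even points 2
        total += weight * math.exp(-(i * h) ** 2)
    return 2 / math.sqrt(math.pi) * total * h / 3

value = erf_numeric(1.0)
print(value, math.erf(1.0))
```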

Nowadays, you can consult any of a number of available programs, such as Maple, Mathematica, and Matlab and they will give you formal solutions to any doable integrals, and solutions to arbitrary accuracy for those that cannot be integrated exactly in terms of the functions we have defined.

We now turn to the question: how feasible is it to integrate, that is, to determine areas under curves, numerically? There was a tiny discussion of this in Chapter 2.

If you want to evaluate a particular integral, you can do it with amazing ease on a spreadsheet. It should take no more than ten minutes to set up an integrating spreadsheet, and once you have one, you can apply it to a new integrand in under a minute. All you need do is enter your integrand once, copy it into a rectangle, and enter your limits of integration. Since it is so convenient to do this, you would be wise to check numerically any integral you evaluate formally. If your formal answer and the numerical answer agree, you are definitely right, so long as you got the question you are answering right.

**So how do we do it?**

Here is the idea. We create a spreadsheet, in one column of which we start with the lower limit of integration and increase values of the argument by some amount, \(d\), per row. This can be arranged by one or two entries and copying down. You can call this the argument column.

In the next column, we evaluate the integrand at each argument. This can be done by evaluating it at the first entry, and copying down. This can be called the value column for your integrand.

In the final column we sum the entries in the previous column, each multiplied by \(d\). This again requires one entry and copying down. And that is essentially it. This is the integral column. Its entry is the integral from the beginning point to the next entry in the argument column, using the "left hand rule".

**Come again?**

Before fleshing this out in detail, we will digress to discuss "rules" to use for the numerical integration.

The goal of any integration scheme is to estimate the area in each interval of given width say \(w\), accurately. There is no problem in doing this if the integrand is essentially constant in that interval, but if it is not, we need a plan for doing the estimation. Any such plan is called a **rule** for numerical integration.

**Here are the simplest rules, starting from the least sensible ones.**

1. Estimate the height of the interval by the value of the integrand at the interval's **leftmost** point. This is called the **left hand rule**.

2. Estimate the height of the interval by the value of the integrand at its **rightmost** point. This is the **right hand rule**.

3. Estimate the height of the interval by **the average of the previous two.** This is called the **trapezoid rule**.

4. Estimate the height of the interval by the value of the integrand smack in the **middle of it.** This has the disadvantage that you need to find it in the middle of the interval rather than at an end. It is sometimes called the **midpoint rule**.

5. Choose the combination of the previous two **that is exactly satisfied by quadratic functions**. This is called **Simpson's Rule**.
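To make the five rules concrete, here is a rough Python sketch of each, applied to the integral of the sine from \(0\) to \(1\) with ten intervals. The function names and the choice of ten intervals are assumptions for this illustration; the combination used in `simpson` (one part trapezoid to two parts midpoint) is the standard one that is exact for quadratics.

```python
import math

def left(f, a, b, n):
    w = (b - a) / n
    return w * sum(f(a + i * w) for i in range(n))

def right(f, a, b, n):
    w = (b - a) / n
    return w * sum(f(a + (i + 1) * w) for i in range(n))

def trapezoid(f, a, b, n):
    return (left(f, a, b, n) + right(f, a, b, n)) / 2

def midpoint(f, a, b, n):
    w = (b - a) / n
    return w * sum(f(a + (i + 0.5) * w) for i in range(n))

def simpson(f, a, b, n):
    # The combination of trapezoid and midpoint that is exact for quadratics.
    return (trapezoid(f, a, b, n) + 2 * midpoint(f, a, b, n)) / 3

exact = 1 - math.cos(1)
for rule in (left, right, trapezoid, midpoint, simpson):
    print(rule.__name__, rule(math.sin, 0.0, 1.0, 10) - exact)
```

The printed numbers are the errors of each rule, so you can see at a glance which rules are the less sensible ones.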

**Enough! Are there more rules?**

Yes, you can do even better.

**Better? How well do these rules do?**

Well, the first two rules have errors in them that decline linearly with \(d\). So if you divide \(d\) by two, the error decreases by a factor of \(2\) as well.

The next two have errors that are quadratic in \(d\); this means that they decline by a factor of \(4\) when \(d\) decreases by a factor of \(2\).

Simpson's rule has an error which is quartic in \(d\); it declines by a factor of \(16\) as \(d\) decreases by a factor of \(2\); and you can achieve a decline by a factor of \(64\) if you want, or even more.
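These rates of decline are easy to observe. The sketch below estimates the integral of the sine from \(0\) to \(1\) with \(d = \frac{1}{20}\) and again with \(d = \frac{1}{40}\), and reports the ratio of the two errors for three of the rules. The function names, the test integrand and the two values of \(d\) are assumptions chosen for this example.

```python
import math

def left(f, a, n, d):
    return d * sum(f(a + i * d) for i in range(n))

def trapezoid(f, a, n, d):
    # Left hand rule, corrected by half of the first and last contributions.
    return left(f, a, n, d) + d * (f(a + n * d) - f(a)) / 2

def simpson(f, a, n, d):
    # n must be even; endpoint weights 1/3, odd points 4/3, other even points 2/3.
    s = f(a) + f(a + n * d)
    s += sum((4 if i % 2 else 2) * f(a + i * d) for i in range(1, n))
    return s * d / 3

exact = 1 - math.cos(1)              # the integral of sin from 0 to 1
ratios = {}
for rule in (left, trapezoid, simpson):
    err_d = abs(rule(math.sin, 0.0, 20, 1 / 20) - exact)
    err_half = abs(rule(math.sin, 0.0, 40, 1 / 40) - exact)
    ratios[rule.__name__] = err_d / err_half
print(ratios)                        # roughly 2, 4 and 16
```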

The trapezoid rule uses as the height of each interval half the sum of the values at its two ends. This gives a weight of \(\frac{1}{2}\) to the endpoints of integration, and \(\frac{2}{2}\) to each intermediate point, (\(\frac{1}{2}\) from the interval on each side of it).

Simpson's rule amounts to doubling the contribution from the odd numbered points but then using \(3\) as the denominator instead of \(2\); so the first and last points (the last being necessarily even) get weight \(\frac{1}{3}\), the odd ones get weight \(\frac{4}{3}\) and the other even ones get weight \(\frac{2}{3}\).

**And are these rules hard to apply?**

No, the first three are very easy, and you can get Simpson's by a clever trick from the third. With another similar trick you can get the super Simpson rule with a factor \(64\) error decline for each decrease of \(d\) to \(\frac{d}{2}\).

**So how accurate can you get with such integrations?**

For most integrands, over finite intervals you should be able to get ten place accuracy, if you want it, which is far more than any problem you encounter will need.

**OK, you got me curious. Why does the trapezoid rule do better than the first two? And why is Simpson's Rule still better?**

Suppose your integrand is \(f(x)\) and \(f\) has a power series expansion about the point \(q\) whose coefficients do not go wild. We can then write, for \(x\) near \(q\)

\[f(x) = f(q) + (x-q)f'(q) + (x-q)^2\frac{f^{(2)}(q)}{2} + (x-q)^3\frac{f^{(3)}(q)}{3!} + (x-q)^4\frac{f^{(4)}(q)}{4!} + \ldots\]

where \(f^{(k)}(q)\) is the \(k^{th}\) derivative of \(f(x)\) evaluated at \(x = q\).

We want to find the area under \(f(x)\) when \(x\) ranges from \(q-d\) to \(q+d\), an interval of length \(2d\).

Notice that if we form \(f(q+d) + f(q-d)\) the contributions of \(f'(q)\) which is the second term on the right here, will cancel out. In fact all the terms involving odd derivatives will cancel out:

\[f(q+d) + f(q-d) = 2f(q) + 2d^2\frac{f^{(2)}(q)}{2!} + \text{terms coming from higher derivatives of } f\]

We can conclude, by integrating the power series term by term and writing \(y = x - q\) (the terms with odd powers of \(y\) integrate to zero):

\[\int_{q-d}^{q+d} f(x)dx = f(q)\int_{-d}^{d} dy + \frac{f^{(2)}(q)}{2!}\int_{-d}^d y^2dy + \text{terms coming from higher derivatives of }f\]

The first term on the right is \(2df(q)\) and the next is \(2d^3\frac{f^{(2)}(q)}{3!}\), and the remaining terms are proportional to higher powers of \(d\).

The first term, \(2df(q)\), by the equation for \(f(q+d) + f(q-d)\) above, differs from \(d(f(q+d) + f(q-d))\) by \(-d^3f^{(2)}(q)\) and terms of higher power in \(d\).

The upshot of this argument is that approximating the integral of \(f\) in the interval within \(d\) of \(q\) by \(d(f(q+d)+f(q-d))\) produces an error proportional to \(d^3\) as well as terms of higher odd powers of \(d\) and no error linear in \(d\) or quadratic in \(d\). In fact, the error term cubic in \(d\) is \(\frac{2}{3}d^3f^{(2)}(q)\).

If we decrease \(d\) by a factor of \(2\), \(f(q)d\) decreases by a factor of \(2\), but this is compensated for by the fact that there are now twice as many intervals, each of half the size of those present before the decrease. Thus, the contributions of the first terms to the integral as a whole will usually not change much. (The only changes will come from changes in the evaluations of terms like \(f(d+q)\) when \(d\) changes.) On the other hand, the contributions of the cubic error terms will decrease by roughly a factor of \(8\) and, since there will be twice as many intervals, their contribution to the overall error will decrease by roughly a factor of \(4\).

**So?**

**This means that if we take four times the estimate after the split and subtract the estimate before the split we will (almost) eliminate the first error term, and the resulting next error term will decrease on splitting by a factor of \(16\). This combination counts the integral \(4-1\) or \(3\) times, so we divide by \(3\) to get back to an estimate of the integral.**
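In symbols, the bookkeeping goes roughly as follows. Here \(I\) stands for the true integral, \(T(d)\) for the trapezoid estimate with interval size \(d\), and \(C\) for a constant that does not depend on \(d\); the names \(I\) and \(C\) are introduced only for this sketch, and the terms of higher power in \(d\) are ignored.

\[ \begin{aligned} T(2d) &\approx I + 4Cd^2, \qquad T(d) \approx I + Cd^2, \\ 4T(d) - T(2d) &\approx 4I - I = 3I, \qquad \text{so} \quad I \approx \frac{4T(d) - T(2d)}{3}. \end{aligned} \]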

So here is our plan:

First we calculate the integral from the starting point to any later point using the left hand rule. This is very easy to do, as we shall see.

**How?**

**We can do this by creating two columns: an \(x\) column and an integral column.**

**The \(x\) column (say column A) starts, (say in A5) with the lower end of the integral. Then for \(k\) larger than \(5\) we set Ak to A(k-1) + d.**

**In the left rule integral column we set Ck to C(k-1) + d*f(Ak).**

**The left hand rule integral from the content of A5 to that of Ak will be C(k-1).**

Actually, when you want to change the function you are integrating, you are better off if all evaluations of the function are in one column. Thus you might want to set Bk to d*f(Ak), and Ck to C(k-1) + Bk.

Next we convert the left hand rule integral in column C to the trapezoid rule integral in column D.

**How?**

In D we add to Ck a term which gets rid of half of d*f(A5) and half of d*f(Ak). Explicitly, with Bk set to d*f(Ak) as just described, we set Dk to Ck - (B$5+Bk)/2.

The Trapezoid rule integral from A5 to Ak now appears in Dk.

Next we convert this to Simpson’s Rule in column E.

**How?**

**Let's call the Trapezoid rule result with interval size \(d\) and endpoint \(\text{start} + kd\), \(T(d,k).\)**

**What we want to do is to repeat the Trapezoid calculation for interval size \(2d\) and then form \(\frac{4T(d,k)-T(2d,k)}{3}.\)** The result will have error that behaves as \(d^4\). This estimate is called Simpson's Rule.

Why do we compute Simpson's rule in this way? Because it is easy to apply the left hand rule, easy to get the Trapezoid rule results in a column from it, and easy to double the interval size in another column. Having done so, it's easy to form the new rule from the old ones on a spreadsheet according to the expression in the last paragraph above, in a third column.

It is easiest to accomplish this when \(T(2d,k)\) is in the same row as \(T(d,k)\). Recall that the contribution to the left hand rule at row \(k\) is \(df(k)\) which is added to the result in the row corresponding to \(k-1\). For \(2d\), this contribution is doubled and added to the result for the previous doubled \(d\) row which is \(2\) above it.

The correction to make the left hand rule into the trapezoid rule for the integral from start \(s\) to \(s+kd\) is to subtract half of the first and last contribution from the partial sum that includes the initial value and the sum up to \(s+kd\).

It turns out that there is not much difficulty in putting \(T(2d,2k)\) in the same line as \(T(d,2k)\), which makes it easy to form \(\frac{4T(d,2k)-T(2d,2k)}{3}\).

**Well, what do you actually do?**

Put your interval size \(d\) in some box, say A1.

In column A, put your start point in say A5, and from A6 at each step increase the value by \(d\). Thus in A6, you can put =A5+A$1, and A6 can be copied down column A as far as you want.

In column B, put the value of the integrand times \(d\) at the corresponding argument: in B5 put =A$1*f(A5), and copy this down column B.

In column C, put the partial sums of column B: which means, in C5 put =C4+B5, and copy down column C.

In column D, put in D5: =-(B5+B$5)/2 and copy this down column D.

The Trapezoid answer from your start (which is in A5) to the value in Ak will be Ck+Dk, which you can put in box Ek by setting E5 to =C5+D5 and copying this down column E.

In F, set F5 =2*B5+F3 and copy down F (This will put the left hand \(2d\) rule results in the odd numbered rows of F beyond row 5. The even numbered rows contain useless junk.)

In G5, set =2*D5 +F5 and copy down G, which will give the integral for interval \(2d\) trapezoid result in the odd numbered entries of column G.

In H5, set =(4*E5-G5)/3 and copy down. Simpson's rule for the integral from the content of A5 to that of A(2k+1) will then appear in H(2k+1), for \(k\) at least \(3\).

Next we do an explicit example for the function \(x\sin(x)\).

Preliminaries:

Set A1 to Integrate f(x), B1 to f(x) = xsin(x)

Set A2 to d, B2 to 0.01

Set A3 to startpoint, B3 to 1

Create Columns: Set A5 to =B3, A6 to =A5+B$2 and copy A6 down column A.

Set the \(5^{th}\) row of columns B through H as follows:

In B5, enter =B$2*A5*sin(A5); in C5, =C4+B5; in D5, =-(B5+B$5)/2; in E5, =C5+D5; in F5, =2*B5+F3; in G5, =2*D5+F5; in H5, =(4*E5-G5)/3. Copy all these down their columns.

The entries in H(5+2j) etc give Simpson’s rule, integrating from the value in A5 to that in A(5+2j).

Column A contains the variable, B contains \(f(x)d\) which is the integrand times the width of the interval, C contains its partial sums, D contains the correction to make this the trapezoid rule which is in E, F jumps by \(2\) in taking sums and doubling the width which corresponds to doubling \(d\), G corrects the endpoints to create the trapezoid rule for \(2d\) for appropriate intermediate endpoints and H creates Simpson’s rule from the \(d\) and \(2d\) trapezoid rule answers.
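If you would like to check your spreadsheet against an independent computation, the column layout can be mirrored in a few lines of Python. This is a sketch under stated assumptions: the function `columns` and its list names are invented for this illustration, list index \(i\) stands for spreadsheet row \(5+i\), and an empty cell such as C4 or F3 is treated as \(0\).

```python
import math

def columns(f, start, d, rows):
    """Mirror spreadsheet columns A to H for rows 5, 6, ..., 5 + rows - 1."""
    A = [start + i * d for i in range(rows)]
    B = [d * f(x) for x in A]                             # =B$2*f(A5)
    C, F = [], []
    for i, b in enumerate(B):
        C.append((C[i - 1] if i >= 1 else 0.0) + b)       # =C4+B5
        F.append((F[i - 2] if i >= 2 else 0.0) + 2 * b)   # =2*B5+F3
    D = [-(b + B[0]) / 2 for b in B]                      # =-(B5+B$5)/2
    E = [c + corr for c, corr in zip(C, D)]               # =C5+D5
    G = [2 * corr + s for corr, s in zip(D, F)]           # =2*D5+F5
    H = [(4 * e - g) / 3 for e, g in zip(E, G)]           # =(4*E5-G5)/3
    return A, E, H

A, E, H = columns(lambda x: x * math.sin(x), 1.0, 0.01, 101)
exact = (math.sin(2) - 2 * math.cos(2)) - (math.sin(1) - math.cos(1))
# Index 100 is row 105, an odd numbered row, where x has reached 2.
print(E[100], H[100], exact)
```

Only the even list indices (the odd numbered rows) of `H` are meaningful, just as in the spreadsheet.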

Having done this you can change \(d\) and the starting point by changing the content of B2 and B3. To change the integrand you need only change column B.

You should test your answer with an integral whose value you know to find any bugs in your spreadsheet. You can try doubling \(d\) to see whether that changes your answer much. If not you have computed your integral quite well.

**Does this always work?**

No. You obviously cannot do this if you want to integrate to infinity. You can also run into trouble if your integrand goes to infinity at some intermediate point. Or if it wobbles insanely.

You may be able to subtract something from it that you know about and has the same singular behavior, and then be able to handle the rest.

**Exercise: Try to find a function for which this procedure will fail (one that does not blow up).** Things like square roots near \(0\) might be such.

If you add columns that jump by \(4\) and do a similar thing with them, you can get the Trapezoid rule for \(d\) replaced by \(4d\). With that you can get two Simpson rule calculations for \(d\) and \(2d\). Taking \(16\) times the first less the second, and dividing by \(15\) you will get a super Simpson rule, that improves by a factor of \(64\) when \(d\) decreases by a factor of \(2\).
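The same combination can be tried out in Python before building the extra columns. In this sketch the function names, the choice of twenty intervals and the test integral of \(x\sin x\) from \(1\) to \(2\) are assumptions made for illustration.

```python
import math

def trapezoid(f, a, b, n):
    d = (b - a) / n
    return d * (f(a) / 2 + sum(f(a + i * d) for i in range(1, n)) + f(b) / 2)

def simpson(f, a, b, n):
    # n must be even: combine trapezoid estimates with widths d and 2d.
    return (4 * trapezoid(f, a, b, n) - trapezoid(f, a, b, n // 2)) / 3

def super_simpson(f, a, b, n):
    # n must be a multiple of 4: combine Simpson estimates with widths d and 2d.
    return (16 * simpson(f, a, b, n) - simpson(f, a, b, n // 2)) / 15

f = lambda x: x * math.sin(x)
exact = (math.sin(2) - 2 * math.cos(2)) - (math.sin(1) - math.cos(1))
err_s = abs(simpson(f, 1.0, 2.0, 20) - exact)
err_ss = abs(super_simpson(f, 1.0, 2.0, 20) - exact)
print(err_s, err_ss)
```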

**Here are the results for the integral of \(x\sin x\) from \(x=1\) to \(2\)**

[Interactive table not reproduced; its settings were: number of increments 25, number of digits after the decimal point 10.]

The exact answer, given in column I, is \(\sin x - x \cos x\) (differentiate and see).

To get column I, just enter, in I5, =SIN(A5)-A5*COS(A5)-SIN(A$5)+A$5*COS(A$5) and copy down.

Simpson's rule here is accurate to about \(10\) significant figures. The value in red is our Simpson’s Rule answer. The value to its immediate right is the computer evaluation of \(\sin x - x \cos x\) which is the value of this integral.

Notice that you can switch to integrating some other integrand merely by changing column B. This involves a new entry in B5 and copying it down that column.

The starting point (called the lower limit of integration) can be changed by changing B3.

In the computation above, \(10\) place accuracy occurs for \(d = 0.001\).
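As a cross-check on the spreadsheet, the same combinations can be sketched in Python. This is only an illustration under stated assumptions: the function names are ours, and Simpson's rule is formed as \((4T(d) - T(2d))/3\) from trapezoid results \(T\), which we assume is the combination used in the spreadsheet columns. The super Simpson rule is then \(16\) times Simpson for \(d\), less Simpson for \(2d\), divided by \(15\), as described above.

```python
import math

def trapezoid(f, a, b, n):
    """Trapezoid rule on [a, b] with n equal increments of width d = (b - a) / n."""
    d = (b - a) / n
    interior = sum(f(a + k * d) for k in range(1, n))
    return d * (0.5 * f(a) + interior + 0.5 * f(b))

def simpson(f, a, b, n):
    """Simpson's rule from trapezoid results for d and 2d (n must be even)."""
    return (4 * trapezoid(f, a, b, n) - trapezoid(f, a, b, n // 2)) / 3

def super_simpson(f, a, b, n):
    """16 times Simpson for d, less Simpson for 2d, over 15 (n divisible by 4)."""
    return (16 * simpson(f, a, b, n) - simpson(f, a, b, n // 2)) / 15

def integrand(x):
    return x * math.sin(x)

def antiderivative(x):
    return math.sin(x) - x * math.cos(x)

# Exact value of the integral of x sin x from 1 to 2.
exact = antiderivative(2.0) - antiderivative(1.0)

for n in (8, 16, 32):
    simpson_error = abs(simpson(integrand, 1.0, 2.0, n) - exact)
    super_error = abs(super_simpson(integrand, 1.0, 2.0, n) - exact)
    print(n, simpson_error, super_error)
```

If the setup is right, each doubling of `n` should shrink the last column by a factor of roughly \(64\), until rounding error in the computer's arithmetic takes over.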

In this Chapter we discuss determinants, whose magnitudes describe areas, volumes and so on.

15.2 Representing Parallel Sided Figures

15.3 Properties of Determinants

15.5 The Alice in Wonderland Method for Evaluating Determinants on a Spreadsheet

In customary language area, like distance and volume, is a quantity that is always positive. However, we will find it useful to give signs to them.

Thus if you are driving a car, and another car is \(2\) car lengths ahead of you, you might assign a positive distance to the distance between your car and it, and if it is behind you, we can assign a negative distance to the same.

The same sort of thing can be done with area and volume. If you have an x-axis, you can assign positive area to area above it, and negative area to that below it. There are other ways to give signs to areas and volumes, as we shall see.

**Why would you want to do this?**

If you plot the distance between you and an oncoming vehicle, when you are standing still, this distance will decrease as it approaches, and then increase again, after it goes past you. Thus the plot of its distance will look like a V. If we use signed distance, and the vehicle is moving at a uniform speed, the plot of its distance from where you were before you dived out of the way will be a straight line. After it passes you its distance becomes negative. Straight lines are so much easier to deal with than V-like curves (since they have linearity properties) that we prefer to deal with them and that is why we introduce these signs. For many purposes the sign is irrelevant.

The area of a rectangle, as I hope you remember, is the product of the lengths of its sides, if we ignore signs, which we normally do. This is the basic fact we start from.

Similarly, the volume of a cube is the cube of its side length. The analogue of a rectangle in three dimensions is called a "rectangular parallelepiped" and its volume is the product of the lengths of its three sides. And you can imagine similar statements in more dimensions.

We will now discuss the areas of tilted parallelograms, and the volumes of general parallelepipeds, which are three dimensional six sided figures whose opposite sides are parallel to one another.

**Why?**

You will soon see why. Be patient and you might learn something you do not now know.

The first thing we need is a way to describe parallel sided figures.

And here is one way. Imagine we have x and y coordinates in the plane, and we put one corner of our figure at the origin, by which we mean the point \((0, 0)\), for which \(x = 0, \, y = 0\).

Then suppose that the "corners" of the parallelogram that are at the other end of edges that contain the origin, are located at points \((a, b)\) and \((c, d)\). The last corner will be at \((a + c, b + d)\) because the sides are parallel. **(Choose values for \(a, b, c\) and \(d\) and draw yourself a figure to verify this statement.)**

Then, one way to describe the parallelogram is to give a square array consisting of the numbers \(a, b, c\), and \(d\), arranged as follows:

\[ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \]

**(The rows here are the vectors defined by the edges of the parallelogram that meet the origin.)**

For example, the parallelogram with corners \((0, 0), (1, 2), (0, 1)\) and \((1, 3)\) can be represented by the array

\[ \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix} \]
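The correspondence between an array and its figure can be sketched in a few lines of Python. The function name `parallelogram_corners` is ours, chosen for illustration; it simply lists the origin, the two rows, and their sum.

```python
def parallelogram_corners(rows):
    """Corners of the parallelogram whose edges at the origin are the two rows."""
    (a, b), (c, d) = rows
    # Walk around the figure: origin, first edge, far corner, second edge.
    return [(0, 0), (a, b), (a + c, b + d), (c, d)]

corners = parallelogram_corners([(1, 2), (0, 1)])
print(corners)  # [(0, 0), (1, 2), (1, 3), (0, 1)]
```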

And in three dimensions, we can describe a parallelepiped with one corner at the origin, \((0, 0, 0)\), by putting the coordinates of the three corners that share edges with the origin as the three rows of a \(3\) by \(3\) array.

And in one dimension we can represent a line segment, which starts at the origin and goes to a point \(x\), by the single entry, \(x\).

Thus, the three matrices

\[ \begin{pmatrix} 5 \end{pmatrix}, \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 8 \\ 1 & 4 & 1 \end{pmatrix} \]

can represent a line segment, a parallelogram and a parallelepiped respectively.

We give a uniform name to the (signed) **length** of the line segment, the area of the parallelogram and the **volume** of the parallelepiped, all with some appropriate sign.

Each is called the **determinant** of the given array. And we can define the determinant just as well with similar meaning for larger square arrays of numbers.

The determinant of an array is represented sometimes by putting parallel lines on either side of the array, or by writing det({array}).

**Ok, you have defined these signed areas and so on to be determinants but what good is that?**

All these quantities in all dimensions have some wonderful properties, which we can convert into properties of determinants, and we will be able to use them to calculate all of these. Not only that, we can calculate them, in any dimension, on a spreadsheet with only one non-trivial instruction, and some copying.

(On the Excel spreadsheet, there is a command, called mdeterm(), whose argument is an array, and which computes determinants, hence areas and volumes and so on. We can do the same without using this command.)

**Exercises:**

**15.1 Represent the parallelepiped with corners at \((0, 0, 0)\) and adjacent corners at \((1, 2, 3), (1, 0, 1)\) and \((0, 1, 2)\) in two different ways.**

**15.2 Can you figure out the volume of this parallelepiped?**

**15.3 What is the relation of the area of a triangle to the area of a parallelogram having two of the sides of the triangle as sides of it?**

**15.4 If one corner of the parallelepiped described by the \(3\) by \(3\) array above is at the origin, what is the location of the "opposite" corner. (Figure out yourself what opposite means here.).**

**What are these "wonderful" properties?**

The first property, which we deduce from the definition of determinant and what we already know about areas and volumes, is the value of the determinant of an array with **all** its non-zero entries on the main diagonal. Such an array describes a figure which is a rectangle or rectangular parallelepiped, with sides that are parallel to the \(x\) and \(y\) and \(z\) and whatever axes. We already know that the magnitude of this determinant must be the product of its diagonal entries. The sign we define to be that of this product.

\[ \begin{pmatrix} 5 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{pmatrix} \]

Thus the determinants of the three arrays above are \(5\), \(-1\) and \(2\), respectively.

**This is wonderful?**

No, not yet. We really want to be able to evaluate more general determinants. The sign of a determinant with given rows **depends on the order in which you choose to list the rows which represent the edges of the figure, as we shall see.**

We are interested in the area of parallelograms that are tilted, so that sides are not perpendicular to one another, or that are rotated, so that the sides are not parallel to axes.

And here is the wonderful fact: If you fix the base of a parallelogram (one side of it), then its area is the height of the top of the parallelogram above that base multiplied by the length of the base. It does not matter how much the parallelogram tilts; it is only the perpendicular distance between the top and bottom that counts.

A similar property holds in any dimension: The size of the n dimensional figure is the size of its \(n-1\) dimensional base, times the height of the top of the figure perpendicular to its base.

This tells us: **we can add any multiple of one row of the array to any other row, without changing its determinant.** This is so because in any dimension we can choose **any face containing the origin** to be a base, **and all but one of the lines from the origin to its neighbors, which define the rows of the array, lie in that base**. Changing the line not in the face by any vector in the face will not change the height of the figure; it can only change the way the figure tilts.
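Here is a small numerical check of this claim in two dimensions, as a sketch only. The helper `base_times_height` is our own name; it computes the unsigned area as base length times perpendicular height, so it ignores the sign. Adding any multiple of the base row to the other row should leave its value alone.

```python
import math

def base_times_height(base, other):
    """Unsigned area of the parallelogram with edge vectors base and other."""
    bx, by = base
    ox, oy = other
    base_length = math.hypot(bx, by)
    ux, uy = bx / base_length, by / base_length  # unit vector along the base
    along = ox * ux + oy * uy                    # part of `other` along the base
    # What remains after removing that part is perpendicular to the base.
    height = math.hypot(ox - along * ux, oy - along * uy)
    return base_length * height

base = (3.0, 4.0)
top = (5.0, 7.0)
for multiple in (-2.0, 0.0, 0.5, 10.0):
    # Add a multiple of the base row to the other row: the figure tilts.
    tilted = (top[0] + multiple * base[0], top[1] + multiple * base[1])
    print(multiple, base_times_height(base, tilted))
```

Every line printed should show the same area, up to rounding in the last few digits.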

(By the way this suggests the usual way to compute determinants. We add multiples of rows to other rows to get rid of all the tilting so that the determinant is then the product of its diagonal elements. This is called row reduction.)

Another wonderful fact that follows from the first two is: the determinant is **linear in any of the rows (or columns) of its array.** This means that if you multiply some row by \(7\) the value of the determinant goes up by a factor of \(7\). It also means that if you take two arrays that differ only in some **one row**, like the following two, which differ only in their first rows:

\[ \begin{pmatrix} 1 & 2 \\ 5 & 7 \end{pmatrix}, \begin{pmatrix} 2 & 2 \\ 5 & 7 \end{pmatrix} \]

then the determinant of the array gotten by summing in the row that differs and keeping the others the same, (getting here \(3 \enspace 4\) for the first row and \(5 \enspace 7\) for the second) is the sum of the determinants of the two arrays you started with.

This statement represents the fact that the height of the summed figure above the base is the sum of the heights of the two summand figures.

**Exercise 15.5 Show, by adding rows to one another appropriately, that interchanging two rows of an array changes the sign of its determinant. (Hint: add a row to another, subtract the other way and add back the first way; or something like that.)**

**And what good is all this?**

Well, from the first two facts alone, we can compute the value of any determinant, and hence the area or volume or whatever of any parallel sided figure.

**How?**

Well, we can add multiples of rows to one another to get rid of off diagonal elements. When we are done, we can deduce the value of the determinant to be the product of the diagonal elements.

Actually, we need only make the elements on one side of the diagonal into \(0\), and take the product of the diagonal elements. Getting rid of the others is sometimes a nice thing to do, but will not affect the diagonal elements at all.

Let's evaluate the determinants of the following arrays:

\[ \begin{pmatrix} 1 & 2 \\ 5 & 7 \end{pmatrix}, \begin{pmatrix} 2 & 2 \\ 5 & 7 \end{pmatrix} \]

If we subtract \(5\) times the first row from the second in the first matrix we get \((0, -3)\) for the second row, so the determinant is \(-3\). In the second array we subtract \(\frac{5}{2}\) times the first row from the second and get \((0, 2)\) as new second row. The determinant of the second matrix is therefore \(2 \times 2\) or \(4\).

This tells us, by linearity that the determinant of the sum of these two matrices,

\[ \begin{pmatrix} 3 & 4 \\ 5 & 7 \end{pmatrix} \]

is \(-3 + 4\) or \(1\). We can verify this by subtracting \(\frac{5}{3}\) of the first row from the second row, turning that second row into \((0, \frac{1}{3})\), and the product of the diagonal elements is \(1\). You can apply linearity of the determinant in this way when two arrays have the same base, and differ only in the non-base row.

This procedure for evaluating determinants (which is sometimes called "row reduction" and sometimes called "Gaussian elimination") used on the two matrices can be applied to square arrays of any size. It is easy to do for \(2\) by \(2\) arrays, but it is quite easy to make a mistake even for such. It is still reasonably easy for \(3\) by \(3\)'s but most people will make some silly mistake along the way since the steps involved in doing it are so boring, and they will get it wrong most of the time. Even you and I can expect to get \(4\) by \(4\) determinants wrong most of the time when doing it by hand by this approach, again because the steps are so straightforward and uninteresting. Your mind will stray along the way and you stand an excellent chance of screwing up.
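Since the steps are so mechanical, they are easy to hand to a computer. The following Python sketch is one way to do it, not the only one; the name `det_by_row_reduction` is ours. It uses exact fractions so that no rounding creeps in, and it interchanges rows (flipping the sign, as in the exercise on interchanging rows) when a diagonal entry is \(0\), a case the worked examples above did not need.

```python
from fractions import Fraction

def det_by_row_reduction(rows):
    """Determinant found by adding multiples of rows to the rows below them."""
    m = [[Fraction(x) for x in row] for row in rows]
    n = len(m)
    sign = 1
    for col in range(n):
        # Look for a row at or below the diagonal with a non-zero entry here.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # nothing to pivot on: the figure is flat
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            sign = -sign  # interchanging two rows changes the sign
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    product = Fraction(sign)
    for i in range(n):
        product *= m[i][i]  # product of the diagonal elements
    return product

print(det_by_row_reduction([[1, 2], [5, 7]]))  # -3
print(det_by_row_reduction([[2, 2], [5, 7]]))  # 4
print(det_by_row_reduction([[3, 4], [5, 7]]))  # 1
```

The three results reproduce the hand computation above, including the linearity check \(-3 + 4 = 1\).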

**Is this the only way to evaluate a determinant?**

No, there are at least two other ways, one of which is equally boring and prone to your making errors. The other is magical and great fun, but surprisingly it is never taught, and few have ever heard of it.

One standard approach is to write a formula for the results of the method just described. Suppose you start with rows \((a, b)\) and \((c, d)\). To turn the \(c\) into \(0\) you subtract \(\frac{c}{a}\) times the first row from the second. The resulting diagonal elements are then \(a\) and \(d-\frac{bc}{a}\) and their product is **\(ad - bc\)**. This is the formula for the determinant of a general two by two array. A standard way to calculate three by three determinants is to take the sum of the products of entries on the three downward sloping diagonals (continuing each diagonal around the edge of the array so that it has three entries), and to subtract from it the sum of the products of entries on the three upward sloping diagonals.
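Both formulas are short enough to write out directly. In this Python sketch the names `det2` and `det3_by_diagonals` are ours, and the three by three example array is made up for illustration; each diagonal is wrapped around the edge of the array so that it contains three entries.

```python
def det2(m):
    """Two by two determinant: ad - bc for rows (a, b) and (c, d)."""
    (a, b), (c, d) = m
    return a * d - b * c

def det3_by_diagonals(m):
    """Three by three determinant from the six wrapped diagonals."""
    down = sum(m[0][k] * m[1][(k + 1) % 3] * m[2][(k + 2) % 3] for k in range(3))
    up = sum(m[2][k] * m[1][(k + 1) % 3] * m[0][(k + 2) % 3] for k in range(3))
    return down - up

print(det2([[3, 4], [5, 7]]))                                  # 1
print(det3_by_diagonals([[1, 0, 0], [0, 2, 0], [0, 0, 1]]))    # 2
print(det3_by_diagonals([[2, 0, 1], [1, 3, 2], [1, 1, 4]]))    # 18
```

Note that the diagonal rule is special to three by three arrays; it does not extend in this form to larger ones.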

**Exercise 15.6 Evaluate the following determinants by any method above.**

\[ \begin{pmatrix} 5 \end{pmatrix}, \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 8 \\ 1 & 4 & 1 \end{pmatrix} \]

**So what is the magical approach?**

As you may know, Lewis Carroll, the author of Alice in Wonderland, was a mathematician, and our method uses his celebrated theorem on determinants. That theorem goes as follows:

Suppose we have a square array \(A\) whose top and bottom rows are called \(T\) and \(B\), and whose left and right columns are \(L\) and \(R\). We define the following additional arrays. \(A_{TL}, A_{TR}, A_{BL}, A_{BR},\) and \(A_{TBLR}\), to be the arrays gotten by removing row \(T\) at the top of \(A\) and column \(L\) to its left, row \(T\) at the top and column \(R\) on the right, row \(B\) on the bottom and column \(L\) on the left, \(B\) on the bottom and \(R\) on the right, and finally, \(T\) at the top, \(B\) at the bottom, \(L\) on the left and \(R\) on the right.

If \(A\) is an \(n+2\) by \(n+2\) array then the next four are \(n+1\) by \(n+1\) arrays and the last is an \(n\) by \(n\) array.

Then the following equation holds:

\[Det(A_{TBLR}) \times Det(A) = Det(A_{TL}) \times Det(A_{BR}) - Det(A_{TR}) \times Det(A_{BL})\]

**What this means is that the determinant of the array \(A\) can be written as a difference of products of determinants of arrays one size smaller, divided by the determinant of an array two sizes smaller \((A_{TBLR})\).**
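You can convince yourself of the equation on a small case before trusting it. The Python sketch below checks it for one three by three array of our own choosing; the helpers `without` and `det` are hypothetical names, and `det` only handles arrays up to three by three, using the \(ad - bc\) formula and the diagonal rule.

```python
def without(m, rows, cols):
    """Copy of array m with the listed row and column positions removed."""
    return [[x for j, x in enumerate(row) if j not in cols]
            for i, row in enumerate(m) if i not in rows]

def det(m):
    """Determinant of an array of size 0, 1, 2 or 3."""
    n = len(m)
    if n == 0:
        return 1  # convention: an empty array has determinant 1
    if n == 1:
        return m[0][0]
    if n == 2:
        return m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if n == 3:
        down = sum(m[0][k] * m[1][(k + 1) % 3] * m[2][(k + 2) % 3] for k in range(3))
        up = sum(m[2][k] * m[1][(k + 1) % 3] * m[0][(k + 2) % 3] for k in range(3))
        return down - up
    raise ValueError("this sketch only handles arrays up to 3 by 3")

A = [[2, 0, 1], [1, 3, 2], [1, 1, 4]]
T, L = 0, 0                    # positions of the top row and left column
B, R = len(A) - 1, len(A) - 1  # positions of the bottom row and right column

left_side = det(without(A, {T, B}, {L, R})) * det(A)
right_side = (det(without(A, {T}, {L})) * det(without(A, {B}, {R}))
              - det(without(A, {T}, {R})) * det(without(A, {B}, {L})))
print(left_side, right_side)  # 54 54
```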

**And what good is all this?**

We define the determinant of any \(0\) by \(0\) array to be \(1\), and a one by one array, which consists of a number, is its own determinant. Our two by two block consists of four one by one arrays, (\(a, b, c\) and \(d\)) in each of its corners \( \begin{pmatrix} a & b \\ c & d \end{pmatrix} \)

Applying this theorem to this case, \(A_{TL}\) is \(d\), \(A_{TR}\) is \(c\), \(A_{BR}\) is \(a\) and \(A_{BL}\) is \(b\). \(A_{TBLR}\) is nothing at all, which we define to have determinant \(1\).

So Lewis Carroll’s theorem for \(2\) by \(2\) arrays says that the determinant of **A, which is \(ad-bc\)**, can be written as the same thing divided by 1. This is something we have already observed.

Now suppose we started with a three by three array:

\[ \begin{pmatrix} a & b & e \\ c & d & f \\ g & h & i \\ \end{pmatrix} \]

This array has four two by two sub-arrays with adjacent rows and columns, one meeting each corner and these are \(A_{TL}\), \(A_{TR}\), \(A_{BR}\) and \(A_{BL}\). The theorem tells us that the determinant of the array is given by the product of the determinants of the top left and bottom right of these arrays, minus the product of the determinants of the top right and bottom left of them, all divided by the middle element which is \(d\). Here, starting from the top left and going clockwise, are \(A_{BR}\), \(A_{BL}\), \(A_{TL}\), and \(A_{TR}\).

\[ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} b & e \\ d & f \end{pmatrix} \] \[ \begin{pmatrix} c & d \\ g & h \end{pmatrix} \begin{pmatrix} d & f \\ h & i \end{pmatrix} \]

Computing the determinant of each of these, as done above, and recording it at the position of that sub-array's upper left corner, gives us a two by two array whose elements are the two by two determinants of these four arrays. The first great thing here is that all four can be computed merely by copying the computation for the first one, one to the right and then both down.

In this case \(A_{TBLR}\) is the middle element of the original array, \(d\), while \(A_{BR}\), \(A_{BL}\), \(A_{TL}\) and \(A_{TR}\) are the four \(2\) by \(2\) arrays immediately above, in clockwise order starting with the one on the upper left.

If we write these four resulting \(2\) by \(2\) determinants as

\[ \begin{vmatrix} d_{11} & d_{12} \\ d_{21} & d_{22} \end{vmatrix} \]

We can compute the \(3\) by \(3\) determinant by Lewis Carroll’s method to be \(\frac{d_{11}d_{22}-d_{12}d_{21}}{d}\).

What is even better, we can set things up so that we can compute the two by two determinant of this array divided by \(d\), using the same instruction used to compute these \(d_{ij}\)’s only copied one place to the right and down four places. By Lewis Carroll’s Theorem the result is the three by three determinant of our original three by three array.

What is even better than that, if we had started with a larger initial square array, say \(n\) by \(n\), copying appropriately first gives rise to an \(n-1\) by \(n-1\) array of \(2\) by \(2\) determinants, more copying produces from this an \(n-2\) by \(n-2\) array of \(3\) by \(3\) determinants, and so on until we get a single \(1\) by \(1\) array whose entry is the \(n\) by \(n\) determinant of the original array. (We also produce rows of junk between these as well.)
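The shrinking sequence of arrays just described can be sketched in Python as well. This is an illustration of the idea rather than of the spreadsheet layout: the name `condense` is ours, exact fractions are used, and no attempt is made to rescue a division by \(0\), which simply raises an error.

```python
from fractions import Fraction

def condense(array):
    """Determinant of a square array by repeatedly taking 2 by 2 determinants.

    Raises ZeroDivisionError if one of the divisors along the way is 0.
    """
    current = [[Fraction(x) for x in row] for row in array]
    n = len(current)
    if n == 0:
        return Fraction(1)  # convention for the 0 by 0 array
    # Divisors for the first pass: determinants of 0 by 0 arrays, all 1.
    divisors = [[Fraction(1)] * (n - 1) for _ in range(n - 1)]
    while len(current) > 1:
        size = len(current) - 1
        condensed = [
            [(current[i][j] * current[i + 1][j + 1]
              - current[i][j + 1] * current[i + 1][j]) / divisors[i][j]
             for j in range(size)]
            for i in range(size)
        ]
        # The interior of the array just condensed supplies the next divisors.
        divisors = [row[1:-1] for row in current[1:-1]]
        current = condensed
    return current[0][0]

print(condense([[3, 4], [5, 7]]))                   # 1
print(condense([[2, 0, 1], [1, 3, 2], [1, 1, 4]]))  # 18
print(condense([[1, 2, 0, 1], [3, 1, 2, 0], [0, 2, 1, 3], [1, 0, 2, 2]]))  # -9
```

Each pass through the loop produces an array one size smaller, and the array from two passes back provides the divisors, exactly as in the theorem.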

We will soon see what the one instruction you have to enter is, and how this can be set up.

There are two other good things about this. First, once it is set up for any size, you can change your initial array any way you want, and the spreadsheet will immediately give you the determinant of the new array.

Second, using Cramer’s rule, you can use the same approach to instantly solve systems of \(n\) by \(n\) linear equations, again with the ability to change the equations arbitrarily and getting the solutions immediately. (We will discuss this later, perhaps.)

I am not really sure how large you can make \(n\) and have this actually work. But \(n=5\) or \(n=6\) are definitely OK.

There is one problem however. This process involves a division. To handle three by three arrays we divided by the middle element of the array, which was \(d\) above. To handle four by fours, along the way we divide by each of the four middle elements of the original array, and then by the determinant of the array that these four elements form, any one of which might be \(0\). And dividing by \(0\) is a no no. This problem is easily handled as we shall see.

**Ok, how can you do all this?**

We will illustrate this by setting up a determinant finder for four by four arrays.

Suppose we have a four by four array whose determinant we want. We locate it in positions A6 B6 C6 D6 to A9 B9 C9 D9.

Then put \(1\) in position B3 and fill down from B3 to B5 and across to D3 to D5, putting \(1\)'s in all these places. These lines are here to represent the determinants of each trivial \(0\) by \(0\) array, which we will divide by.

Now here is the key step. **Put =(A6*B7-A7*B6)/B3 into A10.** Copy this into columns B and C, the result in C will be =(C6*D7-C7*D6)/D3. And then copy row 10 down to row 18.

**What do you get?**

A10 B10 C10 contain the first row of the array of \(2\) by \(2\) determinants of the original array.

A11 B11 C11 contain the second row of same.

A12 B12 C12 contain the third row of same.

Row 13 is junk.

A14 B14 contain the first row of the \(2\) by \(2\) array of \(3\) by \(3\) determinants of the original array.

A15 B15 contain the second row of same (the rest of rows 14 and 15 are junk).

Rows 16 and 17 are both junk.

**A18 contains the four by four determinant of the original array, unless along the way you divided by 0 (all other entries in that row are junk).**

**If any of the four central entries of the array (B7, C7, B8 and C8 in the layout above) is \(0\), or the determinant of the two by two array they form is \(0\), this spreadsheet will divide by \(0\), and you will get no evaluation. To fix this just replace the original entries that are \(0\) by numbers that are tiny and positive, (like \(10^{-10}\) when the entries are integers). If you also see to it that B7*C8 - C7*B8 is not \(0\), A18 will be the determinant of your \(4\) by \(4\) array.**

**Once the four central array entries are not zero, if you have B7*C8 - C7*B8 = 0, adding almost anything even smaller to any one of these four entries will make that combination non-zero and so A18 will be the determinant. Zeroes can be removed easily in a manner that cannot be seen in the answer, unless the determinant is actually \(0\).**

**Caution: if there is any dividing by \(0\) make all changes to eliminate same by changing the original array. Never make changes at later stages of the computation!**
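If you have no spreadsheet at hand, the layout can be imitated in Python to see that the one instruction really does its job. The sketch below is our own imitation, not a spreadsheet feature: `spreadsheet_cells` is a hypothetical name, blank cells are taken to be \(0\), `None` stands in for a spreadsheet error value, and exact fractions replace the spreadsheet's rounded arithmetic. The example arrays are made up.

```python
from fractions import Fraction

COLUMNS = "ABCD"

def spreadsheet_cells(array):
    """Fill B3:D5 with 1's, A6:D9 with the array, and A10:C18 with the formula."""
    cells = {}
    for row in (3, 4, 5):
        for col in "BCD":
            cells[(col, row)] = Fraction(1)
    for i, values in enumerate(array):
        for j, value in enumerate(values):
            cells[(COLUMNS[j], 6 + i)] = Fraction(value)

    def cell(col, row):
        return cells.get((col, row), Fraction(0))  # blank cells count as 0

    for row in range(10, 19):
        for j in range(3):
            here, right = COLUMNS[j], COLUMNS[j + 1]
            # In row 10 of column A these are A6, B7, A7, B6 and the divisor B3.
            operands = [cell(here, row - 4), cell(right, row - 3),
                        cell(here, row - 3), cell(right, row - 4),
                        cell(right, row - 7)]
            if None in operands or operands[4] == 0:
                cells[(here, row)] = None  # error value, as after dividing by 0
            else:
                p, q, r, s, divisor = operands
                cells[(here, row)] = (p * q - r * s) / divisor
    return cells

example = [[1, 2, 0, 1], [3, 1, 2, 0], [0, 2, 1, 3], [1, 0, 2, 2]]
print(spreadsheet_cells(example)[("A", 18)])    # -9

# The same array with a 0 in the central position B7: no evaluation.
with_zero = [[1, 2, 0, 1], [3, 0, 2, 0], [0, 2, 1, 3], [1, 0, 2, 2]]
print(spreadsheet_cells(with_zero)[("A", 18)])  # None

# Replace that 0 in the original array by a tiny positive number.
nudged = [row[:] for row in with_zero]
nudged[1][1] = Fraction(1, 10**10)
print(float(spreadsheet_cells(nudged)[("A", 18)]))
```

The determinant of the array with the \(0\) in it is \(-4\), and the last line should print a number that differs from \(-4\) only far to the right of the decimal point. In a real spreadsheet the arithmetic is rounded, so dividing by a tiny number can cost some digits of accuracy that the exact fractions here do not lose.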

Here is what the spreadsheet output might look like. Entries on lines 6 to 9 can be modified and the determinant in A18 will be automatically calculated. If a non-number is used, the border of the cell will turn red and calculation will be aborted. Blue entries have to be non-zero.

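| | A | B | C | D |
|---|---|---|---|---|
| 1 | Determinant calculation | | | |
| 2 | | | | |
| 3 | | 1 | 1 | 1 |
| 4 | | 1 | 1 | 1 |
| 5 | | 1 | 1 | 1 |
| 6 | | | | |
| 7 | | | | |
| 8 | | | | |
| 9 | | | | |
| 10 | | | | |
| 11 | | | | |
| 12 | | | | |
| 13 | | | | |
| 14 | | | | |
| 15 | | | | |
| 16 | | | | |
| 17 | | | | |
| 18 | | | | |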

**Exercises:**

**15.7 Set the same thing up for evaluating \(5\) by \(5\) determinants. (The differences are that the array of \(1\)’s at the top has to be \(4\) by \(4\) and your input data will be \(5\) by \(5\). You divide by the \(1\) at the top left for the upper left \(2\) by \(2\) determinant, and copy that into the first \(4\) columns.) If the top left \(1\) is in B3, where will the \(5\) by \(5\) determinant be?**

Cramer’s rule gives a formula for the solutions for the variables in a set of linear equations. It says that the solution for a variable \(x\) is the determinant of the array gotten by substituting the constants in the equations for the column containing the coefficients of \(x\), divided by the determinant of the original array of coefficients.

Each of these determinants can be computed along with the determinant by this method, if you add the column of constants after the array of coefficients, and copy the array of coefficients after them. The arrays starting in the second column will be those involved in the numerators of Cramer’s rule, except for sign. This means that the solution to the linear equations will be appropriate ratios of the entries in the row containing the determinant, with appropriate signs. (The rows of \(1\)’s must be extended to the right and you must copy further to the right, as well.)
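For comparison, here is Cramer's rule written out plainly in Python. This sketch does not use the spreadsheet trick of appending columns; it just substitutes the constants into each column in turn. The names `det` and `cramer` are ours, the determinant is found by row reduction in exact fractions, and the equations in the example are made up.

```python
from fractions import Fraction

def det(rows):
    """Determinant by row reduction in exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    sign = 1
    for col in range(len(m)):
        pivot = next((r for r in range(col, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            sign = -sign  # a row interchange flips the sign
        for r in range(col + 1, len(m)):
            factor = m[r][col] / m[col][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    result = Fraction(sign)
    for i in range(len(m)):
        result *= m[i][i]
    return result

def cramer(coefficients, constants):
    """Solve the linear equations by ratios of determinants."""
    d = det(coefficients)
    if d == 0:
        raise ValueError("coefficient determinant is 0; Cramer's rule does not apply")
    solution = []
    for k in range(len(constants)):
        # Put the constants in place of the column of coefficients of variable k.
        substituted = [list(row[:k]) + [constants[i]] + list(row[k + 1:])
                       for i, row in enumerate(coefficients)]
        solution.append(det(substituted) / d)
    return solution

# 3x + 4y = 10 and 5x + 7y = 17
print(cramer([[3, 4], [5, 7]], [10, 17]))  # [Fraction(2, 1), Fraction(1, 1)]
```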

**What do I do if this process causes a division by \(0\)?**

This will happen only if one of the four central entries of the \(4\) by \(4\) array (written here as A22, A23, A32 and A33, by row and column number within the array) is \(0\), or if the determinant of the array with rows (A22 A23) and (A32 A33) is \(0\). You can check for this before you start and if any of these things occur add something tiny like \(10^{-10}\) to each \(0\); if none are \(0\) add \(10^{-10}\) to any one of them; if all are \(0\)’s add \(10^{-10}\) to three of them and \(-10^{-10}\) to the other. If one row or column of this \(2\) by \(2\) array is \(0 \, 0\) and the other row or column entries are identical, then add to one \(0\) and subtract the same thing from the other. This will always cause the procedure to avoid dividing by \(0\).

In this Chapter we discuss the concept of the limit of an infinite sequence, and other things. These are here to whet your appetite for these subjects, so read them for fun only.

Mathematicians have, especially since the 19th century, wanted to make the subject of calculus rigorous, which means completely logically defined. We have called a function differentiable at a point if its graph "looks like a straight line" at that point. This is perhaps intuitive, but is certainly more intuitive than rigorous.

To introduce rigor we define the notion of the **limit** of a sequence.

An infinite sequence of numbers is said to **converge**, if for any criterion, (say a zillionth, whatever that means), beyond some point in the sequence, the difference between any two entries is less than that criterion.

The sequence \(\{\frac{1}{n}\}\) converges because beyond the zillionth term the difference between any two entries is less than a zillionth.

A sequence of numbers in a given set **converges to a limit \(x\)** if it converges and, for any criterion, the difference between any entry and \(x\) is less than that criterion beyond some point.

The sequence \(\{\frac{1}{n}\}\) converges to \(0\).

We say that a function \(f\) is **continuous** at argument \(A\) if **its values on any sequence of numbers that converge to \(A\), converge to its value at \(A\). We write this as \(\lim_{x \to A} f(x) = f(A)\)**.

We say that \(f\) is **differentiable at \(A\) if for any sequence of numbers \(\{d_j\}\), none of them \(0\), that converges to \(0\), the limit as \(d_j\) approaches \(0\) of \(\frac{f(A+d_j)-f(A)}{d_j}\), i.e. \(\lim_{d_j \to 0} \frac{f(A+d_j)-f(A)}{d_j}\), exists, and we call that limit the derivative of \(f\) at \(A\)**, which we have written as both \(f'(A)\) and \(\frac{df(A)}{dA}\).
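To get a feel for this definition you can watch the quotients settle down for one particular sequence. The Python sketch below uses \(d_j = 2^{-j}\) and the sine function at \(A = 1\), both our own choices. One sequence proves nothing, since the definition asks about every sequence, but it shows what the limit looks like numerically.

```python
import math

def difference_quotients(f, A, steps):
    """Values of (f(A + d_j) - f(A)) / d_j for d_j = 2**(-j), j = 1, ..., steps."""
    return [(f(A + 2.0 ** -j) - f(A)) / 2.0 ** -j for j in range(1, steps + 1)]

quotients = difference_quotients(math.sin, 1.0, 20)
for j in (1, 5, 10, 20):
    print(j, quotients[j - 1])
print("cos(1) =", math.cos(1.0))
```

The printed quotients should creep toward \(\cos 1\), the derivative of the sine at \(1\).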

There are a lot of wonderful concepts and definitions that can be made once limits are defined.

A set \(S\) of numbers is said to be **complete, if every convergent sequence of elements of \(S\) converges to a number in \(S\).**

The **limit points of a set \(S\) are those numbers that are limits of sequences of members of that set.**

A set is **closed** if it contains all its limit points.

Notice that \(0\), by definition is not a positive number, so that there are sequences of positive numbers that do not converge to a positive number, because they converge to \(0\). Thus the **positive numbers are not closed.**

Remember that **rational numbers are those whose decimal expansions, beyond some point, endlessly repeat some sequence of digits.**

Consider a number which, after the decimal point, starts with \(1\) and has a sequence of zeroes and \(1\)'s that has exactly \(j\) consecutive zeroes after the \(j^{th}\) occurrence of \(1\). This number does not repeat after some point, and so it is not rational. But it is the limit of the sequence gotten by replacing all its entries beyond the \(n^{th}\) by zeros (as \(n\) goes to \(+\infty\)), all of which are rational, each ending in repeating \(0\)'s.
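The digits of this number, and its rational truncations, are easy to generate. In the Python sketch below the names `digits` and `truncation` are ours; each truncation is an exact fraction, which is one way to see that it is rational.

```python
from fractions import Fraction

def digits(count):
    """First `count` digits after the decimal point: a 1, then j zeroes, for j = 1, 2, ..."""
    out = []
    j = 1
    while len(out) < count:
        out.append(1)
        out.extend([0] * j)
        j += 1
    return out[:count]

def truncation(n):
    """The rational number gotten by replacing all digits beyond the n-th by zeros."""
    return sum(Fraction(d, 10 ** (k + 1)) for k, d in enumerate(digits(n)))

print(digits(10))  # [1, 0, 1, 0, 0, 1, 0, 0, 0, 1]
for n in (3, 6, 10):
    print(n, truncation(n))
```

Any two truncations beyond the \(n^{th}\) differ by less than \(10^{-n}\), which is why the sequence of truncations converges.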

Thus, **the rational numbers are not closed.**

A **boundary point of a set \(S\) of real numbers is one that is a limit point both of \(S\) and the set of real numbers not in \(S\).** Thus, if \(S\) is the interval of points between \(a\) and \(b\) including the endpoints \(a\) and \(b\), then \(a\) and \(b\) are its boundary points. This \(S\) is closed, because it contains all of its limit points.

An **open set** is one that **contains no boundary points.** The interval of points between \(a\) and \(b\) not including its endpoints is open. If an interval is defined to contain only one of its endpoints, it is neither open nor closed.

**Closed sets are complements of open sets.** Since closed sets contain all their boundary points, their complements, which contain all points of the space not in them, contain none of them.

**What infinite sequences do not converge?**

Non-convergence always occurs when a sequence of real numbers is unbounded: for example, the sequence

\(1,2,\dots n, \dots\), does not converge.

Also, bounded sequences that have more than one limit point do not converge. For example

\(1,0,1,0,1,0,\dots\) has limit points \(1\) and \(0\), and does not converge as the difference between consecutive terms is always \(1\), and never gets below \(1\).

**A set in which every sequence of its elements has at least one limit point inside it is said to be sequentially compact.** To be sequentially compact a set \(S\) must be closed, or else, by definition, there is a convergent sequence of its elements that does not converge to a member of \(S\). \(S\) must be bounded or else there is a sequence that grows indefinitely with no finite limit point. (For example, choose as \((n+1)^{st}\) member of the sequence the smallest of its elements that is at least one greater than its \(n^{th}\) member.)

On the other hand, **if a set \(S\) of real numbers is closed and bounded, every sequence of elements of \(S\) has at least one limit point in \(S\).**

This statement follows from two observations, which we will prove. **First, if \(S\) is bounded, a sequence of its elements whose members increase, so that the \((n+1)^{st}\) member is at least as big as the \(n^{th}\), must converge.**

If \(S\) is closed, an increasing sequence must converge to the least upper bound of its members, which will be an element of \(S\) and that will be a limit point. And the same is true for a decreasing sequence.

Second, **every infinite sequence must contain either an infinite subsequence that is increasing, or one that is decreasing or both.**

**Together, these statements mean that any infinite sequence of real numbers that is bounded and lies in a closed set has a convergent subsequence and hence a limit point.**

We prove the first: **The least upper bound of a set \(Q\) of numbers is the smallest number that is at least as big as every element of \(Q\).** If \(Q\) consists of the members of an increasing sequence, this least upper bound must be a limit point of the sequence. It certainly cannot be less than any member of the sequence. And if it is a zillionth greater than all members, it is not their least upper bound. This proves the first observation.

There is an elegant proof of the second observation, gotten by considering a finite sequence of numbers, say of length \(N\). We show that every such sequence has either an increasing or decreasing subsequence of length at least the square root of \(N\). Since the square root of an infinite number is still infinite, this result tells us that any infinite sequence must have an infinite increasing or decreasing subsequence either of which must have a limit point.

To show this, starting at the beginning of the sequence, keep track of the lengths of the longest increasing and longest decreasing subsequences which end at each member. The first such pair will be \((1,1)\), then \((2,1)\) or \((1,2)\) depending on whether the second member is bigger or smaller than the first (if it is the same as the first we would get \((2,2)\)).

The wonderful fact is that no two members can have the same pair of numbers. If say, some member, \(m\) has pair \((i,j)\) then any subsequent member, \(q\), will get at least pair \((i+1,j)\) or \((i,j+1)\) because there will be an increasing sequence gotten by adding \(q\) to the old increasing sequence ending at \(m\) if \(q\) is bigger than \(m\), and a decreasing sequence gotten by adding \(q\) similarly if \(q\) is less than \(m\).

Our claim follows from the fact that the number of distinct ordered pairs of positive integers both less than \(n\) is \((n-1)^2\). Another way to express this fact is that there must be a "monotone" subsequence of length at least \(n\) among the first \((n-1)^2 + 1\) members of any sequence.
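The bookkeeping in this argument can be tried out on any finite sequence. The Python sketch below is ours: `monotone_pairs` is a hypothetical name, and equal members are allowed to extend both kinds of subsequence, which matches the \((2,2)\) case mentioned above. The sample sequence is arbitrary.

```python
def monotone_pairs(sequence):
    """For each member, the lengths of the longest increasing and longest
    decreasing subsequences ending there (equal members extend both)."""
    pairs = []
    for k, value in enumerate(sequence):
        up = down = 1
        for earlier in range(k):
            if sequence[earlier] <= value:
                up = max(up, pairs[earlier][0] + 1)
            if sequence[earlier] >= value:
                down = max(down, pairs[earlier][1] + 1)
        pairs.append((up, down))
    return pairs

sample = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
pairs = monotone_pairs(sample)
print(pairs)
print("all pairs distinct:", len(set(pairs)) == len(pairs))
longest = max(max(pair) for pair in pairs)
print("longest monotone subsequence:", longest, "for", len(sample), "members")
```

Since the pairs are all different, the square of the longest length found can never be smaller than the number of members.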

The two claims together tell us that **any bounded closed set of real numbers is sequentially compact.**

**Exercise: Similar results hold for sets of ordered pairs, (or ordered \(n\) tuples) of real numbers, which correspond to points in a two dimensional space (or \(n\) dimensional space). Generalize the definitions above to apply to such sets, and prove that any bounded closed set of such pairs is sequentially compact.**

A set of real numbers \(S\) is said to be **covered by a collection \(O\) of open sets**, when **every element of \(S\) is contained in at least one member of \(O\).** (The members of \(O\) can contain numbers outside of \(S\) as well as those in \(S\).)

\(S\) is said to be **compact**, if, **for every covering \(O\) of \(S\) by open sets, \(S\) is covered by some finite set of members of \(O\).** A significant fact about a covering by open intervals is: **if a point \(x\) lies in an open set \(Q\) it lies in an open interval in \(Q\) and is a positive distance from the boundary points of that interval.**

We will now prove, just for fun, that **a bounded closed set of real numbers is compact.** The argument does not depend on how distance is defined between real numbers as long as it makes sense as a distance.

Open sets of real numbers are each unions of disjoint open intervals on the real line. We can consider a covering \(O\) of \(S\) by open sets to be a covering by their open intervals. Each open set in the covering containing the number \(x\) has an open interval in it containing \(x\). Thus a covering of \(S\) by open sets is actually a covering by open intervals as well.

Any interval in the covering of \(S\) by \(O\) that is contained in the union of other intervals in \(O\), is redundant in \(O\) and can be removed from it and the rest of \(O\) will still be a cover.

Every closed set of real numbers is a collection of disjoint closed intervals. For example, the collection \(S\) of intervals \([\frac{1}{2n+1}, \frac{1}{2n}]\) and \([\frac{-1}{2n}, \frac{-1}{2n+1}]\) for all positive \(n\) and the number \(0\), is a closed set. The minimum number in \(S\) is \(-\frac{1}{2}\) and the maximum is \(\frac{1}{2}\). Here the number \(0\) must be in it because it is a limit point of sequences of other of its numbers.

We can provide an explicit cover for \(S\) by the following infinite collection of open intervals:

\[\left(\frac{1}{2n+1} - \frac{1}{(2n+1)^2},\ \frac{1}{2n} + \frac{1}{(2n)^2}\right)\] \[\left(-\frac{1}{2n} - \frac{1}{(2n)^2},\ -\frac{1}{2n+1} + \frac{1}{(2n+1)^2}\right)\] \[\text{and} \medspace (-a,a) \medspace \text{for some (small)} \medspace a.\]

To prove this claim we make use of several facts: first, *if a sequence of numbers in \(S\) is infinite, it must have at least one limit point \(s\) in \(S\) since \(S\) is closed and bounded.* Second, *if a number \(x\) is covered by an open set \(U\), then \(U\) contains numbers both smaller and larger than \(x\).* Finally, *an open set containing a limit point of a sequence of distinct numbers must contain an infinite number of those numbers.*

We can prove this result by actually constructing a finite set of open intervals that cover \(S\). To do so we set \(A_0\) to be the smallest number in \(S\), and let \(A_{j+1}\) be the smallest number in \(S\) greater than \(A_j\) that is in no member of \(O\) that contains \(A_j\).

\(A_{j+1}\) can be a lower boundary point of \(S\) (as will happen for all but one \(j\) encountered in the example above,) or it can be in the middle of or on the upper boundary of a closed interval of \(S\).

We will define an open interval \(O_j\) containing \(A_j\) inductively starting with the largest value of \(A_{j+1}\) as follows:

If \(A_{j+1}\) is the smallest number in its interval in \(S\) then let \(B_{j}\) be the largest number in \(S\) smaller than \(A_{j+1}\), and let \(O_j\) be any open interval in \(O\) containing both \(A_j\) and \(B_j\).

Otherwise, let \(B_j\) be any number in \(O_{j+1}\) that is smaller than (and not equal to) \(A_{j+1}\). By definition of \(A_{j+1}\) there is an open interval in \(O\) containing both \(A_j\) and \(B_j\), and let any such open interval be \(O_j\).

By construction, \(O_j\) contains only one of the \(A\)'s, namely \(A_j\). Thus it can contain no limit point of \(A\)'s by our third fact above. This means that the \(A\)’s, and \(O\)'s are finite in number. Also the \(O\)'s cover \(S\). Thus the \(O\)'s are a finite covering of \(S\) by open intervals.

In the example above, the \(A\)'s are numbers of the form \(-\frac{1}{2n}\) from \(n=1\) until a value for \(n\) for which \(2n\) is approximately \(\frac{1}{a}\), with a similar number of positive \(A\)'s, for a total of roughly \(\frac{1}{a}\) \(A\)'s altogether.
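You can count the intervals in a finite cover of the example directly. The sketch below is only an illustration under stated assumptions: it takes the cover to consist of \((-a,a)\) together with, for each \(n\), an open interval that overhangs \([\frac{1}{2n+1}, \frac{1}{2n}]\) by \(\frac{1}{(2n+1)^2}\) on the left and \(\frac{1}{(2n)^2}\) on the right, plus its mirror image. The names `finite_subcover` and `is_covered` are ours. It keeps \((-a,a)\) and only those other intervals whose closed interval is not already inside \((-a,a)\).

```python
from fractions import Fraction


def finite_subcover(a):
    """Return a finite list of open intervals (lo, hi) covering the example set."""
    a = Fraction(a)
    cover = [(-a, a)]
    n = 1
    # Closed intervals with 1/(2n) < a lie entirely inside (-a, a).
    while Fraction(1, 2 * n) >= a:
        lo = Fraction(1, 2 * n + 1) - Fraction(1, (2 * n + 1) ** 2)
        hi = Fraction(1, 2 * n) + Fraction(1, (2 * n) ** 2)
        cover.append((lo, hi))    # covers [1/(2n+1), 1/(2n)]
        cover.append((-hi, -lo))  # covers the mirror image
        n += 1
    return cover


def is_covered(x, cover):
    """True when x lies strictly inside at least one interval of the cover."""
    return any(lo < x < hi for lo, hi in cover)


cover = finite_subcover(Fraction(1, 100))
print(len(cover))  # number of intervals used, to compare with 1/a = 100
```

Exact fractions are used so that the endpoint comparisons are not blurred by rounding.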

**Exercises:
1. Prove the claim above that the \(O\)'s defined cover \(S\).
2. Show that the finite set of open intervals chosen from the members of \(O\) by the construction above contains the fewest open intervals possible in a cover of \(S\) by open intervals.**

We provide the example and construction above to give you intuition about what this result means. The usual simple proof involves breaking any closed set \(S\) in half, choosing a number in \(S\), and repeating these actions on a half of \(S\) which requires an infinite number of members of \(O\) to be covered. At each stage the size of the new \(S\) is half of that of the old \(S\), and at least one of the halves must need an infinite number of members of \(O\) to be covered if \(S\) does. The sequence of numbers chosen must, if infinite, have a limit point \(x\), and with \((a,b)\) any open interval in \(O\) covering \(x\), \((a,b)\) will contain every piece of \(S\) that contains \(x\) and has length less than the smaller of \(b-x\) and \(x-a\). The pieces shrink below that length at some stage, and all subsequent pieces, along with the members of our sequence chosen from them, will be inside \((a,b)\) and covered by it. (This is exactly what happens at \(0\) in the example.) This means that the later \(S\)'s require only one member of \(O\) to be covered. And one is very finite. An advantage of this proof is that it works as well for an \(n\) dimensional space whose elements are \(n\)-tuples of real numbers, exactly as it works for real numbers. (This argument is repeated in detail below.)

We have been discussing these various concepts in the context of real numbers, but they can also be defined in many other contexts. The definition of limit requires a definition of distance, but given such a definition, the concepts of closed, open, sequentially compact, complete and compact are also defined. A set of points in which a distance between any pair of them is defined is said to be a metric space.

The concepts open, closed and compact mentioned here can also be defined when there is no metric, by specification of which subsets of the whole set are open.

In a metric space \(S\), for any distance \(d\) we define the \(d\) neighborhood of a point \(x\) to consist of all elements of \(S\) whose distance from \(x\) is strictly less than \(d\). Any open set \(O\) in \(S\) is a union of neighborhoods of its elements each intersected with \(O\).

Suppose a closed bounded subset \(C\) of a finite dimensional space \(S\) is covered by open sets. Then it is covered by neighborhoods contained in those sets as well. We will argue that it must be covered by a finite collection of those neighborhoods, hence by a finite number of those open sets.

If \(S\) is \(n\) dimensional, we can cut \(C\) into a finite size grid of \(n\)-cubes (by which we mean sets of points whose length in any direction is at most some constant), and if \(O\) is necessarily infinite the number of members of \(O\) that are needed to cover at least one of the \(n\)-cubes must be infinite as well. We choose a point in any such \(n\)-cube, reduce attention to that \(n\)-cube and repeat these cut and choose steps. The resulting sequence of elements must converge to some point \(x\), whose cover must still require an infinite number of neighborhoods. But a single point can be covered by one neighborhood, which, by this argument, tells us that an infinite number of neighborhoods never were required.

An argument like that proves that a closed and bounded set in \(S\) is compact for any finite dimensional space defined over the real numbers.

When there is no metric strange things can happen. Suppose we have the integers, or rational numbers or real numbers, (with no definition of distance among them) and the closed sets consist of all finite sets. That means the open sets are all sets of elements that lack only a finite number of elements.

In any such space of points and definition of open sets, all sets are compact!

Given any set \(Q\), and any cover of \(Q\) by open sets, and any open set \(O\) in that cover, \(O\) can miss only a closed set, which means a finite set, of say \(d\) elements of \(Q\). These can be covered by at most \(d\) open sets in the cover of \(Q\), which means that there is a cover of \(Q\) by at most \(d+1\) open sets that are in the original cover.

Thus compact sets need not, in general, be closed or bounded with these definitions.

A definition of open sets in a set of points is called a topology.

The subject considered above, called point set topology, was studied extensively in the \(19^{th}\) century in an effort to make calculus rigorous. It contains many interesting results of which what is above is a tiny and random sample.

The concept of limit allows calculus to be discussed without reference to scale. That is, if we are interested in a function \(f(x)\), then we can change \(x\) to \(y\) with \(y=cx\), and the resulting function of \(y\) has the same differentiability properties as \(f\), only with a different derivative.

In the real world, we can use calculus in contexts in which this is not so. Here are some examples. To a scale appropriate to discussing the solar system, the earth has a smooth and differentiable surface, and is roughly spherical. To the scale of us poor mortals, this is quite untrue: there are mountains, tall buildings, holes in the ground, trees, eaves of buildings, and living creatures, and the surface defined by these at any given instant is not differentiable at all. The top of a kitchen table may look flat to us, and its surface is differentiable, but at an atomic scale it is full of holes and whatnot, and at a subatomic scale we have no idea what it looks like. When we store values of a function on a computer, when those values in actuality are irrational, what is stored differs by seemingly random amounts from the actual values, and differences between data points are meaningless at a scale of the size of those differences.

These facts do not take away from the value of the attempt to make calculus rigorous and scale independent. It would be awkward and annoying to us if we had to describe the scale at which functions are continuous or differentiable. It is fairer to say that the functions we define are differentiable at any scale but the models we use to describe reality agree with these functions only at appropriate scales, which phenomena we can look into when we choose to do so.

They also mean that we can use calculus on functions that represent quantities that "look like a straight line" to the scale we are concerned with, even when they do not do so at an infinitesimal scale. It also justifies our trying to draw conclusions from data by staying as far as we possibly can from the limit of small \(d\), something we did with numerical computations.

The Riemann integral with integrand \(f(x)\) over a given interval is defined as the limit as \(d\) approaches \(0\) of the sum over subintervals of size \(d\), of \(d\) multiplied by the value of \(f\) at an arbitrary point in that subinterval, when that limit exists and is the same for every choice of points in all the intervals. Otherwise the function is said to be not integrable over that interval.
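To see what this definition asks for, here is a minimal sketch in Python. The names `riemann_sum` and `pick` and the test integrand \(x^2\) on \([0,1]\) are our own choices for illustration. The function `pick` decides which point of each subinterval is used, and for an integrable function the answer should not depend on that choice once \(d\) is small.

```python
import random


def riemann_sum(f, a, b, n, pick):
    """Sum of d * f(point) over n subintervals of [a, b], each of size d.

    pick(lo, hi) chooses the sample point inside each subinterval.
    """
    d = (b - a) / n
    total = 0.0
    for i in range(n):
        lo = a + i * d
        total += f(pick(lo, lo + d)) * d
    return total


def square(x):
    return x * x


def left_end(lo, hi):
    return lo


def right_end(lo, hi):
    return hi


def random_point(lo, hi):
    return random.uniform(lo, hi)


for pick in (left_end, right_end, random_point):
    # Each result should be close to 1/3, the integral of x^2 over [0, 1].
    print(pick.__name__, riemann_sum(square, 0.0, 1.0, 10000, pick))
```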

A function that is \(1\) on all rational numbers and \(0\) on all other numbers is obviously not integrable in this sense, since every interval of any non-zero size contains both rational and irrational numbers and hence values for this function of both \(0\) and \(1\). There are many more irrational numbers than rational ones, which suggests the possibility that we could just forget about the rational ones and say that the integral is \(0\). On the other hand, if we perform a numerical computation on a computer, since the computer rounds off every point to a rational one, we would find the value \(1\) for every interval.

There is another way to define the integral of a function for which the function just described is integrable.

Instead of splitting the area computed by the integral by slices parallel to the \(y\) axis, we can split into pieces by slices parallel to the \(x\) axis.

Suppose, for convenience that the integrand function is non-negative and we integrate over a finite interval. Then for each slice we will find that for some points within the domain the slice is below the integrand, for some it is above it, and for some the integrand lies within the slice. As the size of the slices decreases, the contributions from the last of these becomes negligible and the integral will be the sum of the contributions for which the slices are below the integrand.

For a continuous integrand, for each slice the points below it will form some set of intervals on the real line. There are many kinds of integrals as we shall soon notice. In each we define a "measure" to each slice of the point set for which the slice is below the integrand, and the sum of these measures over all slices must converge to the integral.

What constitutes a measure? The main necessary condition is that the sum of the measures of disjoint sets (these are sets having no common point) is the measure of their union (the union being the set of points that are in either one). This must apply to the union of any finite number of mutually disjoint sets: their union must have measure equal to the sum of their measures. Since any point in a countable list of points is in a finite initial segment of the countable list, it makes sense that the measure of a union of any countable number of mutually disjoint sets must be the sum of their measures.

In the case of the usual integral \(\int f(x)dx\), the measure of an interval is its length. A single point is an interval of length \(0\), and has \(0\) measure.

This tells us that the measure of any countable set of points must be a countable sum of \(0\)'s and hence must be \(0\) as well with this measure (and measures like it).

We can conclude that with this measure, that weird function that is \(1\) on rational numbers and \(0\) on the rest of the real numbers is integrable when we slice in this way. The rational numbers, being countable, have measure that is a countable sum of \(0\)'s and hence is \(0\). Thus the set on which the function is \(1\) contributes nothing, and the integral of this weird function is \(0\).

**What other measures are there?**

We have encountered some other measures already; if we deal with \(\int f(x)dy(x)\), which is \(\int f(x)y'(x)dx\), we are integrating \(f(x)\) using a measure defined by \(y'(x)\). For example, if \(y(x)\) is \(x^2\), the measure of an interval is not its length, but the difference of \(x^2\) between its values at the endpoints of the interval.
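A quick numerical check of this statement is sketched below in Python. The names `stieltjes_sum` and `weighted_sum` and the choice \(f(x) = x\), \(y(x) = x^2\) on \([0,1]\) are ours, picked only for illustration. The first sum weights each subinterval by the change in \(y\) across it, the second by \(y'(x)\) times its length, and both should approach \(\int_0^1 x \cdot 2x\,dx = \frac{2}{3}\).

```python
def stieltjes_sum(f, y, a, b, n):
    """Sum of f(x) times the change in y over n subintervals of [a, b]."""
    d = (b - a) / n
    total = 0.0
    for i in range(n):
        lo = a + i * d
        hi = lo + d
        total += f(lo) * (y(hi) - y(lo))  # measure of the piece is y(hi) - y(lo)
    return total


def weighted_sum(f, y_prime, a, b, n):
    """Sum of f(x) * y'(x) * d over n subintervals of [a, b] of size d."""
    d = (b - a) / n
    total = 0.0
    for i in range(n):
        lo = a + i * d
        total += f(lo) * y_prime(lo) * d
    return total


def identity(x):
    return x


def y_squared(x):
    return x * x


def y_squared_prime(x):
    return 2.0 * x


print(stieltjes_sum(identity, y_squared, 0.0, 1.0, 10000))
print(weighted_sum(identity, y_squared_prime, 0.0, 1.0, 10000))
```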

Ordinary sums, for example \(f(x_1) + f(x_2) + f(x_3)\), can also be written as integrals of this kind, which are called Lebesgue integrals, in this case using the measure that is \(1\) on the points \(x_1, x_2\), and \(x_3\), and \(0\) elsewhere.

That sort of integral has been used since the \(19^{th}\) century by physicists, and for some time was frowned upon by mathematicians. Physicists introduced the "delta function" \(\delta(x-x_0)\) which is \(0\) unless \(x\) is \(x_0\), but its integral is \(1\). The integral giving the sum of the previous paragraph can then be written as

\[\int dx f(x)(\delta(x-x_1) + \delta(x-x_2) + \delta(x-x_3))\]

The obvious problem with the delta function \(\delta(x-x_0)\) is that it would have to be infinite when \(x\) is \(x_0\). Fortunately, the consequences of allowing it to have an unmeasurably small width around \(x_0\) as a function of \(x\), in which case it can remain finite, are undetectable for all uses in which, when it is ultimately applied, it is integrated over. And from the point of view of Lebesgue integration, it is just another measure.

If you found any of this stuff fun, learn more about it!

We examine some important physical models whose consequences we already know how to derive.

We observed, back in **Chapter 4** that the rules for differentiating rational functions could all be deduced from one master rule: each occurrence of the variable in the function to be differentiated can be replaced by \(1\), ignoring the others, and the derivative is the sum of the results. This statement represents the fact that the derivative is the slope of the linear approximation to the function and is linear in the variable, and linear contributions can be evaluated one by one and added.

Essentially this same property implies that in constructing models of the behavior of derivatives for real phenomena, the effects on the derivative from different sources can be entered separately, one at a time, ignoring the others, and the total effect will be the sum of these.

Consider now vertical motion of an object. Newton observed that, if an object is left alone, it will continue doing what it was doing, so that its speed will stay constant. How that speed changes, which can be described by its derivative, is then proportional to what he called the "forces" compelling it to change.

Apples fall from trees, and with increasing speed as they fall. He attributed that behavior to the force of gravity, and his model for that force is that objects experience a constant negative pull of gravity toward the earth, while on its surface.

It is obvious that heavier objects require more force to move them. His model therefore was that the weight (mass) \(m\) of the object multiplied by the second derivative of its height \(h(t)\) is given by the force of gravity acting on it. Noticing that objects fall at rates independent of their weight, his model for that force was \(mg\), with \(g\) a universal constant.

His model for falling objects was then

\[mh''(t) = -mg\]

We can solve this equation. The velocity \(h'(t)\) must have derivative \(-g\), which is a constant. The general solution to this is \(h'(t) = h'(t_0) - g(t-t_0)\). This tells us that the derivative of \(h(t)\) is a linear function of \(t\), and that means that \(h(t)\) is a quadratic function:

\[h(t) = h(t_0) + h'(t_0)(t-t_0) - g\frac{(t-t_0)^2}{2}\]

Now let us consider air resistance. Objects have air resistance that depends on their shape and size. For any object there is no air resistance when it is at rest, and so the simplest model for it is that the force of the air on it is linear in its speed and in the opposite direction to it: say \(-ch'(t)\).

The equation for \(h''(t)\) then becomes

\[h''(t) = -g - \frac{c}{m}h'(t)\]

Notice that the right side of this equation is \(0\) when \(h'(t) = -\frac{gm}{c}\). This means that a falling object starting at rest, will fall faster and faster until its downward speed reaches this value, at which time its speed will become constant. Thus the object (imagine a person with a parachute) will achieve this ‘terminal velocity’ instead of falling ever faster until it hits something.

We can solve this equation by defining \(y\) to be \(h'(t) + \frac{gm}{c}\); \(y\) has the same derivative as \(h'(t)\) so that the equation for it is \(y' = -\frac{c}{m}y\). The solution to this equation is \(y(t) = y(t_0)e^{-\frac{c}{m}(t-t_0)}\), which means that a falling object, according to this model, approaches its terminal velocity exponentially fast, with exponent \(-\frac{c}{m}t\).
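The approach to terminal velocity can be checked numerically. The sketch below, in Python, compares the formula just derived with a step by step computation of \(h'(t)\) from the equation \(h'' = -g - \frac{c}{m}h'\). The function names and the sample values \(g = 9.8\), \(c = 1\), \(m = 2\) are our own assumptions for illustration, not data from any experiment.

```python
import math


def velocity_exact(t, v0, g, c, m):
    """h'(t) from y(t) = y(0) e^{-(c/m) t}, where y = h' + g m / c and t_0 = 0."""
    v_term = -g * m / c  # terminal velocity
    return v_term + (v0 - v_term) * math.exp(-c * t / m)


def velocity_stepwise(t, v0, g, c, m, steps):
    """Advance h' in small steps using h'' = -g - (c/m) h'."""
    d = t / steps
    v = v0
    for _ in range(steps):
        v += (-g - (c / m) * v) * d
    return v


g, c, m = 9.8, 1.0, 2.0
for t in (1.0, 5.0, 20.0):
    print(t, velocity_exact(t, 0.0, g, c, m), velocity_stepwise(t, 0.0, g, c, m, 5000))
print("terminal velocity:", -g * m / c)
```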

You will notice that while in the model for \(h''(t)\) the contributions from gravity and air resistance are separate added terms, these contributions mix completely in the solution for \(h'(t)\).

There are more interesting problems of this sort involving the behavior of objects in ordinary three dimensional space.

Newton invented calculus to solve the equations that derived from his models. In particular he applied it to describing the motion of planets which are attracted to one another and to the sun by the force of gravity with each pair separately attracting each other with a force proportional to the masses of both divided by the square of their distance. Again, the forces on any planet are the sum of those from all the others. To a first approximation the attraction from the sun is dominant, and he was able to solve the equations for planetary motion of one planet about the sun. The solutions are that the orbit is an ellipse with the Sun at one of its foci. Calculus with many variables allows formulation and solution of these equations.

A spring is a device that you can expand or contract, but when you do so it tries to get back to its equilibrium position. Suppose an object with weight \(m\) is attached to the end of a spring of negligible weight, and its equilibrium position is at \(x=0\). Then the force that the spring exerts on the object when it is at position \(x\) is \(-kx\), where \(k\) is what is called the "spring constant" of the device.

The equation of motion of the system is then \(mx'' = -kx\), or \(mx'' + kx = 0\).

We know the general solution to this equation, because we can recognize it as the equation satisfied by \(\sin\omega t\) and \(\cos\omega t\) when \(\omega = \left(\frac{k}{m}\right)^{\frac{1}{2}}\). Here the ‘frequency’ of the oscillation is \(\frac{\omega}{2\pi}\), since sines and cosines repeat as a function of their argument with period \(2\pi\). (We are using radians as our angle measure.)

**Exercise 17.1: Differentiate the function below and show that it is the general solution to the frictionless spring equation:**

\[x(t) = x(t_0)\cos(\omega(t-t_0)) + x'(t_0) \frac{\sin(\omega(t-t_0))}{\omega}\]

This general solution can also be written as a sum of exponentials, \(a e^{i \omega t} + be^{-i \omega t}\), for appropriate \(a\) and \(b\).

This solution has the spring oscillating on forever.

In reality there is also friction in the motion which as in the previous section can be modeled by adding a term to the force of the form \(-fx'\).

The equation of motion then becomes

\[mx'' + fx' + kx = 0\]

We can solve this equation by looking for solutions of the form \(e^{zt}\). On substituting this form into the equation for \(x\) we get:

\[mz^2 + fz + k = 0\]

This quadratic equation in \(z\) has solutions \(z = \frac{-f \pm (f^2 - 4km)^{\frac{1}{2}}}{2m}\). The two solutions for \(z\) here take the place of the two exponents \(i\omega\) and \(-i\omega\) that appear in the frictionless problem. The first term in these solutions, \(-\frac{f}{2m}\), produces an exponential damping factor in the solutions of \(e^{-\frac{f}{2m}(t-t_0)}\).

As long as \(f^2\) is less than \(4km\), the second terms in these solutions for \(z\) are imaginary so that they give rise to sinusoidal behavior with reduced frequency compared to \(\omega\). The displacement \(x\) therefore dies off exponentially according to the factor discussed in the previous paragraph, and oscillates at this reduced frequency as it does so.

When \(f^2\) is \(4km\) or greater, the solutions for \(z\) are real numbers. The spring is said to be critically damped when \(f^2\) is exactly \(4km\), and overdamped when \(f^2\) is greater. In either case there is no oscillation at all, only decay in the displacement from equilibrium \(x\), as a function of time \(t\).
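The three cases can be told apart by the sign of \(f^2 - 4km\). Here is a minimal sketch in Python that computes the two values of \(z\) and names the case, using the usual terminology (underdamped, critically damped, overdamped). The function names and the sample values of \(m\), \(f\) and \(k\) are our own, chosen only for illustration.

```python
import cmath


def spring_roots(m, f, k):
    """Both solutions z of m z^2 + f z + k = 0, as complex numbers."""
    root = cmath.sqrt(f * f - 4 * k * m)
    return (-f + root) / (2 * m), (-f - root) / (2 * m)


def damping_regime(m, f, k):
    """Name the case according to the sign of f^2 - 4 k m."""
    disc = f * f - 4 * k * m
    if disc < 0:
        return "underdamped"        # complex z: decaying oscillation
    if disc == 0:
        return "critically damped"  # one repeated real z
    return "overdamped"             # two distinct real z


for m, f, k in [(1, 1, 1), (1, 2, 1), (1, 3, 1)]:
    print((m, f, k), damping_regime(m, f, k), spring_roots(m, f, k))
```

In the underdamped case the real part of each \(z\) is \(-\frac{f}{2m}\), the damping rate, and the imaginary part is the reduced oscillation frequency.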

The models described above are useful, but not terribly exciting. More interesting results can be obtained when we consider a spring subject to external stimuli.

When the object on the spring experiences an external force that is some function \(g(t)\) of time, the model we have been considering becomes

\[mx'' = -kx - fx' + g(t)\]

The forcing can be of any kind. We handle that by looking at the response to forcing that is sinusoidal with any given frequency. We do this for three separate reasons.

First, we can solve the resulting equations, and the solutions have properties that are interesting in their own right.

Second, the very same equations arise in many other contexts, such as in the study of electric circuits, and these properties are very important.

Third, the solutions can be used to solve the general problem. Any stimulus can be written as a sum or integral of sinusoidal functions, and these solutions then can be used to obtain a corresponding sum or integral that describes the solution.

Our model is then described by the equation

\[mx'' + kx + fx' = A\sin\omega t\]

Given any solution to this equation we can add any solution to the equation with \(0\) as the right hand side term, and we will still have a solution. As we saw in the previous section, such solutions decay exponentially in \(t\) as long as \(f\) is non-zero. Solutions to the ‘homogeneous’ equation (with right hand side zero) are called transient solutions because of this decay. Thus we concentrate our attention on steady state solutions, which persist in time because the forcing function persists.

These solutions will have the same frequency and periodicity as the forcing function, and so we look at solutions of the form \(B\sin\omega t + C\cos\omega t\). We find

\[(B(-m\omega^2 + k) - fC\omega - A)\sin\omega t +(C(-m\omega^2 + k) + fB\omega)\cos\omega t = 0\]

From this we deduce that both coefficients here must vanish, which tells us:

\[C = \frac{-B\omega f}{k - m\omega^2}\]

and

\[B = \frac{A}{k - m\omega^2 + \frac{(\omega f)^2}{k-m\omega^2}}\]

which lead to

\[B = \frac{A(k -m\omega^2)}{(k -m\omega^2)^2 + (\omega f)^2}\]

and

\[C = \frac{-\omega fA}{(k-m\omega^2)^2 + (\omega f)^2}\]

The magnitude of the response to the forcing is \((B^2 + C^2)^{\frac{1}{2}}\) and that becomes

\[\frac{A}{\left((k -m\omega^2)^2 + (\omega f)^2\right)^{\frac{1}{2}}}\]

The unforced and undamped spring has its ‘natural frequency’ \(\omega_0\) given by \(\omega_0^2 = \frac{k}{m}\). The magnitude just described can be written in terms of \(\omega_0\) as

\[\frac{A}{\left(m^2(\omega_0^2 - \omega^2)^2 +(\omega f)^2\right)^{\frac{1}{2}}}\]

When \(f\) is reasonably small compared to \(m\), this response exhibits a phenomenon called **resonance**. That is, when \(\omega_0^2 - \omega^2\) is very small, and \(f^2\) is small compared to \(m\), the denominator becomes very small and the response gets very large.
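Resonance is easy to see in numbers. The following Python sketch computes \(B\) and \(C\) from the formulas above, takes the magnitude \((B^2 + C^2)^{\frac{1}{2}}\) directly, and also substitutes \(B\sin\omega t + C\cos\omega t\) back into the equation as a check. The function names and the sample values \(m = 1\), \(k = 4\), \(f = 0.1\), \(A = 1\) are our own assumptions for illustration, and \(f\) is assumed to be non-zero so that the denominator cannot vanish.

```python
import math


def steady_state(m, f, k, A, w):
    """Coefficients B, C of the steady state B sin(wt) + C cos(wt)."""
    s = k - m * w * w
    denom = s * s + (w * f) ** 2
    return A * s / denom, -w * f * A / denom


def amplitude(m, f, k, A, w):
    """Magnitude (B^2 + C^2)^(1/2) of the response."""
    B, C = steady_state(m, f, k, A, w)
    return math.hypot(B, C)


def residual(m, f, k, A, w, t):
    """m x'' + f x' + k x - A sin(wt) for the steady state; should be near 0."""
    B, C = steady_state(m, f, k, A, w)
    x = B * math.sin(w * t) + C * math.cos(w * t)
    x1 = w * (B * math.cos(w * t) - C * math.sin(w * t))
    x2 = -w * w * x
    return m * x2 + f * x1 + k * x - A * math.sin(w * t)


m, f, k, A = 1.0, 0.1, 4.0, 1.0
w0 = math.sqrt(k / m)  # natural frequency
for w in (0.5 * w0, 0.9 * w0, w0, 1.1 * w0, 2.0 * w0):
    print(round(w, 3), amplitude(m, f, k, A, w))
```

With these values the response at \(\omega = \omega_0\) is many times larger than at half or twice that frequency, and making \(f\) smaller makes the peak taller.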

If we use wires to connect simple devices together we can form an electric circuit. Such circuits can be used for all sorts of purposes. Their study took off in the development of radios, a hundred years ago.

One simple circuit consists of four elements as follows: a power source, which produces a difference of potential between its two terminals; a coil, which consists of a winding of wire; a gap in the wire; and perhaps a device that offers resistance to the flow of electrons in it. Electrons flow through the circuit and pile up on one side of the gap. If we represent the total charge that does so as \(q(t)\), in some unit, the current \(i(t)\) that flows in the circuit is \(q'(t)\). The resistance \(R\) of the circuit, by Ohm’s law, produces a difference of potential of \(Ri(t)\), and the difference of potential across the gap is \(\frac{q(t)}{C}\), where \(C\) is called the capacity of the gap. Changes of current, according to Faraday’s law, cause a difference of potential across the coil of \(Li'(t)\), for some constant \(L\), and so if the power source produces a difference of potential \(V\sin(\omega t)\) we find that this system obeys the equation

\[ \begin{aligned} V\sin(\omega t) &= Ri(t) + \frac{q(t)}{C} + i'(t)L \\ &= Rq'(t) + \frac{q(t)}{C} + q''(t)L \end{aligned} \]

This is exactly the same as the equation of the forced harmonic oscillator, and has all the same consequences.

Suppose we have two sorts of animals. Type \(A\) animals eat type \(B\) ones. We suppose that in the absence of type \(B\) animals, type \(A\) ones will not get enough to eat and will die off or move away to avoid doing so. On the other hand we assume that in the absence of type \(A\) animals, type \(B\) ones will have a better chance to survive and will experience population growth.

Let \(A\) represent the population in our area of type \(A\) animals. In the absence of type \(B\) creatures the simplest model for these is that change of \(A\) population is a negative multiple of \(A\) itself:

\[\frac{dA}{dt} = -cA\]

Similarly, in the absence of \(A\) creatures, the simplest model for behavior of the population \(B\) of \(B\) types will show increase, and obey, for some \(r\),

\[\frac{dB}{dt} = rB\]

The interaction between \(A\) and \(B\) must be \(0\) if either \(A\) or \(B\) is \(0\). The simplest interaction model is then that the contribution to the change per unit time in \(A\) from the presence of \(B\) is \(sAB\) for some \(s\), and the effect on \(B\) is \(-qAB\) for some \(q\).

Our simplest model for this situation then has the form

\[\frac{dA}{dt} = -cA + sAB\]

and

\[\frac{dB}{dt} = rB - qAB\]

So what can we say about the behavior of these populations in this model?

We can first look for steady state solutions. These happen when both derivatives are \(0\). When that happens the populations of the two species stay the same. Such solutions are called fixed points of the equations.

Here this happens when \(cA = sAB\) and \(rB = qAB\), which, when neither population is \(0\), means \(sB = c\) and \(r = qA\).

A fixed point solution is said to be stable if small deviations in \(A\) or \(B\) tend to die out and disappear or at least don’t spiral outward. You can investigate the stability of this fixed point, for values of \(c\), \(s\), \(r\) and \(q\) of your choosing, by "integrating" these equations numerically. Start with a value of \(A\) and \(B\) slightly off from the fixed point and move forward in time, finding both \(A(t)\) and \(B(t)\) exactly like one does in an integral (discussed in Chapter 14), using the left hand rule.

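What happens here in the \((A, B)\) plane is that exact solutions of these equations circle the fixed point along closed orbits. A numerical solution by the left hand rule tends to drift slowly outward instead, by an amount that shrinks as the step size is reduced.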

Given one orbit in this plane, no other orbit can cross it, because at a common point the derivatives would be the same in both, which would mean that the orbits were identical afterward.

**Exercise: Set this up with a spreadsheet, and check on this conclusion.**
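If you prefer a program to a spreadsheet, the same computation can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the function names, the values \(c = r = 1\) and \(s = q = 0.01\) (which put the fixed point at \(A = B = 100\)), the starting point, and the step size \(d\). The helper `invariant` computes \(qA - r\ln A + sB - c\ln B\), a quantity whose time derivative works out to \(0\) under the two model equations, so watching it is one way to judge how faithfully the steps follow an orbit.

```python
import math


def simulate(A0, B0, c, s, r, q, d, steps):
    """Left hand rule: advance both populations by derivative times d."""
    A, B = A0, B0
    path = [(A, B)]
    for _ in range(steps):
        dA = (-c * A + s * A * B) * d
        dB = (r * B - q * A * B) * d
        A, B = A + dA, B + dB
        path.append((A, B))
    return path


def invariant(A, B, c, s, r, q):
    """q A - r ln A + s B - c ln B, constant along exact solutions."""
    return q * A - r * math.log(A) + s * B - c * math.log(B)


c, s, r, q = 1.0, 0.01, 1.0, 0.01
path = simulate(110.0, 100.0, c, s, r, q, 0.001, 20000)
for A, B in path[::2000]:
    print(round(A, 3), round(B, 3), invariant(A, B, c, s, r, q))
```

Try larger values of \(d\) and watch how the last column changes; plotting the points of `path` shows the shape of the orbit.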

You can see qualitatively what happens if, for example, the population \(B\) is suddenly reduced from its fixed point value. This causes a reduction in \(\frac{dA}{dt}\) from its \(0\) value at the fixed point, so that the population \(A\) decreases. This in turn causes an increase in \(B\). Thus, if \(A\) is the horizontal and \(B\) the vertical coordinate used in charting the orbit, starting beneath the fixed point, the orbit moves to the left and then upward, which is clockwise around it.

You do not even need a spreadsheet to see how solutions behave. Given a starting point \((A,B)\), you can draw an arrow from it, pointing in the direction of motion: its horizontal component is proportional to the \(A\) derivative and its vertical component to the \(B\) derivative at that point. Then choose a point a small distance along that arrow, and repeat these steps. You will generate an approximate orbit of the system in the \((A,B)\) plane.

How this plays out depends considerably on how fast the two populations recover from their reductions.

An interesting case is one in which flies are the prey and birds the predators that eat them. Reducing the fly population significantly reduces the bird population as well, as noted above. However, fly populations recover relatively quickly, on the order of weeks, while bird populations take on the order of years to recover. Thus bird populations tend to decline for a short time, but rise again to their fixed point value only slowly. This means that fly populations increase for quite a long time and spend much of that time at levels well above the fixed point value. Thus a campaign for killing flies is not a productive way to reduce fly populations except very temporarily.

In earlier sections, we discussed models for various phenomena, and these led to differential equations whose solutions, with appropriate additional conditions, describe the behavior of the systems involved, according to these models.

In this Chapter we discuss how to use a spreadsheet to find solutions to such differential equations.

19.2 First Order Differential Equations

This material is not a replacement for a course in differential equations. Such courses tend to provide insights and methods that allow for algebraic solutions to many important differential equations, as well as insight into the behavior of solutions that you can get without having to solve them in detail.

We provide it here because many traditional courses in differential equations ignore numerical computations entirely and we wish to show that these can be done with an amount of effort not much beyond what is involved in numerical integration, for all sorts of differential equations.

We will begin by solving a first order differential equation, then consider a second order equation, and finally one describing planetary motion, which is second order and has two dependent variables. (Though planets move in three dimensional space, their motions lie in a single plane. Our dependent variables are then the \(x\) and \(y\) coordinates of a planet and the independent variable is time \(t\).)

The major difference between these is in the number of columns that need be created.

**What do you mean by "we". Are you going to do it while I go to sleep?**

Well, I'll show you how to set one up, and you will see that you can change the equation without much effort and solve such equations yourself, which gives you powers unknown to previous generations of students.

We will first deal with a first order differential equation by which we mean, specifically, an equation of the form \(y'(x) = f(x, y)\), for some function \(f\). Suppose, further, that we know the solution at some point \(z\).

This tells us that in the interval in \(x\) starting at \(z\) and ending at \(z + d\), for very small \(d\), we have, approximately

\[y(z + d) - y(z) = f(z, y(z)) d\]

We can use this "linear approximation" to compute \(y(z + d)\), and then continue to compute \(y(z + 2d)\) from it, and so on.

This approach is like the left hand rule for doing integrals; the only difference is that \(y\) itself appears in \(f\).

To implement this you let \(x\) increase from its start value by \(d\) from row to row and have \(y\) increase by \(f(x,y)d\).
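Before going to a spreadsheet, the same recipe can be sketched in a few lines of Python. This is only an illustration: the function name `left_hand_rule` and the test equation \(y' = y\) (with exact solution \(e^x\)) are my own choices, not anything fixed by the text.

```python
def left_hand_rule(f, x0, y0, d, steps):
    """Advance y' = f(x, y) from (x0, y0) by `steps` increments of size d."""
    x, y = x0, y0
    rows = [(x, y)]              # one (x, y) pair per spreadsheet row
    for _ in range(steps):
        y = y + f(x, y) * d      # y(z + d) = y(z) + f(z, y(z)) d
        x = x + d                # x increases by d from row to row
        rows.append((x, y))
    return rows


# Assumed test equation: y' = y with y(0) = 1, whose exact solution is e^x.
rows = left_hand_rule(lambda x, y: y, 0.0, 1.0, 0.001, 1000)
x_end, y_end = rows[-1]
print(x_end, y_end)              # y_end should land close to e = 2.71828...
```

Making \(d\) smaller (and the number of steps correspondingly larger) brings the last value closer to the exact answer, just as with the left hand rule for integrals.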

**Exercise 19.1 Set this up for \(f(x,y)\) given by \(xy\) and plot \(y\) vs \(x\). (This represents the differential equation \(y'=xy\), which has solution \(\ln y = \frac{x^2}{2} + c\).) Start at \(x=1, y=1\) and find \(y(2)\) numerically and exactly from that solution, and compare.**

It is a little more difficult to produce the analogue of the right hand rule, or trapezoid or Simpson's rule, since they require evaluating \(f\) and hence \(y\) beyond \(z\), and we only start with \(y(z)\) and \(y(z+d)\) is what we want to discover. If we put \(y(z+d)\) in our formula for computing \(y(z+d)\) the computer will accuse us, rightly, of using a circular reference.

There are ways to get around this, and a whole sequence of formulae is known for evaluating \(y(x+d) - y(x)\) given our equation to any order in \(d\). These are called Runge-Kutta rules, and are very effective. You can see how they do in the accompanying applet.

We will only describe the simplest correction, namely approximate \(y(z+d)\) according to

\[y(z+d) = y(z) + \frac{d}{2}\left(f(z,y(z)) + f(z+d,\,y(z)+f(z,y(z))d)\right)\]

This means we are using \(f(z,y(z))\) as the derivative of \(y\) throughout the interval between \(z\) and \(z+d\) in approximating the value of \(y(z+d)\) in the last term above.

This is still pretty easy to do, and is more or less like the trapezoid rule, differing only in that we are estimating the derivative \(f\) at argument \(z+d\) rather than knowing it.
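Here is a hedged Python sketch of that correction next to the plain left hand rule, so you can see the difference in accuracy. The function names are mine, and I have assumed the equation \(y' = xy\) starting at \(x=1, y=1\), whose exact solution is \(y = e^{(x^2-1)/2}\).

```python
import math


def left_hand_rule(f, x, y, d, steps):
    """Plain left hand rule: use the slope at the start of each step."""
    for _ in range(steps):
        y = y + f(x, y) * d
        x = x + d
    return y


def trapezoid_like_rule(f, x, y, d, steps):
    """Average the slope at z with an estimated slope at z + d."""
    for _ in range(steps):
        slope_here = f(x, y)
        y_guess = y + slope_here * d        # left hand estimate of y(z + d)
        slope_there = f(x + d, y_guess)     # estimated derivative at z + d
        y = y + d / 2 * (slope_here + slope_there)
        x = x + d
    return y


def f(x, y):
    return x * y                            # assumed example: y' = x y


exact = math.exp((2.0**2 - 1.0) / 2)        # y(2) from ln y = x^2/2 + c
left_error = abs(left_hand_rule(f, 1.0, 1.0, 0.01, 100) - exact)
trap_error = abs(trapezoid_like_rule(f, 1.0, 1.0, 0.01, 100) - exact)
print(left_error, trap_error)
```

The second error should come out much smaller than the first for the same \(d\), which is the whole point of the extra work.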

**What do you do if you don't know \(y(z)\) at all?**

You can get a feel for what all solutions are by making a plot in two dimensional space, one dimension being \(z\) and the other \(y(z)\). If you choose a grid of points in this plot, at each point you know the derivative \(f(z,y(z))\), so you can draw an arrow there pointing in the direction \(\frac{dy}{dz} = f(z,y(z))\). You can then connect the arrows (like connecting dots), forming paths, and these paths each represent solutions to the differential equation.
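If you would rather let a program compute the arrows, a minimal Python sketch might look like this. The function name `direction_field`, the grid, and the example \(f = xy\) are assumptions made for illustration; each arrow is stored as a unit vector so that only its direction matters.

```python
import math


def direction_field(f, xs, ys):
    """Return (x, y, dx, dy) for a unit arrow of slope f(x, y) at each grid point."""
    arrows = []
    for x in xs:
        for y in ys:
            slope = f(x, y)
            length = math.hypot(1.0, slope)   # length of the vector (1, slope)
            arrows.append((x, y, 1.0 / length, slope / length))
    return arrows


# Assumed example: the equation y' = x y on a 9 by 9 grid from -2 to 2.
grid = [i * 0.5 for i in range(-4, 5)]
arrows = direction_field(lambda x, y: x * y, grid, grid)
print(arrows[0])
```

Feeding these arrows to any plotting tool that can draw short line segments gives the picture described above.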

These paths cannot cross.

**Exercise 19.2: Figure out why paths cannot cross.**

But they can have some interesting features. Fixed points are one such feature and are what we saw in Chapter 18. A fixed point is one for which the equation implies you stay there. A stable fixed point is one such that if you are near it you rotate or spiral into it. There are also things called attractors, which are curves either in the past or the future (when the independent variable is time) which many paths crowd into. A stable fixed point is a kind of attractor.

**You tell me I can implement the integration you described on a spreadsheet?**

Yes. Put First order ODE in A1; xstart in A2; ystart into A3; d into A4. Put your data, which consists of the starting values of \(x\) and \(y\), and your choice for \(d\) in B2, B3, and B4.

Then start columns at A6, B6, C6, which will contain \(x\) and \(y\) respectively. In A6, put x; in B6, put y (trapezoidal rule); in C6, put y (left-hand rule).

So, you can put =B2 into A7, =B3 into B7, and =A7+$B$4 into A8 and copy it down column A.

In B8 put =B7+$B$4/2*(f(A7,B7)+f(A7+$B$4,B7+f(A7,B7)*$B$4)), with your own function written out in place of f, and copy that down column B. That is it.

You can compare the result with the left hand rule computation by setting up column C and starting with =B7 into C7, but putting =C7+$B$4*f(A7,C7) into C8 and copying it down. Then you can make an \(x,y\) scatter chart of all three columns, and see what happens. The difference between the two computations gives you an impression of how bad the simpler one is.

You can see that it takes a bit more work to change functions \(f\), but is quite easy to change initial conditions. Here is the result for \(y' = xy\) with \(d = 0.01\) and a starting point at \(x = 1\), \(y = 1\).


**Exercises 19.3 Set this up for \(f(x, y) = x^2y\), and for \(f(x,y) = xsin(x, y)\), starting at \(x=0, y=1\).**

**Does this always work?**

No. For lots of interesting equations it is fine. However, sometimes your variable \(y\) can go to infinity, and then the calculation becomes quite inaccurate.

This can happen because we are allowing any equation for \(y'\), and hence any equation for \(\left(\frac{1}{y}\right)'\), which means \(\frac{1}{y}\) can sometimes be \(0\). If \(\frac{1}{y}\) should happen to go through \(0\), then \(y\) will go to infinity without any particular reason for it.

Most of the time you can avoid this difficulty by solving the differential equation for \(\frac{1}{y}\) while you are solving the one for \(y\). When \(y\) goes to infinity, \(\frac{1}{y}\) is quite tame and is near \(0\) (remember that \(\left(\frac{1}{y}\right)'\) is \(\frac{-y'}{y^2}\), so that if we let \(\frac{1}{y}\) be \(u\), then \(u\) obeys \(u' = -f\left(x,\frac{1}{u}\right)u^2\)). If you do this you can use as next \(y\) value the one you get from the smaller of \(y\) and \(\frac{1}{y}\).
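To make this concrete, here is a rough Python sketch of one way to do the switching, using the left hand rule. The function name, the threshold of \(1\) for deciding which variable is "smaller", and the test equation \(y' = y^2\) with \(y(0) = 1\) (exact solution \(\frac{1}{1-x}\), which blows up at \(x = 1\)) are all my assumptions. The sketch also assumes that no step lands exactly on \(u = 0\).

```python
def step_avoiding_blowup(f, x, y, d):
    """One left hand rule step, taken in y or in u = 1/y, whichever is smaller."""
    if abs(y) <= 1.0:
        return y + d * f(x, y)                 # ordinary step in y
    u = 1.0 / y
    u_next = u - d * f(x, 1.0 / u) * u * u     # u' = -f(x, 1/u) u^2
    return 1.0 / u_next                        # assumes u_next is not exactly 0


def rhs(x, y):
    return y * y                               # assumed example: y' = y^2


x, y, d = 0.0, 1.0, 0.03
for _ in range(50):
    y = step_avoiding_blowup(rhs, x, y, d)
    x += d
print(x, y)                                    # compare with 1/(1 - x) at x = 1.5
```

The computation passes through the infinity in \(y\) near \(x = 1\) without complaint, because it is really \(u\) that is being tracked there, and \(u\) merely passes through \(0\).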

Anyway, integrating differential equations this way is sufficiently easy that it is worth a try.

A second order differential equation is one that expresses the second derivative of the dependent variable as a function of the variable and its first derivative. (More generally it is an equation involving that variable and its second derivative, and perhaps its first derivative.)

Perhaps the easiest way to handle such an equation is to give a name to the first derivative. Then the original equation becomes a pair of coupled equations for the dependent variable and for its derivative. What you get when doing this is a pair of first order differential equations like the pair of coupled equations seen in the Predator Prey problem.

Given the equation \(x'' = f(x,x',t)\), we set \(z = x'\) and get the two equations:

\[z' = f(x,z,t) ~ \text{and} ~ x' = z\]

Starting with initial values for \(x\) and \(x'\) we can produce left hand rule approximate solutions to these equations by keeping track of \(x, z\) and \(z'\) as \(t\) increases by some small increment \(d\). We can plot solutions in three ways, as "orbits" using \(x\) and \(z\) as axes, or plot \(x\) and/or \(z\) as functions of \(t\).

The example of forced harmonic motion:

\[mx'' = -fx' - kx + v\sin(\omega t)\]

gives rise to the coupled equations:

\[mz' = -fz -kx + v\sin(\omega t) ~ \text{and} ~ x' = z\]
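A left hand rule treatment of this pair of equations can be sketched in Python as follows. The function name and the parameter values in the example are assumptions; I have switched off damping and forcing (\(f = 0\), \(v = 0\)) so the result can be checked against the known solution \(x = \cos t\) for \(m = k = 1\).

```python
import math


def forced_oscillator(m, f, k, v, omega, x0, z0, d, steps):
    """Left hand rule for m z' = -f z - k x + v sin(omega t), x' = z."""
    t, x, z = 0.0, x0, z0
    rows = [(t, x, z)]
    for _ in range(steps):
        z_prime = (-f * z - k * x + v * math.sin(omega * t)) / m
        # Both updates use the values from the previous row.
        x, z = x + z * d, z + z_prime * d
        t += d
        rows.append((t, x, z))
    return rows


# Assumed check case: undamped, unforced, m = k = 1, x(0) = 1, x'(0) = 0.
rows = forced_oscillator(m=1.0, f=0.0, k=1.0, v=0.0, omega=1.0,
                         x0=1.0, z0=0.0, d=0.001, steps=1000)
t_end, x_end, z_end = rows[-1]
print(t_end, x_end, z_end)   # compare with cos(1) and -sin(1)
```

Plotting the second and third entries of each row against each other gives the "orbit" picture, and plotting either against the first gives the time plots.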

Newton's Laws of motion yield second order differential equations for the positions of objects. There are three dimensions of motion for each particle. They are often reformulated as twice as many first order differential equations, in almost the same way. We will describe this reformulation in one dimension. The same thing can be done with any number of dimensions.

In many interesting situations energy is conserved. Energy does not appear in Newton's equation \(F = ma\). We first have to define it.

The kinetic energy of an object of mass \(m\) moving in one dimension with speed \(v\) is \(\frac{mv^2}{2}\). Its momentum, \(p\), is \(mv\). \(p\) rather than \(v\) is the second variable introduced to reduce the equation to first order.

The kinetic energy is then \(\frac{p^2}{2m}\). The force \(F\) on the particle is defined to be the negative of the derivative of the potential energy with respect to the dependent variable (keeping all the other dependent variables and momenta fixed). Thus in the case of gravity on the surface of the earth, the force on an object of mass \(m\) exerted by the earth is \(-mg\), and the potential energy is \(mgh\).

The energy, also called the Hamiltonian of the system and written as \(H\), is the sum of the kinetic and potential energies. (Incidentally, the \(H\) symbol originally was a Greek capital eta and was chosen to be so because energy begins with E.)

Thus for gravity on the earth’s surface the Hamiltonian is given by

\[H = \frac{p^2}{2m} + mgh\]

The equations of motion equivalent to \(F=ma\) then become:

\[\frac{dh}{dt} = \frac{\partial H}{\partial p} ~ \text{, and} ~ \frac{dp}{dt} = -\frac{\partial H}{\partial h}\]

The quaint symbols \(\frac{\partial H}{\partial p}\) that appear here mean that you take the derivative of \(H\) with respect to \(p\) treating the other dependent variable \(h\) as a constant. This sort of derivative is called the partial derivative of \(H\) with respect to \(p\). (In complicated situations, when there are several possible other dependent variables, its meaning depends on which ones you are keeping constant. Here it is well defined.)
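As a quick check, you can apply these equations to the gravity Hamiltonian \(H = \frac{p^2}{2m} + mgh\) given above:

\[\frac{dh}{dt} = \frac{\partial H}{\partial p} = \frac{p}{m} ~ \text{, and} ~ \frac{dp}{dt} = -\frac{\partial H}{\partial h} = -mg\]

The first says that \(p = m\frac{dh}{dt}\), which is just the definition of momentum. Substituting it into the second gives \(m\frac{d^2h}{dt^2} = -mg\), which is \(F = ma\) with \(F = -mg\).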

**Exercise 19.4 What is the Hamiltonian for an undamped and unforced harmonic oscillator (for which the force is \(-kx\))?**

The gravitational interaction between a planet and the sun is described by the inverse square central force law.

For convenience we place the sun at the origin of our coordinates, and start our planet at the point \((1,0,0)\), with the initial first derivative of its position given by \((a,b,0)\).

We will assume that the planet is much lighter than the sun, (as the earth is compared to our sun) so that the sun does not move. (Actually, what is fixed in planetary motion is the center of mass of the system. Jupiter and Saturn are sufficiently large that when they are in the same part of our sky the center of mass of all the planets does not lie inside the sun, so that the sun moves around, but not very much.)

With these coordinates and this assumption, the position vector \(\vec{r}\) of the planet obeys the equation of motion

\[\frac{d^2\vec{r}}{dt^2} = -c\frac{\vec{r}}{r^3}\]

Since the force on the planet points toward the sun, and we are starting the planet in the \((x,y)\) plane, our \(z\) coordinate will always be \(0\), and we can ignore it.

This is a second order differential equation with two dependent variables, \(x(t)\) and \(y(t)\). We can set this up on a spreadsheet devoting a column each to \(t, x, y\) and the derivatives of \(x\) and \(y\). In terms of coordinates, the equations of motion are

\[\frac{d^2x}{dt^2} = -c\frac{x}{r^3} ~ \text{and} ~ \frac{d^2y}{dt^2} = -c\frac{y}{r^3}\]

Since \(r\) occurs in both of these equations and \(r\) is \((x^2 +y^2)^\frac{1}{2}\), it is convenient to devote a column to \(r\) as well. Setting \(r=1\) defines a scale for \(r\), but not for \(t\). This means we can choose our unit of time so that \(c\) is \(1\).

With that choice, we can set up our spreadsheet as follows:

We put the time variable \(t\) in column A, and start at row 7 with A7 set to 0. We must choose a value for the increment \(d\), and you can determine the one you like best. It must be small enough that the change in \(\vec{r}\) over one step is small, but large enough that you can plot orbits. You might start with \(d=10^{-2}\) and change that if it does not work well. We can put the letter d in A2 and its value in B2. The other parameters we will need to specify are the initial values of the derivatives of \(x\) and \(y\). So enter "initial x speed" in A3 and its value (say 0) in B3, with "initial y speed" in A4 and its value (say 1) in B4.

Put t in A6, x in B6, y in C6, r in D6, x' in E6, and y' in F6. We put \(x\) and \(y\) in columns B and C and so put 1 in B7 and 0 in C7. We put \(r\) in column D, setting D7 to =(B7^2+C7^2)^0.5. We put \(\frac{dx}{dt}\) (call it \(x'\)) in column E, setting E7 to =B3 and put \(\frac{dy}{dt}\) (call it \(y'\)) in column F setting F7 to =B4.

We next set A8 to =A7+$B$2

B8 to =B7+$B$2*E7

C8 to =C7+$B$2*F7 (you can copy B8 into C8)

Copy D7 into D8

Set E8 to =E7-$B$2*B7/$D7^3

Copy E8 to F8

Now copy A8 through F8 down the columns

This will give the crudest approximation to the solution for your values of the parameters.
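For comparison, the same columns can be produced by a short Python sketch. The function name `planet_orbit` is mine; the starting values mirror the ones suggested for cells B2 through B4, and each new row is computed from the previous row exactly as the cell formulas above do.

```python
def planet_orbit(d, vx0, vy0, steps):
    """Rows of (t, x, y, r, x', y') mirroring spreadsheet columns A through F."""
    t, x, y = 0.0, 1.0, 0.0                  # the planet starts at (1, 0)
    r = (x**2 + y**2) ** 0.5                 # column D
    vx, vy = vx0, vy0                        # columns E and F
    rows = [(t, x, y, r, vx, vy)]
    for _ in range(steps):
        new_x = x + d * vx                   # B8 = B7 + d * E7
        new_y = y + d * vy                   # C8 = C7 + d * F7
        new_vx = vx - d * x / r**3           # E8 = E7 - d * B7 / D7^3
        new_vy = vy - d * y / r**3           # F8 = F7 - d * C7 / D7^3
        t, x, y, vx, vy = t + d, new_x, new_y, new_vx, new_vy
        r = (x**2 + y**2) ** 0.5             # D8 uses the new x and y
        rows.append((t, x, y, r, vx, vy))
    return rows


rows = planet_orbit(d=0.01, vx0=0.0, vy0=1.0, steps=100)
print(rows[1])
print(rows[-1])
```

As with the spreadsheet, this is the crudest approximation, so expect the computed orbit to drift if you take many steps without shrinking \(d\).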

**When you are done, an \(x,y\) plot of columns B and C will give the orbits in space. Adjust your parameters as needed.**


**Exercise 19.5: Set this up. What values of \(\frac{dx}{dt}\) and \(\frac{dy}{dt}\) give circular motion in these coordinates?**

In the past, dealing with equations like these numerically was excruciatingly horrible. Instead, physicists from Newton on solved the equations by introducing quantities, namely energy and angular momentum, which do not change with this motion, and deduced orbits by reasoning rather than numerical computation.

The actual behavior of planets was carefully observed by astronomers over centuries and was crisply summarized in Kepler's three laws, which are as follows:

**1. The motion of planets and other bodies subject to the same force is in orbits that are "conic sections": ellipses or hyperbolae or in very special circumstances parabolas (all with the sun as a focus), or straight lines.**

**2. The area swept out per unit time in any orbit is constant.**

3. There is a certain specific relation between the period of an elliptical orbit and a measure of its radius, which relation we will not discuss further.

**Final Note:** The last few chapters contain lots of material that is not contained in any normal single variable calculus course. The purpose of this material is for your enjoyment and not to intimidate you. The problem is that the applets and the approaches here allow you to learn calculus much faster than you can be expected to do with a regular calculus course. But what you learn and retain is heavily dependent on how much time you spend doing it. If the end result was that you spent much less time learning calculus, that would be bad for you. So you might as well spend the same amount of time, and just learn more!

i | The square root of minus one. |

\(f(x)\) | The value of the function \(f\) at argument \(x\). |

\(\sin x\) | The value of the sine function at argument \(x\). |

\(\exp x\) | The value of the exponential function at argument \(x\). This is often written as \(e^x\). |

a^x | The number a raised to the power \(x\); for rational \(x\) is defined by inverse functions. |

\(\ln x\) | The inverse function to \(\exp x\). |

\(a^x\) | Same as a^x. |

\(\log_b a\) | The power you must raise \(b\) to in order to get \(a\): \(b^{\log_ba} = a\). |

\(\cos x\) | The value of the cosine function (complement of the sine) at argument \(x\). |

\(\tan x\) | Works out to be \(\frac{\sin x}{\cos x}\). |

\(\cot x\) | The value of the complement of the tangent function or \(\frac{\cos x}{\sin x}\). |

\(\sec x\) | Value of the secant function, which turns out to be \(\frac{1}{\cos x}\). |

\(\csc x\) | Value of the complement of the secant, called the cosecant. It is \(\frac{1}{\sin x}\). |

\(\text{asin}\, x\) | The value, \(y\), of the inverse function to the sine at argument \(x\). Means \(x = \sin y\). |

\(\text{acos}\, x\) | The value, \(y\), of the inverse function to cosine at argument \(x\). Means \(x = \cos y\). |

\(\text{atan}\, x\) | The value, \(y\), of the inverse function to tangent at argument \(x\). Means \(x = \tan y\). |

\(\text{acot}\, x\) | The value, \(y\), of the inverse function to cotangent at argument \(x\). Means \(x = \cot y\). |

\(\text{asec}\, x\) | The value, \(y\), of the inverse function to secant at argument \(x\). Means \(x = \sec y\). |

\(\text{acsc}\, x\) | The value, \(y\), of the inverse function to cosecant at argument \(x\). Means \(x = \csc y\). |

\(\theta\) | A standard symbol for angle. Measured in radians unless stated otherwise. Used especially for \(\text{atan}\, \frac{y}{x}\) when \(x\), \(y\), and \(z\) are variables used to describe a point in three dimensional space. |

\(\hat{i}\), \(\hat{j}\), \(\hat{k}\) | Unit vectors in the \(x\), \(y\) and \(z\) directions respectively. |

\((a, b, c)\) | A vector with \(x\) component \(a\), \(y\) component \(b\) and \(z\) component \(c\). |

\((a, b)\) | A vector with \(x\) component \(a\), \(y\) component \(b\). |

\(\left(\vec{a},\vec{b}\right)\) | The dot product of vectors \(\vec{a}\) and \(\vec{b}\). |

\(\vec{a} \cdot \vec{b}\) | The dot product of vectors \(\vec{a}\) and \(\vec{b}\). |

\(\left(\vec{a} \cdot \vec{b}\right)\) | The dot product of vectors \(\vec{a}\) and \(\vec{b}\). |

\(|\vec{v}|\) | The magnitude of the vector \(\vec{v}\). |

\(|x|\) | The absolute value of the number \(x\). |

\(\sum\) | Used to denote a summation, usually the index and often their end values are written under it with upper end value above it. For example the sum of \(j\) for \(j = 1\) to \(n\) is written as \(\sum_{j=1}^{n}j\) or \(\sum^{n}j\). This signifies \(1 + 2 + ... + n\). |

\(M\) | Used to represent a matrix or array of numbers or other entities. |

|v> | A column vector, that is one whose components are written as a column and treated as a \(k\) by \(1\) matrix. |

<v| | A vector written as a row, or \(1\) by \(k\) matrix. |

\(dx\) | An "infinitesimal" or very small change in the variable \(x\); also similarly \(dy\), \(dz\), \(dr\) etc. |

\(ds\) | A small change in distance. |

\(\rho\) | The variable \(\sqrt{x^2 +y^2 + z^2}\) or distance to the origin in spherical coordinates. |

\(r\) | The variable \(\sqrt{x^2 +y^2}\) or distance to the z-axis in three dimensions or in polar coordinates. |

\(|M|\) | The determinant of a matrix \(M\) (whose magnitude is the area or volume of the parallel sided region determined by its columns or rows). |

\(||M||\) | The magnitude of the determinant of the matrix \(M\), which is a volume or area or hypervolume. |

\(\text{det}\, M\) | The determinant of \(M\). |

\(M^{-1}\) | The inverse of the matrix \(M\). |

\(\vec{v} \times \vec{w}\) | The vector product or cross product of two vectors, \(\vec{v}\) and \(\vec{w}\). |

\(\theta_{vw}\) | The angle made by vectors \(\vec{v}\) and \(\vec{w}\). |

\(A \cdot B \times C\) | The scalar triple product, the determinant of the matrix formed by columns \(A\), \(B\), \(C\). |

\(\hat{u}_w\) | A unit vector in the direction of the vector \(\vec{w}\); it means the same as \(\frac{\vec{w}}{|\vec{w}|}\). |

\(df\) | A very small change in the function \(f\), sufficiently small that the linear approximation to all relevant functions holds for such changes. |

\(\frac{df}{dx}\) | The derivative of \(f\) with respect to \(x\), which is the slope of the linear approximation to \(f\). |

\(f'\) | The derivative of \(f\) with respect to the relevant variable, usually \(x\). |

\(\frac{\partial f}{\partial x}\) | The partial derivative of \(f\) with respect to \(x\), keeping \(y\), and \(z\) fixed. In general a partial derivative of \(f\) with respect to a variable \(q\) is the ratio of \(df\) to \(dq\) when certain other variables are held fixed. Where there is possible misunderstanding over which variables are to be fixed that information should be made explicit. |

\(\left.\frac{\partial f}{\partial x} \right|_{y,z}\) | The partial derivative of \(f\) with respect to \(x\) keeping \(y\) and \(z\) fixed. |

\(\text{grad}\,f\) | The vector field whose components are the partial derivatives of the function \(f\) with respect to \(x\), \(y\) and \(z\): \(\left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\right)\) or \( \frac{\partial f}{\partial x}\hat{i} + \frac{\partial f}{\partial y}\hat{j} + \frac{\partial f}{\partial z}\hat{k} \), called the gradient of \(f\). |

\(\nabla\) | The vector operator \( \frac{\partial}{\partial x}\hat{i} + \frac{\partial}{\partial y}\hat{j} + \frac{\partial}{\partial z}\hat{k} \), called "del". |

\(\nabla f\) | The gradient of \(f\); its dot product with \(\hat{u}_w\) is the directional derivative of \(f\) in the direction of \(\vec{w}\). |

\(\nabla \cdot \vec{w}\) | The divergence of the vector field \(\vec{w}\); it is the dot product of the vector operator \(\nabla\) with the vector \(\vec{w}\), or \( \frac{\partial w_x}{\partial x} + \frac{\partial w_y}{\partial y} + \frac{\partial w_z}{\partial z} \). |

\(\text{curl}\,\vec{w}\) | The cross product of the vector operator \(\nabla\) with the vector \(\vec{w}\). |

\(\nabla \times \vec{w}\) | The curl of \(\vec{w}\), with components \( \left( \frac{\partial w_z}{\partial y} - \frac{\partial w_y}{\partial z}, \frac{\partial w_x}{\partial z} - \frac{\partial w_z}{\partial x}, \frac{\partial w_y}{\partial x} - \frac{\partial w_x}{\partial y} \right) \). |

\(\nabla \cdot \nabla\) | The Laplacian, the differential operator: \( \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2} \). |

\(f''(x)\) | The second derivative of \(f\) with respect to \(x\); the derivative of \(f'(x)\). |

\(\frac{d^2f}{dx^2}\) | The second derivative of \(f\) with respect to \(x\). |

\(f^{(2)}(x)\) | Still another form for the second derivative of \(f\) with respect to \(x\). |

\(f^{(k)}(x)\) | The k-th derivative of \(f\) with respect to \(x\); the derivative of \(f^{(k-1)}(x)\). |

\(\hat{T}\) | Unit tangent vector along a curve; if curve is described by \(\vec{r}(t)\), \(\hat{T} = \frac{d\vec{r}/dt}{|d\vec{r}/dt|}\). |

\(ds\) | A differential of distance along a curve. |

\(\kappa\) | The curvature of a curve; the magnitude of the derivative of its unit tangent vector with respect to distance on the curve: \(\left|\frac{d\hat{T}}{ds}\right|\). |

\(\hat{N}\) | A unit vector in the direction of the projection of \(\frac{d\hat{T}}{ds}\) normal to \(\hat{T}\). |

\(\hat{B}\) | A unit vector normal to the plane of \(\hat{T}\) and \(\hat{N}\), which is the plane of curvature. |

\(\tau\) | The torsion of a curve: \(\left|\frac{d\hat{B}}{ds}\right|\). |

\(g\) | The gravitational constant. |

\(\vec{F}\) | The standard symbol for force in mechanics. |

\(k\) | The spring constant of a spring. |

\(\vec{p}_i\) | The momentum of the i-th particle. |

\(H\) | The Hamiltonian of a physical system, which is its energy expressed in terms of \(\{\vec{r}_i\}\) and \(\{\vec{p}_i\}\), position and momentum. |

\(\{Q, H\}\) | The Poisson bracket of \(Q\) and \(H\). |

\(\int^x f(u)du\) | An antiderivative of \(f(x)\) expressed as a function of \(x\). |

\(\int_a^b f(x)dx\) | The definite integral of \(f\) from \(a\) to \(b\). When \(f\) is positive and \(a < b\) holds, then this is the area between the x-axis, the lines \(x = a\), \(x = b\) and the curve that represents the function \(f\) between these lines. |

\(L(d)\) | A Riemann sum with uniform interval size \(d\) and \(f\) evaluated at the left end of each subinterval. |

\(R(d)\) | A Riemann sum with uniform interval size \(d\) and \(f\) evaluated at the right end of each subinterval. |

\(M(d)\) | A Riemann sum with uniform interval size \(d\) and \(f\) evaluated at the maximum point of \(f\) in each subinterval. |

\(m(d)\) | A Riemann sum with uniform interval size \(d\) and \(f\) evaluated at the minimum point of \(f\) in each subinterval. |

COMPLEX NUMBERS

The red and blue arrows represent two complex numbers, namely those whose plot puts them at the tip of the arrows. You can change these as you please either by left clicking on the arrow heads and moving them, or by entering new values in the boxes on the left, over the red and blue entries respectively.

If you enter them that second way, you should click on ‘plot’ afterward. The range shown in the plot can be altered by entering those you wish, where appropriate.

If you select one of the combinations below these, by clicking on the adjacent box, you will see that combination of those two numbers. Once chosen, you can move either of the numbers, and watch how that combination changes as that number changes. If you choose one of the functions in the list on the bottom, you must click on plot to get the right picture.

This tool uses JQWidgets extensively. Please verify their copyright.

Developed by Daniel Kleitman and Jean-Michel Claus.


CURVES IN TWO DIMENSIONS

You may enter data in this applet in two ways. In ‘Function form’ y is a function of x. In ‘Parametric form’ both x and y are functions of a parameter t. You can choose functions in either case from the menu given or enter or change them from your keyboard.

The boxes below allow you to enter the plotting ranges. If you activate ‘show point on curve’ by clicking on it, you can wander using the slider or incremental mover to see values of variables and where they are on the curve. Don’t forget to click on ‘plot function’ when you make new entries.


DERIVATIVE AND TANGENT LINE

You can enter your function, or choose one from the menu, in the first row, and choose the scale for the diagram in the next rows. Moving the slider will move the tangent line across the diagram. With first and/or second derivative selected, you will see curves and values of these derivatives of your function, along with the curve defined by your function itself. Press ‘plot function’ whenever you change your input function.

The derivative of your function is the slope of the moving tangent line.


FIRST ORDER ODE

We are examining solutions to equations of the form: dy/dx = f(x,y).

You can enter f(x,y) in the usual manner in the top white space, and choose the scale in the variables in the spaces below it.

Such an equation has one free parameter, which can be the value of y for a given x value or its derivative at some x value.

You can drag the red dot to the x and y values that set the y value, and choose the y value that produces the desired derivative, at x, if that is what you want.

The solutions obtained using three different rules for approximating the solution can be seen if you check the appropriate boxes. When two of them, including the Runge-Kutta rule, agree, you can believe the indicated solution.


NEWTON'S METHOD

This is a way to solve an equation of the form f(x) = 0 when f is a differentiable function.

A step in Newton’s method consists of starting at a point on your curve, moving along the tangent line there to the x-axis, then jumping back to the curve. You repeat such steps until you reach a point at which the curve meets the x-axis.

You can enter your function as usual or use one on the menu. The default function illustrates what can happen. When the function has more than one zero, the one you reach depends on where you start.
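The step just described can be sketched in Python as follows. The function name `newton`, the stopping tolerance, and the example \(f(x) = x^2 - 2\) are my assumptions; the applet's own default function is not reproduced here.

```python
def newton(f, fprime, x0, max_steps=20, tol=1e-12):
    """Follow tangent lines to the x-axis until successive points agree."""
    x = x0
    for _ in range(max_steps):
        slope = fprime(x)
        if slope == 0:                 # horizontal tangent never meets the x-axis
            break
        x_next = x - f(x) / slope      # where the tangent line hits the x-axis
        if abs(x_next - x) < tol:
            return x_next
        x = x_next                     # jump back to the curve above x_next
    return x


def f(x):
    return x * x - 2                   # assumed example with zeros at +-sqrt(2)


def fprime(x):
    return 2 * x


root_from_right = newton(f, fprime, 1.0)
root_from_left = newton(f, fprime, -1.0)
print(root_from_right, root_from_left)
```

The two starting points lead to two different zeros, which is the dependence on the starting point mentioned above.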

The top slider and directional circles move the starting point. Those below change the number of such steps shown.


NUMERICAL INTEGRATION

You can enter your own function or use one on the menu, in the top white space. The lower and upper limits of integration are entered in the next two spaces. The scale used to display your function is defined by lower and upper limits that you can enter in the next pair of spaces.

The applet calculates an approximation to the integral of your function between the given limits of integration, by dividing the region between those limits into strips, and applying each of the four approximation rules in each strip. The slider determines the number of strips, which increases as it moves to the right. The four methods are: use the left hand y value for each strip, use the right hand value instead, use the average of the two (this is called the trapezoid rule), and Simpson’s rule. Simpson’s rule, when there is an even number of strips, can be described as taking 4/3 of the trapezoid rule answer, and subtracting 1/3 of the trapezoid rule answer with half the number of strips.
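The four rules can be sketched in Python as below. This is an illustration under my own assumptions (function names, and the test integral of \(\sin x\) from \(0\) to \(\pi\), whose exact value is \(2\)); it is not the applet's code.

```python
import math


def left_rule(f, a, b, n):
    h = (b - a) / n
    return h * sum(f(a + i * h) for i in range(n))


def right_rule(f, a, b, n):
    h = (b - a) / n
    return h * sum(f(a + i * h) for i in range(1, n + 1))


def trapezoid_rule(f, a, b, n):
    # The average of the left and right hand answers.
    return (left_rule(f, a, b, n) + right_rule(f, a, b, n)) / 2


def simpson_rule(f, a, b, n):
    # 4/3 of the trapezoid answer minus 1/3 of it with half as many strips.
    if n % 2 != 0:
        raise ValueError("Simpson's rule needs an even number of strips")
    return (4 * trapezoid_rule(f, a, b, n) - trapezoid_rule(f, a, b, n // 2)) / 3


# Assumed test integral: sin x from 0 to pi, exact value 2, with 10 strips.
results = {
    "left": left_rule(math.sin, 0.0, math.pi, 10),
    "right": right_rule(math.sin, 0.0, math.pi, 10),
    "trapezoid": trapezoid_rule(math.sin, 0.0, math.pi, 10),
    "simpson": simpson_rule(math.sin, 0.0, math.pi, 10),
}
print(results)
```

With the same number of strips, the Simpson answer should be noticeably closer to the exact value than the other three.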

If you click in either of the circles with arrows in them, the slider moves by one in the indicated direction.


OPERATIONS ON FUNCTIONS

You can use this applet to get an idea of what sums, products, differences, quotients, and compositions of two functions are, and what inverses of functions look like.

You can enter two functions the usual way, and look at all these things by selecting them. I find looking at more than one at a time confuses me, and looking at all at once drives me nuts.


SECOND ORDER ODE

In this applet the equation, and the end values of variables on the diagrams, can be entered when "Function Input" is highlighted.

You can express the second derivative \(\frac{d^2u}{dt^2}\) as an arbitrary function of the variables \(u\), \(\frac{du}{dt}\) (here called \(v\)), and \(t\).

Click "plot curves" when the equations and limits have been chosen.

Such equations normally have two free parameters. In this applet you can fix \(u(t_1)\) and \(v(t_1)\) by dragging the rear of the arrow to horizontal position \(t_1\) and vertical position \(u(t_1)\) and adjusting the front end of the arrow so that the arrow has slope \(v\). (The arrow length does not matter, which means that choosing a large arrow gives you more control of that slope.) If you want to impose conditions at two different \(t\) values you can fix one variable at one value and adjust the other until the other condition is satisfied.

The slider controls the number of slices in the approximation used to compute the solution with the parameters you have chosen.

When the v box is checked, both the curves of u vs t and v vs t are displayed.

If you click on "Phase plane", you will see a plot of v vs u, as well as the plots of u and v vs t, but cannot see the equation and variable limits you have chosen.

Developed by Daniel Kleitman and Jean-Michel Claus.

Modes: Function input, Phase plane. We define v = du/dt. Controls: t_{min}, t_{max}, u_{min}, u_{max}. Option: Show v = du/dt.

SERIES RLC CIRCUIT

A series RLC circuit obeys the differential equation

Lq'' + Rq' + q/C = V_{0}e^{iωt}

In this applet you can change the values of L (in millihenrys), R (in ohms), C (in millifarads), V_{0} (in volts), and the frequency f (which is ω/(2π)) by using the corresponding sliders or by clicking on the adjacent circles with arrows.

The steady-state response amplitude of the charge q, here called Q_{0} (in tenths of a coulomb, green), and that of the current q', denoted here by I_{0} (in amperes, red), are pictured along with V_{0} (blue).

Below the sliders, the following are given for the settings you choose: the amplitudes I_{0} and Q_{0} of current and charge, the ratio |z| of the voltage and current amplitudes, the phase difference (in degrees) between the current I and the voltage V, and the resonance frequency f_{0}.
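The quantities listed above follow from substituting q = Qe^{iωt} into the differential equation, which gives Q = V_{0}/(1/C - Lω^{2} + iRω); the current amplitude is then iωQ = V_{0}/z, with z = R + i(ωL - 1/(ωC)). The sketch below computes them directly. It is an illustration, not the applet's code: the function name `rlc_steady_state` is made up, and it works in SI units (henrys, farads, coulombs) rather than the applet's millihenrys, millifarads, and tenths of a coulomb.

```python
import cmath
import math


def rlc_steady_state(L, R, C, V0, f):
    """Steady-state response of L q'' + R q' + q/C = V0 e^{i w t}.

    SI units are assumed: L in henrys, R in ohms, C in farads,
    V0 in volts, f in hertz (with w = 2 pi f).
    """
    w = 2 * math.pi * f
    z = complex(R, w * L - 1 / (w * C))  # complex impedance
    I0 = V0 / abs(z)                     # current amplitude, amperes
    Q0 = I0 / w                          # charge amplitude, coulombs
    # Current is V/z, so its phase relative to the voltage is -arg(z).
    phase_deg = math.degrees(-cmath.phase(z))
    f0 = 1 / (2 * math.pi * math.sqrt(L * C))  # resonance frequency, hertz
    return {"I0": I0, "Q0": Q0, "|z|": abs(z), "phase_deg": phase_deg, "f0": f0}


# Example values (hypothetical): L = 10 mH, R = 5 ohms, C = 1 mF, V0 = 10 V.
L, R, C, V0 = 0.01, 5.0, 0.001, 10.0
f0 = rlc_steady_state(L, R, C, V0, 60.0)["f0"]
at_resonance = rlc_steady_state(L, R, C, V0, f0)
print("resonance frequency (Hz):", f0)
print("at resonance:", at_resonance)
```

At f = f_{0} the terms ωL and 1/(ωC) cancel, so |z| reduces to R, the current amplitude is V_{0}/R, and current and voltage are in phase.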

Developed by Daniel Kleitman and Jean-Michel Claus.

SLOPE OF A LINE

This applet illustrates the meaning of the slope and intercept parameters of a straight line, by graphing the line they define, and allowing you to change them as you wish.

You can change the slope a and y-intercept b by moving the sliders or by clicking inside the circles that contain arrows.
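The roles of the two parameters can also be checked numerically. In this small sketch (the helper names `line` and `slope_between` are made up for the example), the slope a is recovered as rise over run between any two points on the line y = ax + b, and the intercept b is the value at x = 0.

```python
def line(a, b):
    """Return the function x -> a*x + b."""
    return lambda x: a * x + b


def slope_between(y, x1, x2):
    """Rise over run between the points of the graph of y at x1 and x2."""
    return (y(x2) - y(x1)) / (x2 - x1)


y = line(2, -1)                   # slope a = 2, y-intercept b = -1
print(y(0))                       # value at x = 0, which is the intercept b
print(slope_between(y, 0, 4))     # the same for any two distinct x values
print(slope_between(y, -3, 10))
```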

Developed by Daniel Kleitman and Jean-Michel Claus.

The line is defined by y = ax + b. Controls: a_{min}, a_{max}, b_{min}, b_{max}. Options: Show a, Show b.

TRIGONOMETRIC FUNCTIONS

You can move the point on the unit circle on the left by clicking on it and dragging it around the circle, or by using the slider below the diagram. Each function chosen is graphed on the right. (The cosine and sine are the x and y coordinates of that point. The magnitude of the tangent is the length of the segment of the tangent line to the circle between that point and the x axis, and the secant is the x coordinate of the point at which the tangent line hits the x axis. The cotangent and cosecant are the same with x and y switched. All signs are positive in the first quadrant.)
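These geometric descriptions can be checked with a short computation. The tangent line to the unit circle at the point (c, s) is xc + ys = 1, so it meets the x axis at x = 1/c and the y axis at y = 1/s. The sketch below (the function name `circle_trig` is made up for this example, and it assumes an angle in the first quadrant, where all signs are positive) measures the lengths along that tangent line and compares them with the library values.

```python
import math


def circle_trig(theta):
    """Read the six functions off the unit circle, for theta in the first quadrant."""
    c, s = math.cos(theta), math.sin(theta)  # coordinates of the point
    x_hit = 1 / c  # where the tangent line x*c + y*s = 1 meets the x axis
    y_hit = 1 / s  # where the same line meets the y axis
    return {
        "cosine": c,
        "sine": s,
        "secant": x_hit,
        "cosecant": y_hit,
        # Lengths along the tangent line from the point to each axis.
        "tangent": math.hypot(x_hit - c, s),
        "cotangent": math.hypot(c, y_hit - s),
    }


theta = math.pi / 6
vals = circle_trig(theta)
print("tangent from geometry:", vals["tangent"], " math.tan:", math.tan(theta))
print("cotangent from geometry:", vals["cotangent"], " 1/tan:", 1 / math.tan(theta))
```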

I get dizzy if I try to look at all these functions at once.

Developed by Daniel Kleitman and Jean-Michel Claus.

Functions available: cosine, sine, tangent, cotangent, secant, cosecant.