I was recently chatting over **WhatsApp** with a junior of mine who is pursuing economics in college. Let’s call him Amar (let it be a tribute to Aamir Khan’s character in the 90’s cult film Andaz Apna Apna!). Amar is generally a jovial, happening guy, but that day he was pretty subdued and forlorn, even over text! After some prodding, he revealed, ‘*I am deeply concerned about how well the economy is doing and what would actually improve its performance. But the mathematical and statistical jargon required to understand the econometrics paper will make me flunk this semester. I am doomed*.’ I could totally sympathize with Amar, having gone through a similar phase myself. Right then, **I decided to simplify it** to the point where not just he, but any layman could understand it. I wish somebody had done for me what I am about to attempt for him.

Why would you, the one reading this, care about econometrics at all (let’s call it ‘metrics for short; it makes it sound less daunting!)? Hmmm… Let’s see. Given that you are reading this on the internet, there is a good chance that you are active on **social media** (duh). Whether you are a student, a working professional, or a retired member of the community, you have surely been lectured by someone or other about how unproductive and time-killing it is, regardless of how addicted to it the self-professed lecturer might be! Anyhow, you would want to assess that claim for yourself, wouldn’t you? Let’s pick **Facebook**, for instance. How long do you spend on it every day? What causes you to spend that much time on Facebook? And how long would you spend on it if you could control these factors? Maybe ‘metrics could answer these questions. You might just learn something about yourself today!

OK. Let’s denote the number of hours you spend on Facebook daily by *FBtime*. To me, the following factors seem to influence this for you every day (there could be a host of others too, but let’s keep it simple, honey!):

- The amount of free time you have during the day (*FreeTime*)
- Whether you are bored (*Bored*)

You would go about it by collecting data on each of these factors (*FreeTime* and *Bored*) and on *FBtime* for yourself, every day. *FBtime* and *FreeTime* would be measured in hours, and *Bored* as a ‘yes’ or a ‘no’ (i.e. 1 or 0; brilliant, isn’t it!). The values of all three for a given day make up one point. For instance, *FBtime* = 4, *FreeTime* = 6, and *Bored* = 1 would be a single point. After collecting this for several days, you would obtain a sea of data with one row per day and three columns.
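To make the ‘sea’ concrete, here is a minimal Python sketch of how such a log might be stored: one row per day, three columns. Apart from the first row, which is the example point above, the numbers are invented purely for illustration.

```python
# Each row is one day (one "point"); the columns are FBtime, FreeTime, Bored.
# Only the first row comes from the text; the rest are made-up values.
days = [
    # (FBtime hours, FreeTime hours, Bored: 1 = yes, 0 = no)
    (4.0, 6.0, 1),
    (2.5, 4.0, 0),
    (5.0, 8.0, 1),
    (3.0, 5.0, 0),
    (1.5, 3.0, 0),
]

n_rows = len(days)     # number of days observed
n_cols = len(days[0])  # number of variables recorded per day

print(f"{n_rows} rows (days) x {n_cols} columns")
```

In practice you would keep appending one tuple per day; a spreadsheet with the same three columns works just as well.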

**KEEPING IT SIMPLE**

Your ultimate aim in all of this would be to find out your *FBtime* given the level of each factor, so that you could control your own behaviour. For example, your *FBtime* = 3 if *FreeTime* = 5. ‘Metrics helps you with that by telling you how many additional hours you would spend on Facebook if each of the factors were present or increased (as the case may be). For example, your *FBtime* increases from 4 hours to 4.5 hours if your *FreeTime* increases from 6 hours to 8 hours. For the time being, let us consider only *FBtime* and *FreeTime* for the typical Amar.

Remember the sea of data we had collected before? What if we plotted your *FBtime* against your *FreeTime*? Intuitively, the former would keep increasing as the latter goes up, as shown below.

Now you could ask why these points do not form a straight line if they have a positive relationship, and it is a legit question. The answer is that in the social arena (i.e. where human beings like Amar are the main actors), such perfect relationships are rare because *we have an in-built randomness in our behaviour*. If you are given 5 hours of free time on two separate days, would you definitely spend 4 hours on FB on both days? Not quite. You might spend 3 hours on the first and maybe all 5 on the second. Given that, the scatterplot by itself doesn’t tell us much, right? After all, it’s just a bunch of points. To make it easier, we instead try to *guess the position of a straight line* running through them (assuming, just for the moment, that the relationship were perfect), which gives us a clearer idea. We’ll call it the *indicative line*. It would look something like this:

This looks nice and gives a clear picture! As *FreeTime* increases, the indicative line predicts an increased *FBtime*. Now, the vertical distance between the indicative line and each of the dots is what we call the *error* for that value of *FreeTime*. The small lines in the image below (only 2 are shown, but the rest can be drawn as well) are the amounts of error for each value of *FreeTime*. What this means is that the indicative line differs from your actual *FBtime* by the amount of the error at that level.
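Here is a small Python sketch of the error in numbers. The data points and the guessed line (*FBtime* = 0.5 + 0.5 × *FreeTime*) are hypothetical; each error is simply the actual *FBtime* minus the height of the line at that *FreeTime*, so a dot above the line has a positive error and a dot below it a negative one.

```python
# Hypothetical (FreeTime, FBtime) pairs in hours, invented for illustration.
data = [(2, 1.5), (4, 2.0), (5, 3.0), (6, 4.0), (8, 4.5), (9, 5.0)]

def errors_for_line(intercept, slope, points):
    """Error at each point: actual FBtime minus the line's FBtime at that FreeTime."""
    return [fb - (intercept + slope * ft) for ft, fb in points]

# A guessed indicative line: FBtime = 0.5 + 0.5 * FreeTime (an assumption).
errs = errors_for_line(0.5, 0.5, data)
print(errs)  # [0.0, -0.5, 0.0, 0.5, 0.0, 0.0]
```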

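Naturally, you would want to **construct an indicative line whose errors, taken together, are as small as possible**. In practice, each error is squared before adding them all up, so that dots above the line (positive errors) and dots below it (negative errors) cannot cancel each other out, and the line with the smallest total is chosen. This is the basic method applied in ‘metrics.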
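For a single factor, the line with the smallest total of squared errors can be written down directly; this is the textbook ordinary least squares formula. The Python sketch below applies it to hypothetical data and then checks that nudging either coefficient makes the total squared error worse. The helper names `fit_line` and `sum_squared_errors` are illustrative, and B1 and B2 are the intercept and slope of the line.

```python
# Hypothetical (FreeTime, FBtime) pairs in hours, invented for illustration.
data = [(2, 1.5), (4, 2.0), (5, 3.0), (6, 4.0), (8, 4.5), (9, 5.0)]

def fit_line(points):
    """Ordinary least squares with one factor; returns (B1, B2).

    B2 = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
    B1 = y_mean - B2 * x_mean
    """
    n = len(points)
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in points)
    sxx = sum((x - x_mean) ** 2 for x, _ in points)
    b2 = sxy / sxx
    b1 = y_mean - b2 * x_mean
    return b1, b2

def sum_squared_errors(b1, b2, points):
    """Total of the squared errors for the line FBtime = b1 + b2 * FreeTime."""
    return sum((y - (b1 + b2 * x)) ** 2 for x, y in points)

b1, b2 = fit_line(data)
print(f"B1 = {b1:.3f}, B2 = {b2:.3f}")  # B1 = 0.330, B2 = 0.530

# Any other line has a larger total squared error than the OLS line.
best = sum_squared_errors(b1, b2, data)
print(best < sum_squared_errors(b1 + 0.1, b2, data))   # True
print(best < sum_squared_errors(b1, b2 - 0.05, data))  # True
```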

Our simple version looks like this:

*FBtime* (line) = B1 + B2 (*FreeTime*)

[Notice how the words ‘time’ and ‘line’ above are pertinent to our context of Facebook?!]

B1 is the number of hours you would spend on Facebook even if you had no free time at all (how much do you think that would be for you?!). B2 is the slope of the indicative line and tells us by how much *FBtime* would increase with a one-hour increase in *FreeTime*. *Basically, using the sea of data on the variables, we calculate the values of B1 and B2. Once we have them, we can plug in any value of FreeTime to see what we can expect FBtime to be at that level.* For example, given the calculated values of B1 and B2, we plug in *FreeTime* = 7 and obtain *FBtime* = 2 (suppose). Therefore, *the calculation of B1 and B2 is of utmost importance, and it depends a lot on the error term*. If you wish to write out each of the actual values of *FBtime* (i.e. the dots), it would look like the following:

*FBtime* (actual) = **B1 + B2 (*FreeTime*)** + error
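Once B1 and B2 are in hand, prediction is just arithmetic. In this Python sketch the coefficient values are hypothetical, picked only so that *FreeTime* = 7 gives *FBtime* = 2 as in the example above; the last lines split one made-up actual day into ‘line + error’.

```python
# Hypothetical coefficients; real values would be calculated from your data.
B1 = 0.25  # hours on Facebook even with no free time
B2 = 0.25  # extra hours of FBtime per extra hour of FreeTime

def predict_fbtime(free_time):
    """Height of the indicative line at a given FreeTime (hours)."""
    return B1 + B2 * free_time

print(predict_fbtime(7))    # 2.0
print(predict_fbtime(7.5))  # 2.125, a FreeTime value you may never have recorded

# FBtime (actual) = line + error, for a made-up day with 7 hours of FreeTime.
actual = 2.5
error = actual - predict_fbtime(7)
print(error)  # 0.5
```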

**ASSUMING IT AWAY**

In essence, the error term is just a stand-in for the factors behind *FBtime* that we have not included in our equation. *The error is different for each point, but how the errors behave as a group matters*. As I said above, the behaviour of the error term is vital to the calculation of B1 and B2, and if we cannot calculate those accurately, it all becomes a moot point. Thus, to keep it simple, *we assume certain things about the error* (and later break those assumptions to come closer to reality; well, that is how economists work!). The first is that *the error does not systematically push the dots above or below the indicative line*: at every level of *FreeTime*, it averages out to zero. Otherwise, we would get a line that diverges from (or converges to) the true one, or one that sits at a constant distance from it. The second says that *the actual FBtime values are, on average, equally spread around the indicative line at every level of FreeTime*, and are thus equally reliable. And the third says that *errors should not depend on each other*; otherwise, reliable calculation of B1 and B2 becomes an issue.
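One way to see what these assumptions buy us is to simulate data in which all three hold by construction, then check that the least squares line recovers the values we started from. In this Python sketch the ‘true’ B1 = 0.5 and B2 = 0.4, the error spread of 0.7 hours, and the 5000 simulated days are all assumptions chosen for the demo.

```python
import random

def fit_line(points):
    """Ordinary least squares with one factor; returns (B1, B2)."""
    n = len(points)
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in points)
    sxx = sum((x - x_mean) ** 2 for x, _ in points)
    b2 = sxy / sxx
    return y_mean - b2 * x_mean, b2

rng = random.Random(42)      # fixed seed so the run is repeatable
TRUE_B1, TRUE_B2 = 0.5, 0.4  # assumed "true" values, chosen for the demo

points = []
for _ in range(5000):  # 5000 simulated days
    free_time = rng.uniform(1, 10)
    # Each day's error is drawn independently, centred on 0, with the same
    # spread at every FreeTime: the three assumptions hold by construction.
    error = rng.gauss(0, 0.7)
    points.append((free_time, TRUE_B1 + TRUE_B2 * free_time + error))

b1, b2 = fit_line(points)
print(f"estimated B1 = {b1:.2f} (true {TRUE_B1}), B2 = {b2:.2f} (true {TRUE_B2})")
```

Rerunning with different seeds gives estimates that scatter tightly around the true values; the later sections show what happens when the assumptions are broken.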

**JUST FREE TIME, OR SOMETHING ELSE TOO?**

I mentioned above that we had not considered all the factors behind *FBtime*. Obviously, Amar does not have just free time during the day, right? He has college to attend and studying to do at home, just as you may have an office to go to or work-from-home to get through. Intuitively, if your work hours went up, you would have less free time and would spend less time on FB. So it seems natural to include this factor in our exercise, doesn’t it? Let’s call it *WorkHours*. It should give us a clearer picture. Our simple equation then changes to this:

*FBtime* (line) = B1 + B2 (*FreeTime*) + B3 (*WorkHours*)

There is just a slight change in how we read this. *B2 now tells us by how much FBtime would change with a one-hour increase in FreeTime, keeping WorkHours constant*. Likewise, B3 tells us by how much *FBtime* would change with a one-hour increase in *WorkHours*, keeping *FreeTime* constant. The idea behind B1 remains the same as before. So far so good.

But hold your horses. Don’t you think *FreeTime* and *WorkHours* are two sides of the same coin? If these are the only two things you do in the 24 hours of the day (just for simplicity), then *FreeTime* + *WorkHours* = 24. Or, *WorkHours* = 24 – *FreeTime*. Here, I said B2 shows the change in *FBtime* with a one-hour increase in *FreeTime*, **keeping WorkHours the same**. But if these two are related as above, there is no way to keep *WorkHours* constant while changing *FreeTime*. If *FreeTime* is 8, *WorkHours* becomes 16. If *FreeTime* is 10, *WorkHours* becomes 14. Try it for yourself! No matter what you do, you cannot keep one constant and change the other.

If we are to include more than one factor, none of those factors should be exactly related to the others, as *FreeTime* and *WorkHours* are here; otherwise, you cannot calculate B1, B2, and B3. Even a close but inexact relationship makes them hard to pin down reliably. Since the relationship here is exact, we cannot include *WorkHours*. Moral of the story: **add a factor only if it is not closely related to another one; otherwise, keep it simple**.
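A quick way to see the problem: when *WorkHours* = 24 – *FreeTime*, very different sets of B1, B2, and B3 draw exactly the same line. The Python sketch below uses hypothetical coefficient values; raising B2 and B3 by the same amount c and lowering B1 by 24 × c leaves every prediction unchanged, so no amount of data could tell the two sets apart.

```python
def work_hours(free_time):
    """In this simplified 24-hour day, WorkHours is fully fixed by FreeTime."""
    return 24 - free_time

def line(b1, b2, b3, free_time):
    """FBtime (line) = B1 + B2 * FreeTime + B3 * WorkHours."""
    return b1 + b2 * free_time + b3 * work_hours(free_time)

c = 0.3                                     # any shift works; 0.3 is arbitrary
set_a = (1.0, 0.4, -0.1)                    # hypothetical B1, B2, B3
set_b = (1.0 - 24 * c, 0.4 + c, -0.1 + c)   # shifted version of set_a

for ft in [2, 5, 8, 10]:
    # Both columns agree (up to floating-point rounding) at every FreeTime.
    print(ft, round(line(*set_a, ft), 6), round(line(*set_b, ft), 6))
```

The algebra behind it: (B1 – 24c) + (B2 + c)·*FreeTime* + (B3 + c)·(24 – *FreeTime*) simplifies back to B1 + B2·*FreeTime* + B3·(24 – *FreeTime*).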

**DUMMIES…. FOR DUMMIES!**

A factor I mentioned earlier, which is not related to *FreeTime* and which is bound to affect how long Amar spends on FB, is **whether he is bored**. Don’t you reach for your newsfeed several times a day just because you don’t really have anything else to do? Gotcha. Let’s call it *Bored*. The typical Amar is either bored (1) or he isn’t (0); this factor is either present or absent. The simple equation then changes to this:

*FBtime* (line) = B1 + B2 (*FreeTime*) + B3 (*Bored*)

B3 now tells you *how many more hours you would spend on FB if you were bored*, for a given level of *FreeTime*. From the example image below, you can see that if your *FreeTime* = 6, you would spend 3 hours on FB if you are not bored, but around 5 hours if you are, indeed, bored. A factor like this, which can only take the values 0 or 1, is called a *dummy* in ‘metrics.
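The same least squares idea extends to two factors, one of them a dummy. This Python sketch uses hypothetical, noise-free data generated from *FBtime* = 0 + 0.5 × *FreeTime* + 2 × *Bored* (chosen to match the example of 3 vs. 5 hours at *FreeTime* = 6), so the fit should hand back those values exactly. The small solver and the helper names `solve` and `fit_ols` are illustrative; a real analysis would use a statistics library.

```python
def solve(a, b):
    """Solve a small linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(rows, y):
    """Least squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical days: (FreeTime, Bored). FBtime follows the assumed equation exactly.
days = [(3, 0), (4, 0), (6, 0), (8, 0), (2, 1), (5, 1), (6, 1), (9, 1)]
fbtime = [0.5 * ft + 2.0 * bored for ft, bored in days]
rows = [(1.0, ft, bored) for ft, bored in days]  # leading 1.0 pairs with B1

b1, b2, b3 = fit_ols(rows, fbtime)

def predict(free_time, bored):
    return b1 + b2 * free_time + b3 * bored

print(round(predict(6, 0), 2), round(predict(6, 1), 2))  # 3.0 5.0
```

The dummy simply shifts the whole line up by B3 on bored days, leaving its slope unchanged.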

**IT’S TOO FAR APART!**

In our simple equation with *FreeTime* as the sole factor, we had assumed that the actual *FBtime* values (i.e. the dots) are, on average, equally spread around the indicative line. In reality, however, this may not be the case (breaking the assumptions!). As I said before, there might be more than one *FBtime* value for a given *FreeTime*. For instance, you may have 5 hours of *FreeTime* on three separate days, but spend 3 hours, 2 hours, and 4 hours on Facebook respectively on those days. Thus, we have three different dots for a single *FreeTime* value of 5 hours.

Now, what happens is that when Amar has a large amount of *FreeTime* on certain days, like 8 or 9 hours, there is a constant battle in his head. One part of him wants to use this extra time productively, while the other wants to kill it. Thus, on some of these days he is productive and spends less time on FB (thanks, guilt!), while on others he isn’t and has a high *FBtime* value. This may not happen with low values of *FreeTime*, because things are pretty structured then. Therefore, the spread of the dots around the indicative line increases as *FreeTime* goes up, as shown below. This is an issue that affects how far we can trust our calculated B1 and B2, but you don’t have to worry about the remedies as of now!
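To see this pattern in numbers, the Python sketch below simulates days where, by assumption, the spread of the error grows in proportion to *FreeTime* (the 0.15 factor, the true coefficients, and the seed are all invented for the demo). After fitting the line, it compares how widely the errors scatter on low-*FreeTime* days versus high-*FreeTime* days.

```python
import random
import statistics

def fit_line(points):
    """Ordinary least squares with one factor; returns (B1, B2)."""
    n = len(points)
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in points)
    sxx = sum((x - x_mean) ** 2 for x, _ in points)
    b2 = sxy / sxx
    return y_mean - b2 * x_mean, b2

rng = random.Random(7)  # fixed seed so the run is repeatable
points = []
for _ in range(3000):
    ft = rng.uniform(1, 10)
    error = rng.gauss(0, 0.15 * ft)  # assumed: spread grows with FreeTime
    points.append((ft, 0.5 + 0.4 * ft + error))

b1, b2 = fit_line(points)
errors = [(ft, fb - (b1 + b2 * ft)) for ft, fb in points]

# Standard deviation of the errors within each group of days.
spread_low = statistics.pstdev([e for ft, e in errors if ft <= 4])
spread_high = statistics.pstdev([e for ft, e in errors if ft >= 8])
print(f"spread when FreeTime <= 4: {spread_low:.2f}")
print(f"spread when FreeTime >= 8: {spread_high:.2f}")
```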

**SHALL I BE ON FACEBOOK MORE SO TODAY THAN YESTERDAY?**

So it seems that the factors *FreeTime* and *Bored* explain a lot about *FBtime*. We may just have found a simple explanation! But wait a second. What if your *FBtime* today depends on your *FBtime* yesterday? Whenever we spend extra time on Facebook, don’t we console ourselves that it is just a tad more than yesterday? Or, on days when we would otherwise spend a little less than usual, don’t we end up spending at least as much as we did the previous day, consoling ourselves again? Since this has not been accounted for, the error term will capture it, and today’s error will then depend on yesterday’s. This poses problems for calculating B1 and B2 once again, and one has to carry out remedial measures. But as before, don’t worry about those at this stage of introduction!
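A simple way to check whether errors depend on each other is to correlate each day’s error with the previous day’s. This Python sketch simulates the error term directly and compares independent errors with errors where, by assumption, 70% of yesterday’s error carries over into today (the 0.7 factor, series length, and seed are all invented for illustration).

```python
import random

def lag1_correlation(series):
    """Correlation between each value and the value one day earlier."""
    x, y = series[:-1], series[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

rng = random.Random(1)  # fixed seed so the run is repeatable

# Errors that follow the third assumption: each day drawn independently.
independent = [rng.gauss(0, 1) for _ in range(2000)]

# Errors that break it: part of yesterday's error carries into today.
carried_over = []
prev = 0.0
for _ in range(2000):
    prev = 0.7 * prev + rng.gauss(0, 1)
    carried_over.append(prev)

r_ind = lag1_correlation(independent)
r_ar = lag1_correlation(carried_over)
print(f"independent errors: {r_ind:.2f}")   # close to 0
print(f"carried-over errors: {r_ar:.2f}")   # close to 0.7
```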

**TAKEAWAYS**

Using the data you collected on yourself, you created an indicative line that can tell you (approximately) how long you would spend on Facebook as factors like *FreeTime* and *Bored* change. The great thing about this is that *you can now predict your FBtime even for values of FreeTime and Bored that do not appear in your data*. **This is the true power of ‘metrics: prediction, mostly for the purpose of controlling outcomes**. I hope Amar reads this and finds it useful!

As for your *FBtime*, try to allocate more of your *FreeTime* to other activities, be less *Bored* by spending time on your passions, and don’t base your *FBtime* on yesterday’s!

**SIMPLIFIED JARGON**

I simplified a lot of the material so that even a layman could understand ‘metrics. Here is a short list of the jargon you will find in ‘metrics books, along with the simple terms I used instead:

Point = **Observation**

Factor = **Regressor or Explanatory Variable**

Indicative line = **Regression Line**

B1 = **Intercept**

B2 and B3 = **Regression coefficients**

Basic method applied in ‘metrics = **Ordinary Least Squares Estimation**

Simple equation = **Classical Linear Regression Model**

Error does not systematically affect the indicative line = **Zero conditional mean of the error term**

Actual *FBtime* values are, on average, equally spread around the indicative line = **Homoscedasticity**

Error terms should not be dependent on each other = **No autocorrelation** (when they are dependent, it is called **Autocorrelation**)

*FreeTime*and *WorkHours*are two sides of the same coin = **Multicollinearity**

**P.S.** Amar is a fictional character (duh!).

**P.P.S.** Any resemblance of the character to a person living or dead is purely coincidental. No animals were hurt during the production of this story.

**P.P.P.S.** I’m not a psychologist. If you find any irregularities regarding how the human brain actually decides how much time to spend on Facebook, please keep them to yourselves!

*By Mr. Shiv Hastawala*

*Former Research Associate (Econ), Indian Institute of Management Lucknow*