I was recently chatting over WhatsApp with a junior of mine who is studying economics in college. Let’s call him Amar (let it be a tribute to Aamir Khan’s character from the ’90s cult film Andaz Apna Apna!). Amar is generally a jovial and happening guy, but that day he was pretty subdued and forlorn, even over text! After some pressing, he revealed, ‘I am deeply concerned about how well the economy is doing and what would actually improve its performance. But the mathematical and statistical jargon required to understand that in the econometrics paper will make me flunk this semester. I am doomed.’ I could totally sympathize with Amar, having gone through a similar phase myself. Right then, I decided to simplify it to the extent that not just he, but any layman could understand it. I wish somebody had done for me what I am about to attempt for him.
Why would you, the one reading this, care about econometrics at all (let’s call it ‘metrics for short; it makes it sound less daunting!)? Hmmm… Let’s see. Given that you are reading this on the internet, there is a good chance that you are active on social media (duh). Whether you are a student, a working professional, or a retired member of the community, you have surely been lectured by someone or other about how unproductive and time-killing it is, regardless of how addicted to it the self-appointed lecturer might be! Anyhow, you would want to assess it for yourself, wouldn’t you? Let’s pick Facebook, for instance. How long do you spend on it every day? What causes you to spend that much time on Facebook? And how long would you spend on it if you could control these factors? Maybe ‘metrics could answer these questions. You might just learn something about yourself today!
OK. Let’s denote the number of hours you spend on Facebook daily by FBtime. To me, the following factors seem to influence this for you every day (there could be a host of others too, but let’s keep it simple, honey!):
- The amount of free time you have during the day (FreeTime)
- Whether you are bored (Bored)
So, you would go about it by collecting data on each of these factors (FreeTime and Bored) and on FBtime for yourself for each day. FBtime and FreeTime would be measured in hours, and Bored as a ‘yes’ or a ‘no’ (i.e. 1 or 0; brilliant, isn’t it!). The values of all three for a given day would make up one point. For instance, FBtime = 4, FreeTime = 6, and Bored = 1 would be a single point. After collecting this for several days, you would obtain a sea of data with several rows (one per day) and three columns.
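To make the "sea of data" concrete, here is a minimal sketch in Python of what a few days of such a diary could look like. The numbers and the name `diary` are made up purely for illustration; only the first row comes from the example above.

```python
# Hypothetical diary: one tuple per day, in the order
# (FBtime in hours, FreeTime in hours, Bored as 1 for yes / 0 for no).
diary = [
    (4.0, 6.0, 1),  # the example point from the text
    (2.0, 4.0, 0),  # the remaining rows are invented
    (3.0, 5.0, 0),
    (5.0, 8.0, 1),
    (2.5, 3.0, 1),
]

n_points = len(diary)       # rows: one point (observation) per day
n_columns = len(diary[0])   # columns: FBtime, FreeTime, Bored

for fb_time, free_time, bored in diary:
    print(f"FBtime = {fb_time}, FreeTime = {free_time}, Bored = {bored}")
```

Each tuple is one point, and the whole list is the table of rows and columns that everything afterwards is calculated from.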
KEEPING IT SIMPLE
Your ultimate aim in all of this would be to find out your FBtime given the level of each factor so that you could control your own behaviour. For example, your FBtime = 3 if FreeTime = 5. The way ‘metrics helps you with that is by telling you how many additional hours you would spend on Facebook if each of the factors were to be present or increased (as the case may be). For example, your FBtime increases from 4 hours to 4.5 hours if your FreeTime increases from 6 hours to 8 hours. For the time being, let us consider only these two, FBtime and FreeTime, for the typical Amar.
Remember the sea of data we had collected before? What if we plotted your FBtime against your FreeTime? Intuitively, the former would keep increasing as the latter goes up, as shown below.
Now you could ask why these points do not form a straight line if they have a positive relationship, and it is a legit question. The answer is that in the social arena (i.e. where human beings like Amar are the main actors), such perfect relationships are rare because we have an in-built randomness in our behaviour. If you are given 5 hours of free time on two separate days, would you definitely spend 4 hours on FB on both the days? Not quite. You might spend 3 hours on the first and maybe all of the 5 on the second. Given that, the scatterplot in itself doesn’t make much sense, right? After all, it’s just a bunch of points. To make it easier for you, we instead try to guess the position of this straight line (just assuming for the moment that the relationship was perfect) which could give us a better idea. We’ll call it the indicative line. It would look something like this:
This looks nice and gives a clear picture! As FreeTime increases, the indicative line predicts an increased FBtime. Now the vertical gap between the indicative line and each of the dots is what we would call the error for that point. The small lines shown in the image below (only 2 are shown; the rest can be drawn in the same way) are the amounts of error for those points. What this means is that the indicative line differs from your actual FBtime by the amount of the error at that point.
Naturally, you would want to construct an indicative line that keeps these errors, taken together, as small as possible. In practice, the errors are squared before being added up, so that errors above and below the line do not cancel each other out. This is the basic method applied in ‘metrics.
Our simple version looks like this:
FBtime (line) = B1 + B2 (FreeTime)
[Notice how the words ‘time’ and ‘line’ above are pertinent to our context of Facebook?!]
B1 is the number of hours you would spend on Facebook even if you did not have any free time (how much do you think that would be for you?!). B2 is the slope of the indicative line and tells us by how much FBtime would increase with a one-hour increase in FreeTime. Basically, using the sea of data on the variables, we calculate the values of B1 and B2. Once we have them, we can insert any value of FreeTime to see what we can expect FBtime to be at that level. For example, given the calculated values of B1 and B2, we insert FreeTime = 7 and obtain FBtime = 2 (suppose). Therefore, the calculation of B1 and B2 is of utmost importance, and this depends a lot on the error term. If you wish to look at each of the actual values of FBtime (i.e., the dots), the equation would look like the following:
FBtime (actual) = B1 + B2 (FreeTime) + error
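If you are curious how B1 and B2 could actually be calculated, here is a small sketch in Python using the standard textbook formulas for a line with a single factor. The four days of data and the helper name `fit_line` are assumptions made up for illustration, not figures from a real diary.

```python
# Hypothetical data: hours of free time and hours on Facebook for four days.
free_time = [2.0, 4.0, 6.0, 8.0]
fb_time = [1.5, 2.0, 3.5, 4.0]


def fit_line(x, y):
    """Return (B1, B2) for the line that minimises the sum of squared errors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # B2 (slope): how x and y move together, divided by how much x moves.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b2 = sxy / sxx
    # B1 (intercept): makes the line pass through the point of averages.
    b1 = mean_y - b2 * mean_x
    return b1, b2


b1, b2 = fit_line(free_time, fb_time)

# FBtime (line) for each day, and error = FBtime (actual) - FBtime (line).
line_values = [b1 + b2 * f for f in free_time]
errors = [actual - line for actual, line in zip(fb_time, line_values)]

# Prediction for a FreeTime value that is not in the data.
predicted_at_7 = b1 + b2 * 7.0

print(f"B1 = {b1:.2f}, B2 = {b2:.2f}")
print("errors:", [round(e, 2) for e in errors])
print(f"Predicted FBtime at FreeTime = 7: {predicted_at_7:.2f}")
```

With these made-up numbers, B1 works out to about 0.5 and B2 to about 0.45, so 7 hours of free time would predict roughly 3.65 hours on Facebook. Notice that the errors add up to (practically) zero: some dots sit above the line and some below.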
ASSUMING IT AWAY
In essence, we can say that the error term is just a representation of the factors of FBtime that we have not included in our equation. The error is different for each point, but how they behave in a group matters. Like I said above, the behaviour of the error term is vital to the calculation of the terms B1 and B2, and if we are unable to calculate those accurately, it all becomes a moot point. Thus, to keep it simple, we assume certain things about the error (and later break those assumptions to come closer to reality; well, that is how economists work!). One of them is that the error does not systematically affect the indicative line, otherwise we would have a line that either diverges from (or converges to) the actual one or one that is at a constant distance from it. The second one says that the actual FBtime values are, on the average, equally spread around the indicative line at every level of FreeTime and are thus equally reliable. And the third says that the errors should not depend on each other, otherwise reliable calculation of B1 and B2 becomes an issue.
JUST FREE TIME, OR SOMETHING ELSE TOO?
I had mentioned above that we had not considered all factors for FBtime. Obviously, Amar does not have just free time during the day, right? He has college to attend and studying to do at home. Just like you may have an office to go to or work-from-home to attempt. Intuitively, if you could increase your work hours, you would have less free time and would spend less time on FB. So, it seems natural to include this factor in our exercise, doesn’t it? Let’s call it WorkHours. It should give us a clearer picture. Our simple equation then changes to this:
FBtime (line) = B1 + B2 (FreeTime) + B3 (WorkHours)
Just a slight change in how we read this. B2 now tells us by how much FBtime would change with a one-hour increase in FreeTime, keeping WorkHours constant. Likewise, B3 tells us by how much FBtime would change with a one-hour increase in WorkHours, keeping FreeTime constant. The idea behind B1 remains the same as before. So far so good.
But hold your horses. Don’t you think FreeTime and WorkHours are two sides of the same coin? In the 24 hours of the day, if these are the only two things you do (just for simplicity), then FreeTime + WorkHours = 24. Or, WorkHours = 24 – FreeTime. Here, I said B2 shows the change in FBtime with a one hour increase in FreeTime, keeping WorkHours the same. But if these two are related as above, then there is no way to keep WorkHours constant while changing FreeTime. If FreeTime is 8, WorkHours becomes 16. If FreeTime is 10, WorkHours becomes 14. Try it for yourself! No matter what you do, you cannot keep one constant and change the other.
If we are to include more than one factor, none of those factors should be perfectly related to another, as FreeTime and WorkHours are here; otherwise, you cannot calculate B1, B2, and B3. Since this is the case here, we cannot include WorkHours. Moral of the story: add a factor only if it is not simply another factor in disguise; otherwise, keep it simple.
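Here is a small Python sketch of why the calculation breaks down. Because WorkHours = 24 – FreeTime, the equation B1 + B2 (FreeTime) + B3 (WorkHours) can be rewritten as (B1 + 24 × B3) + (B2 – B3)(FreeTime), so many different combinations of B1, B2, and B3 draw exactly the same line. The two sets of values below are hypothetical, picked only to show this.

```python
# Hypothetical days, assuming free time and work are the only two activities.
free_time = [2.0, 4.0, 6.0, 8.0]
work_hours = [24.0 - f for f in free_time]


def fb_line(b1, b2, b3, free, work):
    """FBtime (line) = B1 + B2 (FreeTime) + B3 (WorkHours)."""
    return b1 + b2 * free + b3 * work


# Two very different (hypothetical) sets of B1, B2, B3 ...
first = [fb_line(0.5, 0.45, 0.0, f, w) for f, w in zip(free_time, work_hours)]
second = [fb_line(-23.5, 1.45, 1.0, f, w) for f, w in zip(free_time, work_hours)]

# ... that give the same FBtime (line) on every single day.
same_line = all(abs(a - b) < 1e-9 for a, b in zip(first, second))
print("Both sets of values give the same line:", same_line)
```

Since the data cannot tell these two sets apart (or any of the countless others like them), there is no single answer for B1, B2, and B3.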
DUMMIES…. FOR DUMMIES!
A factor I had mentioned before which is not related to FreeTime and which is bound to affect how long Amar spends on FB is how bored he is. Don’t you reach out for your newsfeed several times during the day just because you don’t really have anything else to do? You got me. Let’s call it Bored. The typical Amar is either bored (1) or he isn’t (0). This factor is either present or absent. The simple equation then changes to this:
FBtime (line) = B1 + B2 (FreeTime) + B3 (Bored)
B3 now tells you, given your FreeTime, how many more hours you would spend on FB if you were bored. From the example image below, you can see that if your FreeTime = 6, then you would spend 3 hours on FB if you are not bored but around 5 hours on it if you are, indeed, bored. Such a factor, which can take only separate values like 0 or 1, is called a dummy in ‘metrics.
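In code, the dummy simply switches B3 on or off. The Python sketch below uses hypothetical values (B1 = 0, B2 = 0.5, B3 = 2), chosen only so that the result matches the 3-hour and 5-hour example above; they are not calculated from any real data.

```python
# Hypothetical values, picked to reproduce the example in the text.
B1 = 0.0   # hours on FB with no free time and not bored
B2 = 0.5   # extra FB hours per extra hour of free time
B3 = 2.0   # extra FB hours when bored, whatever the free time


def predict_fb_time(free_time, bored):
    """FBtime (line) = B1 + B2 (FreeTime) + B3 (Bored), with Bored as 0 or 1."""
    return B1 + B2 * free_time + B3 * bored


not_bored = predict_fb_time(6, 0)
bored = predict_fb_time(6, 1)

print(f"FreeTime = 6, not bored: {not_bored} hours")
print(f"FreeTime = 6, bored:     {bored} hours")
```

The gap between the two answers is B3, and it is the same at every level of FreeTime, which means the dummy shifts the whole line up without changing its slope.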
IT’S TOO FAR APART!
In our simple equation with just FreeTime as the sole factor, we had assumed that the actual FBtime values (i.e. the dots) are equally spread around the indicative line on the average. However, in reality, this may not be the case (breaking the assumptions!). Like I said before, there might be more than one FBtime value for a given FreeTime. For instance, you may have 5 hours of FreeTime on three separate days. But you may spend 3 hours, 2 hours, and 4 hours of FBtime respectively on those separate days. Thus, we have three different dots for a single FreeTime value of 5 hours.
Now, what happens is, when Amar has a large amount of FreeTime on certain days, like 8 hours or 9 hours, there is a constant battle in his head. One part of him wants to use this extra time productively while the other part wants to kill it. Thus, on some of these days, he is productive and spends less time on FB (thanks, guilt!), while on the others he isn’t and has a high FBtime value. This may not happen with low values of FreeTime because things are pretty structured then. Therefore, the spread of the dots around the indicative line increases with a higher value of FreeTime, which is shown below. This is an issue which affects our calculation of B1 and B2, but you don’t have to worry about the remedies as of now!
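One rough way to see this in your own data is to compare how spread out FBtime is on days with little free time and on days with plenty of it. The sketch below uses Python's built-in `statistics.pstdev` (a standard measure of spread) on made-up numbers; it is an informal check for illustration, not a formal test.

```python
from statistics import pstdev

# Hypothetical FBtime values (hours) on three days with 3 hours of free time
# and on three days with 9 hours of free time.
fb_time_low_free = [1.8, 2.0, 2.2]
fb_time_high_free = [2.0, 4.5, 7.0]

# pstdev measures how far the values typically sit from their own average.
spread_low = pstdev(fb_time_low_free)
spread_high = pstdev(fb_time_high_free)

print(f"Spread with little free time: {spread_low:.2f} hours")
print(f"Spread with lots of free time: {spread_high:.2f} hours")
```

If the second spread is much larger than the first, as it is with these invented numbers, the dots fan out as FreeTime grows and the "equally spread" assumption is in trouble.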
SHALL I BE ON FACEBOOK MORE SO TODAY THAN YESTERDAY?
So it seems that the factors FreeTime and Bored explain a lot about FBtime. We may just have found a simple explanation! But wait a second. What if your FBtime today depends on your FBtime yesterday? Doesn’t it happen that whenever we tend to spend extra time on Facebook, we console ourselves that it is just a tad more than yesterday? Or when we spend a little less than usual, we tend to spend at least as much as we did the previous day, consoling ourselves again? Well, since this has not been accounted for, the error term will capture it, and today’s error will then tend to depend on yesterday’s. This once again poses problems for calculating B1 and B2, and one has to carry out remedial measures. But as before, don’t worry about those at this stage of introduction!
Using the data that you had collected for yourself, you created an indicative line that could tell you (approximately) how long you would spend on Facebook if your factors like FreeTime and Bored changed. The great thing about this is that you can now predict your FBtime even for values of FreeTime and Bored which are not available in your data. This is the true power of ‘metrics: prediction, mostly for the purpose of controlling outcomes. I hope Amar reads this and finds it useful!
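A simple way to spot this kind of day-to-day dependence is to line up each day's error with the error from the day before and see whether they move together. The Python sketch below does this with an ordinary correlation on made-up errors; the numbers and the helper name `lag1_correlation` are assumptions for illustration only.

```python
# Hypothetical errors (actual FBtime minus the indicative line), in day order.
errors = [0.5, 0.4, 0.3, -0.2, -0.4, -0.3, 0.1, 0.3]


def lag1_correlation(values):
    """Correlation between each value and the one from the day before."""
    earlier = values[:-1]   # day 1 to day n-1
    later = values[1:]      # day 2 to day n
    n = len(earlier)
    mean_e = sum(earlier) / n
    mean_l = sum(later) / n
    cov = sum((e - mean_e) * (l - mean_l) for e, l in zip(earlier, later))
    var_e = sum((e - mean_e) ** 2 for e in earlier)
    var_l = sum((l - mean_l) ** 2 for l in later)
    return cov / (var_e * var_l) ** 0.5


r = lag1_correlation(errors)
print(f"Correlation between today's and yesterday's error: {r:.2f}")
```

A value near zero would suggest the errors are independent of each other. With these invented numbers the value is clearly positive, which is the pattern you would expect if today's Facebook time leans on yesterday's.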
As for your FBtime, try to increase your FreeTime allocation to other activities, be less Bored by spending time on your passions, and don’t base your FBtime on that from yesterday!
I have simplified a lot of the stuff so that even a layman could understand ‘metrics. Here is a short list of the jargon you would find in ‘metrics books, for which I have used simple terms:
Point = Observation
Factor = Regressor or Explanatory Variable
Indicative line = Regression Line
B1 = Intercept
B2 and B3 = Regression coefficients
Basic method applied in ‘metrics = Ordinary Least Squares Estimation
Simple equation = Classical Linear Regression Model
Error does not systematically affect the indicative line = Zero conditional mean of the error term
Actual FBtime values are, on the average, equally close to the indicative line = Homoscedasticity
Error terms should not be dependent on each other = No autocorrelation
FreeTime and WorkHours are two sides of the same coin = Multicollinearity
P.S. Amar is a fictional character (duh!).
P.P.S. Any resemblance of the character to a person living or dead is purely incidental. No animals were hurt during the production of this story.
P.P.P.S. I’m not a psychologist. If you find any irregularities regarding how the human brain actually decides what amount of time to spend on Facebook, please keep them to yourselves!
By Mr. Shiv Hastawala
Former Research Associate (Econ), Indian Institute of Management Lucknow