Kopec Explains Software
Computing concepts simplified
3 years ago

#5 How does the Internet work?

You use it every day, but do you know its many layers?

Transcript
Speaker A:

You use it every day and it's such a big part of your life. But what really is the Internet? Welcome to Kopec Explains Software, the podcast where we make computing intelligible.

Speaker B:

Hi Dave. Today we have an interesting and important question, I think, which is how does the Internet work?

Speaker A:

Well, that's really a very large question. There's so many different components to it. So where do you want to start?

Speaker B:

I guess we can just start with a really brief history of the Internet.

Speaker A:

Well, the Internet actually got started as a government program. Actually it was part of the Defense Department. The Defense Department sponsored an advanced research network called the ARPANET. And that got started in the late 1960s, early 1970s. And they actually got a lot of universities involved and a lot of universities connected to one another to exchange research materials primarily. And so this went on for about 20 years. And then in 1990, the ARPANET, which was that Defense Department research network, started to become commercialized. And yes, Al Gore actually did have a part of it. He was one of the sponsors of the legislation that took this government network and privatized it and turned it into a commercial network that we now know today as the Internet. And that happened around the year 1990 and the full commercialization was finished by the mid 1990s. I think 1990 is also a really important year for the Internet, not only because that's when it became a commercial entity, but also because that's when the Web was created. Now I want to really be clear that the Web and the Internet are not the same thing. The Web is one part of the Internet. The Internet is a much larger set of different protocols and we'll talk today about what protocols are and a set of different technologies and the Web is just one part of it. And so it's kind of like when you talk about a tree. A tree is part of a forest, right? But a tree is not a forest. So the Web happens to be the most important part of the Internet. It's the part we use by far the most and it's also the part that really caused it to become popular. So it's no coincidence that the World Wide Web came out in 1990 and then a few years later the Internet really took off and became something that people were connecting to from their homes. It was really the protocol and the application that really excited people and took the Internet to the level that we know of today.

Speaker B:

So I've heard you describe the Internet as a networking technology in the past. Can you talk a little bit about that?

Speaker A:

All that networking means is how do computers talk to one another? So how is it that we take data from one computer and get it to another computer? So you can have a local network, you can have a network just in your home between just a few different computers and they can talk to one another and that's networking. And you can of course also have a global network like the Internet that allows any computer connected to it to talk to any computer anywhere else on the network. So it really just means computers talking to each other. But there are different what we call layers of networking. I think we'll get into more of those later in this episode.

Speaker B:

So if I have to visualize the Internet as a whole, there's different components of it in different parts and really it's about different technologies and computers communicating with one another and sending each other information, is that correct?

Speaker A:

Yeah. And they send that information in what we call packets. So packets are little chunks of information that have a place that they started from and a place that they're going to. And they're not usually all the information that you want to communicate at any given time. They're usually just little chunks that can be transmitted efficiently across the network. And so for example, let's say you want to send somebody a text document that is 64 KB, right? Well, sending the whole 64 KB at once could actually be problematic in terms of causing congestion. So what we might want to do is chop it up into smaller pieces, and maybe arbitrarily we say for now that they're 1 KB each. And so we send 64 1 KB packets across the network and then we reassemble them on the other side to recreate the 64 KB document. So packets are little chunks of information that we send across the network. And what the Internet is constantly doing is sending those packets and correctly guiding them through all the different computers along the network until they get to their destination.
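A minimal sketch of the chunking described here, in Python. The 1 KB size is the arbitrary figure from the conversation, and the `Packet` fields, the `packetize` function, and the two address strings are hypothetical names chosen for illustration, not a real packet format.

```python
from dataclasses import dataclass

PACKET_SIZE = 1024  # 1 KB payload, the arbitrary size used in the conversation


@dataclass
class Packet:
    source: str       # where the packet started from
    destination: str  # where the packet is going
    sequence: int     # position of this chunk in the original data
    payload: bytes    # the chunk itself


def packetize(data: bytes, source: str, destination: str) -> list[Packet]:
    """Chop data into PACKET_SIZE chunks, each tagged with its addresses."""
    return [
        Packet(source, destination, seq, data[offset:offset + PACKET_SIZE])
        for seq, offset in enumerate(range(0, len(data), PACKET_SIZE))
    ]


document = b"x" * (64 * 1024)  # a stand-in for a 64 KB text document
packets = packetize(document, "vermont-laptop", "florida-server")
print(len(packets))  # 64
```

Joining the payloads back together in sequence order gives the original document again, which is the reassembly step on the other side.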

Speaker B:

The way you're describing it is like it's a physical thing of this packet of information.

Speaker A:

Well, there are real physical devices, of course. The devices we're usually thinking about are the end computers, so usually a client or a server: a client being something that requests information and a server being something that provides information. But there are also what are called peer-to-peer protocols, and that's where we really just have clients. So we just have one end user's computer and another end user's computer talking directly to one another. But clients and servers are the major computers on the network that are communicating with one another. And then we also have routers, we also have links, we also have switches. And these are basically hardware devices that are responsible for getting the packets to navigate correctly to the next point in the network that gets them closer to their destination. So they're kind of like guideposts saying, okay, you're here right now, I'm going to direct you this way so you get to the next guidepost before you get to the destination, whether that destination be a server or a client. So all the time it's all about routing. We have all these packets coming through and we need to make sure they're going in a direction that's eventually going to get them to the place that they want to go efficiently. And we don't always route things perfectly. Let's say I'm communicating here from Vermont and I want to get a packet out to Florida. It doesn't always go directly down the Eastern Seaboard in the most efficient way from Vermont to Florida. That packet might sometimes get navigated over to Chicago first, or maybe up to Canada, before it gets down to Florida. There can be many reasons for that, but it's actually a very hard problem to perfectly route something through many possible stops along the way to its destination. So we don't always do a perfect job.
We use heuristics to try to get pretty close to doing a perfect job, but all kinds of things can come up in the meantime and we still need it to be reliable. I'll tell you what some of those things are. Maybe there's a power outage somewhere along the way, or maybe there's just a lot of traffic. So maybe there's so much traffic going on in New York that when my packet gets down to New York, it actually gets lost in the shuffle because some router there can't keep up with the amount of traffic going through it. So there can be all kinds of problems that happen. We don't necessarily always go in the right direction to get as efficiently as possible to the destination, and yet it still works amazingly reliably, right? The Internet is incredibly reliable. There's a lot of robustness built in, there's a lot of parity checking built in, and there's a lot of really great engineering that went into the original protocols that has really made them stand the test of time. Here we are, 30 years since the Internet's been commercialized, and we're still really running a lot of the same protocols as we were 30 years ago.
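To make the Vermont to Florida example concrete, here is a toy sketch in Python that picks the cheapest path across a small, made-up map of links using Dijkstra's shortest-path algorithm. The site names, link costs, and the `cheapest_route` function are all assumptions for illustration. Real routers work with partial and constantly changing information, which is part of why routing is hard in practice; this sketch only shows why congestion at one stop can send a packet through Chicago instead.

```python
import heapq

# Hypothetical link costs (think latency in ms) between made-up router sites.
LINKS = {
    "Vermont": {"New York": 10, "Chicago": 25, "Montreal": 8},
    "Montreal": {"Vermont": 8, "Chicago": 30},
    "New York": {"Vermont": 10, "Florida": 30, "Chicago": 20},
    "Chicago": {"Vermont": 25, "Montreal": 30, "New York": 20, "Florida": 35},
    "Florida": {"New York": 30, "Chicago": 35},
}


def cheapest_route(links, start, goal):
    """Dijkstra's algorithm: return (total_cost, path), or None if unreachable."""
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, link_cost in links[node].items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + link_cost, neighbor, path + [neighbor]))
    return None


print(cheapest_route(LINKS, "Vermont", "Florida"))
# (40, ['Vermont', 'New York', 'Florida'])

# Simulate heavy traffic in New York by making every link touching it expensive.
congested = {site: dict(neighbors) for site, neighbors in LINKS.items()}
for site in congested:
    if "New York" in congested[site]:
        congested[site]["New York"] = 500
for neighbor in congested["New York"]:
    congested["New York"][neighbor] = 500

print(cheapest_route(congested, "Vermont", "Florida"))
# (60, ['Vermont', 'Chicago', 'Florida'])
```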

Speaker B:

Can packets ever wind up, say they are a text document and you're sending a certain size of one? Can they wind up in the wrong order, where then the sentences that you wrote are out of order?

Speaker A:

Absolutely. And that actually happens all the time. Typically, if there are a lot of packets, they will not all arrive in the right order. And it's the responsibility of one of the protocols (we'll get into the different protocols a little bit later) to actually take them and put them back into the right order. So for example, let's say I was sending you five packets and they originally were in a very specific order, so they were numbered one through five, and packet five got there before packet three. It's actually the responsibility of the destination to take those packets, look at the order that they were supposed to be in, and reassemble them back into the right order. So it happens all the time, actually, that packets do not come in the right order, but we have robustness checks built in to reassemble them. And we also have checks in these protocols to make sure that if some packet didn't come through, we request it again. So it can very much happen that, okay, we sent packets one through five out, but it turns out the destination never got packet four. And so the destination says, hey, I got packet five, but where's four? And maybe there's a set amount of time that we wait, which we usually call a timeout, and we still didn't get it. And so we go back to the original sender and say, hey, I never got four, please send it again. And then the sender sends four again. And there can be a certain number of what we call retries. And if several of those retries fail, we then say, okay, this whole thing's not working out, and maybe then there's a failure there. But usually it does work out. Usually we get things pretty quickly, and that has to do with the fact that our physical technology, what we call the link layer (we'll get into that a little bit later), is actually pretty great.
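The five-packet story can be sketched in a few lines of Python. This is a simplified illustration, not how any real transport protocol is implemented: the `receive` function, the `resend` callback, and the retry limit of three are assumptions, and the timeout is modelled simply as `resend` returning `None`.

```python
def receive(arrived: dict[int, bytes], total: int, resend, max_retries: int = 3) -> bytes:
    """Reassemble packets by sequence number, re-requesting any that are missing.

    arrived: sequence number -> payload for the packets that did show up
    resend:  hypothetical callback asking the sender for one packet again;
             returns the payload, or None if that attempt also timed out
    """
    for seq in range(1, total + 1):
        retries = 0
        while seq not in arrived:
            if retries == max_retries:
                raise TimeoutError(f"packet {seq} never arrived")
            payload = resend(seq)
            retries += 1
            if payload is not None:
                arrived[seq] = payload
    # Sorting by sequence number restores the original order.
    return b"".join(arrived[seq] for seq in sorted(arrived))


sent = {1: b"one ", 2: b"two ", 3: b"three ", 4: b"four ", 5: b"five"}
# Packet 5 arrived before packet 3, and packet 4 was lost on the way.
arrived = {1: sent[1], 2: sent[2], 5: sent[5], 3: sent[3]}
message = receive(arrived, total=5, resend=lambda seq: sent.get(seq))
print(message)  # b'one two three four five'
```

If every retry for a packet comes back empty, the function gives up with a `TimeoutError`, which corresponds to the "this whole thing's not working out" case.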

Speaker B:

So the Internet is pretty much all this information and these packets getting sent all over the place and these protocols and hardware in place to help make sure that what's being sent winds up at its destination in the right order. And you've said there are some different layers to the Internet. Can you talk a little bit about those layers and how we interact with them?

Speaker A:

Yeah, there are really four different layers, and depending on what textbook you learned it from or where you read about it, you might hear different names for them. But the four layers are typically known as the link layer, the Internet layer, the transport layer, and the application layer. And let me tell you really briefly about each of them. So the link layer is really the physical layer. It's how we actually get information physically from one component of the network to the next. And so that's technologies you've probably heard about, like Ethernet or WiFi. WiFi is how we wirelessly connect our computers together, right? Ethernet is how we connect our computers together, usually with our Cat cables. Right? But that's the first point: the physical connection and the protocol for the devices to physically communicate with one another and say to the very next computer over, did you just get this thing I just sent? Okay, then we have the Internet layer, and the Internet layer is about the actual routing of those packets. So it's like, how do we describe a packet and how do we route it from one component of the network to the next? The transport layer is what we talked about earlier: ensuring the reliability of that transmission, assembling the packets back into the right order, re-requesting packets when we need them, and making sure that packets are actually getting to where they were intended to go. And then the most interesting layer to me, and probably to most programmers, is the application layer. And so this layer sits on top of all the other layers. And this is where we do the things that we actually want to do, like browse the web or read our email or talk to somebody on a video chat.
So the application layer is where all the technology is in place that actually allows us to get information that's intelligible and actually do something with that information. And so the application layer sits on top of the other layers, and most of those other layers will be the same for various different applications. So whether I'm using my web browser or I'm using email, if I'm on the same computer, typically the transport layer, the Internet layer, and the link layer will all be the same. And so it's nice that we have this kind of pluggable architecture where we can plug different applications on top of the same basic working software infrastructure.
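One way to picture the four layers is as nested envelopes, each layer wrapping what the layer above handed it. The Python sketch below uses made-up text headers (`SEQ=`, `SRC=`, `LINK=`) and hypothetical function names and addresses; real headers are binary and carry far more detail. It is only meant to show that swapping the link layer leaves the application's data untouched.

```python
def application_layer(message: str) -> bytes:
    # The part the user cares about: the actual content.
    return message.encode("utf-8")


def transport_layer(data: bytes, sequence: int) -> bytes:
    # Adds what is needed to reorder and re-request (here, just a sequence number).
    return f"SEQ={sequence}|".encode() + data


def internet_layer(segment: bytes, src: str, dst: str) -> bytes:
    # Adds the addresses used for routing.
    return f"SRC={src};DST={dst}|".encode() + segment


def link_layer(packet: bytes, medium: str) -> bytes:
    # Adds whatever the physical hop needs (WiFi, Ethernet, ...).
    return f"LINK={medium}|".encode() + packet


def send(message: str, medium: str) -> bytes:
    segment = transport_layer(application_layer(message), sequence=0)
    packet = internet_layer(segment, src="10.0.0.2", dst="10.0.0.9")
    return link_layer(packet, medium)


def unwrap(frame: bytes) -> str:
    # Strip the three headers in reverse order and recover the message.
    return frame.split(b"|", 3)[3].decode("utf-8")


over_wifi = send("hello", "wifi")
over_cable = send("hello", "ethernet")
print(over_wifi)                              # b'LINK=wifi|SRC=10.0.0.2;DST=10.0.0.9|SEQ=0|hello'
print(unwrap(over_wifi), unwrap(over_cable))  # hello hello
```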

Speaker B:

So the application layer is what I, as a user, feel like I'm really interacting with. And these other layers are kind of happening behind the scenes to make everything work and send and receive the information that's needed.

Speaker A:

I think that's a totally fair way of thinking about it. And the application layer is going to be things that you're familiar with every day. But we have actual languages that we speak between computers to make them happen. So let me give you some examples. Take the web. Everyone uses the web all the time. And when you use the web, you might be familiar that sometimes you type "http" into the address bar at the top of your web browser, right? Most modern web browsers don't require you to type it in anymore. But believe it or not, early web browsers actually required you to type that in every time. But anyway, HTTP actually stands for Hypertext Transfer Protocol. And you might wonder, okay, what is hypertext? What is transfer? We're going to get more into HTTP specifically in our next episode, How does the web work? You have to wait till next week. But HTTP is the protocol that we use for web browsers to communicate with web servers. In other words, a protocol is a language that allows two computers to communicate with one another. So it's a way of saying, in the terms of the web, hey, I want this web page. And then the server says, okay, here it is. And then of course, there's data attached to that that actually allows us to see the web page. And so it's a way of two different computers exchanging information in a way that makes sense for one particular application. And in the case of HTTP, it's for web browsers and web servers, but there are all kinds of other protocols. There are tons of protocols, and there are actually a lot of different protocols in the application layer that you use on a regular basis. So some protocols you might be familiar with are IMAP, POP3, and SMTP, or you might only be familiar with the application that they're used for, which is email.
So I'm sure you've heard of email because I'm sure you use it all the time. You don't ever type SMTP or POP3 into your email client like you used to type HTTP into your web browser, but just below the surface, those are the protocols, or in other words the languages, that email clients use to talk to email servers, and also that email servers use to talk to each other. And so there has to be a language for every application, basically. There's always some kind of protocol for each application. So email has SMTP, POP3, and IMAP; web browsers have HTTP and HTTPS; but then we also have, let's say, a protocol for Skype, or a protocol for Kazaa or Gnutella, if you've ever used those file sharing programs. Or if you ever use Internet Relay Chat, it has its own protocol, IRC. Right? There's a different protocol for basically every different Internet application. I should also mention the names of two other protocols that you've probably actually heard of: TCP/IP. So TCP stands for Transmission Control Protocol, and it's actually that transport layer that we talked about earlier. It's about, oh, I got all these packets, did I get them reliably? Can I put them back into the right order? And IP stands for Internet Protocol, so that actually defines what a packet is and how a packet gets routed throughout the Internet. So we have TCP and IP basically being the same for any application, whether we're using a web browser, an email client, whatever. But then we have the application protocols: HTTP, SMTP, and FTP (File Transfer Protocol, another one you might have heard of). Those are the ones that change from application to application.
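As a small illustration of a protocol being "a language," the Python sketch below composes the text of a minimal HTTP/1.1 request and parses a canned response. Nothing is sent over a network here. The helper names `build_get_request` and `parse_response`, the host `example.com`, and the sample response are assumptions for illustration; a real browser sends many more headers.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Compose the text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",   # "hey, I want this web page"
        f"Host: {host}",
        "Connection: close",
        "",                       # a blank line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw: bytes) -> tuple[int, dict[str, str], bytes]:
    """Split a raw HTTP response into status code, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    status_code = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body


request = build_get_request("example.com")
print(request.decode("ascii"))

# A canned reply standing in for "okay, here it is" from the server.
canned = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<h1>Hello</h1>"
status, headers, body = parse_response(canned)
print(status, headers["content-type"], body)  # 200 text/html b'<h1>Hello</h1>'
```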

Speaker B:

So protocols are the standard communication tools for a specific application.

Speaker A:

Yes. And then, like I said, there's also these transport protocols and Internet protocols and link protocols for that matter, that are sitting below the application protocols and are the same for multiple different applications.

Speaker B:

Interesting. So what about a domain name? What is that actually?

Speaker A:

So a domain name is just a human-readable way of talking about a server, is one way of thinking about it. So, for example, you're probably familiar with yahoo.com, right? And when you go to yahoo.com in your web browser, what's actually happening is we have to figure out the address of the server that's going to give you the yahoo.com web page. How do we figure out where that server is? Well, we have something called a domain name server. And what a domain name server does is map human-readable names like yahoo.com to what are called IP addresses. And that stands for Internet Protocol address. And every computer on the Internet has its own Internet Protocol address. You've probably seen them before. They usually are four numbers separated by three periods, so something like 192.168.1.1. And there's a newer Internet protocol called IP version 6. And it has a much, much longer set of characters to represent each address, because what actually happened is we're running out of Internet Protocol addresses. Believe it or not, we have so many devices connected to the Internet that we've actually used up the address space. And we have all kinds of ways of mitigating that, such as changing the Internet Protocol address between the switch that's at the end of your neighborhood and the one that's used internally within your house and such. But the fact of the matter is we needed a newer set of addresses, so we'd have a much bigger address space. And so we're slowly moving over from IP version 4 to IP version 6. But I went on a bit of a tangent there. Anyway, the reason for domain name servers is that it would be very hard for you to remember that IP address. Let's say the IP address is 255.64.83.6, right? That would be hard to remember. But yahoo.com, that's not that hard to remember, right? And so we have these servers that go and figure out what the IP address is for us so we don't have to type it in every time.
And that's what a DNS server does.
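The size difference between the two address spaces is easy to check with Python's standard `ipaddress` module. The specific addresses below are just examples (`2001:db8::1` comes from a range reserved for documentation), not addresses of any real server.

```python
import ipaddress

v4 = ipaddress.ip_address("192.168.1.1")  # four numbers separated by periods
v6 = ipaddress.ip_address("2001:db8::1")  # the longer IP version 6 style
print(v4.version, v6.version)             # 4 6

# Every possible address in each version.
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses
ipv6_total = ipaddress.ip_network("::/0").num_addresses
print(ipv4_total)              # 4294967296
print(ipv6_total == 2 ** 128)  # True

# 192.168.x.x is set aside for private networks, such as inside a house.
print(v4.is_private)           # True
```

With only about 4.3 billion IPv4 addresses available, it becomes clearer why the address space ran out and why private, in-home addresses are used as a workaround.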

Speaker B:

Actually, are on the Internet.

Speaker A:

Right. The IP address is like, where is it actually? And that's how the Internet Protocol actually routes, by going from one IP address to another IP address. But what a domain name server has on it, basically, is a giant table. You can think about it as a giant table with two columns. One column is the domain name. So that's going to be like yahoo.com, harvard.edu, usa.gov, whatever the domain name is. And then the second column is the IP address that is, let's say, the main server for that domain name. That server then might further route it another way. Okay, but there has to be some initial place that we send the packet to, and then it gets routed again within the local network of yahoo.com or harvard.edu or whatever. But yeah, we need to have some way of actually making these human-readable names intelligible to the computing systems. And that's the purpose of the DNS server. So every time you make a request and you type an address into your web browser, these DNS servers are getting contacted, and they're doing that translation and looking up the IP address that we actually want to connect to.
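The "giant table with two columns" maps naturally onto a dictionary. In this Python sketch the IP addresses are placeholders from ranges reserved for documentation, not the real addresses of these sites, and `DNS_TABLE` and `resolve` are hypothetical names. Real DNS is distributed across many servers rather than held in one table.

```python
# Column one: domain name. Column two: IP address (placeholder values only).
DNS_TABLE = {
    "yahoo.com": "203.0.113.10",
    "harvard.edu": "198.51.100.20",
    "usa.gov": "192.0.2.30",
}


def resolve(domain: str) -> str:
    """Look up the IP address for a human-readable domain name."""
    name = domain.strip().rstrip(".").lower()  # names are case-insensitive
    try:
        return DNS_TABLE[name]
    except KeyError:
        raise LookupError(f"no record for {name!r}") from None


print(resolve("Yahoo.com"))  # 203.0.113.10
```

For an actual lookup against real DNS servers, Python's standard library offers `socket.gethostbyname`, which needs a network connection.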

Speaker B:

Interesting. So growing up, when we first got the Internet in my house, we used dial-up, right? You typed in and you heard the, I wish I could do the different tones. Yeah, exactly. And now obviously we don't have to do that. We don't even need to be connected physically to anything. So what are the different ways that we can connect to the Internet?

Speaker A:

Yeah, so at that point, we're thinking about the link layer. Typically that's more of the physical layer, and there are quite a few different ways. So it's a different link layer when you use your cell phone and you're connecting over what's called LTE to some cell tower somewhere than it is when you're in your house and you're using WiFi to connect to your WiFi router. And then your WiFi router is probably connected to a cable modem. And then that cable modem is connecting over literally the same cable that you get cable TV with. So there are many different physical ways of connecting to the Internet. What you were describing earlier, and what a lot of us who grew up in that era remember, is connecting over a dial-up modem. And so that was actually using the phone lines as our physical layer. And the phone lines obviously were not designed for the Internet, and they were very noisy. In fact, there was a lot of interference on the phone lines. And so there were some real physical limitations on how fast we could actually transmit data, not to mention that we tied up the phone so that our parents couldn't make any phone calls. And we basically tapped out in the late 1990s at 56 kilobits per second, not kilobytes. If people don't know the difference between a byte and a bit, go back to episode three of our podcast, What is a Byte? But we were basically tapped out at 56 Kbps, which is not a lot. That's pretty small, especially for modern applications like streaming video. That's basically impossible to do at 56 Kbps, at least in any way that you'd actually want to watch. So the phone lines weren't great. What's been much more effective for broadband Internet has been things like DSL, which stands for Digital Subscriber Line, or cable modems. As you might imagine, you need a lot of bandwidth actually just to do cable television, right?
So luckily we had these pipes that a lot of the world was already hooked up with, in the form of the cable that we already had laid for cable television. And that turned out to be a very effective way to deliver Internet that is many orders of magnitude faster than the dial-up Internet that we had in the 1990s. But today, a lot of us are also connecting through our cell phones all the time. And the speed of those wireless connections to cell towers has evolved tremendously over the past couple of decades. We went from being about as slow as dial-up in the early 2000s to now, if you look at the latest 4G and 5G networks, being almost as fast as pretty standard cable Internet. So we've really made an incredible advance in the bandwidth capability of our wireless Internet protocols.
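The bits-versus-bytes point matters when estimating download times. Here is a small worked calculation in Python. The 56 Kbps figure is from the conversation; the 5 MB file size and the 100 Mbps broadband speed are assumptions picked for illustration, and real connections rarely reach their rated speed.

```python
def transfer_seconds(size_bytes: int, bits_per_second: float) -> float:
    """Time to move size_bytes over a link, ignoring overhead and congestion."""
    return size_bytes * 8 / bits_per_second  # 8 bits in every byte


DIAL_UP_BPS = 56_000          # 56 kilobits per second
BROADBAND_BPS = 100_000_000   # assumed 100 megabits per second

song = 5 * 1_000_000          # an assumed 5 MB file

print(round(transfer_seconds(song, DIAL_UP_BPS)))  # 714 (seconds, about 12 minutes)
print(transfer_seconds(song, BROADBAND_BPS))       # 0.4 (seconds)
```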

Speaker B:

It's really interesting to think about how much it has developed, how the Internet has really grown as it's become more and more ubiquitous in our everyday lives, the ways in which we're connecting to it, and the fact that there's so much happening just below the surface of the applications that we're using. And it's become so integral to the way that we interact with the world.

Speaker A:

Yeah. Two other items I want to mention. One is that I mentioned earlier that this routing is not always the most accurate thing, and it's actually a means of attack sometimes. There are some pretty famous cases where people's packets were getting routed to all kinds of strange places that you wouldn't really expect. So, for example, there was a point, I think it was about ten years ago, where there was this hacker group that rerouted a lot of packets over to Belarus. Why did they want to do that? Well, they wanted to scan the packets and try to get the information out of them before they got to their destination. So how can we protect against attacks like that? And you might also be thinking about the fact that these packets kind of go all over the place and jump around all these different servers. Like, for example, if I'm connecting from Vermont to Florida, my packets might actually stop at like 20 different physical devices along the way. And so you might be pretty worried, like, okay, does that mean that there are 20 different points where somebody could be reading my information? Well, the answer is yes, there are 20 different points where somebody could be reading your information. And so our protection against that is encryption. And encryption is such a vital part of the Internet. Encryption is about using mathematical formulas to scramble, for lack of a better word, the information in such a way that only a particular computer that holds what we call a key can descramble the information. We can maybe do a whole episode on encryption, but without getting into all the details, typically the way it works is I scramble it at the starting computer, and I have one key that's used for scrambling it, and then there's another key at the destination computer, and only the computer that holds that other key can descramble the information and see the original information again.
And without this encryption, there's a lot of things we wouldn't feel comfortable doing on the Internet. For example, credit card transactions. Because if we weren't encrypting it, then at every point where the packets stopped along the way and even sometimes in between the places when it's going through the air wirelessly, other devices can pick up on it. Right. We would feel scared that our credit card could get stolen very easily. And in fact, in the early internet, in the mid 1990s, there were a lot of websites and other internet applications that were sending credit cards and sensitive information without encryption and there were people who were just harvesting them and stealing a lot of people's identity and other personal information. And so encryption is actually a critical component of the internet. We couldn't do all the things that we're excited about today, such as online shopping, such as having private communication with our friends and family, such as feeling secure that our health records, that our government records, that our school records, are not going to be easily tapped into by other people. So encryption is a critical component of the internet and often, I think, underappreciated by the general public.
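The two-key idea (one key scrambles, a different key descrambles) can be illustrated with a toy version of RSA in Python. This is a classroom sketch under loud assumptions: the primes are tiny, each byte is scrambled on its own, and there is no padding, so it offers no real security. The names `scramble` and `descramble` are hypothetical, and real applications should use a vetted encryption library instead.

```python
# Toy key pair built from tiny primes. Real keys use primes hundreds of digits long.
p, q = 61, 53
n = p * q                  # 3233, shared by both keys
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # scrambling exponent (the public key)
d = pow(e, -1, phi)        # descrambling exponent (the private key); needs Python 3.8+


def scramble(message: bytes) -> list[int]:
    """Scramble each byte with the public key (e, n). Insecure, for illustration."""
    return [pow(byte, e, n) for byte in message]


def descramble(cipher: list[int]) -> bytes:
    """Only the holder of the private key (d, n) can undo the scrambling."""
    return bytes(pow(value, d, n) for value in cipher)


secret = b"card 1234"
cipher = scramble(secret)
print(cipher)              # a list of numbers that no longer look like the message
print(descramble(cipher))  # b'card 1234'
```

Anyone along the route who sees only `cipher` and the public key would, with realistically large primes, have no practical way to recover the original message.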

Speaker B:

Yeah, I mean, thinking about all the data that we send nowadays back and forth between different services and all the different places we're logging onto, it is so important that our data is kept secure. And I think you're right, that's not something we always think about where it could go wrong and how much infrastructure is built in to make sure that it's not.

Speaker A:

Yeah. So encryption, even though it's not one of the official layers (it's not the link layer, the Internet layer, or the transport layer), is sometimes part of the application layer. A lot of the application protocols specifically say, oh, all the data has to be encrypted in this way. It's still an enabling technology of the Internet and one that we really, really need. And so there's a lot of talk in the country, actually, and in the wider world too, about whether or not governments should be able to break into encrypted systems. I don't want to get into a political discussion, but the danger is that if one actor, whether it be the government or a criminal or somebody else, has the ability to break into these encrypted systems, well, then other people are probably going to find that way in as well. And so we want to be really careful about what we wish for. There's a real balance to be struck there. But I'm not actually sure there's that much of a balance, because the problem is this technology exists. It's been out there for a long time. And if we just stop regular people from having fully encrypted servers, the criminals will still have full access to the technology. The technology doesn't cost anything. There are open source implementations of just about every valuable encryption protocol, and so criminals are never not going to have access to encryption. But if we then go and hinder the encryption on regular applications, like our banking applications or our messaging applications, all that means is that regular people can now maybe get attacked by criminals, too, because they'll find ways into those backdoors, and the criminals will keep encrypting their own information, because you can't put the genie back in the bottle. These technologies are there and freely available. Even if you outlawed them, it wouldn't stop people who are already doing illegal things from using them, because it's as easy as downloading a piece of software to use them.

Speaker B:

Well, that's a little bit of a scary thought to end on, but I think an important one to keep in mind and for us to understand in these discussions and news stories that we hear about encryption, of what's at stake.

Speaker A:

Yeah, there's a lot at stake, but it's really been wonderful, too, because how could we be doing all these things we enjoy doing without it? So I don't want to end on a negative note. I want to end on a positive note about what a great technology encryption is, because it enables us to do our banking online. It enables us to do online shopping. It enables us to communicate with each other without being worried that a criminal is getting access to our communications.

Speaker B:

Absolutely.

Speaker A:

All right, well, it's been great talking to you about the Internet, and we're going to kind of follow on this topic next week with what?

Speaker B:

How does the web work?

Speaker A:

Right. Looking forward to that. Okay, well, thanks for listening.

Speaker B:

Have a good week and we'll talk to you soon.

Speaker A:

Also, don't forget to subscribe to us on your podcast player of choice and leave us a review on Apple Podcasts, Overcast, Spotify, or wherever you listen to podcasts. It really helps out with the show. Thanks.

The Internet is not a single technology—it’s a combination of networking technologies including protocols, physical devices, and software. In this episode we delve into its many layers and try to provide an intuitive understanding about how they all fit together. We cover topics like routing, packets, application protocols, and encryption.

Theme “Place on Fire” Copyright 2019 Creo, [CC BY 4.0] (https://creativecommons.org/licenses/by/4.0/)

Find out more at http://kopec.live