Data traces 2: Call my website and I'll tell you who you are [Update]
This is the second part of the series “data traces”, a set of posts that explain where you leave data behind, intentionally or unintentionally, and why this is good or bad. You can find more information in the initial post of this series.
In this part I want to tell you which data a website, or rather its operator, gets to see once you have accessed it.
When you accessed this page a few minutes ago, you either typed the address (=URL) into the address bar of your browser or you followed a link. Either way, you sent a request to the host on which this site lives (=web server). This request says something like “Please give me the page http://xzy.net/i/want/this/site.html”. The web server then answers with “OK, here we go: [data, data, more data...]”. This “conversation” follows certain rules, a protocol, which in this case is called HTTP. The web server needs to know an “address” to which the data packets should go. This address is called the IP address. It is sent along with the request as well as with the answer. But the topic of this post is actually data traces and not an introduction to how the web works, right? This preamble was needed because now I can explain which other data is sent when you request a normal website, and thus which traces you leave behind.
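To make the “conversation” a bit more concrete, here is a minimal sketch in Python of what such a request looks like as text and how the first line of an answer can be read. The function names `build_request` and `parse_status_line` are made up for this illustration, and the sketch leaves out almost everything a real browser sends.

```python
from urllib.parse import urlsplit


def build_request(url: str) -> str:
    """Build the text of a very small HTTP/1.1 GET request for the given URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    lines = [
        f"GET {path} HTTP/1.1",   # "Please give me this page"
        f"Host: {parts.netloc}",  # which site on the web server is meant
        "Connection: close",
        "",                       # an empty line ends the header block
        "",
    ]
    return "\r\n".join(lines)


def parse_status_line(response: str) -> tuple:
    """Split the first line of an HTTP response into version, code and reason."""
    status_line = response.split("\r\n", 1)[0]
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


if __name__ == "__main__":
    print(build_request("http://xzy.net/i/want/this/site.html"))
    # A hand-written sample answer, not a real server response.
    print(parse_status_line("HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>"))
```

Note that the IP address does not appear anywhere in this text. It travels one level below, in the packets that carry the request.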
IP address
You already know that the IP address is transferred during a normal HTTP transfer. But I want to go deeper into that point. What does this IP address really say? Each public IP address must be unique on the planet at any given moment, so that the packets “know” where to go. Thus you are quite clearly identifiable. Of course the IP address doesn't contain your name, but if someone asked your internet provider to whom the IP 217.229.253.17 was assigned at time x, the result would be very clear. And because system administrators are folks who like to have material to analyse (just in case of a failure), accesses are logged very often. Of course this is only alarming if you are doing something wrong. But that's not the end of this post...
Each internet provider has a certain range of addresses that it can assign. This range is publicly known and is mostly divided by region. So you can't find out the exact street of the current holder of a particular IP address (as you can see in some not so realistic movies), but the approximate region is usually quite correct. If you are wondering why the ads on one fun website or another mostly show more or less attractive people from your region, that's because the website can detect your area from your IP address.
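How a website could go from an IP address to a region can be sketched roughly like this. The table below is entirely made up: it uses address ranges that are reserved for documentation and placeholder region names, whereas real services rely on large databases of provider ranges. The function name `guess_region` is also just an assumption for this example.

```python
import ipaddress

# Hypothetical lookup table: address range -> region.
# The ranges are reserved documentation networks, the regions are placeholders.
REGION_BY_NETWORK = {
    ipaddress.ip_network("192.0.2.0/24"): "Region A",
    ipaddress.ip_network("198.51.100.0/24"): "Region B",
}


def guess_region(ip: str) -> str:
    """Return the region whose address range contains the IP, or 'unknown'."""
    address = ipaddress.ip_address(ip)
    for network, region in REGION_BY_NETWORK.items():
        if address in network:
            return region
    return "unknown"


if __name__ == "__main__":
    print(guess_region("192.0.2.17"))
    print(guess_region("217.229.253.17"))  # not in the made-up table
```

The lookup only ever yields the region attached to a whole range, which is why the result is an area and not a street.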
Where do you come from?
Another piece of information that is part of a normal HTTP transfer is the so-called referer. This actually quite harmless piece of information often causes the greatest amazement when I tell people about it. The referer is simply a declaration of where you are coming from. If you accessed the homepage of this blog and then followed a link to this post, you sent the referer http://geeks-have-feelings-too.net/ when your browser made the request for the second page. That's not tragic, but it may be used to monitor your path through a website. It gets more interesting if you followed a link that was posted in a forum, for example. The operator of the linked website can then see the URL you were coming from. For me as a website operator it's interesting because, among other things, I can see which search terms people have typed in at Google, for example. I can also see which sites have set links to my page, as soon as visitors from those sites come to my page. The user, however, becomes relatively transparent, because his path is traceable and, to put it in an exaggerated way, he can be hunted.
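As a rough illustration of how search terms can be read from a referer, here is a small Python sketch. It assumes that the search engine puts the typed terms into a URL parameter named `q`, which is an assumption of this example and not true for every search engine. The function name `search_terms_from_referer` and the sample URLs are made up.

```python
from urllib.parse import parse_qs, urlsplit


def search_terms_from_referer(referer: str) -> list:
    """Return the search terms found in the referer's 'q' parameter, if any."""
    parts = urlsplit(referer)
    query = parse_qs(parts.query)  # also decodes '+' and %xx escapes
    terms = query.get("q", [])
    return terms[0].split() if terms else []


if __name__ == "__main__":
    # A made-up referer, as a search engine result page might send it.
    print(search_terms_from_referer("http://www.google.com/search?q=geeks+have+feelings"))
    # A referer from a normal page carries no search terms.
    print(search_terms_from_referer("http://geeks-have-feelings-too.net/"))
```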
Which software are you using?
An ordinary HTTP transfer also includes the so-called “user agent string”. This is a string that differs from browser to browser and stands for the name of the browser. Usually information about the operating system is included, too. Thus it's possible to build statistics with it, for example about how widespread a given piece of software is. That's interesting for tech freaks like me.
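A statistics tool could pick browser and operating system out of the user agent string roughly as sketched below. The token lists are deliberately tiny, and the sample string is only an illustration of the general format. Real user agent strings are far more varied, so plain substring matching like this is a simplification. The function name `describe_user_agent` is made up for this example.

```python
# Tokens to look for, paired with the name to report (tiny, illustrative lists).
BROWSER_TOKENS = [("Firefox", "Firefox"), ("Opera", "Opera"), ("MSIE", "Internet Explorer")]
SYSTEM_TOKENS = [("Windows", "Windows"), ("Mac OS X", "Mac OS X"), ("Linux", "Linux")]


def describe_user_agent(user_agent: str) -> dict:
    """Guess browser and operating system by simple substring matching."""
    browser = next((name for token, name in BROWSER_TOKENS if token in user_agent), "unknown")
    system = next((name for token, name in SYSTEM_TOKENS if token in user_agent), "unknown")
    return {"browser": browser, "os": system}


if __name__ == "__main__":
    sample = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.6) Gecko/20060728 Firefox/1.5.0.6"
    print(describe_user_agent(sample))
```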
How big is your monitor?
Let's come to JavaScript. JavaScript is a technology for manipulating a web page after it has been loaded, without an extra data transfer (for my technically advanced readers: I know this sounds more like a definition of AJAX, but I think the point of JavaScript has always been to manipulate the document dynamically on the client, so I think the definition is okay). At first that doesn't sound dangerous or like spying. And the physical size of your monitor can't be read with this technology. But with JavaScript the resolution of your screen and other technical properties of your system can be read out. Afterwards the data is sent to the website in the background. With a special tracking technique (which mostly works with cookies) the user can be traced across many sites.
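The cookie part of such tracking can be sketched from the server's point of view. This is only an assumption about how a tracking system might work, not a description of any particular product: the cookie name `visitor_id` and the function names are made up. The idea is that a visitor without the cookie gets a fresh random ID, and a visitor who sends the cookie back is recognised.

```python
import uuid
from http.cookies import SimpleCookie


def visitor_id_from_cookie_header(cookie_header: str):
    """Return the value of the hypothetical 'visitor_id' cookie, or None."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("visitor_id")
    return morsel.value if morsel is not None else None


def handle_request(cookie_header: str) -> tuple:
    """Return (visitor id, Set-Cookie value or None) for one incoming request."""
    visitor_id = visitor_id_from_cookie_header(cookie_header)
    if visitor_id is None:
        # First visit: hand out a new random ID that the browser will send back.
        visitor_id = uuid.uuid4().hex
        return visitor_id, f"visitor_id={visitor_id}; Path=/"
    # Returning visitor: the ID links this request to the earlier ones.
    return visitor_id, None


if __name__ == "__main__":
    first_id, set_cookie = handle_request("")
    print(set_cookie)
    print(handle_request(f"visitor_id={first_id}"))
```

If the same tracking system is embedded in many websites, the same ID comes back from all of them, which is how a user can be followed across sites.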
Summary: Why is HTTP and all that so evil?
It isn't. HTTP wasn't developed to collect the data of internet users. There are just some possibilities that are, among other things, interesting for statistics systems. Especially in the business world, companies want to know how often their sites are visited, where the visitors came from and how much the expensive “search engine specialist” has really achieved.
But please be aware that you are not that anonymous on the internet. If government agencies want to know who had a particular IP address at a certain time, because a web server was broken into from that address or someone downloaded copyright-protected material using it, the only thing they have to do is ask the provider for the information. And in most cases the provider will not cause any problems. Likewise, if someone breaks the rules in a forum, he or she has to reckon with the account or IP address being locked out. So, behave yourself.
This article was posted in English on 14 August 2006. For technical reasons, the displayed date of this post is the date of the original article, which was written in German.