{"id":1106,"date":"2015-10-18T20:09:29","date_gmt":"2015-10-19T00:09:29","guid":{"rendered":"https:\/\/www.redline13.com\/blog\/?p=1106"},"modified":"2022-01-03T20:01:41","modified_gmt":"2022-01-04T01:01:41","slug":"http-archive-and-google-big-query","status":"publish","type":"post","link":"https:\/\/www.redline13.com\/blog\/2015\/10\/http-archive-and-google-big-query\/","title":{"rendered":"HTTP Archive and Google Big Query"},"content":{"rendered":"<p>The HTTP Archive&#8217;s mission is to record the performance information and makeup of the web.<\/p>\n<blockquote><p>In addition to the content of web pages, it&#8217;s important to record how this digitized content is constructed and served. The <a href=\"http:\/\/httparchive.org\/\">HTTP Archive<\/a> provides this record. It is a permanent repository of web performance information such as size of pages, failed requests, and technologies utilized. This performance information allows us to see trends in how the Web is built and provides a common data set from which to conduct web performance research.<\/p><\/blockquote>\n<p>They do not track every domain but limit themselves to the Alexa Top 1,000,000 sites.<\/p>\n<blockquote><p>Starting in November 2011, the list of URLs is based solely on the <a href=\"http:\/\/www.alexa.com\/topsites\">Alexa Top 1,000,000 Sites<\/a> (<a href=\"http:\/\/s3.amazonaws.com\/alexa-static\/top-1m.csv.zip\">zip<\/a>). 
Use the <a href=\"http:\/\/httparchive.org\/urls.php\">HTTP Archive URLs<\/a> page to see the list of the top 10,000 URLs used in the most recent crawl.<\/p><\/blockquote>\n<p>Each URL is loaded three times in IE9 and on an iPhone 4, and the data from the median run is used to populate the HTTP Archive database.<\/p>\n<ul class=\"indent\">\n<li>The test agents are located in Redwood City, CA.<\/li>\n<li>The default WebPagetest connection speed is used.<\/li>\n<li>The cache is empty (&#8220;first view&#8221;).<\/li>\n<li>Data is collected via a <a href=\"http:\/\/httparchive.org\/about.php#harfile\">HAR file<\/a>.<\/li>\n<\/ul>\n<p>Data is available as a CSV or MySQL dump and is generated on the 1st and 15th of every month. Full instructions and files are available on their download page [<a href=\"http:\/\/httparchive.org\/downloads.php\">http:\/\/httparchive.org\/downloads.php<\/a>].<\/p>\n<blockquote><p>The results from each crawl are saved as MySQL dump files in both MySQL format and CSV format. Dumps are made for both the desktop and mobile crawls.<\/p><\/blockquote>\n<p>Even easier, Ilya Grigorik started pushing the data into Google BigQuery in 2013. His post has the information you need if you want to start running queries yourself: <a href=\"https:\/\/www.igvita.com\/2013\/06\/20\/http-archive-bigquery-web-performance-answers\/\">https:\/\/www.igvita.com\/2013\/06\/20\/http-archive-bigquery-web-performance-answers\/<\/a><\/p>\n<blockquote><p>Well, good news, now you can satisfy your curiosity in minutes (or seconds, even). The full HTTP Archive dataset is now available on BigQuery! To get started, <a href=\"https:\/\/developers.google.com\/bigquery\/sign-up\">signup for BigQuery<\/a> and head to <a href=\"https:\/\/bigquery.cloud.google.com\/\">bigquery.cloud.google.com<\/a> and\u00a0&#8230;<\/p><\/blockquote>\n<p>I don&#8217;t know much about the history of bigqueri.es beyond their &#8216;about&#8217; page. 
The real value, however, comes from people asking questions about the HTTP Archive data set and many providing example queries as starting points for your own curiosity: <a href=\"http:\/\/bigqueri.es\/c\/http-archive\">http:\/\/bigqueri.es\/c\/http-archive<\/a><\/p>\n<blockquote><p>Community for curious minds exploring big data with the help of Big Query!<\/p><\/blockquote>\n<p>So what can we do with this?<\/p>\n<ul>\n<li><a href=\"https:\/\/www.redline13.com\/blog\/2015\/10\/general-slow-down-for-first-byte-across-many-sites\/\">We started looking at the details of the decline in performance for Time To First Byte<\/a><\/li>\n<li><a href=\"http:\/\/bigqueri.es\/t\/how-frequently-do-servers-use-chunked-encoding-for-various-content-types\/600\">How frequently do servers use chunked encoding for various content types?<\/a><\/li>\n<li><a href=\"http:\/\/bigqueri.es\/t\/are-popular-websites-faster\/162\">Are Popular Websites Faster?<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>The HTTPArchive has a mission of recording the performance information and makeup of the web. In addition to the content of web pages, it&#8217;s important to record how this digitized content is constructed and served. The HTTP Archive provides this record. It is a permanent repository of web performance information such as size of pages, failed requests, and technologies utilized. 
This performance information allows us to see trends in how the Web is built and provides<a class=\"more-link\" href=\"https:\/\/www.redline13.com\/blog\/2015\/10\/http-archive-and-google-big-query\/\">Read More &rarr;<\/a><\/p>\n","protected":false},"author":5,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":{"0":"entry","1":"post","2":"publish","3":"author-richardfriedman","4":"post-1106","6":"format-standard","7":"category-uncategorized"},"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v20.12 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>HTTP Archive and Google Big Query - RedLine13<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.redline13.com\/blog\/2015\/10\/http-archive-and-google-big-query\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"HTTP Archive and Google Big Query - RedLine13\" \/>\n<meta property=\"og:description\" content=\"The HTTPArchive has a mission of recording the performance information and makeup of the web. In addition to the content of web pages, it&#8217;s important to record how this digitized content is constructed and served. The HTTP Archive provides this record. It is a permanent repository of web performance information such as size of pages, failed requests, and technologies utilized. 
This performance information allows us to see trends in how the Web is built and providesRead More &rarr;\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.redline13.com\/blog\/2015\/10\/http-archive-and-google-big-query\/\" \/>\n<meta property=\"og:site_name\" content=\"RedLine13\" \/>\n<meta property=\"article:published_time\" content=\"2015-10-19T00:09:29+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2022-01-04T01:01:41+00:00\" \/>\n<meta name=\"author\" content=\"Rich Friedman\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Rich Friedman\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.redline13.com\/blog\/2015\/10\/http-archive-and-google-big-query\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.redline13.com\/blog\/2015\/10\/http-archive-and-google-big-query\/\"},\"author\":{\"name\":\"Rich Friedman\",\"@id\":\"https:\/\/www.redline13.com\/blog\/#\/schema\/person\/0fadb7f3ef665407f3c93c8ec84e741a\"},\"headline\":\"HTTP Archive and Google Big Query\",\"datePublished\":\"2015-10-19T00:09:29+00:00\",\"dateModified\":\"2022-01-04T01:01:41+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.redline13.com\/blog\/2015\/10\/http-archive-and-google-big-query\/\"},\"wordCount\":430,\"publisher\":{\"@id\":\"https:\/\/www.redline13.com\/blog\/#organization\"},\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.redline13.com\/blog\/2015\/10\/http-archive-and-google-big-query\/\",\"url\":\"https:\/\/www.redline13.com\/blog\/2015\/10\/http-archive-and-google-big-query\/\",\"name\":\"HTTP Archive and Google Big Query - 
RedLine13\",\"isPartOf\":{\"@id\":\"https:\/\/www.redline13.com\/blog\/#website\"},\"datePublished\":\"2015-10-19T00:09:29+00:00\",\"dateModified\":\"2022-01-04T01:01:41+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/www.redline13.com\/blog\/2015\/10\/http-archive-and-google-big-query\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.redline13.com\/blog\/2015\/10\/http-archive-and-google-big-query\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.redline13.com\/blog\/2015\/10\/http-archive-and-google-big-query\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.redline13.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"HTTP Archive and Google Big Query\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.redline13.com\/blog\/#website\",\"url\":\"https:\/\/www.redline13.com\/blog\/\",\"name\":\"RedLine13\",\"description\":\"(Almost) Free Load Testing in the Cloud\",\"publisher\":{\"@id\":\"https:\/\/www.redline13.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.redline13.com\/blog\/?s={search_term_string}\"},\"query-input\":\"required 
name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.redline13.com\/blog\/#organization\",\"name\":\"RedLine13\",\"url\":\"https:\/\/www.redline13.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.redline13.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.redline13.com\/blog\/wp-content\/uploads\/2013\/06\/cropped-rl13-header-logo.jpg\",\"contentUrl\":\"https:\/\/www.redline13.com\/blog\/wp-content\/uploads\/2013\/06\/cropped-rl13-header-logo.jpg\",\"width\":300,\"height\":68,\"caption\":\"RedLine13\"},\"image\":{\"@id\":\"https:\/\/www.redline13.com\/blog\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.redline13.com\/blog\/#\/schema\/person\/0fadb7f3ef665407f3c93c8ec84e741a\",\"name\":\"Rich Friedman\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.redline13.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/8651ce662fc18353b90c1922f9d29efb01173afa5500224b4d9a355d858a7bd9?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/8651ce662fc18353b90c1922f9d29efb01173afa5500224b4d9a355d858a7bd9?s=96&d=mm&r=g\",\"caption\":\"Rich Friedman\"},\"sameAs\":[\"http:\/\/richardfriedman@yahoo.com\"],\"url\":\"https:\/\/www.redline13.com\/blog\/author\/richardfriedman\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"HTTP Archive and Google Big Query - RedLine13","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.redline13.com\/blog\/2015\/10\/http-archive-and-google-big-query\/","og_locale":"en_US","og_type":"article","og_title":"HTTP Archive and Google Big Query - RedLine13","og_description":"The HTTPArchive has a mission of recording the performance information and makeup of the web. In addition to the content of web pages, it&#8217;s important to record how this digitized content is constructed and served. The HTTP Archive provides this record. It is a permanent repository of web performance information such as size of pages, failed requests, and technologies utilized. This performance information allows us to see trends in how the Web is built and providesRead More &rarr;","og_url":"https:\/\/www.redline13.com\/blog\/2015\/10\/http-archive-and-google-big-query\/","og_site_name":"RedLine13","article_published_time":"2015-10-19T00:09:29+00:00","article_modified_time":"2022-01-04T01:01:41+00:00","author":"Rich Friedman","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Rich Friedman","Est. 
reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.redline13.com\/blog\/2015\/10\/http-archive-and-google-big-query\/#article","isPartOf":{"@id":"https:\/\/www.redline13.com\/blog\/2015\/10\/http-archive-and-google-big-query\/"},"author":{"name":"Rich Friedman","@id":"https:\/\/www.redline13.com\/blog\/#\/schema\/person\/0fadb7f3ef665407f3c93c8ec84e741a"},"headline":"HTTP Archive and Google Big Query","datePublished":"2015-10-19T00:09:29+00:00","dateModified":"2022-01-04T01:01:41+00:00","mainEntityOfPage":{"@id":"https:\/\/www.redline13.com\/blog\/2015\/10\/http-archive-and-google-big-query\/"},"wordCount":430,"publisher":{"@id":"https:\/\/www.redline13.com\/blog\/#organization"},"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.redline13.com\/blog\/2015\/10\/http-archive-and-google-big-query\/","url":"https:\/\/www.redline13.com\/blog\/2015\/10\/http-archive-and-google-big-query\/","name":"HTTP Archive and Google Big Query - RedLine13","isPartOf":{"@id":"https:\/\/www.redline13.com\/blog\/#website"},"datePublished":"2015-10-19T00:09:29+00:00","dateModified":"2022-01-04T01:01:41+00:00","breadcrumb":{"@id":"https:\/\/www.redline13.com\/blog\/2015\/10\/http-archive-and-google-big-query\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.redline13.com\/blog\/2015\/10\/http-archive-and-google-big-query\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.redline13.com\/blog\/2015\/10\/http-archive-and-google-big-query\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.redline13.com\/blog\/"},{"@type":"ListItem","position":2,"name":"HTTP Archive and Google Big Query"}]},{"@type":"WebSite","@id":"https:\/\/www.redline13.com\/blog\/#website","url":"https:\/\/www.redline13.com\/blog\/","name":"RedLine13","description":"(Almost) Free Load Testing in the 
Cloud","publisher":{"@id":"https:\/\/www.redline13.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.redline13.com\/blog\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.redline13.com\/blog\/#organization","name":"RedLine13","url":"https:\/\/www.redline13.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.redline13.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.redline13.com\/blog\/wp-content\/uploads\/2013\/06\/cropped-rl13-header-logo.jpg","contentUrl":"https:\/\/www.redline13.com\/blog\/wp-content\/uploads\/2013\/06\/cropped-rl13-header-logo.jpg","width":300,"height":68,"caption":"RedLine13"},"image":{"@id":"https:\/\/www.redline13.com\/blog\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/www.redline13.com\/blog\/#\/schema\/person\/0fadb7f3ef665407f3c93c8ec84e741a","name":"Rich Friedman","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.redline13.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/8651ce662fc18353b90c1922f9d29efb01173afa5500224b4d9a355d858a7bd9?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/8651ce662fc18353b90c1922f9d29efb01173afa5500224b4d9a355d858a7bd9?s=96&d=mm&r=g","caption":"Rich 
Friedman"},"sameAs":["http:\/\/richardfriedman@yahoo.com"],"url":"https:\/\/www.redline13.com\/blog\/author\/richardfriedman\/"}]}},"_links":{"self":[{"href":"https:\/\/www.redline13.com\/blog\/wp-json\/wp\/v2\/posts\/1106","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.redline13.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.redline13.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.redline13.com\/blog\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/www.redline13.com\/blog\/wp-json\/wp\/v2\/comments?post=1106"}],"version-history":[{"count":1,"href":"https:\/\/www.redline13.com\/blog\/wp-json\/wp\/v2\/posts\/1106\/revisions"}],"predecessor-version":[{"id":8700,"href":"https:\/\/www.redline13.com\/blog\/wp-json\/wp\/v2\/posts\/1106\/revisions\/8700"}],"wp:attachment":[{"href":"https:\/\/www.redline13.com\/blog\/wp-json\/wp\/v2\/media?parent=1106"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.redline13.com\/blog\/wp-json\/wp\/v2\/categories?post=1106"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.redline13.com\/blog\/wp-json\/wp\/v2\/tags?post=1106"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}