Data - Google Trends


Extracts Google Trends data. See this blog post for an example.

The results are scores indicating search frequency over time, in the range 0 to 100. The scores are relative: the highest point across all the topics in a search is scaled to 100, and the other values are scaled in proportion to it. Thus comparing apples with oranges may give different scores for oranges than comparing bananas with oranges. See the Google Trends website for more information.
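To illustrate why the scores depend on what they are compared with, the sketch below uses a simplified model that is an assumption, not Google's published method: each raw search volume is multiplied by 100, divided by the largest volume among all topics in the same request, and rounded. The raw volumes, and the names `relativeScores`, `apples`, `oranges` and `bananas`, are invented for illustration.

```javascript
// Simplified model of relative scoring (an assumption, not Google's method):
// every value is scaled so that the single highest value across all topics
// in the same request becomes 100.
function relativeScores(rawByTopic) {
  const peak = Math.max(...Object.values(rawByTopic).flat());
  const scores = {};
  for (const [topic, values] of Object.entries(rawByTopic)) {
    scores[topic] = values.map((v) => Math.round((v * 100) / peak));
  }
  return scores;
}

// Invented raw search volumes for two time periods.
const apples = [50, 100];
const oranges = [20, 40];
const bananas = [200, 400];

console.log(relativeScores({ apples, oranges }).oranges); // [ 20, 40 ]
console.log(relativeScores({ bananas, oranges }).oranges); // [ 5, 10 ]
```

The oranges volumes are identical in both requests, but their scores fall from 20 and 40 to 5 and 10 because bananas set a higher peak.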

Example

The popularity of searches for food groups.

Options

Topic(s) The topic(s) to search for. Separate multiple topics with commas. A maximum of five topics can be specified.
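A minimal sketch of how the topic list can be split and checked. `parseTopics` is a hypothetical stand-in for `ConvertCommaSeparatedStringToVector` from the flipU package used in the R code; it assumes surrounding spaces are trimmed and empty entries dropped, which may differ in detail from the flipU function.

```javascript
// Hypothetical stand-in for flipU's ConvertCommaSeparatedStringToVector,
// plus the five-topic limit enforced by the R code on this page.
function parseTopics(text, maxTopics = 5) {
  const topics = text
    .split(",")
    .map((t) => t.trim())
    .filter((t) => t.length > 0);
  if (topics.length > maxTopics)
    throw new Error(`Maximum of ${maxTopics} topics can be specified.`);
  return topics;
}

console.log(parseTopics("Britney Spears, Justin Bieber"));
// [ 'Britney Spears', 'Justin Bieber' ]
```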

Type Whether to search the web, news, images, YouTube, or Froogle (Google shopping).

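Region Either World, a country selected from the list, State, or Multiple regions. Selecting State or Multiple regions displays a box for entering the State code or the Region codes respectively.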
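The country entries in the Region list end with their ISO-3166 code in parentheses, and the R code extracts it with `substring(formCountry, nchar(formCountry) - 2, nchar(formCountry) - 1)`. The hypothetical `geoFromRegion` below does the same with a regular expression anchored at the end of the label, so names containing other parentheses, such as Cocos (Keeling) Islands (CC), still resolve; World maps to an empty string, meaning no geographic filter.

```javascript
// Hypothetical sketch of turning a Region choice into the geo code.
// World means no filter; country labels end in "(XX)".
function geoFromRegion(region) {
  if (region === "World") return "";
  if (region === "State" || region === "Multiple regions")
    throw new Error("Codes for this choice are typed into a separate box.");
  // Take the two capital letters inside the final pair of parentheses.
  const match = region.match(/\(([A-Z]{2})\)$/);
  if (!match) throw new Error(`No country code found in "${region}"`);
  return match[1];
}

console.log(geoFromRegion("France (FR)")); // FR
```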

State code The ISO-3166 code for a state (e.g., US-CA for California or AU-NSW for New South Wales in Australia). See this link for more information on ISO 3166 codes.

An imperfect way to list the state codes of a country is to create an R Output and paste in the two lines of code below, replacing US with the appropriate country code. It is imperfect because many of the returned codes will not work with Google Trends.

library(gtrendsR)
geo.codes <- sort(unique(countries[substr(countries$sub_code, 1, 2) == "US",]$sub_code))

Region codes A comma-separated list of either ISO-3166 country codes or state codes (e.g. FR, MX, GB for France, Mexico and the United Kingdom). See this link for more information on ISO 3166 codes. A mixture of countries and states is not permitted, and when more than one region code is entered, only one topic can be specified.
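The R code passes region codes straight to gtrendsR and reports an error if they are rejected. A pre-check along the following lines could catch a mixture of countries and states earlier. It is a hypothetical sketch: `checkRegionCodes` is not part of this page's code, and it assumes country codes are two letters (FR) and state codes are a country code, a hyphen and one to three letters or digits (US-CA, AU-NSW).

```javascript
// Hypothetical pre-check for the Region codes box.
// Assumed shapes: country "FR"; state "US-CA" or "AU-NSW".
function checkRegionCodes(text, topicCount) {
  const codes = text
    .split(",")
    .map((c) => c.trim().toUpperCase())
    .filter((c) => c.length > 0);
  const isCountry = (c) => /^[A-Z]{2}$/.test(c);
  const isState = (c) => /^[A-Z]{2}-[A-Z0-9]{1,3}$/.test(c);
  // Either every code is a country code or every code is a state code.
  if (!codes.every(isCountry) && !codes.every(isState))
    throw new Error("Use either all country codes or all state codes.");
  // Same rule as the R code: several regions allow only one topic.
  if (codes.length > 1 && topicCount > 1)
    throw new Error("Only one topic can be specified with multiple regions.");
  return codes;
}

console.log(checkRegionCodes("fr, MX, GB", 1)); // [ 'FR', 'MX', 'GB' ]
```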

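Period The time period for which the Google Trends data is reported: either one of the preset periods or Custom range (yyyy-mm-dd), in which case the start and end dates are entered separated by a space (e.g. 2016-01-01 2016-12-31). The frequency of the data points depends on the period as follows: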

Last hour and Last four hours: every minute
Last 24 hours: every 8 minutes
Last seven days: hourly
Last 30 days and Last 90 days: daily
Last 12 months and Last five years: weekly
2004 - present: monthly
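The sketch below restates the mapping from each Period label to the `time` string that the R code passes to `gtrends()`, together with the data frequencies listed above. The custom-range check in the hypothetical `timeArgument` is an addition for illustration; the R code passes the custom text through unchanged.

```javascript
// Period labels from the form, the gtrends() time strings used by the R
// code, and the data frequencies listed on this page.
const PERIODS = {
  "Last hour": { time: "now 1-H", frequency: "every minute" },
  "Last four hours": { time: "now 4-H", frequency: "every minute" },
  "Last 24 hours": { time: "now 1-d", frequency: "every 8 minutes" },
  "Last seven days": { time: "now 7-d", frequency: "hourly" },
  "Last 30 days": { time: "today 1-m", frequency: "daily" },
  "Last 90 days": { time: "today 3-m", frequency: "daily" },
  "Last 12 months": { time: "today 12-m", frequency: "weekly" },
  "Last five years": { time: "today+5-y", frequency: "weekly" },
  "2004 - present": { time: "all", frequency: "monthly" },
};

// Hypothetical helper: preset labels map directly; anything else is treated
// as a custom range that must be two yyyy-mm-dd dates, start not after end.
function timeArgument(period, customRange = "") {
  if (Object.prototype.hasOwnProperty.call(PERIODS, period))
    return PERIODS[period].time;
  const m = customRange.trim().match(/^(\d{4}-\d{2}-\d{2}) (\d{4}-\d{2}-\d{2})$/);
  if (!m) throw new Error("Expected two dates: yyyy-mm-dd yyyy-mm-dd");
  // ISO dates of equal length compare correctly as strings.
  if (m[1] > m[2]) throw new Error("Start date is after end date");
  return `${m[1]} ${m[2]}`;
}

console.log(timeArgument("Last five years")); // today+5-y
console.log(timeArgument("Custom range (yyyy-mm-dd)", "2016-01-01 2016-12-31"));
```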

Long format output If checked, the result is returned in long format, i.e. with one row of data per topic per time period. Otherwise, it is returned in wide format, i.e. with time periods along the rows and topics (or regions, when more than one region code is entered) along the columns.
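The difference between the two layouts can be sketched as a pivot. The hypothetical `longToWide` below turns records with one row per topic per date into one row per date with a column per topic. It is a simplified version of the `dcast()` call in the R code, which also splits columns by region; the hits values are invented.

```javascript
// Pivot long records {date, keyword, hits} into one row per date with one
// property per topic (a simplified analogue of reshape2::dcast in R).
function longToWide(rows) {
  const topics = [...new Set(rows.map((r) => r.keyword))];
  const byDate = new Map(); // insertion order keeps dates in input order
  for (const { date, keyword, hits } of rows) {
    if (!byDate.has(date)) byDate.set(date, { date });
    byDate.get(date)[keyword] = hits;
  }
  return { topics, rows: [...byDate.values()] };
}

// Invented long-format data: 2 dates x 2 topics = 4 rows.
const longRows = [
  { date: "2016-01-03", keyword: "apples", hits: 80 },
  { date: "2016-01-03", keyword: "oranges", hits: 35 },
  { date: "2016-01-10", keyword: "apples", hits: 100 },
  { date: "2016-01-10", keyword: "oranges", hits: 40 },
];
console.log(longToWide(longRows).rows); // 2 rows, each with apples and oranges
```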

Automatic updating Whether to regularly update the data.

Update period The time unit for regular updates.

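Frequency The number of Update periods between updates (e.g. an Update period of Days with a Frequency of 2 updates every two days). The minimum is 600 when the Update period is Seconds and 10 when it is Minutes.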
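The length of one update interval is Frequency multiplied by the Update period. The hypothetical `updateIntervalSeconds` below computes it in seconds and applies the same minimums as the form, which together mean updates are never scheduled more often than every ten minutes. Months are left out because their length varies.

```javascript
// Seconds per Update period unit (Months omitted: their length varies).
const UNIT_SECONDS = { Weeks: 604800, Days: 86400, Hours: 3600, Minutes: 60, Seconds: 1 };
// Smallest Frequency the form allows per unit; other units default to 1.
const MIN_FREQUENCY = { Seconds: 600, Minutes: 10 };

function updateIntervalSeconds(unit, frequency) {
  if (!Object.prototype.hasOwnProperty.call(UNIT_SECONDS, unit))
    throw new Error(`Unsupported update period: ${unit}`);
  const minimum = Object.prototype.hasOwnProperty.call(MIN_FREQUENCY, unit)
    ? MIN_FREQUENCY[unit]
    : 1;
  if (!Number.isInteger(frequency) || frequency < minimum)
    throw new Error(`Frequency must be a whole number of at least ${minimum}`);
  return frequency * UNIT_SECONDS[unit];
}

console.log(updateIntervalSeconds("Days", 2)); // 172800
console.log(updateIntervalSeconds("Seconds", 600)); // 600, i.e. ten minutes
```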

Start date and time The date and time of the first update, in the format dd-mm-yyyy hh:mm or mm-dd-yyyy hh:mm (seconds may be added, e.g. 31-12-2018 18:00:00). If left blank, updating starts now.

US date format Whether the Start date and time is expressed in US format i.e. mm-dd-yyyy hh:mm.
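How a start time is read under each setting can be sketched as follows. The actual parsing is done by `UpdateAt()` from the flipTime package; the hypothetical `parseStart` only illustrates the day/month swap and accepts optional seconds, as in the form's example 31-12-2018 18:00:00. Its range check is deliberately simple and does not know month lengths.

```javascript
// Hypothetical parser for the Start date and time box.
// Non-US: dd-mm-yyyy hh:mm[:ss]; US: mm-dd-yyyy hh:mm[:ss].
function parseStart(text, usFormat = false) {
  const m = text
    .trim()
    .match(/^(\d{1,2})-(\d{1,2})-(\d{4}) (\d{1,2}):(\d{2})(?::(\d{2}))?$/);
  if (!m) throw new Error("Expected dd-mm-yyyy hh:mm or mm-dd-yyyy hh:mm");
  const field1 = Number(m[1]);
  const field2 = Number(m[2]);
  // The US date format setting only swaps the roles of the first two fields.
  const day = usFormat ? field2 : field1;
  const month = usFormat ? field1 : field2;
  const parts = {
    year: Number(m[3]),
    month,
    day,
    hour: Number(m[4]),
    minute: Number(m[5]),
    second: m[6] === undefined ? 0 : Number(m[6]),
  };
  // Simple range check; does not account for month lengths.
  if (month < 1 || month > 12 || day < 1 || day > 31 ||
      parts.hour > 23 || parts.minute > 59 || parts.second > 59)
    throw new Error("Date or time field out of range");
  return parts;
}

// Both describe 31 December 2018 at 18:00.
console.log(parseStart("31-12-2018 18:00:00"));
console.log(parseStart("12-31-2018 18:00", true));
```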

Time zone An optional time zone for the Start date and time; if left blank, UTC is used. The format must be Continent/City, e.g. America/Los_Angeles. See Wikipedia for a list of time zones.
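Whether a name is a recognized Continent/City zone can be checked in JavaScript with the standard `Intl.DateTimeFormat` constructor, which throws a `RangeError` for unknown zones. This is a hypothetical sketch (`isValidTimeZone` and `resolveTimeZone` are illustrative names); the R code itself only substitutes UTC for a blank entry and leaves validation to `UpdateAt()`.

```javascript
// True if the runtime's IANA time zone database recognizes the name.
function isValidTimeZone(tz) {
  try {
    new Intl.DateTimeFormat("en-US", { timeZone: tz });
    return true;
  } catch (e) {
    return false; // RangeError: invalid time zone
  }
}

// Blank input falls back to UTC, as the R code does.
function resolveTimeZone(input) {
  const tz = input.trim() === "" ? "UTC" : input.trim();
  if (!isValidTimeZone(tz)) throw new Error(`Unknown time zone: ${tz}`);
  return tz;
}

console.log(resolveTimeZone("")); // UTC
console.log(resolveTimeZone("America/Los_Angeles")); // America/Los_Angeles
```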

Update exported documents Whether exported documents that refer to the data should also be updated regularly.

Acknowledgements

Uses the gtrendsR package.

Code

form.setHeading("Google Trends");
form.textBox({name: "formTopics",
              label: "Topic(s)", 
              required: true, prompt: "For example: Britney Spears, Justin Bieber"});
form.comboBox({name: "formGprop",
               label: "Type", 
               alternatives: ["Web", "News", "Images", "Froogle", "YouTube"],
               default_value: "Web", prompt: "The type of searches, within which to calculate trends"});
var country = form.comboBox({name: "formCountry",
               label: "Region", 
               alternatives: ["World", "State", "Multiple regions", "Afghanistan (AF)", "Albania (AL)", "Algeria (DZ)", "American Samoa (AS)", "Andorra (AD)", "Angola (AO)", "Anguilla (AI)", "Antarctica (AQ)", "Antigua and Barbuda (AG)", "Argentina (AR)", "Armenia (AM)", "Aruba (AW)", "Australia (AU)", "Austria (AT)", "Azerbaijan (AZ)", "Bahamas (BS)", "Bahrain (BH)", "Bangladesh (BD)", "Barbados (BB)", "Belarus (BY)", "Belgium (BE)", "Belize (BZ)", "Benin (BJ)", "Bermuda (BM)", "Bhutan (BT)", "Bolivia, Plurinational State of (BO)", "Bonaire, Sint Eustatius and Saba (BQ)", "Bosnia and Herzegovina (BA)", "Botswana (BW)", "Bouvet Island (BV)", "Brazil (BR)", "British Indian Ocean Territory (IO)", "Brunei Darussalam (BN)", "Bulgaria (BG)", "Burkina Faso (BF)", "Burundi (BI)", "Cabo Verde (CV)", "Cambodia (KH)", "Cameroon (CM)", "Canada (CA)", "Cayman Islands (KY)", "Central African Republic (CF)", "Chad (TD)", "Chile (CL)", "China (CN)", "Christmas Island (CX)", "Cocos (Keeling) Islands (CC)", "Colombia (CO)", "Comoros (KM)", "Congo (CG)", "Congo, the Democratic Republic of the (CD)", "Cook Islands (CK)", "Costa Rica (CR)", "Côte d'Ivoire (CI)", "Croatia (HR)", "Cuba (CU)", "Curaçao (CW)", "Cyprus (CY)", "Czechia (CZ)", "Denmark (DK)", "Djibouti (DJ)", "Dominica (DM)", "Dominican Republic (DO)", "Ecuador (EC)", "Egypt (EG)", "El Salvador (SV)", "Equatorial Guinea (GQ)", "Eritrea (ER)", "Estonia (EE)", "Ethiopia (ET)", "Falkland Islands (Malvinas) (FK)", "Faroe Islands (FO)", "Fiji (FJ)", "Finland (FI)", "France (FR)", "French Guiana (GF)", "French Polynesia (PF)", "French Southern Territories (TF)", "Gabon (GA)", "Gambia (GM)", "Georgia (GE)", "Germany (DE)", "Ghana (GH)", "Gibraltar (GI)", "Greece (GR)", "Greenland (GL)", "Grenada (GD)", "Guadeloupe (GP)", "Guam (GU)", "Guatemala (GT)", "Guernsey (GG)", "Guinea (GN)", "Guinea-Bissau (GW)", "Guyana (GY)", "Haiti (HT)", "Heard Island and McDonald Islands (HM)", "Holy See (Vatican City State) (VA)", 
"Honduras (HN)", "Hong Kong (HK)", "Hungary (HU)", "Iceland (IS)", "India (IN)", "Indonesia (ID)", "Iran, Islamic Republic of (IR)", "Iraq (IQ)", "Ireland (IE)", "Isle of Man (IM)", "Israel (IL)", "Italy (IT)", "Jamaica (JM)", "Japan (JP)", "Jersey (JE)", "Jordan (JO)", "Kazakhstan (KZ)", "Kenya (KE)", "Kiribati (KI)", "Korea (the Democratic People's Republic of) (KP)", "Korea (the Republic of) (KR)", "Kuwait (KW)", "Kyrgyzstan (KG)", "Lao People's Democratic Republic (LA)", "Latvia (LV)", "Lebanon (LB)", "Lesotho (LS)", "Liberia (LR)", "Libya (LY)", "Liechtenstein (LI)", "Lithuania (LT)", "Luxembourg (LU)", "Macao (MO)", "Macedonia, the former Yugoslav Republic of (MK)", "Madagascar (MG)", "Malawi (MW)", "Malaysia (MY)", "Maldives (MV)", "Mali (ML)", "Malta (MT)", "Marshall Islands (MH)", "Martinique (MQ)", "Mauritania (MR)", "Mauritius (MU)", "Mayotte (YT)", "Mexico (MX)", "Micronesia, Federated States of (FM)", "Moldova, Republic of (MD)", "Monaco (MC)", "Mongolia (MN)", "Montenegro (ME)", "Montserrat (MS)", "Morocco (MA)", "Mozambique (MZ)", "Myanmar (MM)", "Namibia (NA)", "Nauru (NR)", "Nepal (NP)", "Netherlands (NL)", "New Caledonia (NC)", "New Zealand (NZ)", "Nicaragua (NI)", "Niger (NE)", "Nigeria (NG)", "Niue (NU)", "Norfolk Island (NF)", "Northern Mariana Islands (MP)", "Norway (NO)", "Oman (OM)", "Pakistan (PK)", "Palau (PW)", "Palestine, State of (PS)", "Panama (PA)", "Papua New Guinea (PG)", "Paraguay (PY)", "Peru (PE)", "Philippines (PH)", "Pitcairn (PN)", "Poland (PL)", "Portugal (PT)", "Puerto Rico (PR)", "Qatar (QA)", "Réunion (RE)", "Romania (RO)", "Russian Federation (RU)", "Rwanda (RW)", "Saint Barthélemy (BL)", "Saint Helena, Ascension and Tristan da Cunha (SH)", "Saint Kitts and Nevis (KN)", "Saint Lucia (LC)", "Saint Martin (French part) (MF)", "Saint Pierre and Miquelon (PM)", "Saint Vincent and the Grenadines (VC)", "Samoa (WS)", "San Marino (SM)", "Sao Tome and Principe (ST)", "Saudi Arabia (SA)", "Senegal (SN)", "Serbia (RS)", 
"Seychelles (SC)", "Sierra Leone (SL)", "Singapore (SG)", "Sint Maarten (Dutch part) (SX)", "Slovakia (SK)", "Slovenia (SI)", "Solomon Islands (SB)", "Somalia (SO)", "South Africa (ZA)", "South Georgia and the South Sandwich Islands (GS)", "South Sudan (SS)", "Spain (ES)", "Sri Lanka (LK)", "Sudan (SD)", "Suriname (SR)", "Svalbard and Jan Mayen (SJ)", "Swaziland (SZ)", "Sweden (SE)", "Switzerland (CH)", "Syrian Arab Republic (SY)", "Taiwan, Province of China (TW)", "Tajikistan (TJ)", "Tanzania, United Republic of (TZ)", "Thailand (TH)", "Timor-Leste (TL)", "Togo (TG)", "Tokelau (TK)", "Tonga (TO)", "Trinidad and Tobago (TT)", "Tunisia (TN)", "Turkey (TR)", "Turkmenistan (TM)", "Turks and Caicos Islands (TC)", "Tuvalu (TV)", "Uganda (UG)", "Ukraine (UA)", "United Arab Emirates (AE)", "United Kingdom (GB)", "United States (US)", "United States Minor Outlying Islands (UM)", "Uruguay (UY)", "Uzbekistan (UZ)", "Vanuatu (VU)", "Venezuela, Bolivarian Republic of (VE)", "Viet Nam (VN)", "Virgin Islands, British (VG)", "Virgin Islands, U.S. (VI)", "Wallis and Futuna (WF)", "Western Sahara (EH)", "Yemen (YE)", "Zambia (ZM)", "Zimbabwe (ZW)"],
               prompt: "Filter the trend data for a country, a state or multiple countries or multiple states.",
               default_value: "World"}).getValue();
if (country == "State")
    form.textBox({name: "formGeo", label: "State code", required: true, prompt: "For example: US-FL"});
if (country == "Multiple regions")
    form.textBox({name: "formGeo", label: "Region codes", required: true, prompt: "For example: US, MX, CA"});

var time = form.comboBox({name: "formPeriod",
               label: "Period", 
               alternatives: ["Last hour", "Last four hours", "Last 24 hours", "Last seven days", "Last 30 days", "Last 90 days", "Last 12 months", "Last five years", "2004 - present", "Custom range (yyyy-mm-dd)"],
               prompt: "The period of the trend history",
               default_value: "Last five years"});
if (time.getValue() == "Custom range (yyyy-mm-dd)")
    form.textBox({name: "formCustomTime",
              label: "yyyy-mm-dd yyyy-mm-dd",
              prompt: "Start and end dates in format yyyy-mm-dd",
              required: true, default_value: "2016-01-01 2016-12-31"});
form.checkBox({label:"Long format output", name:"formLongFormat", default_value:false, prompt: "One row per topic per time period"});

// Controls for regular updating
var updating = form.checkBox({label:"Automatic updating", name:"formUpdating", default_value:false, prompt:"Regularly update the output"}).getValue();
if (updating) {
    if (Q.fileFormatVersion() > 10.9)
        form.group("UPDATING");
    var period = form.comboBox({name: "formUpdatePeriod", label: "Update period", 
               alternatives: ["Months", "Weeks", "Days", "Hours", "Minutes", "Seconds"], default_value: "Days", prompt: "The time units for updating"}).getValue();
    var defaultFrequency = 1;
    if (period == "Seconds")
        defaultFrequency = 600;
    else if (period == "Minutes")
        defaultFrequency = 10;
    form.numericUpDown({name: "formFrequency", label: "Frequency", default_value: defaultFrequency,
                        prompt: "The update frequency in units of the update period", increment: 1, minimum: defaultFrequency, maximum: 1000000});
    // Only one prompt is kept: a duplicate object key would silently override the first.
    var start = form.textBox({name: "formStart", label: "Start date and time",
              required: false, prompt: "Default now, or e.g. 31-12-2018 18:00:00"}).getValue();
    if (start != "") {
        form.checkBox({label:"US date format", name:"formUSDate", default_value:false, prompt: "Specify update start date as mm-dd-yyyy"});
        form.textBox({name: "formTimeZone", label: "Time zone", 
                  required: false, prompt: "Leave blank for UTC or enter e.g. America/New_York"});    
    }
    form.checkBox({name: "formSnapshot", label: "Update exported documents", default_value: false, prompt: "Whether exported documents should be updated"});
}
library(gtrendsR)
library(flipU)
library(flipTime)
library(reshape2)
 
# avoid warning from anytime package used by gtrendsR not finding timezone
Sys.setenv(TZ = "UTC")
 
topics <- ConvertCommaSeparatedStringToVector(formTopics)
if ((lt <- length(topics)) > 5) stop("Maximum of 5 topics can be specified.")

geog <- ""
if (formCountry == "State") {
    geog <- ConvertCommaSeparatedStringToVector(formGeo)
} else if (formCountry == "Multiple regions") {
    geog <- ConvertCommaSeparatedStringToVector(formGeo)
    if (lt > 1 && length(geog) > 1)
        stop("Only one topic can be specified with multiple regions.")
} else if (formCountry != "World") {
    geog <- substring(formCountry, nchar(formCountry) - 2, nchar(formCountry) - 1)
}

time <- switch(formPeriod,
               "Last hour" = "now 1-H",
               "Last four hours" = "now 4-H",
               "Last 24 hours" = "now 1-d",
               "Last seven days" = "now 7-d",
               "Last 30 days" = "today 1-m",
               "Last 90 days" = "today 3-m",
               "Last 12 months" = "today 12-m",
               "Last five years" = "today+5-y",
               "2004 - present" = "all",
               "Custom range (yyyy-mm-dd)" = formCustomTime)
 
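# Report the underlying error as well, since failures are not always caused by the region codes
gt <- tryCatch(gtrends(topics, geo = toupper(geog), gprop = tolower(formGprop), time = time)[[1]],
        error = function(e) stop("Google Trends request failed (", conditionMessage(e), "). ",
                                 "If the region is the problem, use ISO-3166 country or state codes (not a mixture of both)."))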
if (is.null(gt))
    stop("No data was returned for your search terms.")
gt$hits[gt$hits == "<1"] <- 0
gt$hits <- as.numeric(gt$hits)

if(!formLongFormat) {
    gt <- dcast(gt, date ~ keyword + geo, value.var = "hits")
    rownames(gt) <- gt$date
    gt$date <- NULL
    if(length(geog) == 1)
        colnames(gt) <- gsub("_.*", "", colnames(gt))
    else
        colnames(gt) <- gsub(".*_", "", colnames(gt))
}

# Create regular updating message
if (formUpdating) {
    options <- ifelse(formSnapshot, "snapshot", "wakeup")
    if (formStart != "") {
        if (formTimeZone == "") formTimeZone <- "UTC"
        UpdateAt(formStart, us.format = formUSDate, time.zone = formTimeZone,
                units = tolower(formUpdatePeriod), frequency = formFrequency, options = options)
    } else
        UpdateEvery(formFrequency, units = tolower(formUpdatePeriod), options = options)
}

google.trends <- gt
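The step in the R code that turns `hits` into numbers can be restated as the following sketch. Google Trends reports very small values as the string "<1", which the R code sets to 0 before calling `as.numeric()`; the hypothetical `cleanHits` does the same, except that a non-numeric string becomes `NaN` rather than R's `NA`.

```javascript
// Mirror of gt$hits[gt$hits == "<1"] <- 0 followed by as.numeric(gt$hits).
function cleanHits(hits) {
  return hits.map((h) => (h === "<1" ? 0 : Number(h)));
}

console.log(cleanHits(["<1", "12", 100])); // [ 0, 12, 100 ]
```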