Using the Google Custom Search API from the command line

Using the Google Custom Search API from the command line is easy. However, I couldn’t find this information clearly laid out anywhere on the web as a simple five-step process, so here it is.

If you didn’t know, Google has long banned unauthenticated bot usage of Google Search to prevent abuse (your IP will be blocked if you try). That is why we are using Google CSE, which requires authentication.

The first 100 queries per day are free. Beyond that you pay $5 per 1000 queries, for up to 10,000 queries per day; just enable billing to do so. Each query returns a maximum of 10 results, so you can retrieve up to 1000 URLs per day for free.
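To get a feel for the numbers, here is a rough sketch of the pricing arithmetic in shell. The function name cse_daily_cost_cents is made up for illustration, and two things are my assumptions rather than anything the pricing text confirms: that usage above the free tier is billed proportionally (not in whole blocks of 1000), and that the 10,000 per day cap counts the free queries too.

```shell
# Estimate the daily bill, in US cents, for a given number of queries.
# ASSUMPTIONS: proportional billing above the free tier, and a cap of
# 10,000 queries per day that includes the 100 free ones.
cse_daily_cost_cents() {
  queries=$1
  free=100             # free queries per day
  cap=10000            # maximum queries per day
  cents_per_1000=500   # $5 per 1000 queries

  if [ "$queries" -gt "$cap" ]; then
    echo "over the daily cap of $cap queries" >&2
    return 1
  fi
  if [ "$queries" -le "$free" ]; then
    echo 0
    return 0
  fi
  # Only the queries above the free tier are billable.
  echo $(( (queries - free) * cents_per_1000 / 1000 ))
}

cse_daily_cost_cents 100     # prints 0
cse_daily_cost_cents 2500    # prints 1200, i.e. $12.00 for 2400 billable queries
```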

Google searches and Google Custom Searches will yield slightly different results. You can read up on the reasons why here: http://support.google.com/customsearch/bin/answer.py?hl=en&answer=2633385

Read up on Google CSE here -> https://www.google.com/cse/compare

And see -> https://developers.google.com/custom-search/v1/overview

 

Step 1 – Create a CSE
A CSE is a Google Custom Search Engine. Typically you configure it to search your own website, however we can also configure it to search the entire web.

1. Go to https://www.google.com/cse/all
2. Create a CSE, enter “www.google.de” as the site to search, and then click Create
3. Then go to Control Panel / Setup / Basics and set Sites to search = Search entire web

The search engine ID (the cx parameter used later) looks like “cx: yyyyyyyyyyyyyyyyyyyyy:yyyyyyyyyyy”
Test some queries using the CSE within a web browser before crafting a URL for use with curl.
https://www.google.com/cse/publicurl?cx=yyyyyyyyyyyyyyyyyyyyy:yyyyyyyyyyy

Step 2 – Create a Developer API Key
Create an API key in the Google APIs console:
https://code.google.com/apis/console/

Step 3 – Send the query
Perform a query using these parameters:

key = your developer API key
cx = your custom search engine ID
q = the search query
start = the index of the first result to return, from 1 to 101. So start=11 is page 2 of the results
num = the number of results to return; a maximum of 10 is allowed

More information on the query parameters can be found here:
https://developers.google.com/custom-search/v1/using_rest?hl=en#query-params
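To make the start arithmetic concrete, here is a small sketch that maps a page number to its start value and prints the request URL for the first three pages. The function names cse_start_for_page and cse_page_url are my own, KEY and CX are placeholders, and it assumes num=10 and a query that is already URL-encoded.

```shell
# KEY and CX are placeholders; substitute your own values.
KEY="xxxxxxxx"
CX="yyyyyyyy:yyyy"

# Page 1 starts at result 1, page 2 at result 11, and so on
# (ASSUMES 10 results per page, i.e. num=10).
cse_start_for_page() {
  echo $(( ($1 - 1) * 10 + 1 ))
}

# $1 = query (ASSUMED to be URL-encoded already), $2 = page number
cse_page_url() {
  printf 'https://www.googleapis.com/customsearch/v1?key=%s&cx=%s&q=%s&start=%s&num=10&alt=json\n' \
    "$KEY" "$CX" "$1" "$(cse_start_for_page "$2")"
}

# Print the URLs for pages 1 to 3; each one can be handed to curl.
for page in 1 2 3; do
  cse_page_url "Singapore" "$page"
done
```

Keep in mind that every page you fetch counts as one query against the daily quota.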

Output is in JSON [JavaScript Object Notation] format.

Each query returns a maximum of 10 results (URLs and more).

$ curl "https://www.googleapis.com/customsearch/v1?key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx&cx=yyyyyyyyyyyyyyyyyyyyy:yyyyyyyyyyy&q=Singapore&filter=1&start=1&num=10&alt=json" > search.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 18389    0 18389    0     0  12291      0 --:--:--  0:00:01 --:--:-- 13984
  • If more than about 1000 bytes were received (the Received column; Dload is the average download speed), then the query was successful; otherwise you have errors in the output file.

Step 4 – Download a JSON command-line parser
Homepage for jq http://stedolan.github.io/jq/

$ sudo wget http://stedolan.github.io/jq/download/linux64/jq -O /usr/local/bin/jq
$ sudo chmod +x /usr/local/bin/jq

Step 5 – Parse the JSON output file, to see our results

$ jq -r '.items[].link' search.json

http://www.singaporeair.com/
http://www.yoursingapore.com/
http://wikitravel.org/en/Singapore/
http://www.gov.sg/
https://www.cia.gov/library/publications/the-world-factbook/geos/sn.html
http://www.lonelyplanet.com/singapore
http://topics.bloomberg.com/singapore/
http://www.singapore.sg/
http://www.bbc.co.uk/news/world-asia-15961759
http://www.tripadvisor.com/Tourism-g294265-Singapore-Vacations.html
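If you want the page titles alongside the URLs, something like the following should work. It is only a sketch: the function name cse_titles_and_links is made up, it assumes each item in the response carries a title and a link field, and it assumes a jq recent enough to support the `[]?` operator and `@tsv` (1.5 or later, as far as I know).

```shell
# Print "title<TAB>link" for every result in a saved response.
# -r emits raw text instead of JSON-quoted strings, and `[]?` keeps jq
# quiet when the response has no "items" array (zero results).
cse_titles_and_links() {
  jq -r '.items[]? | [.title, .link] | @tsv' "$1"
}

# Usage:
#   cse_titles_and_links search.json
```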

Errors

When you’ve reached your daily quota, this is what you will see in the JSON output.

{
 "error": {
  "errors": [
   {
    "domain": "usageLimits",
    "reason": "dailyLimitExceeded",
    "message": "Daily Limit Exceeded"
   }
  ],
  "code": 403,
  "message": "Daily Limit Exceeded"
 }
}
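Rather than eyeballing the file, you can ask jq whether the response contains an error object. This is a minimal sketch that assumes the error layout shown above; cse_error_reason is a name I made up for the example.

```shell
# Print the API error reason from a saved response, if there is one.
# Exits with status 0 when a reason was found, non-zero otherwise:
# with -e, jq fails when the filter produces no output.
cse_error_reason() {
  jq -e -r '.error.errors[0].reason // empty' "$1"
}

# Usage:
#   if reason=$(cse_error_reason search.json); then
#     echo "API error: $reason" >&2
#   fi
```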

If you don’t URL-encode the query string (for example, if it contains spaces), you’ll see this.

Error 400
Your client has issued a malformed or illegal request.
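One way to avoid the 400 is to percent-encode the query before putting it into the URL. Below is a small sketch in plain POSIX shell; the urlencode function is my own helper, not part of curl or jq, and it only claims to handle ASCII input (multi-byte characters would need more care).

```shell
# Percent-encode a string for use as a URL query value.
# ASSUMPTION: ASCII input only. The body runs in a subshell, "( ... )",
# so the helper variables do not leak into the calling shell.
urlencode() (
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character itself
    case $c in
      [A-Za-z0-9._~-]) out=$out$c ;;                   # unreserved, keep as-is
      *) out=$out$(printf '%%%02X' "'$c") ;;           # everything else becomes %XX
    esac
    s=$rest
  done
  printf '%s\n' "$out"
)

urlencode 'New York & co'    # prints New%20York%20%26%20co
```

Alternatively, curl can do the encoding for you if you pass -G together with --data-urlencode "q=New York & co" and leave q out of the URL.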

 

You may find that your search query doesn’t match any indexed data.

$ jq '.searchInformation | { totalResults }' /var/tmp/search1.json
{
  "totalResults": "0"
}

