SQL Cloud Big Data — Bigquery Sandbox — free edition

Antonella Blasetti
Data
September 17, 2020

Do you have some SQL practice? Do you want to take up a challenge? Have fun and learn several advanced Data Science techniques. 3 Articles.

 

Image for post

Photo by KOBU Agency on Unsplash

The trick is:

  • BigQuery, a tool for Big Data and Analytics from Google Cloud, has a free environment and offers many Open Data, checked and updated regularly.
  • It is much easier to understand concepts with engaging analysis and investigation. COVID is a sad topic, but for sure you are informed and interested. First rule: feel the data.
  • All this offers a good chance for learning advanced techniques, Analytics, Data Science and Machine Learning in an easy way.

 

Image for post

So, we are about to work empirically with this data, without formulas and theories. With an experimental, multimedia strategy.

We start with a challenge. Ready to go?

  • You will use BigQuery in the Sandbox version, free and without any credit card data.
    Here a short video of the tool.
  • You will become familiar with the environment and connect to the global data of COVID-19 (continuously updated without the need for action on your part).
  • You will then face the challenge; if you are an expert in SQL you will solve it, otherwise you will have the opportunity to learn some advanced SQL.

I disclose to you now the challenge to solve:

Compute the ranking of the 5 most affected countries (in the world) each month, starting from January 2020. In 1 SQL Statement.

Following you will find all the instructions to get started.

The next articles (links at the bottom) will explain the solution and the next steps.

Environment and data

Enter the Google Cloud https://console.cloud.google.com/

If you don’t have an account don’t accept the 300$ credit and just get in.

Don’t give your credit card data.

Image for post

Enter the BigQuery sandbox environment. Detailed description here, if you want to know how Sandbox works.

 

Image for post

We need first to access public data about COVID:

Add Data → Public Datasets → COVID-19 Open Data→ View Dataset

Image for post

 

Image for post

On the Bigquery screen you will see your project (mine is codi-282609) and then bigquery-public-data.

Don’t ask me why it tells you to choose a dataset and then it gives you all of them anyway….Mystery.

Select covid19_ecdc and covid_19_geographic_distribution_worldwide and look at the schema/preview so to familiarize with data:

Image for post

We have for each (day+country) the number of new cases and inhabitants.

Results that we want to compute

We want to compute the ranking of the 5 most affected countries (in the world) each month, starting from January 2020.

Something like this:

Image for post

Most affected means with the greater number of cases, any seriousness. Obviously in relation to the population.

Analyzing results you will be amazed! Different from what you would think …

Challenge!

How would you get the required result?

We don’t need to use any coding and one SQL Statement will be enough.

UP TO YOU! TRY and PLAY with us.

Next article will give (my) solution. I hope your method will be better than mine!

Visual Recap

Image for post


Antonella Blasetti

What really matters to me is helping people work better and learn more, HAVING FUN.
I have lots of tech knowledge and expertise but all that is USELESS until it becomes a strength for someone else.

Broad experience in Software Consultancy and Development, also in International Projects.
Deep knowledge of Technology, with the awareness that Technology is a means, not an end; a way to build something new, communicate and improving the way People work.
Leader of the Google Developer Group for Rome, Lazio and Abruzzo.
Big expertise with DBMS/Data Science: Tuning and Performance improvement of Systems and Applications.
Json/XML, all Web Technologies, Cloud Services; Microservices, JSON, XML, Google Technologies, Open Source development, Javascript/Ajax, noSQL Databases, often with and through RADIX, our Integration tool.
Now Radix works with Angular, mySQL, MS SQL Server, Serverless, Paas and Containers, NodeJS, MongoDB. That is, there are no limits. A single and solid architecture suited for many environments. And flashing fast, too.
RADIX is based extensively on XML and JSON Technology, with many flavours of Databases.
We think that Software Development, more than ever, is a very complex activity. So the ability to understand, communicate and work with the Development Team and all the other people involved is vitally important.

Application Expertise: Automotive, Health Care, Art Cataloguing, Logistics, Real Estate Mangement. Document and Content Management
Financial and Budget Control

Keep Reading

Newsletter EuropeClouds.com

Thank you! Your submission has been received!

Oops! Something went wrong while submitting the form