Skip to content

Session 6: Data Servers on the Cloud

MSU Mara Hyena Project Database Entity Relationship Diagram

Introduction

Data servers like Relational Databases can be a powerful tool for even small research projects. When we say "Data Servers" or "Data Systems" we mean any server with data processing capabilities that you connect to with a client to send data, commands and receive results. The most widely used and classic example is the relational database management system (RDBMs) invented in the 1970s but there are many other types. A central advantage of data servers is ability to have multiple econnections at once from many users, a busy web application, or parallel processing. Like Big Data tools, these data systems don't require cloud computing, but cloud companies offer database servers with a few clicks that require very littls management. A data server could be a productive addition to your cloud architecturee, or the central aspect of your fellowship project.

Meeting October 29 2:00-3:30pm

Readings

From the free textbook "Big Data and Social Science: Data Science Methods and Tools for Research and Practice" see the Chapter 4. Databases by Foster, Ghani, Jarmin, Kreuter and Lane which could be a valuable resource for learning about the data science methods that you may se on the cloud.

Textbook Chapter 3s: Blog Post from eScience: Azure’s new CosmosDB Planet-Scale Database

Difference between SQL and 'NoSQL' style databases

Towards DataScience blog Introduction to Databases for Data Scientists Note this blog post may require a free account but reading in an incognito or private window worked for me

A description of SQL vs NoSQL from a company called xplenty Which Modern Database Is Right for Your Use Case? this has many adds/pop-ups but is a good read

Activities

Azure offers a database user interface for your computer called Azure Data Studio and has a tutorial for using, but you need to create a database server first.

If you are interested in using SQL for your project, consider the following two activities based on the Open Source database system Postgresql

  1. Quickstart: Create an Azure Database for PostgreSQL server by using the Azure portal
    • I suggest using a password rather than ssh key for now to make it easier, but not recommend for long-term use
    • record the admin user name and password you used when creating the database
    • in the Azure portal, visit the Postgresql Database resource page, and look at the connection strings page to use in the next tutorial
  2. Quickstart: Use Azure Data Studio to connect and query PostgreSQL
  3. Delete the database in your resource group when you have finished with the tutorial. If you would like to estimate costs for using database, keep it for a few days, then visit the "Cost Analysis" page of your resource group in the Azure portal. SQL services can get expensive quickly compared to VMs, but installing and running postgresql server software is a daunting task and comes with security risks.

Using a Database Server in the cloud has many layers so if you are interested in using SQL feel free to reach out for help provisioning and connecting to a SQL system.

Optional : Data Analytics on the Google Platform

Google Offers a service called "BigQuery" which does not require you to create a server, only an account. It can process huge amounts of data, and has several datasets available for free. It also has

You may try BigQuery for free in their Sandbox, with only a Google Account. If you have a gmail account, use that to log in with an incognito/private browser window. You may try your msu.edu email but it may or may not work.

In the following tutorial you may ignore mentions of "firebase" which is another database offering for web apps from google

Using the BigQuery sandbox

If you are a python user, Google offeres a free notebook service called "Co-Lab" and you can connect to both google drive and big query from Colab.

Google Colab + Google BigQuery + GoogleDrive is definitely cloud computing but not in the sense we usually think about it as these are all SAAS level services, chained together via a cloud computing backdrop.