A Deep Dive into DynamoDB Partitions

Databases are the backbone of most modern web applications, and their performance plays a major role in user experience. Shaving even a fraction of a second off response times can be what leads users to choose one option over another. It is therefore important to take response time into consideration whilst designing your databases. In this article, I’m going to discuss how to optimise DynamoDB performance by using partitions.
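As a rough illustration of why partitions matter: DynamoDB spreads items across partitions by hashing each item's partition key, so a key with only a few distinct values funnels most traffic onto one partition. The Python sketch below is a toy model under stated assumptions, not DynamoDB's implementation. The `partition_for` and `load_per_partition` helpers are hypothetical, MD5-modulo stands in for DynamoDB's internal (unpublished) hash, and the partition count of 10 is arbitrary.

```python
import hashlib
from collections import Counter


def partition_for(key: str, num_partitions: int) -> int:
    """Map a partition key to one of num_partitions buckets.

    Toy model only: DynamoDB's real hash function is internal, so MD5
    is used here simply as a stable, well-mixed stand-in.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def load_per_partition(keys, num_partitions: int) -> Counter:
    """Count how many requests land on each partition."""
    return Counter(partition_for(k, num_partitions) for k in keys)


# Low-cardinality key: 1,000 requests share just two key values.
hot = ["status#ACTIVE"] * 900 + ["status#INACTIVE"] * 100
# High-cardinality key: each request targets a distinct (hypothetical) user id.
spread = [f"user#{i}" for i in range(1000)]

print(max(load_per_partition(hot, 10).values()))
print(max(load_per_partition(spread, 10).values()))
```

With the low-cardinality key, at least 900 of the 1,000 requests hit a single partition; with distinct user ids the load comes out roughly even, at around 100 requests per partition.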

Posted in databases

Filling the BigQuery paddling pool from the Kinesis Hosepipe

Introduction

Shine team sending logs downstream

This blog post details how we solved the problem of analysing large amounts of HTTP request logs for one of our clients. Spoiler: we used Amazon’s Kinesis and Lambda to stream the data into Google’s BigQuery for analysis. Read on for the juicy details!
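One step any Kinesis-to-Lambda pipeline needs is unpacking the Lambda event: when Lambda is triggered by a Kinesis stream, each entry in `event["Records"]` carries its payload base64-encoded under `kinesis.data`. The Python sketch below assumes each record is a single JSON-encoded HTTP request log; the field names (`timestamp`, `method`, `path`, `status`) and the `rows_from_kinesis_event` helper are hypothetical. The post is filed under Node.js, so the original Lambda was likely JavaScript; Python is used here only for illustration.

```python
import base64
import json


def rows_from_kinesis_event(event: dict) -> list:
    """Decode a Lambda Kinesis event into BigQuery-ready row dicts.

    Assumes (hypothetically) that every Kinesis record holds one
    JSON-encoded HTTP request log entry.
    """
    rows = []
    for record in event.get("Records", []):
        # Kinesis payloads arrive base64-encoded inside the Lambda event.
        payload = base64.b64decode(record["kinesis"]["data"])
        entry = json.loads(payload)
        # Keep only the columns a (hypothetical) BigQuery table defines.
        rows.append({
            "timestamp": entry.get("timestamp"),
            "method": entry.get("method"),
            "path": entry.get("path"),
            "status": entry.get("status"),
        })
    return rows


# A hand-built event in the shape Lambda delivers for Kinesis triggers.
log_line = {"timestamp": "2016-05-01T10:00:00Z", "method": "GET",
            "path": "/health", "status": 200}
sample_event = {"Records": [{"kinesis": {
    "data": base64.b64encode(json.dumps(log_line).encode("utf-8")).decode("ascii")
}}]}

print(rows_from_kinesis_event(sample_event))
```

The resulting rows could then be streamed into BigQuery, for example with the google-cloud-bigquery client's `Client.insert_rows_json(table, rows)` method; that call is left out because it needs real credentials and a target table.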

Posted in Node.js

The most important thing when picking HTTP status codes

Courtesy of HTTP Status Cats. More of a dog person? There’s one for you too.

Every couple of months I’m in a meeting where developers start arguing about which HTTP status codes to use in their RESTful API, or where they decide not to use HTTP status codes at all and instead layer their own error-code system on top of HTTP.

In my experience, HTTP status codes are more than adequate for communicating from servers to clients. Furthermore, it’s preferable to stick with this standard, because that’s what most client- and server-side HTTP libraries are built to handle.

When it comes to which status code to use, the truth is that most of the time it doesn’t matter, just so long as it falls within the correct range. In this post I’m going to outline what the important ranges are, and when you should use each one.

If you control both the client and the server, these guidelines should do just fine. If you’re writing a more generic RESTful service where other people write the clients, you may need to be a bit more nuanced. Either way, this rule of thumb is a good starting point for working towards the simplest possible solution to your particular problem.
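The class of a status code is simply its first digit, as defined in the HTTP semantics specification (RFC 9110): 1xx informational, 2xx success, 3xx redirection, 4xx client error, 5xx server error. A minimal Python sketch of a client branching on the range rather than on the exact code might look like this; the `status_class` helper is hypothetical.

```python
from http import HTTPStatus


def status_class(code: int) -> str:
    """Return the class of an HTTP status code, based on its first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",  # the client appears to have made an error
        5: "server error",  # the server failed to fulfil an apparently valid request
    }[code // 100]


print(status_class(201))                             # → success
print(status_class(HTTPStatus.NOT_FOUND))            # → client error
print(status_class(HTTPStatus.SERVICE_UNAVAILABLE))  # → server error
```

Because `HTTPStatus` members are integers, the same helper accepts either raw codes or the named constants from Python's standard library.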

Posted in Opinion, Services

Orchestrating Tasks Using AWS SWF

Lately I’ve been doing a lot of work managing batch-processing tasks. Broadly speaking, there are two types of trigger for such a task: time-based and logic-based. Time-based triggers are easily handled by cron or other scheduled jobs. Logic-based triggers can be trickier, mostly because they can involve dependencies on other tasks. In this post, I will talk about how I’ve been using Amazon Simple Workflow Service (SWF) to take some of the headache out of orchestrating tasks.
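In SWF, you write a decider that schedules activity tasks as earlier ones complete; that API isn't shown here. As a plain-Python sketch of the underlying idea of a logic-based trigger, the example below runs each task only once every task it depends on has finished, using the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The task names and the `run_in_dependency_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical batch tasks, each mapped to the tasks it depends on.
DEPENDENCIES = {
    "extract_orders": set(),
    "extract_customers": set(),
    "join": {"extract_orders", "extract_customers"},
    "report": {"join"},
}


def run_in_dependency_order(dependencies, run_task):
    """Run each task only after all of its dependencies have completed."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    completed = []
    while sorter.is_active():
        # Every task whose dependencies have all finished is ready now;
        # a real orchestrator could dispatch these in parallel.
        for task in sorter.get_ready():
            run_task(task)
            completed.append(task)
            sorter.done(task)  # may unlock downstream tasks
    return completed


order = run_in_dependency_order(DEPENDENCIES, lambda task: print("running", task))
```

Tasks returned together by `get_ready()` have no dependencies on each other, which is what makes parallel dispatch safe. SWF additionally records each workflow's execution history and tracks timeouts, which this sketch does not.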

The Emergence of The 3 Towers: DevSecOps

I had the opportunity to attend an AWS bootcamp in Sydney a couple of weeks ago. The session I chose was entitled “Securing Cloud Workloads with DevOps Automation”. Many interesting concepts were discussed, all hinging on the new term ‘DevSecOps’. In this post, I’d like to talk about what this is and how it relates to traditional DevOps.

Posted in Continuous Delivery, Continuous Integration, DevOps

Shine hosts a successful Digital Leaders Breakfast

Yesterday Shine hosted a number of Digital Leaders for breakfast in Melbourne, with the aim of sharing experiences and lessons learned.

More than 20 participants from organisations including Coles, Energy Australia, NAB, ANZ, Telstra, Fairfax Media and Australia Post heard presentations from Todd Copeland (GM Digital, NAB), Simon Noonan (CIO, Sportsbet) and Jeff Mentiplay (GM Analytics & Commercial Delivery).

The response was extremely positive: participants greatly valued the presentations, the subsequent discussion and the opportunity to meet peers.

Digital Leaders Breakfast April 2016

The presentations raised several key themes, which generated a great deal of open discussion amongst the presenters and participants. These included the ability to learn about and act quickly upon customer needs, and the need to change traditional organisational structures, financial measures and team office locations to accelerate delivery. The importance of “looking externally” to learn from global Digital leaders and the value of sharing platforms and customer insights across organisational silos were also discussed.

Based on the overwhelmingly positive feedback and rich discussions, Shine plans to run another Digital Leaders Breakfast later in 2016.
Posted in Uncategorized

Creating a serverless ETL nirvana using Google BigQuery

Quite a while back, Google released two new features in BigQuery. One was federated sources. A federated source allows you to query external sources, like files in Google Cloud Storage (GCS), directly using SQL. The other was user-defined functions (UDFs). Essentially, a UDF allows you to ram JavaScript right into your SQL to help you perform the map phase of your query. Sweet!

In this blog post, I’ll go step-by-step through how I combined BigQuery’s federated sources and UDFs to create a scalable, totally serverless, and cost-effective ETL pipeline in BigQuery.

Posted in databases, Javascript

Pablo rocking the stage at Google’s annual cloud event!

Last week, Shine’s very own Pablo Caif gave a presentation at GCP Next 2016 in San Francisco, Google’s largest annual cloud platform event. Pablo delivered an outstanding talk on the work Shine have done for Telstra, which involves building solutions on the GCP stack to manage and analyse their massive datasets. More specifically, the talk focused on two of Google’s core big data products: BigQuery and Cloud Dataflow.

Shine’s Pablo Caif to present at GCP Next 2016!

Shine is extremely proud to announce that Pablo Caif has been invited to present at GCP Next 2016, which is Google’s largest annual cloud platform event held in San Francisco.

Pablo will be presenting on the work Shine have done for Telstra, which involves building solutions on GCP to manage and analyse their massive datasets. More specifically, the talk will focus on two of Google’s core big data products: BigQuery and Cloud Dataflow.

Pablo will be presenting on Thursday 24th March in the ‘Data & Analytics’ track. Be sure to pop by and say “g’day” if you are going to the event! You can find more information about GCP Next 2016 here.

Posted in databases

NoSQL in the cloud: A scalable alternative to Relational Databases

With the current move to cloud computing, the need to scale applications makes storing data a challenge. If you are using a traditional relational database, you may find yourself working on a complex policy for distributing your database load across multiple database instances. This approach often introduces problems of its own and is unlikely to scale elastically.

As an alternative, you could consider a cloud-based NoSQL database. Over the past few weeks I have been analysing a few such offerings, each of which promises to scale as your application grows without requiring you to think about how to distribute the data and load.

Posted in databases, Uncategorized