Data quality with Soda.io at americanas

About americanas

Americanas is a Brazilian company with a distinctive approach to serving its customers: it combines a physical platform of different store formats with a digital platform of several brands (Americanas, Submarino, Shoptime, Soubarato, and others), aiming to deliver the best omnichannel consumer experience in Brazil. The company also has an innovation engine to accelerate its platforms, build disruptive businesses, and leverage different initiatives.

Environment

Our data team is called Bee, and our platform is the Bee Data Platform. The team is responsible for integrating, loading, transforming, and serving data to users. Data products are the main asset we offer to end users.

As part of this role, we are responsible for delivering data products with quality and within SLA. We defined the following criteria for choosing a data quality and observability tool:

  • Security;
  • Easy way to create metrics;
  • Compatible stack with our environment;
  • Open Source;
  • Scalability;

After a couple of PoCs, we chose Soda.io and its two main services, Soda SQL and Soda Cloud, to implement these six core data quality dimensions:

  • Completeness
  • Uniqueness
  • Timeliness
  • Validity
  • Accuracy
  • Consistency
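To make the dimensions concrete, here is a minimal sketch in plain Python of how four of them could be measured on a small batch of rows. It is not Soda's API or our production code: the sample rows, the column names, and the helper functions are all assumptions made for illustration. Accuracy and consistency are left out because they usually need a reference dataset to compare against.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical sample rows; column names are assumptions for illustration.
rows = [
    {"order_id": 1, "email": "ana@example.com",
     "loaded_at": datetime(2022, 1, 1, 6, 0, tzinfo=timezone.utc)},
    {"order_id": 2, "email": None,
     "loaded_at": datetime(2022, 1, 1, 6, 5, tzinfo=timezone.utc)},
    {"order_id": 2, "email": "not-an-email",
     "loaded_at": datetime(2022, 1, 1, 6, 10, tzinfo=timezone.utc)},
]

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness(rows, column):
    """Share of rows where the column is not null."""
    return sum(r[column] is not None for r in rows) / len(rows)


def uniqueness(rows, column):
    """Share of distinct values among the non-null values."""
    values = [r[column] for r in rows if r[column] is not None]
    return len(set(values)) / len(values)


def validity(rows, column, pattern):
    """Share of non-null values that match the expected format."""
    values = [r[column] for r in rows if r[column] is not None]
    return sum(bool(pattern.match(v)) for v in values) / len(values)


def is_timely(rows, column, now, max_delay):
    """True when the newest record arrived within max_delay of now."""
    newest = max(r[column] for r in rows)
    return now - newest <= max_delay


now = datetime(2022, 1, 1, 7, 0, tzinfo=timezone.utc)
report = {
    "completeness(email)": completeness(rows, "email"),
    "uniqueness(order_id)": uniqueness(rows, "order_id"),
    "validity(email)": validity(rows, "email", EMAIL_RE),
    "timely(loaded_at)": is_timely(rows, "loaded_at", now, timedelta(hours=1)),
}
```

In a real setup these measurements are computed by the scanning tool inside the warehouse rather than in application code; the sketch only shows what each dimension measures.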

Architecture

The image below gives an overview of our data observability and data quality stack, which works in four steps:

1. The Soda YAML file with the metrics is deployed through GitLab;

2. Airflow starts the scans using Google Kubernetes Engine operators;

3. Soda SQL scans the metrics defined in the YAML file;

4. Soda Cloud collects the metric results.

Data quality architecture

Our stack runs on GCP, and Composer/Airflow handles orchestration, so we use the GKE operator to run the scans in pods, which makes the approach much more scalable.
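As a rough sketch of the idea, the snippet below builds one pod specification per table, each running the Soda SQL command `soda scan <warehouse yml> <scan yml>`. The table names, file paths, and helper functions are hypothetical and do not come from our DAGs. The dictionary keys (`name`, `cmds`, `arguments`) mirror parameters accepted by Airflow's Kubernetes pod operators such as `GKEStartPodOperator`, but the code deliberately stays in plain Python so it runs without Airflow installed.

```python
def pod_name(table):
    """Kubernetes pod names cannot contain underscores, so swap them for hyphens."""
    return "soda-scan-" + table.lower().replace("_", "-")


def build_pod_spec(table, warehouse_yml="warehouse.yml", tables_dir="tables"):
    """Return the settings for one scan pod (paths are illustrative assumptions)."""
    return {
        "name": pod_name(table),
        "cmds": ["soda"],
        "arguments": ["scan", warehouse_yml, f"{tables_dir}/{table}.yml"],
    }


# Hypothetical tables; one independent pod per table lets scans run in parallel.
pod_specs = [build_pod_spec(t) for t in ["orders", "customer_events"]]
```

Because each table gets its own pod, adding a new table to monitor only means adding a YAML file and one more entry in the list, and the cluster absorbs the extra load.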

Results

With the metrics in place, our team receives proactive alerts. When a source system is late delivering data, when the data is not in the correct format, or when a metric falls below or rises above its threshold, Soda lets us identify the problem easily and automatically.
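The threshold logic behind such alerts can be sketched as follows. This is an illustration in plain Python, not Soda's implementation: the `Threshold` class, the metric names, and the limits are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Threshold:
    """Allowed range for one metric; either bound may be left open."""
    metric: str
    min_value: Optional[float] = None
    max_value: Optional[float] = None


def evaluate(measurements, thresholds):
    """Return one alert message per metric that is missing or out of range."""
    alerts = []
    for t in thresholds:
        value = measurements.get(t.metric)
        if value is None:
            alerts.append(f"{t.metric}: no measurement")
        elif t.min_value is not None and value < t.min_value:
            alerts.append(f"{t.metric}={value} is below {t.min_value}")
        elif t.max_value is not None and value > t.max_value:
            alerts.append(f"{t.metric}={value} is above {t.max_value}")
    return alerts


# Hypothetical scan results and limits.
measurements = {"row_count": 0, "missing_percentage": 1.5, "hours_since_last_load": 30}
thresholds = [
    Threshold("row_count", min_value=1),
    Threshold("missing_percentage", max_value=5),
    Threshold("hours_since_last_load", max_value=24),
]
alerts = evaluate(measurements, thresholds)
```

Here the empty table and the late load each produce an alert, while the missing-value percentage stays within its limit and is not reported.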

Soda Cloud metrics

One great feature Soda offers is anomaly detection for some metrics, as shown in the images above.
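We do not know the exact algorithm Soda Cloud uses, so the sketch below swaps in a simple z-score check to show the general idea: a new measurement is flagged when it lies too far from the recent history of the same metric. The function name, the sample history, and the threshold of 3 standard deviations are assumptions for illustration.

```python
from statistics import mean, stdev


def is_anomaly(history, latest, z_threshold=3.0):
    """Flag `latest` when it is more than z_threshold standard deviations from the mean."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest != mu  # perfectly flat history: any change is unusual
    return abs(latest - mu) / sigma > z_threshold


# Hypothetical daily row counts for one table.
history = [1000, 1020, 980, 1010, 990]
normal_day = is_anomaly(history, 1005)
broken_day = is_anomaly(history, 400)
```

The advantage over a fixed threshold is that nobody has to pick the limits by hand; the acceptable range follows the metric's own history.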

The metrics and scans provided by this tool have been exceptional and have made us a much better data team than before. Today the data team can deliver data with consistency, quality, and integrity. Our technical maturity has grown, and end users are happy to have good data to support their decisions.

Data Architect, soccer addict, bass player, big data lover

Diego Lopes
