Data quality with Soda.io at americanas

Diego Lopes
Apr 15, 2022


About americanas

Americanas is a Brazilian company that takes a unique approach to serving its customers: it combines a physical platform of different store formats with a digital platform of several brands (Americanas, Submarino, Shoptime, Soubarato, and others), aiming to deliver the best omnichannel consumer experience in Brazil. The company has an innovation engine to accelerate its platforms, build disruptive businesses, and leverage different initiatives.

Environment

Our data team is called Bee, and our platform is the Bee Data Platform. The team is responsible for integrating, loading, transforming, and serving data to users. Data products are the main asset we offer to end users.

As part of that role, we are responsible for delivering data products with quality and within SLA. We defined the following criteria for choosing a data quality and observability tool:

  • Security;
  • Easy way to create metrics;
  • Compatible stack with our environment;
  • Open Source;
  • Scalability;

After a couple of PoCs we chose Soda.io and its two main services, Soda SQL and Soda Cloud, to implement the six core data quality dimensions:

  • Completeness
  • Uniqueness
  • Timeliness
  • Validity
  • Accuracy
  • Consistency
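
To make a few of these dimensions concrete, here is a minimal Python sketch of the kind of measurement behind completeness, uniqueness, validity, and timeliness. It is an illustration only, not Soda's implementation: the sample rows, the column names, and the helper functions are all hypothetical.

```python
import re
from datetime import datetime

# Hypothetical sample rows; in a real setup these metrics are computed
# inside the warehouse by the scanning tool, not in Python.
ROWS = [
    {"order_id": "A1", "email": "ana@example.com", "loaded_at": datetime(2022, 4, 15, 8, 0)},
    {"order_id": "A2", "email": None, "loaded_at": datetime(2022, 4, 15, 8, 5)},
    {"order_id": "A2", "email": "not-an-email", "loaded_at": datetime(2022, 4, 15, 8, 10)},
]

# Deliberately simple email pattern, good enough for the illustration.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def missing_percentage(rows, column):
    """Completeness: percentage of rows where the column is empty."""
    missing = sum(1 for row in rows if row[column] is None)
    return 100.0 * missing / len(rows)


def duplicate_count(rows, column):
    """Uniqueness: number of distinct values that appear more than once."""
    seen, duplicated = set(), set()
    for row in rows:
        value = row[column]
        if value in seen:
            duplicated.add(value)
        seen.add(value)
    return len(duplicated)


def invalid_count(rows, column, pattern):
    """Validity: non-empty values that do not match the expected format."""
    return sum(
        1 for row in rows
        if row[column] is not None and not pattern.match(row[column])
    )


def freshness(rows, column, now):
    """Timeliness: time elapsed since the most recent record arrived."""
    return now - max(row[column] for row in rows)


if __name__ == "__main__":
    print(missing_percentage(ROWS, "email"))
    print(duplicate_count(ROWS, "order_id"))
    print(invalid_count(ROWS, "email", EMAIL_RE))
    print(freshness(ROWS, "loaded_at", datetime(2022, 4, 15, 9, 10)))
```

Accuracy and consistency usually need a reference dataset or a second table to compare against, so they are left out of this sketch.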

Architecture

The image below gives an overview of our data observability and data quality stack, which works in four steps:

1. GitLab deploys the Soda YAML files that define the metrics;

2. Airflow runs Google Kubernetes Engine operators, which start the scan pods;

3. Soda SQL scans the data for the metrics defined in the YAML files;

4. Soda Cloud collects the metric results.

Data quality architecture

Because our stack runs on GCP, Composer/Airflow handles orchestration. We chose the GKE operator so that scans run in pods, which makes the approach much more scalable.
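
The article does not show the DAG itself, so the following is only a sketch of the fan-out idea: one pod per table scan file. It assumes the Soda SQL command line form `soda scan <warehouse file> <scan file>`; the image name, file paths, and the `build_scan_tasks` helper are hypothetical.

```python
from pathlib import PurePosixPath


def build_scan_tasks(scan_files, warehouse_file="warehouse.yml",
                     image="registry.example.com/soda-sql:latest"):
    """Return one pod task description per Soda scan YAML file.

    The image name and the default file paths are placeholders.
    """
    tasks = []
    for scan_file in sorted(scan_files):
        # Derive the task name from the file name, e.g. tables/orders.yml -> orders.
        table = PurePosixPath(scan_file).stem
        tasks.append({
            "task_id": f"soda_scan_{table}",
            "image": image,
            "cmds": ["soda"],
            "arguments": ["scan", warehouse_file, scan_file],
        })
    return tasks


if __name__ == "__main__":
    for task in build_scan_tasks(["tables/orders.yml", "tables/customers.yml"]):
        print(task["task_id"], task["arguments"])
```

In Airflow, each dictionary could be passed as keyword arguments to a pod operator such as `GKEStartPodOperator`, together with the cluster details it requires. Because every table gets its own pod, adding tables adds parallel tasks rather than lengthening a single scan job.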

Results

With the metrics in place, our team receives proactive alerts. When a source system is late delivering data, when data arrives in the wrong format, or when a metric falls below or rises above its threshold, Soda lets us identify the problem easily and automatically.
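
Threshold alerts of this kind boil down to comparing each measured metric against its allowed bounds. Here is a minimal sketch of that idea, assuming hypothetical metric names and bounds; Soda evaluates its own test expressions, so this is not how the tool is implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Threshold:
    """Allowed range for one metric; a bound of None means unbounded."""
    metric: str
    lower: Optional[float] = None
    upper: Optional[float] = None


def find_alerts(measurements, thresholds):
    """Return a human-readable alert for every violated threshold."""
    alerts = []
    for threshold in thresholds:
        value = measurements.get(threshold.metric)
        if value is None:
            # A missing measurement usually means the source did not deliver.
            alerts.append(f"{threshold.metric}: no measurement received")
        elif threshold.lower is not None and value < threshold.lower:
            alerts.append(f"{threshold.metric}: {value} is below {threshold.lower}")
        elif threshold.upper is not None and value > threshold.upper:
            alerts.append(f"{threshold.metric}: {value} is above {threshold.upper}")
    return alerts


if __name__ == "__main__":
    # Hypothetical scan results and bounds.
    measurements = {"row_count": 120, "missing_percentage": 7.5}
    thresholds = [
        Threshold("row_count", lower=1000),
        Threshold("missing_percentage", upper=5.0),
        Threshold("freshness_hours", upper=2.0),
    ]
    for alert in find_alerts(measurements, thresholds):
        print(alert)
```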

Soda Cloud metrics

One of the great features Soda offers is anomaly detection on some metrics, as shown in the images above.

The metrics and scans provided by this tool have been exceptional and have made us a much better data team than before. Today the data team can deliver data with consistency, quality, and integrity. Our technical maturity has grown, and end users are glad to have good data to support their decisions.
