Data quality with Soda.io at americanas
About americanas
Americanas is a Brazilian company that takes a distinctive approach to serving its customers: it offers a physical platform with different store formats alongside a digital platform made up of several companies (Americanas, Submarino, Shoptime, Soubarato, and others), aiming to deliver the best omnichannel consumer experience in Brazil. The company has an innovation engine to accelerate its platforms, build disruptive businesses, and leverage different initiatives.
Environment
Our data team is called Bee, and our platform is the Bee Data Platform. The team is responsible for integrating, loading, transforming, and serving data to users. Data products are the main asset we offer to end users.
As part of that role, we are responsible for delivering data products with quality and within SLA. We defined the following criteria for choosing a data quality and observability tool:
- Security;
- Easy way to create metrics;
- Compatible stack with our environment;
- Open Source;
- Scalability.
After a couple of PoCs we chose Soda.io, using its two main services, Soda SQL and Soda Cloud, to implement these six core data quality dimensions:
- Completeness
- Uniqueness
- Timeliness
- Validity
- Accuracy
- Consistency
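To make a few of these dimensions concrete, here is a minimal sketch in plain Python. It does not use Soda's API; the sample rows, column names, and helper functions are all hypothetical and only illustrate what completeness, uniqueness, validity, and timeliness measure. It assumes a non-empty table with at least one non-null value per checked column.

```python
import re
from datetime import datetime, timedelta

# Hypothetical sample rows; in practice these would come from a warehouse table.
ROWS = [
    {"order_id": 1, "email": "ana@example.com", "loaded_at": datetime(2022, 1, 1, 8, 0)},
    {"order_id": 2, "email": None, "loaded_at": datetime(2022, 1, 1, 8, 5)},
    {"order_id": 2, "email": "not-an-email", "loaded_at": datetime(2022, 1, 1, 8, 10)},
]

# Deliberately simple email pattern, used only for illustration.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness(rows, column):
    """Share of rows where the column is not null."""
    return sum(r[column] is not None for r in rows) / len(rows)


def uniqueness(rows, column):
    """Share of distinct values among the non-null values."""
    values = [r[column] for r in rows if r[column] is not None]
    return len(set(values)) / len(values)


def validity(rows, column, pattern):
    """Share of non-null values that match the expected format."""
    values = [r[column] for r in rows if r[column] is not None]
    return sum(bool(pattern.match(v)) for v in values) / len(values)


def is_timely(rows, column, now, max_delay):
    """True if the newest row arrived within max_delay of now."""
    return now - max(r[column] for r in rows) <= max_delay
```

In a tool such as Soda, each of these would be declared as a metric with a test in a YAML file rather than computed by hand; the functions above only show the underlying idea.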
Architecture
The image below gives an overview of our data observability and data quality stack, which works in 4 steps:
1. Deploy the Soda YAML file with the metrics through GitLab;
2. Airflow triggers Google Kubernetes Engine (GKE) operators;
3. Soda SQL runs the scans for the metrics defined in the YAML file;
4. Soda Cloud collects the metric results.
Since our stack runs on GCP, Composer/Airflow handles orchestration, so we chose the GKE operator to run the scans in pods, which makes the approach much more scalable.
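The one-scan-per-pod idea can be sketched as follows. This is not our production DAG: the image name, file paths, and the `build_pod_specs` helper are hypothetical. The sketch only builds plain dictionaries whose keys (`task_id`, `name`, `image`, `cmds`, `arguments`) mirror keyword arguments accepted by Airflow's `GKEStartPodOperator`, and it assumes the Soda SQL CLI is invoked as `soda scan <warehouse file> <scan file>`.

```python
from pathlib import PurePosixPath

# Hypothetical container image that has Soda SQL installed.
SODA_IMAGE = "example.registry/soda-sql:latest"


def build_pod_specs(warehouse_file, scan_files):
    """Return one pod spec per scan YAML file, so each table is scanned in its own pod."""
    specs = []
    for scan_file in sorted(scan_files):
        table = PurePosixPath(scan_file).stem  # "tables/orders.yml" -> "orders"
        specs.append({
            "task_id": f"soda_scan_{table}",
            # Kubernetes pod names allow lowercase letters, digits, and hyphens.
            "name": "soda-scan-" + table.replace("_", "-").lower(),
            "image": SODA_IMAGE,
            "cmds": ["soda"],
            "arguments": ["scan", warehouse_file, scan_file],
        })
    return specs


specs = build_pod_specs(
    "warehouse.yml",
    ["tables/orders.yml", "tables/customer_items.yml"],
)
```

Inside a DAG, each spec could be expanded into an operator with something like `GKEStartPodOperator(project_id=..., location=..., cluster_name=..., namespace=..., **spec)`. Because every table gets its own pod, adding tables adds pods rather than load on the Airflow workers.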
Results
With the metrics in place, our team receives proactive alerts: when a source system is late delivering data, when data is not in the correct format, or when a value falls below or above its threshold, Soda identifies it easily and automatically.
One great feature Soda offers is anomaly detection on some metrics, as shown in the images above.
All the metrics and scans run by this tool have been exceptional and have made us a much better data team than before. Today the data team can deliver data with consistency, quality, and integrity. Our technical maturity has grown, and end users are happy to have good data to support their decisions.
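To give an intuition for what anomaly detection on a metric means, here is a minimal z-score sketch. This is not the algorithm Soda Cloud uses, which is not described here; the `is_anomaly` function, the threshold of 3, and the sample row counts are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def is_anomaly(history, latest, z_threshold=3.0):
    """Flag `latest` if it lies more than z_threshold standard deviations from the history mean."""
    if len(history) < 2:
        return False  # not enough history to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all is unusual.
        return latest != mu
    return abs(latest - mu) / sigma > z_threshold


# Hypothetical daily row counts for one table.
row_counts = [1000, 1010, 990, 1005, 995]
normal_day = is_anomaly(row_counts, 1002)
broken_load = is_anomaly(row_counts, 400)
```

A day with 1002 rows sits close to the historical mean and is not flagged, while a drop to 400 rows is far outside the usual spread and is flagged without anyone having to set a fixed threshold by hand.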