Web data processing tool PoC

AI-Powered Amazon Web Data Processing PoC

The client needed a smart solution that could analyze thousands of Amazon marketplace products to identify optimization opportunities.

UX Design

Computer vision

Client

Under NDA

Date

2022

Role

Development partner

Website

Under NDA

Challenge

The client needed a smart solution that could analyze thousands of Amazon marketplace products to identify optimization opportunities. The system needed to process multiple data points including product images, descriptions, reviews, and market performance metrics to generate actionable recommendations for sellers.

The client faced several critical challenges:

• Manual product optimization was becoming unsustainable with their growing portfolio (500+ products across multiple categories)

• Inconsistent decision-making in product updates due to lack of data-driven insights

• 20–30% of products were underperforming due to suboptimal listing attributes

• Limited ability to identify and react to market trends in real-time

• Need to scale operations while maintaining quality of recommendations

How we helped

PoC creation

The Brightgrove team was fully responsible for the end-to-end creation of a PoC to help the client determine whether the proposed technical solution could cover their business requirements. The PoC was scoped around the following priorities:

• Focus on core functionality validation

• Use managed services to reduce operational overhead

• Implement basic monitoring

• Provide limited but functional error handling

Taxonomy and categories implementation

We developed an AI-powered solution that analyzes product data, identifies trends, and provides recommendations to optimize listings and boost sales. For instance, if a seller offers black 0.5L water bottles, the system analyzes similar products across marketplaces, identifies trends in consumer preferences, and might recommend optimizing to green 0.75L bottles based on market data and customer sentiment analysis.

For the PoC and further development, we needed to research language models capable of encoding text data (product descriptions and other metadata) into a taxonomy representation.

We needed to evaluate various language models by several criteria:

• Model size

• Performance

• Ease of adaptation to new data (fine-tuning)

For the PoC, lightweight models such as tiny-GPT or tiny-BERT were used with a limited number of categories. For image-to-text (taxonomy) encoding, we needed an engine capable of finding similar media based on semantics rather than purely visual features. With a stable image similarity search engine, the system can compare a new image against an existing annotated image database and collect taxonomy tags from the most similar media.

Taxonomy tags will be aggregated based on the similarity score, and the taxonomy data will be generated automatically for new images.
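The aggregation step described above can be sketched as follows. This is a minimal illustration rather than the project's actual code: the function name `aggregate_tags`, the `min_share` threshold, and the sample tags are all assumptions, and the similarity scores are presumed to come from an upstream image similarity search.

```python
from collections import defaultdict


def aggregate_tags(neighbors, min_share=0.2):
    """Aggregate taxonomy tags from similar images, weighted by similarity.

    neighbors: list of (similarity, tags) pairs for the retrieved images.
    min_share: hypothetical cutoff; tags below this weight share are dropped.
    Returns (tag, share) pairs sorted by share, highest first.
    """
    totals = defaultdict(float)
    total_weight = 0.0
    for similarity, tags in neighbors:
        if similarity <= 0:
            continue  # ignore neighbors that are not similar at all
        total_weight += similarity
        for tag in set(tags):  # count each tag once per neighbor
            totals[tag] += similarity
    if total_weight == 0:
        return []
    scored = [(tag, weight / total_weight) for tag, weight in totals.items()]
    kept = [(tag, share) for tag, share in scored if share >= min_share]
    # Sort by share descending, then by tag name for a stable order.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Illustrative neighbors for a new water bottle image (made-up values).
neighbors = [
    (0.9, ["drinkware", "water bottle"]),
    (0.8, ["drinkware", "water bottle", "sports"]),
    (0.3, ["kitchen"]),
]
tags = aggregate_tags(neighbors)
```

With these sample values, tags shared by the two closest neighbors dominate, while the tag that appears only on the weakly similar image falls below the threshold and is dropped.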

Product deployment and further development

The system leverages advanced machine learning to generate visual suggestions for product improvements, optimal keyword combinations, and pricing strategies.

Modular microservices system

The PoC was architected as a modular microservices system deployed on Google Cloud Platform, using containerized applications for flexibility and scalability. The architecture consists of four distinct layers.

The Data Collection Layer comprises several key components:

• Marketplace API integration, implemented with Cloud Functions for efficient data harvesting

• A product data upload service running on Cloud Run for scalable processing

• Review data collection and storage using Cloud Firestore and BigQuery

• Competitor data analysis through serverless functions with Selenium

The Processing Layer incorporates several key technologies:

• Image Analysis, powered by Cloud Vision AI and TensorFlow

• Sentiment Analysis, using Cloud Natural Language services

• Text Mining, implemented with NLTK and spaCy

• Keyword Extraction, based on TF-IDF and TextRank algorithms
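To make the keyword extraction step more concrete, here is a minimal TF-IDF sketch in plain Python. It is an assumption-laden illustration, not the PoC's implementation: the helper names, the simple regex tokenizer, the unsmoothed `ln(N / df)` weighting, and the sample listings are all hypothetical, and a real pipeline would more likely rely on a library and proper stop-word handling.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def top_keywords(documents, doc_index, k=3):
    """Return up to k terms of one document ranked by TF-IDF score."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    tokens = tokenized[doc_index]
    if not tokens:
        return []
    counts = Counter(tokens)
    scores = {}
    for term, count in counts.items():
        tf = count / len(tokens)                  # term frequency
        idf = math.log(n_docs / doc_freq[term])   # unsmoothed inverse document frequency
        scores[term] = tf * idf

    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, score in ranked[:k] if score > 0]


# Made-up listing titles used only for illustration.
listings = [
    "Insulated steel water bottle keeps water cold",
    "Plastic water bottle with straw",
    "Steel travel mug with lid",
]
keywords = top_keywords(listings, doc_index=0)
```

Terms that appear in only one listing score higher than terms shared across listings, which is why generic words like "bottle" rank below more distinctive ones here.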

The ML Models Layer incorporates advanced techniques for processing visual and textual data:

• Vision Processing uses LAION-CLIP for feature extraction

• NLP Processing uses Vertex AI with PyTorch integration

• The recommendation system combines Collaborative Filtering using BigQuery ML, Content-Based Filtering through Vertex AI, a Hybrid Ranking system with LambdaMART, and Real-time Matching using Memorystore and FAISS
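The matching step can be illustrated with a brute-force cosine similarity search over product embeddings. This sketch is a plain-Python stand-in for the vector index, not the FAISS-based implementation itself: the function names, the three-dimensional vectors, and the product IDs are hypothetical, and real embeddings would have hundreds of dimensions.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def top_k_similar(query, catalog, k=2):
    """Return the k catalog entries most similar to the query embedding.

    catalog: dict mapping a product ID to its embedding vector.
    Returns (product_id, cosine_similarity) pairs, best match first.
    """
    q = normalize(query)
    scored = []
    for product_id, embedding in catalog.items():
        e = normalize(embedding)
        similarity = sum(a * b for a, b in zip(q, e))
        scored.append((product_id, similarity))
    # Highest similarity first; ties are broken by product ID.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


# Toy embeddings with made-up values.
catalog = {
    "bottle-green-075": [0.9, 0.1, 0.0],
    "bottle-black-050": [0.8, 0.3, 0.1],
    "travel-mug": [0.1, 0.9, 0.2],
}
matches = top_k_similar([1.0, 0.0, 0.0], catalog)
```

An exhaustive scan like this is only practical for small catalogs; a dedicated vector index serves the same purpose at larger scale by avoiding a full pass over every embedding per query.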

The Output Layer delivers optimization results through Cloud Run, features a Visual Interface built on Material UI, and generates comprehensive Reports using BigQuery and Data Studio.

Results and achievements

High-Throughput Recommendation Engine

Implemented a scalable recommendation engine processing over 100,000 products daily.

Custom Computer Vision for Product Optimization

Developed custom computer vision models for product image analysis and optimization.

Adaptive Taxonomy System

Created an adaptive taxonomy system that can be modified for different product categories and industries.

What happens now

We're actively preparing for production development following a successful PoC that validated our technical approach. We project a 6–9 month implementation timeline with three phases:

Phase 1: Focuses on infrastructure scaling, reliability, and multi-region deployment.

Phase 2: Enhances the core system, deploys scaled ML models, and optimizes real-time processing.

Phase 3: Ensures production readiness with high-availability setup, comprehensive testing, and deployment.

We expect to maintain our current core team of 8 specialists, potentially adding 2–3 specialists during peak development phases, and project ROI within 7–10 months post-implementation. We're looking forward to starting the main development phase as soon as the customer approves the technical requirements and implementation timeline.

Estimated Business Impact:

• 65% reduction in manual optimization work

• 35% improvement in product performance

• Processing capacity increase to 5,000+ products/hour

• System response time under 1s
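As a rough sanity check on how the hourly capacity target relates to the daily volume reported for the PoC, the arithmetic can be worked through as below. The sketch assumes continuous 24-hour operation and sequential processing, neither of which is stated in the case study; the function name is hypothetical.

```python
def capacity_check(daily_volume, hourly_rate, hours_per_day=24):
    """Relate an hourly processing target to a daily product volume."""
    daily_capacity = hourly_rate * hours_per_day
    return {
        # Products per day at the target hourly rate.
        "daily_capacity": daily_capacity,
        # How many times the daily volume fits into that capacity.
        "headroom_ratio": daily_capacity / daily_volume,
        # Average time budget per product if processed one at a time.
        "seconds_per_product": 3600 / hourly_rate,
    }


# Figures taken from the case study: 100,000 products daily, 5,000 products/hour.
result = capacity_check(daily_volume=100_000, hourly_rate=5_000)
```

Under these assumptions, 5,000 products per hour corresponds to 120,000 products per day, about 20% above the PoC's daily volume, with an average budget of 0.72 seconds per product.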

Scalability & Adaptability

While initially developed for e-commerce optimization, the system's architecture can be adapted for various data-intensive industries requiring pattern recognition and recommendation generation. The core technologies – computer vision, natural language processing, and predictive analytics – can be repurposed for applications ranging from retail to agriculture.