Cyberbullying Detection

Distributed Training Project with TensorFlow

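This project applies common practices for distributed training with TensorFlow to a cyberbullying detection model. It combines tools for data and model versioning, configuration, dependency management, containerization, and cloud infrastructure to streamline the training process.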

Features:

  • Data Versioning: Utilizing DVC for data versioning to track changes and manage datasets effectively.
  • Model Versioning: Employing MLflow for model versioning to track model performance and improvements over time.
  • Distributed Training: Implementing distributed training with TensorFlow's parameter server strategy (tf.distribute.ParameterServerStrategy) on a GCP instance group for faster training across multiple GPUs.
  • Containerization: Dockerizing the project for easy deployment and reproducibility across different environments.
  • Optimized Services: Leveraging Google Cloud Platform (GCP) services such as Cloud Storage (GCS) for storage and Compute Engine instances for computation.
  • Project Configuration: Utilizing Hydra for project configuration to manage complex configuration settings easily.
  • Package Management: Utilizing Poetry for package management to handle project dependencies and versions effectively.
  • Data Partitioning: Using Dask for data partitioning to enable parallel processing and scale data analysis.
  • Fully Automated, Scalable, and Extensible: The pipeline is automated end to end, scales with the size of the instance group, and is designed to be easily extended with new features.
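
As an illustration of the distributed training setup listed above, the following is a minimal sketch of how each machine in an instance group could describe its role to TensorFlow through the TF_CONFIG environment variable, which TensorFlow's cluster resolver reads when a parameter server cluster is started. The helper name build_tf_config and all host addresses are hypothetical and are not taken from this project's repositories; the actual cluster layout may differ.

```python
import json


def build_tf_config(chief, workers, ps, task_type, task_index):
    """Return a TF_CONFIG JSON string for one task in a parameter server cluster.

    Hypothetical helper for illustration only; it is not part of this project.
    """
    cluster = {
        "chief": [chief],        # coordinator that drives the training loop
        "worker": list(workers), # machines that compute gradients
        "ps": list(ps),          # parameter servers that hold the variables
    }
    if task_type not in cluster:
        raise ValueError(f"unknown task type: {task_type!r}")
    if not 0 <= task_index < len(cluster[task_type]):
        raise ValueError(f"task index {task_index} out of range for {task_type!r}")
    return json.dumps(
        {"cluster": cluster, "task": {"type": task_type, "index": task_index}}
    )


# Example: the configuration for the second worker (addresses are placeholders).
example_config = build_tf_config(
    chief="10.0.0.2:2222",
    workers=["10.0.0.3:2222", "10.0.0.4:2222"],
    ps=["10.0.0.5:2222"],
    task_type="worker",
    task_index=1,
)
```

Under these assumptions, each instance would export the resulting string as TF_CONFIG before the training script creates its distribution strategy, so that every machine knows both the full cluster layout and its own role in it.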

Repositories:

  1. Data Versioning:
  2. Data Preparation and Tokenizer:
  3. Cyberbullying Detection Model: