Task Completed: Dataset Collection and Preprocessing

The second week was dedicated to the collection and preprocessing of datasets for training the deep learning model. The datasets collected included:

  • FaceForensics++
  • DeepFake Detection Challenge (DFDC)
  • Celeb-DF v2
  • Kaggle Deepfake Datasets (various open-source collections)

These datasets include both real and manipulated facial images, which are crucial for binary classification tasks (real vs. fake).

Preprocessing Steps Involved:

  • Resizing Images to a standard input size (e.g., 224x224 pixels) compatible with CNN architectures.
  • Normalization of pixel values to bring them into a consistent range (0–1).
  • Augmentation Techniques such as flipping, rotating, zooming, and color jittering were applied to artificially expand the dataset and reduce overfitting.
  • Class Balancing was considered to ensure equal representation of real and fake images, helping in better model generalization.

All preprocessing steps were implemented using Python libraries such as OpenCV and TensorFlow/Keras preprocessing modules.