Welding Defect Detection

Detecting welding defects using CycleGAN-generated synthetic data and YOLOv5 with two-stage transfer learning — overcoming the absence of real defect datasets in manufacturing.

Timeline 2023 – 2024
Role First Author
Research Design & Experiments
Advisor Jewon Kang
Ewha Womans University
Presented at IPIU (Image Processing and Image Understanding), Feb 2024

Welding is a critical manufacturing process where defects can degrade product performance or, in severe cases, cause safety hazards. Traditional inspection relies on visual examination and non-destructive testing — methods that are time-consuming, costly, and highly dependent on the inspector's individual experience and skill.

Deep learning offers a path toward automated defect detection, but training such models requires large, high-quality datasets. In welding, this data simply does not exist publicly: real defect images are restricted due to security concerns and the high cost of collecting them in production environments.

The core insight: We couldn't collect the data, so we manufactured it. Rather than waiting for access to restricted real-world defect images, we repurposed publicly available X-ray radiographs — using them as a structural proxy for defect geometry — and used CycleGAN to translate them into the RGB visual domain of actual welding environments. The research contribution isn't just the detection accuracy; it's the design of a data generation pipeline that sidesteps a fundamental access barrier.

87.5%
mAP (IoU 0.5) with two-stage transfer learning
+9.1%p vs. no transfer learning
1,462
Synthetic training images generated via CycleGAN
Augmented from 450 base images
3
Defect types: crack, porosity, spatter
From GD-Xray & RIAWELC datasets

The core challenge was generating realistic training data without access to real defect images. We designed a pipeline that transforms publicly available X-ray radiographs into RGB welding inspection images, then trains a detection model through progressive domain adaptation.

Welding defect detection pipeline: 6-step methodology from data collection to detection and evaluation
1

Data Collection

Sourced welding radiograph images from two public datasets: GD-Xray Welds and RIAWELC. These grayscale X-ray images contain labeled defects but differ significantly from the RGB camera images used in actual inspection environments.

2

Image Preprocessing

Applied a four-stage preprocessing pipeline: contrast enhancement to reveal structural details, followed by median filtering (edge-preserving noise removal), Gaussian filtering (overall noise reduction), and wavelet filtering (frequency-level feature refinement). This sequence preserved critical defect features while suppressing artifacts.

Panels: Original · Contrast · Gaussian · Wavelet · Median · All applied. Effect of each preprocessing stage on a radiograph image; the final panel (All applied) combines all four filters and is the input to CycleGAN translation.
3

Synthetic Data Generation

Used CycleGAN to translate preprocessed grayscale radiographs into realistic RGB welding images. Generated three defect classes — crack, porosity, and spatter — each reflecting the visual characteristics of real welding defects. CycleGAN's unpaired training capability was key, as no paired radiograph-to-RGB data existed.

Panels: Crack · Porosity · Spatter. X-ray radiograph (left) → CycleGAN-generated RGB synthetic image (right) for each defect class. Unpaired image translation enables domain adaptation without matched training pairs.
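The cycle-consistency objective is what makes the unpaired training above possible. A conceptual sketch, with simple affine maps standing in for CycleGAN's CNN generators (everything here is illustrative, not the project's model code):

```python
import numpy as np

# Toy stand-ins for CycleGAN's two generators. In the real model these are
# CNNs trained adversarially; affine maps keep the cycle idea visible.
def G(x):          # X-ray domain -> RGB domain
    return 2.0 * x + 1.0

def F(y):          # RGB domain -> X-ray domain (learned near-inverse of G)
    return (y - 1.0) / 2.0

def cycle_consistency_loss(x_batch, y_batch):
    """L_cyc = E[|F(G(x)) - x|] + E[|G(F(y)) - y|] (L1 norms).

    Minimising this forces a translated image to map back to its source,
    so content (defect geometry) is preserved even though the radiograph
    and RGB collections share no paired examples.
    """
    forward = np.mean(np.abs(F(G(x_batch)) - x_batch))
    backward = np.mean(np.abs(G(F(y_batch)) - y_batch))
    return forward + backward

x = np.random.rand(4, 8, 8)               # fake X-ray batch
y = np.random.rand(4, 8, 8)               # fake RGB batch
loss = cycle_consistency_loss(x, y)       # ~0: F inverts G exactly here
```

In the full model this L1 cycle term is optimised jointly with adversarial losses in both domains, so translations look like real welding imagery while staying faithful to the source defect geometry.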
4

Data Augmentation

Expanded the 450 base CycleGAN-generated images to 1,462 training samples and 318 test samples through augmentation techniques, improving model generalization and reducing overfitting risk.
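The specific augmentation operations aren't enumerated in this summary; a minimal sketch using geometric flips, with YOLO-style normalised boxes remapped to match, might look like:

```python
import numpy as np

def hflip(image, boxes):
    """Horizontal flip; boxes are (N, 4) normalised [x_min, y_min, x_max, y_max]."""
    flipped = image[:, ::-1].copy()
    b = boxes.copy()
    b[:, [0, 2]] = 1.0 - boxes[:, [2, 0]]   # x_min' = 1 - x_max, x_max' = 1 - x_min
    return flipped, b

def vflip(image, boxes):
    """Vertical flip with the matching y-coordinate remap."""
    flipped = image[::-1, :].copy()
    b = boxes.copy()
    b[:, [1, 3]] = 1.0 - boxes[:, [3, 1]]
    return flipped, b

def augment(image, boxes):
    """Yield the original plus flipped variants (a ~3x expansion per image)."""
    yield image, boxes
    yield hflip(image, boxes)
    yield vflip(image, boxes)
```

Chaining a few such transforms (flips, rotations, photometric jitter) per base image gives an expansion on the order of the 450 → 1,462 growth reported above; the key detail is that every geometric transform must be applied to the bounding boxes as well as the pixels.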

5

Two-stage Transfer Learning

Trained YOLOv5s in three phases: (1) pre-training on COCO for general visual features, (2) intermediate fine-tuning on two morphologically analogous datasets — a Lunar Crater dataset whose circular depression shapes closely resemble welding porosity, and a Building Defects dataset of concrete and structural cracks that mirror welding crack geometry — and (3) final fine-tuning on our synthetic welding defect data. The strategy exploits cross-domain visual analogy: by finding shapes in non-welding domains that look like welding defects, we give the model a stepping stone between general visual knowledge and welding-specific recognition. This is why two-stage transfer outperforms jumping directly to the target domain.
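The staged hand-off can be sketched with a toy classifier: each stage inherits the previous stage's backbone weights and re-initialises its head for the new class set. Everything here (the tiny model, the synthetic batch, the hyperparameters) is illustrative only; the actual experiments fine-tune YOLOv5s, whose detection head and training loop are considerably more involved.

```python
import torch
import torch.nn as nn

# Toy stand-in for the staged schedule: the shared "backbone" carries
# learned features from stage to stage, while the task head is
# re-initialised whenever the class set changes.
class TinyDetector(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(16, 32), nn.ReLU())
        self.head = nn.Linear(32, num_classes)

    def forward(self, x):
        return self.head(self.backbone(x))

def fine_tune(model, loader, epochs=1, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model

# Stage 1: general pre-training (COCO stand-in, 80 classes).
stage1 = TinyDetector(num_classes=80)

# Stage 2: morphological proxies (craters ~ porosity, building cracks ~ cracks).
stage2 = TinyDetector(num_classes=2)
stage2.backbone.load_state_dict(stage1.backbone.state_dict())

# Stage 3: target domain (crack / porosity / spatter).
stage3 = TinyDetector(num_classes=3)
stage3.backbone.load_state_dict(stage2.backbone.state_dict())
synthetic_batch = [(torch.randn(8, 16), torch.randint(0, 3, (8,)))]
fine_tune(stage3, synthetic_batch, epochs=2)
```

The design choice this illustrates: only the backbone is worth transferring, because that is where the cross-domain visual analogy (circular depressions, linear cracks) lives; the head is cheap to relearn at each stage.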

6

Detection & Evaluation

Evaluated YOLOv5s performance across four training conditions using mAP, precision, recall, and F1 score. All images processed at 256×256 resolution on NVIDIA RTX A6000 GPUs.

Detection results: COCO-only transfer learning (left) vs. two-stage transfer learning (right), with bounding boxes for crack, porosity, and spatter. The two-stage approach detects defects more precisely with fewer false positives.
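Two of the evaluation quantities can be made concrete in a few lines: IoU, which decides whether a predicted box counts as a match at the 0.5 threshold used for mAP, and F1, the harmonic mean of precision and recall. Plugging the reported 90.3% precision and 85.1% recall into the F1 formula reproduces the reported 87.6% figure:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def iou(box_a, box_b):
    """Intersection-over-union of two (x_min, y_min, x_max, y_max) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Reported precision/recall reproduce the reported F1:
print(round(f1_score(0.903, 0.851), 3))   # 0.876
```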

Design rationale: Two-stage transfer learning bridges the domain gap progressively — COCO provides general visual features, Crater/Crack datasets introduce defect-like patterns, and the final stage specializes for welding. This staged approach yielded +9.1%p mAP improvement over training without transfer learning.

Methods & Tools
CycleGAN YOLOv5s Transfer Learning Domain Adaptation Wavelet Filtering Data Augmentation NVIDIA RTX A6000
01

Two-stage transfer learning achieves 87.5% mAP

The progressive training strategy (COCO → Crater/Crack → welding defects) achieved the highest performance across all metrics: 87.5% mAP (IoU 0.5), 90.3% precision, 85.1% recall, and 87.6% F1 score. Single-stage COCO transfer reached 85.2% mAP, while training from scratch yielded only 78.4%.

02

Preprocessing is critical for synthetic data quality

Applying the four-stage preprocessing pipeline before CycleGAN translation produced significantly more realistic synthetic images. Without preprocessing, generated images contained artifacts that degraded downstream detection performance. The contrast-then-filter sequence preserved defect features while removing noise that confused the generator.

03

Synthetic data enables training without real defect images

By combining CycleGAN domain translation with progressive transfer learning, we trained an effective defect detector entirely without real welding defect photographs. This approach is applicable to other industrial domains where real defect data is restricted or prohibitively expensive to collect.

Read Paper →

This project solved a fundamental data access problem in manufacturing AI — how to build defect detection when real defect images simply don't exist.

At 87.5% mAP, the model demonstrates that synthetic data generation via CycleGAN, combined with progressive transfer learning, can produce production-viable defect detection without any real defect images. This eliminates the biggest bottleneck in manufacturing AI adoption: the cost and security restrictions around collecting labeled defect data.

The pipeline is domain-agnostic. The same approach — find a publicly available proxy dataset, translate it to the target visual domain, and bridge the gap with intermediate transfer learning — applies to any industrial inspection scenario where real defect data is restricted. Semiconductor wafer inspection, PCB quality control, and infrastructure monitoring all face the same data access barrier.

This project demonstrated that the right data engineering strategy can be more impactful than model architecture innovation. By designing a synthetic data pipeline that sidesteps a fundamental access constraint, we made automated defect detection feasible in environments where it was previously blocked by data availability, not by model capability.
