Project Overview
In the FFPlus project, we explore the use of multimodal Large Language Models (LLMs) for extracting metadata from rendered 2D CAD drawings within industrial data exchange systems. By combining vision–language reasoning with high-resolution CAD renderings, the project aims to overcome the limitations of traditional OCR systems when handling complex, text-rich engineering layouts.
The study benchmarks state-of-the-art multimodal models such as LLaVAR and Phi-3.5 on industrial datasets, combines them with synthetic data generation for domain-specific fine-tuning, and evaluates scalable training strategies on European HPC infrastructure. The resulting proof-of-concept demonstrates how generative AI can automatically interpret and structure CAD information, enabling secure and efficient data retrieval for manufacturing and construction workflows.
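To illustrate the "interpret and structure CAD information" step described above, the sketch below shows one plausible way to post-process a vision–language model's free-text response into structured metadata. This is a minimal sketch under stated assumptions: the field names (`drawing_number`, `material`, `scale`) and the parsing logic are illustrative placeholders, not the project's actual schema or implementation.

```python
import json
import re

def extract_metadata(model_output: str) -> dict:
    """Parse a model's raw text response into structured CAD metadata.

    Assumption: the model was prompted to answer either with a JSON
    object or with simple "Key: value" lines describing the title block.
    """
    # Prefer a JSON object if the model emitted one anywhere in its reply.
    match = re.search(r"\{.*\}", model_output, re.DOTALL)
    if match:
        try:
            return json.loads(match.group(0))
        except json.JSONDecodeError:
            pass  # fall through to line-based parsing
    # Fallback: parse "Key: value" lines into snake_case keys.
    metadata = {}
    for line in model_output.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            metadata[key.strip().lower().replace(" ", "_")] = value.strip()
    return metadata

# Hypothetical model response for a title-block crop of a drawing:
response = 'Extracted data: {"drawing_number": "A-1042", "material": "S355", "scale": "1:50"}'
print(extract_metadata(response)["drawing_number"])  # A-1042
```

In practice, such a post-processing layer sits between the fine-tuned model and the data-exchange system, so that downstream retrieval works on validated fields rather than free text.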
Our Goal
The goal of the FFPlus project is to harness multimodal large language models for automated metadata extraction from complex CAD renderings, bridging the gap between visual and textual understanding of industrial design data. By fine-tuning vision–language models on domain-specific and synthetic datasets, LLM-CAD aims to enable accurate, scalable, and confidential AI-driven knowledge retrieval for digital manufacturing, construction, and data-exchange ecosystems.
