NVIDIA Introduces DoRA: A Superior Tuning Method for AI Models
NVIDIA has announced a new fine-tuning method called DoRA (Weight-Decomposed Low-Rank Adaptation), which offers a high-performance alternative to the widely used Low-Rank Adaptation (LoRA). According to the NVIDIA Tech Blog, DoRA improves both the learning capacity and training stability of LoRA without introducing any additional inference overhead.
Benefits of DoRA
DoRA has demonstrated significant performance improvements across various large language models (LLMs) and vision language models (VLMs). For example, on common-sense reasoning tasks, DoRA outperformed LoRA by +3.7 points on Llama 7B and +4.4 points on Llama 3 8B. DoRA also showed better results on multi-turn benchmarks, image/video-text understanding, and visual instruction tuning tasks.
This innovative method has been accepted as an oral paper at ICML 2024, underlining its credibility and potential impact on the field of machine learning.
DoRA mechanics
DoRA works by decomposing the pre-trained weight into magnitude and directional components and fine-tuning both. The method leverages LoRA for the directional adaptation, keeping the number of trainable parameters small. After training, DoRA merges the updated components back into the pre-trained weight, so no additional latency is introduced at inference.
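To make the decomposition concrete, here is a minimal sketch of a DoRA-style linear layer in PyTorch, based on the description above rather than NVIDIA's reference code; the class name, rank, and initialization details are illustrative assumptions.

```python
# Minimal sketch of a DoRA-style linear layer (illustrative; not NVIDIA's reference code).
import torch
import torch.nn as nn


class DoRALinear(nn.Module):
    def __init__(self, base_weight: torch.Tensor, rank: int = 8):
        super().__init__()
        out_features, in_features = base_weight.shape
        # Frozen pre-trained weight W0.
        self.register_buffer("weight", base_weight)
        # Magnitude component, initialized to the column-wise norm of W0.
        self.magnitude = nn.Parameter(base_weight.norm(p=2, dim=0, keepdim=True))
        # LoRA factors A and B handle the directional update (B @ A);
        # B starts at zero, so the layer initially reproduces W0 exactly.
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))

    def merged_weight(self) -> torch.Tensor:
        # Direction: pre-trained weight plus the low-rank update, normalized per column.
        directional = self.weight + self.lora_B @ self.lora_A
        directional = directional / directional.norm(p=2, dim=0, keepdim=True)
        # Rescale each column by the learned magnitude.
        return self.magnitude * directional

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ self.merged_weight().T
```

After training, merged_weight() yields a single dense matrix that can replace the original weight, which is why this style of adaptation adds no extra computation at inference time.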
Visualizations of the magnitude and directional differences between DoRA and the pre-trained weights show that DoRA makes substantial directional changes with relatively small magnitude changes, a pattern that closely resembles the learning behavior of full fine-tuning (FT).
Performance across models
Across performance benchmarks, DoRA consistently outperforms LoRA. In large language models, it delivers notable gains in common-sense reasoning and conversation/instruction following. In vision language models, it shows superior results in image-text and video-text understanding, as well as visual instruction tuning.
Large language models
Comparative studies highlight that DoRA outperforms LoRA in common sense reasoning benchmarks and multi-turn benchmarks. In testing, DoRA achieved higher average scores across various datasets, indicating its robust performance.
Vision language models
DoRA also excels in vision language modeling, outperforming LoRA on image-text understanding, video-text understanding, and visual instruction tuning. Its effectiveness is reflected in higher average scores across multiple benchmarks.
Compression-compatible LLMs
DoRA can be integrated into the QLoRA framework, improving the accuracy of low-bit pre-trained models. Collaborative efforts with Answer.AI on the QDoRA project demonstrated that QDoRA outperforms both FT and QLoRA on the Llama 2 and Llama 3 models.
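For readers who want to experiment with this combination, the sketch below shows one plausible QDoRA-style setup using the Hugging Face transformers, bitsandbytes, and peft libraries, assuming a recent peft release that exposes the use_dora flag; it is not Answer.AI's QDoRA implementation, and the model name and target modules are placeholders.

```python
# Illustrative QDoRA-style setup: a 4-bit quantized base model with DoRA adapters.
# Assumes recent transformers/bitsandbytes/peft releases; model name and target
# modules are placeholders, not a prescribed configuration.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                        # keep the frozen base weights in 4-bit precision
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B", quantization_config=bnb_config
)

adapter_config = LoraConfig(
    r=16,                                     # low-rank dimension for the directional update
    lora_alpha=32,                            # scaling factor applied to the update
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    use_dora=True,                            # enable the DoRA weight decomposition
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, adapter_config)
model.print_trainable_parameters()
```

Here the base weights stay frozen at low precision while only the small magnitude and low-rank direction parameters are trained, mirroring the QLoRA recipe with DoRA in place of plain LoRA.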
Text-to-image generation
DoRA also extends to personalized text-to-image generation with DreamBooth, producing noticeably better results than LoRA on challenging datasets such as 3D icons and Lego sets.
Implications and future applications
DoRA is poised to become the default choice for fine-tuning AI models, as it is compatible with LoRA and its variants. Its efficiency and effectiveness make it a valuable tool for adapting base models to a variety of applications, including NVIDIA Metropolis, NVIDIA NeMo, NVIDIA NIM, and NVIDIA TensorRT.
For more detailed information, visit the NVIDIA Tech Blog.
Image source: Shutterstock