Hydra: Multi-head low-rank adaptation for parameter efficient fine-tuning.
Kim, Sanghyeon; Yang, Hyunmo; Kim, Yunghyun; Hong, Youngjoon; Park, Eunbyung.
Affiliations
  • Kim S; Department of Electrical and Computer Engineering, Sungkyunkwan University, 2066, Seobu-ro, Suwon, 16419, Republic of Korea. Electronic address: shkim960520@skku.edu.
  • Yang H; Department of Artificial Intelligence, Sungkyunkwan University, 2066, Seobu-ro, Suwon, 16419, Republic of Korea. Electronic address: cms8033@skku.edu.
  • Kim Y; Department of Artificial Intelligence, Sungkyunkwan University, 2066, Seobu-ro, Suwon, 16419, Republic of Korea. Electronic address: yhyun225@skku.edu.
  • Hong Y; Department of Mathematical Sciences, KAIST, 291, Daehak-ro, Daejeon, 34141, Republic of Korea. Electronic address: hongyj@kaist.ac.kr.
  • Park E; Department of Electrical and Computer Engineering, Sungkyunkwan University, 2066, Seobu-ro, Suwon, 16419, Republic of Korea; Department of Artificial Intelligence, Sungkyunkwan University, 2066, Seobu-ro, Suwon, 16419, Republic of Korea. Electronic address: epark@skku.edu.
Neural Netw; 178: 106414, 2024 Oct.
Article in English | MEDLINE | ID: mdl-38936110
ABSTRACT
The recent surge in large-scale foundation models has spurred the development of efficient methods for adapting these models to various downstream tasks. Low-rank adaptation methods, such as LoRA, have gained significant attention due to their outstanding parameter efficiency and lack of additional inference latency. This paper investigates a more general form of adapter module, based on the analysis that parallel and sequential adaptation branches learn novel and general features during fine-tuning, respectively. The proposed method, named Hydra, combines the parallel and sequential branches to integrate their capabilities; it is more expressive than existing single-branch methods and enables the exploration of a broader range of optimal points during fine-tuning. In addition, the proposed method explicitly leverages the pre-trained weights by performing a linear combination of the pre-trained features, which allows the learned features to generalize better across diverse downstream tasks. Furthermore, we perform a comprehensive analysis of the characteristics of each adaptation branch with empirical evidence. Through an extensive range of experiments, we substantiate the efficiency and demonstrate the superior performance of Hydra. This comprehensive evaluation underscores the potential impact and effectiveness of Hydra in a variety of applications. The source code of this work is publicly available at https://github.com/extremebird/Hydra.
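Although this record is only an index entry, the abstract specifies the layer-level formulation well enough that a minimal PyTorch sketch may help make it concrete: a frozen pre-trained linear map W0 is augmented with a parallel low-rank branch acting on the input (learning novel features) and a sequential low-rank branch acting on the pre-trained output (a linear combination of the pre-trained features). The class name HydraLinear, the rank argument, and the zero-initialization of the up-projections are illustrative assumptions, not taken from the authors' released code in the repository linked above.

import torch
import torch.nn as nn

class HydraLinear(nn.Module):
    # Sketch of a Hydra-style adapted layer (names are hypothetical):
    #   output = W0 x + B_p A_p x + B_s A_s (W0 x)
    # The parallel branch (A_p, B_p) learns novel features from the input;
    # the sequential branch (A_s, B_s) linearly recombines the pre-trained
    # features W0 x, as described in the abstract.
    def __init__(self, in_features: int, out_features: int, rank: int = 8):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)  # pre-trained W0 stays frozen
        self.A_par = nn.Linear(in_features, rank, bias=False)
        self.B_par = nn.Linear(rank, out_features, bias=False)
        self.A_seq = nn.Linear(out_features, rank, bias=False)
        self.B_seq = nn.Linear(rank, out_features, bias=False)
        # Zero-init the up-projections so fine-tuning starts exactly at W0.
        nn.init.zeros_(self.B_par.weight)
        nn.init.zeros_(self.B_seq.weight)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h0 = self.base(x)                   # pre-trained features W0 x
        h_par = self.B_par(self.A_par(x))   # parallel branch: novel features
        h_seq = self.B_seq(self.A_seq(h0))  # sequential branch: recombine W0 x
        return h0 + h_par + h_seq

# Example usage: drop-in replacement for a 768-dim projection layer.
layer = HydraLinear(768, 768, rank=8)
out = layer(torch.randn(4, 768))  # same output shape as the frozen base layer

Because all three terms are linear in x, the two branches can in principle be folded into a single weight after training, W' = (I + B_s A_s) W0 + B_p A_p, which is consistent with the abstract's claim of no additional inference latency.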
Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Neural Networks, Computer Language: English Journal: Neural Netw Journal subject: NEUROLOGY Year: 2024 Document type: Article Country of publication: United States