Reinforcement learning for solving optimization problems: Opportunities and limitations on the example of the assignment problem

Misztal, Wojciech; Nazarewicz, Sybilla

doi:10.35784/acs_8031


Author: Wojciech Misztal; corresponding author: Sybilla Nazarewicz.

Bibliographic description

Reinforcement learning for solving optimization problems: Opportunities and limitations on the example of the assignment problem. [Author] Wojciech Misztal, [corresponding author] Sybilla Nazarewicz. Appl. Comput. Sci. 2026, Vol. 22, No. 1, pp. 47-62, ill., bibliogr., summary. DOI: 10.35784/acs_8031


Publication details

Source:

Applied Computer Science 2026, Vol. 22, No. 1, pp. 47-62

Year: 2026

Language: English

Formal type: Journal article

MNiSW/MEiN type: original work

Abstracts

The application of reinforcement learning techniques to optimization problems has gained increasing attention due to their adaptability, generalization potential, and capacity to handle complex decision-making processes. This study explores the opportunities and limitations of Q-learning in the context of the classical Assignment Problem, which plays an important role in transportation logistics and resource allocation scenarios. Four variants of the algorithm were developed and evaluated: a basic version, a version incorporating min-max normalization of cost values, a long-term profitability strategy, and a backward optimization approach. For each variant, the hyperparameters were optimized using the Optuna library, and tests were performed on randomly generated cost matrices of varying dimensions (5, 10, 50, 100, and 200). Solution quality was evaluated as degradation relative to the optimal objective function value, and the time to generate solutions was also measured. The results indicate significant differences in the capabilities of the algorithm variants. The basic Q-learning version is characterized by limited effectiveness and high variability, particularly for larger problem instances. Normalization improved computational efficiency and reduced variance, but did not lead to substantial improvements in solution quality for more complex cases. In contrast, the long-term profitability variant demonstrated notable improvements in both solution quality and stability, especially for small and medium-sized problems. The backward optimization variant yielded the highest overall solution quality.
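To make the setup in the abstract more concrete, the following is a minimal sketch of tabular Q-learning on a small assignment instance, with min-max normalization of costs and a degradation measure. It is not the authors' implementation: the abstract does not give the state and action encoding, the reward, or the exact degradation formula. The choices here (state = row index, action = a still-free column, reward = negative cost, degradation = relative gap in percent) and all function names and hyperparameter values are assumptions made for illustration. The optimum is found by brute force, which is only feasible for very small matrices.

```python
import itertools
import random


def min_max_normalize(cost):
    """Scale all costs to [0, 1]; a constant matrix maps to all zeros."""
    flat = [c for row in cost for c in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in cost]
    return [[(c - lo) / (hi - lo) for c in row] for row in cost]


def q_learning_assignment(cost, episodes=2000, alpha=0.1, gamma=0.9,
                          epsilon=0.2, seed=0):
    """Assumed formulation: state = row index, action = free column."""
    rng = random.Random(seed)
    n = len(cost)
    q = [[0.0] * n for _ in range(n)]
    for _ in range(episodes):
        free = list(range(n))
        for i in range(n):
            # Epsilon-greedy choice among columns not yet assigned.
            if rng.random() < epsilon:
                j = rng.choice(free)
            else:
                j = max(free, key=lambda c: q[i][c])
            free.remove(j)
            # Bootstrapped value of the next row over the remaining columns.
            future = max(q[i + 1][c] for c in free) if free else 0.0
            target = -cost[i][j] + gamma * future
            q[i][j] += alpha * (target - q[i][j])
    # Greedy rollout of the learned table yields a feasible assignment.
    free, assignment = list(range(n)), []
    for i in range(n):
        j = max(free, key=lambda c: q[i][c])
        free.remove(j)
        assignment.append(j)
    return assignment


def total_cost(cost, assignment):
    """Sum of the costs of assigning row i to column assignment[i]."""
    return sum(cost[i][j] for i, j in enumerate(assignment))


def brute_force_optimum(cost):
    """Exact optimum by enumerating all permutations (tiny n only)."""
    n = len(cost)
    return min(total_cost(cost, p) for p in itertools.permutations(range(n)))


def degradation(value, optimum):
    """Assumed metric: relative gap to the optimum, in percent."""
    return 100.0 * (value - optimum) / optimum


# Usage on one random 5 x 5 instance with integer costs in [1, 100].
rng = random.Random(42)
cost = [[rng.randint(1, 100) for _ in range(5)] for _ in range(5)]
assignment = q_learning_assignment(min_max_normalize(cost))
gap = degradation(total_cost(cost, assignment), brute_force_optimum(cost))
print(assignment, f"{gap:.1f}%")
```

Because the state in this sketch ignores which columns are already taken, the learned values are only a rough guide to long-term cost. For the larger dimensions listed in the abstract, brute force would have to be replaced by an exact polynomial-time solver such as the Hungarian method.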

Open Access

Access mode: open journal
Text version: final published version
License: Creative Commons Attribution (CC-BY)
Release time: at the time of publication

External links

PBN

69fad7c19bd4585f52dc8fac

DOI

10.35784/acs_8031

Website

https://ph.pollub.pl/index.php/acs/arti…

Identifiers

BPP ID: (46, 53622) continuing publication #53622

Metrics

70.00

MNiSW/MEiN points

0

Impact Factor
