Title Estimacija osnovnih energijskih razina molekula korištenjem ansambl metoda
Title (english) Estimation of ground state energies of molecules using ensemble methods
Author Luka Jurinčić
Mentor Zlatan Car (mentor)
Mentor Nikola Anđelić (co-mentor)
Committee member Saša Vlahinić (committee chair)
Committee member Zlatan Car (committee member)
Committee member Ivan Volarić (committee member)
Granter University of Rijeka Faculty of Engineering Rijeka
Defense date and country 2024-07-16, Croatia
Scientific / art field, discipline and subdiscipline Technical Sciences; Electrical Engineering; Automation and Robotics
Abstract This thesis addresses the problem of estimating the ground state energy of molecules using ensemble methods. It begins with a brief introduction to the ground state energy of molecules, followed by a literature review and a description of the dataset used. The thesis further modifies the previously studied datasets so that basic information about each molecule is used as the input variables. Next, the thesis analyzes ensemble algorithms and different dataset scaling methods: the working principle of each ensemble algorithm is explained and the principles of each scaling method are described. The following ensemble algorithms are considered: Decision Tree, Random Forest, Extra Tree, AdaBoost, HistGradientBoost, XGBoost, Bagging, Voting and Stacking regressors. The effect of each scaling method on the input data is shown graphically. The following scaling methods are used: Standard Scaler, Power Transformer, Normalizer, Robust Scaler, MinMax Scaler and MaxAbs Scaler. A detailed analysis is given of the evaluation metrics that serve as indicators of estimation accuracy and are used to compare the performance of different models. This is followed by a description of the training procedure and the hyperparameter optimization of each ensemble algorithm. Cross-validation was used for model training, while GridSearchCV and RandomizedSearchCV were used for hyperparameter optimization. The second part of the thesis presents the results. The effect of the number of cross-validation folds on the estimation accuracy of the ensemble algorithms was examined, and the results of the individual ensemble methods were compared with one another. The ensemble methods were compared using the following metrics: R2, MAE, MAPE, MSE, RMS and KGE. The best estimates of the ground state energy of molecules were achieved by the XGBoost, Voting and Stacking regressors.
Finally, some advantages and disadvantages of the algorithms used are discussed, and the following conclusions are drawn: a careful choice of the number of folds used in cross-validation can yield higher estimation accuracy, and combining selected ensemble methods into a single ensemble increases estimation accuracy. The conclusion proposes one possible way to improve the model: an artificial neural network whose hidden layer is built from optimized ensemble methods.
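The comparison metrics named in the abstract (R2, MAE, MAPE, MSE, RMS and KGE) can be made concrete with a short standard-library sketch. It is illustrative only and not the thesis code: the function name `regression_metrics` is hypothetical, "RMS" is assumed to mean the root mean squared error, and KGE is assumed to follow the original Kling-Gupta formulation (Gupta et al., 2009), which combines the correlation r, the variability ratio alpha and the bias ratio beta; the thesis may use a different variant.

```python
import math
from statistics import fmean, pstdev


def regression_metrics(y_true, y_pred):
    """Return R2, MAE, MAPE, MSE, RMSE and KGE for two equal-length sequences.

    Assumptions (not from the thesis): RMS is read as RMSE, MAPE is returned
    as a fraction (multiply by 100 for percent), and KGE uses the 2009 form.
    """
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # MAPE is undefined if any true value is zero
    mape = sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_t = fmean(y_true)
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot
    # KGE components: Pearson correlation r, std ratio alpha, mean ratio beta
    mean_p = fmean(y_pred)
    sd_t, sd_p = pstdev(y_true), pstdev(y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n
    r = cov / (sd_t * sd_p)
    alpha = sd_p / sd_t
    beta = mean_p / mean_t
    kge = 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
    return {"R2": r2, "MAE": mae, "MAPE": mape, "MSE": mse, "RMSE": rmse, "KGE": kge}


if __name__ == "__main__":
    # Toy energies (illustrative values, not data from the thesis)
    print(regression_metrics([-100.0, -200.0, -300.0], [-110.0, -190.0, -300.0]))
```

R2, MAE, MSE and RMSE reward small pointwise errors, whereas KGE also penalizes a prediction that has the wrong spread (alpha) or a shifted mean (beta), which is why it can rank models differently. Note that beta becomes unstable when the mean of the true values is close to zero.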
Abstract (English) The main focus of this thesis is the use of ensemble methods to estimate the ground state energies of molecules. The first part reviews the literature on the problems encountered when predicting the ground state energies of molecules. The dataset used in this thesis was created from basic information about molecules taken from the PubChem database. First, several modified versions of the dataset are created using the following scaling methods: Standard Scaler, Power Transformer, Normalizer, Robust Scaler, MinMax Scaler and MaxAbs Scaler. These modified datasets are used to train the ensemble algorithms, with the aim of observing how each scaling method changes the original dataset and how this affects the precision of the ground state energy estimates. The ensemble methods used are: Decision Tree, Random Forest, Extra Tree, AdaBoost, HistGradientBoost, XGBoost, Bagging, Voting and Stacking regressors. Each of these methods is trained on every modified dataset using cross-validation, and its hyperparameters are optimized with two functions: GridSearchCV and RandomizedSearchCV. The main objectives are to determine the impact of the number of cross-validation folds on model accuracy, and to determine whether model accuracy can be improved by combining multiple basic ensembles into one ensemble. The ensemble methods are compared using the following evaluation metrics: R2, MAE, MAPE, MSE, RMS and KGE. The second part of the thesis analyzes the results obtained. Comparing the evaluation metrics of all ensemble methods leads to the conclusion that the XGBoost, Voting and Stacking regressors perform best, and that combining multiple ensemble methods into one ensemble can increase model performance.
Analyzing the impact of the number of folds on model accuracy, we concluded that a higher number of folds does not automatically result in higher model accuracy; when using cross-validation, it is therefore necessary to find the optimal number of folds to maximize model performance. Finally, we compare the best ensemble methods, explain their strengths and weaknesses, and propose improving the results of ensemble methods by using them in a hidden layer of an artificial neural network (ANN).
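The two experiments described in the abstract, varying the number of cross-validation folds and averaging several regressors into one ensemble, can be sketched without any library. scikit-learn provides `KFold`, `VotingRegressor` and `StackingRegressor` for this purpose; the standard-library version below only exposes the mechanics and is not the thesis pipeline. All function names (`k_fold_indices`, `mean_model`, `linear_model`, `voting_model`, `cross_val_mae`) and the toy data are hypothetical, the folds are contiguous and unshuffled (like `KFold` with its default `shuffle=False`), and the two base learners are deliberately simple stand-ins for the tree ensembles used in the thesis.

```python
from statistics import fmean


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for k contiguous, unshuffled folds."""
    # The first n_samples % k folds receive one extra sample each
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def mean_model(x_train, y_train, x_test):
    # Baseline learner: predicts the training mean for every test sample
    m = fmean(y_train)
    return [m for _ in x_test]


def linear_model(x_train, y_train, x_test):
    # Ordinary least squares on a single feature: y = a * x + b
    mx, my = fmean(x_train), fmean(y_train)
    sxx = sum((x - mx) ** 2 for x in x_train)
    sxy = sum((x - mx) * (y - my) for x, y in zip(x_train, y_train))
    a = sxy / sxx
    b = my - a * mx
    return [a * x + b for x in x_test]


def voting_model(x_train, y_train, x_test, learners=(mean_model, linear_model)):
    # Voting regressor: unweighted average of the base learners' predictions
    per_model = [learn(x_train, y_train, x_test) for learn in learners]
    return [fmean(col) for col in zip(*per_model)]


def cross_val_mae(x, y, fit_predict, k):
    # Train on k-1 folds, score MAE on the held-out fold, average over folds
    scores = []
    for train_idx, test_idx in k_fold_indices(len(y), k):
        x_tr = [x[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        x_te = [x[i] for i in test_idx]
        y_te = [y[i] for i in test_idx]
        preds = fit_predict(x_tr, y_tr, x_te)
        scores.append(fmean(abs(p - t) for p, t in zip(preds, y_te)))
    return fmean(scores)


if __name__ == "__main__":
    # Toy data (not from the thesis): one feature, linear target with +/-1 noise
    x = [float(i) for i in range(20)]
    y = [-3.0 * xi + (1.0 if i % 2 else -1.0) for i, xi in enumerate(x)]
    for k in (2, 4, 5, 10):
        print(k, cross_val_mae(x, y, linear_model, k), cross_val_mae(x, y, voting_model, k))
```

The printout only shows how the cross-validated score moves as k changes for each model; it is not meant to reproduce the thesis results. Whether averaging helps depends on how accurate and how diverse the base learners are, which is why the thesis combines already optimized ensembles rather than arbitrary models.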
Keywords
ground state energy of molecules
ensemble methods
scaling methods
cross-validation
hyperparameter optimization
evaluation metrics
artificial intelligence algorithms
Keywords (English)
ground state energy
ensemble methods
scaling methods
cross-validation
hyperparameter optimization
evaluation metrics
artificial intelligence algorithms
Language Croatian
URN:NBN urn:nbn:hr:190:510807
Study programme Title: Graduate University Study of Electrical Engineering; Study programme type: university; Study level: graduate; Academic / professional title: Master of Electrical Engineering (magistar/magistra inženjer/inženjerka elektrotehnike)
Type of resource Text
File origin Born digital
Access conditions Open access
Terms of use
Created on 2024-07-14 09:03:18