Should I Commit and Publish the Results? [R] (reddit.com)

Hello Reddit,

I've been working on QSPR (Quantitative Structure-Property Relationship) analysis for the chemical compounds in the Jean-Claude Bradley Open Melting Point Dataset. The idea is to see how accurately a model can predict the melting points of compounds using only topological indices. After some feature engineering on the topological indices, each compound was represented by 26 features.

I trained a random forest model on the data and got a test R² score of 0.66 (which is pretty respectable, given the constraints). However, the saved model was around 1.23 GB. I didn't like it being that big, so I turned to PyTorch to build a custom deep learning architecture that could predict as accurately as the random forest but with a much smaller file size. After around two weeks of research, I built a model with 270,000 learnable parameters (1.3-1.4 MB according to torchinfo) that got an R² score of 0.6399.

Given all this context, my question is: should I commit and work on publishing the results, or should I keep working on improving the model?

Note: my university obliges me not to share intricate details of my research before publication, so please forgive me if such details are needed for a high-quality answer. However, I can share the metrics achieved by my little deep learning model. Here they are:

=== Evaluation Metrics (Expected Value) ===
R² Score : 0.639910
MAE      : 41.246754
MSE      : 2989.062744
RMSE     : 54.672322
NRMSE    : 0.083469
MAPE     : 11.69%

MAE and RMSE are in kelvin (K), MSE is in K², and NRMSE and MAPE are dimensionless.

submitted by /u/AgiGamesYT
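As a rough cross-check of the size figure, here is a plain-Python sketch (no PyTorch needed). A dense layer with n_in inputs and n_out outputs has n_in × n_out weights plus n_out biases, and float32 stores each parameter in 4 bytes. The layer widths below are a hypothetical stand-in, since the post does not disclose its architecture. Under these assumptions, 270,000 parameters take about 1.08 MB, so the 1.3-1.4 MB torchinfo figure is presumably its estimated total size, which also counts input and forward/backward pass memory, not just the weights.

```python
# Rough parameter-count and storage arithmetic for a small fully connected regressor.
# The layer widths are HYPOTHETICAL: the post does not disclose its architecture.

def mlp_param_count(widths):
    """Weights plus biases of a dense network whose layer sizes are given in order."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths[:-1], widths[1:]))


def param_size_mb(n_params, bytes_per_param=4):
    """Storage for the parameters alone, in MB (1 MB = 1e6 bytes).

    4 bytes per parameter corresponds to float32, PyTorch's default dtype.
    """
    return n_params * bytes_per_param / 1e6


# 26 input features (as in the post) -> hypothetical hidden layers -> 1 melting point
widths = [26, 512, 384, 160, 1]
n = mlp_param_count(widths)
print(n)                                      # 272577
print(f"{param_size_mb(n):.2f} MB")           # 1.09 MB
print(f"{param_size_mb(270_000):.2f} MB")     # 1.08 MB (float32)
print(f"{param_size_mb(270_000, 2):.2f} MB")  # 0.54 MB (float16)
```

For an actual PyTorch model, `sum(p.numel() for p in model.parameters())` gives the same kind of count, and setting `bytes_per_param=2` approximates float16 storage.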

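For reference, the metrics listed in the post can be computed from paired true and predicted melting points using their standard definitions. This is a plain-Python sketch: the function name and toy values are hypothetical, and the NRMSE normalizer (here the range of the true values) is an assumption, because the post does not say what it divides by. R² uses 1 − SS_res/SS_tot, the same definition as scikit-learn's `r2_score`.

```python
import math


def regression_metrics(y_true, y_pred):
    """R², MAE, MSE, RMSE, NRMSE and MAPE for paired lists of floats.

    Assumes y_true is not constant (R², NRMSE) and contains no zeros (MAPE).
    NRMSE here divides RMSE by the range of y_true: an ASSUMED convention.
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)   # total sum of squares
    mse = ss_res / n
    rmse = math.sqrt(mse)
    return {
        "r2": 1 - ss_res / ss_tot,
        "mae": sum(abs(e) for e in errors) / n,         # same unit as y (K)
        "mse": mse,                                     # unit squared (K²)
        "rmse": rmse,                                   # same unit as y (K)
        "nrmse": rmse / (max(y_true) - min(y_true)),    # dimensionless
        "mape": 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,  # percent
    }


# Toy melting points in kelvin (made-up values, not from the dataset)
y_true = [300.0, 350.0, 400.0, 450.0]
y_pred = [310.0, 340.0, 390.0, 470.0]
for name, value in regression_metrics(y_true, y_pred).items():
    print(f"{name:>5}: {value:.4f}")
```

As a consistency check on the posted numbers, √2989.062744 ≈ 54.6723, matching the reported RMSE, and 54.672322 / 0.083469 ≈ 655 K is the implied NRMSE denominator, which the post does not identify.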