Disentangling the Galactic binary zoo: Machine learning classification of stellar remnant binaries in LISA data
Author(s)
Tay, Irwin Khai Cheng; Korol, Valeriya; Lechien, Thibault
Abstract
The Laser Interferometer Space Antenna (LISA) will open a new observational window in the millihertz gravitational-wave band, enabling the detection of tens of thousands of compact stellar remnant binaries across the Milky Way. Most of LISA's sources will be double white dwarf (WDWD) systems, while neutron star–white dwarf (NSWD) binaries and higher-mass systems will be orders of magnitude rarer but of significant astrophysical interest. Disentangling these populations is challenging due to the strong overlap in their gravitational-wave features. In this work, we investigate the use of machine-learning techniques to classify LISA-detectable binaries based solely on LISA observables. Using mock catalogues of Galactic binaries constructed from population-synthesis studies, we evaluate a range of machine-learning classifiers. We find that ensemble-based methods, particularly gradient-boosting algorithms such as XGBoost, deliver the best performance on our highly imbalanced dataset. WDWD systems are identified with a recall of $\sim 99\%$, reflecting their dominant presence, and high-mass binaries are also classified with high recall ($\ge 85\%$). In contrast, NSWD systems remain the most challenging population to distinguish: their features overlap strongly with those of WDWD binaries, making them particularly prone to misclassification. Despite this, XGBoost correctly identifies $85.6\%$ of NSWD systems in our simulated LISA detections, outperforming simple statistical approaches based on kernel density estimation. We further demonstrate that machine-learning classification can effectively support the interpretation of LISA data, enabling the identification of eccentric binaries and extremely rare subclasses.
Figures
Caption
Distributions of the ten features in the data used for the machine-learning classifiers, coloured by true class. Black: WDWD binaries; orange: NSWD binaries; grey: BHBH binaries; red: BHNS binaries; and maroon: NSNS binaries. Each distribution is normalised independently.
Caption
Confusion matrix of the multi-class XGBoost classifier evaluated on the main catalogue's test set. Each entry is row-normalised and colour-coded by value, with bracketed numbers indicating the absolute counts. The classifier demonstrates strong performance in separating the high-mass binaries (BHBH, BHNS, NSNS), while classification of the low-mass binaries (WDWD, NSWD) remains more challenging. The distinction between WDWD and NSWD is the most difficult, with approximately $25\%$ of NSWD systems incorrectly predicted as WDWD.
Caption
Performance comparison of machine-learning classifiers evaluated on the test set including NSWD and WDWD binaries only. In both plots, the top-right corner represents the ideal performance.
Caption
Feature distributions for correctly (blue) and incorrectly (red) predicted low-mass binary systems (WDWD and NSWD) by the XGBoost binary classifier evaluated on the main catalogue's test set. The inset in the upper-right corner shows a SHAP summary plot illustrating the impact of the ten input features on the classifier's output for each system in the low-mass component test set. The features are ranked on the y-axis in descending order of average absolute importance. The x-axis shows the SHAP value, indicating the feature's contribution to the output, where a positive value pushes the prediction towards the positive class (NSWD) and a negative value pushes it towards the other (WDWD). A SHAP value of 0, marked by the vertical grey line, represents the baseline and indicates that the feature had no impact on that specific prediction. Each point corresponds to an individual system from the test set, coloured by its normalised feature value, from low (dark blue) to high (yellow).
Caption
Confusion matrices evaluated on the low-mass population test set for the XGBoost (purple) and KDE (grey) classifiers. Each entry is row-normalised and colour-coded by value, with bracketed numbers indicating the absolute counts. The results show that XGBoost performs significantly better, correctly predicting $85.6\%$ of NSWD systems compared to $62.2\%$ for KDE. For the same population, XGBoost predicts 360 NSWD systems, whereas KDE predicts 312.
Caption
LISA-detectable Galactic WDWD (grey) and NSWD (blue and red) populations for a four-year LISA observation period, shown in the gravitational-wave frequency–characteristic strain parameter space, with the LISA sensitivity curve plotted and shaded in purple. The two NSWD scatter plots represent the correctly (blue) and incorrectly (red) predicted NSWD systems by the XGBoost classifier evaluated on the test catalogue.
Caption
The confusion matrices for the two experiments described in \cref{sec: eccaftergolbal}. Experiment 1 (top) presents the confusion matrix for the evaluation of the XGBoost classifier trained without the eccentricity feature, applied to classify WDWD and NSWD binaries. Experiment 2 (bottom) shows the confusion matrix for the classifier trained without the eccentricity feature but using labels \texttt{NoEcc} (for binaries with zero eccentricity) and \texttt{Ecc} (for binaries with non-zero eccentricity). Each entry is row-normalised and colour-coded by value, with bracketed numbers indicating the absolute counts.
References
- Amaro-Seoane, P., Andrews, J., Arca Sedda, M., et al. 2023, Living Reviews in Relativity, 26, 2
- Auclair, P., Bacon, D., Baker, T., et al. 2023, Living Reviews in Relativity, 26, 5
- Bailer-Jones, C. A. L., Fouesneau, M., & Andrae, R. 2019, MNRAS, 490, 5615
- Barausse, E., Berti, E., Hertog, T., et al. 2020, General Relativity and Gravitation, 52, 81
- Berbel, M., Miravet-Tenés, M., Sharma Chaudhary, S., et al. 2024, Classical and Quantum Gravity, 41, 085012
- Bergstra, J. & Bengio, Y. 2012, Journal of Machine Learning Research, 13, 281
- Breiman, L. 2001, Mach. Learn., 45, 5–32
- Breivik, K., Coughlin, S., Zevin, M., et al. 2020, ApJ, 898, 71
- Broekgaarden, F. S., Berger, E., Neijssel, C. J., et al. 2021, MNRAS, 508, 5028
- Broekgaarden, F. S., Berger, E., Stevenson, S., et al. 2022, MNRAS, 516, 5737
- Chen, T. & Guestrin, C. 2016, in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '16 (New York, NY, USA: Association for Computing Machinery), 785–794
- Chen, W.-C., Liu, D.-D., & Wang, B. 2020, ApJ, 900, L8
- Colpi, M., Danzmann, K., Hewitson, M., et al. 2024, arXiv e-prints, arXiv:2402.07571
- Cortes, C. & Vapnik, V. 1995, Mach. Learn., 20, 273–297
- Cover, T. & Hart, P. 1967, IEEE Transactions on Information Theory, 13, 21
- Deng, S., Babak, S., Le Jeune, M., et al. 2025, Phys. Rev. D, 111, 103014
- Duan, T., Avati, A., Ding, D. Y., et al. 2020, NGBoost: Natural Gradient Boosting for Probabilistic Prediction
- Garnett, R. 2023, Bayesian optimization (Cambridge University Press)
- Goodenough, L. & Hooper, D. 2009, arXiv e-prints, arXiv:0910.2998
- Hooper, D. 2023, SciPost Phys. Proc., 006
- Hooper, D. & Goodenough, L. 2011, Physics Letters B, 697, 412
- Hooper, D. & Linden, T. 2011, Phys. Rev. D, 84, 123005
- Iben, Jr., I. & Livio, M. 1993, PASP, 105, 1373
- Igoshev, A. P., Chruslinska, M., Dorozsmai, A., & Toonen, S. 2021, MNRAS, 508, 3345
- Karnesis, N., Babak, S., Pieroni, M., Cornish, N., & Littenberg, T. 2021, Phys. Rev. D, 104, 043019
- Katz, M. L., Danielski, C., Karnesis, N., et al. 2022, MNRAS, 517, 697
- Katz, M. L., Karnesis, N., Korsakova, N., Gair, J. R., & Stergioulas, N. 2025, Phys. Rev. D, 111, 024060
- Korol, V., Buscicchio, R., Pakmor, R., et al. 2024a, A&A, 691, A44
- Korol, V., Hallakoun, N., Toonen, S., & Karnesis, N. 2022, MNRAS, 511, 5936
- Korol, V. & Igoshev, A. 2026, A&A, 705, A154
- Korol, V., Igoshev, A. P., Toonen, S., et al. 2024b, MNRAS, 530, 844
- Korol, V., Rossi, E. M., Groot, P. J., et al. 2017, MNRAS, 470, 1894
- Lackeos, K., Littenberg, T. B., Cornish, N. J., & Thorpe, J. I. 2023, A&A, 678, A123
- Lamberts, A., Blunt, S., Littenberg, T. B., et al. 2019, MNRAS, 490, 5888
- Lamberts, A., Garrison-Kimmel, S., Hopkins, P. F., et al. 2018, MNRAS, 480, 2704
- Lau, M. Y. M., Mandel, I., Vigna-Gómez, A., et al. 2020, MNRAS, 492, 3061
- Li, Z., Chen, X., Ge, H., Chen, H.-L., & Han, Z. 2023, A&A, 669, A82
- LISA Science Requirements Document. 2018, Tech. Rep. ESA-L3-EST-SCI-RS-001, ESA, www.cosmos.esa.int/web/lisa/lisa-documents/
- Littenberg, T. B. & Cornish, N. J. 2023, Phys. Rev. D, 107, 063004
- Lorimer, D. R. 2008, Living Reviews in Relativity, 11, 8
- Lundberg, S. M. & Lee, S.-I. 2017, in Advances in Neural Information Processing Systems, Vol. 30 (Curran Associates, Inc.), 4768–4777
- Middleton, H., Kolitsidou, P., Klein, A., et al. 2025, arXiv e-prints, arXiv:2507.11442
- Moore, C. J., Finch, E., Klein, A., et al. 2024, MNRAS, 531, 2817
- Muehleisen, R. T. & Bergerson, J. 2016
- Nelemans, G. & Tout, C. A. 2005, MNRAS, 356, 753
- Nelemans, G., Verbunt, F., Yungelson, L. R., & Portegies Zwart, S. F. 2000, A&A, 360, 1011
- Nelemans, G., Yungelson, L. R., & Portegies Zwart, S. F. 2001, A&A, 375, 890
- Nissanke, S., Vallisneri, M., Nelemans, G., & Prince, T. A. 2012, ApJ, 758, 131
- O’Doherty, T. N., Bahramian, A., Miller-Jones, J. C. A., et al. 2023, MNRAS, 521, 2504
- Pedregosa, F., Varoquaux, G., Gramfort, A., et al. 2011, Journal of Machine Learning Research, 12, 2825
- Peters, P. C. & Mathews, J. 1963, Physical Review, 131, 435
- Popov, S., Müller, B., & Mandel, I. 2025, New Astronomy Reviews, 101, 101734
- Portegies Zwart, S. F. & Verbunt, F. 1996, A&A, 309, 179
- Rajamuthukumar, A. S., Korol, V., Stegmann, J., et al. 2025, A&A, 704, A156
- Riley, J., Agrawal, P., Barrett, J. W., et al. 2022, ApJS, 258, 34
- Schafer, R. W. 2011, IEEE Signal Processing Magazine, 28, 111
- Sesana, A., Lamberts, A., & Petiteau, A. 2020, MNRAS, 494, L75
- Seto, N. 2001, Phys. Rev. Lett., 87, 251101
- Shahriari, B., Swersky, K., Wang, Z., Adams, R. P., & de Freitas, N. 2016, Proceedings of the IEEE, 104, 148
- Shapley, L. S. 1953, in Contributions to the Theory of Games, Vol. II, ed. H. W. Kuhn & A. W. Tucker (Princeton: Princeton University Press), 307–317
- Silverman, B. W. & Jones, M. C. 1989, International Statistical Review / Revue Internationale de Statistique, 57, 233
- Snoek, J., Larochelle, H., & Adams, R. P. 2012, in Advances in Neural Information Processing Systems, ed. F. Pereira, C. Burges, L. Bottou, & K. Weinberger, Vol. 25 (Curran Associates, Inc.)
- Strub, S. H., Ferraioli, L., Schmelzbach, C., Stähler, S. C., & Giardini, D. 2024, Phys. Rev. D, 110, 024005
- Tang, P., Eldridge, J. J., Meyer, R., et al. 2024, MNRAS, 534, 1707
- Tauris, T. M. 2011, in Astronomical Society of the Pacific Conference Series, Vol. 447, Evolution of Compact Binaries, ed. L. Schmidtobreick, M. R. Schreiber, & C. Tappert, 285
- Thiele, S., Breivik, K., Sanderson, R. E., & Luger, R. 2023, ApJ, 945, 162
- Toonen, S., Hollands, M., Gänsicke, B. T., & Boekholt, T. 2017, A&A, 602, A16
- Toonen, S., Nelemans, G., & Portegies Zwart, S. 2012, A&A, 546, A70
- Toonen, S., Perets, H. B., Igoshev, A. P., Michaely, E., & Zenati, Y. 2018, A&A, 619, A53
- Valli, R., de Mink, S. E., Justham, S., et al. 2025, arXiv e-prints, arXiv:2505.08857
- van der Sluys, M. V., Verbunt, F., & Pols, O. R. 2006, A&A, 460, 209
- Vigna-Gómez, A. 2025, A&A, 701, L3
- Wagg, T., Broekgaarden, F. S., de Mink, S. E., et al. 2022, ApJ, 937, 118
- Wagg, T., Dalcanton, J. J., Renzo, M., et al. 2025, AJ, 170, 192
- Zahn, J.-P. 1977, A&A, 57, 383
Appendix A: Details of implementation
Appendix A.1: Binary Classifiers
In this section, we elaborate on the methods and parameter choices for the binary classifiers not discussed in the main text.

Table A.1 lists the optimisation and probability calibration techniques applied to each model. For most classifiers, Bayesian optimisation is used to identify the optimal set of hyper-parameters. The exceptions are as follows. For the KDE method, we used a simple grid search to serve as a baseline. For the KNN classifier, we selected $k = 5$ in order to optimise its $F_1$ score¹. The configurations of the NN and NGBoost models were determined based on preliminary experimentation and design considerations.
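As a concrete illustration of the KDE bandwidth search, the following is a minimal sketch using scikit-learn's GridSearchCV over a KernelDensity estimator; the feature matrix `X_wdwd` and the bandwidth grid are hypothetical placeholders, not the values used in the paper:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

# Hypothetical feature matrix for one class (e.g. the WDWD training systems).
X_wdwd = np.random.default_rng(0).normal(size=(1000, 10))

# Grid search over the kernel bandwidth; GridSearchCV scores each candidate
# by the held-out log-likelihood returned by KernelDensity.score().
grid = GridSearchCV(KernelDensity(kernel="gaussian"),
                    {"bandwidth": np.logspace(-2, 0, 20)},
                    cv=5)
grid.fit(X_wdwd)
best_kde = grid.best_estimator_  # per-class density model used downstream
```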
For our custom NN, we employed a focal loss function to address class imbalance. To further mitigate this challenge, we generated an additional 100 NSWD samples for each NSWD binary system by sampling from the uncertainties derived from the Fisher information matrix. Each NSWD data point was weighted in inverse proportion to the number of occurrences of its source system, to reduce the bias introduced by multiple harmonics originating from the same binary. The network architecture consists of an input layer, six hidden layers, and an output layer. ReLU activations are used between layers; 30% dropout is applied in the first three hidden layers and 20% dropout in the last three. The final output layer employs a sigmoid activation. Training was performed for 100 epochs with a batch size of 128 for both training and inference.
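A minimal PyTorch sketch of this setup is given below; the hidden-layer width and the focal-loss parameters $\alpha$ and $\gamma$ are illustrative assumptions, as the text does not specify them:

```python
import torch
import torch.nn as nn

class FocalLoss(nn.Module):
    """Binary focal loss: down-weights well-classified examples so the rare
    NSWD class contributes more to the gradient (alpha, gamma assumed)."""
    def __init__(self, alpha: float = 0.25, gamma: float = 2.0):
        super().__init__()
        self.alpha, self.gamma = alpha, gamma

    def forward(self, p: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # p: sigmoid outputs in (0, 1); y: binary labels {0, 1}
        p_t = torch.where(y == 1, p, 1 - p)
        a_t = torch.where(y == 1, torch.full_like(p, self.alpha),
                          torch.full_like(p, 1 - self.alpha))
        return (-a_t * (1 - p_t).pow(self.gamma)
                * torch.log(p_t.clamp(min=1e-8))).mean()

def make_network(n_features: int = 10, width: int = 128) -> nn.Sequential:
    """Input layer, six ReLU hidden layers (width assumed), 30% dropout on
    the first three and 20% on the last three, and a sigmoid output layer."""
    dims, layers = [n_features] + [width] * 6, []
    for i in range(6):
        layers += [nn.Linear(dims[i], dims[i + 1]), nn.ReLU(),
                   nn.Dropout(0.3 if i < 3 else 0.2)]
    layers += [nn.Linear(width, 1), nn.Sigmoid()]
    return nn.Sequential(*layers)
```

Training would then proceed for 100 epochs with batch size 128, with each NSWD sample weighted inversely to its source system's multiplicity, as described above.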
For NGBoost, we used the default implementation without additional modifications. Lastly, for our custom GMM, we followed the approach described in Bailer-Jones et al. (2019). We incorporated custom class priors based on the relative abundance of WDWD and NSWD systems in the Milky Way, set to $\{0: 0.96,\ 1: 0.04\}$. Since the GMM classifier already incorporates prior information in its probability estimates, no additional probability calibration was applied.
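A sketch of the prior-weighted classification step in the spirit of Bailer-Jones et al. (2019): one GMM is fitted per class as a density model, and the class posteriors combine those densities with the abundance priors via Bayes' rule (the number of mixture components here is an assumption):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

PRIORS = {0: 0.96, 1: 0.04}  # WDWD (0) vs NSWD (1) Galactic abundances

def fit_class_gmms(X_train, y_train, n_components=8):
    """Fit one GMM per class as a model of p(x | class)."""
    return {c: GaussianMixture(n_components=n_components,
                               random_state=0).fit(X_train[y_train == c])
            for c in PRIORS}

def posterior_nswd(gmms, X):
    """Bayes' rule: p(c | x) is proportional to prior(c) * p(x | c)."""
    log_post = np.column_stack(
        [np.log(PRIORS[c]) + gmms[c].score_samples(X) for c in sorted(PRIORS)])
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_post)
    return (post / post.sum(axis=1, keepdims=True))[:, 1]  # p(NSWD | x)
```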
The optimised decision-threshold values described in Section 3.2, which set the minimum predicted probability required to classify a system as NSWD, are listed in Table A.2.
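As an illustration of how such a threshold can be chosen, the following minimal sketch scans candidate thresholds on a validation set; the selection criterion shown here (maximising the $F_1$ score of the NSWD class) is an assumption, and Section 3.2 describes the actual procedure:

```python
import numpy as np
from sklearn.metrics import f1_score

def optimise_threshold(y_val, p_val, grid=np.linspace(0.01, 0.99, 99)):
    """Return the decision threshold maximising the validation F1 score of
    the positive (NSWD) class, given predicted probabilities p_val."""
    scores = [f1_score(y_val, (p_val >= t).astype(int)) for t in grid]
    return grid[int(np.argmax(scores))]
```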
Appendix A.2: Bayesian Optimisation
All of our classifiers have one or more hyper-parameters: configuration variables that govern the structure and learning behaviour of a model but are not directly learned from the training data. Examples include the number of trees in a RF, the learning rate in gradient-boosting algorithms, or the kernel type in an SVM. Traditional hyper-parameter optimisation often involves exhaustive grid searches, which systematically evaluate all possible combinations within a predefined parameter space. However, this is often inefficient, and random search can outperform grid search, especially as the dimensionality and complexity of the search space increase (Bergstra & Bengio 2012).
In contrast, Bayesian Optimisation (Garnett 2023) frequently produces better results with fewer evaluations (Snoek et al. 2012; Shahriari et al. 2016). Its usage and popularity have risen considerably in recent years, to the extent that widely adopted libraries built around the scikit-learn API (Pedregosa et al. 2011) now provide dedicated functions for Bayesian Optimisation, which is what we used for our optimisation process.
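For instance, a minimal sketch of such a search using scikit-optimize's `BayesSearchCV`, which follows the scikit-learn estimator API; the search space, iteration budget, and scoring metric below are illustrative assumptions rather than the paper's actual settings:

```python
from skopt import BayesSearchCV
from skopt.space import Integer, Real
from xgboost import XGBClassifier

# Illustrative hyper-parameter space for XGBoost; each of the n_iter steps
# fits the model with 5-fold cross-validation and updates a surrogate model
# of the score surface to propose the next candidate.
search = BayesSearchCV(
    estimator=XGBClassifier(eval_metric="logloss"),
    search_spaces={
        "n_estimators": Integer(100, 1000),
        "max_depth": Integer(3, 10),
        "learning_rate": Real(1e-3, 0.3, prior="log-uniform"),
    },
    n_iter=50,
    cv=5,
    scoring="f1",
    random_state=0,
)
# search.fit(X_train, y_train); search.best_params_ then holds the optimum.
```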
¹ $F_1 = \frac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FP} + \mathrm{FN}}$, where TP, FP, and FN denote true positives, false positives, and false negatives.