TY - JOUR
T1 - JUGE: An infrastructure for benchmarking Java unit test generators
AU - Devroey, Xavier
AU - Gambi, Alessio
AU - Galeotti, Juan Pablo
AU - Just, René
AU - Kifetew, Fitsum Meshesha
AU - Panichella, Annibale
AU - Panichella, Sebastiano
N1 - Funding Information:
We would like to thank (in alphabetical order) Arthur Baars, Sebastian Bauersfeld, Matteo Biagiola, Ignacio Lebrero, Urko Rueda Molina, and Fiorella Zampetti for their contribution to the implementation of the JUGE infrastructure. We would also like to thank (in alphabetical order) Azat Abdullin, Marat Akhin, Giuliano Antoniol, Andrea Arcuri, Cyrille Artho, Mikhail Belyaev, Pietro Braione, Nikolay Bukharev, José Campos, Nelly Condori, Christoph Csallner, Giovanni Denaro, Gordon Fraser, Yann-Gaël Guéhéneuc, Masami Hagiya, Mainul Islam, Dmitry Ivanov, Gunel Jahangirova, Kiran Lakhotia, Ignacio Manuel Lebrero Rial, Lei Ma, Alexey Menshutin, Arsen Nagdalian, Gilles Pesant, Simon Poulding, Wishnu Prasetya, Vincenzo Riccio, José Miguel Rojas, Abdelilah Sakti, Hiroyuki Sato, Sebastian Schweikl, Gleb Stromov, Yoshinori Tanabe, Paolo Tonella, Artem Ustinov, Sebastian Vogl, Tanja Vos, Mitsuharu Yamamoto, Fiorella Zampetti, and Cheng Zhang for their participation in previous editions of the competition and the feedback they provided on the infrastructure. Xavier Devroey was partially funded by the EU Horizon 2020 ICT-10-2016-RIA “STAMP” project (No. 731529), the Vici “TestShift” project (No. VI.C.182.032) from the Dutch Science Foundation NWO, and the CyberExcellence (No. 2110186) project, funded by the Public Service of Wallonia (SPW Recherche). Alessio Gambi's work was partially supported by the DFG project STUNT (DFG Grant Agreement n. FR 2955/4-1). Sebastiano Panichella and Annibale Panichella gratefully acknowledge the Horizon 2020 (EU Commission) support for the project COSMOS (DevOps for Complex Cyber-physical Systems), Project No. 957254-COSMOS. René Just's work is partially supported by the National Science Foundation under grant CNS-1823172.
Publisher Copyright:
© 2022 John Wiley & Sons, Ltd.
PY - 2023/5/1
Y1 - 2023/5/1
N2 - Researchers and practitioners have designed and implemented various automated test case generators to support effective software testing. Such generators exist for various languages (e.g., Java, C, or Python) and various platforms (e.g., desktop, web, or mobile applications). The generators exhibit varying effectiveness and efficiency, depending on the testing goals they aim to satisfy (e.g., unit-testing of libraries versus system-testing of entire applications) and the underlying techniques they implement. In this context, practitioners need to be able to compare different generators to identify the one best suited to their requirements, while researchers seek to identify future research directions. This can be achieved by systematically executing large-scale evaluations of different generators. However, executing such empirical evaluations is not trivial and requires substantial effort to select appropriate benchmarks, set up the evaluation infrastructure, and collect and analyse the results. In this Software Note, we present our JUnit Generation Benchmarking Infrastructure (JUGE), which supports generators (search-based, random-based, symbolic execution, etc.) that seek to automate the production of unit tests for various purposes (validation, regression testing, fault localization, etc.). The primary goal is to reduce the overall benchmarking effort, ease the comparison of several generators, and enhance the knowledge transfer between academia and industry by standardizing the evaluation and comparison process. Since 2013, several editions of a unit testing tool competition, co-located with the Search-Based Software Testing Workshop, have taken place in which JUGE was used and evolved. As a result, an increasing number of tools (more than 10) from academia and industry have been evaluated on JUGE, matured over the years, and allowed the identification of future research directions. Based on the experience gained from the competitions, we discuss the expected impact of JUGE in improving the knowledge transfer on tools and approaches for test generation between academia and industry. Indeed, the JUGE infrastructure demonstrated an implementation design that is flexible enough to enable the integration of additional unit test generation tools, which is practical for developers and allows researchers to experiment with new and advanced unit testing tools and approaches.
AB - Researchers and practitioners have designed and implemented various automated test case generators to support effective software testing. Such generators exist for various languages (e.g., Java, C, or Python) and various platforms (e.g., desktop, web, or mobile applications). The generators exhibit varying effectiveness and efficiency, depending on the testing goals they aim to satisfy (e.g., unit-testing of libraries versus system-testing of entire applications) and the underlying techniques they implement. In this context, practitioners need to be able to compare different generators to identify the one best suited to their requirements, while researchers seek to identify future research directions. This can be achieved by systematically executing large-scale evaluations of different generators. However, executing such empirical evaluations is not trivial and requires substantial effort to select appropriate benchmarks, set up the evaluation infrastructure, and collect and analyse the results. In this Software Note, we present our JUnit Generation Benchmarking Infrastructure (JUGE), which supports generators (search-based, random-based, symbolic execution, etc.) that seek to automate the production of unit tests for various purposes (validation, regression testing, fault localization, etc.). The primary goal is to reduce the overall benchmarking effort, ease the comparison of several generators, and enhance the knowledge transfer between academia and industry by standardizing the evaluation and comparison process. Since 2013, several editions of a unit testing tool competition, co-located with the Search-Based Software Testing Workshop, have taken place in which JUGE was used and evolved. As a result, an increasing number of tools (more than 10) from academia and industry have been evaluated on JUGE, matured over the years, and allowed the identification of future research directions. Based on the experience gained from the competitions, we discuss the expected impact of JUGE in improving the knowledge transfer on tools and approaches for test generation between academia and industry. Indeed, the JUGE infrastructure demonstrated an implementation design that is flexible enough to enable the integration of additional unit test generation tools, which is practical for developers and allows researchers to experiment with new and advanced unit testing tools and approaches.
KW - JUGE
KW - benchmarking
KW - evaluation infrastructure
KW - unit test generation
UR - http://www.scopus.com/inward/record.url?scp=85144391401&partnerID=8YFLogxK
U2 - 10.1002/stvr.1838
DO - 10.1002/stvr.1838
M3 - Article
SN - 0960-0833
VL - 33
JO - Software Testing, Verification and Reliability
JF - Software Testing, Verification and Reliability
IS - 3
M1 - e1838
ER -