Biochemical methane potential (BMP) tests, used to determine the ultimate methane yield of organic substrates, are not sufficiently standardized to ensure reproducibility among laboratories. In this contribution, a standardized BMP protocol was tested in a large inter-laboratory project, and the results were used to quantify sources of variability and to refine validation criteria designed to improve BMP reproducibility. Three sets of BMP tests were carried out by more than thirty laboratories from fourteen countries using multiple measurement methods, yielding more than 400 BMP values. Four complex but homogeneous substrates were tested, and microcrystalline cellulose was additionally used as a positive control. Inter-laboratory variability in reported BMP values was moderate: the relative standard deviation among laboratories (RSDR) was 7.5 to 24%, but the relative range (RR) was 31 to 130%. Systematic biases were associated both with individual laboratories and with individual tests within laboratories. Substrate volatile solids (VS) measurement and inoculum origin did not make major contributions to variability, but errors in data processing or data entry were important contributors. There was evidence of negative biases in the manual manometric and manual volumetric measurement methods. Still, much of the observed variation in BMP values was not clearly related to any of these factors and probably results from particular practices that vary among laboratories or even among technicians. Based on analysis of the calculated BMP values, a set of recommendations was developed covering measurement, data processing, validation, and reporting. The recommended validation criteria are: (i) a test duration at least long enough that net daily methane production stays below 1% of net cumulative production for three consecutive days ("1% net 3 d"), (ii) a relative standard deviation of the cellulose BMP no higher than 6%, and (iii) a mean cellulose BMP between 340 and 395 NmL CH4 (g VS)−1.
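The three validation criteria can be expressed as simple checks. The Python sketch below is illustrative only and is not the authors' implementation: all function names are hypothetical, it assumes the net (inoculum-corrected) cumulative methane production is sampled once per day, and it reads "1% net 3 d" as net daily production below 1% of net cumulative production on each of the last three consecutive days.

```python
from statistics import mean, stdev

# Hypothetical helpers illustrating the three BMP validation criteria.
# Not the authors' code. Units: NmL CH4 per g VS.

def meets_duration_criterion(cum_net, threshold=0.01, days=3):
    """Assumes cum_net is net (inoculum-corrected) cumulative methane
    production sampled once per day. Returns True if net daily production
    stayed below threshold * net cumulative production on each of the
    last `days` days."""
    if len(cum_net) < days + 1:
        return False  # too few points to evaluate the criterion
    for i in range(len(cum_net) - days, len(cum_net)):
        daily = cum_net[i] - cum_net[i - 1]
        if cum_net[i] <= 0 or daily >= threshold * cum_net[i]:
            return False
    return True

def cellulose_checks(cellulose_bmp, max_rsd=6.0, low=340.0, high=395.0):
    """Return (rsd_ok, mean_ok) for replicate cellulose BMP values."""
    m = mean(cellulose_bmp)
    rsd = 100.0 * stdev(cellulose_bmp) / m  # relative standard deviation, %
    return rsd <= max_rsd, low <= m <= high

def bmp_test_is_valid(cum_net_series, cellulose_bmp):
    """A test passes only if every cumulative series meets the duration
    criterion and the cellulose control meets both cellulose criteria."""
    duration_ok = all(meets_duration_criterion(s) for s in cum_net_series)
    rsd_ok, mean_ok = cellulose_checks(cellulose_bmp)
    return duration_ok and rsd_ok and mean_ok

cum = [0, 100, 200, 280, 330, 350, 352, 353.5, 354.5]
print(bmp_test_is_valid([cum], [350, 360, 355]))  # True
```

Keeping each criterion in its own function makes it easy to report which check failed, which matters because, per the results above, the cellulose BMP criterion was the most influential.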
Evidence from this large dataset shows that following the recommendations, in particular applying the validation criteria, can substantially improve reproducibility, with RSDR < 8% and RR < 25% for all substrates. The cellulose BMP criterion was particularly important. The results show that it is possible to measure very similar BMP values with different measurement methods, but some laboratories must change their BMP methods to meet the recommended validation criteria. To help improve the practice of BMP measurement, a new website with detailed, up-to-date guidance on BMP measurement and data processing was established.
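The reproducibility statistics RSDR and RR can be estimated from inter-laboratory data in more than one way, and the abstract does not state the exact procedure. The sketch below is therefore an assumption: it follows the balanced one-way layout of ISO 5725-2, where the reproducibility variance is the between-laboratory variance plus the repeatability variance, and it takes the relative range as the range of laboratory means divided by their grand mean. The function name and input layout are hypothetical.

```python
from statistics import mean, variance

def reproducibility_stats(lab_replicates):
    """Hypothetical helper. lab_replicates: one list of replicate BMP values
    (NmL CH4 per g VS) per laboratory for a single substrate. Assumes a
    balanced design (same number of replicates, at least 2, in every lab).
    Returns RSD_R and RR, both in percent."""
    k = len(lab_replicates)
    n = len(lab_replicates[0]) if k else 0
    if k < 2 or n < 2 or any(len(r) != n for r in lab_replicates):
        raise ValueError("need >= 2 labs, each with the same number (>= 2) of replicates")
    lab_means = [mean(r) for r in lab_replicates]
    grand_mean = mean(lab_means)
    s_r2 = mean([variance(r) for r in lab_replicates])  # repeatability variance
    s_d2 = variance(lab_means)                          # variance of lab means
    s_L2 = max(0.0, s_d2 - s_r2 / n)                    # between-lab variance
    s_R = (s_L2 + s_r2) ** 0.5                          # reproducibility SD
    return {
        "rsd_R": 100.0 * s_R / grand_mean,
        "rr": 100.0 * (max(lab_means) - min(lab_means)) / grand_mean,
    }

stats = reproducibility_stats([[340, 342, 338], [355, 353, 357], [360, 362, 358]])
print(stats["rsd_R"] < 8 and stats["rr"] < 25)  # True
```

The ISO-style decomposition keeps within-laboratory scatter in RSDR, whereas RR depends only on the most extreme laboratory means. This is why RR can be much larger than RSDR when a few laboratories are strongly biased.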