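Preface
In the cloud-native era, everyone is chasing containerized, dynamic, on-demand services to save on expensive server resources and cut company costs. Our company runs a Jenkins cluster that is dynamically scheduled on k8s. My team's new project manages its dependencies with Gradle, and its builds often took twenty minutes or even failed with a timeout. The logs showed that under this dynamic scheduling scheme every build starts a fresh Gradle daemon environment and downloads all project dependencies again. They come from the internal Nexus, but it is still slow.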
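Process
To make use of a build cache, I arranged with our ops colleagues to mount the .gradle directory on shared storage into the container, at the matching Gradle cache directory, whenever a Jenkins node starts. As expected, the build stage dropped to somewhere between a few dozen seconds and a minute or two. With several projects building in parallel, however, the following error appears: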
[Pipeline] // ansiColor
[Pipeline] sh
09:06:05 + gradle bootJar -x test
09:06:06
09:06:06 Welcome to Gradle 7.3.1!
09:06:06
09:06:06 Here are the highlights of this release:
09:06:06 - Easily declare new test suites in Java projects
09:06:06 - Support for Java 17
09:06:06 - Support for Scala 3
09:06:06
09:06:06 For more details see https://docs.gradle.org/7.3.1/release-notes.html
09:06:06
09:06:06 Starting a Gradle Daemon (subsequent builds will be faster)
09:07:14
09:07:14 FAILURE: Build failed with an exception.
09:07:14
09:07:14 * What went wrong:
09:07:14 Gradle could not start your build.
09:07:14 > Cannot create service of type BuildTreeActionExecutor using method LauncherServices$ToolingBuildTreeScopeServices.createActionExecutor() as there is a problem with parameter #1 of type List<BuildActionRunner>.
09:07:14 > Cannot create service of type BuildModelActionRunner using BuildModelActionRunner constructor as there is a problem with parameter #1 of type PayloadSerializer.
09:07:14 > Cannot create service of type PayloadSerializer using method LauncherServices$ToolingGradleUserHomeScopeServices.createPayloadSerializer() as there is a problem with parameter #2 of type PayloadClassLoaderFactory.
09:07:14 > Cannot create service of type PayloadClassLoaderFactory using method LauncherServices$ToolingGradleUserHomeScopeServices.createClassLoaderFactory() as there is a problem with parameter #1 of type CachedClasspathTransformer.
09:07:14 > Cannot create service of type DefaultCachedClasspathTransformer using DefaultCachedClasspathTransformer constructor as there is a problem with parameter #6 of type FileSystemAccess.
09:07:14 > Cannot create service of type FileSystemAccess using method VirtualFileSystemServices$GradleUserHomeServices.createFileSystemAccess() as there is a problem with parameter #2 of type VirtualFileSystem.
09:07:14 > Cannot create service of type BuildLifecycleAwareVirtualFileSystem using method VirtualFileSystemServices$GradleUserHomeServices.createVirtualFileSystem() as there is a problem with parameter #6 of type GlobalCacheLocations.
09:07:14 > Cannot create service of type GlobalCacheLocations using method GradleUserHomeScopeServices.createGlobalCacheLocations() as there is a problem with parameter #1 of type List<GlobalCache>.
09:07:14 > Could not create service of type FileAccessTimeJournal using GradleUserHomeScopeServices.createFileAccessTimeJournal().
09:07:14 > Timeout waiting to lock journal cache (/root/.gradle/caches/journal-1). It is currently in use by another Gradle instance.
09:07:14 Owner PID: 39
09:07:14 Our PID: 40
09:07:14 Owner Operation:
09:07:14 Our operation:
09:07:14 Lock file: /root/.gradle/caches/journal-1/journal-1.lock
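The error roughly means that multiple builds are sharing the same cache: the first process to take it creates a journal-1.lock file that is not removed until its build completes, and later processes can only wait.
It appears that the different daemon nodes signal each other over a socket and keep retrying to acquire the lock. If the lock is acquired within the timeout, the build carries on. If the lock has still not been released when the timeout is reached, Gradle throws the "Timeout waiting to lock ..." exception and the build fails. So I wondered whether the timeout could be made longer, and followed the stack trace into the official source code on GitHub.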
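The wait-then-time-out behaviour described above can be sketched in shell. This is only an illustration of the pattern, not Gradle's implementation: the function names `acquire_lock` and `release_lock` are hypothetical, and an atomic `mkdir` stands in for whatever locking Gradle really uses on journal-1.lock.

```shell
# Illustrative only: poll for a lock until a deadline, then give up.
# acquire_lock LOCK_DIR TIMEOUT_SECONDS   (hypothetical helper)
acquire_lock() {
  lock_dir=$1
  timeout=$2
  waited=0
  # mkdir is atomic: it succeeds for exactly one caller at a time.
  until mkdir "$lock_dir" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Timeout waiting to lock $lock_dir" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# release_lock LOCK_DIR   (hypothetical helper)
release_lock() {
  rmdir "$1"
}
```

With a fixed timeout, a waiter fails whenever the owner holds the lock longer than that timeout, which is the situation in the log above.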
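The class in question has a constant for the default timeout, 60000 ms, i.e. one minute, which lines up with the retry period seen in the log. The constructor is referenced in only one place, and no timeout argument is passed to it; the predefined constant is used directly. So there seems to be nowhere to change it.
Had nobody else run into the same problem? A search through the issues turned up quite a few answers, many of which were to kill -9 the other daemons: crude, but it works everywhere. Further down, I found that one person had not only raised the problem but also submitted a PR that changed the hard-coded timeout so it could be read from system configuration.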
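The original PR is here.
However, the maintainers rejected it.
Their reasoning came down to two points:
- Normally this cache lock is released fairly quickly, so a one-minute timeout is enough in most cases.
- Even if parallel builds really do cause the timeout, you should look for the problem elsewhere.
The headline feature of Gradle Enterprise is its central cache and faster builds, so it is understandable that they would not open this door here.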
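Since the problem could not be solved from inside the task, I wondered whether a failed task could be retried from outside, for example with a custom task or a plugin. Several attempts went nowhere: since Gradle 5.0, a task can no longer directly invoke another task.
With retrying at the task level ruled out, I finally considered retrying in the pipeline script.
When the build stage fails, it is retried a few times. This ensures the job eventually builds successfully, while most jobs still get the faster build.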
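The author's retry lives in the Jenkins pipeline script, which is not shown here; Jenkins Pipeline also has a built-in `retry` step that wraps a block. As a minimal sketch of the same idea in shell, usable inside an `sh` step, the wrapper below reruns a command until it succeeds or the attempts run out. The function name `retry` and the `RETRY_DELAY` variable are hypothetical, not part of Gradle or Jenkins.

```shell
# Illustrative only: rerun a command until it succeeds or attempts run out.
# retry MAX_ATTEMPTS COMMAND [ARGS...]   (hypothetical helper)
# RETRY_DELAY (assumed variable) is the pause in seconds between attempts.
retry() {
  max_attempts=$1
  shift
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Command failed after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    # Give the other build a chance to release the cache lock.
    sleep "${RETRY_DELAY:-10}"
  done
  return 0
}
```

In the build stage this would be called as `retry 3 gradle bootJar -x test`. Each attempt can still wait out the full lock timeout, so the worst case grows with the number of attempts.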