java.io.FileNotFoundException: /tmp/agentXXXXXX.jar issue at tomcat startup avoiding the webapp to start [hotpatch-for-apache-log4j2] - TagMerge

java.io.FileNotFoundException: /tmp/agentXXXXXX.jar issue at tomcat startup avoiding the webapp to start [hotpatch-for-apache-log4j2]

richardthibaultAsked 5 months ago

Hi, The patch has been deployed in AL1 on one of our server via "yum update --security", and we had a problem with it. We have the following error in the tomcat log avoiding any web applications to start on this server :

java.io.FileNotFoundException: /tmp/agent693101429317028124.jar (Aucun fichier ou dossier de ce type) at java.util.zip.ZipFile.open(Native Method) at java.util.zip.ZipFile.(ZipFile.java:228) at java.util.zip.ZipFile.(ZipFile.java:157) at java.util.jar.JarFile.(JarFile.java:169) at java.util.jar.JarFile.(JarFile.java:106) at sun.net.www.protocol.jar.URLJarFile.(URLJarFile.java:93) at sun.net.www.protocol.jar.URLJarFile.getJarFile(URLJarFile.java:69) at sun.net.www.protocol.jar.JarFileFactory.get(JarFileFactory.java:99) at sun.net.www.protocol.jar.JarURLConnection.connect(JarURLConnection.java:122) at sun.net.www.protocol.jar.JarURLConnection.getJarFile(JarURLConnection.java:89) at org.apache.tomcat.util.scan.FileUrlJar.(FileUrlJar.java:48) at org.apache.tomcat.util.scan.JarFactory.newInstance(JarFactory.java:34) at org.apache.catalina.startup.ContextConfig.processAnnotationsJar(ContextConfig.java:1957) at org.apache.catalina.startup.ContextConfig.processAnnotationsUrl(ContextConfig.java:1932) at org.apache.catalina.startup.ContextConfig.processAnnotations(ContextConfig.java:1917) at org.apache.catalina.startup.ContextConfig.webConfig(ContextConfig.java:1322) at org.apache.catalina.startup.ContextConfig.configureStart(ContextConfig.java:878) at org.apache.catalina.startup.ContextConfig.lifecycleEvent(ContextConfig.java:388) at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:117) at org.apache.catalina.util.LifecycleBase.fireLifecycleEvent(LifecycleBase.java:90) at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5566) at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:145) at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:1017) at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:993) at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:652) at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:1127) at org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:2021) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)

In fact, this server has a specifiticy : there are 20 concurrent JVM / Tomcat instances running and launched via chkconfig services… I'm not sure but this could be the root cause of the pb… We applied the "sudo touch /etc/log4j-cve-2021-44228-hotpatch.kill" to deactivate the HotPatch, and it worked. So it confirms that the issue is due to the Log4j HotPatch. I think you should publish a fix for this because I assume we won't be the only impacted customers.

A precision : we use amazon-corretto-8.302.08.1-linux-x64 under AL1

Best Regards.

Source: link

yishaigalatzerAnswered 5 months ago

I'll forward this to the AL1 team, and find where the work for the yum package is happening. It is not in this repo. Stay tuned

gaiazovAnswered 5 months ago

We have the same issue with Liquibase 3.9. It seems that the hotpatch jar gets deleted, and the affected code doesn't handle this case

jqassarAnswered 5 months ago

We have just finished coming off of an extensive troubleshooting session with Atlassian where the root cause was eventually traced to this hotfix.

Leaving aside the business impact, note that Atlassian quickstart deployments currently in use could be using affected versions of AL1. They have their own log4j implementations which are (theoretically) not impacted by log4shell, but are broken by the action of this script.

WFA-hhsiehAnswered 5 months ago

We have the same problem with java-11-amazon-corretto-headless on AL2. Temporarily solve the problem by removing log4j-cve-2021-44228-hotpatch.

raupachzAnswered 5 months ago

amzn2.x8664 #1 SMP Thu Nov 4 21:08:30 UTC 2021 x8664 x8664 x8664 GNU/Linux

openjdk version "11.0.13" 2021-10-19 LTS OpenJDK Runtime Environment Corretto-11.0.13.8.2 (build 11.0.13+8-LTS) OpenJDK 64-Bit Server VM Corretto-11.0.13.8.2 (build 11.0.13+8-LTS, mixed mode)

amazon-linux-extras install -y tomcat8.5

20-Dec-2021 12:17:27.912 SEVERE [localhost-startStop-1] com.sun.faces.config.ConfigureListener.contextInitialized Critical error during deployment:
    com.sun.faces.config.ConfigurationException: java.util.concurrent.ExecutionException: javax.faces.FacesException: java.nio.file.NoSuchFileException: /tmp/agent9162924747567038440.jar
        at com.sun.faces.config.ConfigManager.getConfigDocuments(ConfigManager.java:769)
        at com.sun.faces.config.ConfigManager.initialize(ConfigManager.java:355)
        at com.sun.faces.config.ConfigureListener.contextInitialized(ConfigureListener.java:237)
        at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4761)
        at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5232)
        at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:183)
        at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:753)
        at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:727)
        at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:695)
        at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:1016)
        at org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:1903)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
    Caused by: java.util.concurrent.ExecutionException: javax.faces.FacesException: java.nio.file.NoSuchFileException: /tmp/agent9162924747567038440.jar
        at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
        at com.sun.faces.config.ConfigManager.getConfigDocuments(ConfigManager.java:756)
        ... 15 more
    Caused by: javax.faces.FacesException: java.nio.file.NoSuchFileException: /tmp/agent9162924747567038440.jar
        at com.sun.faces.config.configprovider.MetaInfFacesConfigResourceProvider.getResources(MetaInfFacesConfigResourceProvider.java:138)
        at com.sun.faces.config.ConfigManager$URITask.call(ConfigManager.java:1390)
        at com.sun.faces.config.ConfigManager$URITask.call(ConfigManager.java:1358)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at com.sun.faces.config.ConfigManager.getConfigDocuments(ConfigManager.java:747)
        ... 15 more
    Caused by: java.nio.file.NoSuchFileException: /tmp/agent9162924747567038440.jar
        at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
        at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
        at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
        at java.base/sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
        at java.base/sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:149)
        at java.base/sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
        at java.base/java.nio.file.Files.readAttributes(Files.java:1764)
        at java.base/java.util.zip.ZipFile$Source.get(ZipFile.java:1259)
        at java.base/java.util.zip.ZipFile$CleanableResource.<init>(ZipFile.java:831)
        at java.base/java.util.zip.ZipFile$CleanableResource$FinalizableResource.<init>(ZipFile.java:857)
        at java.base/java.util.zip.ZipFile$CleanableResource.get(ZipFile.java:846)
        at java.base/java.util.zip.ZipFile.<init>(ZipFile.java:248)
        at java.base/java.util.zip.ZipFile.<init>(ZipFile.java:177)
        at java.base/java.util.jar.JarFile.<init>(JarFile.java:350)
        at java.base/sun.net.www.protocol.jar.URLJarFile.<init>(URLJarFile.java:103)
        at java.base/sun.net.www.protocol.jar.URLJarFile.getJarFile(URLJarFile.java:72)
        at java.base/sun.net.www.protocol.jar.JarFileFactory.get(JarFileFactory.java:99)
        at java.base/sun.net.www.protocol.jar.JarURLConnection.connect(JarURLConnection.java:125)
        at java.base/sun.net.www.protocol.jar.JarURLConnection.getJarFile(JarURLConnection.java:92)
        at com.sun.faces.facelets.util.Classpath.search(Classpath.java:146)
        at com.sun.faces.facelets.util.Classpath.search(Classpath.java:110)
        at com.sun.faces.config.configprovider.MetaInfFacesConfigResourceProvider.loadURLs(MetaInfFacesConfigResourceProvider.java:167)
        at com.sun.faces.config.configprovider.MetaInfFacesConfigResourceProvider.getResources(MetaInfFacesConfigResourceProvider.java:111)
        ... 19 more
KvistA-ELSAnswered 5 months ago

We have the same issue :(

dpavlov-smartlingAnswered 5 months ago

Same here. We found that all service with Liquibase including versions 3.4, 3.5, 3.7 are affected by this hotfix infection. The worst problem is that it affects services inside the containers! For these who aren't sure how this hotfix appears on their AWS instances which uses custom AL2 AMI. This usually happens if your AMI has pre-installed OpenJDK.

  1. During the first boot of the instance cloud-init rules are being applied. Default module cloud_config_modules contains rule package-update-upgrade-install which applies security updates
  2. AWS has set package log4j-cve-2021-44228-hotpatch.noarch as a strict dependency for the new version of openjdk provided from their repos
  3. Because cloud-init updates all packages on the instances during the boot AND because you have pre-installed openjdk AND because AWS set package log4j-cve-2021-44228-hotpatch.noarch as a dependency for the new version of openjdk the new hotfix package is being installed on all new instances

AWS should immediately remove this fix and dependency from their repository. They inject custom java class into our applications without any permissions from our side!!!!

lutkerdAnswered 5 months ago

Is there something cleaning up /tmp on your machines? We are looking at this and how we can fix the issue and get an updated RPM deployed.

The release announcement has the preferred way disable the patch tool which is sudo touch /etc/log4j-cve-2021-44228-hotpatch.kill

heathheyAnswered 5 months ago

To further help us determine if the fix is correct, is there a good reproducer available?

prashantrajanAnswered 5 months ago

The way this was patched by injecting a running Java class into the customer owned environment has been a complete cluster fuck. We've been unable to launch production instances (or autoscale since it will apply the patch on first boot) trying to pin this issue down. You've killed any trust in allowing automatic security updates if this is what AWS considers best practice for security patching (in addition to pushing this out late on a Friday and without advance warning).

Best part is there's no obvious way to report this stupidity to the AWS folks that actually manage the changes to AL1/AL2 so we're to left to vent on this issue here.

stewartsmithAnswered 5 months ago

Please keep discussion civil and in line with https://aws.github.io/code-of-conduct

There are multiple engineers currently actively engaged in reproducing, root causing, developing, and testing updated packages to address this problem. The official AWS support channels can follow up further and discuss specifics there.

dpavlov-smartlingAnswered 5 months ago

@heathhey For us any application which try to use Liquibase lib - always fails during the bootstrap if this hotfix is enabled. Below is an example of logs

liquibase.exception.UnexpectedLiquibaseException: java.io.FileNotFoundException: /tmp/agent7661228846511958474.jar (No such file or directory)
    at liquibase.servicelocator.ServiceLocator.setResourceAccessor(ServiceLocator.java:127)
    at liquibase.servicelocator.ServiceLocator.<init>(ServiceLocator.java:61)
    at liquibase.servicelocator.ServiceLocator.<clinit>(ServiceLocator.java:44)
    at org.springframework.boot.liquibase.LiquibaseServiceLocatorApplicationListener$LiquibasePresent.replaceServiceLocator(LiquibaseServiceLocatorApplicationListener.java:55)
    at org.springframework.boot.liquibase.LiquibaseServiceLocatorApplicationListener.onApplicationEvent(LiquibaseServiceLocatorApplicationListener.java:45)
    at org.springframework.boot.liquibase.LiquibaseServiceLocatorApplicationListener.onApplicationEvent(LiquibaseServiceLocatorApplicationListener.java:35)
    at org.springframework.context.event.SimpleApplicationEventMulticaster.doInvokeListener(SimpleApplicationEventMulticaster.java:172)
    at org.springframework.context.event.SimpleApplicationEventMulticaster.invokeListener(SimpleApplicationEventMulticaster.java:165)
    at org.springframework.context.event.SimpleApplicationEventMulticaster.multicastEvent(SimpleApplicationEventMulticaster.java:139)
    at org.springframework.context.event.SimpleApplicationEventMulticaster.multicastEvent(SimpleApplicationEventMulticaster.java:122)
    at org.springframework.boot.context.event.EventPublishingRunListener.starting(EventPublishingRunListener.java:69)
    at org.springframework.boot.SpringApplicationRunListeners.starting(SpringApplicationRunListeners.java:48)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:292)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:1118)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:1107)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:48)
    at org.springframework.boot.loader.Launcher.launch(Launcher.java:87)
    at org.springframework.boot.loader.Launcher.launch(Launcher.java:50)
    at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:51)

I have tracked exception to this line in source . Hope this will help to research this.

@lutkerd

  • The first problem with this fix, is that you inject code into clients' applications without their permissions. Our applications don't use log4j at all, but your hotfix just tries to inject hotfix to any java process. Why?
  • The second problem - is the way AWS start to propagate it. AWS added this hotfix as a dependency for OpenJDK, so now we can't uninstall it from the system without removing the whole OpenJDK installation.
  • The third problem - is that hotfix appears on the NEW instances automatically if you use AL2 AMI. Because in cloud-init rules AWS installs security patches automatically. So, even for old AMIs with pre-installed Java you will get this "fix" to your system without any notifications.
  • And the final issue - it affects applications in the docker containers. One of the main reasons for all of us to use containers is to make sure that our applications will always be deployed as it was build, but now, due to some fantastic reason you decided that you can alter our applications packed to the containers. No, you can't do this!

Once again, please remove dependency which cause installation of this hotfix with OpenJDK and other Java versions which can be installed from AWS AL1/2 yum repositories. Let us clients decided how and when to install this fix. Removing of this dependency will make it possible to just uninstall this package from the system. Because the alternative way to handle this for general ECS cluster is:

  1. Update instance user-data to create this stop file
  2. Rotate all instances in ECS cluster to apply changes

As to your question regarding /tmp cleanup - no, nothing is cleaning it. Java automatically removes all files after applications has crashed.

Thanks,

jqassarAnswered 5 months ago

Is there something cleaning up /tmp on your machines? We are looking at this and how we can fix the issue and get an updated RPM deployed.

The release announcement has the preferred way disable the patch tool which is sudo touch /etc/log4j-cve-2021-44228-hotpatch.kill

Stopping and starting an instance will do this.

kertziAnswered 5 months ago

This hotfix also breaks things when you DONT use OpenJDK from repo but install it manually from tarballs. This is really something which AWS should not to do. We do:

  • Install tomcat from tarballs
  • Install OpenJDK from tarballs

And after all this manual installations you still get process /bin/bash /usr/bin/log4j-cve-2021-44228-hotpatch -w 1800 -m 10 running which injects stuff to java process and break liquidbase.

richardthibaultAnswered 5 months ago

Is there something cleaning up /tmp on your machines? We are looking at this and how we can fix the issue and get an updated RPM deployed.

The release announcement has the preferred way disable the patch tool which is sudo touch /etc/log4j-cve-2021-44228-hotpatch.kill

No, we have old files in /tmp directory…

richardthibaultAnswered 5 months ago

To further help us determine if the fix is correct, is there a good reproducer available?

Yes, I have a sandbox AL1 ec2 instance which is impacted by this issue and where I can easily test a patch

raupachzAnswered 5 months ago

To further help us determine if the fix is correct, is there a good reproducer available?

I setup a instance with a minimal Java webapp that shows the decribed behavior. Please DM me for credentials.

simonisAnswered 5 months ago

This issue is caused by an outdated version of the hotpatch-for-apache-log4j2 tool. Older versions of the tool assembled the agent jar file which is injected into the target JVM process dynamically at runtime. For this purpose it created a temporary file (e.g. /tmp/agent7661228846511958474.jar) which was deleted after the patching tool installed it in the target JVM process.

The target JVM process opens the agent jar file with the system class loader and keeps a reference (i.e. an open file descriptor) to the jar file. This cached file descriptor remains valid even after the patching tool terminates and the generated agent jar file was deleted.

The error described here happens because the application used ClassLoader.getResources() to look for resources like META-INF/MANIFEST.MF in the system class loader. Among other references, the system class loader will also return a URL pointing to the META-INF/MANIFEST.MF file from the agent jar file. However, if this URL will be used to re-open the corresponding agent jar file (i.e. open a new file descriptor for it), this will fail because in general, the file was already deleted. Notice that the system class loader is still able to load all the resources from the corresponding file, just the application will not be able to "re-open" it.

With newer versions of the tool, this issue has been fixed (see #11) because the tool will inject the jar file from which it is running into the target JVM. This means thatm as long as the tool remains installed on a host, the jar file should always be accessible.

I'm not sure which exact version of this tool has been packed in the corresponding .rpm files (check with the Amazon Linux support). According to https://alas.aws.amazon.com/announcements/2021-001.html the rpm has already been updated:

A previous version of the service applied the hotpatch every 30 minutes, only supported JDK8 and JDK11, and did not have the capability of patching JVMs running in containers. Make sure you have the latest version by running yum update log4j-cve-2021-44228-hotpatch.

and the new version may very well already fix the issue.

One (admittedly crude) way to find out which version of the tool is running is to use jps to get the PID of a patched JVM and then examine if its file descriptors contain a reference to a temporary, deleted jar file:

$ jps
1234 java
$ ls -la /proc/1234/fd | grep agent
lr-x------ 1 uid gid 64 Dez 22 12:10 22 -> '/tmp/agent5186358415800914896.jar (deleted)'

If there's a file descriptor pointing to a deleted agent jar file, you are using the old version of the tool.

richardthibaultAnswered 5 months ago

The AL1 package is indicated to be the 1.1 version : image

There is no updated package available in AL1 for the moment. AL1 team should shortly publish the update !

raupachzAnswered 5 months ago

AL2 still uses the old package. Just checked for updates. None.

$ sudo yum list installed | grep log4j log4j-cve-2021-44228-hotpatch.noarch 1.1-9.amzn2 @amzn2-core

$ sudo ls -la /proc/9169/fd | grep agent lr-x------ 1 tomcat tomcat 64 Dec 22 14:43 54 -> /tmp/agent12605062286418635796.jar (deleted)

jqassarAnswered 5 months ago

Our issue occurred on December 20th in a freshly installed instance.

niknaz87Answered 5 months ago

facing the same issue on freshly installed instance. Old ones are working fine

stewartsmithAnswered 5 months ago

The current RPM (1.1-9) shipped in Amazon Linux 1 and 2 was built while there was still some ongoing development on the hotpatch itself (this repo). The currently shipping RPM is one we could develop high confidence in, and as I mentioned above, we're actively looking at shipping an update to address this problem.

dpavlov-smartlingAnswered 5 months ago

@stewartsmith Please ask AL1/AL2 repo managers to remove the dependency between this hotfix and any version of Java from AWS repos. There should be a choice to use or not to use this hotfix. We don't have log4j, but we do have an issue due to it. And we don't want to make a fix on your fix with file creation to disable it. Please listen to the community.

Thanks,

richardthibaultAnswered 5 months ago

AL2 still uses the old package. Just checked for updates. None.

AL1 as well

ragebiswasAnswered 5 months ago

Faced the same problem and spent some time to debug - any workarounds for the same without disabling the hotpatch process?

Alex-SoAnswered 5 months ago

Faced the same problem and spent some time to debug - any workarounds for the same without disabling the hotpatch process?

I don't know if it will help you @ragebiswas or somebody else, but during debugging issue for Rundeck service I found a trick: service rundeckd start; service rundeckd restart After few tries I'm able to start the service without disabling the hotpatch.

aramirAnswered 5 months ago

Faced the same problem and spent some time to debug - any workarounds for the same without disabling the hotpatch process?

Not at the moment it seems. The best way to go (for now) is to mitigate log4j shell manually, disable the hotpatch with sudo touch /etc/log4j-cve-2021-44228-hotpatch.kill, wait for the official fix from Amazon team and re-enable with sudo rm -f /etc/log4j-cve-2021-44228-hotpatch.kill

stewartsmithAnswered 5 months ago

We have just pushed 1.1-12 to the AL1 and AL2 repositories which possibly addresses this issue, and would appreciate any testing and ack if it does, or any different error if it does not.

FoxUSAAnswered 5 months ago

Bitbucket came back up without needing the to disable the hotfix.

Recent Issues

    Programming Languages