jvm

#jvm#


李博 bluemind

I'd like to ask a question: in IDEA's config file I set the JVM option -XX:+UseConcMarkSweepGC, but after starting IDEA what I see is -XX:+UseParallelGC. Why is that?

This is the process I was looking at.
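
A quick way to check what a running JVM actually got is to ask it directly, for example with jcmd <pid> VM.flags for an external process (note that IDEA reads its VM options from its own vmoptions file, usually edited via Help | Edit Custom VM Options, so an option placed in a different config file may simply never be picked up). For a JVM you control, a minimal Java sketch that prints the launch options and the collectors that are actually active:

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class WhichGc {
    public static void main(String[] args) {
        // The -XX options this JVM was actually launched with:
        System.out.println(ManagementFactory.getRuntimeMXBean().getInputArguments());
        // The collectors really in use: "ParNew"/"ConcurrentMarkSweep" for CMS,
        // "PS Scavenge"/"PS MarkSweep" for the Parallel collector.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName());
        }
    }
}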

李博 bluemind

The JVM's initial and maximum memory are both set to 4096; we're on Java 1.8 with the CMS garbage collector, and the application's memory usage sits at around 50%. After switching to G1, memory usage stabilized at around 80%. Has anyone run into this, and do you know what causes it?

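Reported "memory usage" depends heavily on when each collector decides to collect, so a raw percentage under CMS is not directly comparable to one under G1; a minimal Java sketch (sampling interval and duration are arbitrary choices) for recording heap occupancy the same way under both collectors, e.g. one run with -XX:+UseConcMarkSweepGC and one with -XX:+UseG1GC:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapSampler {
    public static void main(String[] args) throws InterruptedException {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        // Print used/committed/max heap once per second for a minute.
        for (int i = 0; i < 60; i++) {
            MemoryUsage heap = mem.getHeapMemoryUsage();
            System.out.printf("used=%,d committed=%,d max=%,d%n",
                    heap.getUsed(), heap.getCommitted(), heap.getMax());
            Thread.sleep(1000);
        }
    }
}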

开源大数据EMR

Error message: the specified InstanceType is not authorized for use

Error message: the specified InstanceType is not authorized for use

mz111

Flink on YARN starts only one TaskManager

In Flink on YARN mode, when I submit a single job, the submission succeeds but only one TaskManager starts; the other TaskManagers never come up, so their resources can't be used either. Why is that? My launch command is: flink -m yarn-cluster -yn 3 sse.jar, which is supposed to start three TaskManagers, but only one starts. My configuration is as follows:

fs.hdfs.hadoopconf: /etc/hadoop/conf
akka.ask.timeout: 20s
# The heap size for the JobManager JVM
jobmanager.heap.size: 2048m
# The heap size for the TaskManager JVM
taskmanager.heap.size: 4096m
# The number of task slots that each TaskManager offers. Each slot runs one parallel pipeline.
taskmanager.numberOfTaskSlots: 12
slaves: 172.16.0.18 172.16.0.19 172.16.0.20 172.16.0.17
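
One thing worth checking (a sketch of a likely cause, not a confirmed diagnosis): in recent Flink versions (1.5+) TaskManagers on YARN are allocated on demand to cover the slots the job actually needs, and -yn is treated as a hint at best; with taskmanager.numberOfTaskSlots: 12 and a low job parallelism, a single TaskManager is enough. A minimal Java DataStream sketch that requests enough parallelism to need all three TaskManagers (36 is an assumption: 3 TaskManagers x 12 slots):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ParallelismSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Ask for more slots than a single TaskManager provides (12), so that
        // the YARN session has to request additional TaskManager containers.
        env.setParallelism(36); // assumption: 3 TaskManagers x 12 slots each
        env.fromElements(1, 2, 3).print();
        env.execute("parallelism sketch");
    }
}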

宋淑婷

Specify the MarkSweep GC for a `spark-submit` job on EMR

How do I specify that the JVM should use the MarkSweep GC when running a job with spark-submit on EMR? Can I pass it when submitting the job (i.e. spark-submit --conf ...), and if so, what is the command? Or does this have to be set when Spark starts up, and if so, how do I specify it in the EMR configuration?
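
One common way to do this (a sketch, not an EMR-specific recipe) is through Spark's extraJavaOptions settings, either on the command line, e.g. spark-submit --conf spark.executor.extraJavaOptions=-XX:+UseConcMarkSweepGC --driver-java-options "-XX:+UseConcMarkSweepGC" ..., or in code as in this minimal Java sketch (the class name is illustrative):

import org.apache.spark.SparkConf;

public class GcConfSketch {
    public static void main(String[] args) {
        // Ask executor JVMs to run with the CMS ("concurrent mark-sweep") collector.
        // The driver JVM is already running when user code executes, so its collector
        // normally has to be chosen at submit time (e.g. via --driver-java-options).
        SparkConf conf = new SparkConf()
                .set("spark.executor.extraJavaOptions", "-XX:+UseConcMarkSweepGC");
        System.out.println(conf.toDebugString());
    }
}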

李博 bluemind

Does the JVM resize the young and old generations when it performs GC?

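Whether the generation sizes change depends on the collector and its flags; for example, the Parallel collector's adaptive size policy (-XX:+UseAdaptiveSizePolicy) may grow or shrink Eden, the survivor spaces and the old generation between collections. A minimal Java sketch for observing this on your own JVM by printing the committed size of each memory pool around an explicit GC (pool names differ per collector, and System.gc() is only a request):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class GenerationSizes {
    public static void main(String[] args) {
        printPools("before GC");
        System.gc(); // request a full GC; the JVM is free to ignore it
        printPools("after GC");
    }

    private static void printPools(String label) {
        System.out.println("== " + label + " ==");
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage u = pool.getUsage();
            System.out.printf("%-30s committed=%,d max=%,d%n",
                    pool.getName(), u.getCommitted(), u.getMax());
        }
    }
}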

李博 bluemind

How should I go about learning the JVM?

How should I go about learning the JVM?

李博 bluemind

What are good approaches to JVM performance analysis?

What are good approaches to JVM performance analysis?
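
Common starting points are the JDK's own command-line tools (jps, jstat, jstack, jmap, jcmd) together with GC logs and a profiler; as an in-process complement, a minimal Java sketch that reads basic heap, thread and GC statistics from the standard management beans:

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.ThreadMXBean;

public class QuickJvmStats {
    public static void main(String[] args) {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        System.out.println("heap: " + mem.getHeapMemoryUsage());

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        System.out.println("live threads: " + threads.getThreadCount()
                + ", peak: " + threads.getPeakThreadCount());

        // Cumulative GC activity since the JVM started.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": collections=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + " ms");
        }
    }
}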

garwer

Some questions about Java biased locking and lock inflation

When I first studied this I felt I sort of understood it, but the more I thought about it the more confused I got, so clearly my understanding isn't solid. A few questions:

Scenario 1: only one thread is in the critical section, so the object holds a biased lock; when a new thread enters the critical section, the lock spins and is upgraded to a lightweight lock.
① Is the upgrade to a lightweight lock carried out by the thread that holds the lock, or by the thread that is spinning?
② When the lock is upgraded to a lightweight lock, the object header changes; does the object header seen by the lock-holding thread change as well? [Is the object header of the same object consistent across threads?]

Scenario 2: the lock inflates into a heavyweight lock.
① When the thread occupying the resource releases the lock, what happens to the object header?
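
The exact transitions are HotSpot internals and depend on flags (biased locking is controlled by -XX:+/-UseBiasedLocking and is disabled by default in recent JDKs), and the mark word cannot be read from plain Java; a tool such as JOL is needed to inspect object headers. Still, here is a minimal Java sketch of the access pattern from scenario 1: one thread uses the monitor alone first, then a second thread contends for it, which is what triggers bias revocation and, under sustained contention, inflation to a heavyweight monitor:

public class LockEscalationSketch {
    private static final Object lock = new Object();
    private static long counter = 0;

    public static void main(String[] args) throws InterruptedException {
        // Phase 1: a single thread enters the monitor repeatedly; with biased
        // locking enabled, the lock can stay biased toward this thread.
        for (int i = 0; i < 1_000_000; i++) {
            synchronized (lock) { counter++; }
        }

        // Phase 2: a second thread starts using the same monitor, so the two
        // threads now contend for it.
        Thread other = new Thread(() -> {
            for (int i = 0; i < 1_000_000; i++) {
                synchronized (lock) { counter++; }
            }
        });
        other.start();
        for (int i = 0; i < 1_000_000; i++) {
            synchronized (lock) { counter++; }
        }
        other.join();
        System.out.println("counter = " + counter);
    }
}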

李博 bluemind

How do I tune the JVM?

How do I tune the JVM?

李博 bluemind

Does the JVM still have a permanent generation (PermGen)?

Does the JVM still have a permanent generation (PermGen)?
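
On HotSpot, the permanent generation was removed in JDK 8 and replaced by Metaspace (class metadata kept in native memory, capped with -XX:MaxMetaspaceSize). A minimal Java sketch that lists the memory pools of the running JVM, so you can see which ones your runtime actually has:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class MemoryPools {
    public static void main(String[] args) {
        // On JDK 8+ this typically lists "Metaspace" (and "Compressed Class Space");
        // "Perm Gen" pools only appear on JDK 7 and earlier.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            System.out.println(pool.getType() + "\t" + pool.getName());
        }
    }
}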

Why does a Flink 1.6.2 job stop having data flow into the flatMap stage after running stably for a while?

Flink processes roughly 20 records per second, each record up to 50 MB. After running stably for a while, no data flows into the flatMap stage any more; there are no errors and the job does not die. The pipeline is Source (Kafka) --> flatMap --> reduce (aggregation) --> sink (write to Redis). Kafka is fine and Redis is fine, but the flatMap stage simply stops working, and both the JVM heap and the TaskManager heap look normal. Any advice would be appreciated.

李博 bluemind

How is JVM heap memory reclaimed (the garbage collection mechanism)?

How is JVM heap memory reclaimed (the garbage collection mechanism)?

李博 bluemind

Java and C++ are both high-level programming languages. C++ is compiled directly to machine code and executed, whereas Java is compiled to class-file bytecode, which is loaded into the JVM and only then translated to machine code under the JVM's management. Why does Java take this approach, and what considerations is it based on?

Java and C++ are both high-level programming languages. C++ is compiled directly to machine code and executed, whereas Java is compiled to class-file bytecode, which is loaded into the JVM and only then translated to machine code under the JVM's management. Why does Java take this approach, and what considerations is it based on?

李博 bluemind

How does the JVM work internally?

How does the JVM work internally?

李博 bluemind

How do I configure the JVM's maximum heap size?

How do I configure the JVM's maximum heap size?
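
The maximum heap is normally set when the JVM is launched, with -Xmx (for example java -Xmx4g MyApp), optionally together with -Xms for the initial size; a minimal Java sketch for verifying what limit the running JVM actually ended up with:

public class MaxHeap {
    public static void main(String[] args) {
        // Run with e.g.:  java -Xmx4g MaxHeap
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.printf("max heap ~ %,d MB%n", maxBytes / (1024 * 1024));
    }
}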

李博 bluemind

The JVM memory model

The JVM memory model

社区小助手

Is there a way to understand how Spark builds its classpath, and in what order classes are loaded?

I'm trying to run a custom Spark job on EMR and to use a custom jar on the driver's extra classpath: spark.driver.extraClassPath /usr/lib/hadoop/lib/hadoop-lzo.jar:/usr/local/java/avro-1.8.2.jar:/usr/local/java/avro-mapred-1.8.2-hadoop2.jar. Somehow it still loads the default Avro jar (the old 1.7.4), which I found through the verbose class-loading option: [Loaded org.apache.avro.generic.GenericContainer from file:/usr/lib/hadoop/lib/avro-1.7.4.jar]. I'd like to understand the order and precedence of classpath loading: why does it still pick the old Hadoop-bundled Avro 1.7.4 instead of the one I want to use? Is there a way to see the exact classpath order used for a spark-submit run? Any JVM options etc. would help. Put simply: what is the classpath order (what comes first: my custom jars, the Spark jars, or the Hadoop jars)?
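
Two knobs that are often useful here (a sketch, not a guaranteed fix): the JVM option -verbose:class, which prints exactly the [Loaded ... from ...] lines above and therefore shows which jar wins for every class, and Spark's experimental spark.driver.userClassPathFirst / spark.executor.userClassPathFirst settings, which put user jars ahead of Spark's and Hadoop's bundled jars. A minimal Java sketch of that configuration; the same keys can also be passed as --conf options to spark-submit:

import org.apache.spark.SparkConf;

public class ClassPathDebugConf {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                // Prefer user-supplied jars over the cluster's bundled ones (experimental setting).
                .set("spark.driver.userClassPathFirst", "true")
                .set("spark.executor.userClassPathFirst", "true")
                // Log every class load on the executors together with the jar it came from.
                .set("spark.executor.extraJavaOptions", "-verbose:class");
        System.out.println(conf.toDebugString());
    }
}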

社区小助手

AttributeError: 'NoneType' object has no attribute '_jvm' - PySpark UDF

I have data on magazine subscriptions and when they were created, plus a column holding an array of all the subscription expiration dates associated with a given user:

user_id  created_date  expiration_dates_for_user
202394   '2018-05-04'  ['2019-1-03', '2018-10-06', '2018-07-05']
202394   '2017-01-04'  ['2019-1-03', '2018-10-06', '2018-07-05']
202394   '2016-05-04'  ['2019-1-03', '2018-10-06', '2018-07-05']

I'm trying to create a new column that is an array of all expiration dates within 45 days of created_date, like this:

user_id  created_date  expiration_dates_for_user                  near_expiration_dates
202394   '2018-05-04'  ['2019-1-03', '2018-10-06', '2020-07-05']  []
202394   '2019-01-04'  ['2019-1-03', '2018-10-06', '2020-07-05']  ['2019-1-03']
202394   '2016-05-04'  ['2019-1-03', '2018-10-06', '2020-07-05']  []

Here is the code I'm using:

def check_if_sub_connected(created_at, expiration_array):
    if not expiration_array:
        return []
    if created_at == None:
        return []
    else:
        close_to_array = []
        for i in expiration_array:
            if datediff(created_at, i) < 45:
                if created_at != i:
                    if datediff(created_at, i) > -45:
                        close_to_array.append(i)
        return close_to_array

check_if_sub_connected = udf(check_if_sub_connected, ArrayType(TimestampType()))

But when I apply the function to create a column...

df = df.withColumn('near_expiration-dates', check_if_sub_connected(df.created_date, df.expiration_dates_for_user)

...I get this crazy error:

AttributeError: 'NoneType' object has no attribute '_jvm' at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:317)at org.apache.spark.sql.execution.python.PythonUDFRunner $$ anon$1.read(PythonUDFRunner.scala:83) at org.apache.spark.sql.execution.python.PythonUDFRunner $$ anon$1.read(PythonUDFRunner.scala:66)at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:271)at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)at scala.collection.Iterator $$ anon$12.hasNext(Iterator.scala:439) at scala.collection.Iterator $$ anon$11.hasNext(Iterator.scala:408)at scala.collection.Iterator $$ anon$11.hasNext(Iterator.scala:408) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage17.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec $$ anonfun$10 $$ anon$1.hasNext(WholeStageCodegenExec.scala:620) at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:49) at org.apache.spark.sql.execution.collect.Collector $$ anonfun$2.apply(Collector.scala:126)at org.apache.spark.sql.execution.collect.Collector $$ anonfun$2.apply(Collector.scala:125) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:112) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:384) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler $$ failJobAndIndependentStages(DAGScheduler.scala:1747)at org.apache.spark.scheduler.DAGScheduler $$ anonfun$abortStage$1.apply(DAGScheduler.scala:1735) at org.apache.spark.scheduler.DAGScheduler $$ anonfun$abortStage$1.apply(DAGScheduler.scala:1734)at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1734)at org.apache.spark.scheduler.DAGScheduler $$
anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:962) at org.apache.spark.scheduler.DAGScheduler $$ anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:962)at scala.Option.foreach(Option.scala:257)at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:962)at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1970)at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1918)at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1906)at org.apache.spark.util.EventLoop $$ anon$1.run(EventLoop.scala:49) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:759) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2141) at org.apache.spark.sql.execution.collect.Collector.runSparkJobs(Collector.scala:237) at org.apache.spark.sql.execution.collect.Collector.collect(Collector.scala:247) at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:64) at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:70) at org.apache.spark.sql.execution.ResultCacheManager.getOrComputeResult(ResultCacheManager.scala:497) at org.apache.spark.sql.execution.CollectLimitExec.executeCollectResult(limit.scala:48) at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset $$ collectResult(Dataset.scala:2775)at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset $$ collectFromPlan(Dataset.scala:3350) at org.apache.spark.sql.Dataset $$ anonfun$head$1.apply(Dataset.scala:2504)at org.apache.spark.sql.Dataset $$ anonfun$head$1.apply(Dataset.scala:2504) at org.apache.spark.sql.Dataset $$ anonfun$53.apply(Dataset.scala:3334)at org.apache.spark.sql.execution.SQLExecution $$ anonfun$withCustomExecutionEnv$1.apply(SQLExecution.scala:89) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:175) at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:84) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:126) at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3333) at org.apache.spark.sql.Dataset.head(Dataset.scala:2504) at org.apache.spark.sql.Dataset.take(Dataset.scala:2718) at org.apache.spark.sql.Dataset.showString(Dataset.scala:259) at sun.reflect.GeneratedMethodAccessor472.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380) at py4j.Gateway.invoke(Gateway.java:295) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:251) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last): File "/databricks/spark/python/pyspark/worker.py", line 262, in main process() File "/databricks/spark/python/pyspark/worker.py", line 257, in process serializer.dump_stream(func(split_index, iterator), outfile) File "/databricks/spark/python/pyspark/worker.py", line 183, in func = lambda _, it: map(mapper, it) File "", line 1, in File "/databricks/spark/python/pyspark/worker.py", line 77, in return lambda *a: toInternal(f(*a)) File "/databricks/spark/python/pyspark/util.py", line 55, in wrapper 
return f(*args, **kwargs) File "", line 9, in check_if_sub_connected File "/databricks/spark/python/pyspark/sql/functions.py", line 1045, in datediff return Column(sc._jvm.functions.datediff(_to_java_column(end), _to_java_column(start))) AttributeError: 'NoneType' object has no attribute '_jvm' at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:317) at org.apache.spark.sql.execution.python.PythonUDFRunner $$ anon$1.read(PythonUDFRunner.scala:83)at org.apache.spark.sql.execution.python.PythonUDFRunner $$ anon$1.read(PythonUDFRunner.scala:66) at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:271) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) at scala.collection.Iterator $$ anon$12.hasNext(Iterator.scala:439)at scala.collection.Iterator $$ anon$11.hasNext(Iterator.scala:408) at scala.collection.Iterator $$ anon$11.hasNext(Iterator.scala:408)at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage17.processNext(Unknown Source)at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)at org.apache.spark.sql.execution.WholeStageCodegenExec $$ anonfun$10 $$ anon$1.hasNext(WholeStageCodegenExec.scala:620)at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:49)at org.apache.spark.sql.execution.collect.Collector $$ anonfun$2.apply(Collector.scala:126) at org.apache.spark.sql.execution.collect.Collector $$ anonfun$2.apply(Collector.scala:125)at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)at org.apache.spark.scheduler.Task.run(Task.scala:112)at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:384)at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)... 1 more

Is the datediff function not allowed inside a UDF? Or is this some kind of import error? I'm running Spark on Databricks with the latest version.

社区小助手

JVM - why does the YoungGen used heap decrease between GCs?

Below are GCViewer charts for an Apache Spark executor (panels: old gen used heap, young gen used heap, GC times, and the phenomenon in question). I'm trying to understand the slope in (4): why does a GC start before the whole young gen heap has been used (which is what happened in the earlier GC phases)? And why does usage decrease monotonically for about 5 minutes before recovering? I thought this could happen if a very large object were allocated (for example, read from an IO socket), but that is probably wrong, because the old gen does not change afterwards. I don't particularly care about this specific example; I'm just trying to learn more about JVM memory management.