Environment: Sqoop 1 installed and tested against MySQL/PostgreSQL
Parameter reference:
--connect: JDBC connection URL
--username: database user name
--password: database password
--table: table to import
-m: number of parallel map tasks
Goal: import the user_info table into the HDFS cluster using only one map task.
Note: this Sqoop command specifies no HDFS directory, so by default the data lands under /user/{user.name}/{table name given by --table}.
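To avoid relying on the /user/{user.name}/{table} default, the destination can be set explicitly with --target-dir. A minimal sketch, reusing the host, database, and table from the logs below (the /sqoop1 path is an assumption; adjust to your cluster), and using -P instead of --password as the log warnings suggest:

```shell
# Sketch: same import, but with an explicit HDFS target directory.
CONNECT="jdbc:postgresql://172.16.101.54:5432/denali"
TABLE="place_openhoursv2_temp"
TARGET="/sqoop1/${TABLE}"   # hypothetical path; pick your own

CMD="sqoop import --connect ${CONNECT} --username denaliadmin -P \
  --table ${TABLE} --target-dir ${TARGET} -m 5"

# Print the composed command instead of running it here.
echo "${CMD}"
```

If --target-dir already exists, the import fails; add --delete-target-dir to overwrite, or choose a fresh directory per run.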
Problem 1: org.postgresql.util.PSQLException: ERROR: relation place_openhoursv2_temp does not exist (the relation cannot be found, i.e. the table is not visible in any schema on this user's search_path)
[root@sht-sgmhadoopnn-01 bin]# sqoop import --connect jdbc:postgresql://172.16.101.54:5432/denali --username denaliadmin --password rdyhdenali --table place_openhoursv2_temp -m 5
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /hadoop/zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
16/07/30 10:51:57 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/07/30 10:51:57 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/07/30 10:51:57 INFO manager.SqlManager: Using default fetchSize of 1000
16/07/30 10:51:57 INFO tool.CodeGenTool: Beginning code generation
16/07/30 10:51:58 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM place_openhoursv2_temp AS t LIMIT 1
16/07/30 10:51:58 ERROR manager.SqlManager: Error executing statement: org.postgresql.util.PSQLException: ERROR: relation place_openhoursv2_temp does not exist
Position: 17
org.postgresql.util.PSQLException: ERROR: relation place_openhoursv2 does not exist
Position: 17
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2310)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2023)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:217)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:421)
at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:166)
at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:118)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:758)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:767)
at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:270)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:241)
at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:227)
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:295)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1833)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1645)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
16/07/30 10:51:58 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: No columns to generate for ClassWriter
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1651)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
[root@sht-sgmhadoopnn-01 bin]#
Solution: change the user's default schema (search_path). Note that ALTER ROLE ... SET search_path only takes effect for new sessions, which is why the current session below still shows the old value; after reconnecting, the new search_path is in effect.
[postgres@sht-sgmhadoopcm-01 bin]$ ./psql -U denaliadmin denali
psql (9.5.1)
Type "help" for help.
denali=# show search_path;
search_path
-----------------
"$user", public
(1 row)
denali=# alter role denaliadmin set search_path to factual_search_na_16q2_20160722_epl,public;
ALTER ROLE
denali=# show search_path;
search_path
-----------------
"$user", public
(1 row)
denali=# \q
[postgres@sht-sgmhadoopcm-01 bin]$ ./psql -U denaliadmin denali
psql (9.5.1)
Type "help" for help.
denali=# show search_path;
search_path
---------------------------------------------
factual_search_na_16q2_20160722_epl, public
(1 row)
denali=#
Error 2: org.postgresql.util.PSQLException: ERROR: relation "factual_search_na_16q2_20160722_epl.place_openhoursv2_temp" does not exist
### --table factual_search_na_16q2_20160722_epl.place_openhoursv2_temp is wrong: --table accepts only a bare table name, not a schema-qualified one (Sqoop quotes the whole string as a single identifier, as the generated SQL below shows). The schema prefix is unnecessary anyway, since search_path already makes factual_search_na_16q2_20160722_epl this user's default schema.
[root@sht-sgmhadoopnn-01 hadoop]# sqoop import --connect jdbc:postgresql://172.16.101.54:5432/denali --username denaliadmin --password rdyhdenali --table factual_search_na_16q2_20160722_epl.place_openhoursv2_temp -m 5
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /hadoop/zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
16/07/30 12:39:50 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/07/30 12:39:50 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/07/30 12:39:50 INFO manager.SqlManager: Using default fetchSize of 1000
16/07/30 12:39:50 INFO tool.CodeGenTool: Beginning code generation
16/07/30 12:39:50 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "factual_search_na_16q2_20160722_epl.place_openhoursv2_temp" AS t LIMIT 1
16/07/30 12:39:50 ERROR manager.SqlManager: Error executing statement: org.postgresql.util.PSQLException: ERROR: relation "factual_search_na_16q2_20160722_epl.place_openhoursv2_temp" does not exist
Position: 17
org.postgresql.util.PSQLException: ERROR: relation "factual_search_na_16q2_20160722_epl.place_openhoursv2_temp" does not exist
Position: 17
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2310)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2023)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:217)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:421)
at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:166)
at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:118)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:758)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:767)
at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:270)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:241)
at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:227)
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:295)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1833)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1645)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
16/07/30 12:39:50 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: No columns to generate for ClassWriter
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1651)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
[root@sht-sgmhadoopnn-01 hadoop]#
Solution: pass --table with the bare table name, without the schema prefix.
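An alternative worth noting: Sqoop's PostgreSQL connector accepts extra connector-specific arguments after a bare `--`, including a `--schema` argument, which selects the schema without touching the role's search_path. Verify this against your Sqoop version's documentation; a sketch:

```shell
# Sketch: select the schema via the connector's extra arguments instead of
# qualifying --table or altering search_path. Check that your Sqoop build's
# PostgreSQL connector supports the "--schema" extra argument.
SCHEMA="factual_search_na_16q2_20160722_epl"

CMD="sqoop import --connect jdbc:postgresql://172.16.101.54:5432/denali \
  --username denaliadmin -P --table place_openhoursv2_temp -m 5 \
  -- --schema ${SCHEMA}"

# Print the composed command instead of running it here.
echo "${CMD}"
```

Everything before the bare `--` is parsed by Sqoop itself; everything after it is handed to the connector, which is why `--schema` must come last.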
Error 3: orm.ClassWriter: No Java type for SQL type 1111 for column new_value (SQL type 1111 is java.sql.Types.OTHER, which the PostgreSQL JDBC driver reports for the json type; Sqoop cannot map it to a Java type)
[root@sht-sgmhadoopnn-01 hadoop]# sqoop import --connect jdbc:postgresql://172.16.101.54:5432/denali --username denaliadmin --password rdyhdenali --table place_openhoursv2_temp -m 5
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /hadoop/zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
16/07/30 12:39:04 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/07/30 12:39:04 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/07/30 12:39:04 INFO manager.SqlManager: Using default fetchSize of 1000
16/07/30 12:39:04 INFO tool.CodeGenTool: Beginning code generation
16/07/30 12:39:05 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "place_openhoursv2_temp" AS t LIMIT 1
16/07/30 12:39:05 ERROR orm.ClassWriter: Cannot resolve SQL type 1111
16/07/30 12:39:05 ERROR orm.ClassWriter: Cannot resolve SQL type 1111
16/07/30 12:39:05 ERROR orm.ClassWriter: Cannot resolve SQL type 1111
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column new_value
16/07/30 12:39:05 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.sqoop.orm.ClassWriter.parseNullVal(ClassWriter.java:1377)
at org.apache.sqoop.orm.ClassWriter.parseColumn(ClassWriter.java:1402)
at org.apache.sqoop.orm.ClassWriter.myGenerateParser(ClassWriter.java:1528)
at org.apache.sqoop.orm.ClassWriter.generateParser(ClassWriter.java:1491)
at org.apache.sqoop.orm.ClassWriter.generateClassForColumns(ClassWriter.java:1920)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1736)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
[root@sht-sgmhadoopnn-01 hadoop]#
Solution: switch from --table to a --query import and cast the new_value column from json to text in the SQL (new_value::text), so the driver reports a type Sqoop can map.
[root@sht-sgmhadoopnn-01 hadoop]# sqoop import --connect jdbc:postgresql://172.16.101.54:5432/denali --username denaliadmin --password rdyhdenali \
> --query 'select place_id,metadata_dictionary_id,"value",lang,source,create_time,update_time,new_value::text,new_value_text,formatted_value,trim_formatted_value from factual_search_na_16q2_20160722_epl.place_openhoursv2_temp where $CONDITIONS ' \
> --split-by place_id \
> --target-dir /sqoop1/test7 \
> -m 3 \
> --null-string '' --null-non-string ''
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /hadoop/zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
16/07/30 12:36:40 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/07/30 12:36:40 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/07/30 12:36:40 INFO manager.SqlManager: Using default fetchSize of 1000
16/07/30 12:36:40 INFO tool.CodeGenTool: Beginning code generation
16/07/30 12:36:40 INFO manager.SqlManager: Executing SQL statement: select place_id,metadata_dictionary_id,"value",lang,source,create_time,update_time,new_value::text,new_value_text,formatted_value,trim_formatted_value from factual_search_na_16q2_20160722_epl.place_openhoursv2_temp where (1 = 0)
16/07/30 12:36:40 INFO manager.SqlManager: Executing SQL statement: select place_id,metadata_dictionary_id,"value",lang,source,create_time,update_time,new_value::text,new_value_text,formatted_value,trim_formatted_value from factual_search_na_16q2_20160722_epl.place_openhoursv2_temp where (1 = 0)
16/07/30 12:36:40 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /hadoop/hadoop/share/hadoop/mapreduce
Note: /tmp/sqoop-root/compile/b0a9ae11fdaa3e0b3754c0c70f8113f8/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/07/30 12:36:44 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/b0a9ae11fdaa3e0b3754c0c70f8113f8/QueryResult.jar
16/07/30 12:36:44 INFO mapreduce.ImportJobBase: Beginning query import.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/07/30 12:36:45 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
16/07/30 12:36:46 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
16/07/30 12:36:56 INFO db.DBInputFormat: Using read commited transaction isolation
16/07/30 12:36:56 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(place_id), MAX(place_id) FROM (select place_id,metadata_dictionary_id,"value",lang,source,create_time,update_time,new_value::text,new_value_text,formatted_value,trim_formatted_value from factual_search_na_16q2_20160722_epl.place_openhoursv2_temp where (1 = 1) ) AS t1
16/07/30 12:36:56 INFO mapreduce.JobSubmitter: number of splits:3
16/07/30 12:36:57 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1469795998430_0014
16/07/30 12:36:57 INFO impl.YarnClientImpl: Submitted application application_1469795998430_0014
16/07/30 12:36:58 INFO mapreduce.Job: The url to track the job: http://sht-sgmhadoopnn-01:8088/proxy/application_1469795998430_0014/
16/07/30 12:36:58 INFO mapreduce.Job: Running job: job_1469795998430_0014
16/07/30 12:37:09 INFO mapreduce.Job: Job job_1469795998430_0014 running in uber mode : false
16/07/30 12:37:09 INFO mapreduce.Job: map 0% reduce 0%
16/07/30 12:37:17 INFO mapreduce.Job: map 33% reduce 0%
16/07/30 12:37:18 INFO mapreduce.Job: map 67% reduce 0%
16/07/30 12:37:19 INFO mapreduce.Job: map 100% reduce 0%
16/07/30 12:37:20 INFO mapreduce.Job: Job job_1469795998430_0014 completed successfully
16/07/30 12:37:20 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=426603
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=373
HDFS: Number of bytes written=184494
HDFS: Number of read operations=12
HDFS: Number of large read operations=0
HDFS: Number of write operations=6
Job Counters
Launched map tasks=3
Other local map tasks=3
Total time spent by all maps in occupied slots (ms)=20412
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=20412
Total vcore-seconds taken by all map tasks=20412
Total megabyte-seconds taken by all map tasks=20901888
Map-Reduce Framework
Map input records=1000
Map output records=1000
Input split bytes=373
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=153
CPU time spent (ms)=3990
Physical memory (bytes) snapshot=525692928
Virtual memory (bytes) snapshot=2674200576
Total committed heap usage (bytes)=316145664
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=184494
16/07/30 12:37:20 INFO mapreduce.ImportJobBase: Transferred 180.1699 KB in 34.3762 seconds (5.2411 KB/sec)
16/07/30 12:37:20 INFO mapreduce.ImportJobBase: Retrieved 1000 records.
[root@sht-sgmhadoopnn-01 hadoop]# hadoop fs -ls /sqoop1/test7
Found 4 items
-rw-r--r-- 3 root root 0 2016-07-30 12:37 /sqoop1/test7/_SUCCESS
-rw-r--r-- 3 root root 184208 2016-07-30 12:37 /sqoop1/test7/part-m-00000
-rw-r--r-- 3 root root 143 2016-07-30 12:37 /sqoop1/test7/part-m-00001
-rw-r--r-- 3 root root 143 2016-07-30 12:37 /sqoop1/test7/part-m-00002
[root@sht-sgmhadoopnn-01 hadoop]#