What is the importance of the RecordReader, and what types of .XML files are used in Hadoop?
Mappers and Reducers work only with key-value pairs, but the input data can come in different file formats. The four basic input formats are:
- Text Input Format
- Key Value Text Input Format
- Sequence File Input Format
- Sequence File As Text Input Format
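As a rough illustration of what "converting to key-value pairs" means for the two text-based formats, here is a plain-Java sketch with no Hadoop dependency. It mimics TextInputFormat, which yields (byte offset, line) pairs, and KeyValueTextInputFormat, which splits each line at the first separator (a tab by default) and uses the whole line as the key, with an empty value, when no separator is present. The class and method names are hypothetical, only '\n' line endings are handled, and the two SequenceFile formats (which read binary key-value files) are not covered.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class InputFormatSketch {

    // TextInputFormat-style records: key = byte offset of the line start, value = line text.
    // Hypothetical helper; splits on '\n' only.
    static List<Map.Entry<Long, String>> textPairs(String data) {
        List<Map.Entry<Long, String>> pairs = new ArrayList<>();
        long byteOffset = 0;
        int start = 0;
        while (start < data.length()) {
            int end = data.indexOf('\n', start);
            if (end < 0) {
                end = data.length(); // last line without a trailing newline
            }
            String line = data.substring(start, end);
            pairs.add(new SimpleEntry<Long, String>(byteOffset, line));
            byteOffset += line.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for '\n'
            start = end + 1;
        }
        return pairs;
    }

    // KeyValueTextInputFormat-style records: split each line at the first separator.
    static List<Map.Entry<String, String>> keyValuePairs(String data, char separator) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (Map.Entry<Long, String> record : textPairs(data)) {
            String line = record.getValue();
            int pos = line.indexOf(separator);
            if (pos < 0) {
                pairs.add(new SimpleEntry<>(line, "")); // no separator: whole line is the key
            } else {
                pairs.add(new SimpleEntry<>(line.substring(0, pos), line.substring(pos + 1)));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        String data = "apple\t3\nbanana\t5\n";
        System.out.println(textPairs(data));           // keys are byte offsets 0 and 8
        System.out.println(keyValuePairs(data, '\t')); // [apple=3, banana=5]
    }
}
```

The same input line produces a different key depending on the format chosen, which is why the Mapper's input key type must match the configured input format.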
The RecordReader converts input in any of these formats into key-value pairs. It sits between the input splits and the Mapper: it reads the input split assigned to its map task (not a whole file), turns the raw data into key-value pairs, and passes those pairs to the Mapper as its input.
The RecordReader reads its input split record by record, and each resulting key-value pair is handed to the Mapper, which processes one pair per call to map(). With the default text format, one record is one line; to treat several lines as a single record, you can write your own RecordReader.
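The pull-style loop described above, together with a custom reader that groups several lines into one record, can be sketched in plain Java. The SimpleRecordReader interface, MultiLineReader class and map() stand-in are hypothetical; they only mirror the shape of Hadoop's RecordReader (nextKeyValue(), getCurrentKey(), getCurrentValue()), which in the real API also has initialize(), getProgress() and close().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RecordReaderSketch {

    // Hypothetical, simplified reader contract modeled on the shape of Hadoop's RecordReader.
    interface SimpleRecordReader<K, V> {
        boolean nextKeyValue();   // advance to the next record; false when the split is exhausted
        K getCurrentKey();
        V getCurrentValue();
    }

    // Hypothetical custom reader: groups every n lines of a split into one record.
    // Key = index of the record's first line, value = those lines joined by '\n'.
    static class MultiLineReader implements SimpleRecordReader<Integer, String> {
        private final List<String> lines;
        private final int n;
        private int next = 0;
        private Integer key;
        private String value;

        MultiLineReader(List<String> lines, int n) {
            this.lines = lines;
            this.n = n;
        }

        @Override
        public boolean nextKeyValue() {
            if (next >= lines.size()) {
                return false;
            }
            int end = Math.min(next + n, lines.size());
            key = next;
            value = String.join("\n", lines.subList(next, end));
            next = end;
            return true;
        }

        @Override
        public Integer getCurrentKey() { return key; }

        @Override
        public String getCurrentValue() { return value; }
    }

    // Mimics the framework loop: pull one pair at a time and call map() once per pair.
    static List<String> runMapper(SimpleRecordReader<Integer, String> reader) {
        List<String> mapCalls = new ArrayList<>();
        while (reader.nextKeyValue()) {
            mapCalls.add(map(reader.getCurrentKey(), reader.getCurrentValue()));
        }
        return mapCalls;
    }

    // Stand-in for a user's map(): just records which pair it received.
    static String map(Integer key, String value) {
        return key + ":" + value.replace("\n", "|");
    }

    public static void main(String[] args) {
        List<String> split = Arrays.asList("l1", "l2", "l3", "l4", "l5");
        System.out.println(runMapper(new MultiLineReader(split, 2)));
        // [0:l1|l2, 2:l3|l4, 4:l5]
    }
}
```

Because the Mapper only ever sees what the reader emits, changing the record boundary (here, two lines per record) requires no change to the map logic itself.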
What are the .XML files used in Hadoop?
Hadoop uses three main XML configuration files, and some services will not start unless these files are set up.
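1) core-site.xml – Holds core settings shared across Hadoop, most importantly the default filesystem URI (the NameNode address), for example:
<property><name>fs.defaultFS</name><value>hdfs://localhost:8020</value></property>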
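2) mapred-site.xml – Holds MapReduce job settings, for example the JobTracker address (mapred.job.tracker) in Hadoop 1.x or the execution framework (mapreduce.framework.name) in Hadoop 2 and later.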
3) hdfs-site.xml – Holds HDFS settings, such as the block replication factor (dfs.replication) and the local directories where the NameNode and DataNodes store their data.
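All three files share the same <configuration>/<property>/<name>/<value> layout. The sketch below uses only the JDK's built-in DOM parser (not Hadoop's Configuration class) to read such a file into a map, to make that structure concrete. The class name is hypothetical, and the dfs.replication value of 1 is an illustrative single-node setting (the HDFS default is 3).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class SiteXmlSketch {

    // Reads <property><name>..</name><value>..</value></property> entries into an ordered map.
    // Plain JDK DOM parsing; surrounding whitespace in names and values is trimmed.
    static Map<String, String> parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList properties = doc.getElementsByTagName("property");
            for (int i = 0; i < properties.getLength(); i++) {
                Element property = (Element) properties.item(i);
                String name = property.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = property.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a valid site XML document", e);
        }
    }

    public static void main(String[] args) {
        String coreSite = "<configuration>"
                + "<property><name>fs.defaultFS</name><value>hdfs://localhost:8020</value></property>"
                + "</configuration>";
        String hdfsSite = "<configuration>"
                + "<property><name>dfs.replication</name><value>1</value></property>"
                + "</configuration>";
        System.out.println(parse(coreSite)); // {fs.defaultFS=hdfs://localhost:8020}
        System.out.println(parse(hdfsSite)); // {dfs.replication=1}
    }
}
```

In a real cluster these files live in Hadoop's configuration directory and are loaded by Hadoop itself; the sketch only shows the shape of their contents.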