Monday, October 3, 2022

Big Data and Hadoop Interview Preparations

 

Big Data and Hadoop Interview Questions

1. What are the differences between regular FileSystem and HDFS?

  1. Regular FileSystem: Data is maintained on a single machine. If that machine crashes, recovering the data is difficult because there is little fault tolerance, and seek time is higher, so processing the data takes longer.
  2. HDFS: Data is distributed and maintained across multiple machines. If a DataNode crashes, the data can still be recovered from the other nodes in the cluster. Reading data can take comparatively longer, because blocks are read from the local disks of several machines and coordinated across the cluster.

2. Why is HDFS fault-tolerant?

HDFS is fault-tolerant because it replicates data across different DataNodes. By default, each block of data is replicated on three DataNodes, and the copies are stored on separate nodes. If one node crashes, the data can still be retrieved from the other DataNodes that hold its replicas.
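As a minimal sketch, the default replication factor for new files is controlled by the dfs.replication property in hdfs-site.xml (3 is the Hadoop default; the exact location of the file depends on your installation):

    <!-- hdfs-site.xml: default number of replicas for each block -->
    <property>
      <name>dfs.replication</name>
      <value>3</value>
    </property>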

3. What are the two types of metadata that a NameNode server holds?

The two types of metadata that a NameNode server holds are:

  • Metadata on disk — this contains the edit log and the FsImage (the persistent checkpoint of the file system namespace)
  • Metadata in RAM — this contains the information about DataNodes, i.e. the in-memory mapping of blocks to the DataNodes that hold them
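As an illustration, the on-disk FsImage can be inspected with the Offline Image Viewer; the fsimage file name below is hypothetical (real names carry a transaction ID), and the output path is just an example:

    # Dump a copy of the FsImage to XML so the namespace can be inspected
    hdfs oiv -p XML -i fsimage_0000000000000042000 -o /tmp/fsimage.xml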

4. If you have an input file of 350 MB, how many input splits would HDFS create and what would be the size of each input split?

By default, the HDFS block size is 128 MB, and every block except the last one in a file is a full 128 MB. For an input file of 350 MB, there are three input splits in total, of 128 MB, 128 MB, and 94 MB (128 + 128 + 94 = 350).

5. How does rack awareness work in HDFS?

HDFS rack awareness is the NameNode's knowledge of which rack each DataNode belongs to in a Hadoop cluster. The NameNode uses this information when placing block replicas, so that copies of a block end up on more than one rack: the data then survives the failure of an entire rack, while cross-rack network traffic is kept in check.
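A common way to supply this mapping is a topology script configured in core-site.xml; a minimal sketch, where the script path and rack names are only illustrative:

    <!-- core-site.xml: script that maps a host or IP to a rack ID such as /dc1/rack1 -->
    <property>
      <name>net.topology.script.file.name</name>
      <value>/etc/hadoop/conf/topology.sh</value>
    </property>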

6. How can you restart NameNode and all the daemons in Hadoop?

The following commands will help you restart NameNode and all the daemons:
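A sketch, assuming the standard scripts under $HADOOP_HOME/sbin (in Hadoop 3, hdfs --daemon stop/start namenode is the preferred equivalent of hadoop-daemon.sh):

    # Restart only the NameNode daemon
    ./sbin/hadoop-daemon.sh stop namenode
    ./sbin/hadoop-daemon.sh start namenode

    # Restart all the daemons (HDFS and YARN)
    ./sbin/stop-all.sh
    ./sbin/start-all.sh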

7. Which command will help you find the status of blocks and FileSystem health?

To check the status of the blocks and the overall FileSystem health, use the hdfs fsck command:
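For example (the path / is just illustrative; point fsck at whichever directory you want to check):

    # Report the overall health of the FileSystem
    hdfs fsck /

    # List the files, blocks, and block locations under a path
    hdfs fsck / -files -blocks -locations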

8. What would happen if you store too many small files in a cluster on HDFS?

Storing a large number of small files on HDFS generates a large amount of metadata on the NameNode. Keeping all of this metadata in RAM becomes a challenge, because every file, block, and directory takes roughly 150 bytes of memory, so the cumulative size of the metadata grows very large.
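As a rough, illustrative calculation (assuming the commonly cited ~150 bytes per namespace object): 100 million small files, each small enough to fit in its own block, create about 200 million objects (one file object plus one block object per file), which is roughly 200,000,000 × 150 bytes ≈ 30 GB of NameNode heap spent on metadata alone.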

9. How do you copy data from the local system onto HDFS?

The following command will copy data from the local file system onto HDFS:
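For example (the local and HDFS paths below are only illustrative):

    # Copy a local file into HDFS
    hadoop fs -copyFromLocal /home/user/data.txt /user/hadoop/data.txt

    # Equivalent form using -put
    hdfs dfs -put /home/user/data.txt /user/hadoop/data.txt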

10. Is there any way to change the replication of files on HDFS after they are already written to HDFS?

Yes, there are two common ways to change the replication of files on HDFS:

  • Change the dfs.replication property in hdfs-site.xml: this changes the default replication factor, but only for files written after the change.
  • Use the setrep command: this changes the replication factor of files and directories that are already stored on HDFS (see the example below).
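A sketch of the setrep form (the path and the replication factor of 2 are illustrative; -w waits until re-replication completes):

    # Change the replication factor of an existing file to 2
    hadoop fs -setrep -w 2 /user/hadoop/data.txt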

