Computer-Science

Reading a non-text file

(various file formats : http://en.wikipedia.org/wiki/List_of_file_formats)

1. file type

2. binary file

3. little endian, big endian

4. WAV file format

A wav file contains sound data.

WAV File Specification

Source: https://ccrma.stanford.edu/courses/422-winter-2014/projects/WaveFormat/

The canonical WAVE format starts with the RIFF header:
(RIFF file types: WAV, AVI, RMI, ...)
    0         4   ChunkID          Contains the letters "RIFF" in ASCII form
                                   (0x52494646 big-endian form).
    4         4   ChunkSize        36 + SubChunk2Size, or more precisely:
                                   4 + (8 + SubChunk1Size) + (8 + SubChunk2Size)
                                   This is the size of the rest of the chunk
                                   following this number.  This is the size of the
                                   entire file in bytes minus 8 bytes for the
                                   two fields not included in this count:
                                   ChunkID and ChunkSize. Expressed in little-endian.
    8         4   Format           Contains the letters "WAVE"
                                   (0x57415645 big-endian form).
The "WAVE" format consists of two subchunks: "fmt " and "data":

    The "fmt " subchunk describes the sound data's format:
        12        4   Subchunk1ID      Contains the letters "fmt "
                                    (0x666d7420 big-endian form).
        16        4   Subchunk1Size    16 for PCM.  This is the size of the
                                    rest of the Subchunk which follows this number.
        20        2   AudioFormat      PCM = 1 (i.e. Linear quantization)
                                    Values other than 1 indicate some
                                    form of compression.
        22        2   NumChannels      Mono = 1, Stereo = 2, etc.
        24        4   SampleRate       8000, 22050, 44100, etc.
        28        4   ByteRate         == SampleRate * NumChannels * BitsPerSample/8
        32        2   BlockAlign       == NumChannels * BitsPerSample/8
                                    The number of bytes for one sample including
                                    all channels.
        34        2   BitsPerSample    8 bits = 8, 16 bits = 16, etc.

    The "data" subchunk contains the size of the data and the actual sound:
        36        4   Subchunk2ID      Contains the letters "data"
                                    (0x64617461 big-endian form).
        40        4   Subchunk2Size    == NumSamples * NumChannels * BitsPerSample/8
                                    This is the number of bytes in the data.
                                    You can also think of this as the size
                                    of the rest of the subchunk following this
                                    number.
        44        *   Data             The actual sound data.
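
The offsets in the table above can be turned into code fairly directly. The following is a minimal sketch, not the course's reference solution: `struct wav_header`, `le16`, `le32`, `tag` and `parse_wav_header` are names made up for this example, and it assumes the file uses the canonical 44-byte layout with no extra subchunks. It decodes every number byte by byte as little-endian, so it does not depend on the byte order of the machine it runs on.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical struct for this sketch: the fields of the canonical
   44-byte header, in file order. */
struct wav_header {
    char     chunk_id[5];      /* "RIFF" plus a terminating NUL */
    uint32_t chunk_size;
    char     format[5];        /* "WAVE" */
    char     subchunk1_id[5];  /* "fmt " */
    uint32_t subchunk1_size;
    uint16_t audio_format;
    uint16_t num_channels;
    uint32_t sample_rate;
    uint32_t byte_rate;
    uint16_t block_align;
    uint16_t bits_per_sample;
    char     subchunk2_id[5];  /* "data" */
    uint32_t subchunk2_size;
};

/* Decode a 2-byte little-endian number starting at p. */
static uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Decode a 4-byte little-endian number starting at p. */
static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Copy a 4-character tag and NUL-terminate it so it prints with %s. */
static void tag(char dst[5], const unsigned char *p)
{
    memcpy(dst, p, 4);
    dst[4] = '\0';
}

/* Parse the 44 header bytes in buf into *h.
   Returns 0 on success, -1 if the "RIFF" or "WAVE" tag is missing. */
int parse_wav_header(const unsigned char buf[44], struct wav_header *h)
{
    tag(h->chunk_id, buf + 0);
    h->chunk_size = le32(buf + 4);
    tag(h->format, buf + 8);
    tag(h->subchunk1_id, buf + 12);
    h->subchunk1_size = le32(buf + 16);
    h->audio_format = le16(buf + 20);
    h->num_channels = le16(buf + 22);
    h->sample_rate = le32(buf + 24);
    h->byte_rate = le32(buf + 28);
    h->block_align = le16(buf + 32);
    h->bits_per_sample = le16(buf + 34);
    tag(h->subchunk2_id, buf + 36);
    h->subchunk2_size = le32(buf + 40);
    if (strcmp(h->chunk_id, "RIFF") != 0 || strcmp(h->format, "WAVE") != 0)
        return -1;
    return 0;
}
```

A caller would read the first 44 bytes of the file into an `unsigned char` buffer and pass it to `parse_wav_header`. Real files may carry additional subchunks between "fmt " and "data", in which case the fixed offsets 36 and 40 no longer apply.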

5. Reading a non-text file

See here for the actual code.

char ChunkID[10];           // use char array for text data
int ChunkSize;              // use "int" for 4 byte data
char Format[10];
........
short AudioFormat;          // use "short" for 2 byte data
........

x=open("./f1.wav", ...........);
y=read(x, ChunkID, 4);      // read first 4 bytes into ChunkID[]
ChunkID[y]=0;               // to print as a string
y=read(x, &ChunkSize, 4);   // read next 4 bytes and store at address &ChunkSize
y=read(x, Format, 4);       // read "WAVE"
Format[y]=0;
.......
y=read(x, &AudioFormat, 2); // read next 2 bytes and store at address &AudioFormat
..........
printf("ChunkID:%s\n", ChunkID);
printf("ChunkSize:%d\n",ChunkSize);
printf("Format:%s\n",Format);
.......
printf("AudioFormat:%d\n", AudioFormat);
.......
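
The fragment above leaves out the open() flags and all error handling. One way to fill it in for the first three fields is sketched below. `read_full` and `read_riff_header` are hypothetical helper names, the file is opened with `O_RDONLY`, and, like the fragment, the sketch assumes a little-endian machine so that the 4 bytes of ChunkSize can be read straight into an integer.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Read exactly n bytes from fd into buf.
   Returns 0 on success, -1 on error or early end of file. */
static int read_full(int fd, void *buf, size_t n)
{
    char *p = buf;
    while (n > 0) {
        ssize_t got = read(fd, p, n);
        if (got <= 0)
            return -1;
        p += got;
        n -= (size_t)got;
    }
    return 0;
}

/* Hypothetical helper: read ChunkID, ChunkSize and Format from path.
   Assumes a little-endian host, as the fragment above does.
   Returns 0 on success, -1 on any failure. */
int read_riff_header(const char *path, char chunk_id[5],
                     int32_t *chunk_size, char format[5])
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    int ok = read_full(fd, chunk_id, 4) == 0 &&
             read_full(fd, chunk_size, 4) == 0 &&
             read_full(fd, format, 4) == 0;
    close(fd);
    if (!ok)
        return -1;
    chunk_id[4] = '\0';   /* terminate so the tags print with %s */
    format[4] = '\0';
    return 0;
}
```

Checking the return value of every read() matters here: a file shorter than 12 bytes would otherwise leave some of the variables uninitialized.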

6. lseek, fopen, fprintf, ...

int y;
y=lseek(x, 30, SEEK_SET);  // move file pointer to 30. return 30
y=lseek(x, 30, SEEK_CUR);  // move file pointer to current file pointer+30=60.
                           // return 60
y=lseek(x, 0, SEEK_END);   // move file pointer to the end of file. return this
                           // file pointer

FILE *f2;
char buf[100];
..........
f2=fopen("./yy","w");       // open ./yy for writing
fprintf(f2,"%s",buf);       // write the string in buf into f2
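
The fopen()/fprintf() pair above can be wrapped with basic error checks as follows. This is only a sketch: `write_report` and its parameters are made up for the example, and it writes just two of the header fields.

```c
#include <stdio.h>

/* Hypothetical helper: write two header fields to the text file at path.
   Returns 0 on success, -1 if the file cannot be opened or written. */
int write_report(const char *path, const char *chunk_id, int chunk_size)
{
    FILE *out = fopen(path, "w");   /* "w" creates or truncates the file */
    if (out == NULL)
        return -1;

    /* fprintf returns a negative value on output error. */
    int ok = fprintf(out, "ChunkID:%s\n", chunk_id) >= 0 &&
             fprintf(out, "ChunkSize:%d\n", chunk_size) >= 0;

    /* fclose flushes buffered output, so its result matters too. */
    if (fclose(out) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Compared with write(), fprintf() does the number-to-text conversion itself, which is why it is the more convenient choice for formatted output.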

7. Exercise

1) Read swvader03.wav with xxd. Interpret all fields in the header.

First copy swvader03.wav file from ../../linuxer1 directory into current directory.
"." means current directory.

$ cp  ../../linuxer1/swvader03.wav  .
(or cp ../../linuxer2/swvader03.wav .  in 165.246.38.152)

Look at the file with xxd.

$ xxd swvader03.wav > x
$ vi x

When interpreting the dump, keep in mind that numbers are stored little-endian and strings big-endian.
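
To make the byte order concrete: if the dump shows the four bytes 24 08 00 00 in a numeric field, the little-endian value is 0x00000824 = 2084, whereas the tag bytes 52 49 46 46 are simply "RIFF" read left to right. The sketch below shows both interpretations and a check of the machine's own byte order. The function names are made up for this example, and the byte values are illustrative, not taken from swvader03.wav.

```c
#include <stdint.h>
#include <string.h>

/* Return 1 if this machine stores the least significant byte first. */
int host_is_little_endian(void)
{
    uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* look at the byte at the lowest address */
    return first == 1;
}

/* Interpret 4 bytes, in file order, as a little-endian number
   (how the numeric header fields are stored). */
uint32_t from_le32(const unsigned char b[4])
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Interpret 4 bytes, in file order, as a big-endian number
   (the form in which the specification prints tags, e.g. 0x52494646). */
uint32_t from_be32(const unsigned char b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}
```

On a little-endian machine, reading the four file bytes directly into an `int` with read() gives the same value as `from_le32`, which is why the programs in this section can read numeric fields without any conversion.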

2) Write a program that reads swvader03.wav and displays the content as above.

        ..............
        char ChunkID[10]; // use char array for text data
        int ChunkSize; // use "int" for 4 byte data
        char Format[10];
        ........
        short AudioFormat;  // use "short" for 2 byte data
        ........
        x=open("./swvader03.wav", ...........);

        y=read(x, ChunkID, 4);  // read first 4 bytes into ChunkID[]
        ChunkID[y]=0;           // to print as a string
        y=read(x, &ChunkSize, 4); // read next 4 bytes and store at address &ChunkSize
        y=read(x, Format, 4); // read "WAVE"
        Format[y]=0;
        .......
        y=read(x, &AudioFormat, 2); // read next 2 bytes and store at address &AudioFormat
        ..........
        printf("ChunkID:%s\n", ChunkID);
        printf("ChunkSize:%d\n",ChunkSize);
        printf("Format:%s\n",Format);
        .......
        printf("AudioFormat:%d\n", AudioFormat);
        .......

3) Same as 2), but write the content to the file "sw2-wav.txt". Writing formatted text with write() is very hard; use fopen() and fprintf() for formatted output.

..........
x=open("./swvader03.wav", ...........); // input file
FILE *fout=fopen("sw2-wav.txt", "w"); // output file

y=read(x, ChunkID, 4); // read "RIFF"
ChunkID[y]=0; // to print as a string
y=read(x, &ChunkSize, 4); // read chunk size
y=read(x, Format, 4); // read "WAVE"
Format[y]=0;
.......
fprintf(fout,"ChunkID:%s\n", ChunkID); // write to sw2-wav.txt
fprintf(fout, "ChunkSize:%d\n",ChunkSize);
fprintf(fout, "Format:%s\n",Format);
.......

4) "swvader03.wav" contains the sentence "Yes, my master". Write a program that modifies the file so that it contains only "master". Move the file pointer to the start of the actual sound data with lseek() and write 0 over the first half of the sound data, since "Yes, my" and "master" each take about half of it. It is better to copy swvader03.wav to sw2.wav and modify sw2.wav. After modifying the file, download it to your PC using psftp (see Section 7 for an explanation of psftp).

5) Write a program that modifies the wav file so that it contains "master" twice. That is, when you play this file you should hear "master master".

6) Use gdb to debug the error in the following code.

#include<fcntl.h>
#include<sys/stat.h>
#include<sys/types.h>
#include<unistd.h>
#include<stdio.h>

int main(){
	char chunkID[10];
	int chunkSize;
	char format[10];
	short AudioFormat;
	short NumChannel;
	int SampleRate;
	int ByteRate;
	short BlockAlign;
	short BitsPerSample;
	char data[20];
	int x,y;

	x = open("./swvader03.wav", O_RDONLY, 00777);
	x = read(x, chunkID, 4);
	chunkID[y] = 0;
	y = read(x, &chunkSize, 4);
	y = read(x, format, 4);
	format[y] = 0;

	printf("chunkID : %s ", chunkID);
	printf("chunkSize : %d ", chunkSize);
	printf("format : %s ", format);
	printf("\n");

	y = read(x, chunkID, 4);
	chunkID[y] = 0;
	y = read(x, &chunkSize, 4);
	y = read(x, &AudioFormat, 2);
	y = read(x, &NumChannel, 2);
	y = read(x, &SampleRate, 4);
	y = read(x, &ByteRate, 4);
	y = read(x, &BlockAlign, 2);
	y = read(x, &BitsPerSample, 2);

	printf("chunkID : %s ", chunkID);
	printf("chunkSize : %d ", chunkSize);
	printf("AudioFormat : %d ", AudioFormat);
	printf("NumChannel : %d ", NumChannel);
	printf("ByteRate : %d ", ByteRate);
	printf("BlockAlign : %d ", BlockAlign);
	printf("BitsPerSample : %d", BitsPerSample);
	printf("\n");

	y = read(x, chunkID, 4);
	chunkID[y] = 0;
	y = read(x, &chunkSize, 4);

	printf("chunkID : %s ",chunkID);
	printf("chunkSize : %d", chunkSize);
	printf("\n");

	return 0;
}
$ gcc -g -o ex2 ex2.c           ==> compile with -g to use gdb
$ gdb ex2
b main
r
     x=open("swvader03.wav",...);
n                              ==> run "x=open(...)"
     x=read(x, chunkID, 4);     ==> next statement to debug
p x                            ==> print x to see the result of "x=open(...)"
$1=7                          ==> swvader03.wav file is now file no 7
n                              ==> run "x=read(x, chunkID, 4)"
   chunkID[y]=0               ==> next statement to debug
p chunkID        ==> print chunkID to see the result of "x=read(x, chunkID, 4)"
$5="RIFF\000..."              ==> we have RIFF in chunkID
n                             ==> run "chunkID[y]=0"
   y=read(x, ...);              ==> next statement to debug
p chunkID                    ==> check chunkID again after "chunkID[y]=0"
......................