Saturday, August 28, 2010

An Overview of C++

1) What is a structured language?

The languages such as C and PASCAL are structure languages, which are characterized by their support for stand-alone subroutines, local variables, rich control constructs and their lack of reliance upon the GOTO statements.

2) What is the basic difference between the object oriented programming language and the structured pro-gramming language?

In structured programming language, the program is organized around the code, means, program is struc-tures into subroutines or functions which act on data, on the other hand the object oriented programming you define the data type, which will define how can access that data, in the sense the program is organized around data.

3) What are the characteristics of C++ or object oriented programming language?

The three main characteristics of an object oriented programming language is Encapsulation, Polymorphism and Inheritance. The class definitions explains encapsulation, the function and operator overloading explains polymorphism and the

4) What is Encapsulation in Object oriented programming language?

Encapsulation is a mechanism that binds together the code and the data it manipulates, and keep both safe from outside interference and misuse, the private data or code is accessible only within that object or class and public data or code can be accessed from outside, which gives an interface to the outside world to access the inside code and data of object.

5) What is polymorphism and what kind of polymorphisms are supported in C++?

One interface, multiple methods, in simple terms it is an attribute that allows one interface to control access to a general class of actions, the specific action selected is determined by the exact nature of the situation. Compiler will select the method appropriate for a particular situation, and programmers don’t need do it manually.

C++ supports both run time and compile time polymorphism. Compile time using function overloading and operator overloading. Run time using virtual functions.

6) What is inheritance in object oriented programming languages?

This is a concept of classification, where an objects property can be derived from another object, and it only needs to define those qualities that make it unique within its class. Example the Apple belongs to the class fruits and fruits belongs to class food etc.
7) How does bool data type is defined in C++?

There is standard Boolean data type in C++, which takes value true or false, true being 1 and false being zero automatic conversion to integer is supported, all non-zero value is treated as true and zero as false.

8) What is a class in C++

Class is a means to define an objects outline using data and code and access permissions, this is the inheri-tance is achieved in C++.

9) What is function overloading?

When you have similar tasks to be performed then you can declare functions with same name, keeping the number of parameters and /or their type different. Then the compiler will select the right function depending on the situation. This process is called function overloading. It is not only enough to keep the return type different. This is a polymorphism feature of C++. The other polymorphism feature available in C++ is operator overloading. There is run time polymorphism also using virtual functions using inheritance and derived class.

10) What are constructor and destructors?

It is very often that we need to initialize the class before we can use them, so C++ provides a way to auto-matically do it using constructor and then de-allocate any resources in the destructor. The constructor has same name as that of the class and it does not have any return type, because constructor cannot return anything. It should also be public member function of the class, because objects of that class needs to be defined in public scope. The same rules apply to the destructors also, destructors names has additional ~ at the beginning. The constructors are called when the declaration statements are executed, and destructors are called when the objects are destroyed

Wednesday, August 11, 2010

Dynamic Memory Management

1) What are the approaches for dynamic memory management?

Explicit: Explicit memory allocation and de-allocation using malloc() and free() by the programmer, this can be very efficient, but its error prone, when a programmer forgets to free the allocated memory.

Automatic: In this approach the programmer allocates the memory but the run time system takes care of claiming the un-used memory (garbage collection). This kind of method is used in languages such as Java, this can be very simple for the programmer but it can lead to slow performance, since run time system needs to decide which memory to be reclaimed.

2) How does malloc work?

Before calling main(), C runtime system asks OS for large chunk of memory. This will serve as the heap for that program. Malloc() carves out a piece of the heap records the fact that it is allocated, so subsequent calls to malloc will not use this storage. When free() is called, records the fact that piece of heap is no longer allocated, subsequent calls to malloc can use it.

3) How does malloc manage the Heap?
  • Free data is arranged in a linked list (free list)
  • Next pointer embedded in free data itself (rather than building a separate “list” structure as shown in figure
  • Each chunk is “tagged” with size (prefix or auxiliary table)
  • When malloc called,
  • Scan list for sufficiently large chunk, Chunk may need to be split
  • Returned chunk also “tagged” with size
  • When Free is called,
  • Adds chunk of data back onto the free list (at head)
  • Size available in “tag”

 
4) What are the problems associated with Malloc() and Free()

a. What if wrong address is freed?
- Consider the following code example,
p = malloc(x);
p++;
free(p);
In this case the size info will be wrong!, and hence the program will likely (later) fail in mysterious ways

b. What if same chunk is freed twice?
- Often results in free-list cycle

5) More Problems

a) Fragmentation
 • After a while, we end up with lots of small chunks
 • May have enough memory for some request, but not contiguous
 • Solutions
1) Allow heap to grow (ask OS for more memory)
2) Choose chunks wisely: best fit v. first fit
b) Slow allocation
 • Must traverse long list to find suitable chunk of memory
 • Solutions
1) Sort free list (now free is slower)
2) Multiple free lists (e.g., one for large chunks, one for small, etc.)
c) Bugs
 • Wild writes can corrupt size or next pointers
 • Solution?
1) Use auxiliary structure to record size (doesn’t actually solve problem)
2) Valgrind (provides its own implementation of malloc)

Initrd and initramfs

Initrd Creation

To create an (initially empty) initrd use the following steps:
Note: Change count to your required filesystem size. E.g. with count=8192 you will get a 8MB ramdisk.

host > dd if=/dev/zero of=/dev/ram0 bs=1k count=<count>
host > mke2fs -vm0 /dev/ram0 <count>
host > tune2fs -c 0 /dev/ram0
host > dd if=/dev/ram0 bs=1k count=<count>  | gzip -v9 > ramdisk.gz

Now, we have a (empty) gzipped ramdisk image with (extracted) size of <count>.

Filling

To fill empty ramdisk created above with all files needed for ramdisk, mount the image and fill it. Content would be e.g. BusyBox and/or other applications and/or libraries.

host > mkdir mnt
host > gunzip ramdisk.gz
host > mount -o loop ramdisk mnt/
host > ... copy stuff you want to have in ramdisk to mnt...
host > umount mnt
host > gzip -v9 ramdisk

The resulting ramdisk.gz is now ready for usage. Note its size is smaller than <count> cause of compression.

Note: Don't forget to create/copy some basic /dev/xxx nodes to ramdisk.

Note: If BusyBox or applications in ramdisk are linked dynamically, don't forget to copy dynamic libraries (*.so) to ramdisk (to correct directory) as well.

Kernel options

To make initrd work, you have to configure kernel properly:

#
# General setup
#
...
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
...
#
# UBI - Unsorted block images
#
...
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=1
CONFIG_BLK_DEV_RAM_SIZE=8192
CONFIG_BLK_DEV_RAM_BLOCKSIZE=1024
...
Note: The ramdisk size e.g. 8192 above has to be configured for your individual setup.

Installation

Now, you can install the ramdisk via u-boot e.g. in NOR flash. For this copy filled ramdisk created above to your tftpboot directory on host (e.g. /tftpboot/ramdisk.gz). Then start target and copy the data into RAM and flash:

UBOOT # tftp 0x87000000 ramdisk.gz
UBOOT # erase 0x2200000 +0x<filesize>
UBOOT # cp.b 0x87000000 0x2200000 0x<filesize>

Note: Replace filesize above by the value the tftp download command gives you as Bytes transferred.
Now, last step is to update kernel boot parameters and save them

UBOOT # setenv bootargs ... root=/dev/ram0 rw initrd=0x87000000,8M
UBOOT # setenv bootcmd cp.b 0x2200000 0x87000000 0x<filesize>; bootm
UBOOT # saveenv

Note: In example above with "8M" we assume that your ramdisk is 8MBytes. Adapt this to your needs.
Note: Your ramdisk filled above should have a /dev/ram0 node to make this work properly.

brw-rw---- 1 root disk 1, 0 Sep 11 1999 /dev/ram0

Now you should be able to start your kernel and it should find and mount the initrd:

Linux version 2.6.23-davinci1 ...
...
checking if image is initramfs...it isn't (no cpio magic); looks like an initrd
Freeing initrd memory: 8192K
...
RAMDISK driver initialized: 1 RAM disks of 8192K size 1024 blocksize
...
RAMDISK: Compressed image found at block 0
VFS: Mounted root (ext2 filesystem).
Freeing init memory: ...
...

Initramfs

To use initramfs a cpio archive is embedded directly into the kernel. I.e. you don't create an additional (ramdisk) image. Instead, the initial file system is directly incorporated into the kernel. With this, the kernel size increases by the file system size. It's like you embed above ramdisk directly into the kernel.

Creation

Cause initramfs is directly embedded in the the kernel, its creation is simpler. No dd & mount & gzip stuff like with ramdisk above. You simply have to fill a directory on your host with the target filesystem you like and then pass the path to this directory to the kernel build process.

Create target file system

host > mkdir target_fs
host > ... copy stuff you want to have in initramfs to target_fs...

Note: cpio system used for initramfs can't handle hard links. If you e.g. created your BusyBox using hard links, you will get a quite large initramfs cause each command is taken with its size and not as hard link. In cpio initramfs use symbolic/soft links instead.

Note: To be able to detect initramfs by kernel properly, the top level directory has to contain a program called init. This can be done by e.g. using a soft link from top level init to /bin/busybox

/init -> /bin/busybox

if you use BusyBox in your initramfs.

Kernel options

The only difference from creating an initrd is to give the kernel the path to the target file system you like to embed:

#
# General setup
#
...
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE="<path_to>/target_fs>"
...
#
# UBI - Unsorted block images
#
...
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=1
CONFIG_BLK_DEV_RAM_SIZE=8192
CONFIG_BLK_DEV_RAM_BLOCKSIZE=1024
...

Then, if you compile the kernel, e.g. by make uImage, the cpio archive is generated and embedded into the kernel:

...
CHK include/linux/compile.h
GEN usr/initramfs_data.cpio.gz
AS usr/initramfs_data.o
LD usr/built-in.o
...

Installation

No special installation like above with initrd is necessary. The initramfs is already in the kernel. If you start the kernel, the initramfs is already there. Therefore, there is no root=/dev/ram0 rw initrd=0x87000000,8M bootargs option necessary. Remove this if you still have it!

initrd vs. initramfs
  1. Using initrd, kernel and initial file system are splitted. Making changes to kernel or filesystem doesn't touch the other one. The download size (e.g. while development) of one component is smaller.
  2. Creating and modifying an initramfs is easier than with initrd (unzip & mount & unmount & zip)
  3. Having one big image (kernel & initramfs) is easier to handle (e.g. download or flashing) than having two splitted images.

Tuesday, August 10, 2010

Mixed-Language Programming and External Linkage

The C++ standard provides a mechanism called linkage specification for mixing code that was written in different programming languages and was compiled by the respective compilers, in the same program. Linkage specification refers to the protocol for linking functions or procedures written in different languages. Linkage is the term used by the C++ standard to describe the accessibility of objects from one file to another or even within the same file. Three types of linkage exist:
  1. No linkage
  2. Internal linkage
  3. External linkage
Something internal to a function, in regard to its arguments, variables, and so on, always has no linkage and hence can be accessed only within the function.

Sometimes it is necessary to declare functions and other objects within a single file in a way that allows them to reference each other, but not to be accessible from outside that file. This can be done through internal linkage. Symbols with internal linkage only refer to the same object within a single source file. Prefixing the declarations with the keyword static changes the linkage of external objects from external linkage to internal linkage.

Objects that have external linkage are all considered to be located at the outermost level of the program. This is the default linkage for functions and anything declared outside of a function. All instances of a particular name with external linkage refer to the same object in the program. If two or more declarations of the same symbol have external linkage, but with incompatible types (for example, mismatch of declaration and definition), then the program may either crash or show abnormal behaviour. The rest of the article discusses one of the issues with mixed code and provides a recommended solution with external linkage.

In the real world, it is very common to use the functionality of code written in one programming language from code written in another. A trivial example is a C++ programmer relying on a standard C library (libc) for sorting a series of integers with the "quick sort" technique. It works because the C implementation takes care of the language linkage for us. But we need to take additional care if we use our own libraries written in C, from a C++ program. Otherwise the compilation may fail with link errors caused by unresolved symbols. Consider the following example:

Assume that we're writing C++ code and wish to call a C function from C++ code. Here's the code for the callee, for example, C routine:

%cat greet.h
extern char *greet();

%cat greet.c
#include "greet.h"

char *greet() {
           return ((char *) "Hello!");
}

%cc -G -o libgreet.so greet.c

Note: The extern keyword declares a variable or function and specifies that it has external linkage, i.e., its name is visible from files other than the one in which it's defined.
Let's try to call the C function greet() from a C++ program

%cat mixedcode.cpp
#include <iostream.h>
#include "greet.h"

int main() {
        char *greeting = greet();
    cout << greeting << "\n";
        return (0);
}

 
%CC -lgreet mixedcode.cpp
Undefined                       first referenced
 symbol                            in file
char*greet()                    mixedcode.o
ld: fatal: Symbol referencing errors. No output written to a.out

Though the C++ code is linked with the dynamic library that holds the implementation for greet(), libgreet.so, the linking failed with undefined symbol error. What went wrong?

The reason for the link error is that a typical C++ compiler mangles (encodes) function names to support function overloading. So, the symbol greet is changed to something else depending on the algorithm implemented in the compiler during the name mangling process. Hence the object file does not have the symbol greet anywhere in the symbol table. The symbol table of mixedcode.o confirms this. Let's have a look at the symbol tables of both libgreet.so and mixedcode.o:

%elfdump1 -s libgreet.so

Symbol Table Section:  .symtab
index    value       size     type bind oth ver shndx       name
...
[1]  0x00000000 0x00000000  FILE LOCL  D    0 ABS         libgreet.so
...
[37]  0x00000268 0x00000004  OBJT GLOB  D    0 .rodata     _lib_version
[38]  0x000102f3 0x00000000  OBJT GLOB  D    0 .data1      _edata
[39]  0x00000228 0x00000028  FUNC GLOB  D    0 .text       greet
[40]  0x0001026c 0x00000000  OBJT GLOB  D    0 .dynamic    _DYNAMIC

%elfdump -s mixedcode.o

Symbol Table Section:  .symtab
index    value       size     type bind oth ver shndx       name
[0]  0x00000000 0x00000000  NOTY LOCL  D    0 UNDEF
[1]  0x00000000 0x00000000  FILE LOCL  D    0 ABS         mixedcode.cpp
[2]  0x00000000 0x00000000  SECT LOCL  D    0 .rodata
[3]  0x00000000 0x00000000  FUNC GLOB  D    0 UNDEF     
    __1cDstd2l6Frn0ANbasic_ostream4Ccn0ALchar_traits4Cc____pkc_2_
[4]  0x00000000 0x00000000  FUNC GLOB  D    0 UNDEF       __1cFgreet6F_pc_
[5]  0x00000000 0x00000000  NOTY GLOB  D    0 UNDEF       __1cDstdEcout_
[6]  0x00000010 0x00000050  FUNC GLOB  D    0 .text       main
[7]  0x00000000 0x00000000  NOTY GLOB  D    0 ABS         __fsr_init_value

%dem2 __1cFgreet6F_pc_

__1cFgreet6F_pc_ == char*greet()

char*greet() has been mangled to __1cFgreet6F_pc_ by the C++ compiler. That's the reason why the static linker (ld) couldn't match the symbol in the object file.
Note that a C compiler that complies with the C99 standard may mangle some names. For example, on systems in which linkers cannot accept extended characters, a C compiler may encode the universal character name in forming valid external identifiers.
 
How to solve this problem?
 
The C++ standard provides a mechanism called linkage specification to enables smooth compilation of mixed code. Linkage between C++ and non-C++ code fragments is called language linkage. All function types, function names, and variable names have a default C++ language linkage. Language linkage can be achieved using the following linkage specification
 
The string-literal specifies the linkage associated with a particular function, for example, C and C++. Every C++ implementation provides for linkage to functions written in C language ("C") and linkage to C++ ("C++").
The solution to the problem under discussion is to ask the C++ compiler to use C mangling for the external functions to be called, so we can use the functionality of external C functions from C++ code, without any issues. We can accomplish this using the linkage to C. The following declaration of greet() in greet.h should resolve the problem:

extern "C" char *greet();

Because we were calling C code from a C++ program, C linkage was used for the routine greet(). The linkage directive extern "C" tells the compiler to change from C++ mangling to C mangling for the function, and to use C calling conventions while sending external information to the linker. In other words, the C linkage specification forces the C++ compiler to adopt C conventions, which are not the same as C++ conventions.

So, let's modify the header greet.h, and recompile:

%cat greet.h
#if defined __cplusplus
        extern "C" {
#endif

        char *greet();

#if defined __cplusplus
    }
#endif

%cc -G -o libgreet.so greet.c
%CC -lgreet mixedcode.cpp
%./a.out
Hello!

It works! Since the header greet.h was used in both C and C++ files, it is necessary to guard extern "C" with the C++ compiler's predefined macro _cplusplus. This is because the C compiler doesn't recognize the "C" portion of extern "C", and throws an error message for the same.
Let's have a look at the symbol table of mixedcode.o one more time
 
%CC -c -lgreet mixedcode.cpp
%elfdump -s mixedcode.o

Symbol Table Section:  .symtab
index    value       size     type bind oth ver shndx       name
[0]  0x00000000 0x00000000  NOTY LOCL  D    0 UNDEF
[1]  0x00000000 0x00000000  FILE LOCL  D    0 ABS         mixedcode.cpp
[2]  0x00000000 0x00000000  SECT LOCL  D    0 .rodata
[3]  0x00000000 0x00000000  FUNC GLOB  D    0 UNDEF     
    __1cDstd2l6Frn0ANbasic_ostream4Ccn0ALchar_traits4Cc____pkc_2_
[4]  0x00000000 0x00000000  FUNC GLOB  D    0 UNDEF       greet
[5]  0x00000000 0x00000000  NOTY GLOB  D    0 UNDEF       __1cDstdEcout_
[6]  0x00000010 0x00000050  FUNC GLOB  D    0 .text       main
[7]  0x00000000 0x00000000  NOTY GLOB  D    0 ABS         __fsr_init_value

 The function name greet was not mangled by the C++ compiler, and hence the linker could find the symbol in the object file and was able to build the executable.

Friday, August 6, 2010

Name Mangling in C++

When C++ compilers compile a C++ program, it encodes all function names and certain other identifiers to include type and scoping information. This encoding process is called name mangling. Linker uses these mangled names to ensure type-safe linkage. These mangled names appear in the object files and final executable file.

What's a symbol?

In every C++ program/library/object file, all non-static functions are represented in the binary file as symbols. These symbols are special text strings that uniquely identify a function in the program, library or object file

The Need for Name Mangling:

C language programs does not use name mangling, because in C no two non-static functions can have the same name. i.e., the symbol name is the same as the function name: the symbol of myfunc will be myfunc
Because C++ allows overloading (different functions with the same name but different number of arguments) and has many features C does not, like classes, member functions, exception specifications — it is not possible to simply use the function name as the symbol name. To solve that, C++ uses name mangling, which encodes the function name and all the necessary information (like the number and size of the arguments) into some special string which only the compiler knows about

eg.,
bpte4500s001:/sunbuild1/giri/testcases/% nm hide.o

hide.o:

[Index]   Value      Size    Type  Bind  Other Shndx   Name
[3]     |        16|      56|FUNC |GLOB |3    |2      |__1cKCRectangleKset_values6Mii_v_
[4]     |         0|       0|NOTY |GLOB |0    |ABS    |__fsr_init_value
[1]     |         0|       0|FILE |LOCL |0    |ABS    |hide.cpp
[2]     |        88|      32|FUNC |GLOB |2    |2      |main
"__1cKCRectangleKset_values6Mii_v_" is the mangled name

But this kind of scheme is undesirable for the developers because the names are difficult to read & debug

Two utilities are available with Sun Studio C/C++ compiler collection to convert the mangled names to their original source code names:
1) c++filt &
2) dem

C++filt is a filter that demangles (decodes) mangled names.
bpte4500s001% echo __1cKCRectangleKset_values6Mii_v_ | c++filt
void CRectangle::set_values(int,int)

"dem" is another utility to demangle C++ names
bpte4500s001% dem __1cKCRectangleKset_values6Mii_v_
__1cKCRectangleKset_values6Mii_v_ == void CRectangle::set_values(int,int)

Note:
C++ standard does not define how names have to be mangled; thus every compiler mangles names in its own way. Some compilers even change their name mangling algorithm between different versions. This could be a problem if the developers hack & rely on how compiler mangles the C++ symbols, as the same algorithm may not work with the next version of C++ compiler

Thursday, August 5, 2010

Interview qeustions on C/C++

1. What is the size of the structure, assume int to be 4 bytes and float to be 4 bytes.

struct {
    enum  {
        integer,
        floating
    } type;
    union {
        int a;
        float b;
    };
} t1;

Answer: The enum would take one integer (4) and other union together will take one more space of larget between two (4), so total 8 bytes.

suppose the structure delcaration is changes as below, then what would be the size?

struct {
    enum type {
        integer,
        floating
    } ;
    union {
        int a;
        float b;
    };
} t1;

Answer: This would result in compilation error, as there is no member of 'type'

2. What doe the following function do? can it be given a better name?

DT* func(DT* a, const DT* b)
{
    DT *c = a;
    while(*a++ = *b++);
    return c;
}

Answer: Assuming that DT is an integer or char, the function would copy data from b to a, untill a zero in b is obtained, zero is also copied, so it can be renamed as string copy function.

3. Is the out put of line one and two inside the loop same, or different?

int a[100][100];
for(int i = 0; i<100; i++) {
    for(int j = 0; j<100; j++) {
        printf("%d", a[i][j]);
        printf("%d", a[j][i]);
    }
}

Answer: The output would be different as they refer to different row and colum, but at the end they out put all the data in the two dimentional array.

If one needs to be prefered over the other, which one will you prefer? and why?

Answer: The first one will be faster as it has less cache miss in case of a cache enabled system compared to the second one, so in a practical system the using first line would be faster, and so it should be prefered. Theoretically if memory access does not use cache, then both will be same in terms of time taken.

4. What is function overloading? how does compiler will know which function to call?

class{ 
    void func(int a);
    void func(int a, int b);
}

Answer: Having many function definitions for the same functions with different number of arguments is called function overloading, and the compiler will decide about the functions by seeing the type and number of params passed to the function call, and decide on which function to call at compile time.

Suppose if the function is defined below with a default argument and when calling if it is not passed then how would compiler decide about function?

class{ 
    void func(int a);
    void func(int a, int b = 0);
}

Answer: Then the compiler wont be able to decide, and it will throw a compile time error.

Wednesday, August 4, 2010

First bit set in an Integer

Write a C code which returns the position of the first bit set.fot eg. for number 104(1101000) output will be 4.

You need to right shift the number untill the number becomes zero and count the number of times right shifted, then you subtract it from the size of an interger (32 bit on a 32 bit machine), you will get the answer. Here is the code for your reference.

int fbs(unsigned int num)
{
    int pos = -1;
    while(num!=0)
    {
        num <<= 1;
        pos++;
    }
    return 32 - pos;
}

Mod 16 without using arithmatic

Write a function which takes an integer value as an argument and return its mod 16 value without using these (%,+,_,/) arithmetic operations

int mod16(int a)
{
    return a>>4;
}

The 4 bits left shift will give mod 16.

what is the difference between dynamic and static linking? which is better?

A dynamic-link library (DLL) is an executable file that acts as a shared library of functions. Dynamic linking provides a way for a process to call a function that is not part of its executable code. The executable code for the function is located in a DLL, which contains one or more functions that are compiled, linked, and stored separately from the processes that use them. DLLs also facilitate the sharing of data and resources. Multiple applications can simultaneously access the contents of a single copy of a DLL in memory.

Dynamic linking differs from static linking in that it allows an executable module (either a .dll or .exe file) to include only the information needed at run time to locate the executable code for a DLL function. In static linking, the linker gets all of the referenced functions from the static link library and places it with your code into your executable.

Using dynamic linking instead of static linking offers several advantages. DLLs save memory, reduce swapping, save disk space, upgrade easier, provide after-market support, provide a mechanism to extend the MFC library classes, support multilanguage programs, and ease the creation of international versions.

In case of free software licenses also, there are provisions with LGPL, using which you dont have to share the code which are linked dynamically.