Friday 7 March 2014

Working with C/C++ Static and Dynamic Libraries


Today's applications can easily run into thousands of lines of code. Applications of such complexity cannot reasonably be built monolithically: the result would be a maintenance nightmare, prone to tight coupling, and slow to build. Hence breaking a huge application into smaller logical modules is the need of the day.
We need to break a big application into several modules, compile them separately and then link them together. This brings in two linking techniques, namely Static Linking and Dynamic Linking.


These linking techniques are not new, and a lot of literature about them is available on the web. However, I see a lot of misconceptions about how to use them, how to handle them in a platform-independent way, and how they are done differently on the two major compilers, namely VC++ and gcc. In this post I want to show with examples how we can build smaller modules as static and dynamic libraries, and how we can consume these libraries in bigger applications.

Static Linking

To link a module statically into a big application we need to create a static library first. A static library is a set of routines, external functions and variables which are resolved in a caller at build time and copied into the target application by the linker, producing a stand-alone executable.
To create a static library we first compile the source files of the library into object files (.obj files on Windows, .o files on Linux) and then package them together using an archiver. The compiler commands and flags for doing this are given in the example that follows.
It's important to note that no symbol resolution is done while a static library is created. Hence, if there are unresolved functions or variables lurking in the static library, we will not come to know about them at that point; they surface only when the library is linked into an executable or a dll.
These static libraries are then linked by the linker into a final executable or a dynamic library (dll). At this time the linker tries to resolve all the functions required from the static library and copies their code directly into the final output application or dll. This increases the size of the application or the dll.
An application created using only static libraries is self-sufficient and does not require the libraries to be present at run time. Here is the flow for creating a static library and using it.

Example: Let's take a C library that calculates a factorial (fact.h and fact.cpp).
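The post does not list the two files, so the following is only a minimal sketch of what they might contain. The signature (an unsigned int argument and an unsigned long long result) and the use of extern "C" are assumptions made for illustration.

```cpp
// fact.h (sketch): declaration shared by the library and its users.
// extern "C" keeps the symbol name unmangled; the signature is an assumption.
extern "C" unsigned long long fact(unsigned int n);

// fact.cpp (sketch): iterative factorial.
// The result wraps around for n > 20 because 21! does not fit in 64 bits.
extern "C" unsigned long long fact(unsigned int n)
{
    unsigned long long result = 1;
    for (unsigned int i = 2; i <= n; ++i)
    {
        result *= i;
    }
    return result;
}
```

In a real workspace the declaration would live in fact.h and the definition in fact.cpp, which includes fact.h.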
To compile these sources into a static library, the following commands should be given to the compiler.
Windows:
cl /Fofact.obj /c fact.cpp /TP /nologo /W3 /EHsc /O2 /GR /MD
lib /nologo /OUT:fact.lib fact.obj
Linux:
g++ -o fact.o -c fact.cpp
ar rc libfact.a fact.o
ranlib libfact.a

Steps to link to a static library. 
Let's use the static library created above in a testbench file (testbench.cpp). We need to give the following commands to the compiler.
Windows:
cl /Fotestbench.obj /c testbench.cpp /TP /nologo /W3 /EHsc /O2 /GR /MD /I<library path>
link /nologo /OUT:testbench.exe /LIBPATH:<library path> fact.lib testbench.obj
Linux:
g++ -o testbench.o -c -I<library path> testbench.cpp
g++ -o testbench testbench.o -L<library path> -lfact

Advantages of Static Linking 
a. The application can be certain that all its libraries are present and that they are the correct version. In case there is any version discrepancy then most likely it will be caught at compile time. 
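b. Better performance, as calls into the library are bound at link time and there is no library loading or symbol lookup at run time.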
c. Easy packaging and distribution of final executable. 

Disadvantages of Static Linking
a. Huge application size and build time.
b. Any change in the library requires the complete application to be rebuilt (or at least relinked).
c. Compatibility issues if the library and the application are built using different versions of the compiler.
d. Inconvenient to share libraries across teams/organizations having different build systems (Make, SCons, VC projects).

Dynamic Linking

Dynamic linking is a technique where the operating system on which an application is running loads the shared libraries at run time, gets the entry points into the dll/.so and calls the functions at run time. At build time, the machine code present in a shared library is not copied into the final executable as in the case of static linking; only the validity of the entry points is checked. Thus dynamic linking keeps the executable small and makes linking much faster.

There are two kinds of dynamic linking, namely explicit dynamic linking and implicit dynamic linking.

Explicit Dynamic Linking
In the case of explicit linking the application has to open the dll first using OS API calls. Then it has to get the addresses of the symbols exported from the dll as function pointers, and then call the APIs through these function pointers. The OS API calls are platform dependent and need to be handled differently on Windows (LoadLibrary, GetProcAddress, FreeLibrary) and Linux (dlopen, dlsym, dlclose). However, the concept is the same.
Here is the source code of a class that can be used to open a dll and get its symbols for explicit linking using the same API on Windows and Linux (dllUtils.h & dllUtils.cpp).
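The dllUtils sources are not reproduced in this text, so the following is a hedged sketch of how such a wrapper might look, not the original class. The class name shared_library and its member names are hypothetical. The OS calls it wraps (LoadLibraryA, GetProcAddress, FreeLibrary, dlopen, dlsym, dlclose) are the standard ones.

```cpp
#include <string>

#ifdef _WIN32
    #include <windows.h>
#else
    #include <dlfcn.h>
#endif

// Hypothetical wrapper (not the original dllUtils class): opens a shared
// library on construction and closes it on destruction.
class shared_library
{
public:
    explicit shared_library(const std::string & path)
#ifdef _WIN32
        : m_handle(LoadLibraryA(path.c_str()))
#else
        : m_handle(dlopen(path.c_str(), RTLD_NOW))
#endif
    {
    }

    ~shared_library()
    {
        if (!m_handle) return;
#ifdef _WIN32
        FreeLibrary(static_cast<HMODULE>(m_handle));
#else
        dlclose(m_handle);
#endif
    }

    // Copying would close the same handle twice, so it is disabled.
    shared_library(const shared_library &) = delete;
    shared_library & operator=(const shared_library &) = delete;

    // True if the library was found and loaded.
    bool is_open() const { return m_handle != nullptr; }

    // Returns the address of an exported symbol, or nullptr if the library
    // is not open or the symbol is not found.
    void * symbol(const char * name) const
    {
        if (!m_handle) return nullptr;
#ifdef _WIN32
        return reinterpret_cast<void *>(
            GetProcAddress(static_cast<HMODULE>(m_handle), name));
#else
        return dlsym(m_handle, name);
#endif
    }

private:
    void * m_handle;
};
```

A caller would cast the returned address to the expected function pointer type before calling it, for example to unsigned long long (*)(unsigned int) for a fact function with that assumed signature. On Linux, older toolchains also need -ldl on the link line.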

The diagram below depicts the flow for Explicit Dynamic Linking.


Implicit Dynamic Linking
In the case of implicit linking the linker assumes that all the symbols exported by the dll will be available at run time, and it links the code that uses those functions against a library file produced when the dll/.so is built (the import library, .lib, on Windows; the .so itself on Linux). An executable created like this shows a direct dependency on the dll when viewed through a dependency walker. The dependent dll needs to be found by the loader when the application is executed, for example via the system PATH (on Windows) or LD_LIBRARY_PATH (on Linux); otherwise the application will fail to start.
Another point to note is that the functions and classes in a library compiled as a shared library are exported differently by different compilers. The Visual C++ 'cl' compiler exports only the symbols which are marked with __declspec(dllexport) (or listed in a module-definition file). The g++ compiler, however, exports all the symbols in the library by default. Apart from that, if we are exporting functions by name then we should mark them with extern "C" to avoid name mangling for those symbols.
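One common way to handle this difference in a single header is an export macro. The sketch below assumes hypothetical macro names FACT_API and FACT_BUILD_DLL and an assumed signature for fact. The header and source parts are shown in one listing only so that it compiles as a single file.

```cpp
// Normally passed on the command line when building the dll (for example
// /DFACT_BUILD_DLL); defined here only so this one-file sketch compiles.
#define FACT_BUILD_DLL 1

// fact.h (sketch): FACT_API and FACT_BUILD_DLL are hypothetical names.
#if defined(_WIN32)
    #if defined(FACT_BUILD_DLL)
        #define FACT_API __declspec(dllexport)   // building the dll
    #else
        #define FACT_API __declspec(dllimport)   // using the dll
    #endif
#else
    // g++ exports everything by default; this only matters if the library
    // is compiled with -fvisibility=hidden.
    #define FACT_API __attribute__((visibility("default")))
#endif

// extern "C" keeps the exported name as plain "fact" instead of a mangled one.
extern "C" FACT_API unsigned long long fact(unsigned int n);

// fact.cpp (sketch): the definition picks up the linkage declared above.
extern "C" unsigned long long fact(unsigned int n)
{
    return (n < 2) ? 1ULL : n * fact(n - 1);
}
```

The same header can then be included by both the library and its users; only the library build defines FACT_BUILD_DLL.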
The diagram below shows the flow for using implicit dynamic linking.

Steps to build a Dll/Shared Library. 
Let's consider two files (fact.h and fact.cpp) which we need to compile as a shared library. The following are the compiler commands to accomplish that.
Windows:
cl /Fofact.obj /c fact.cpp /TP /nologo /W3 /EHsc /Z7 /O2 /GR /vmg /MD
link /nologo /dll /out:fact.dll /implib:fact.lib fact.obj
Creates fact.dll and fact.lib files.
Linux:
g++ -o fact.os -c -fPIC fact.cpp
g++ -o libfact.so -shared fact.os
Creates libfact.so

Steps to Link to a Dynamic library. 
Windows:
cl /Fotestbench.obj /c testbench.cpp /TP /nologo /W3 /EHsc /O2 /MD /I<library path>
link /nologo /OUT:testbench.exe /LIBPATH:<library path> fact.lib testbench.obj
Note that fact.lib is the file that gets created while creating the fact.dll.

Linux:
g++ -o testbench.o -c -I<library path> testbench.cpp
g++ -o testbench testbench.o -L<library path> -lfact
Note that <library path> should contain the libfact.so file.

Implicit Linking Vs Explicit Linking

| Implicit Dll Linking | Explicit Dll Linking |
|---|---|
| The complete header file of the dll sources is required. | Only the function signatures of the symbols in the dll are required. |
| No extra source code is required in the user code to open the dll, get symbols, etc. | Code that opens the dll and gets the symbols from it is required in the user code. |
| The dll must be placed in the PATH (on Windows) or LD_LIBRARY_PATH (on Linux). | The dll can reside anywhere; only its path needs to be provided to the routine that opens the dll. |
| Less flexible if there is any change in the dll. | More flexible in case of change, as long as the entry functions are not changed. |
| No platform-specific code is required for opening the dll, as the compiler does the job as per the platform. | Platform-specific code is required. See the example provided. |
| Works well even with mangled names. | The symbols need to be exported with extern "C". |



Exporting a class from a dll(Naïve Approach).

In order to export a class from a dll on Windows we need to declare the class like below.
#include <string>

class __declspec(dllexport) employee
{
public:
                employee(const char * name, unsigned int age);
                const char * get_name();
                unsigned int get_age();
private:
                std::string m_name;
                unsigned int m_age;
};

In case we need to export only a particular function from a class we can declare it like below.
#include <string>

class employee
{
public:
        employee(const char * name, unsigned int age);
        __declspec(dllexport)  const char * get_name();
        unsigned int get_age();
private:
        std::string m_name;
        unsigned int m_age;
};
Note: The above arrangement is required only for the Visual Studio compilers. gcc/g++ by default exports all the symbols from a .so file, and that .so file can be linked implicitly to get the complete functionality of all the classes exported from it.

Some limitations of exporting a class:
1. It creates a very tight coupling between the dll and the executable using it.
2. We can only use implicit dynamic linking for a class exported from a dll.
3. The dll and the final executable need to be built with the same version of the compiler, else they may not work together.
4. It becomes messy to export a class containing STL classes, as by default these classes are not exported with the declspec directive.
5. If there is any change in the header of the exported class then the executable using that dll also needs to be rebuilt, else very unpredictable crashes can occur.

How to find out which symbols are exported from a dll?
Windows:
We can use the Dependency Walker tool, which can be downloaded from http://www.dependencywalker.com . When we open a dll in it we can see the list of all the symbols exported from it. It also lists all the dependent dlls for a particular dll or exe. A snapshot is shown below.

Linux:
On Linux we can run the 'nm' command to list all the symbols present in a shared library, and the 'ldd' command to find out the dependencies of a particular shared library or executable. Note that to run an executable or to use a shared library, all these dependencies need to be present in the standard library directories or in a path specified by LD_LIBRARY_PATH. Otherwise the library load will fail or the executable will fail to start.

ldd libfact.so 
        libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x0000002a9565a000)
        libm.so.6 => /lib64/tls/libm.so.6 (0x0000002a95879000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000002a959ff000)
        libc.so.6 => /lib64/tls/libc.so.6 (0x0000002a95b0b000)
        /lib64/ld-linux-x86-64.so.2 (0x000000552aaaa000)

nm libfact.so 
..
0000000000101280 b completed.1
0000000000000bc4 T fact
0000000000000a50 t frame_dummy
..

Design pattern for interoperability between two modules via an interface.

If we intend to export a class from a dll then we should never export the whole concrete class. We should work with interfaces. As long as there is no change in the interface file, the dll and the executable using it can change and mature independently. The idea is very simple: if we want to export a class, we first define an interface file that lists all the APIs provided by the class. Then we derive the class from the interface and implement its APIs. Defining an interface also gives us the flexibility to use the class through explicit linking. Let's see how it is done.

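//emp_intf.h
#ifdef _WIN32
    #define DLL_EXPORT_DECL __declspec(dllexport)
#else
    #define DLL_EXPORT_DECL
#endif

// Pure interface: the only type the executable needs to know about.
// It is declared before the factory functions that refer to it.
class employee_interface_t
{
public:
    // Virtual destructor so that deleting through a base pointer is safe.
    virtual ~employee_interface_t() {}
    virtual const char * get_name()=0;
    virtual unsigned int get_age()=0;
};

// C-linkage factory functions: the entry points exported from the dll.
extern "C" DLL_EXPORT_DECL employee_interface_t * create_employee(const char * name, unsigned int age);

extern "C" DLL_EXPORT_DECL void delete_employee(employee_interface_t * handle);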

//employee.h
#include <string>
#include "emp_intf.h"
class employee: public employee_interface_t
{
public:
        employee(const char * name, unsigned int age);
        ~employee();
        const char * get_name();
        unsigned int get_age();
private:
        std::string m_name;
        unsigned int m_age;
};

//employee.cpp
#include <cassert>
#include "employee.h"
employee_interface_t * create_employee(const char * name, unsigned int age)
{
      // The employee pointer converts implicitly to its base interface pointer.
      return new employee(name, age);
}
void delete_employee(employee_interface_t * handle)
{
      assert(handle);
      // Cast back to the concrete type so that ~employee() runs.
      delete (dynamic_cast<employee*>(handle));
}
//Implementation of other functions of employee class


The executable using this dll only needs to include the emp_intf.h header file, and it can choose to link either implicitly or explicitly. As long as this interface file is not changed, the dll and the exe can be updated independently.
One word of caution: we should try to use only native data types (int, char, float etc.) as parameters and return types of all the APIs defined in the interface. We can use complex classes, but then we will again land in the same issue of tight coupling if any of those classes keeps changing.
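On the consumer side, the create/delete pair can be wrapped in a smart pointer so that delete_employee is always called. The sketch below is one possible way to do this, not part of the original workspace. The names employee_ptr and make_employee are hypothetical, and the interface and implementation are condensed into the same listing only so that it compiles as one file.

```cpp
#include <memory>
#include <string>

// Condensed stand-in for emp_intf.h so that this sketch is self-contained.
class employee_interface_t
{
public:
    virtual ~employee_interface_t() {}
    virtual const char * get_name() = 0;
    virtual unsigned int get_age() = 0;
};

// Condensed stand-in for the dll side (employee.h / employee.cpp).
namespace
{
class employee : public employee_interface_t
{
public:
    employee(const char * name, unsigned int age) : m_name(name), m_age(age) {}
    const char * get_name() override { return m_name.c_str(); }
    unsigned int get_age() override { return m_age; }
private:
    std::string m_name;
    unsigned int m_age;
};
}

extern "C" employee_interface_t * create_employee(const char * name, unsigned int age)
{
    return new employee(name, age);
}

extern "C" void delete_employee(employee_interface_t * handle)
{
    delete handle;
}

// Consumer side (hypothetical helper names): the deleter is the dll's own
// delete_employee, so the object is freed by the module that allocated it.
typedef std::unique_ptr<employee_interface_t, void (*)(employee_interface_t *)> employee_ptr;

inline employee_ptr make_employee(const char * name, unsigned int age)
{
    return employee_ptr(create_employee(name, age), &delete_employee);
}
```

With explicit linking, the same wrapper would take the two function pointers obtained from the loaded dll instead of calling the functions by name.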

Download Source Code
The workspace for static and dynamic linking can be downloaded from the following link.
In order to build them I have provided a SCons SConstruct file. To run the scons build, the user needs to install Python 2.6/2.7 and SCons from this website (http://www.scons.org/).
After installation, go to the respective folder and run the 'scons' command.

Conclusion
In this post I tried to list some of the things to consider while breaking a large code base into libraries and dlls. I tried to list some of the nuances of static and dynamic linking and shared some example code to explain the concepts. I hope readers find this post useful in practice.
Take Care.
