Saturday, January 5, 2013

[programming quickstart]: on programming languages

Introduction
This is a short essay with my opinions on some of the modern programming languages.
It is not finished yet, but I decided to post it as is and edit it later.

With the plethora of programming languages out there, many of us are confused about which one to use for a project or to stick to as a default language. Those of us who are only starting their journey into the fascinating world of programming are often confronted with the question of choosing a first language, and I'm sometimes asked to give a little advice.

Personally, I enjoy trying out every other language I hear about, because I find it fascinating to imagine what their authors were thinking and how they arrived at the design. Besides, what I generally like about engineering and IT is that learning a new thing introduces you to the history of the industry and lets you meet a lot of amazing enthusiasts. So I figured I'd go ahead and write a small blog post about the various languages I have experience with and my general opinion on the matter.

TLDR:
As usual, the post has some links for further reading at the end (so you can scroll down if you are bored). While I do not consider Wikipedia a reliable source of information, it is still quite useful because it allows you to find links to original papers and you can get a broader knowledge of the subject by skimming through "related" links.

Now, let me tell you some important points in case you didn't know or are still confused:
  • Any algorithm can be implemented in any general-purpose language.
  • Some languages allow you to solve some problems in an easier way.
  • Some languages bring more fun (the positive kind, as opposed to the misery of bug-hunting) to the process of writing code.
  • It is usually possible to reuse code written in other languages, but the difficulty of doing so varies.
What really matters when choosing a language?
  • Popularity. A popular language has a community where you can ask questions.
  • Commercial support or maturity. A mature language, or one backed by enterprise funding, is likely to survive for a long time, and there's a chance you'll be able to use your code unmodified many years after it's written.
  • Available libraries. Ok, in most cases you write algorithms yourself, but sometimes you'd rather drop in an existing piece of code than reinvent the wheel. It's important that basic libraries for your tasks (UI, Net, data parsing (XML/JSON), media, etc) exist so you can concentrate on solving your problem.
  • Native code integration. This is the key point when choosing anything in engineering: how well it integrates with other commonly used solutions. If your language has an FFI (Foreign Function Interface) that lets you easily reuse binary libraries written in C/Assembly, or there is an automated code generator for plugging in code written in other languages, chances are you can use it for most of your projects. Note that some languages/platforms (C#/.Net, ruby/python, Haskell) let you import native code/data and specify data types (sizes, argument types) directly in code, without having to write a stub C library that touches the dirty internals of the compiler or the VM (Virtual Machine), as is done for Java and OCaml.
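To make the FFI point a bit more concrete, here is a minimal sketch in Python using the standard ctypes module. It assumes a POSIX system (Linux or OSX) where the C library can be located by the loader; the wrapper name native_strlen is made up for this example. The argument and return types of the native function are declared right in the script, with no stub library in C:

```python
import ctypes
import ctypes.util

# Locate the C standard library. If it cannot be found by name, passing
# None asks the loader for symbols already linked into the running
# process (this fallback is an assumption that only holds on POSIX).
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

# Declare the native signature: size_t strlen(const char *s)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def native_strlen(text: bytes) -> int:
    """Hypothetical wrapper: call the C strlen on a byte string."""
    return libc.strlen(text)


print(native_strlen(b"hello"))
```

The same idea in Java would require writing and compiling a JNI stub in C first, which is exactly the extra step this kind of FFI saves you.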
One important aspect to consider is the type system - that is, how data types and variables are declared and how the compiler/interpreter checks the code correctness. We can classify languages based on the typing system used in several ways:
  • dynamic (like ruby, which lets you do a lot at runtime via introspection/reflection [getting type information such as available functions and arguments dynamically], at the price of a crash at runtime if you make a typo)
  • static (haskell - all types are checked during compilation, and there is even an opinion that if a program written in a statically typed language compiles, it is correct)
  • weak (implicit type conversion, like in JavaScript and Perl, meaning you can add an integer and a string and get a string in one language and a number in the other... weird)
  • strict (no implicit type conversion - like in Haskell/ML, where every conversion has to be requested explicitly, which prevents abuse of the type checker)
  • explicit (like C, C++, Java) - you have to declare the variable types all the time (like, "int x = 1" or "float Sqrt(int x)..." )
  • implicit (most functional programming languages like Haskell and F#, and more recently C# and C++11, formerly known as C++0x) - the variable type is deduced from the context [TypeInference]. That is, the compiler first finds the expressions whose type it can tell for sure, like integer or float constants. Then it works backwards, assigning each untyped variable the type of the value used in its place. This is roughly how Hindley-Milner type inference works [HindleyMilner]
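A small sketch of how two of these axes show up in practice, using Python (which is dynamically typed, but does not convert between numbers and strings implicitly). The helper describe and the variable names are made up for this illustration:

```python
def describe(value):
    """Return the name of the runtime type of a value."""
    return type(value).__name__


# Dynamic typing: the same name can be rebound to values of different
# types, and the type is only known at runtime.
x = 1
kinds = [describe(x)]
x = "one"
kinds.append(describe(x))
print(kinds)

# No implicit coercion: adding an integer and a string is rejected at
# runtime instead of being silently converted as in JavaScript or Perl.
try:
    result = 1 + "2"
except TypeError:
    result = "TypeError"
print(result)

# The conversion has to be requested explicitly.
explicit_sum = 1 + int("2")
print(explicit_sum)
```

In a statically typed language the bad addition would be rejected by the compiler before the program ever runs.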
Now, let's go over some languages and discuss each of them individually.

Assembly
While you may never encounter a case where you really need assembly language, learning a couple of different assemblies (a RISC one like ARM or MIPS and a CISC one like X86) and some specialized hardware (like DSPs (digital signal processors) or SIMD units (single instruction, multiple data: essentially algebraic operations over vectors of data)) will give you good insight into how computers work.

Modern compilers are typically good at generating optimized assembly code, so you should really only use assembly where it is needed (like, modifying coprocessor registers, flushing caches) and wrap CPU-specific assembly in the C code so that the major part of your application remains portable.

Please do not fall for those who shout that "assembly is so darn fast I gotta rewrite everything in it". In most cases the performance benefit will be outweighed by the complexity of rewriting the code, maintaining it and porting it to a new architecture in the future.

Two rules of thumb in software optimization:

  • Always profile the software (that is, analyze which part of the computation, which routine, takes the most time) before trying to optimize. Remember that Knuth quote, "premature optimization is the root of all evil". It means you should be aware of the danger of wasting too much time optimizing the wrong part of the program, one that actually has a negligible influence on performance.
  • If some optimization gives you only a linear (constant-factor) increase in performance, forget about it. You can get the same speedup by doing nothing and just waiting for the next generation of CPUs to come out in half a year. If you really, really need performance, start by trying to decrease the algorithmic complexity.
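Both rules can be tried out in a few lines. The sketch below is in Python and uses the standard cProfile and pstats modules; the duplicate-detection task and the function names are made up for this illustration. It profiles a quadratic and a linear solution to the same problem, so you can see where the time goes before deciding what to rewrite:

```python
import cProfile
import io
import pstats


def has_duplicates_quadratic(items):
    """Compare every pair of items: O(n^2) comparisons."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_linear(items):
    """Remember the items seen so far in a set: O(n) on average."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


data = list(range(2000))  # worst case for both: no duplicates at all

profiler = cProfile.Profile()
profiler.enable()
slow = has_duplicates_quadratic(data)
fast = has_duplicates_linear(data)
profiler.disable()

# Print the five entries with the largest cumulative time.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

Both functions return the same answer, but doubling the input roughly quadruples the work of the first one and only doubles the work of the second. No amount of assembly tuning of the first version will close that gap.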


C
Now, like it or not, you need to know this language and its standard library, because it has influenced the design of many subsequent languages, and its standard library calls (like fopen, fprintf, fread) or their close analogues are found in many languages (php, ruby, MatLab).

Besides, C is known as 'portable assembly': it gives you precise control over the layout of data structures in memory, adds basic type checking on top of assembly, and lets you recompile your code for almost any architecture.

In essence, C is the glue that holds together the vast majority of all the languages out there. It is the lingua franca of modern programming.

C++
C++ brings many nice features to C: inheritance, virtual methods, namespaces. It also has the STL (Standard Template Library) with algorithms and data structures, which makes it suitable for complex tasks, from implementing communication protocols to building large-scale frameworks.

Advantages over C:
  • more strict type checking
  • metaprogramming (template classes)
  • STL
  • Constructors/Destructors (RAII - resource acquisition is initialization pattern). Kind of makes it easier to clean up memory/file descriptors without the bunch of goto-based error handlers.
However, there are still reasons why C++ may not be the best choice for some projects

  • No automatic garbage collector (which may be good for some realtime systems but is generally a shame)
  • No stable ABI (application binary interface), meaning binaries compiled with one C++ compiler may not work with binaries made with another one
  • Complex and unclear specification and standard. As of 2012, there's no single compiler conforming fully to any standard revision, and some features, like exported templates (the "export" keyword), are not implemented by the major vendors

I would like to point out that there's the amazing Qt framework (Qt4, now Qt5), which has libraries for everything including graphics, media and database access, with semi-automatic memory management. This means you can build complex applications as easily as in Java or C# and get all the advantages of native code (portability, performance, integration).

Java/JVM
Java is one of the many languages implemented on top of a VM (Virtual Machine, a software abstraction that behaves like a real computer), namely JVM.

It has the following nice properties
  • Lots of tutorials/quickstarters for beginners
  • Huge community of developers
  • Backwards compatibility - you can be sure your code still works a couple years after it's written
  • Amazing library/dependency management system - Maven
  • Comprehensive standard library (runtime) with lots of algorithms
  • Good threading support, with the whole range of synchronization primitives and even advanced stuff like memory barriers
  • Good support for asynchronous IO (java.nio) for files and sockets
  • Own packaging system with package hierarchy and signing. While most *NIX developers find this irritating, there's one huge advantage - you can pack your whole project with all dependencies and configs in a huge jar file and not worry about system updates breaking your application :)
  • Fast VM (JVM). Okay, maybe not always as fast as .Net, but more portable (well, at least it runs on linux and OSX), and orders of magnitude faster than ruby/python. This is one reason a lot of developers who enjoy ruby switch to jruby - an implementation of ruby running on top of JVM (hmm.. sounds a bit off-topic)

However, it is no silver bullet and there may be some reasons why you should avoid Java for some projects
  • No good UI toolkit out of the box. Unfortunately JavaFX is not yet (and probably never will be) part of the standard JRE, and, unlike QML in Qt4 or WPF (Windows Presentation Foundation) in .Net, there's no easy way to create animations, gradients and generally design the interface in a declarative way (that is, using only a markup language like html/xml without writing code). To some extent NetBeans and Eclipse compensate for that by having intuitive GUI builder tools.
  • No optimization for SIMD/vectorized computations (e.g., SSE, NEON) in JVM. This means the capabilities of modern CPUs that allow for high-speed multimedia processing and power saving cannot be fully utilized unless you write that part in C and call it via JNI
  • No builtin syntax for raw memory access (like, unsafe in C#). Which makes interacting with JNI or manipulating raw data (like textures in games) quite complicated
  • Strange design solutions violating the so-highly-praised OOP (empty interfaces like Cloneable, erased types for generics in bytecode, mutable final variables -> System.out and friends). While these are not major problems, you should watch out while programming in Java and read documentation very carefully
Overall, Java is a good choice for most tasks, especially multi-threaded web servers, due to the rich support for parallel programming, async IO and popular protocols and standards (including HTTP, URL encoding etc). However, it is probably not the best choice for interactive multimedia applications requiring low latency (for example, a music synthesizer) or small desktop apps (because while JVM is very fast, it takes some time to load, and sometimes it does matter whether the app takes 20 ms or 2 seconds to launch: users are impatient).

However, the biggest problem is that popular frameworks like Hibernate or Spring are too difficult for beginners, because most tutorials don't cover some important issues, and the documentation is unfortunately rather vague. It pays off to learn all this stuff though - you can set up a complex website writing virtually no code, relying on configuration files for customisation.


C#/.Net
C# is often called the "better Java" and .Net in general borrows hugely from the Java world and the JVM architecture. Let's see which strong and weak points are here.

Pros:

  • As C# is intended to be the major programming language of the Windows platform, the .Net CLR (Common Language Runtime, the .Net VM) was specially optimized for native code interoperability and multimedia capabilities. Thus, .Net includes the P/Invoke mechanism (the DllImport attribute: typed imports of native functions, so you can call native code without writing helper stubs in C).
  • The language incorporates nice features which make the code more compact and easier to write and read (type inference with the "var" keyword, lambda expressions)
  • LINQ. This is a built-in query language which makes interactions with databases and XML very easy. Take a look at the example, which shows how easy it is to obtain data from a database:

  using (NorthwindDataContext context = new NorthwindDataContext())
  {
    // Select customers whose contact name and company name both match
    var customers =
      from c in context.Customers
      where c.ContactName.Contains("John")
            && c.CompanyName.Contains("Enterprise")
      select c;
  }

Cons:

  • Major updates break binary compatibility. Probably not a huge problem except that you'll have to keep a copy of the older runtime around for legacy apps. I guess the advantages of this decision outweigh the problems
  • The default implementation is non-portable, non-crossplatform and not FOSS (free and open source software). Which means you're essentially locked to the Windows ecosystem and depend on Microsoft's design decisions. A typical vendor lock. Something you cannot afford when you need to guarantee your software reliability and availability
  • No free (or any) development tools for non-m$ platforms. Even the versions of mono (the FOSS .Net implementation) for Android and iOS cost money, and due to licensing and technical issues such apps will never find their way into application markets.

Smalltalk
Smalltalk is an object-oriented language with a long history. Maybe you should try it out just to enrich your knowledge of various design solutions and break your brain.

The good

  • Nice message-passing model instead of direct method calls. This eventually simplifies null-pointer handling and eliminates the need for type checking in most cases when we know the desired method (or, rather, signal handler) is implemented
  • Mixin-based object inheritance. Which means any object can be extended with custom methods, i.e., without inheriting from a superclass
  • Many implementations with low footprint. Can even run on bare metal hardware. There exists an operating system written in Smalltalk
  • Lots of libraries and bindings to popular libraries

The bad

  • Almost extinct today - no developers, no vacancies, no active communities


Perl
I don't like perl. But I'll find a couple minutes to write about it later

PHP
PHP has fairly been called a fractal of bad design. And it's not surprising. There are upsides, of course:

  • A lot of ready-to-use libraries, frameworks and web site engines
  • A lot of cheap web hosting
  • Lots of jobs and vacancies

But they are hugely outweighed by the following problems:

  • Weak typing with implicit coercions. This leads to numerous runtime errors
  • Eval() support for evaluating a string of PHP code, ability to concatenate PHP code with user-input variables - the result is a multitude of security holes
  • Awful standard library, with completely illogical function names and argument orders
Ok, I wanted to write about some other languages, but just leaving the placeholder for now

javascript - pure evil
python - not a bad one, but I've not used it much
ruby - i absolutely love it
Lisp - too many parentheses
Erlang - it rocks
Ocaml/F# - also cool. my tool of choice
Haskell - cool, but too complicated


Now, I'll just remind you to take a brief look at some mathematical software. Might be useful or funny.
Maxima
A nice CAS (Computer Algebra System) written in LISP. Among other cool features it supports symbolic evaluation of expressions and symbolic integration/differentiation. Take a look at the examples and keep getting amazed

Matlab/Octave
Matlab is a commercial general-purpose numerical computing environment. Octave is a FOSS implementation of the language which aims to be as compatible with matlab as possible and to allow directly reusing matlab code.

Unlike Maxima, Matlab is focused on numeric methods. It has a lot of common algorithms like FFT (Fast Fourier Transform) and audio and image processing routines, which is why it is commonly used for prototyping algorithms at universities.

One interesting property of Matlab is that internally all data structures are based on vector arrays, which closely resembles a SIMD computation unit. Practically this means that operations on whole arrays, like adding two vectors, are very fast, while element-by-element access in a loop is incredibly slow. This makes writing high-performance code in Matlab a challenging task, but it gives valuable experience in optimizing software for SIMD processors and GPUs, so you may want to spend more time practicing it.

Julia
Julia is a very new language similar to Octave. There are, however, several reasons why it is worth looking at and has a chance to be actually useful and not just a lab toy

  • It uses LLVM (Low Level Virtual Machine, the latest buzzword in compiler construction and optimization). Chances are it has decent performance
  • It has builtin keywords for parallel computation blocks. Like OpenMP for C
  • It has the support for distributed computations


R
R is a well known statistical toolbox. It has a huge archive of statistical algorithms at CRAN (Comprehensive R Archive Network, just like CPAN for Perl). It gives you virtually unlimited possibilities to analyze data and draw beautiful plots. Just take a look at [RLinearSquares] and see how easy it is to do a least squares regression analysis with R!

Scripting Languages
Now, let's discuss some scripting languages that are not typically used to write large software but can be used to greatly simplify daily routine tasks (like, file renaming)

AWK is a simple language for text processing. It works by matching a text against a regular expression and then performing an operation on it. It can be used to quickly transform a formatted text
For example, assume we have a text file containing some organization salaries during the year: Name,  Month, Money

Bob 1 2000
Alice 1 2000
John 1 1000
Bob 2 2000
Alice 2 2000
John 2 2500

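We could use a simple one-liner to calculate the total amount of money Bob has earned (assuming the data is saved in a file called salaries.txt, a name made up for this example). Matching on the first field instead of the whole line avoids counting lines that merely contain "Bob" somewhere else:
awk '$1 == "Bob" { sum += $3 } END { print sum }' salaries.txt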

Sed
Sed (stream editor) is similar to AWK and is also a classical UNIX tool. It is mostly used to replace phrases in text files or live streams. For example, here is how you can replace all the occurrences of "foo" with "bar" in the file named "test.txt"
sed -i 's/foo/bar/g' test.txt

Bash
Bash is one of many UNIX command shells. It has an imperative syntax somewhat similar to that of C and Ruby, and lets you automate most daily routine tasks without delving into the depths of operating system internals.

A not very useful example. Assume you have a lot of files like DSC0001.JPG, DSC0002.JPG from your camera after a summer trip to the seaside. You could rename them all at once to make it easier to recognize them in the mess of media on your hard drive. Note the quotes around the sed expression: without them the spaces in the new name would split it into several arguments and sed would fail.

for i in DSC*; do mv "$i" "$(echo "$i" | sed 's/DSC/My trip to south/')"; done


Programmable Circuits
Let us discuss some languages most people have not even dreamed of. There is an interesting area of electrical engineering and computer science called computational logic design. That is, designing CPUs and all other kinds of VHSICs (very high speed integrated circuits, which nowadays is a collective term for any electronics too tiny to see with the naked eye).

From the university course of algebra we know that, in essence, all computations, like additions, multiplications and branching can be represented using common boolean operations. Now, these operations (like conjunction, disjunction and negation) can be implemented electronically as standalone units known as gates. Therefore, we can think in terms of gates with inputs and outputs and ignore the electrical characteristics of the circuit (remember, we live in the idealized digital world which is a no-brainer).
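To see that addition really does reduce to boolean operations, here is a small sketch in Python (not an HDL, just a simulation; all function names are made up for this illustration). It builds a full adder out of AND, OR and NOT gates only, and chains several of them into a ripple-carry adder:

```python
def AND(a, b):
    return a & b


def OR(a, b):
    return a | b


def NOT(a):
    return 1 - a


def XOR(a, b):
    # Exclusive or expressed through the three basic gates.
    return OR(AND(a, NOT(b)), AND(NOT(a), b))


def full_adder(a, b, carry_in):
    """Add three one-bit values; return (sum_bit, carry_out)."""
    partial = XOR(a, b)
    total = XOR(partial, carry_in)
    carry_out = OR(AND(a, b), AND(partial, carry_in))
    return total, carry_out


def add_bits(x, y, width=4):
    """Ripple-carry adder: chain one full adder per bit position."""
    carry = 0
    result = 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result, carry


print(add_bits(5, 6))  # fits in 4 bits
print(add_bits(9, 8))  # overflows 4 bits, so the carry out is set
```

In real hardware each of these functions is a handful of transistors, and all the full adders work at the same time instead of being called one after another in a loop.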

So, instead of drawing a huge diagram comprising miles of wiring and using tons of paper to print it out, we can try to make a language that compiles into boolean functions. It turns out this was done long ago, and that's how modern electronics design is done. Such a language is called an HDL, which stands for Hardware Description Language.

There exist two major HDLs: verilog and VHDL (actually, there exist some others like SystemC and SystemVerilog, but they are less widely used and in the end there's little practical difference between all of them). The primary difference between them is that verilog is weakly and implicitly typed and programming in it feels like a mix of C and Erlang, while VHDL is strictly typed and gives a feeling of Haskell or Delphi.

I guess we'll leave examples till I write the article on circuit design basics.

Note to self: gotta explain transistors, npn vs pnp vs FET, clocks, latches. maybe write a post about electronics and circuit design?

To sum up,
as most of the stuff I do is programming microcontrollers, writing OS drivers and playing with DSP and computer graphics (opengl), and as I am using linux and prefer simplicity, my languages of choice are the following:

C - for most small tools and drivers
C++ - for complex projects involving UI and when I don't want to reinvent OOP with C-style macros

ruby - for simple text processing and to prototype algorithms

OCaml - for writing parsers and interpreters
Octave - instead of a calculator

References
[HindleyMilner]