Rethinking MATLAB


I have used MATLAB extensively in my academic life. I was introduced to it in 12th grade, and by my junior year I was regularly using v7.0 for signal and image processing projects. It had a small memory footprint and worked well as a demonstration tool. I enjoyed getting instant results from its high-level syntax and often admired the quality of its output. Over time, however, my opinion has changed. The lacunae in the MATLAB ecosystem are too hard to miss for anyone serious about building software models. MATLAB makes implementations easy. It is deceptively inviting, but it is a poor tool for learning and for research: it does not build problem-solving skills, because the code depends heavily on closed-source binaries. What follows is a discussion of what I feel is wrong with MATLAB.

Cost

MATLAB is prohibitively expensive. For a Student license, I paid about $100 USD (2011). It consisted of a basic installation with a few toolboxes; additional toolboxes cost $49 each. The Standard license was pricier at about $2700 USD, with additional toolboxes at $1350 USD each. That alone is enough to deter anyone who just wants to try it. I have heard the argument that extra toolboxes aren't really needed, since the basic toolboxes provide enough functionality to 'code your way up'. But the whole point of using MATLAB is to avoid reinventing the wheel. If I had to write complex routines for a task myself, I would rather switch to Python or C++. Not having the libraries at hand defeats the purpose of rapid prototyping. For a few of my classes I had to purchase toolboxes such as Parallel Computing and Filter Design, primarily because the courses required them. Many faculty members trained themselves on MATLAB during their grad school days and naturally expect to develop class materials and set assignments in it. Error propagation is real: the habit passes from one generation of students to the next. It makes very little sense to burn a hole in your pocket for these add-ons unless you really need them.

Many labs choose to buy network licenses, which have their own problems. Every time MATLAB is fired up, the license has to be authenticated over the network. MathWorks Inc. strictly limits the number of instances running on the server to the number of licenses bought. There are no free give-aways and no wiggle room. Waiting queues can get long, especially if the lab works on physics-based simulations. If network licenses are indispensable, the GUI should at least be a lightweight client. Currently, each user connects over SSH with X11 forwarding, so work efficiency is limited by how fast the X Window System can render the interface remotely.

Syntax and Language

MATLAB's syntax choices are debatable. For regular programmers, several drawbacks overshadow its ease of use. Most people who use it are not professional programmers; it is meant to be a tool. Often, however, the line between a tool and a programming environment becomes blurry. The most talked-about issue is indexing. MATLAB indices start at 1, unlike C, C++, Python and most other mainstream languages. Zero makes complete sense: an index is an offset from the start of the array, so the first element sits at offset zero. For example, in a simplified memory layout, a 1-D array can be represented as:

[offset + 0000h] -> A[0] //index 0:N-1

MATLAB, following the Fortran convention, implements:

[offset + 0000h] -> A[1] //index 1:N
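
The difference between the two layouts above comes down to one subtraction. Here is a minimal Python sketch that computes the byte address of an element under each convention. The helper names address_zero_based and address_one_based, the base address 0x1000, and the 8-byte element size (one double) are illustrative assumptions, not anything MATLAB exposes.

```python
ELEMENT_SIZE = 8  # bytes per element; assumes an array of 64-bit doubles


def address_zero_based(base: int, i: int) -> int:
    """Address of A[i] when indices run 0..N-1: the index is the offset."""
    return base + i * ELEMENT_SIZE


def address_one_based(base: int, i: int) -> int:
    """Address of A(i) when indices run 1..N: every access subtracts 1."""
    return base + (i - 1) * ELEMENT_SIZE


base = 0x1000  # hypothetical base address
# The first element lives at the base address under both conventions.
print(hex(address_zero_based(base, 0)))  # 0x1000
print(hex(address_one_based(base, 1)))   # 0x1000
# The last element of a 4-element array.
print(hex(address_zero_based(base, 3)))  # 0x1018
print(hex(address_one_based(base, 4)))   # 0x1018
```

The extra "- 1" in the one-based version is exactly the kind of adjustment that produces off-by-one errors when code is translated between the two conventions.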

In my experience, the single largest source of bugs in MATLAB code is an array index going out of bounds. Element referencing is also done with parentheses rather than brackets, which makes it look like a function call. String manipulation in MATLAB can be very convoluted, because strings are treated as 'containers' rather than as strings of char (C/C++) or as objects (Python). Conversion between strings and characters is not done implicitly and at times requires special functions. MATLAB also introduces the cell array, a matrix of matrices. Unfortunately, many MATLAB operations, such as strsplit and textscan, return cell outputs, and wrangling the data out of such constructs wastes time. It would have been easier to leave strings as character arrays, without marrying them to matrices or vectorization.
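
For contrast, here is a short Python sketch of the object-style strings mentioned above. A string is a single object, indexing it yields another string, and splitting returns an ordinary list of strings rather than a separate container type. This only illustrates Python's behavior; it is not a translation of any particular MATLAB routine, and the sample text is made up.

```python
line = "alpha,beta,gamma"  # made-up sample input

# Splitting returns a plain list of str objects.
fields = line.split(",")
print(fields)  # ['alpha', 'beta', 'gamma']

# Indexing a string gives a one-character string, so no conversion step is needed.
first_char = line[0]
print(first_char, type(first_char).__name__)  # a str

# Going from characters back to a string is a single join.
chars = list("beta")
print(chars)           # ['b', 'e', 't', 'a']
print("".join(chars))  # beta

# The list of fields can be used directly, e.g. to measure each one.
print([len(f) for f in fields])  # [5, 4, 5]
```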

MATLAB functions are odd entities. Many coders who rely on the STL and other APIs might brush this off, but it is important to abstract the structure of the data away from the function. In MATLAB, reformatting matrices to fit a function's prototype is common practice, and the functions themselves aren't flexible enough to be tweaked with positional or keyword arguments as required. I have had to add custom pre-processing steps to my pipeline as workarounds. For seasoned users, this is a frustrating design flaw.
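
As a rough illustration of the flexibility I mean, here is a Python sketch of a hypothetical mean helper whose behavior is selected by a keyword argument, so the same table can be reduced in different ways without being reformatted first. The function name and the axis convention (borrowed loosely from NumPy) are assumptions for illustration; only statistics.fmean comes from the standard library.

```python
from statistics import fmean
from typing import Optional, Sequence, Union

Number = Union[int, float]


def mean(rows: Sequence[Sequence[Number]], *, axis: Optional[int] = None):
    """Mean of a 2-D table given as a list of rows (hypothetical helper).

    axis=None -> mean of every element (one number)
    axis=0    -> mean of each column
    axis=1    -> mean of each row
    """
    if axis is None:
        return fmean(x for row in rows for x in row)
    if axis == 0:
        # zip(*rows) regroups the same table by columns; the caller never reshapes it.
        return [fmean(col) for col in zip(*rows)]
    if axis == 1:
        return [fmean(row) for row in rows]
    raise ValueError(f"axis must be None, 0 or 1, got {axis!r}")


table = [[1, 2, 3],
         [4, 5, 6]]
print(mean(table))          # 3.5
print(mean(table, axis=0))  # [2.5, 3.5, 4.5]
print(mean(table, axis=1))  # [2.0, 5.0]
```

The data stays in one layout; the keyword argument, not a pre-processing step, decides how it is read.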

Code partitioning is a drawback that has persisted through every version of this application. It is so pervasive that MathWorks probably accepts it as a feature. It is frustrating to see a function work only when it is saved in a file of the same name. Only the first function in such an M-file can be called from outside it; any others are local helpers, which at times makes the codebase messy. The only respite is the ability to create nested functions. I wish a namespace feature had been emulated from C++, but adding one now would break functionality across MathWorks' production code. A messy codebase translates to harder bug-tracking.

There is a dire need to simplify the standard library by merging a few obvious candidates, such as rand() (uniform) and randn() (normal) for random number generation. In their place, a single function rnd() with appropriate arguments or flags could be used. The refactoring scope isn't limited to removing redundancy. On close inspection, the function prototypes do not follow a single design pattern: some are modeled on POSIX conventions, while others do not conform to them. This is a byproduct of several work-groups with their own design philosophies, leading to a fractured design overall. Figure and data export options need to improve as well. The native formats, .fig and .mat, are hard to use anywhere except MATLAB or Octave. A mildly bothersome issue is Long Term Support (LTS). Each version brings some modifications and deprecations, more often in the toolboxes than in the core MATLAB interpreter. It is doubtful whether software written with any of the paid toolboxes will still work predictably after a couple of years. What an irony that the toolbox, for which a premium was paid, is the first to fail!
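
A minimal Python sketch of what such a merged function might look like. The name rnd(), its dist flag, and its keyword parameters are hypothetical; the sampling itself uses the standard library's random.Random.uniform and random.Random.gauss.

```python
import random
from typing import List, Optional


def rnd(n: int, dist: str = "uniform", *, seed: Optional[int] = None,
        low: float = 0.0, high: float = 1.0,
        mu: float = 0.0, sigma: float = 1.0) -> List[float]:
    """Hypothetical single entry point for random numbers.

    dist="uniform" -> n samples between low and high, in the spirit of rand()
    dist="normal"  -> n samples from N(mu, sigma^2), in the spirit of randn()
    """
    rng = random.Random(seed)  # private generator, so a given seed repeats its draws
    if dist == "uniform":
        return [rng.uniform(low, high) for _ in range(n)]
    if dist == "normal":
        return [rng.gauss(mu, sigma) for _ in range(n)]
    raise ValueError(f"unknown distribution {dist!r}")


print(len(rnd(5)))  # 5
print(rnd(3, "normal", seed=42) == rnd(3, "normal", seed=42))  # True
```

One flag selects the distribution, and the remaining keywords only matter for the distribution that uses them, which keeps a single, predictable prototype.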

Efficiency, Memory & Execution Issues

Users of MATLAB get cozy in the comfort zone the package affords. We often forget the nuts and bolts while chasing end results. At times it is hard to understand the nature of a bug because MathWorks hides the nitty-gritty in obfuscated P-files. MATLAB is slow even by interpreted-language standards; in my experience, looping and branching are much slower than in Python. Loading the base system takes considerable time before anything executes. It has a large RAM footprint, a considerable chunk of which goes to GUI and workspace management. Core execution speeds have not changed drastically over the last few years.

Memory management is a touchy topic with MATLAB, since it carries a large RAM overhead. A potential workaround on UNIX-like systems is to launch it with the -nojvm and -nodisplay flags from the command line, but that does not guarantee trouble-free execution. MATLAB performance is also plagued by memory leaks. The software juggles rich graphics, dynamic workspaces and the core utilities, and garbage collection is done when required, not when possible. As a result, the GUI runs sluggishly. Over successive versions MATLAB has addressed some of these reliability issues, but there is still a lot of room to improve, since persistent JVMs, poor GC and option-heavy GUIs can sour the user experience. Parallel computing should be included in the standard package rather than sold as an optional add-on. Without it, user code on modern multi-core CPUs is crippled by running on a single thread, even though many built-in functions are multithreaded.

MATLAB has a polished look. Its interface and reliability have inspired open source packages such as Jupyter. It also has good documentation, despite the language design issues mentioned above, which makes it a popular teaching tool. It is ideal for someone who has never done scientific programming and may not need much coding in general. But since it is a closed-source product, there is little community contribution beyond feature suggestions and bug reports. Users have to take it on faith that the function and algorithm implementations they rely on are bug-free. There is no way to validate them except to keep a lookout for inexplicable outputs. MathWorks does a fairly good job of shipping bug-free code; however, the standard library still needs the extensive refactoring discussed above.

Simulink: A tragedy in two acts

Simulink was developed as a blatant rip-off of National Instruments' LabVIEW. It was a product MathWorks should never have invested in (hence Act I of this tragic story). Looking closely, anyone can see the huge disconnect between the two mammoth products. MATLAB's charter is to solve vectorized problems efficiently through a text/script interface. Simulink, on the other hand, simulates real-world elements (springs, resistors, diodes) graphically. One can't help but notice the conceptual divide. Simulink is advertised aggressively as being "able to solve real world, practical applications, including creating production systems and FPGA for embedded environments". In reality, LabVIEW is a more pervasive solution with better parts and driver support, and National Instruments Inc. has a whole ecosystem to deliver end-to-end solutions. No one takes Simulink seriously (Act II). In production, Programmable Logic Controllers (PLCs) are the mainstay of automation, and neither LabVIEW nor Simulink (nor Wolfram SystemModeler) stands a chance.

Closing thoughts

For prototyping and proof-checking, MATLAB is an excellent tool. It is easy to fire up and run toy routines to make sure we are on the right track. But one should never aim to build a career out of writing MATLAB code. It never measures up to the reputation of a general-purpose programming language, because it was never meant to be one. It is good in academia for non-CS/EE majors who need computational solutions. Companies are unlikely to embrace it for production software because of its efficiency issues. Do not trust the F-35 advertisement.