Rethinking MATLAB
I have used MATLAB extensively in my academic life. I was introduced to this software
in my 12th grade. By the time I was in my junior year, I was using v7.0 regularly for
signal & image processing projects. It had a small memory footprint and worked well as
a demonstration tool. I enjoyed getting instant results from its high-level syntax,
often admiring the quality of the output. My opinion of it has, however, gradually soured.
The lacunae in the MATLAB ecosystem are too hard to miss for people serious about
software models. MATLAB makes implementations easy. It is deceptively inviting but a bad
tool from the perspective of learning and doing research. It does not enhance problem-solving
abilities because the code depends heavily on closed-source binaries. The following is a
discussion of what I feel is wrong with MATLAB.
Cost
MATLAB is prohibitively expensive. For a Student license, I paid about $100 USD (2011).
It consisted of a basic installation with a few toolboxes. Additional toolboxes cost $49
per item. The Standard license was pricier at ~$2700 USD, with additional toolboxes at $1350
USD per item. That is deterrent enough for anyone wanting to try it. I have heard the
argument that extra toolboxes aren't really needed since the basic toolboxes provide enough
functionality to 'code your way up'. But the purpose of using MATLAB is to not reinvent the
wheel. If I had to write complex routines for a task, I would rather switch to Python or
C++. Not having the libraries at hand defeats the purpose of rapid prototyping. For a few
of my classes, I had to purchase toolboxes such as Parallel Computing and
Filter Design, primarily because of course requirements. Many faculty members
trained themselves on MATLAB during their grad school days and naturally expect to
develop class materials, or set assignments, in MATLAB. Error propagation is real.
It makes very little sense to burn a hole in your pocket for these add-ons unless you
desperately need them.
Many labs choose to buy network licenses, which have their own problems. Every time
MATLAB is fired up, the license needs to be authenticated over the network. MathWorks Inc.
strictly enforces the number of instances running against the server, capped at the number
of licenses bought. There are no giveaways and no wiggle room. Sometimes
the waiting queues get long, especially if the lab works on physics-based simulations.
If network licenses are indispensable, then the GUI should be a lightweight client.
Currently each user connects over SSH with graphics options enabled, so work efficiency
is limited by the speed of X Window rendering.
Syntax and Language
MATLAB's syntax choices are hotly debated. For regular programmers, there are several
drawbacks that overshadow the ease of use. Most people who use it are not professional
programmers. It is meant to be a tool. However, the line between a tool and a programming
environment often becomes blurry. The most talked-about issue is the choice of indexing. MATLAB
indices start at 1, unlike most conventional programming languages. Zero makes complete sense.
It is the starting point in Boolean logic, and an array index is naturally an offset from
the first element. For example, in a simplified memory layout, a 1-D array can be represented as,
[offset + 0000h] -> A[0] //index 0:N-1
MATLAB employing the Fortran style, implements:
[offset + 0000h] -> A[1] //index 1:N
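To make the two layouts concrete, here is a minimal Python sketch of the address arithmetic behind each convention. The 8-byte element size, the base address and the function names are assumptions chosen for illustration, not anything MATLAB exposes.

```python
ELEMENT_SIZE = 8  # bytes per element; assumed (one double-precision value)


def address_zero_based(base, i):
    """Address of A[i] when indices run 0..N-1: the index is the offset."""
    return base + i * ELEMENT_SIZE


def address_one_based(base, i):
    """Address of A(i) when indices run 1..N: shift the index by one first."""
    return base + (i - 1) * ELEMENT_SIZE


base = 0x1000  # hypothetical start of the array in memory
first_zero = address_zero_based(base, 0)  # first element, 0-based
first_one = address_one_based(base, 1)    # first element, 1-based
```

Both calls resolve to the same address. The extra `- 1` is exactly the adjustment that gets forgotten when code is ported between the two conventions.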
The single largest source of bugs in MATLAB is an array index going out
of bounds. Also, element referencing is done via parentheses rather than brackets, making
it look like a function call. String manipulation in MATLAB can be very
convoluted, as strings are treated as 'containers' rather than as strings of char
(C/C++) or as objects (Python). Conversion of strings to characters or vice versa is
not done implicitly. It requires special functions at times. MATLAB also introduces
the construct cell, which is a matrix of matrices. Unfortunately, many
MATLAB operations result in cell outputs. Wrangling the data out of
such a construct wastes time. It would have been easier to leave
strings as character arrays, without marrying them to matrices or vectorization.
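As a rough analogy for the unwrapping overhead, the Python sketch below uses a nested list as a stand-in for a cell-style result and compares it with plain string handling. The nested list is only an illustration and does not reflect how MATLAB stores cells internally; `unwrap` is a hypothetical helper.

```python
# Stand-in for a cell-style result: every token wrapped in its own container.
cell_like = [["alpha"], ["beta"], ["gamma"]]


def unwrap(cells):
    """Pull the single item out of each one-element container."""
    return [c[0] for c in cells]


tokens_from_cells = unwrap(cell_like)

# Plain strings need no such step: split returns a flat list of str,
# and a str is already a sequence of characters.
tokens_direct = "alpha,beta,gamma".split(",")
chars = list("abc")
```

The extra `unwrap` pass is the kind of bookkeeping that adds nothing to the actual computation.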
MATLAB functions are peculiar entities. Although many coders who rely on
the STL and APIs might brush this off, it is important to abstract the structure of
data from the function. With MATLAB, formatting the matrices to fit the
function prototype is common practice. The functions themselves aren't flexible
enough to be tweaked with positional or keyword arguments as required.
I have had custom pre-processing steps in my pipeline as workarounds. This is
an uncharitable design flaw for seasoned users.
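A minimal Python sketch of the kind of flexibility meant here: one function whose behaviour is adjusted through keyword arguments, so the caller does not have to reshape the data first. The function `moving_average` and its parameters are made up for illustration.

```python
def moving_average(samples, window=3, pad_value=None):
    """Average each run of `window` consecutive samples.

    If pad_value is given, the front is padded so that the output has
    the same length as the input.
    """
    data = list(samples)
    if pad_value is not None:
        data = [pad_value] * (window - 1) + data
    return [sum(data[i:i + window]) / window
            for i in range(len(data) - window + 1)]


signal = [1.0, 2.0, 3.0, 4.0]
default_call = moving_average(signal)                          # defaults only
tweaked_call = moving_average(signal, window=2)                # one override
padded_call = moving_average(signal, window=2, pad_value=0.0)  # two overrides
```

The defaults cover the common case, and each override is named at the call site, so no pre-processing step is needed just to satisfy the prototype.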
Code partitioning is a drawback which has propagated through every version
of this application. It is so pervasive that MathWorks probably accepts it as a
feature. It is frustrating to see a function work only when it is saved to a file
of the same name. It is hard to define two functions in one such M-file,
which at times makes the codebase messy. The only respite is the ability to create nested
functions. I wish the namespace feature had been borrowed from C++,
but that would break functionality across all their production code.
A messy codebase translates to harder bug-tracking.
There is a dire need to simplify the standard library by merging a few obvious
candidates such as rand() and randn() for RNG. In their place,
a single function rnd() with appropriate arguments or flags could be used.
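One way such a merged function might look, sketched in Python with only the standard library. The name rnd() comes from the suggestion above; its signature (the `count`, `dist` and `seed` parameters) is an assumption made for this example.

```python
import random


def rnd(count, dist="uniform", seed=None):
    """Return `count` random floats from the chosen distribution.

    dist="uniform" draws from [0, 1), the role played by rand();
    dist="normal" draws from mean 0, std 1, the role played by randn().
    """
    rng = random.Random(seed)  # private generator; seed=None uses system entropy
    if dist == "uniform":
        return [rng.random() for _ in range(count)]
    if dist == "normal":
        return [rng.gauss(0.0, 1.0) for _ in range(count)]
    raise ValueError(f"unknown distribution: {dist!r}")


uniform_draws = rnd(5, dist="uniform", seed=1)
normal_draws = rnd(5, dist="normal", seed=1)
```

A single entry point with a `dist` flag keeps the interface small, and further distributions could be added without introducing new function names.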
The refactoring scope isn't just limited to redundancy removal. On close inspection,
we can notice that the function prototypes do not follow a single design pattern.
While some function prototypes are built mimicking POSIX standards, there are others
which do not conform to it. This is a byproduct of several work-groups with their own
design philosophies, leading to a fractured design overall. Figure and data export options
need to improve as well. Currently they are restricted to .fig and .mat
formats. It is hard to use them anywhere except MATLAB or Octave. A mildly bothersome
issue is long-term support (LTS). Each version sees some modification and deprecation.
It is more common in the toolboxes than in the main MATLAB interpreter. It seems
doubtful that software written using any of the paid toolboxes will actually
work predictably after a couple of years. Such an irony that the toolbox, for which
a premium amount was paid, becomes the first to fail!
Efficiency, Memory & Execution Issues
Users of MATLAB get cozy in the comfort-zone afforded by the package. We often forget
the nuts and bolts in chasing the end results. At times it is hard to understand the nature
of a bug because MathWorks hides the nitty-gritty in p-files. MATLAB is slow even by
interpreted-language standards. Looping and branching are much slower than in Python. It takes
considerable time to load the base system before any execution. It has a large RAM footprint,
and a considerable chunk of it goes to GUI/workspace management. Core execution speeds have not
changed drastically over the last few years.
Memory management is a touchy topic with MATLAB since it sits on a large overhead of
RAM. A potential workaround on UNIX-like systems is to use the -nojvm -nodisplay
flags from the command line. But that does not guarantee trouble-free execution.
MATLAB performance is also plagued by memory leaks. The software juggles rich graphics,
dynamic workspaces and the core utilities. Garbage collection is done in MATLAB
when required, not when possible. The GUI runs sluggishly as a result.
Over successive versions, MATLAB has improved its reliability, but there is a lot of
room to improve since persistent JVMs, poor GC and option-heavy GUIs can sour the user experience.
The parallel computing option should be included in the standard package rather than sold as
an optional add-on. MATLAB is crippled by working on a single thread on modern CPUs.
MATLAB has a polished look. Its interface and reliability have inspired open source packages
such as Jupyter. It also has decent documentation despite the language design issues
mentioned above. This makes it a popular teaching tool. It is ideal for someone who has
never done scientific programming and may not require much coding in general.
But since it follows a closed-source model, there is not much community contribution
apart from feature suggestions and bug reports. Users have to keep their faith
that the implementation of the function/algorithm is bug-free in their application.
There is no way to validate it except to keep a lookout for inexplicable outputs.
MathWorks does a fairly good job of shipping bug-free code. However, the implementations
require extensive refactoring, as discussed before.
Simulink: A tragedy in two acts
Simulink was developed as a blatant rip-off of National Instruments' LabVIEW. It was a product
MathWorks should never have invested in (hence Act I of this tragic story). Looking closely,
anyone can see the huge disconnect between the two mammoth products. MATLAB's charter is to be
efficient at solving vectorized problems via a text/script interface. Simulink on the other hand
simulates real world elements - springs, resistors, diodes - graphically. One can't help but notice
the conceptual divide. Simulink is advertised aggressively to be "able to solve real world, practical
applications, including creating production systems and FPGA for embedded environments". In reality,
LabVIEW is a more pervasive solution with better parts and driver support. National Instruments
Inc. has a whole ecosystem to deliver end-to-end solutions. No one takes Simulink seriously (Act II).
In production scenarios, programmable logic controllers (PLCs) are the mainstay of automation. Neither
LabVIEW nor Simulink (or Wolfram SystemModeler) stands a chance.
Closing thoughts
For purposes of prototyping and proof-checking, MATLAB is an excellent tool. It is easy to fire
up and run toy routines to make sure we are on the right track. One should never aim to build a career
out of writing MATLAB code. It never measures up to the reputation of a regular programming language
because it was never meant to be one. It is good in academia for non-CS/EE majors who need
computational solutions. Companies will never embrace this language because of efficiency issues. Do not trust
the F-35 advertisement.