Yes, you can - but you'll be pretty limited. If you're a brick-and-mortar or service-focused business, a website builder is great. If you rely deeply on a customized web experience, you need to build something custom.
Same with data science. You can get pretty far with some simple data analysis tool. If you need to go farther, then you need to build custom solutions.
Most of the things someone does in Photoshop don't have to be redone repeatedly. For system administration, or I'd guess data science, a lot of things need to be redone regularly. Using the command line is good both for doing that and for getting into the mindset of doing that.
It took me 5 minutes to produce a working proof of concept with an approximate workflow...
But it took me 2 more hours to figure out how to change numerical parameters to precisely set the initial selection box instead of defining it imprecisely by hand. The most frustrating part was that the GUI displayed these parameters perfectly but offered no f..ing way to edit them. Exporting and re-importing the macro as plain text for editing was not an easy option because of the proprietary binary format.
GUIs are nice, but when they obscure scripting capabilities they're just another way to bind you to a platform, making you learn platform-specific skills to work around limitations instead of learning universal coding principles.
There are good GUIs around, however. Just look at the QGIS project, for instance: it can be used purely as a GUI, but it offers plenty of opportunities to input custom formulas when small adjustments are needed. Heavy scripting extensibility is also possible, though it's more hidden from the basic user.
(NB: for the Photoshop macro, I wasn't using the latest CC release, so I don't know if it's still the case.)
The command line allows these two activities to be closer to one activity, so when you base your skill set on the command line, you get both things and can switch easily between them. It seems clear that for automating little tasks, this is kind of necessary.
And when you need to, you can run the application without the GUI for heavy scripting. But the two approaches are still fully compatible, so you can easily define a layout with the GUI and reuse it via the CLI, for instance.
Maybe it's common and I'm just a goof, but this software workflow really impresses me.
Those requirements will tend to drive you towards a CLI solution as the easiest/best available, but if you could get those requirements satisfied in any other programmatic way, then I think you'd probably be okay.
But then the problem becomes, can you do programming from the GUI? I think that's actually a lot harder to do.
Theoretically you can do this some other way. In practice, the CLI is the only way that's remained. The thing with the CLI is that it's quite easy to add a new component to it, whereas someone creating a little GUI app for a little task tends to create a "cul-de-sac", a stovepipe, a program with no relation to any other program.
So third approach would be great but it doesn't seem to be getting any closer.
The thing is that all its menu commands are stored in history, in a ".do" file that you can send to anyone to reproduce your steps.
You can also just use the command-line "shortcut" to issue these same commands. It's pretty nifty.
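That record-and-replay pattern is easy to sketch in a few lines of Python. This is a toy analogue, not Stata's actual mechanism, and all names here are invented: every action both executes and appends a text command to a history, so the whole session can be exported as a script.

```python
# Toy analogue (all names invented) of the record-and-replay idea:
# every action both executes and appends a text command to a history,
# so the whole session can be exported and reproduced by someone else.
class Recorder:
    def __init__(self):
        self.history = []   # the ".do file" equivalent
        self.data = []

    def append(self, value):
        self.history.append(f"append {value}")
        self.data.append(value)

    def scale(self, factor):
        self.history.append(f"scale {factor}")
        self.data = [x * factor for x in self.data]

    def export_script(self):
        # Send this text to anyone to reproduce the steps exactly.
        return "\n".join(self.history)

r = Recorder()
r.append(2)
r.append(3)
r.scale(10)
print(r.export_script())   # three lines: append 2, append 3, scale 10
print(r.data)              # [20, 30]
```

The key design point is that the GUI and the script are two front ends to the same command log, so neither mode is a dead end.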
I fundamentally disagree with that. I work mainly with geo-data analysis and use programming, command-line tools, and GUI tools. And honestly, setting up a data processing pipeline in a GUI like FME is much easier and more reproducible out of the box than whatever happens to be left over after I've been screwing around with a bunch of random command-line tools for a couple of hours. The main thing you lose is some flexibility.
I’ve been burned too often by the GUI first, top-down approach, where the software architecture evolves from the GUI. V1 gets shipped, then the UX designer gets bored and v2 has to have a totally different look and feel and you’re screwed because your software design is permanently tied to the original UI.
GUIs are a way to build an AST.
You can add, remove, transform those changes, and track that history. Music software, Unreal Engine's blueprints, and other node-based programming environments have been doing this, in some cases, for decades.
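The node-graph idea is easy to sketch: the program lives as a plain data structure that an interpreter walks, so a GUI (or anything else) can add, remove, and transform nodes and diff the history like any other data. A minimal, entirely invented example in Python:

```python
# Minimal sketch (names invented) of the node-graph idea behind
# environments like Unreal's Blueprints: the "program" is just data,
# a dict of nodes wired together, evaluated by walking the graph.
import operator

# Each node: (kind, payload). "const" holds a literal; other kinds
# hold a list of input node ids.
graph = {
    "a": ("const", 3),
    "b": ("const", 4),
    "sum": ("add", ["a", "b"]),
    "out": ("mul", ["sum", "sum"]),
}

OPS = {"add": operator.add, "mul": operator.mul}

def evaluate(graph, node_id):
    kind, payload = graph[node_id]
    if kind == "const":
        return payload
    left, right = (evaluate(graph, i) for i in payload)
    return OPS[kind](left, right)

print(evaluate(graph, "out"))  # (3 + 4) * (3 + 4) = 49
```

Because the graph is plain data, "editing the program" in a GUI is just editing this dict, and a change history is a history of dict edits.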
Saying that "you can't do data science in a GUI" says more about the speaker's lack of understanding of GUIs than it does about their knowledge of data science.
Granted, since Photoshop itself is closed source (and on a subscription model), there are some very strong limits to scientific replication of a process.
But one could do something similar with GIMP, additionally aided by Python scripts.
So yeah, "Photoshop bad; CLI good" isn't as clever as all that as a blanket statement (not implying anyone said exactly that; just making an observation).
I see the talk/article is about "data" science, but the headline reminded me of an Alan Kay talk about teaching, where there's a clip of kids filming a falling object and then juxtaposing the video with a rendered sequence based on the physics (v=at, etc.). The whole video is worth watching, but see the "16:17" link in the transcript ("Now, what if we want to look at this more closely?"):
I have been working with these for a side project and the documentation isn’t great, but I have been able to get up and running relatively quickly.
'save report settings'
It should be said that the people who want to put this tool in place are the least likely to use any sort of flowchart when putting out requirements. In fact, I would say that the people most likely to buy this are also the most likely to use pantomime and Post-it notes to convey requirements.
It's so true what you say about how this category of applications is often presented as revolutionary and proves to be limited, bloated, and difficult to debug. I'm thinking of web page builders especially, but also various attempts at graphical programming. At the same time, there are some (more or less) successful examples in history, like HyperCard.
I could post more but I would have to fire-up my old spreadsheet ;)
One issue with Excel is that many of the built-in functions and statistical measures are implemented in numerically naive ways (and presumably remain so for reasons of computation speed and backwards compatibility), so if you want to do robust analysis you have to avoid them entirely, at which point you are far better off with a language designed for this. This is particularly an issue with larger data sets, where accumulation errors can become acute. Excel also introduces additional error terms due to binary encoding.
By the way: it is misleading to think of "rounding error problems". Far better to think about it as "rounding properties"/"truncation properties" and the like, then realize that you can't (in general) write floating point operations as if they were utilizing real numbers and expect correct behavior. That doesn't mean correct behavior is not achievable.
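A concrete illustration of such an accumulation property, comparing a naive running total (the spreadsheet-style approach) with Python's `math.fsum`, which tracks partial sums without losing low-order bits:

```python
# Naive accumulation vs. a numerically careful sum. At magnitude 1e16
# the spacing between adjacent doubles is 2, so adding 1.0 to a
# running total of 1e16 rounds straight back to 1e16 and the 1.0 is
# gone forever. math.fsum keeps exact partial sums and recovers it.
import math

values = [1e16, 1.0, -1e16]

naive = 0.0
for v in values:
    naive += v            # the 1.0 is silently absorbed

print(naive)              # 0.0
print(math.fsum(values))  # 1.0
```

Neither result is a "rounding error" in the sense of a bug; both follow directly from the rounding properties of double-precision addition, which is exactly the point above.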
It straddles this incredible balance between completely free-form input and structured data enabling very powerful functionality.
Here is a good paper from Alan Kay that relates:
Perhaps there is a systematic undervaluing of more polished, robust custom solutions by excel users across the world. That could be the case.
But there is also probably an under-supply of adequate custom solutions. I have seen comments over the years on HN from people who have had much success in consulting gigs where they simply built custom tools to replace ad-hoc workflows and processes living in places like Excel.
In the long term, correctness may become more valuable. An analogy: when factories first switched to electric power, they simply connected electric motors to the existing driveshafts used to distribute steam power around the factory, and only realised small improvements in productivity that way. But once factories were more fully converted to electric power, it became possible to rearrange machines to suit workflows (rather than being arranged around the driveshafts), and this led to much bigger productivity gains.
That's not true. People just don't know any better. Look around you and you'll see many people using the wrong tools. It doesn't mean they've made a rational decision to use those, it usually means they're not aware of a better alternative.
1. SQL queries are Smart Playlists.
2. Summary statistics are shown on the bottom, and apply to the selection (if there is one), or the playlist (if nothing is selected).
3. It's object-oriented data. This is a valid AppleScript command:
tell application "iTunes" to return name of first track whose artist contains "Blink 182"
4. A browser view along the top to quickly see Genre, Artist, Album.
5. Nested playlist folders, including smart playlists.
6. Support for other data types. I wrote a script to generate m3u files to add virtual "Radio stations" to iTunes. Clicking those triggers a PHP script that usually opens my browser with a URL, but could technically do anything else. (OK, this is a hack - I'll document it if anyone wants to know).
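To make point 1 concrete: a Smart Playlist is essentially a stored query. Here is the same "first track whose artist contains Blink 182" rule from the AppleScript line above, expressed against a toy SQLite library (the schema and data are invented for illustration):

```python
# A Smart Playlist as a stored query: the same "first track whose
# artist contains Blink 182" rule, against an invented toy library
# held in an in-memory SQLite database.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tracks (name TEXT, artist TEXT, plays INTEGER)")
con.executemany(
    "INSERT INTO tracks VALUES (?, ?, ?)",
    [
        ("Dammit", "Blink 182", 42),
        ("All the Small Things", "Blink 182", 17),
        ("Karma Police", "Radiohead", 8),
    ],
)

row = con.execute(
    "SELECT name FROM tracks WHERE artist LIKE ? ORDER BY rowid LIMIT 1",
    ("%Blink 182%",),
).fetchone()
print(row[0])  # Dammit
```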
Excel is fine, but it doesn't have the hierarchical structure that iTunes is so good at. It also mixes the data and the instructions.
Yes, and Excel is horrible for data science. You have hidden data, and to be honest, almost every complex spreadsheet has at least two errors in it. https://www.forbes.com/sites/salesforce/2014/09/13/sorry-spr...
The GUI comes with a whole host of built-in assumptions about the data and what it could tell you, and you're probably not smart enough to know what those are and how to correct for them.
You might as well just search for your keys at the base of a streetlight.
Also, Knime and similar apps show a workflow that's much easier to understand at first sight, especially for people with low programming skills.
Not to mention that you don't have to care about the environment and the quirks of the packages and so on. I recognize this is not really a problem in R, but I had it with Python, from my noob perspective of programming.
I do pretty complicated stuff with Knime, PostgreSQL, and Apache NiFi. I make workflows almost on the fly, snap many operations onto them, etc. I'd love to see these guys do something similar in the same timeframe I do.
Well-made GUIs like Knime's make your life easier; this kind of article sounds a bit snobbish, to be honest. These statements are true for developing websites and many other fields, but not really for working with data. It doesn't make much difference given the current tools.
Edit: I see people arguing that Excel is fantastic. It's true, it's amazing, but it doesn't keep track of what you do with the data. It's difficult to understand what's going on if something is even a bit complicated, not to mention when people just copy and paste stuff everywhere, which is common.
Sorry for my poor English.
The main problem I have with it is in many cases the upstream data has to be available to a node in order to be allowed to open the node up and see the configuration. This has resulted in my spending a day or two updating and tweaking database queries on an old connection and fixing the resulting configuration errors just to get to one node and read the configuration. In this case I didn't care about the data itself, I just needed the config of that one node.
This would not have been necessary with a text-based tool, where I could just scroll down and read. Knime can be a powerful tool but for sharing work I think it's unnecessarily painful.
You also have Orange from Biolab. It's much less powerful, but it has some cool nodes, and AFAIK it was possible to open up a node's config without connecting it to anything.
2. Radiant: https://radiant-rstats.github.io/docs/index.html
Microsoft IDEAR looks good, too:
In addition, the automatic insights generated by Power BI are another example of how GUIs can help even the hardcore command-line ninja:
That the author seems to be warning of the limitations of certain solutions, but is generalising those limitations to GUIs as a whole.
This is wrong.
There is nothing inherent in a GUI that would make it unsuitable for coding. Code does not equal text. Code can be represented in many ways. An AST is code. Text in a certain syntax would represent the same code, and so can a GUI.
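Python's standard `ast` module makes this concrete: the same program exists as text and as a tree, and a tool can edit either representation. A minimal round trip (requires Python 3.9+ for `ast.unparse`; the toy program is invented):

```python
# The same program as text and as a tree: parse the text into an AST,
# rewrite the tree with a transformer, and unparse it back to text.
# A GUI could edit the tree directly and never touch the text form.
import ast

source = "total = price * 2"
tree = ast.parse(source)

class DoubleToTriple(ast.NodeTransformer):
    def visit_Constant(self, node):
        # Rewrite the literal 2 to 3, leaving everything else alone.
        if node.value == 2:
            return ast.copy_location(ast.Constant(3), node)
        return node

new_tree = ast.fix_missing_locations(DoubleToTriple().visit(tree))
print(ast.unparse(new_tree))  # total = price * 3
```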
Now, is there a GUI for general-purpose programming that I'd want to use today for production work? No.
Will there be one in the future? I believe so.
But people discounting coding GUIs left and right, just because they haven't seen a good example yet, only discourages others from exploring it further. It's a self-fulfilling prophecy, to some degree.
Anyway, here is a text/GUI-based programming environment (single data / multiple representations) that you might want to play with: luna-lang.org
And there are good examples from the past, too. History is important and is often ignored by the wider programming community. Personal computing was more or less invented in GUI programming systems. There are environments and architectures from the 1970s that are highly relevant today but that are ignored or downplayed in the mainstream. Such an attitude will stifle progress for sure.
It's not just because I haven't seen a good example, it's because we have 50 years of bad examples. When something has been tried and failed by so many people for such a length of time you have to at least consider the possibility that the idea is just a fundamentally bad one.
I think I'm more likely to see flying cars in my lifetime than a decent GUI for general-purpose programming. The problem with both is that they are fundamentally flawed.
We (programmers) are notoriously bad at advancing the tools in our field.
For a brief history of this, watch "The Future of Programming" talk by Bret Victor:
I think the tools of our trade have advanced tremendously. Visual Studio for example is an amazing experience for C# programmers, one that most languages don't have. And this is in text tools.
Programmers know that text is the most powerful and flexible, which is why we advance those tools that help in working with text.
GUI tools are good for people who only want to do something every once in a while. Something they don't need to repeat. Where there's a simple recipe for it. And, yes, programmers don't do that much to advance these, because they have no use for them themselves.
He talks about code being linear lines of text as though that's a bad thing. We've pretty much been stuck with this as state of the art in our writing systems for thousands of years, what would be your reaction if I suggested everyone should watch videos instead of read books? It's a flexible and easy way to represent a program that no other tool has come close to.
> We (programmers) are notoriously bad at advancing the tools in our field.
We've been trying to automate ourselves out of jobs for the entirety of the history of the industry yet programmers are in more demand than ever. Everyone wants to work on interesting problems and creating inner platforms is far more interesting than writing boring business logic. Yet for all our efforts we've barely progressed since the 70's, why do you think that is?
Videos are just another useful tool for learning; they don't obviate the need for books, but they're better at conveying some ideas/information than books alone.
Just like videos and books aren't mutually-exclusive tools for learning, graphical tools and textfiles aren't mutually-exclusive tools for building programs.
There's no reason that your wizard dialogue couldn't be exactly as expressive as SQL. In principle you could make a wizard that just built an arbitrary SQL query and ran it.
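As a sketch of that principle (all names here are invented), a wizard's structured selections can compile down to an arbitrary parameterized SQL string, so in principle the dialog is exactly as expressive as the query language underneath:

```python
# Invented sketch: compile wizard-style selections (table, columns,
# filters, sort order) into a parameterized SQL string.
def build_query(table, columns, filters=None, order_by=None):
    """Assemble a SELECT statement from structured wizard choices."""
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    params = []
    if filters:
        clauses = []
        for column, op, value in filters:
            clauses.append(f"{column} {op} ?")   # placeholder per filter
            params.append(value)
        sql += " WHERE " + " AND ".join(clauses)
    if order_by:
        sql += f" ORDER BY {order_by}"
    return sql, params

sql, params = build_query(
    "sales", ["region", "total"],
    filters=[("year", "=", 2018), ("total", ">", 1000)],
    order_by="total",
)
print(sql)
# SELECT region, total FROM sales WHERE year = ? AND total > ? ORDER BY total
```

In practice, the hard part is not expressiveness but exposing the full combinatorial space of queries through a finite dialog, which is where real wizards usually fall short.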
I said "nearly" before; the reason is graphical programming languages, for example Unreal Engine's Blueprints. These are a "GUI" and allow general-purpose programming. One could imagine a tool with a similar style of programming that extends them with inline data visualization and other tools; it would both undoubtedly be a GUI and have all the nice features of programming.
I think the better compromise is probably something like jupyter notebooks, but that doesn't mean a GUI couldn't do it. And maybe a better GUI exists that I just haven't managed to imagine.
That's the advantage of the GUI.
In that case, IDEs would also be GUIs.
Elitist rhetoric and snobbery
It's true that tracking data and data provenance is very important and hard in GUI tools.
But it's not really that much easier in code either.
To be specific about the kinds of problems here, I'm thinking of things like when errors are found in a dataset, new labels are introduced, or you want multiple splits on the data, but you still want the old version to check your metrics against.
Yes, you can do things like versioned directories for different data versions (although this tends to break when you are talking about TBs of data).
Or you can try using traditional version control tools, but that involves switching between code and your version control tool.
Or you can try transformation-oriented programming, where you keep the original version of the data and then always transform it to get to the new version. This is slow on large data and fails when new information is introduced.
Also, normal version control doesn't work well with modelling code, because you want to use both the old and new versions of the code simultaneously.
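One common workaround, sketched here with only the Python standard library, is to content-address each dataset version by its hash so old and new versions coexist and any result can name the exact bytes it was computed from. This doesn't solve the TB-scale problem mentioned above; it just keeps every version nameable:

```python
# Content-addressed dataset versions (invented sketch): each version
# is keyed by a hash of its bytes, so old and new versions coexist
# and identical data deduplicates automatically.
import hashlib

def version_id(data: bytes) -> str:
    """Stable identifier for one version of a dataset."""
    return hashlib.sha256(data).hexdigest()[:12]

store = {}  # version_id -> bytes; on disk this would be a directory

def commit(data: bytes) -> str:
    vid = version_id(data)
    store[vid] = data
    return vid

v1 = commit(b"label,value\ncat,1\n")
v2 = commit(b"label,value\ncat,1\ndog,2\n")  # new labels introduced

# Both versions remain addressable: metrics computed against v1 can
# still be checked after v2 lands, because v1 never changes.
print(v1 != v2)  # True
```

Tools in this space (Git LFS, DVC, and similar) are essentially industrialized versions of this idea, plus remote storage and pipeline tracking.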
Greg Brockman talked about this exact problem in the OpenAI/Ycombinator podcast.
This is a hard problem to solve - not sure who is working on it.
This appeals to managers because it's hard to recruit people who know R and SAS.
Some of these tools are better than others, but none let you do everything you can do in R.
I think the point I was making is that “what is a GUI” is not objective. And its colloquial definition may also evolve over time. The fundamental question right now is, is text graphical? Some would say no. I’d say yes. What about code highlighting? Code hinting? The buttons on your text editor? aren’t those graphical? Of course they are. I’d argue that they are the best GUI for crafting custom computer instructions.
An IDE would straddle both worlds.
1. Making a video game of a particular type (racing game, shoot em up, 3D shooter etc)
2. Do an analysis of a particular type (regression, ARMA analysis)
3. Make a web page of a particular type (landing page, agency page, personal blog)
What you cannot do is use a GUI to do a new kind of thing that the maker of the GUI had not considered.
And that is why you can't use it to do any kind of "science" which involves experimenting to see if this new technique will work.
You can, of course, do science with graphical tools.
For instance, excel is a "graphical tool", and a lot of scientists use it.
But you can't do science on the field the tool itself is addressing.
So you can use Excel for biology, but data science is trying to experiment on the data techniques themselves.
Theoretically speaking, of course, all Turing-complete languages are equivalent. You merely need a graphical environment that is expressive enough, i.e. one that can do conditional branching and store items in memory.
Turing-completeness would be a measure of computational equivalence, but that -- and I don't know if this is the right way to phrase it -- that is not the same as "presentational" equivalence.
A modern computer program does computations, but also displays them a certain way, and TMs are silent on how you display the results of the computation. (To drive the point home, under no circumstances could Turing's ticker-tape machine ever light up a single pixel on a screen.)
So, if you had an infinitely fast actual Turing Machine with infinite memory, you could tell me where the video game should go, and how the web page should render and what the regression visualization would look like, but not actually show it to me.
Ordinarily this is a pedantic argument to make but in the case of a GUI it is not. The GUI has to actually encode literally an infinite number of options to display the data in addition to computing it. This is particularly important for video games for instance.
So, I think you could have a theoretically complete Turing Machine and still not do the things that a modern computing language does.
Besides just data investigation using the GUI, all visualization workflows can be automated either through their python wrapper, or directly through the C++ API. The fact that you can automate the slicing/dicing and post-processing of CFD results on a server remotely using Python still blows my mind.
I wouldn't normally associate CFD with data science, but in a way, analyzing CFD results is starting to require the kind of scale of big data, and can certainly be done with the help of a GUI.
Further, as others have commented, exploratory data analysis is much easier in an environment like JMP than it is in R - when you're just playing around with the data and trying to get a sense of it, it's much easier to make quick approximate graphs in a GUI than in a command line.
Code and the CLI give data scientists infinite flexibility, but setup, management, etc. are a challenge. A GUI provides much less flexibility, but you get output fast - it works for simpler problems, IMHO.
GUI or not, data science has to move to cloud-based tools. Whether you write code in the browser or at the CLI on a local machine is a matter of choice.
This is a pretty odd statement. Care to explain why?
Cloud provides the flexibility of choosing the hardware, many open source projects allow you to manage your dependencies and setup better. From 0 to something, cloud is better than custom.
Also, a beefy machine works for training jobs. But we need to deploy the models too.
Here's a version of the video with better sound.
* R packages with low use tend to have highly variable quality. YMMV
It's dominating many areas of data analysis, yet the terms of service prohibit any scripting or API access to the formation of the XML that composes the .twb workbooks. The user is limited to GUI clicking only, none of which can be stored.
So I think it's fair to say "You Can't Do Data Science in Tableau", even though it's a useful tool and well implemented as a reporting tool. To me it seems a bit of a trap to build proficiency with Tableau and get locked into a closed enterprise product; I'm seeing a lot of data engineers at my company getting sucked into what is essentially a "Tableau developer" role.
There is often discussion about how to have more women in field X. Opening up field X to non-programmers is one way to do so.
You see this with almost any task. I see it when working with IDEs, for example: you've got complex project management, and there is always something you need to do which the designers of the IDE never had in mind.
However GUIs make a lot of things easier to do, so I think the ultimate solution is always something that allows you to mix and match GUI and programming/plain text solutions.
I think the problem today is that we make these GUI behemoth tools, when it would have been better with a collection of much smaller tools with more dedicated usage.
Surely you aren't serious. Data manipulation is one of the areas in which R is vastly superior to Python.
I think we just need to be realistic: if your job is to collate sales results every month and create a presentation about something interesting in the data, you aren't really a scientist, and you can probably just use Excel.
Now imagine an alternate universe where no tools like Photoshop/Illustrator were invented (no GUI or mouse-based operating environments); we would still have created art through the command line. Perhaps it would be a sophisticated version of SVG, mostly produced by people with development skills, with designers occasionally checking the design and giving input. Now this process has decades of tooling to make things better, with the same arguments we have about repeatability through various tools like macros, etc. Now imagine someone coming up with the idea of a basic version of Photoshop: we would most probably dismiss that idea, and very few mainstream programmers would adopt such a tool (enterprises would, as seen with RapidMiner). But that doesn't mean it wouldn't one day evolve into the Photoshop we see today and totally eliminate the development effort in producing art.
p.s. edited for typos.
One of the problems with programming languages that begin with the letter "p" is that they can be used to create viruses.
And btw - a more general question - why did the computer scientists of the '70s (SPSS is from 1968) have much better intuitions about what users needed than the computer scientists of today?
Since it summarizes a talk, we changed the URL to that of the talk, in keeping with the HN guidelines' request for original sources (https://news.ycombinator.com/newsguidelines.html).
Anyone can skim the blog and get the gist of a 75 minute talk-a-thon in seconds, and move right along.
The presentation is just beating a dead horse about text expressions edited in a text editor and/or executed at the command line and interpreted by R, vs. button clicking in R, in a nice IDE that's comfortable and approachable for non-programmers.
E.G. le tooling debate du jour, oui oui, monsieur...
In the Q&A at the end, the very last question asked (regarding visual pipeline GUI's or something like that) was the content probably most directly related to the title.
What I think is interesting is that even though he's largely correct, in the real world the ease of use of approachable UI's, even if built on top of objectively substandard application cores, very often wins the race, at least until the next "shift" occurs. This has saved Microsoft's bacon more than once in history.
Doesn't change the fact that you can do anything in a GUI :)
IMHO, the best data science people I've dealt with have supreme knowledge of statistics and math, combined with the ability to dive deep into python, R, Tableau, viz tools, d3, whatever is needed to get the job done. They generally rely on data engineers to provide sane data, but can jump in as needed and write tools (or at least give guidance) to properly clean up the data themselves. In this context, the GUI vs. non-GUI seems kind of irrelevant.