I want to try doing something different for Fridays, where I will reflect by posting my Interpretation and Imagination from the content of the past week. It has been shown that self-play is a powerful tool for learning, and this is me leveraging the advantages I still have over computers by trying to imagine what all of this may lead to.
One repeated theme I’ve seen in my research is that the success of an architecture depends on its position in relation to the fundamental structure of the data. The transformer is successful because it is better positioned to “see” long-term dependencies. I believe any successor will be even better at “seeing” even longer dependencies, or at collating relevant information at relevant points. This is fundamental to language. It is a channel for connecting ideas to other ideas, however remote.
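To make the “seeing” concrete, here is a minimal sketch (my own illustration, assuming NumPy and omitting the learned query/key/value projections of a real transformer) of why self-attention handles long-range dependencies: every output position is a weighted mix of all input positions, so two tokens 99 positions apart are connected in a single step rather than 99 recurrent steps.

```python
import numpy as np

def self_attention(X):
    """Single-head self-attention, no learned projections (illustration only).

    Each row of the output mixes the *entire* sequence, so the path
    between any two positions has constant length, unlike an RNN
    where information must survive one step per intervening token.
    """
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                    # (seq, seq) pairwise affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over positions
    return weights @ X                               # each output attends to all inputs

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 16))   # 100 tokens, 16-dim embeddings
out = self_attention(X)
print(out.shape)                 # (100, 16): every token saw every other token
```

The point of the sketch is the shape of `scores`: it is sequence-length by sequence-length, which is exactly what lets the model connect remote ideas directly (and also why attention's cost grows quadratically with sequence length).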
Another fundamental property of data is its sequential aspect. Our brains can only process in a sequential fashion. Time is a fundamental property of our universe, so it will be woven into the structure of everything we care about, and all structures we create, namely language, will be forced to comply with this fundamental force of time. I anticipate that the models that succeed will be those positioned to manage fundamental forces like sequentiality efficiently.
Language is such a powerful and fundamental aspect of being human, and hard as the work is, the potential for creating positive good for humanity is great. Text is an unbelievable pool of content, with potentially greater impact than any other data we have. The petabytes of video data we consume are certainly massive, but video does not have the personal touch of text, which was formed by the minds of individuals. You cannot grab a camera and go capture useless text; all text was written in an effort to communicate. Thus, it is exciting to be able to pursue algorithms that can start to understand that communication.
One thing I’ve been thinking about recently is the possibility of NLP tools being used to help us write better. Grammarly is a great example of what I’m thinking about. It is intelligent enough to catch definitive mistakes and gently notify you of better ways of writing. However, I am more interested in the task of augmenting the text you do write by aggressively presenting a rewriting of your own text for evaluation.
It is orders of magnitude easier to recognize great art than it is to generate it. I believe these language models are getting to the point where they can write comprehensible text, at the expense of simultaneously generating less-than-beautiful text. I don’t see a straightforward path to an intelligent agent that can write as cohesively as humans, because much of our communication is based on our experience, not a statistical representation of the words we read before. But this is quite wonderful, because it should offer a way for every person to have an expert editor or assistant they can call on to do tasks for them. You let the tool do the work, and spend your time collaborating with it rather than fully releasing the task to it.
I have not worked closely enough with these language models to have a great intuition for what they are currently able to do, but I do see how they are starting to affect my search engine results, and I expect the collaboration I currently have with my search engine will only grow as more and more technologies catch up to Google’s execution.
I have read multiple times this week about how benchmarks interplay with progress. A beautiful thing about ImageNet is the quantitative measure you can put to a model. It allows the scientific method to be used by all, that is to be sure, but I think it influences us more than that. There is a level of competitiveness innate to the human condition that causes us to work simply to best another man (see Ecc 4:4). The benchmark also provides clear winners: either the results are SOTA or not. There is little gray area. Though benchmarks are not the silver bullet for progress, I see the good in them.