Furigana, Vertical Montage, and the Video Essay

There’s a device used (primarily) in printed Japanese that I first became interested in back in the earliest days of my Adachi Mitsuru manga fandom (Touch, I liked Touch – which, yes, puts this in the 1980s). Furigana (or ‘ruby’/‘rubi’) are phonetic glosses of written kanji (Chinese characters); typically, they serve the very functional purpose of indicating how the kanji they are attached to should be pronounced (which, when a given character can have upwards of ten different readings depending on context, is pretty useful).

For example, if you combine the kanji meaning ‘big’ – 大 – and ‘mountain’ – 山 – into one word, it’s often read ‘Ooyama‘ (おおやま) and used as a place name.


But in Tottori Prefecture, where I lived for a number of years, these same kanji in the same combination (also as a place name) are read ‘Daisen‘ (だいせん).


For reasons. Very, very obscure reasons.

This is where functional furigana come in really handy, since they let you know which way you should pronounce this particular combination of these two characters:

[Image: 大山 with furigana indicating its reading]

But they have another, far more playful function – what Wikipedia describes as “punning and double-meaning,” and it was this ‘double meaning’ that I was so fascinated with back in the day. It isn’t quite the same as something like double entendre, in which one word, in a given context, bears two possible (and typically contrasting) meanings that effectively split meaning in half. Rather, this kind of ‘double meaning’ is predicated on a visual merging of two concepts in such a way that the primary word (whatever is being glossed) is inflected by the furigana.

The one that’s stuck with me for nearly 30 years is one moment in Touch, when Tatsuya


is talking to Minami

[Image: Minami]

about his much-loathed high school baseball coach.

[Image: the coach]

And in referring to him at one point, Tatsuya calls him ‘ano kantoku‘ (あの監督, or ‘that coach’), in which ‘ano‘ vaguely connotes a kind of disdain. But what’s written in the dialogue bubble is

[Image: the dialogue bubble as printed, with furigana]

in which ‘oni‘ (オニ, or ‘demon’) inflects ‘kantoku‘ in such a way that there’s no mistaking the exact flavor (‘tormentor’) of Tatsuya’s negative feelings about the man.

Another more recent example is found in the limited special Kansai dialect version of the manga Shingeki no kyojin (Attack on Titan), in which one of the marauding giants is referred to, using the familiar kanji, as a ‘titan’ (kyojin)


but glossed in furigana as ‘huge geezer’ (dekkai ossan),¹ a bit of humor arising from the stark contrast between informal Kansai speech (ending in the familiar Kansai ‘ya‘) and the deadly seriousness of the titan attack.

What I love about both of these examples, and playful furigana in general, is how utterly visual they are; neither can be said out loud with their nuances intact, because only one of these words (glossed or gloss) can actually be said at any given moment. In the case of Touch, you could say ‘ano oni kantoku‘ and it would more or less capture the written text, but it lacks the immediacy of the written version, which coalesces in a burst of complex, layered meaning in the reader’s mind when it’s read.

So what does this have to do with vertical montage?

By way of theorizing montage, Sergei Eisenstein famously used the example of ‘ideograms’ (compound kanji created from simple ‘hieroglyph’ kanji) to illustrate how two unrelated ideas might collide in a third new meaning (a ‘tertium quid’). One of the examples he gives is the combination of the ‘hieroglyphs’ 刃 (yaiba/’blade’) and 心 (kokoro/’heart’) to create 忍 (shinobu), which Eisenstein translates as “sorrow” but which is more like “endurance,” but whatever.

The point is, he discerns in this kind of ideograph the same principle of montage that he’s been theorizing to this point, wherein if you juxtapose, say, a shot of people running from bayonet-wielding Cossacks (image 1 – just people running from other people with bayonets) with a shot of cattle being slaughtered (unrelated image 2 – cattle being slaughtered), you theoretically end up with communicated meaning 3: people are being slaughtered like cattle by the Cossacks.

You can see where he’s going with the ideogram idea here, however imperfectly, and indeed this is pretty much the principle that underlies film editing to this day – that if you line up individual shots back to back, in the absence of some indication that they are not related, viewers will typically discern some kind of relationship or even meaning (political or otherwise) in their linear juxtaposition.


So, what does all of this have to do with video essays?

There are a lot of different ways to approach the video essay, but one of the more visual of these seems to me to follow this general principle of montage – film clips linearly juxtaposed in such a way that the viewer comes away with a broader intellectual understanding of what’s being advocated by the videographer.

When it works – as I think it does in Drew Morton’s piece above – you come away with a fresh appreciation of the film under consideration; this, to me, is perhaps the greatest strength of the video essay. It bypasses tedious description in favor of direct intellectual address.

But there’s another approach that I find myself even more intrigued by than this kind of linear arrangement. Rather than a descendant of Eisensteinian intellectual montage, this approach corresponds more closely to Eisenstein’s ancillary notion of ‘vertical montage’, which in turn follows the principle underlying the playful furigana discussed above more closely than it does Eisenstein’s ideograms.

In many ways, and particularly given how Eisenstein’s own theories of the juxtaposition of sound and image changed over the years – beginning in his interest in the effects of collision and ending somewhere around the synthesis of music and image – ‘vertical montage’ seems more effective as a heuristic for thinking about in-shot juxtaposition than as a strict theory of anything (which is basically my way of saying, no, I have not done all the reading and I’m kind of making this up a bit).²

In this sense, one of the more compelling aspects of vertical montage is how it effectively creates moments of synaesthesia, in which image/sound are overlaid in such a way that the one necessarily inflects the other. Which, yeah, we know this. But what happens when we overlay images?

The joke here, of course, is dependent on exactly that collision between what’s happening visually and what’s being communicated in the subtitles; that is, in the same way that playful furigana can communicate a joke/pun/double-meaning in the dialectical collision of gloss and glossed, so too does image overlaid on image here generate an immediate burst of humor that defies simple explanation. Rather than building meaning through linear juxtaposition, it creates it in a single moment of synaesthetic meaning.

As an aside, this kind of collision is of course just as possible in image/sound juxtaposition as image/image contrast (which, indeed, is where Eisenstein’s own theorization centered). This has probably been talked about at length elsewhere, but there are two film scenes in particular that I think illustrate this to particularly nice effect.

In Apocalypse Now, we get not only the synthesis of music and image (particularly when the helicopters and bombs are being cut to the beat and image movement aligns with sound, as when the one helicopter flies across the frame at 1:58), but also its collision. When I watch this scene – every single time – I get chills because IDK I’ve always been a bit susceptible to epic fascist orchestration (see also: The East is Red). Which I find fascinating, because once you see the village – the small children and elderly villagers in particular – any sense of excitement generated by image/sound synchronicity is fully implicated in what happens to those innocents. My excitement is wrong (assuming a political position of post-Vietnam opposition to American involvement there).

At the same time, this scene also constitutes a nice example of criticisms of Eisenstein’s own understanding of vertical montage, which argue that directorial intent ≠ viewer interpretation (because, pesky viewers and their own complex contexts). When I’ve shown this in class, my students – who are not me – have had a variety of reactions to this juxtaposition of Wagner and American firepower. Some have expressed a discomfort similar to mine; others have expressed an unquestioned visceral thrill, and still others are largely unmoved either way, seeing Coppola’s ‘manipulation’ for what it is. So there’s no guarantee that what you intend to communicate will actually be communicated – to the eternal frustration of creators everywhere.

The other scene is one I’m more recently interested in. In Bryan Fuller’s television series Hannibal, the first season (of three) ends with cannibal serial killer Hannibal Lecter having successfully seen to fruition his plan of framing FBI profiler (and would-be friend) Will Graham for his own serial killings. When, at the end of the last episode of the season, he arrives to see Will at the prison where he’s incarcerated, this is how it plays out.

[wpvideo KP9y8RBB]

At first glance, there’s not much particularly noteworthy happening here; if you’re unfamiliar with the music, it simply sounds rather elegiac, effectively communicating Hannibal’s rather smug self-satisfaction at having achieved his goal. If you are familiar with the music, you know that it doesn’t actually exist independent of the Ridley Scott film Hannibal (2001), and it appears in that film in a scene where Anthony Hopkins’ Lecter is singularly moved by the music and libretto.

Understood in this way, with knowledge of its intertextual connotations, this is arguably a far more profound moment for Mads Mikkelsen’s Hannibal. Yes, there’s self-satisfaction, but it’s imbued with a kind of – well, not ‘divine’ because Hannibal, so we’ll say ‘infernal’ – gravitas that borrows synaesthetically from the Ridley Scott film to inflect (some) viewers’ understanding of the scene.

OR it’s a really fantastic in-joke, because well played, Hannibal.

Either way, it makes me smile every time I see it, because it provokes an immediate pang of pleasure. This scene is – for the fan – a multivalent reward; not just belonging to the show’s narrative, it creates immediate intertextual recognition and, with it, pleasure in a meaning shared with a select group of others in the know (the show’s creators inclusive) – not dissimilar to, I would argue, that rush of pleasure one gets when presented with a particularly nice example of playful furigana.

So, seriously, what does this have to do with video essays?

My current obsession with (sorry, interest in) Hannibal has been the impetus for, to date, some 11 fanvids and short video studies based on the show, and one of these is, I think, reasonably successful in presenting an argument-in-miniature. Although it is only two minutes long, the more I turn it over in my mind, the more I think that this piece does, in fact, house not just pretty pictures and music (well… ‘pretty’ – there’s blood, lots of it), but also the seeds of an idea.

The shots are taken from a scene at the end of season 2 in which a deeply divided Will arrives at Hannibal’s home to confront him at the culmination of what’s essentially been a long-term sting operation, and Hannibal, feeling deeply betrayed by Will, openly acknowledges the truth of himself as the Chesapeake Ripper in all his bloody glory.

And the thing I feel works well here is an implicit suggestion about Hannibal and genre that arises in the dialectical collision of these shots from the show with text from Charlotte Brontë’s Jane Eyre. Rather than making its case through either scholarship or linear montage, I think this piece succeeds – if it does – in the burst of awareness that comes from the juxtaposition of vertically layered image and image (and sound – the music is from the corresponding scene in Cary Fukunaga’s 2011 film, Jane Eyre); that is, in the implication that Hannibal is, in fact, a gothic romance.

I feel like I’m kind of returning to the days of Dziga Vertov and the pursuit of pure cinema when I say that, to my mind, this kind of vertical montage makes for a uniquely visual kind of video essay – one that cannot be easily replicated on paper. It bypasses classic argumentation (both scholarly and, I’d suggest, linear visual) in instantaneously provoking an idea born of the synaesthetic juxtaposition (and even collision) of images and sounds within the same shot.

Put differently, it shows more than tells, and in so doing, it invites the viewer to participate in exploring the idea presented. Meaning here, because it is directly communicated (when successful) to the viewer rather than explained or otherwise laid out, becomes collaborative – a shared moment of understanding between videographer and viewer that, in theory, might be a jumping-off point for further exploration.

¹ ‘Ossan’ is one of those Japanese words that just doesn’t translate neatly into English. Strictly speaking, it’s a vaguely derogatory way of saying ‘middle-aged man’, but since most of our derogatory references to age in English tend to center on the elderly (dirty old man, old man, geezer, etc.), it just doesn’t really communicate well across languages.

² See Afra, K. (2015). “‘Vertical Montage’ and Synaesthesia: Movement, Inner Synchronicity, and Music-Image Correlation in Alexander Nevsky (1938),” Music, Sound, and the Moving Image 9.1: 33-61 for a far more sophisticated discussion of the various controversies and debates surrounding vertical montage.
