Slide  1: Publish for Pleasure

            This talk is about how I am currently creating research
            publications and why I think the process and tools that 
            I use are absolutely genius.  
           
Slide  2: Disclaimer

            I acknowledge that not everyone will have the
            technical background or interest or freedom to indulge in
            these technologies.
           
Slide  3: Notation

            I will use the word "document" to describe the piece of work
            that I create to document my research.

            This is instead of more loaded words like report (which implies
            something less formal or worthy) or article (which implies
            something that must be published in a journal).
           
Slide  4: Three Things I Care About

            I care about not wasting my time and effort.
            I care about producing something good.
            
            I care about everyone being able to access my work
            and pass it on to others.

            I care about other people being able to repeat what
            I did and build on what I did.
            
            SIX things I care about!
           
Slide  5: Part I

            This section covers tools that I use to *create* a document.
           
Slide  6: Text Files

            This is an example of the sort of text file that I 
            write to create a document.

            EVERYTHING within the file is text.  Images are
            references to external files.

            The file consists of content plus a description of
            the structure of the content (e.g., what is a heading,
            what is normal text and what is code, ...).

            This is the same idea as a LaTeX file or a Markdown file,
            but NOT the same as a Word document.
           
Slide  7: Text Files

            I do not write the final document;  I write a text
            description of the final document, which is
            processed by various software tools to generate
            the final document.

            MANY programs can create and modify text files, so you are
            not tied to a particular piece of software to write your
            document AND you are not placing any burden on anyone else
            who wants to view or use your document.

            It is easy to write code to generate text, which means we
            can automate the generation of parts of a document.  This
            is the basis for tools like 'Sweave' and 'knitr' in R.
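            A minimal sketch of that idea, written in Python rather than R
            and using made-up data: the hypothetical function below computes
            a small summary and writes it out as an HTML table fragment that
            could be dropped into a document.  Tools like 'knitr' automate
            exactly this kind of step for code embedded in the document.

```python
from html import escape
from statistics import mean

def summary_table(data):
    """Return an HTML table fragment summarising each named series.

    'data' maps a series name to a list of numbers (hypothetical input).
    """
    rows = ["<table>", "  <tr><th>Series</th><th>n</th><th>Mean</th></tr>"]
    for name, values in data.items():
        rows.append(
            f"  <tr><td>{escape(name)}</td><td>{len(values)}</td>"
            f"<td>{mean(values):.2f}</td></tr>"
        )
    rows.append("</table>")
    return "\n".join(rows)

# Made-up example data; in practice it would come from the analysis itself.
print(summary_table({"a": [1, 2, 3], "b": [10, 20]}))
```

            Because the output is just text, regenerating it after the data
            change is a matter of rerunning the code.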

            Writing text also means that I can write exactly what I mean
            and I can write complicated things - this becomes more
            important when I layer XML structure on top ...
           
Slide  8: XML

            This is an example of the sort of XML that I write 
            to create a document.
            
            It is basically HTML, plus some tags that I made up for
            my own convenience.  It is VERY easy to transform this
            to pure HTML.  It is also very easy to transform it
            to other things (e.g., .Rhtml).
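            A minimal sketch of how easy the transformation to pure HTML
            can be.  It uses Python's standard xml.etree.ElementTree rather
            than XSLT.  The made-up tags (<note>, <rcode>) and the HTML
            they map to are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from made-up tags to (HTML tag, class attribute).
TAG_MAP = {"note": ("div", "note"), "rcode": ("pre", "rcode")}

def to_html(xml_text):
    """Rewrite made-up elements as plain HTML elements; leave others alone."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.tag in TAG_MAP:
            html_tag, css_class = TAG_MAP[elem.tag]
            elem.tag = html_tag
            elem.set("class", css_class)
    return ET.tostring(root, encoding="unicode")

source = ("<body><h1>Title</h1><note>Remember this.</note>"
          "<rcode>x &lt;- 1</rcode></body>")
print(to_html(source))
```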
           
Slide  9: XML

            One weakness of text files is that the content can be
            unstructured, which makes it hard to process.  
            XML solves this problem by using tags
            to label the structure of the document.

            Tools like XPath and XSLT take advantage of that structure
            to provide powerful ways to query and transform a document.
            This means that we can do more than just process from XML to a
            publication format (e.g., HTML).  We can also process
            to other formats, e.g., these slides come from an XML
            file that has been transformed to HTML, but the same
            XML file also gets transformed to produce speaker
            notes (and a version for printing as handouts).
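            A minimal "one source, several outputs" sketch in Python.  The
            <slide>, <title>, <content> and <notes> elements are a
            hypothetical schema, not the one actually used for these
            slides, and ElementTree stands in for XSLT.

```python
import xml.etree.ElementTree as ET
from html import escape

SOURCE = """<slides>
  <slide><title>Text Files</title><content>Everything is text.</content>
    <notes>Images are references to external files.</notes></slide>
  <slide><title>XML</title><content>Tags label structure.</content>
    <notes>XPath and XSLT exploit that structure.</notes></slide>
</slides>"""

def slides_html(root):
    """Output 1: HTML slides (titles and content, no speaker notes)."""
    parts = []
    for s in root.findall("slide"):
        parts.append(f"<section><h2>{escape(s.findtext('title'))}</h2>"
                     f"<p>{escape(s.findtext('content'))}</p></section>")
    return "\n".join(parts)

def speaker_notes(root):
    """Output 2: plain-text speaker notes (titles and notes, no content)."""
    lines = []
    for i, s in enumerate(root.findall("slide"), start=1):
        lines.append(f"Slide {i:2d}: {s.findtext('title')}")
        lines.append("    " + s.findtext("notes").strip())
    return "\n".join(lines)

root = ET.fromstring(SOURCE)
print(slides_html(root))
print(speaker_notes(root))
```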

            The fact that XML is an Open Standard means that there
            are lots of editors and processing tools for working
            with XML (so a low burden for me and for anyone I want
            to share or collaborate with).

            In effect, I create a document by writing code and the
            computer creates the document from my code.  BUT the
            code that I am writing is in a very simple language.

            I use XML rather than Markdown because Markdown does not
            give me enough control over the final HTML format and because 
            Markdown is limited to generating the final format, whereas
            XML allows for a much wider range of transformations.

            The source file for these slides is an XML document
            that is transformed to HTML for the slides and a
            .txt file for speaker notes (and a modified HTML 
            document for printing handouts).

            However, Markdown also works if XML is too much,
            especially if you have the freedom to select Markdown as
            part of your workflow and you have the freedom to make use
            of the final format that Markdown produces for you.

            LaTeX just does not process as easily as XML and it 
            only easily transforms to PDF.
           
Slide 10: XPath and XSLT

            This is an example of the XPath and XSLT code used to transform
            an XML document (to an .Rhtml file).

            The XPath bit is '//rcode', which matches ANY rcode element
            ANYWHERE in the document.

            The XSLT bit is everything else, which says, IF you get a
            match to '//rcode', start an XML comment, followed by
            'begin.rcode', followed by the content of the rcode element,
            followed by 'end.rcode', followed by the end of the XML 
            comment.

            The 'xsltproc' program can be used to apply the XSLT code
            to the XML document.
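            For anyone without 'xsltproc' at hand, the same transformation
            can be sketched with Python's standard library.  ElementTree
            supports a subset of XPath, so './/rcode' plays the role of
            '//rcode'.  Each match is replaced by an XML comment wrapping
            the code between 'begin.rcode' and 'end.rcode', which is the
            chunk syntax knitr looks for in .Rhtml files.  This is an
            illustrative stand-in, not the stylesheet shown on the slide.

```python
import xml.etree.ElementTree as ET

def rcode_to_rhtml(xml_text):
    """Replace every <rcode> element (XPath '//rcode') with a knitr chunk comment."""
    root = ET.fromstring(xml_text)
    # ElementTree elements do not know their parents, so build a lookup first.
    parents = {child: parent for parent in root.iter() for child in parent}
    for rcode in root.findall(".//rcode"):
        parent = parents[rcode]
        code = "".join(rcode.itertext()).strip()
        chunk = ET.Comment("begin.rcode\n" + code + "\nend.rcode")
        chunk.tail = rcode.tail  # keep any text that followed the element
        index = list(parent).index(rcode)
        parent.remove(rcode)
        parent.insert(index, chunk)
    return ET.tostring(root, encoding="unicode")

doc = ("<html><body><p>A plot:</p>"
       "<rcode>x &lt;- rnorm(10)\nplot(x)</rcode></body></html>")
print(rcode_to_rhtml(doc))
```

            Note that the comment text is written out unescaped, so the R
            code reaches knitr as '<-' rather than '&lt;-'.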
           
Slide 11: XPath and XSLT

            This is an example of the .Rhtml file that is produced by 
            processing my XML document with XPath and XSLT.
           
Slide 12: Literate Documents

            A literate document allows code chunks to be included in the
            document so that, when the document is processed, the code
            can be run to generate some of the document content.
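            A toy version of that idea, sketched in Python rather than R.
            The hypothetical function below finds chunks marked with
            '<!--begin.py' ... 'end.py-->' (markers invented here by
            analogy with knitr's .Rhtml chunks), runs each one, and
            replaces it with whatever the code printed.  Real tools such as
            'knitr' do much more (chunk options, caching, plots).

```python
import contextlib
import io
import re
from html import escape

# Hypothetical chunk markers, modelled on knitr's <!--begin.rcode ... end.rcode-->.
CHUNK = re.compile(r"<!--begin\.py\n(.*?)\nend\.py-->", re.DOTALL)

def knit(source):
    """Run each code chunk and splice its printed output into the document."""
    env = {}  # shared namespace, so later chunks can use earlier results

    def run(match):
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(match.group(1), env)
        return "<pre>" + escape(buffer.getvalue().rstrip("\n")) + "</pre>"

    return CHUNK.sub(run, source)

doc = """<p>The mean is:</p>
<!--begin.py
values = [2, 4, 9]
print(sum(values) / len(values))
end.py-->"""
print(knit(doc))
```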
           
Slide 13: Literate Documents

            The process is now:  I write an XML document, then I transform
            it with XPath and XSLT to a .Rhtml document, then I transform
            it again with 'knitr' to generate an HTML document.
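            The two steps can be scripted.  A minimal Python sketch follows;
            the file names are hypothetical, and running it requires
            'xsltproc', R and the knitr package to be installed.  It uses
            xsltproc's '-o' option to name the output file and calls
            knitr::knit() with an explicit 'output' argument.

```python
import subprocess
from pathlib import Path

def build_commands(xml_file, xsl_file):
    """Return the two steps: XML -> .Rhtml (xsltproc), .Rhtml -> HTML (knitr)."""
    rhtml = str(Path(xml_file).with_suffix(".Rhtml"))
    html = str(Path(xml_file).with_suffix(".html"))
    return [
        ["xsltproc", "-o", rhtml, xsl_file, xml_file],
        ["Rscript", "-e", f'knitr::knit("{rhtml}", output = "{html}")'],
    ]

def build(xml_file, xsl_file):
    """Run each step in turn, stopping with an error if one fails."""
    for cmd in build_commands(xml_file, xsl_file):
        subprocess.run(cmd, check=True)

# Hypothetical file names; this only prints the commands rather than running them.
for cmd in build_commands("publish.xml", "rhtml.xsl"):
    print(" ".join(cmd))
```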
           
Slide 14: HTML

            This is an example of the HTML that is produced by 
            processing the .Rhtml file that was produced by 
            processing the literate XML document that I wrote.
           
Slide 15: HTML
            
            The final document that I produce is an HTML document.

            HTML does not do typesetting as well as LaTeX does,
            but you can still produce a nice-looking result 
            (like these slides), usually by making use of someone
            else's efforts with CSS.

            HTML is a great publication format because it is an
            Open Standard, so there are lots of (free) viewers
            (including web browsers), so no burden is placed
            on your audience.  Oh, and it's still text.

            Where HTML really beats other options (like PDF) is that
            it is easy to produce dynamic and interactive
            effects in HTML.

            HTML is also text and is very close to XML (XHTML is HTML
            written as strict XML), so it inherits all of their nice
            editing and processing properties.  For
            example, suppose that the tool I used to generate the
            final HTML document (from my XML document) does not
            produce EXACTLY what I want; with HTML, I can easily tweak
            the final result with further processing (much more easily
            than I could if I had generated a PDF document as my
            final document).

            The relevance of HTML as a publication format is
            demonstrated by the fact that traditional publishers now
            offer HTML versions of articles online.

            Some web browsers now have "native" support for viewing
            PDFs, but not as part of a web page.  PDF is not a 
            web format.            
           
Slide 16: HTML

            This is an example of an HTML report with an interactive
            feature: click on the plus to toggle visibility of the code
            chunks.
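            The slide's example uses its own plus-sign toggle, and how that
            is implemented is not shown here.  One simple way to get a
            similar effect with no JavaScript at all is the standard HTML
            <details>/<summary> element, which browsers render as a
            click-to-expand block.  A hypothetical Python helper that wraps
            a code chunk this way:

```python
from html import escape

def collapsible_code(code, label="Show code"):
    """Wrap a code chunk in <details> so readers can click to show or hide it."""
    return (f"<details><summary>{escape(label)}</summary>"
            f"<pre><code>{escape(code)}</code></pre></details>")

print(collapsible_code('plot(x, y, col = "blue")'))
```

            The chunk starts hidden; adding the 'open' attribute to
            <details> would make it start expanded instead.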
           
Slide 17: SVG

            If the publication format is HTML, then the best format for
            images is SVG.  It is vector (rather than raster), so it
            looks good at any size, it is an Open Standard so there is
            lots of software support (all browsers now have native
            support), PLUS it is XML with all the benefits previously
            mentioned about structured text.
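            To illustrate "SVG is XML", here is a minimal Python sketch that
            builds a tiny bar chart as SVG elements and then queries the
            result like any other XML document.  The chart and its values
            are made up for illustration.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialise with a default xmlns

def bar_chart(values, width=20, height=100):
    """Build a tiny SVG bar chart; each bar is a <rect> element."""
    svg = ET.Element(f"{{{SVG_NS}}}svg",
                     width=str(width * len(values)), height=str(height))
    for i, v in enumerate(values):
        ET.SubElement(svg, f"{{{SVG_NS}}}rect",
                      x=str(i * width), y=str(height - v),
                      width=str(width - 2), height=str(v), fill="steelblue")
    return svg

# Plain text, ready to save as a .svg file or embed in an HTML page.
text = ET.tostring(bar_chart([30, 80, 55]), encoding="unicode")

# Because the image is XML, it can be parsed and queried like any document.
parsed = ET.fromstring(text)
heights = [int(r.get("height")) for r in parsed.iter(f"{{{SVG_NS}}}rect")]
print(heights)  # [30, 80, 55]
```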

            It is also easy to add dynamic and interactive features
            to an SVG image, which is handy if you are writing a 
            document that describes the creation of dynamic and
            interactive statistical graphics.
           
Slide 18: SVG

            This is an example of an SVG image (within an HTML
            document) with an interactive feature: drag the blue
            rectangle to scroll the window shown in the large plot.
           
Slide 19: Part I Summary

            This process is similar to the R Markdown workflow that we
            have our 20x students use (right?), EXCEPT that I am
            working at a lower level, so it is more flexible, and I
            understand more about what is happening, and I have more 
            of a focus on HTML as the final document format and SVG
            as the graphics format.
           
Slide 20: Part II

            Having created a document using efficient and effective tools,
            we now turn to the issue of disseminating the document.

            The tools in this section 
            focus on making it as efficient as possible for
            others to access and make use of my work.
           
Slide 21: Electronic distribution

            Electronic distribution implies BOTH an electronic format
            (rather than hard copy) AND availability on the world
            wide web.  

            The best way (the only sensible way) to publish a document
            is in electronic format (rather than print).  Copies
            are (virtually) free, copying is fast and *exact*,
            and we gain features like colour and interactivity.

            It is not controversial to claim that an electronic format
            is good, but how many of you are still preparing documents
            for print (e.g., PDFs in A4 page format)?

            I enjoy writing a document with the screen as my
            main format (e.g., freely using colour and interactive
            content).

            It is possible to format HTML nicely for print, but the
            screen format is now the primary concern.

            JSS (the Journal of Statistical Software) was originally
            aiming for HTML, but backslid to LaTeX/PDF.
           
Slide 22: Electronic distribution

            Distribution of an electronic document via the web means
            that the publication can be accessed from virtually
            anywhere virtually instantly.

            Distribution on the web can be just a matter of placing
            material on a public web server, but existing search
            engines and social media can further increase the
            visibility and discoverability of the material.            

            People care about these things - the web gives them to
            us for free.

            Again, it is not crazy to suggest that putting a document
            on the web is good, but how many of you are writing for 
            the web (e.g., producing HTML documents or interactive SVGs)?

            I enjoy writing documents for the web.
           
Slide 23: Creative Commons

            If you are conducting publicly funded research, the results
            have already been paid for.  It makes sense to provide
            the results as openly as possible.  The CC BY licence
            fits this situation perfectly.
           
Slide 24: Traditional Transfer of Copyright

            In the traditional publishing model
            the copyright is signed over to the publisher and they
            limit access in order to charge for access (business
            model).
            
            Note that the transfer of copyright is quite substantial
            and quite persistent.  My children are unlikely to live
            long enough to see my article enter the public domain.
            
            This is not good.

            I enjoy publishing documents with a CC BY licence.
           
Slide 25: Creative Commons

            Contrast the expressions used here with those used in the
            traditional publisher contract.  These words are compatible
            with sharing and unrestricted reuse.

            This is good.
           
Slide 26: DIY Publishing

            Publishing outside of a traditional journal makes it
            possible to really take advantage of the available tools.
            
            We are not restricted by journal format rules, we are
            not hindered by the slow peer review process, we are
            not restricted by journal copyright assertions.

            If we no longer need to publish through a journal, we can
            think about escaping other artificial constraints like
            bundling articles into volumes or issues.  A publication
            can be published on its own.  For example, the Journal of
            Statistical Software ONLY publishes in electronic format
            and it publishes individual articles.

            JSS and The R Journal still dictate the format (PDF) and
            still have a slow review process.
            
            I withdrew an article from JSS after 2 years in the review
            process because the software had changed so much that the
            article had essentially become a bunch of lies!
           
Slide 27: DIY Publishing

            If you don't want to run your own web server, the department
            has a Technical Blog where you can easily publish a document.

            My latest publication is a technical report published on the
            department's technical blog.

            I have now authored (or co-authored) 29 publications on the
            technical blog.

            DIY publishing allows for a greater variety of publishing
            models - rather than a one-size-fits-all journal article,
            we can have shorter or longer pieces of work.

            Smaller publications allow for documenting smaller pieces
            of work, such as student projects.
            
            More than 10 of my publications on the technical blog
            are based on student projects (BScHons, Masters, or Summer
            Scholarship).  There are several more that are 
            single-authored student publications (PhD or research
            assistant).

            I could also have used something like arXiv, though that is
            still focused on preprints of print articles.
           
Slide 28: Research Repositories

            In addition to making a document available *now*, we should
            be concerned with the document *remaining* available.

            There are independent services like figshare.  These
            promise to provide persistent storage and increase
            visibility for works.  UoA has its own figshare portal.

            UoA library has ResearchSpace.  This has the advantage
            of having a reasonable chance of existing for as long as
            UoA exists.
           
Slide 29: figshare

           
Slide 30: figshare

           
Slide 31: Research Space

           
Slide 32: Research Space

           
Slide 33: Part II Summary

           
Slide 34: Part III

            We have already talked about literate documents and tools like
            'knitr', which make it easy to create a document that 
            can be reproduced by someone other than the original author.

            This part of the talk describes some other important
            pieces that we can share.

            The focus is still on others having access to my work, but
            we have moved on to allowing others to do more than just
            *view* the work.
           
Slide 35: Supplementary Material

            EVERYTHING needed to reproduce the document is available
            online under a permissive licence.

            The open standard text formats and permissive licences
            also make it easy to create new work based on these
            materials.
           
Slide 36: Supplementary Material

            This shows the list of materials provided for 
            one of my recent publications.

            A lot of these materials have nice features like the fact
            that they are all text files.  However, even with all of 
            these resources, there is no guarantee that someone else
            has all of the tools available to work with them
            (I work on Linux).  This is where Docker comes in ...
           
Slide 37: Docker

            A Dockerfile is a text description of a computer (operating
            system and installed programs);  a sort of virtual machine.
           
Slide 38: Docker

            We can build a Docker image from a Dockerfile, then we
            can run that image as a container (in effect, a lightweight
            virtual machine) and run a command within it.

            We can also publish the Docker image on the internet 
            (Docker Hub) so that others can easily access and reuse
            the virtual machine (e.g., to reproduce the research
            document).
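            A Dockerfile really is just a short text file.  The Python
            sketch below generates a hypothetical one for the XML to .Rhtml
            to HTML workflow.  The base image (rocker/r-ver, from the Rocker
            project) and the package lists are assumptions for illustration,
            not the image used for any particular publication.

```python
# Hypothetical recipe: base image, system tools and R packages are assumptions.
BASE_IMAGE = "rocker/r-ver:4.3.1"
SYSTEM_PACKAGES = ["xsltproc"]
R_PACKAGES = ["knitr"]

def dockerfile_text(base, system_pkgs, r_pkgs):
    """Return the text of a Dockerfile describing the build environment."""
    r_vector = ", ".join(f'"{p}"' for p in r_pkgs)
    return "\n".join([
        f"FROM {base}",
        "RUN apt-get update && apt-get install -y --no-install-recommends "
        + " ".join(system_pkgs) + " && rm -rf /var/lib/apt/lists/*",
        f"RUN Rscript -e 'install.packages(c({r_vector}))'",
        "WORKDIR /home/work",
        "",
    ])

# Save this output as 'Dockerfile' to use it.
print(dockerfile_text(BASE_IMAGE, SYSTEM_PACKAGES, R_PACKAGES))
```

            With the file saved as 'Dockerfile', the usual commands are
            'docker build -t NAME .' to build the image, 'docker run NAME
            COMMAND' to run a command inside it, and 'docker push NAME' to
            publish it on Docker Hub.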
           
Slide 39: Docker Hub

           
Slide 40: Part III Summary

           
Slide 41: Part IV

            This section is about "recognition" in the sense of 
            avoiding confusion and ambiguity with regards to who
            wrote a document.

            There are a couple of slides at the end that briefly address 
            "recognition" in the sense of being rewarded for work 
            (with fame and professional advancement).
           
Slide 42: ORCID

            ORCID provides a unique identifier for every researcher,
            so that I cannot be confused with someone else, just in
            case there is another Paul Murrell in the world who
            happens to specialise in producing slow statistical
            graphics software.
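            An ORCID iD is 16 characters, written as four blocks of four.
            The last character is a check character computed with the
            ISO 7064 MOD 11-2 algorithm, so a mistyped iD can usually be
            caught.  A minimal Python sketch of that check follows.  The
            example iD is the one ORCID uses in its own documentation.

```python
import re

ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")

def orcid_check_char(base_digits):
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for d in base_digits:
        total = (total + int(d)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_valid_orcid(orcid):
    """Check the hyphenated format and the final check character."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    chars = orcid.replace("-", "")
    return orcid_check_char(chars[:15]) == chars[15]

print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False (last digit changed)
```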
           
Slide 43: Summary

            Technology is allowing us to do more and more of the
            publishing process ourselves.
            
            DIY publishing means greater access to and freedom to
            explore these technologies.

            Another way to look at it is that these technologies 
            allow you to work (on your publication) more like
            a programmer - yet another reason for computing to
            have a greater presence in our curriculum!
           
Slide 44: Summary #2

            My *primary* concern is publishing and sharing my work.
            
            The problem of *measuring* my worth is *secondary*.

            I believe that is the correct order.

            Recently, I have been conducting small pieces of work,
            consisting of adding new features to an R package.  Within
            a very short time, I can develop the R package, 
            make it available, write a document describing the changes
            and demonstrating their use (if only as a reminder to myself
            of what I have done!), and make that document available.
            The entire research cycle can occur within the space of
            a couple of weeks.
           
Slide 45: Acknowledgements
 
Slide 46: Technology Roll of Honour
 
Slide 47: Finis
 
Slide 48: Peer review

            If you want independent peer review, there are already
            services offering that (for a price), e.g.,
            Rubriq.
           
Slide 49: Metrics

            Once it becomes easy to publish, the number of publications
            becomes a less meaningful measure.  The emphasis is more on
            quality, as measured by things like citations, or other
            measures of how much your work is used and valued by others.
           
Slide 50: