Making LaTeX documents look great, while satisfying, is generally a painful experience. (It would probably be a lot more bearable, and maybe even fun, if LaTeX had a sane scripting language like Typst.) However, I have a handful of snippets that require no tweaking and immediately make your documents look more polished.

First, use the `pdfusetitle` option with hyperref. This puts the title of your paper into the metadata of the PDF file. Then, when somebody downloads your paper from arXiv, the tab title will be the actual title of your paper instead of `2006.09011v2.pdf`. See this paper for an example. (You can also put other metadata into the PDF file, such as the author names or the publication venue, by passing `pdfauthor` etc. to `\hypersetup`.)

Next, please set `colorlinks=true` in hyperref. It has become less common, but I used to see papers with jarring red or green boxes around every clickable element all the time. With `colorlinks=true`, hyperref instead colors the text itself, which improves readability greatly. You can of course customize the colors with `linkcolor`, `urlcolor` and `citecolor`. I can recommend picking colors from the xkcd color name survey with the `xkcdcolors` package.

Finally, a fix for the PDF outline of papers with an appendix. By this I mean the outline navigation shown by default on the left-hand side in many PDF readers, such as the built-in readers in Firefox and Chrome. By default, your appendix sections appear on the same level as your main sections. This makes it difficult to see at a glance what belongs to the main paper and puts more emphasis on the appendix sections than they deserve. They are the appendix, after all. I have seen quick fixes for this where authors make appendix sections subsections of an appendix “section”.

However, the `bookmark` package lets you fix the outline without messing with the section hierarchy in your document directly. With `bookmarksetupnext`, you can “demote” appendix sections to subsections of a virtual “Appendix” section that will only exist in the outline.

The following combines all the recommended snippets, ready to copy and paste.

```latex
\documentclass{article}
```
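Reassembled from the recommendations above, the combined preamble could look roughly like the sketch below. The color names and the exact bookmark incantation are my assumptions, so check the package documentation before relying on them.

```latex
\documentclass{article}

\usepackage{xkcdcolors}                      % xkcd color name survey
\usepackage[pdfusetitle, colorlinks=true]{hyperref}
\hypersetup{
  linkcolor=xkcdRed,                         % illustrative color choices
  urlcolor=xkcdBlue,
  citecolor=xkcdGreen,
}
\usepackage{bookmark}                        % load after hyperref

\begin{document}
% ... main sections ...

\appendix
% A virtual "Appendix" entry that exists only in the outline ...
\pdfbookmark[section]{Appendix}{appendix}
% ... then demote the next section to a subsection in the outline only;
% repeat before each appendix section.
\bookmarksetupnext{level=subsection}

% ... appendix sections ...
\end{document}
```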

When I was taking classes on machine learning, I learned about SVMs and the kernel trick to get non-linear SVMs. The last slide usually said something along the lines of “you can also kernelize other methods” without giving any more hints as to which methods can be rewritten in terms of inner products and thus kernelized. So it came as a surprise to me when I began reading Gaussian Processes for Machine Learning and learned that not only is Bayesian linear regression (BLR) one of those methods but kernelizing it gets you Gaussian Processes (GPs). The distill journal has a nice visual introduction to GPs.

Even after Rasmussen introduced the kernelization of BLR as the weight-space view of GPs, I was still sceptical and expected to read that this is kernel-BLR and one can generalize it further to get GPs. But no, kernelized BLR and GPs are actually identical. To really convince myself of this fact, I filled in the details in the kernelization process that Rasmussen omits.

In Bayesian linear regression we have a point $\vx \in \R^{D}$ and a noisy observation $y \in \R$ and we assume that the observation was generated by the model $y = \vw\T \vx + \epsilon$. Here $\epsilon \sim \N(0, \sigma^2)$ is independent Gaussian noise and $\vw \sim \N(\vzero, \mSigma_p)$ is a latent variable that describes the linear relationship between $\vx$ and $y$. Making a prediction $\hat{y}$ at a new point $\hat{\vx}$ given $n$ known data points $\mathcal{D} = (\mX, \vy) \in \R^{D \times n} \times \R^n$ in the Bayesian setting means to marginalize over the latent variables, i.e.

\[\def\p{\mathrm{p}}\p(\hat{y} \mid \hat{\vx}, \mathcal{D}) = \int \p(\hat{y} \mid \hat{\vx}, \vw)\, \p(\vw \mid \mathcal{D}) \,\mathrm{d}\vw.\]

Some quite laborious transformations (or a high-level argument) will get you the result that $\hat{y}$ follows another Gaussian distribution

\[\hat{y} \sim \N\big( \underbrace{\sigma^{-2}\hat{\vx}\T\mA\inv\mX\vy}_{\hat{\mu}}, \underbrace{\sigma^2 + \hat{\vx}\T\mA\inv\hat{\vx}}_{\hat{\sigma}} \big)\]

where $\mA = \sigma^{-2}\mX\mX\T + \mSigma_{p}\inv$.

To kernelize the predictive distribution and thus BLR, we have to rewrite both $\hat{\mu}$ and $\hat{\sigma}$ using only inner products between data points. Let’s begin with the mean.

\[\hat{\mu} = \sigma^{-2}\hat{\vx}\T\mA\inv\mX\vy = \sigma^{-2}\hat{\vx}\T \left(\sigma^{-2}\mX\mX\T + \mSigma_{p}\inv\right) \inv\mX\vy\]

Now we pull out $\mSigma_p$ using $(\mA + \mB\inv)\inv = \mB(\mA\mB +\mI)\inv$.

\[\hat{\mu} = \sigma^{-2}\hat{\vx}\T \mSigma_{p}\left(\sigma^{-2}\mX\mX\T\mSigma_{p} + \mI\right) \inv\mX\vy\]

Next, we move $\mX$ to form inner products using the push-through identity $(\mI +\mA\mB)\inv\mA = \mA(\mI + \mB\mA)\inv$ and push $\sigma$ inside.

\[\hat{\mu} = \hat{\vx}\T \mSigma_{p}\mX \left(\mX\T\mSigma_{p}\mX + \sigma^{2}\mI\right) \inv\vy\]

At this point we can introduce the kernel $k(\vx, \vy) = \langle \phi(\vx), \phi(\vy) \rangle$ with the feature map $\phi(\vx) = \mL\T\vx$, where $\mL\mL\T = \mSigma_{p}$ is the Cholesky decomposition of $\mSigma_{p}$. Expanding the kernel, we see that $k(\vx, \vy) = (\mL\T\vx)\T(\mL\T\vy) = \vx\T\mL\mL\T\vy = \vx\T\mSigma_{p}\vy$ and get the final kernelized mean

\[\hat{\mu} = k(\hat{\vx}, \mX) (k(\mX, \mX) + \sigma^2\mI)\inv \vy.\]

Kernelizing the variance begins in the same way with pulling out $\mSigma_{p}$.

\[\begin{aligned}\hat{\sigma} & = \sigma^2 + \hat{\vx}\T \left( \mSigma_{p}\inv + \sigma^{-2}\mX\mX\T \right) \inv\hat{\vx}\\& = \sigma^2 + \hat{\vx}\T\mSigma_{p} \left( \mI + \sigma^{-2}\mX\mX\T\mSigma_{p} \right) \inv\hat{\vx}\end{aligned}\]

This looks quite similar to what we had before, but we are missing an $\mX$ on the right side, which prevents us from forming inner products. Instead, we need to invoke the Woodbury matrix identity

\[(\mA + \mU\mC\mV)\inv = \mA\inv - \mA\inv\mU(\mC\inv + \mV\mA\inv\mU)\inv\mV\mA\inv\]

with $\mA = \mI$, $\mU = \mX$, $\mC = \sigma^{-2}\mI$ and $\mV = \mX\T\mSigma_{p}$.

\[\begin{aligned}\hat{\sigma} & = \sigma^2 + \hat{\vx}\T\mSigma_{p} \left( \mI - \mX \left( \sigma^2\mI + \mX\T\mSigma_{p}\mX \right)\inv \mX\T\mSigma_{p} \right) \hat{\vx}\\& = \sigma^2 + \hat{\vx}\T\mSigma_{p}\hat{\vx} - \hat{\vx}\T\mSigma_{p}\mX \left( \sigma^2\mI + \mX\T\mSigma_{p}\mX \right)\inv \mX\T\mSigma_{p}\hat{\vx}\end{aligned}\]

At this point, the expression admits the same kernel formulation as the mean did before and we get

\[\hat{\sigma} = \sigma^2 + k(\hat{\vx}, \hat{\vx}) - k(\hat{\vx}, \mX) \left( \sigma^2\mI + k(\mX, \mX) \right)\inv k(\mX, \hat{\vx}).\]

Compare the equations for $\hat{\mu}$ and $\hat{\sigma}$ with equations (2.22)-(2.24) in GPML for GP prediction and you will find them equivalent except for additional prediction noise on our part.
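Since the derivation claims exact equality, we can also sanity-check it numerically. A minimal sketch with made-up data (variable names and the toy prior are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
D, n = 3, 8
sigma = 0.5
Sigma_p = np.diag([1.0, 2.0, 0.5])        # prior covariance of the weights

X = rng.normal(size=(D, n))               # columns are training points
w = rng.multivariate_normal(np.zeros(D), Sigma_p)
y = X.T @ w + sigma * rng.normal(size=n)  # noisy observations
x_new = rng.normal(size=D)                # test point

# Weight-space view: A = sigma^-2 X X^T + Sigma_p^-1
A_inv = np.linalg.inv(X @ X.T / sigma**2 + np.linalg.inv(Sigma_p))
mu_w = x_new @ A_inv @ X @ y / sigma**2
var_w = sigma**2 + x_new @ A_inv @ x_new

# Kernelized view: k(x, y) = x^T Sigma_p y
K = X.T @ Sigma_p @ X                     # k(X, X)
k_new = x_new @ Sigma_p @ X               # k(x_new, X)
G_inv = np.linalg.inv(K + sigma**2 * np.eye(n))
mu_k = k_new @ G_inv @ y
var_k = sigma**2 + x_new @ Sigma_p @ x_new - k_new @ G_inv @ k_new

assert np.isclose(mu_w, mu_k) and np.isclose(var_w, var_k)
```

Both views produce the same predictive mean and variance up to floating-point error, as the algebra above promises.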

It is known that GPs are equivalent to deep neural networks in the infinite-width limit. (Apparently, GPs are also equivalent to spline smoothing, but that looks to be more of a theoretical result.) Speaking informally, this shows that “Bayesianization” and kernelization can generalize linear regression into something as powerful as deep learning. Goes to show that I underestimated the increase in “power” that kernelizing a method brings.

In addition to people in my group, I also have access to advice from people all over the world thanks to the internet. Here, I will collect my various bookmarks and list the points that resonate the most with me.

marlow41 listed some habits on reddit that they would adopt from the beginning if they were to begin their PhD today, which mostly agree with Sara Billey. First, research should always take priority over other activities. You might have other duties such as teaching or talks, but you should try to make some actual research happen every day. Their tool of choice is a journal to keep a list of accomplishments but also a strict table of research time to keep yourself accountable. Second, you should always be reading a paper. The goal is to make yourself aware of methods that you might need in the future. Finally, you have to advertise your work. Give talks, talk to people, be active on the internet.

Michael Nielsen and John Schulman have both written extensive essays on researching effectively in the field of machine learning. My main takeaways from Nielsen's text are the importance of developing a research strength, a field of expertise, and how he sees the creative process as two activities: problem solving and problem creation. To me it seems useful to be aware that progress can come in different forms that rely on different skills.

Schulman gives another split of research into idea-driven and goal-driven, orthogonal to Nielsen's. The former roughly focuses on improving existing methods with new ideas, whereas the latter is formulated in terms of a, usually applied, goal such as *make X work for the first time*. He also emphasizes both the development of a "taste" for problems and keeping a journal, in line with the other authors.

Mathematician Terry Tao has lots of advice on writing, some of which even generalizes across the boundaries of mathematics. In particular, I want to follow his advice on prototyping a paper. I am convinced that having a prototype paper from the beginning helps drive your thought process, and it also agrees with the number one advice from Prof. SPJ.

Recently, I have begun to prototype more quickly and keep a hand-written journal of observations and thoughts. In the future, I need to find a way to make consistent progress on my research while also keeping up with teaching. Regarding research direction, I feel an inherent desire to do idea-driven work. It is pure and beautiful. At the same time, I believe that my profile and skills are much better suited to goal-driven research. So I will also need to figure that out.

$$\sum_{i = 0}^n \sum_{\scriptstyle j, k \atop\scriptstyle \sp(i, j) < \sp(i, k)} f(i, j, k)$$

Here the inner sum iterates over all possible pairs of nodes that have a different shortest-path distance to the node $i$. In the paper, the authors derive the Monte-Carlo approximation

$$\sum_{i = 0}^n \mathbb{E}_{j_1, \dots, j_K \sim N_{i, 1}, \dots, N_{i, K}}\sum_{\scriptstyle 1 \le k, l \le K \atop\scriptstyle k < l} \left| N_{i, j_k} \right|\cdot \left| N_{i, j_l} \right| \cdot f(i, j_k, j_l)$$

which leverages the runtime efficiency of independent, uniform sampling from node subsets but is a bit unwieldy as far as I am concerned. I would much rather write it as

$$\sum_{i = 0}^n \left|N_i\right| \cdot \mathbb{E}_{j, k}\, f(i, j, k)$$

where $N_i = \{ (j, k) \mid \sp(i, j) < \sp(i, k) \}$ is the set of all admissible pairs and the expected value considers a uniform distribution over $N_i$. At first glance, sampling from $N_i$ might be a problem because $|N_i|$ grows quadratically in the number of nodes reachable from $i$. On second thought, $N_i$ can be seen as the edge set of a complete multi-partite graph of all neighbors of $i$ where the nodes are partitioned by $\sp(i, j)$. So the set actually has a simple form, and uniform sampling should be possible without enumerating the set.

Now let $G = (V, E)$ be a complete, undirected, $k$-partite graph with a set of nodes $V$ that is partitioned into subsets $V_1, \dots, V_k$ with cardinalities $n_1, \dots, n_k$ and edges

$$E = \{ (u, v) \mid u \in V_a, v \in V_b, a \ne b \}.$$

So every node is connected to every other node except the ones from its own partition. At first I thought that uniformly sampling $u \in V_a$ from $V$ and then $v$ from $V \setminus V_a$ could work because the probabilities might cancel out just right, but the following example shows that this is not the case.
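Take, for example, three partitions $V_1 = \{a\}$, $V_2 = \{b\}$ and $V_3 = \{c, d\}$, so $n = 4$ and the five edges are $ab$, $ac$, $ad$, $bc$ and $bd$. Sampling $u$ uniformly from $V$ and then $v$ uniformly from the other partitions gives

$$p(ab) = \frac{1}{4} \cdot \frac{1}{3} + \frac{1}{4} \cdot \frac{1}{3} = \frac{1}{6} \ne \frac{5}{24} = \frac{1}{4} \cdot \frac{1}{3} + \frac{1}{4} \cdot \frac{1}{2} = p(ac),$$

even though a uniform distribution would assign $\frac{1}{5}$ to every edge.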

We see that edges between high-degree nodes receive too little probability weight. Therefore, a strategy that samples edges uniformly needs to assign higher probability to nodes that participate in many edges. Let $d_u$ be the out-degree of node $u$. In a graph such as $G$, $d_u = n - n_a$ where $n = \sum_{i = 1}^k n_i$ and $u \in V_a$. Choose the first node proportional to its out-degree and the second one uniformly from all nodes that share an edge with the first one, i.e.

$$p(u) \propto d_u \quad \textrm{and} \quad p(v \mid u) = \frac{1}{d_u}.$$

This leads to a uniform distribution over ordered pairs of nodes $(u, v)$

$$p((u, v)) = p(u) \cdot p(v \mid u) = \frac{d_u}{C} \cdot \frac{1}{d_u} = \frac{1}{C}$$

where $C$ is the normalization constant of $p(u)$. But $G$ is undirected and $(u, v)$ is the same edge as $(v, u)$. Hence the probability of an edge $e = (u, v)$ is

$$p(e) = p((u, v)) + p((v, u)) = \frac{2}{C}.$$

Since $p(e)$ is independent of $e$, $p(e)$ must of course be the uniform distribution, but just to double-check, we will compute $C$. $C = \sum_{j \in V} d_j$ is the sum of all out-degrees, which would just be the number of edges in a directed version of $G$. But $G$ is undirected, so $C$ counts every edge twice. Therefore $C = 2|E|$ and $p(e) = \frac{1}{|E|}$.
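To make the recipe concrete, here is a small Python sketch (the interface and names are mine): sample the first node with probability proportional to its degree, then the second uniformly among its neighbors.

```python
import random

def sample_edge(partitions, rng=random):
    """Sample an edge of the complete k-partite graph on the given
    node partitions, uniformly over all edges."""
    n = sum(len(part) for part in partitions)
    # First node: p(u) proportional to its out-degree d_u = n - n_a
    nodes = [(a, u) for a, part in enumerate(partitions) for u in part]
    weights = [n - len(partitions[a]) for a, _ in nodes]
    (a, u), = rng.choices(nodes, weights=weights, k=1)
    # Second node: p(v | u) uniform over all nodes sharing an edge
    # with u, i.e. everything outside u's partition
    neighbors = [v for b, part in enumerate(partitions) if b != a for v in part]
    v = rng.choice(neighbors)
    return frozenset((u, v))
```

Drawing many samples and counting shows every edge appearing with frequency close to $\frac{1}{|E|}$.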

The sampling method described in this post allowed me fine-grained control over the accuracy-performance trade-off in my implementation of graph2gauss. It is, however, generally applicable to any problem that can be modelled as sampling from a complete $k$-partite graph.

Sometimes you want your systemd user services to react to system events such as `suspend.target`. However, for security reasons, user services are not notified of such system targets. My way around this is a system unit `user-suspend@.service` that forwards any of the interesting targets to my user's systemd instance.

```ini
[Unit]
```
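For reference, a forwarding unit along these lines could look like the following sketch. The exact directives, in particular the environment handling, are my reconstruction rather than the original file.

```ini
# /etc/systemd/system/user-suspend@.service (sketch)
[Unit]
Description=Forward suspend.target to user %i's systemd instance
Before=sleep.target

[Service]
Type=oneshot
User=%i
Environment=XDG_RUNTIME_DIR=/run/user/%i
ExecStart=/usr/bin/systemctl --user --no-block start suspend.target

[Install]
WantedBy=sleep.target
```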

Then enable it for your user's id with `sudo systemctl enable user-suspend@$(id -u).service` and create the following proxy target in `~/.config/systemd/user/suspend.target`.

```ini
[Unit]
```
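The proxy target itself can be as small as this sketch (again my reconstruction, not the original file):

```ini
# ~/.config/systemd/user/suspend.target (sketch)
[Unit]
Description=User-level suspend target
StopWhenUnneeded=yes
```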

Now you can use `suspend.target` in your user services just as you would in system units.

```ini
[Unit]
```
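As a hypothetical example (the `slock` screen locker is just an illustration), a user service that locks the screen before suspending could hook in like this:

```ini
# ~/.config/systemd/user/lock-on-suspend.service (hypothetical example)
[Unit]
Description=Lock the screen when the system suspends
Before=suspend.target

[Service]
Type=oneshot
ExecStart=/usr/bin/slock

[Install]
WantedBy=suspend.target
```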

With this premise I put my favorite search engine to work and found loads of options but decided quickly on Nikola, mainly to avoid analysis paralysis. Its first big plus is that it supports reStructuredText out of the box, which I have wanted to learn for a long time. It always seemed superior to Markdown with its extensible roles and directives, for example the `math` directive for LaTeX formulas. Speaking of which, Nikola has built-in support for rendering LaTeX math in posts with MathJax, which was actually also one of my exclusion criteria in the first place. However, I quickly replaced MathJax with KaTeX since it is noticeably faster (read: jerking page motions as each equation is rendered versus subsecond rendering of the whole page) and I believe that its reduced instruction set will suffice for me. Finally, I am thrilled to put Jupyter notebooks directly on my page at some point.

The main disadvantage in comparison to Jekyll that I discovered is that theming is harder. Jekyll is pretty bare-bones in that regard and you necessarily have to do everything yourself, which has the upside of giving you full control from the beginning. Nikola on the other hand comes with core- and community-developed themes with a certain structure. This can get you going fast, but as soon as you want to customize your page, you find yourself writing your own theme. The process is only barely documented and in the end I copied and modified the `lanyon` theme, which was ironically ported from Jekyll, among other things translating it from Mako to Jinja2, which I already knew from ansible. Additionally, I prefer its brace syntax over Mako's XML-like syntax.

My most invasive change to the `lanyon` theme is the listing of categories and tags in the sidebar. The issue is that Nikola mostly consists of plugins, including a plugin for rendering a categories and tags page. This has the side effect that the list of categories and tags is only available on that particular page. So right after modifying my first theme I had to jump into Nikola's (admittedly somewhat messy) codebase to find out how I could make this work. In the end I wrote the following mini-plugin to make the categories and tags available on every page.

```python
from nikola.plugin_categories import ConfigPlugin
```

You are interested in switching from your preinstalled and probably bloated Android version to CyanogenMod. Sadly, this is not as easy as you would want it to be when you have to gather every piece of this multi-step process from different blog and forum posts. In the end it turns out to consist of

- Flashing a custom recovery mode
- Installing CyanogenMod
- Optionally installing the Google Play Store

First and foremost you should download a version of CyanogenMod you want to install as well as an extended recovery mode that allows you to install non-verified ROMs. There are various options to choose from, though I picked Cyanogen Recovery since it is maintained by the same people that develop CyanogenMod. Both files are available on the CyanogenMod download page. It is important that you choose the version specifically made for your device.

Furthermore, you will probably want to have access to the Play Store. You can get Google Apps distributions of various extents at OpenGApps. For the S4 you have to select the ARM platform and the Android version that you want to install, 5.1 as of the 27th of December, 2015. This should leave you with the following files, though possibly in more current versions.

```shell
$ ls
```

To prepare for the actual installation you should now transfer the OpenGApps and the non-recovery CM files to your SD card. You will later have to locate these with a rudimentary file explorer, so better put them in the root directory.

Following these preparations, you have to flash an extended recovery mode onto your mobile phone because the provided one does not allow the installation of non-verified ROMs. For this I used heimdall, a reverse-engineered implementation of a Samsung-internal USB protocol for low-level control of Android devices. Most importantly, it lets you inspect your phone's partition table and flash images onto its partitions.

heimdall communicates with your phone in download mode, a special mode of operation. You enter it by powering off your phone and then restarting with `Volume Down + Home + Power`, releasing only the last one when the phone vibrates. If it worked, you should see an Android logo and a confirmation question that asks if you really want to enter download mode (yes, you do). Then connect the phone to your computer via USB.

As a precaution you should first check your partition information table (PIT) and look for the recovery partition, i.e. the one that stores the recovery mode. In my case it was called `RECOVERY`, but I have seen posts where it was called `SOS`. It is very important that you pick the right one. A mistake here can leave your phone unbootable.

```shell
$ heimdall print-pit --no-reboot
```

Once you have identified the recovery partition, use heimdall to flash Cyanogen Recovery. `--RECOVERY` is `--` followed by the `Partition Name` of your recovery partition. `--resume` makes it reuse the existing download mode session.

```shell
heimdall flash --resume \
    --RECOVERY <path-to-cyanogen-recovery.img>
```

Finally, we get to the actual installation. Start your phone in *recovery mode* this time with `Volume Up + Home + Power`, again keeping the first two pressed down. In the recovery menu select `Install from ZIP` and navigate to the ZIP files you transferred to the SD card during preparation. You can install them by clicking on them with the power button. Remember though that the order is important: CM first, then OpenGApps, and finally power off and start normally.

If you made it to this point, you have voided your warranty but installed a mobile operating system without bloatware and with some nice privacy built-ins.

Write your paper right from the beginning. It will drive your thought process and let you see more clearly which parts need further refinement. The paper in progress also serves as another way of communicating with fellow students, researchers, or your advisor.

You also should not be intimidated by the great work that others produce. Most ideas start small and only become great once you explore the details.

The paper should present exactly one key idea. You do not need to know what your key idea is in the beginning, but when you finish, it has to be very clear. Make certain that absolutely everyone understands what this main contribution is. Your readers should not have to ask themselves what is actually novel about your approach. If you find that you actually developed multiple ideas, split your paper and write one for each. This lets you focus on each idea and makes the final result more accessible to your readers.

Making people read your papers is hard business. They should be accessible and engaging.Imagine yourself explaining your current project to a colleague.

- *Explain the problem.* You should start with specific instances that are easily comprehensible before introducing the problem in its full generality.
- *Why is it interesting?*
- *It is unsolved.* Why are existing solutions not applicable to this specific problem?
- *Give an overview of your idea.* Start with the big picture, so that your readers know what they are getting themselves into.
- *Explain the idea in full detail.* Here come formalisms, proofs, benchmarks and results on real-world data.
- *Compare your idea to other people's approaches.*

This narrative flow manifests in SPJ's typical outline

- **Title** (1000 readers)
- **Abstract** (4 sentences, 100 readers)
- **Introduction** (1 page, 100 readers)
- **The problem** (1 page, 10 readers)
- **My idea** (2 pages, 10 readers)
- **The details** (5 pages, 3 readers)
- **Related work** (1-2 pages, 10 readers)
- **Conclusions and further work** (0.5 pages)

You can see this structure applied in his papers, for example one on advanced pattern matching.

Do not be too ambitious when describing your problem in the introduction. Starting with the most general description is not accessible. Instead, give specific examples and then establish what your paper will contribute. These contributions are claims that you have to fulfill later. The reader will wonder how you intend to accomplish these things and be interested in reading on. SPJ recommends enumerating your contributions in a bullet list to highlight them. Every list item should contain forward references to the relevant places in your paper, so that your list of contributions doubles as an implicit table of contents. Your claims should also be specific and refutable. In physics, a hypothesis that is not refutable is not worth much. It is the same with your claims.

Put related work at the end. If you put it first, it forms a barrier between the introduction, where your readers are coming from reading top to bottom, and your actual idea. Also, at that point you have not yet introduced your terminology and formalisms, nor have you explained your idea. This makes meaningful discussion and comparisons difficult.

Make your paper an interesting read. Your introduction should not scare off your readerswith the most general description of your results. Instead guide your readers through with examples as you would in an actual conversation.

Once you get to the details, present your solution in a straightforward way. Do not dive too deeply into the blind alleys that you had to explore, only to find out that they did not lead anywhere. This is exhausting for the average reader and will turn them away.

Present early versions of your paper to friends and ask them to review it. This will help you see where readers could get lost. You should instruct them very carefully, though, that this is the information you are looking for. Otherwise they might not tell you out of embarrassment. Then try to guide them through the critical sections and listen to yourself as you help them understand. Then put these explanations in your paper.

Occasionally it may be helpful to get expert feedback. Of course you could ask your advisor, but you could also send a draft to the authors of related work.
